About the Author
Neeraj Sangal is president of Lattix Inc., a company that specializes in software architecture management solutions. Previously, he was president of Tendril Software, which pioneered model-driven EJB development and synchronized UML models for Java. Prior to Tendril, Neeraj managed a distributed development organization at Hewlett Packard. Neeraj has spoken at conferences and has published and presented papers on architecture. His most recent publication, a joint work on architecture management, was presented at OOPSLA 2005 and is the first to propose the explicit management of dependencies using Dependency Structure Matrices (DSMs).

Expressing Software Architecture with Inter-module Dependencies


Introduction

Excessive inter-module dependencies have long been recognized as an indicator of poor software design. Highly coupled systems, in which modules have unnecessary dependencies, are hard to work with because modules cannot be understood easily in isolation, and changes or extensions to functionality cannot be easily localized. Imagine how complex Eclipse plugin development would be if every plugin had a cross-dependency on every other plugin. Maintaining Eclipse would be a nightmare because of the risk that changing one portion could impact everything else. However, it isn't always clear which dependencies are necessary or even desirable and which ones are good candidates for elimination. An understanding of the architecture in terms of the dependencies between modules can go a long way towards dealing with this conundrum.

The biggest obstacle to thinking about a system in terms of its dependencies has been the sheer number of dependencies that exist between modules. For instance, the base Eclipse Platform alone contains nearly 20,000 classes and more than 1.3 million inter-class dependencies. The conventional approach to visualizing dependencies has been to draw directed graphs, more commonly known as "box and arrow" diagrams. UML diagrams are one example. For large systems, box and arrow diagrams become impossible to make sense of or to manipulate.

Figure 1: UML Diagram showing inheritance relationships

Figure 1 shows a typical UML diagram. The figure reflects a tiny subset of a larger application. Classes are represented by boxes and the figure has been filtered to show just the inheritance relationships between these classes.

Here we present a powerful new approach for representing the architecture of large software systems. Instead of using directed graphs, it makes use of a matrix representation known as the Dependency Structure Matrix (DSM). DSM has its origins in systems engineering and has been used by many large companies to model complex processes and organizations. Our approach is the first application of DSM to the specification of software architectures and the explicit management of inter-module dependencies.

Dependency Structure Matrix

Figure 2A shows a DSM for a system that consists of 4 subsystems labeled Modules A, B, C and D. In the square matrix, a row and the column with the same number represent the same Module (for compactness, only the rows are labeled). The cells in the grid show the strengths of the dependencies between Modules. One simple way to compute a strength is to count the number of classes that each class depends on; these counts are then aggregated to determine the strength of the dependencies between subsystems (packages, jars, or any arbitrary collection of jars, packages, and classes).

The way to read a DSM is to read the dependencies down a column. For instance, column 1 shows that Module A (1) depends on Module C (3) with dependency strength of 7. Correspondingly, reading across row 1 tells us that Module A (1) provides to Module C (3) and Module D (4) with dependency strengths of 6 and 9 respectively.
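The counting and aggregation described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class name `DsmBuilder`, the `build` method and its parameters are hypothetical and are not part of Lattix LDM, and the strength of a cell is taken to be a plain count of class-to-class dependencies.

```java
import java.util.List;
import java.util.Map;

final class DsmBuilder {
    /**
     * Builds a module-level DSM from class-level dependencies.
     * classDeps holds {fromClass, toClass} pairs; moduleOfClass maps each
     * class name to the module that owns it. Cell [row][col] counts the
     * dependencies of module col on module row, so reading down a column
     * lists what that module depends on.
     * Assumes every module named in moduleOfClass also appears in modules.
     */
    static int[][] build(List<String> modules,
                         Map<String, String> moduleOfClass,
                         String[][] classDeps) {
        int n = modules.size();
        int[][] dsm = new int[n][n];
        for (String[] dep : classDeps) {
            String userModule = moduleOfClass.get(dep[0]);
            String providerModule = moduleOfClass.get(dep[1]);
            // Skip classes outside the model and dependencies inside one module.
            if (userModule == null || providerModule == null
                    || userModule.equals(providerModule)) {
                continue;
            }
            dsm[modules.indexOf(providerModule)][modules.indexOf(userModule)]++;
        }
        return dsm;
    }
}
```

With this layout, `dsm[2][0]` is the strength with which the first module depends on the third, which is the cell one reaches by reading down column 1 to row 3.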

Figure 2: A Simple DSM before and after Partitioning
Figure 2A.
Figure 2B.

Figure 2B shows the DSM after partitioning. Partitioning is a special operation that re-orders and re-groups modules. The modules are ordered in such a way that those modules which "provide" to other modules are placed at the bottom of the DSM while modules which "depend" on other modules are placed at the top. If there were no dependency cycles, this would yield a lower triangular matrix, i.e. one without any dependencies above the diagonal. Partitioning also groups together those modules which have dependency cycles. In this case, Modules A and C depend on each other and therefore have been grouped together. This form of the matrix is called block triangular: the modules fall into three blocks, and the only dependencies above the diagonal are those inside a block. Layered systems are naturally expressed as lower triangular matrices.
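One way to obtain such an ordering is to compute the strongly connected components of the dependency graph. The sketch below uses Tarjan's algorithm for this; it is an assumption on our part that this matches the behavior described, and it is not presented as the algorithm used by any particular tool. The class name `DsmPartitioner` is hypothetical. The matrix follows the convention above: `dsm[provider][user] > 0` means that `user` depends on `provider`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

final class DsmPartitioner {
    private final int[][] dsm;
    private final int[] index;
    private final int[] low;
    private final boolean[] onStack;
    private final Deque<Integer> stack = new ArrayDeque<>();
    private final List<List<Integer>> blocks = new ArrayList<>();
    private int counter = 0;

    private DsmPartitioner(int[][] dsm) {
        this.dsm = dsm;
        index = new int[dsm.length];
        low = new int[dsm.length];
        onStack = new boolean[dsm.length];
        Arrays.fill(index, -1);               // -1 marks an unvisited module
    }

    /** Returns blocks of module indices, dependents first and providers last. */
    static List<List<Integer>> partition(int[][] dsm) {
        DsmPartitioner p = new DsmPartitioner(dsm);
        for (int module = 0; module < dsm.length; module++) {
            if (p.index[module] < 0) p.visit(module);
        }
        // Tarjan's algorithm emits providers before their users; flip the
        // order so that providers end up at the bottom of the matrix.
        Collections.reverse(p.blocks);
        return p.blocks;
    }

    private void visit(int user) {
        index[user] = low[user] = counter++;
        stack.push(user);
        onStack[user] = true;
        for (int provider = 0; provider < dsm.length; provider++) {
            if (provider == user || dsm[provider][user] == 0) continue;
            if (index[provider] < 0) {
                visit(provider);
                low[user] = Math.min(low[user], low[provider]);
            } else if (onStack[provider]) {
                low[user] = Math.min(low[user], index[provider]);
            }
        }
        if (low[user] == index[user]) {       // user is the root of a cycle group
            List<Integer> block = new ArrayList<>();
            int member;
            do {
                member = stack.pop();
                onStack[member] = false;
                block.add(member);
            } while (member != user);
            Collections.sort(block);
            blocks.add(block);
        }
    }
}
```

Taking Modules A to D as indices 0 to 3, with the A and C cycle and D's use of A from the text plus a made-up dependency of C on B, `partition` returns the blocks `[[3], [0, 2], [1]]`, that is D, then the A and C group, then B.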

The grouping of modules can also be shown in different ways.  A new compound module can be formed by merging Modules A and C as shown in Figure 3A, after which the matrix becomes lower triangular.  Notice also that Module D now depends upon the new Module A-C with dependency strength of 17, which is an aggregation of Module D's dependency on both Module A and Module C. The purpose of partitioning is to express the design of an application in a layered fashion - more specifically to organize the code in such a way that a lower layer is used by the layers above it, but the lower layer does not use the layers above it. For instance, in Eclipse, the Tools Platform sits on top of the Rich Client Platform (RCP). Thus the Tools Platform uses RCP but RCP does not use the Tools Platform. The benefit is that if changes were made to the Tools Platform they would not affect any of the applications that are built on top of RCP.
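The aggregation that produces the strength of 17 can be sketched as follows. The class `DsmGrouping` and its `collapse` method are hypothetical names, and the sketch assumes that strengths simply add when modules are merged, which is consistent with the figures quoted above but is not confirmed as the exact rule.

```java
final class DsmGrouping {
    /**
     * Collapses a DSM into a smaller one. groupOf[i] is the index of the
     * compound module that original module i belongs to. Dependencies between
     * members of the same group are dropped, so the diagonal stays at zero.
     */
    static int[][] collapse(int[][] dsm, int[] groupOf, int groupCount) {
        int[][] out = new int[groupCount][groupCount];
        for (int provider = 0; provider < dsm.length; provider++) {
            for (int user = 0; user < dsm.length; user++) {
                int providerGroup = groupOf[provider];
                int userGroup = groupOf[user];
                if (providerGroup != userGroup) {
                    out[providerGroup][userGroup] += dsm[provider][user];
                }
            }
        }
        return out;
    }
}
```

For modules ordered D, A, C, B and `groupOf = {0, 1, 1, 2}`, a dependency of D on A of strength 9 and of D on C of strength 8 (the 8 is inferred from 17 minus 9) collapse into a single cell of strength 17, while the cycle between A and C disappears inside the compound module.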

Figure 3: The Re-grouped DSM and its Hierarchical Expansion
Figure 3A.
Figure 3B.

Furthermore, the identities of the basic modules can still be retained, by introducing a hierarchy, as in Figure 3B in which the grouping of A and C is shown by their indentation. The hierarchical decomposition shows that the system has been decomposed into three subsystems: Module D, Module A-C, and Module B. Module A-C is in turn decomposed into Module A and Module C.

We use the term "module" in its broadest sense. It could be a method, a class, a package, a jar or even a collection of jars. Therefore, even massive software can be represented in DSMs that appear to be deceptively small. There is another key benefit to hierarchy. Hierarchy enables succinct definition of Design Rules which are used to specify allowed and disallowed dependencies. Design rules can be used to specify architectural patterns such as layering, componentization, external library usage and other dependency patterns between subsystems. When a DSM is combined with Design Rules, we refer to it as a Lightweight Dependency Model.

Visualizing Architectural Patterns

The DSM representation is uniquely suited for representing certain architectural patterns. Layering is one such pattern. Figure 4A shows a layered system. The figure illustrates that the system consists of 5 subsystems: application, model, domain, framework and util. The DSM shows that the layer at the bottom, util, does not depend on any of the other subsystems; framework depends on util; domain depends on framework and util; and so on. The lower triangular nature of the matrix makes it immediately apparent that this is a layered system. Figure 4B shows a strictly layered system where each layer depends only on the layer directly below it.

Finally, Figure 4C shows an imperfectly layered system. Since the DSM is not lower triangular even after partitioning, we know that there are cyclic dependencies. In this case the dependencies in column 5 indicate that util has dependencies on application and model. However, the imbalance between the strength of the dependencies suggests that this is an imperfectly layered system.
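The layering patterns above translate directly into checks on the matrix. The following is a minimal sketch, assuming the modules are already ordered from the top layer to the bottom layer and that `dsm[row][col]` holds the strength with which module `col` depends on module `row`. The class and method names are hypothetical.

```java
final class LayeringCheck {
    /** True if the matrix is lower triangular: no module uses a module listed above it. */
    static boolean isLayered(int[][] dsm) {
        for (int row = 0; row < dsm.length; row++) {
            for (int col = row + 1; col < dsm.length; col++) {
                if (dsm[row][col] != 0) return false;   // dependency above the diagonal
            }
        }
        return true;
    }

    /** True if layered and every dependency goes to the module directly below. */
    static boolean isStrictlyLayered(int[][] dsm) {
        if (!isLayered(dsm)) return false;
        for (int row = 0; row < dsm.length; row++) {
            for (int col = 0; col < row; col++) {
                if (dsm[row][col] != 0 && row != col + 1) return false;
            }
        }
        return true;
    }
}
```

A matrix such as the one in Figure 4C, where the bottom module has entries in the top rows of its own column, fails `isLayered` because those entries lie above the diagonal.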

Figure 4: Architecture Patterns in a DSM
Figure 4A. Layered Pattern
Figure 4B. Strictly Layered Pattern
Figure 4C. Imperfectly Layered Pattern
Figure 4D. Component Pattern

Figure 4D shows private subsystems comp-1, comp-2 and comp-3 within subsystem domain. The DSM reveals that nothing in the system depends on these private subsystems. Furthermore, the DSM illustrates that these private subsystems do not depend on each other. This suggests that it is likely that they could be worked upon in parallel once the framework that they depend on is in place.

Design Rules: Enforcing Architectural Patterns

When design intent in the form of Design Rules is added to a DSM, the result is a Lightweight Dependency Model. The Dependency Model communicates not just what the actual dependencies are but also the allowed and disallowed dependencies. The matrix representation provides a succinct and intuitive visualization for Design Rules. Figure 5A shows a DSM with Design Rules expressed as triangles in the corners of the cells. The upper left triangle (colored green) represents an allowed dependency, while the lower left triangle (colored yellow) represents a disallowed dependency. A violation of a rule is represented with an upper right triangle (colored red).

Figure 5: DSM with Design Rules
Figure 5A.
Figure 5B. Layering Design Rules

If the DSM grid represents the design space, the Design Rules qualify that design space by specifying which parts of the design space are allowed to have dependencies and which are disallowed. In a system with 1000 classes, a fully expanded DSM grid has one million cells. Since each cell represents the possibility of a design rule, there are one million possible Design Rules in a system with 1000 classes. Fortunately, classes interact with each other in fairly regular ways. Layers are just one example of how classes within each layer interact with classes in other layers. For a five layer system, just five rules are needed to specify their interaction regardless of the number of classes within the system. Figure 5B shows the Design Rules for enforcing the layers in such a system.  Note that showing only the cannot-use rules tends to make the DSM more readable.
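To make the idea of can-use and cannot-use rules concrete, here is a minimal sketch of a rule set that works on name prefixes, so that one rule covers every class in a subsystem. Everything in it is an assumption made for illustration: the `DesignRules` class, the prefix matching, the first-match-wins evaluation and the `com.example` package names are hypothetical and do not describe how Lattix LDM stores or evaluates rules.

```java
import java.util.ArrayList;
import java.util.List;

final class DesignRules {
    /** One rule: classes under source can, or cannot, use classes under target. */
    private static final class Rule {
        final String source;
        final String target;
        final boolean allowed;

        Rule(String source, String target, boolean allowed) {
            this.source = source;
            this.target = target;
            this.allowed = allowed;
        }

        boolean matches(String fromClass, String toClass) {
            return fromClass.startsWith(source) && toClass.startsWith(target);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    DesignRules canUse(String source, String target) {
        rules.add(new Rule(source, target, true));
        return this;
    }

    DesignRules cannotUse(String source, String target) {
        rules.add(new Rule(source, target, false));
        return this;
    }

    /** The first matching rule decides; a dependency matched by no rule is allowed. */
    boolean isViolation(String fromClass, String toClass) {
        for (Rule rule : rules) {
            if (rule.matches(fromClass, toClass)) return !rule.allowed;
        }
        return false;
    }
}
```

For example, `new DesignRules().canUse("com.example.util.", "com.example.util.").cannotUse("com.example.util.", "com.example.")` states in two rules that the bottom layer may use itself but nothing else in the application, however many classes the layer contains.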

Software degrades from release to release because implicit Design Rules such as layering are violated. Lightweight Dependency Models offer the potential for maintaining the architecture over successive revisions of the life cycle by specifying rules that define the acceptable and unacceptable dependencies between subsystems. In cases where the architecture has evolved and Design Rules need to be changed, violations can actually make architectural evolution explicit for the entire development team.

A Dependency Structure Matrix of the Eclipse Platform

We analyzed the Eclipse platform (Version: 3.1.0, Build id: I20050627-1435) in terms of its dependencies. We used design dependencies which were defined as follows:

Class A depends on Class B if:

  1. Class A inherits from Class B (implements in the case of an interface)
  2. Class A calls a method or a constructor in Class B
  3. Class A refers to a data member in Class B
  4. Class A refers to Class B (e.g. as in an argument in a method)
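The four kinds of dependency can be seen side by side in a small Java example. The classes below are invented purely for illustration; in each case `Circle` plays the role of Class A.

```java
class Shape {
    String name() { return "shape"; }
}

final class Limits {
    static int maxRadius = 100;
}

final class Labels {
    static String label(String name, int radius) {
        return name + "(" + radius + ")";
    }
}

final class Canvas { }

final class Circle extends Shape {                          // 1. inherits from Shape
    private final int radius;

    Circle(int radius) {
        this.radius = Math.min(radius, Limits.maxRadius);   // 3. refers to a data member in Limits
    }

    @Override
    String name() { return "circle"; }

    String describe() {
        return Labels.label(name(), radius);                // 2. calls a method in Labels
    }

    void drawOn(Canvas canvas) {                            // 4. refers to Canvas as an argument type
        // Intentionally empty: only the parameter type creates the dependency.
    }
}
```

Under the definition above, `Circle` depends on `Shape`, `Labels`, `Limits` and `Canvas`, while none of those four classes depends on `Circle`.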

We selected the jar files that represent the Eclipse platform. We then grouped the jar files together to represent architectural abstractions such as the Rich Client Platform (RCP), jdt, workbench, update etc. Overall we loaded 19,506 classes with 1,313,034 dependencies between them.

The modularity of the Eclipse platform becomes apparent immediately. For instance, RCP has no dependencies on the tools-platform or on jdt. We also concluded that what is called the workbench appears to be not just one specific jar file but multiple jars. We also noticed that the tools-platform itself is quite modular, with ui.ide, ui.win, ui.editors, debug and compare forming the core of the tools platform along with other tools which use the core. These other tools include search, externaltools, cheatsheets, browser, team, help and refactoring. On the other hand, an analysis of the dependencies told us that the modeling framework depends on jdt. We also noticed that ui.workbench and ui.presentations are interdependent, as are text and jface.text. Finally, we were also able to observe that the Rich Client Platform does not use any external library outside of the Java standard libraries, while the tools-platform uses just one external library: org.apache.lucene. Amongst the tools, we noted that jdt does not depend on any external library while ant depends on org.apache.tools.ant, as might be expected.

A complete analysis of the Eclipse platform is beyond the scope of this article. We also point out, by way of caveat, that the abstractions that we have created by grouping jars and by examining dependencies can, in all likelihood, be improved significantly by those with greater expertise in Eclipse. What we have done is just a cursory high level analysis. Furthermore, each of the individual projects is in itself a large software system. Thus jdt or workbench would benefit from an analysis which goes much deeper and examines the dependencies based not just on jars but also on packages and perhaps even classes. This also illustrates the power of the DSM approach. A high level analysis can be conducted in parallel with an analysis of the subsystems, each of which can itself be analyzed in parallel with the others. This lends itself well to large projects which necessitate several teams working independently on their own subsystems.

We also note that the dependency matrix itself is quite sparse. This is the hallmark of good design and clearly not an accident. However, we have not yet attempted to reason about the dependencies. Obviously the lower triangular matrix represents the layered architecture of Eclipse. This is also enforced through the way jar files are created. However, what about the empty cells below the diagonal? Are those dependencies missing because of deliberate design intent, or is it just an accident of development? A detailed analysis of dependencies would lead to the creation of Design Rules which would make explicit the dependencies that are allowed and the ones that aren't.

Figure 6: A DSM for the Eclipse Platform

Lattix LDM: Lightweight Dependency Models within Eclipse

You can create a Lightweight Dependency Model for your software using Lattix LDM. When you install the Lattix LDM for Eclipse plugin, it will create a DSM for the Eclipse project. By default, it only includes classes which are being created by the current Eclipse project. However, you can add jars or even other Eclipse projects through Project Properties. Eclipse and LDM are a powerful combination.

Architectural Visibility for the Entire Team

Everybody benefits from an understanding of the big picture and how the various subsystems inter-relate, even a developer who is working on only one part of the system. As a developer you can view the code and the DSM together inside Eclipse. If you need to understand why some part of the code needs to know about another part, then you can examine the code right away. The power to see both the DSM and the code at the same time gives you the full range of visibility: from the highest level to the lowest code level.

Figure 7: DSM within Eclipse

As an architect or a senior developer you can see a snapshot of the dependency model as soon as you load the latest project because the model is instantly updated. This allows the architect to see what the impact of all the current changes has been on the architecture. This affords visibility of changes to the big picture at the earliest possible time. Architecture evolution now becomes explicit.

Catch Architectural Violations

The model is now available to you while you are developing code. If you have specified the rules you will get instant feedback if those Design Rules are violated. This means that you fix architectural problems as soon as they are introduced. It is the easiest and cheapest time to fix problems such as these. Once a product is released, the cost of fixing it is far higher. Furthermore, as most developers know, it is a lot harder to justify refactoring the software architecture for the sake of improving the quality. Indeed, most future refactoring improvements have to be tied into product enhancements.

Figure 8: Rule Violations are identified instantly

Figure 8 shows the normal Java perspective in Eclipse. Architectural violations can be seen in the Problems tab. Double clicking on the problem takes the user directly to the line in code where the Design Rule violation occurred.

Architectural Refactoring

Some of the most cost effective architectural refactoring suggested by the dependency model approach requires renaming of classes and packages. As you make changes to your architecture, Lattix LDM remembers those changes in a WorkList.

Figure 9: WorkList

Semantically, many of these are low risk changes. However, the changes themselves can be tedious. If you move a class from one package to another you may have to add import statements or make changes in numerous other classes. Changing a package name affects every class within that package as well as every class that references those classes. Eclipse source refactoring lets you do this reliably in one step.

Conclusion

The analysis of inter-module dependencies is a powerful tool for understanding software architecture. This new approach offers distinct advantages over current methods:

Precise: The matrix representation leverages the system hierarchy to aggregate dependencies and provides a precise big picture view. The model can be automatically synchronized to identify changes and architectural violations.

Highly Scalable: The power of the hierarchy and the compact matrix representation enables the LDM to scale from hundreds to tens of thousands of classes. The Lattix LDM approach has been successfully applied to many large commercial systems in various industries, including financial services and telecommunications.

Easy to Adopt: Lattix LDM automatically extracts dependencies and builds the LDM within seconds, so it is easy to deploy at any time in the software lifecycle. Architectural patterns are easy to discover and enforce in the dependency structure matrix.

This approach contains the promise that software architecture, even as it evolves, will remain visible to the entire team and that architectural erosion over time can now be avoided.

Resources

Learn more about the technology:

  1. Basics of the approach: http://www.lattix.com/technology/whatisdsm.htm
  2. OOPSLA '05 paper: http://sdg.lcs.mit.edu/pubs/2005/oopsla05-dsm.pdf
  3. Lattix white papers (registration required): http://www.lattix.com/about/downloadwhitepapers.htm
  4. How DSM gets used in Systems Engineering: http://www.dsmweb.org

Learn more about Lattix LDM:

  1. Flash Demo: http://www.lattix.com/gettingstarted/demo10.htm
  2. Free download to try on your own application (registration required): http://www.lattix.com/gettingstarted/gettingstarted.htm
  3. For a live internet demo or if you need assistance in trying this approach on your own project, please contact Lattix at info@lattix.com.

Lattix LDM for Eclipse will be demonstrated at EclipseCon 2006!