Excessive inter-module dependencies have long been recognized as an
indicator of poor software design. Highly coupled systems, in which
modules have unnecessary dependencies, are hard to work with because
modules cannot be understood easily in
isolation, and changes or extensions to functionality cannot be easily
localized. Imagine how complex Eclipse plugin development would be if
every plugin had a cross-dependency on every other plugin. Maintaining
Eclipse would be a nightmare because of the risk that changing one
portion could impact everything else. However, it isn't always clear
which dependencies are necessary or even desirable and which ones are
good candidates for elimination. An understanding of the architecture
in terms of the dependencies between modules can go a long way towards
dealing with this conundrum.
The biggest obstacle to reasoning about a system in terms
of its dependencies has been the difficulty of getting a handle on the
sheer number of dependencies that exist between modules. For instance,
just the base Eclipse Platform contains nearly 20,000 classes and more
than 1.3 million inter-class dependencies. The conventional approach
for visualizing dependencies has been to draw directed graphs, more
commonly known as "box and arrow" diagrams. UML diagrams, for
instance, are box and arrow diagrams. For large systems,
such diagrams become impossible to make sense of or to
manipulate.
Figure 1: UML Diagram showing inheritance relationships
Figure 1 shows a typical UML diagram. The figure reflects a tiny
subset of a larger application. Classes are represented by boxes and
the figure has been filtered to show just the inheritance relationships
between these classes.
Here we present a powerful new approach for representing the
architecture of large software systems. Instead of using directed
graphs it makes use of a matrix representation known as the Dependency
Structure Matrix (DSM). DSM has its origins in systems
engineering and has been used by many large companies to model complex
processes and organizations. Our approach is the first
application of DSM for the specification of software architectures and
the explicit management of inter-module dependencies.
Figure 2A shows a DSM for a system that consists of 4 subsystems
labeled Modules A, B, C and D. In the square matrix, a row and the
column with the same number represent the same Module (for compactness,
only the rows are labeled). The cells in the grid show the strength of
the dependency between each pair of Modules. One simple way to compute
this strength is to count the number of classes that each class depends
on; this count is then aggregated to determine the strength of the
dependencies between subsystems (packages, jars, or any arbitrary
collection of jars, packages, and classes).
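The aggregation described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual algorithm: the class names, the module assignment and the function name `aggregate_dependencies` are all hypothetical, and the strength is simply the number of class-to-class dependencies that cross a module boundary.

```python
from collections import defaultdict


def aggregate_dependencies(class_deps, module_of):
    """Roll class-level dependencies up to module-level strengths.

    class_deps: iterable of (user_class, provider_class) pairs.
    module_of:  dict mapping each class name to its module name.
    Returns {(user_module, provider_module): strength}.
    """
    strength = defaultdict(int)
    for user, provider in class_deps:
        user_mod, provider_mod = module_of[user], module_of[provider]
        if user_mod != provider_mod:  # dependencies inside a module are not counted
            strength[(user_mod, provider_mod)] += 1
    return dict(strength)


# Hypothetical classes and modules, for illustration only.
module_of = {"a.Foo": "A", "a.Bar": "A", "c.Baz": "C", "d.Qux": "D"}
class_deps = [
    ("a.Foo", "c.Baz"),
    ("a.Bar", "c.Baz"),
    ("a.Foo", "a.Bar"),  # stays inside Module A
    ("d.Qux", "a.Foo"),
]
strengths = aggregate_dependencies(class_deps, module_of)
print(strengths)
```

Because the grouping is just a mapping from class to module, the same function works whether a "module" is a package, a jar or a collection of jars.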
The way to read a DSM is to read the dependencies down a column. For
instance, column 1 shows that Module A (1) depends on Module C (3) with
a dependency strength of 7. Correspondingly, reading across row 1 tells
us that Module A (1) provides to Module C (3) and Module D (4) with
dependency strengths of 6 and 9 respectively.
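As a small illustration of this reading convention, the sketch below stores the matrix as a list of rows and reads one column or one row. The strengths for Modules A, C and D follow the text (Module D's dependency on Module C, 8, is inferred from the aggregate of 17 quoted later in the article); the cells for Module B are not described in the text and are left at 0 here as an assumption. The helper names are hypothetical.

```python
modules = ["A", "B", "C", "D"]

# dsm[row][col] is the strength with which the COLUMN module depends on
# the ROW module. Module B's cells are an assumption (left empty).
dsm = [
    # A  B  C  D
    [0, 0, 6, 9],  # row 1: Module A provides to C (6) and D (9)
    [0, 0, 0, 0],  # row 2: Module B (not described in the text)
    [7, 0, 0, 8],  # row 3: Module C provides to A (7) and D (8, inferred)
    [0, 0, 0, 0],  # row 4: Module D provides to nobody
]


def depends_on(name):
    """Read down the module's column: what does it depend on?"""
    col = modules.index(name)
    return {modules[r]: dsm[r][col] for r in range(len(modules)) if dsm[r][col]}


def provides_to(name):
    """Read across the module's row: who depends on it?"""
    row = modules.index(name)
    return {modules[c]: dsm[row][c] for c in range(len(modules)) if dsm[row][c]}


print(depends_on("A"))   # column 1
print(provides_to("A"))  # row 1
```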
Figure 2: A Simple DSM before (Figure 2A) and after (Figure 2B) Partitioning
Figure 2B shows the DSM after partitioning. Partitioning is a
special operation that re-orders and re-groups modules. The
modules are ordered in such a way that those modules which "provide" to
other modules are placed at the bottom of the DSM while modules which
"depend" on other modules are placed at the top. If there were no
dependency cycles, this would yield a lower triangular matrix, i.e. one
without any dependencies above the diagonal. Partitioning also groups
together those modules which form dependency cycles. In this case,
Modules A and C depend on each other and therefore have been grouped
together. This form of the matrix is called block triangular because it
has been split into three blocks such that no dependency outside a
block lies above the diagonal. Layered systems are
naturally expressed as lower triangular matrices.
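The article does not say which algorithm performs the partitioning. One common way to obtain such an ordering, shown here purely as an assumed stand-in, is to find strongly connected components with Tarjan's algorithm: each component becomes a block, and reversing the order in which the components are emitted puts users at the top and providers at the bottom. The function name `partition` and Module C's dependency on Module B are hypothetical.

```python
def partition(modules, deps):
    """Order modules so that users come first and providers last.

    modules: list of module names.
    deps:    dict {user: set of providers}.
    Returns a list of blocks (lists of names); a block with more than
    one module is a dependency cycle.
    """
    index = {}  # discovery order of each module
    low = {}    # lowest discovery index reachable from the module
    stack, on_stack = [], set()
    blocks = []  # Tarjan's algorithm emits providers first

    def visit(m):
        index[m] = low[m] = len(index)
        stack.append(m)
        on_stack.add(m)
        for p in sorted(deps.get(m, ())):
            if p not in index:
                visit(p)
                low[m] = min(low[m], low[p])
            elif p in on_stack:
                low[m] = min(low[m], index[p])
        if low[m] == index[m]:  # m is the root of a component
            block = []
            while True:
                n = stack.pop()
                on_stack.discard(n)
                block.append(n)
                if n == m:
                    break
            blocks.append(sorted(block))

    for m in modules:
        if m not in index:
            visit(m)
    return blocks[::-1]  # users at the top, providers at the bottom


# A and C use each other and D uses both, as in the text.
# C -> B is a hypothetical dependency added for illustration.
deps = {"A": {"C"}, "C": {"A", "B"}, "D": {"A", "C"}}
order = partition(["A", "B", "C", "D"], deps)
print(order)
```

The recursive form keeps the sketch short; for systems with thousands of modules an iterative version would avoid Python's recursion limit.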
The grouping of modules can also be shown in different ways. A
new compound module can be formed by merging Modules A and C as shown
in Figure 3A, after which the matrix becomes lower triangular.
Notice also that Module D now depends upon the new Module A-C with a
dependency strength of 17, which is the sum of Module D's
dependencies on Module A and Module C. The purpose of partitioning
is to express the design of an application in a layered fashion: more
specifically, to organize the code in such a way that a lower layer is
used by the layers above it, but the lower layer does not use the
layers above it. For instance, in Eclipse, the Tools Platform sits on
top of the Rich Client Platform (RCP). Thus the Tools Platform uses RCP
but RCP does not use the Tools Platform. The benefit is that if changes
were made to the Tools Platform they would not affect any of the
applications that are built on top of RCP.
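The merge of Modules A and C can be sketched as a small matrix operation. This is an illustrative sketch rather than the tool's implementation: the function name `merge_modules` is hypothetical, Module D's dependency on Module C (8) is inferred from the aggregate of 17, and Module B's cells, which the text does not describe, are left at 0.

```python
def merge_modules(modules, dsm, group, merged_name):
    """Collapse the modules in `group` into one compound module.

    dsm[row][col] is the strength with which the column module depends
    on the row module. Strengths to and from the group are summed;
    dependencies inside the group disappear into the compound module.
    """
    def label(m):
        return merged_name if m in group else m

    new_modules = []
    for m in modules:
        if label(m) not in new_modules:
            new_modules.append(label(m))
    pos = {name: i for i, name in enumerate(new_modules)}
    n = len(new_modules)
    merged = [[0] * n for _ in range(n)]
    for r, provider in enumerate(modules):
        for c, user in enumerate(modules):
            pr, us = pos[label(provider)], pos[label(user)]
            if pr != us:
                merged[pr][us] += dsm[r][c]
    return new_modules, merged


# Partitioned order from the text: D, A, C, B.
modules = ["D", "A", "C", "B"]
dsm = [
    # D  A  C  B
    [0, 0, 0, 0],  # D
    [9, 0, 6, 0],  # A is used by D (9) and C (6)
    [8, 7, 0, 0],  # C is used by D (8, inferred) and A (7)
    [0, 0, 0, 0],  # B (cells assumed empty)
]
new_modules, merged = merge_modules(modules, dsm, {"A", "C"}, "A-C")
print(new_modules)
print(merged)
```

After the merge the only remaining cell is below the diagonal, which is what makes the re-grouped matrix lower triangular.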
Figure 3: The Re-grouped DSM (Figure 3A) and its Hierarchical Expansion (Figure 3B)
Furthermore, the identities of the basic modules can still be
retained, by introducing a hierarchy, as in Figure 3B in which the
grouping of A and C is shown by their indentation. The hierarchical
decomposition shows that the system has been decomposed into three
subsystems: Module D, Module A-C, and Module B. Module A-C is in turn
decomposed into Module A and Module C.
We use the term "module" in its broadest sense. It could be a
method, a class, a package, a jar or even a collection of jars.
Therefore, even massive software can be represented in DSMs that appear
to be deceptively small. There is another key benefit to hierarchy.
Hierarchy enables succinct definition of Design Rules which are used to
specify allowed and disallowed dependencies. Design rules can be used
to specify architectural patterns such as layering, componentization,
external library usage and other dependency patterns between
subsystems. When a DSM is combined with Design Rules, we refer to it as
a Lightweight Dependency Model.
The DSM representation is uniquely suited for representing certain
architectural patterns. Layering is one such pattern. Figure 4A shows a
layered system. The figure illustrates that the system consists of 5
subsystems: application, model, domain, framework and util. The DSM
shows that the layer at the bottom, util, does not depend on any of the
other subsystems; framework depends on util; domain depends on framework
and util; and so on. The lower triangular nature of the matrix makes it
immediately apparent that this is a layered system. Figure 4B
shows a strictly layered system where each layer depends only on the
preceding layer.
Finally, Figure 4C shows an imperfectly layered system. Since the
DSM is not lower triangular even after partitioning, we know that there
are cyclic dependencies. In this case the dependencies in column 5
indicate that util has dependencies on application and model. However,
the imbalance between the strengths of the dependencies suggests that
this is an imperfectly layered system.
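The three layering patterns can be told apart mechanically once the subsystems are ordered from top to bottom. The sketch below is an illustration under assumptions: the function names are hypothetical and every strength in the example matrices is invented, since the figures' actual values are not given in the text.

```python
def dependencies_above_diagonal(layers, dsm):
    """List (user, provider, strength) for every cell above the diagonal.

    layers is ordered top to bottom; dsm[row][col] is the strength with
    which the column subsystem depends on the row subsystem.
    """
    return [
        (layers[c], layers[r], dsm[r][c])
        for r in range(len(layers))
        for c in range(r + 1, len(layers))
        if dsm[r][c]
    ]


def is_layered(layers, dsm):
    """Lower triangular: no subsystem depends on one placed above it."""
    return not dependencies_above_diagonal(layers, dsm)


def is_strictly_layered(layers, dsm):
    """Layered, and each layer uses only the layer directly below it."""
    return is_layered(layers, dsm) and all(
        dsm[r][c] == 0
        for c in range(len(layers))
        for r in range(c + 2, len(layers))
    )


layers = ["application", "model", "domain", "framework", "util"]

# Hypothetical strengths, for illustration only.
layered = [
    [0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0],
    [3, 4, 0, 0, 0],
    [0, 2, 6, 0, 0],
    [1, 1, 2, 7, 0],
]
strict = [
    [0, 0, 0, 0, 0],
    [5, 0, 0, 0, 0],
    [0, 4, 0, 0, 0],
    [0, 0, 6, 0, 0],
    [0, 0, 0, 7, 0],
]
# Same as `layered`, except util also uses application and model.
imperfect = [row[:] for row in layered]
imperfect[0][4] = 1
imperfect[1][4] = 1

print(is_layered(layers, layered), is_strictly_layered(layers, layered))
print(is_layered(layers, strict), is_strictly_layered(layers, strict))
print(dependencies_above_diagonal(layers, imperfect))
```

Listing the cells above the diagonal, rather than only returning a yes or no, points directly at the dependencies that would have to be removed to restore the layering.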
Figure 4A. Layered Pattern
Figure 4B. Strictly Layered Pattern
Figure 4C. Imperfectly Layered Pattern
Figure 4D. Component Pattern
Figure 4: Architecture Patterns in a DSM
Figure 4D shows private subsystems comp-1, comp-2 and comp-3 within
subsystem domain. The DSM reveals that nothing in the system depends on
these private subsystems. Furthermore, the DSM illustrates that these
private subsystems do not depend on each other. This suggests that it
is likely that they could be worked upon in parallel once the framework
that they depend on is in place.
When design intent in the form of Design Rules is added to a DSM,
the result is a Lightweight Dependency Model. The Dependency Model
communicates not just what the actual dependencies are but also the
allowed and disallowed dependencies. The matrix representation provides
a succinct and intuitive visualization for Design Rules. Figure 5A
shows a DSM with Design Rules expressed as triangles in the corners of
the cells. The upper left triangle (colored green) represents an
allowed dependency, while the lower left triangle (colored yellow)
represents a disallowed dependency. A violation of a rule is
represented with an upper right triangle (colored red).
Figure 5A.
Figure 5B. Layering Design Rules
Figure 5: DSM with Design Rules
If the DSM grid represents the design space, the Design Rules
qualify that design space by specifying which parts of the design space
are allowed to have dependencies and which are disallowed. In a system
with 1000 classes, a fully expanded DSM grid has one million cells.
Since each cell represents the possibility of a design rule, there are
one million possible Design Rules in a system with 1000 classes.
Fortunately, classes interact with each other in fairly regular ways.
Layering is just one example of such regularity: it governs how classes
within each layer interact with classes in other layers. For a five
layer system, just five rules
are needed to specify their interaction regardless of the number of
classes within the system. Figure 5B shows the Design Rules for
enforcing the layers in such a system. Note that showing only the
cannot-use rules tends to make the DSM more readable.
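To make the economy of subsystem-level rules concrete, here is a minimal sketch that states one cannot-use rule per layer and checks class-level dependencies against them. The rule encoding, the function names and the class names are all assumptions made for illustration; they are not the rule format used by Lattix LDM.

```python
LAYERS = ["application", "model", "domain", "framework", "util"]  # top to bottom


def layering_rules(layers):
    """One cannot-use rule per layer: a layer may not use any layer above it."""
    return {layer: set(layers[:i]) for i, layer in enumerate(layers)}


def find_violations(class_deps, layer_of, rules):
    """Return the class-level dependencies that break a cannot-use rule."""
    return [
        (user, provider)
        for user, provider in class_deps
        if layer_of[provider] in rules[layer_of[user]]
    ]


# Hypothetical classes, for illustration only.
layer_of = {
    "app.Main": "application",
    "model.Order": "model",
    "util.Strings": "util",
    "util.Log": "util",
}
class_deps = [
    ("app.Main", "model.Order"),
    ("model.Order", "util.Strings"),
    ("util.Log", "app.Main"),  # util reaching up into application
]
rules = layering_rules(LAYERS)
violations = find_violations(class_deps, layer_of, rules)
print(violations)
```

The number of rules depends only on the number of layers, so adding classes to a layer adds rows and columns to the expanded matrix without adding any rules.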
Software degrades from release to release because implicit Design
Rules such as layering are violated. Lightweight Dependency Models
offer the potential for maintaining the architecture over successive
revisions of the life cycle by specifying rules that define the
acceptable and unacceptable dependencies between subsystems. In cases
where the architecture has evolved and Design Rules need to be changed,
violations can actually make architectural evolution explicit for the
entire development team.
We analyzed the Eclipse platform (Version: 3.1.0, Build id:
I20050627-1435) in terms of its dependencies. We used design
dependencies which were defined as follows:
Class A depends on Class B if:
We selected the jar files that represent the Eclipse platform. We
then grouped the jar files together to represent architectural
abstractions such as the Rich Client Platform (RCP), jdt, workbench,
update etc. Overall we loaded 19,506 classes with 1,313,034
dependencies between them.
The modularity of the Eclipse platform becomes apparent immediately.
For instance, RCP has no dependencies on the tools-platform or on jdt.
We also concluded that what is called the workbench appears to be not
just one specific jar file but multiple jars. We also noticed that the
tools-platform itself is quite modular with ui.ide, ui.win, ui.editors,
debug and compare forming the core of the tools platform along with
other tools which use the core. These other tools include search,
externaltools, cheatsheets, browser, team, help and refactoring. On the
other hand an analysis of the dependencies told us that the modeling
framework depends on jdt. We also noticed that ui.workbench and
ui.presentations are interdependent, as are text and
jface.text. Finally, we were also able to observe that the Rich Client
Platform does not use any external library outside of the Java standard
libraries while the tools-platform uses just one external library:
org.apache.lucene. Amongst the tools, we noted that jdt does not depend
on any external library while ant depends on org.apache.tools.ant, as
might be expected.
A complete analysis of the Eclipse platform is beyond the scope of
this article. We also point out, by way of caveat, that the
abstractions that we have created by grouping jars and by examining
dependencies can, in all likelihood, be improved significantly by those
with greater expertise in Eclipse. What we have done is just a cursory
high-level analysis. Furthermore, each of the individual projects is in
itself a large software system. Thus jdt or workbench would benefit
from a much deeper analysis which would examine the
dependencies based not just on jars but also on packages and perhaps
even classes. This also illustrates the power of the DSM approach. A
high-level analysis can be conducted in parallel with an analysis of
the subsystems, each of which can itself be analyzed in parallel.
This lends itself well to large projects which necessitate several
teams working independently on their own subsystems.
We also note that the dependency matrix itself is quite sparse. This
is the hallmark of good design and clearly not an accident. However, we
have not yet attempted to reason about the dependencies. The
lower triangular matrix clearly reflects the layered architecture of
Eclipse, which is also enforced through the way jar files are created.
However, what about the empty cells below the diagonal? Are those
dependencies missing because of deliberate design intent or is it just
an accident of development? A detailed analysis of dependencies would
lead to the creation of Design Rules which would make explicit the
dependencies that are allowed and the ones that aren't.
Figure 6: A DSM for the Eclipse Platform
You can create a Lightweight Dependency Model for your software
using Lattix LDM. When you install the Lattix LDM for Eclipse plugin,
it will create a DSM for the Eclipse project. By default, it only
includes classes which are defined in the current Eclipse
project. However, you can add jars or even other Eclipse projects
through Project Properties. Eclipse and LDM are a powerful combination.
Everybody benefits from an understanding of the big picture and how
the various subsystems inter-relate, even if a developer is only
working on a part of the system. As a developer you can view the code
and the DSM together inside Eclipse. If you need to understand why one
part of the code depends on another part, then you can
examine the code right away. The power to see both the DSM and the code
at the same time gives you the full range of visibility: from the
highest level to the lowest code level.
Figure 7: DSM within Eclipse
As an architect or a senior developer you can see a snapshot of the
dependency model as soon as you load the latest project because the
model is instantly updated. This allows the architect to see what the
impact of all the current changes has been on the architecture. This
affords visibility of changes to the big picture at the earliest
possible time. Architecture evolution now becomes explicit.
The model is now available to you while you are developing code. If
you have specified the rules you will get instant feedback if those
Design Rules are violated. This means that you fix architectural
problems as soon as they are introduced. It is the easiest and cheapest
time to fix problems such as these. Once a product is released, the
cost of fixing it is far higher. Furthermore, as most developers know,
it is a lot harder to justify refactoring the software architecture for
the sake of improving the quality. Indeed, most future refactoring
improvements have to be tied into product enhancements.
Figure 8: Rule Violations are identified instantly
Figure 8 shows the normal Java perspective in Eclipse. Architectural
violations can be seen in the Problems tab. Double clicking on the
problem takes the user directly to the line in code where the Design
Rule violation occurred.
Some of the most cost effective architectural refactoring suggested
by the dependency model approach requires renaming of classes and
packages. As you make changes to your architecture, Lattix LDM
remembers those changes in a WorkList.
Figure 9: WorkList
Semantically, many of these are low risk changes. However, the
changes themselves can be tedious. If you move a class from one package
to another you may have to add import statements or make changes in
numerous other classes. Changing a package name affects every class
within that package as well as every class that references those
classes. Eclipse source refactoring lets you do this reliably in one
step.
The analysis of inter-module dependencies is a powerful tool for
understanding software architecture. This new approach offers distinct
advantages over current methods:
Precise: The matrix representation
leverages the system hierarchy to aggregate dependencies and provides a
precise big picture view. The model can be automatically synchronized
to identify changes and architectural violations.
Highly Scalable: The power of the hierarchy and the compact matrix
representation enables the LDM to scale from hundreds to tens of
thousands of classes. The Lattix LDM approach has been successfully
applied to many large commercial systems in various industries,
including financial services and telecommunications.
Easy to Adopt: Lattix LDM automatically extracts dependencies and
builds the LDM within seconds, so it is easy to deploy at any time in
the software lifecycle. Architectural patterns are easy to discover and
enforce in the dependency structure matrix.
This approach contains the promise that software architecture, even
as it evolves, will remain visible to the entire team and that
architectural erosion over time can now be avoided.
Lattix LDM for Eclipse will be demonstrated at EclipseCon 2006!