A.k.a. Why Enterprise IT is never simple
I have been thinking about this theory for a while now but
it has taken me some time to put it into words. There is a lot more thinking to
be done but I thought I would share the current state of my thinking now rather
than continue to mull over it for another 2–3 years. In a nutshell,
architecture entropy attempts to define the complexity state of an enterprise
architecture.
In thermodynamics, entropy is used as a measure of the
disorder in a closed system: the higher the entropy value, the higher the
disorder. This measurement of disorder can carry across to IT architecture. In
thermodynamics, the entropy gain typically comes from an external force such as
energy. In IT architecture the entropy gain comes from change.
Architecture entropy gain is the term I use to describe the
slow design erosion away from a structured, governed and organised solution
state towards a more disordered state as the architectural and structural
integrity of the system is eroded.
The entropy gain typically comes from changes to the system.
These changes, if not managed correctly, increase the architecture entropy
level and therefore the level of disorder in the system. Disorder is bad in
architecture as disorder drives cost.
All systems in a single organisation will eventually reach
equilibrium at a similar level of entropy. This natural state of
entropy will differ from organisation to organisation but it will always
reflect the principles and attitudes of the organisation’s overall management.
The best way to manage entropy gain is to retain sound
governance over the system and use that governance to measure and manage the
complexity gains and consequences.
The ongoing governance and management of an IT estate is a
complex problem and architecture entropy theory is attempting to put a name to
that complexity in order to start the process of working out how to measure it.
There is a lot more thinking required but I have started the process with this
post.
What is “Architecture Entropy”?
The dictionary defines architecture as:
architecture
noun.
- The art and science of designing and erecting buildings.
- Buildings and other large structures
- A style and method of design and construction
- Orderly arrangement of parts; structure
- The overall design or structure of a computer system or microprocessor, including the hardware or software required to run it.
- Any of various disciplines concerned with the design or organization of complex systems
If we ignore the definitions in italics then we are left
with a reasonable description of Enterprise Architecture.
The dictionary defines entropy as:
entropy
noun.
- Symbol S For a closed thermodynamic system, a quantitative measure of the amount of thermal energy not available to do work.
- A measure of the disorder or randomness in a closed system.
- A measure of the loss of information in a transmitted message.
- The tendency for all matter and energy in the universe to evolve toward a state of inert uniformity.
- Inevitable and steady deterioration of a system or society.
If we ignore the definitions in italics then we are left
with a sense that entropy is the propensity for something to lean towards
disorder rather than order. Just like my desk at home!
Therefore, in dictionary terms, I define architecture
entropy as:
architecture entropy
compound noun.
- A measure of the disorder in a computing system.
- The inevitable and steady deterioration of a computing system toward a state of disorder.
Architecture Entropy is a term used to describe the slow
design erosion away from a structured, governed and organised state towards a more
disordered state. Regardless of how well designed a computer system is, it will
be subject to the laws of Architecture Entropy.
Typically, a well-designed
system will initially have a low entropy due to the structure and architecture
of the solution. However, over time the
system will be subjected to ‘entropy gain’ as the architectural and structural
integrity of the system is eroded.
All systems in a single
organisation will eventually reach equilibrium at a similar level of entropy. This
natural state of entropy will differ from organisation to
organisation but it will always reflect the principles and attitudes of the
organisation’s overall management.
Architecture Entropy gain cannot be avoided but the levels
of entropy gain can be minimised with appropriate governance and budgeting.
Example Architecture Entropy in Action
Consider this high-level example based on real experiences.
It is not based on one single enterprise but the concepts and outcomes are real.
The graphic below shows a snapshot
of part of a complex enterprise estate.
It is not unusual to see many connections between many components. This many-to-many connectivity leads to
complexity and high cost to change.
Given this situation it is very common to consider the
creation of a new integration bus. In the graphic below an enterprise service
bus component has been added to simplify connectivity.
Components D, G and H
have been decommissioned and the overall vision architecture, compared to the
original, is structured, organised, tidy, clean. And expensive.
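A rough way to see why the bus looks so attractive on paper is to count connections. The sketch below is only an illustration: it assumes the worst case where every one of n components may talk directly to every other, and compares that with each component holding a single link to the bus. The function names and the values of n are made up for the example and do not come from the estate in the graphic.

```python
def point_to_point_links(n: int) -> int:
    """Upper bound on direct links when any component may connect to any other."""
    return n * (n - 1) // 2


def bus_links(n: int) -> int:
    """Links needed when every component connects only to a central bus."""
    return n


# Hypothetical estate sizes, chosen purely for illustration.
for n in (5, 8, 12):
    print(f"{n} components: up to {point_to_point_links(n)} direct links, "
          f"{bus_links(n)} via a bus")
```

The direct-link count grows with the square of the number of components while the bus count grows linearly, which is the simplification the vision architecture is selling. It says nothing about what it costs to get there.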
The executives are sold on the vision and delivery
starts. However, during delivery it
becomes hard to justify altering legacy systems that have been running for
years without issue. In addition, some
connections are rationalised but others remain for operational reasons.
As with every delivery, there
are short term pressures to deliver some benefit early so an interim
‘transition architecture’ is developed to provide earlier benefit. The
transition architecture is complex but a later release will ‘tidy things up’. Eventually connections that bypass the ESB
are re-established because they are quicker and cheaper in the short term. The
transition architecture ends up looking like the graphic below.
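One simple way to make this drift visible is to list the connections that go around the bus. The following is a minimal sketch, assuming the estate can be written down as a set of (source, target) pairs; the component names, the "ESB" label and the idea of a "bypass ratio" are all hypothetical and only there to illustrate the point.

```python
from typing import Set, Tuple

Link = Tuple[str, str]
BUS = "ESB"  # hypothetical name for the bus component


def bypass_links(links: Set[Link]) -> Set[Link]:
    """Return the links that join two components directly, skipping the bus."""
    return {(a, b) for a, b in links if BUS not in (a, b)}


def bypass_ratio(links: Set[Link]) -> float:
    """Share of all links that bypass the bus (0.0 for an empty estate)."""
    return len(bypass_links(links)) / len(links) if links else 0.0


# Made-up transition architecture: four bus links and two re-established shortcuts.
transition = {
    ("A", "ESB"), ("B", "ESB"), ("C", "ESB"), ("E", "ESB"),
    ("A", "C"), ("B", "E"),
}

print(sorted(bypass_links(transition)))
print(round(bypass_ratio(transition), 2))
```

Tracking a number like this from release to release would show whether the transition architecture is converging on the vision or drifting away from it.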
The outcome of all of this was:
- The plan was to give the business what was needed as soon as possible and then tidy up the IT in the next release. The cost of later releases couldn’t be justified and so didn’t happen.
- The additional IT complexity increased downstream costs and therefore “quicker” and “cheaper” alternatives to following the strategy were championed by the funding stakeholders.
- The plan was based on rationalising and decommissioning legacy systems. However, it was discovered late on that there were many more dependencies on the legacy systems than expected, so decommissioning all of them was judged too costly.
- The short term “tactical solution” that was only intended to be live for a few months is now many years old and requires a lot of effort to keep it running.
The result was that the
enterprise estate remained complex and expensive. Sound familiar?
Consequences of Entropy Gain
Entropy gain is directly linked to an increase in
costs. The higher the entropy gain, the
higher the overall architecture entropy and the higher the architecture’s
relative operational costs.
The graphic below shows the typical entropy gain causes.
At the end of the day the costs need to be balanced but
there is a tension between the priorities indicated in the graphic below. The enterprise dilemma is which one or two to
focus on because it is impossible to have all three.
Low cost to operate:
- Impact on change costs: Potential inflexibility due to the run costs being optimised around the ‘go live’ state of the system
- Impact on build costs: Increased levels of automation that require additional design, build and test effort
Low cost to change:
- Impact on operate costs: An increase of overall system complexity to accommodate the flexibility features
- Impact on build costs: Extra effort to design, build and (in particular) test the flexibility features
Low cost to build:
- Impact on operate costs: Risk of overall system fragility if “low cost” means “corners were cut” or elements of the system were left to be performed manually
- Impact on change costs: Possibility of functional duplication where it was cheaper to ‘copy and paste’ a function than to share and reuse an existing one, which increases the cost to change
The end goal is to reach what I call architectural
equilibrium: the point where the architectural integrity of a
system or enterprise is in balance with the costs. Achieving this goal is incredibly hard
and arguably one of the holy grails of IT.
However, we should not give up trying to balance the two as best we
can.
What can we do about Architecture Entropy?
The level of “entropy gain” is variable. Many factors determine the level of “entropy
gain” of a system:
- Strength of technical governance
- Size of the general investment budget
- Business’s attitude to the complexities of enterprise IT
- Organisational preference for ‘tactical’ vs ‘strategic’
- The ‘background level’ of complexity already inherent in the IT estate
An amount of gain on every project is inevitable due to
pressures on time and budget. In fact, a
small amount of gain may be beneficial to allow a system to reach equilibrium
by taking some overall cost out for very little impact.
The amount of gain and downstream impact can be minimised
with appropriate governance and management. Ultimately it is the IT
department’s relationship with the business stakeholders that determines the
entropy levels.
I see three steps to keep entropy in check:
- Measure
- Manage
- Minimise
Measure
The simplest way to measure entropy gain is to focus on the
downstream costs of a particular change. Don’t
just focus the business case on the cost to implement; look also at a portfolio
of common business change scenarios and the five-year cost. Research the actual
long-term ‘lights on’ cost that the enterprise has accrued over time.
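As a minimal sketch of what such a business case might add up, the code below totals the build cost, the run cost and the expected cost of a portfolio of change scenarios over five years. The `ChangeScenario` structure, the scenario names and every figure are invented for illustration; a real case would use the organisation's own historical numbers.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ChangeScenario:
    name: str
    expected_per_year: float  # how often this kind of change is expected each year
    cost_per_change: float    # cost of making the change once


def five_year_cost(build: float, annual_run: float,
                   scenarios: Iterable[ChangeScenario], years: int = 5) -> float:
    """Build cost plus run cost plus expected change cost over the period."""
    annual_change = sum(s.expected_per_year * s.cost_per_change for s in scenarios)
    return build + years * (annual_run + annual_change)


# Hypothetical figures only.
portfolio = [
    ChangeScenario("add a product", expected_per_year=2, cost_per_change=20_000),
    ChangeScenario("regulatory update", expected_per_year=1, cost_per_change=50_000),
]
total = five_year_cost(build=500_000, annual_run=100_000, scenarios=portfolio)
print(total)
```

With these made-up numbers the cost to implement is only about a third of the five-year total, which is the reason for not judging an option on build cost alone.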
In addition, when comparing solution options, and in particular when weighing
‘tactical’ against ‘strategic’, consider the average annual cost rather than the
upfront cost.
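To illustrate the difference, here is a small sketch that spreads the upfront cost (and any exit cost) over the years a solution is expected to stay in service and adds the annual run cost. The function, the exit-cost parameter and all the figures are assumptions made for the example, and real numbers could easily tip the comparison the other way.

```python
def average_annual_cost(upfront: float, annual_run: float,
                        years_in_service: float, exit_cost: float = 0.0) -> float:
    """Average yearly cost over the period the solution actually stays live."""
    if years_in_service <= 0:
        raise ValueError("years_in_service must be positive")
    return (upfront + exit_cost) / years_in_service + annual_run


# Hypothetical options, both assumed to stay live for five years.
tactical = average_annual_cost(upfront=100_000, annual_run=150_000,
                               years_in_service=5, exit_cost=50_000)
strategic = average_annual_cost(upfront=500_000, annual_run=60_000,
                                years_in_service=5)
print(tactical, strategic)
```

In this invented case the tactical option is five times cheaper upfront yet costs more per year once run and exit costs are included.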
Manage
A few considerations of how to manage entropy gain:
- Strengthen governance of system change to minimise the risk of short term changes causing long term costs.
- Create a change checklist to ensure that solution designers are considering the full life cycle changes.
- Keep focus on the cost case for the solution.
- Tightly manage deviations and exceptions from the solution architecture as if the system were being created from new.
Minimise
A few suggestions on how to minimise entropy gain:
- Make sure that each solution release provides value to the business and is not ‘just’ IT benefit
- Use established facts based on history and current costs
- Use ‘tactical solutions’ with caution
  - Have a strong exit plan to get off the tactical solution
  - Calculate the full lifecycle costs of the tactical solution
- Overall though, be pragmatic!
  - Every solution has an equilibrium point where the balance between the architecture purity and the overall costs is met
Conclusion
Aiming for low entropy
is a good thing. To do this we need to
create strong business and technical governance that looks at the full lifecycle design
and total cost of ownership considerations when making decisions. There will always be exceptions and short
term urgency, so there needs to be a managed exception process so that
exceptions to the standards can be granted with managed consequences.
Conversely, slipping
into a high entropy state is a bad thing.
The consequences are that the medium to long term operational costs
increase and it becomes incrementally slower and more expensive to change
systems. When the entropy gain gets out
of hand there is a real risk of fragility in the enterprise as systems get more
and more unstable. Finally, the higher
the entropy gain, the more it costs to ‘keep the lights on’ in the data centre.
To summarise:
- Architecture Entropy will always exist
- Nothing can be done to prevent entropy gain
- Awareness of the existence of Architecture Entropy should help to minimise entropy gain
- Invest effort to measure the impacts of decisions, especially in the longer term
- Use the measurements to manage better outcomes
- Minimise short term behaviours that can negatively impact an enterprise’s Architecture Entropy
Most of this thinking is captured in the slide deck I have
put onto SlideShare and embedded below.