5 November 2015

Architecture Entropy

A.k.a. Why Enterprise IT is never simple

I have been thinking about this theory for a while now but it has taken me some time to put it into words. There is a lot more thinking to be done but I thought I would share the current state of my thinking now rather than continue to mull over it for another 2–3 years. In a nutshell, architecture entropy attempts to define the complexity state of an enterprise architecture.

In thermodynamics, entropy is used as a measure of the disorder in a closed system: the higher the entropy value, the higher the disorder. This measurement of disorder can carry across to IT architecture. In thermodynamics, the entropy gain typically comes from an external input such as energy. In IT architecture the entropy gain comes from change.

Architecture entropy gain is the term I use to describe the slow erosion of a design away from a structured, governed and organised solution state towards a more disordered state, as the architectural and structural integrity of the system is worn down.

The entropy gain typically comes from changes to the system. These changes, if not managed correctly, increase the architecture entropy level and therefore the level of disorder in the system. Disorder is bad in architecture as disorder drives cost.

All systems in a single organisation will eventually reach equilibrium at a similar level of entropy. Each organisation’s natural level of entropy will differ, but it will always reflect the principles and attitudes of the organisation’s management.

The best way to manage entropy gain is to retain sound governance over the system and use that governance to measure and manage the complexity gains and consequences.

The ongoing governance and management of an IT estate is a complex problem and architecture entropy theory is attempting to put a name to that complexity in order to start the process of working out how to measure it. There is a lot more thinking required but I have started the process with this post.

What is “Architecture Entropy”?

The dictionary defines architecture as:

architecture
noun.
  1. The art and science of designing and erecting buildings.
  2. Buildings and other large structures
  3. A style and method of design and construction
  4. Orderly arrangement of parts; structure
  5. The overall design or structure of a computer system or microprocessor, including the hardware or software required to run it.
  6. Any of various disciplines concerned with the design or organization of complex systems

If we ignore the definitions in italics then we are left with a reasonable description of Enterprise Architecture.

The dictionary defines entropy as:

entropy 
noun. 
  1. Symbol S For a closed thermodynamic system, a quantitative measure of the amount of thermal energy not available to do work.
  2. A measure of the disorder or randomness in a closed system.
  3. A measure of the loss of information in a transmitted message.
  4. The tendency for all matter and energy in the universe to evolve toward a state of inert uniformity.
  5. Inevitable and steady deterioration of a system or society.


If we ignore the definitions in italics then we are left with a sense that entropy is the propensity for something to lean towards disorder rather than order.  Just like my desk at home!

Therefore, in dictionary terms, I define architecture entropy as:

architecture entropy
compound noun.
  1. A measure of the disorder in a computing system.
  2. The inevitable and steady deterioration of a computing system toward a state of disorder.

Architecture Entropy is a term used to describe the slow design erosion away from the structured, governed and organised towards a more disordered state. Regardless of how well designed a computer system is, it will be subjected to the laws of Architecture Entropy.

Typically, a well designed system will initially have a low entropy due to the structure and architecture of the solution.  However, over time the system will be subjected to ‘entropy gain’ as the architectural and structural integrity of the system is eroded.

All systems in a single organisation will eventually reach equilibrium at a similar level of entropy. Each organisation’s natural level of entropy will differ, but it will always reflect the principles and attitudes of the organisation’s management.

Architecture Entropy gain cannot be avoided but the levels of entropy gain can be minimised with appropriate governance and budgeting.

Example Architecture Entropy in Action

Consider this high level example based on real experiences. It is not based on one single enterprise, but the concepts and outcomes are real.

The graphic below shows a snapshot of part of a complex enterprise estate.  It is not unusual to see many connections between many components.  This many-to-many connectivity leads to complexity and high cost to change.



Given this situation it is very common to consider the creation of a new integration bus. In the graphic below an enterprise service bus (ESB) component has been added to simplify connectivity.

Components D, G and H have been decommissioned and the overall vision architecture, compared to the original, is structured, organised, tidy and clean.  And expensive.
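
To put rough numbers on why the bus looks so much tidier, here is a minimal Python sketch. It assumes eight components with hypothetical names A to H and the worst case in which every component connects directly to every other one; a real estate will usually have fewer links than that. The function names are mine and purely illustrative.

```python
from itertools import combinations


def point_to_point_links(components):
    """Worst case: every component is wired directly to every other one."""
    return list(combinations(components, 2))


def bus_links(components, bus="ESB"):
    """Each component keeps a single connection to the shared bus."""
    return [(component, bus) for component in components]


# Hypothetical component names, not the actual estate from the graphic.
components = list("ABCDEFGH")

p2p = point_to_point_links(components)
via_bus = bus_links(components)

print(len(p2p))      # 28, i.e. n * (n - 1) / 2 for n = 8
print(len(via_bus))  # 8, i.e. one link per component
```

The point-to-point count grows with the square of the number of components while the bus count grows linearly, which is why the vision architecture is so appealing on paper.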



The executives are sold on the vision and delivery starts.  However, during delivery it becomes hard to justify altering legacy systems that have been running for years without issue.  In addition, some connections are rationalised but others remain for operational reasons.

As with every delivery, there are short term pressures to deliver some benefit early so an interim ‘transition architecture’ is developed to provide earlier benefit. The transition architecture is complex but a later release will ‘tidy things up’.  Eventually connections that bypass the ESB are re-established because they are quicker and cheaper in the short term. The transition architecture ends up looking like the graphic below.



The outcome of all of this was:

  •  The plan was to give the business what was needed as soon as possible and then tidy up the IT in the next release.  The cost of later releases couldn’t be justified and so didn’t happen.
  • The additional IT complexity increased downstream costs and therefore “quicker” and “cheaper” alternatives to following the strategy were championed by the funding stakeholders.
  • The plan was based on rationalising and decommissioning legacy systems.  However it was discovered late on that there were many more dependencies on the legacy systems and so it was determined to be too costly to decommission all of the legacy systems.
  • The short term “tactical solution” that was only intended to be live for a few months is now many years old and requires a lot of effort to keep it running.

The result was that the enterprise estate remained complex and expensive.  Sound familiar?

Consequences of Entropy Gain

Entropy gain is directly linked to an increase in costs.  The higher the entropy gain, the higher the overall architecture entropy and the higher the architecture’s relative operational costs.

The graphic below shows the typical entropy gain causes.



At the end of the day the costs need to be balanced but there is a tension between the priorities indicated in the graphic below.  The enterprise dilemma is which one or two to focus on because it is impossible to have all three.



Low cost to operate:

  • Impact on change costs: Potential inflexibility due to the run costs being optimised around the ‘go live’ state of the system
  • Impact on build costs: Increased levels of automation that requires additional design, build and test effort

Low cost to change:

  • Impact on operate costs: An increase of overall system complexity to accommodate the flexibility features
  • Impact on build costs: Extra effort to design, build and (in particular) test the flexibility features

Low cost to build:

  • Impact on operate costs: Risk of overall system fragility if “low cost” means “corners were cut” or elements of the system were left to be performed manually
  • Impact on change costs: Possibility of functional duplication as it was cheaper to ‘copy and paste’ function than it was to share and reuse existing function.  This increases the cost to change

The end goal is to reach what I call architectural equilibrium: the point where the architectural integrity of a system or enterprise is in balance with the costs.  Achieving this goal is incredibly hard and arguably one of the holy grails of IT.  However, we should not give up trying to balance things as best we can.



What can we do about Architecture Entropy?

The level of “entropy gain” is variable.  Many factors determine the level of “entropy gain” of a system:

  • Strength of technical governance
  • Size of the general investment budget
  • Business’s attitude to the complexities of enterprise IT
  • Organisational preference for ‘tactical’ vs ‘strategic’
  • The ‘background level’ of complexity already inherent in the IT estate

An amount of gain on every project is inevitable due to pressures on time and budget.  In fact, a small amount of gain may be beneficial to allow a system to reach equilibrium by taking some overall cost out for very little impact.

The amount of gain and downstream impact can be minimised with appropriate governance and management. Ultimately it is the IT department’s relationship with the business stakeholders that determines the entropy levels.

I see three steps to keep entropy in check:

  1. Measure
  2. Manage
  3. Minimise

Measure
The simplest way to measure entropy gain is to focus on the downstream costs of a particular change.  Don’t just focus the business case on the cost to implement; look also at a portfolio of common business change scenarios and the 5 year cost. Research the actual long term ‘lights on cost’ that the enterprise has accrued over time.
In addition, when comparing solution options, particularly ‘tactical’ vs ‘strategic’ ones, consider the average annual cost rather than the upfront cost.
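
As a rough illustration of that comparison, here is a small Python sketch. All of the figures are made up, the split into upfront, run and change costs is my own simplification, and the function names are hypothetical. It assumes flat yearly costs over a 5 year horizon.

```python
def total_cost(upfront, annual_run, annual_change, years=5):
    """Upfront cost plus flat yearly run and change costs over the horizon."""
    return upfront + years * (annual_run + annual_change)


def average_annual_cost(upfront, annual_run, annual_change, years=5):
    """Total cost over the horizon spread evenly across the years."""
    return total_cost(upfront, annual_run, annual_change, years) / years


# Made-up figures for illustration only.
tactical = {"upfront": 100_000, "annual_run": 80_000, "annual_change": 60_000}
strategic = {"upfront": 400_000, "annual_run": 40_000, "annual_change": 20_000}

print(average_annual_cost(**tactical))   # 160000.0
print(average_annual_cost(**strategic))  # 140000.0
```

With these numbers the tactical option is four times cheaper upfront yet costs more per year once the run and change costs are included, which is exactly the effect the upfront figure hides.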

Manage
A few considerations of how to manage entropy gain:

  • Strengthen governance of system change to minimise the risk of short term changes causing long term costs.
  • Create a change checklist to ensure that solution designers are considering the full life cycle changes.
  • Keep focus on the cost case for the solution.
  • Tightly manage deviations and exceptions from the solution architecture as if the system were being created from new.

Minimise
A few suggestions on how to minimise entropy gain:

  • Make sure that each solution release provides value to the business and is not ‘just’ IT benefit
  • Use established facts based on history and current costs
  • Use ‘tactical solutions’ with caution
  • Have a strong exit plan to get off the tactical solution
  • Calculate the full lifecycle costs of the tactical solution
  • Overall though, be pragmatic!
  • Every solution has an equilibrium point where the balance between the architecture purity and the overall costs is met
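
One way to put a number on the exit plan is to work out when staying on the tactical solution starts to cost more in total than the strategic alternative. The Python sketch below does that with made-up figures and a flat yearly cost, both of which are simplifying assumptions of mine; break_even_year is a hypothetical helper, not part of any tool.

```python
def break_even_year(tactical_upfront, tactical_annual,
                    strategic_upfront, strategic_annual, horizon=10):
    """Return the first year in which the cumulative strategic cost is no
    higher than the cumulative tactical cost, or None if that never happens
    within the horizon."""
    for year in range(1, horizon + 1):
        tactical_total = tactical_upfront + tactical_annual * year
        strategic_total = strategic_upfront + strategic_annual * year
        if strategic_total <= tactical_total:
            return year
    return None


# Made-up figures for illustration only.
print(break_even_year(100_000, 140_000, 400_000, 60_000))  # 4
```

In this example a tactical solution that is still live after year 4 has cost more than doing it strategically in the first place, so the exit plan needs to land before then.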

Conclusion
Aiming for low entropy is a good thing.  To do this we need to create strong business and technical governance that looks at the full lifecycle design and total cost of ownership considerations when making decisions.  There will always be exceptions and short term urgency, so there needs to be a managed exception process so that exceptions to the standards can be made with managed consequences.

Conversely, slipping into a high entropy state is a bad thing.  The consequences are that the medium to long term operational costs increase and it becomes incrementally slower and more expensive to change systems.  When the entropy gain gets out of hand there is a real risk of fragility in the enterprise as systems get more and more unstable.  Finally, the higher the entropy gain, the more it costs to ‘keep the lights on’ in the data centre.

To summarise:

  • Architecture Entropy will always exist
  • Nothing can be done to prevent entropy gain
  • Awareness of the existence of Architecture Entropy should help to minimise entropy gain
  • Invest effort to measure the impacts of decisions, especially in the longer term
  • Use the measurements to manage better outcomes
  • Minimise short term behaviours that can negatively impact an enterprise’s Architecture Entropy



Most of this thinking is captured in the slide deck I have put onto SlideShare and embedded below.