5 November 2015

Architecture Entropy

A.k.a. Why Enterprise IT is never simple

I have been thinking about this theory for a while now but it has taken me some time to put it into words. There is a lot more thinking to be done but I thought I would share the current state of my thinking now rather than continue to mull over it for another 2–3 years. In a nutshell, architecture entropy attempts to define the complexity state of an enterprise architecture.

In thermodynamics, entropy is used as a measure of the disorder in a closed system: the higher the entropy value, the higher the disorder. This measurement of disorder carries across to IT architecture. In thermodynamics, the entropy gain typically comes from an external input such as heat. In IT architecture, the entropy gain comes from change.

Architecture entropy gain is the term I use to describe the slow erosion of a design away from a structured, governed and organised solution state towards a more disordered one, as the architectural and structural integrity of the system is worn down.

The entropy gain typically comes from changes to the system. These changes, if not managed correctly, increase the architecture entropy level and therefore the level of disorder in the system. Disorder is bad in architecture as disorder drives cost.

All systems in a single organisation will eventually reach equilibrium at a similar level of entropy. Each organisation’s natural state of entropy will differ, but it will always reflect the principles and attitudes of the overall organisation management.

The best way to manage entropy gain is to retain sound governance over the system and use that governance to measure and manage the complexity gains and consequences.

The ongoing governance and management of an IT estate is a complex problem and architecture entropy theory is attempting to put a name to that complexity in order to start the process of working out how to measure it. There is a lot more thinking required but I have started the process with this post.

What is “Architecture Entropy”?

The dictionary defines architecture as:

  1. The art and science of designing and erecting buildings.
  2. Buildings and other large structures
  3. A style and method of design and construction
  4. Orderly arrangement of parts; structure
  5. The overall design or structure of a computer system or microprocessor, including the hardware or software required to run it.
  6. Any of various disciplines concerned with the design or organization of complex systems

If we ignore the building-related definitions (1 to 3) then we are left with a reasonable description of Enterprise Architecture.

The dictionary defines entropy as:

  1. Symbol S For a closed thermodynamic system, a quantitative measure of the amount of thermal energy not available to do work.
  2. A measure of the disorder or randomness in a closed system.
  3. A measure of the loss of information in a transmitted message.
  4. The tendency for all matter and energy in the universe to evolve toward a state of inert uniformity.
  5. Inevitable and steady deterioration of a system or society.

If we ignore the thermodynamic and information-theory definitions (1 and 3) then we are left with a sense that entropy is the propensity for something to lean towards disorder rather than order.  Just like my desk at home!

Therefore, in dictionary terms, I define architecture entropy as:

architecture entropy
compound noun.
  1. A measure of the disorder in a computing system.
  2. The inevitable and steady deterioration of a computing system toward a state of disorder.

Architecture Entropy is a term used to describe the slow design erosion away from the structured, governed and organised towards a more disordered state. Regardless of how well designed a computer system is, it will be subjected to the laws of Architecture Entropy.

Typically, a well designed system will initially have low entropy due to the structure and architecture of the solution.  However, over time the system will be subjected to ‘entropy gain’ as the architectural and structural integrity of the system are eroded.

All systems in a single organisation will eventually reach equilibrium at a similar level of entropy. Each organisation’s natural state of entropy will differ, but it will always reflect the principles and attitudes of the overall organisation management.

Architecture Entropy gain cannot be avoided but the levels of entropy gain can be minimised with appropriate governance and budgeting.

Example Architecture Entropy in Action

Consider this high level example based on real experiences. It is not based on one single enterprise, but the concepts and outcomes are real.

The graphic below shows a snapshot of part of a complex enterprise estate.  It is not unusual to see many connections between many components.  This many-to-many connectivity leads to complexity and high cost to change.
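To see why the many-to-many picture gets expensive, here is a back-of-the-envelope sketch (my illustration, not drawn from any specific estate): with n components, worst-case point-to-point integration needs n(n-1)/2 connections, while routing everything through a single bus needs only n.

```python
# Toy illustration of why many-to-many connectivity drives complexity.
# With n components, a fully-meshed point-to-point estate needs
# n * (n - 1) / 2 connections; a hub (ESB) style estate needs only n.

def point_to_point_connections(n: int) -> int:
    """Worst-case connection count if every component integrates directly."""
    return n * (n - 1) // 2

def hub_connections(n: int) -> int:
    """Connection count if every component integrates only via a central bus."""
    return n

for n in (5, 10, 20):
    print(f"{n} components: mesh={point_to_point_connections(n)}, "
          f"hub={hub_connections(n)}")
```

Every one of those connections is something that has to be designed, tested and maintained, which is why the gap between the two curves translates directly into cost to change.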

Given this situation it is very common to consider the creation of a new integration bus. In the graphic below an enterprise service bus component has been added to simplify connectivity.

Components D, G and H have been decommissioned and the overall vision architecture, compared to the original, is structured, organised, tidy, clean.  And expensive.

The executives are sold on the vision and delivery starts.  However, during delivery it becomes hard to justify altering legacy systems that have been running for years without issue.  In addition, some connections are rationalised but others remain for operational reasons.

As with every delivery, there are short term pressures to deliver some benefit early so an interim ‘transition architecture’ is developed to provide earlier benefit. The transition architecture is complex but a later release will ‘tidy things up’.  Eventually connections that bypass the ESB are re-established because they are quicker and cheaper in the short term. The transition architecture ends up looking like the graphic below.

The outcome of all of this was:

  • The plan was to give the business what was needed as soon as possible and then tidy up the IT in the next release.  The cost of the later releases couldn’t be justified and so they never happened.
  • The additional IT complexity increased downstream costs and therefore “quicker” and “cheaper” alternatives to following the strategy were championed by the funding stakeholders.
  • The plan was based on rationalising and decommissioning legacy systems.  However it was discovered late on that there were many more dependencies on the legacy systems and so it was determined to be too costly to decommission all of the legacy systems.
  • The short term “tactical solution” that was only intended to be live for a few months is now many years old and requires a lot of effort to keep it running.

The result was that the enterprise estate remained complex and expensive.  Sound familiar?

Consequences of Entropy Gain

Entropy gain is directly linked to an increase in costs.  The higher the entropy gain, the higher the overall architecture entropy and the higher the architecture’s relative operational costs.

The graphic below shows the typical entropy gain causes.

At the end of the day the costs need to be balanced but there is a tension between the priorities indicated in the graphic below.  The enterprise dilemma is which one or two to focus on because it is impossible to have all three.

Low cost to operate:

  • Impact on change costs: Potential inflexibility due to the run costs being optimised around the ‘go live’ state of the system
  • Impact on build costs: Increased levels of automation that require additional design, build and test effort

Low cost to change:

  • Impact on operate costs: An increase of overall system complexity to accommodate the flexibility features
  • Impact on build costs: Extra effort to design, build and (in particular) test the flexibility features

Low cost to build:

  • Impact on operate costs: Risk of overall system fragility if “low cost” means “corners were cut” or elements of the system were left to be performed manually
  • Impact on change costs: Possibility of functional duplication where it was cheaper to ‘copy and paste’ function than to share and reuse existing code, which in turn increases the cost to change

The end goal is to reach what I call architectural equilibrium: the point where the architectural integrity of a system or enterprise is in balance with the costs.  Achieving this goal is incredibly hard and arguably one of the holy grails of IT.  However, we should not give up trying to strike the best balance we can.

What can we do about Architecture Entropy?

The level of “entropy gain” is variable.  Many factors determine the level of “entropy gain” of a system:

  • Strength of technical governance
  • Size of the general investment budget
  • Business’s attitude to the complexities of enterprise IT
  • Organisational preference to ‘tactical’ vs ‘strategic’
  • The ‘background level’ of complexity already inherent in the IT estate

An amount of gain on every project is inevitable due to pressures on time and budget.  In fact, a small amount of gain may be beneficial to allow a system to reach equilibrium by taking some overall cost out for very little impact.

The amount of gain and downstream impact can be minimised with appropriate governance and management. Ultimately it is the IT department’s relationship with the business stakeholders that determines the entropy levels.

I see three steps to keep entropy in check:

  1. Measure
  2. Manage
  3. Minimise

The simplest way to measure entropy gain is to focus on the downstream costs of a particular change.  Don’t just focus the business case on the cost to implement; look also at a portfolio of common business change scenarios and the 5-year cost. Research the actual long term ‘lights on’ cost that the enterprise has accrued over time.
In addition, when comparing solution options, and when weighing ‘tactical’ vs ‘strategic’ approaches, consider the average annual cost rather than just the upfront cost.
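As a sketch of that comparison (the figures below are invented purely for illustration), a ‘tactical’ option can look cheaper up front yet come out dearer on a 5-year average annual basis:

```python
# Illustrative-only comparison of two solution options on average annual
# cost over a 5-year horizon, rather than on upfront build cost alone.

def average_annual_cost(build_cost: float, annual_run_cost: float,
                        years: int = 5) -> float:
    """Total cost of ownership over `years`, averaged per year."""
    return (build_cost + annual_run_cost * years) / years

# Made-up figures: tactical is cheap to build but expensive to run;
# strategic costs three times as much to build but far less to operate.
tactical = average_annual_cost(build_cost=100_000, annual_run_cost=80_000)
strategic = average_annual_cost(build_cost=300_000, annual_run_cost=30_000)

print(f"tactical:  {tactical:,.0f} per year")
print(f"strategic: {strategic:,.0f} per year")
```

With these (hypothetical) numbers the strategic option wins on average annual cost despite a build cost three times higher, which is exactly the kind of result an upfront-cost-only business case hides.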

A few considerations of how to manage entropy gain:

  • Strengthen governance of system change to minimise the risk of short term changes causing long term costs.
  • Create a change checklist to ensure that solution designers are considering the full life cycle changes.
  • Keep focus on the cost case for the solution.
  • Tightly manage deviations and exceptions from the solution architecture as if the system was being created from new

A few suggestions on how to minimise entropy gain:

  • Make sure that each solution release provides value to the business and is not ‘just’ IT benefit
  • Use established facts based on history and current costs
  • Use ‘tactical solutions’ with caution
  • Have a strong exit plan to get off the tactical solution
  • Calculate the full lifecycle costs of the tactical solution
  • Overall though, be pragmatic!
  • Every solution has an equilibrium point where the balance between the architecture purity and the overall costs is met

Aiming for low entropy is a good thing.  To do this we need to create strong business and technical governance that looks at the full lifecycle design and total cost of ownership when making decisions.  There will always be exceptions and short term urgency, so there needs to be a managed exception process through which exceptions to the standards can be made with managed consequences.

Conversely, slipping into a high entropy state is a bad thing.  The consequences are that medium to long term operational costs increase and it becomes incrementally slower and more expensive to change systems.  When the entropy gain gets out of hand there is a real risk of fragility in the enterprise as systems become more and more unstable.  Finally, the higher the entropy gain, the more it costs to ‘keep the lights on’ in the data centre.

To summarise:

  • Architecture Entropy will always exist
  • Nothing can be done to prevent entropy gain
  • Awareness of the existence of Architecture Entropy should help to minimise entropy gain
  • Invest effort to measure the impacts of decisions, especially in the longer term
  • Use the measurements to manage better outcomes
  • Minimise short term behaviours that can negatively impact an enterprise’s Architecture Entropy

Most of this thinking is captured in the slide deck I have put onto SlideShare and embedded below.

27 October 2015

Enterprise Applications are not yet ‘Born on the Cloud’

The term ‘Born on the cloud’ means that an application or a package was designed and built from the ground up to take advantage of a cloud platform: for instance, applications that can make use of cloud infrastructure features such as auto-scaling, automatic node provisioning and automated instance restarts.

The current elephant in the room is that born on the cloud currently implies “build not buy”, as there are very few packaged applications that are truly born on the cloud.  “Build not buy” is the direct opposite of many organisations’ IT strategies. This is a problem when you consider the types of non-trivial packaged vertical applications that I am talking about: CRM, order management, billing, finance, HR, etc.  There are also the higher order middleware platforms to consider: ESB, BPM, MDM, etc. The cloud argument might be that an enterprise should be using SaaS versions of those software products.  However, not all enterprises are ready, willing or able to move to SaaS.

Why are enterprise applications not born on the cloud?  One reason is that they almost all rely on some traditional middleware, such as relational databases and application servers, to underpin their capability.  It is this reliance on the traditional underpinnings that reduces an application’s ability to take full advantage of the cloud platform.

In time, middleware will transform to be more cloud aware.  For instance, packaged applications may move away from traditional databases and Java application servers and move more towards standardised platforms as a service such as Bluemix, Cloud Foundry and Azure. This may then allow the applications to be designed to take advantage of the cloud based infrastructure they run on.

Deploying an enterprise application to cloud is certainly feasible today and many organisations do just that.  It is just that the application is not able to take full advantage of the cloud infrastructure.  Rather it is a ‘normal’ deployment with ‘normal’ change management regimes.  Therefore what is the advantage of cloud?

There are advantages. There are the rapid provisioning and pay-for-what-you-use type benefits.  These cloudy features can really help during the development and test phases of a delivery project.  Outsourcing IT infrastructure operations to a cloud provider can also reduce in-house run costs and provide operational cost savings.

One other wild card to consider is the impact of container technology such as Docker. If enterprise applications are containerised with all of their dependents included in the deployment package then it may allow the package to take better advantage of the cloud infrastructure. Containerisation of an enterprise application is potentially a large undertaking and so it may be years before we see containers making a difference in this area.

Maybe we will see a new generation of true born on the cloud vertical packages that are built from scratch to run on a cloud platform.  Just as enterprise applications today are built to support multiple middleware options from the likes of IBM, Oracle and Microsoft, maybe the enterprise applications of the future will be built to support multiple cloud platform options, allowing organisations the choice to deploy the applications either on an internal cloud or an external one.

To summarise, enterprise packaged applications are currently limited in their ability to take advantage of a cloud platform.  When either the middleware is transformed or the package is redesigned around cloud and/or containers then we will see the difference at the enterprise level.  I predict that this transformation will take a number of years.

28 August 2015

Cloud integration is less flexible than bespoke complex systems integration

When you write it like that it kind of makes sense.  Cloud offerings tend to be fixed in a certain pattern or deployment, typically this is to achieve a level of commonality and consistency.  This consistency is where some of the cost benefits of cloud come from.

However, in the absence of mature and widely supported interoperability standards, joining two or more cloud infrastructures together is actually quite hard.  Each provider has its own way of doing things and its own supported standards for non-functional areas such as service level agreements, user management, network connection security and service management reporting.

For instance, trying to obtain a single service management dashboard view of a system that is split between two cloud providers is quite hard. I found that it involves bespoke design and some negotiation to work out how to get system and security events out of the cloud and into a system’s central service management incident desk tooling.

Some cloud providers don’t do infrastructure events and only offer service management events at the application level.  This opacity means there is a level of trust (and SLA management) that underlying infrastructure events will be managed before they impact the application.  That platform management as a service is what you pay for, of course.  However, you do risk losing some of the early warning indicators of issues before they actually impact an application’s availability.

On reflection, it sometimes feels like cloud integration today is similar to how packaged application integration was 15 years ago.  Back then, each package had its own way of doing things and a fairly restricted set of APIs to allow access to non-functional capabilities.

The good news is that I think cloud integration will catch up quite quickly as the general integration challenges have been solved in the recent past and the SAML, SNMP, VPN, LDAP, etc standards and patterns all exist.  We just need to wait for the dust to settle as different vendors agree on the same standards and patterns between them.

26 June 2015

Data Residency Challenges

I recently ran into an issue with a data residency ruling and thought I would share my thoughts here.  I am working with a UK client for whom we are building a brand new enterprise application.  The application is hosted on a combination of IBM's Bluemix and SoftLayer platforms.

We are using Bluemix Dedicated which allows us to choose a SoftLayer location for the system and associated data.  Our requirements were that client data must reside in the EU and nobody outside of the EU is to have access to the data.  Pretty standard.  Therefore we chose London as our prime data centre and Paris as our disaster recovery site.

During the latter stages of the solution design work the client came to us to ask if we could keep a small subset of the data resident in the UK - i.e. don't replicate it to Paris.

This presented us with an issue:

  • Bluemix is a cloud solution and only works in SoftLayer cloud data centres
  • There is only one UK SoftLayer data centre
  • Our solution was built around Bluemix and so moving away from Bluemix would be a very big deal

We considered a number of different scenarios, including approaching SoftLayer to see if they would build us a new data centre!  In the end, the simplest solution was the one we went for.

Bluemix uses an IBM database technology called Cloudant to store data.  Cloudant is based on Apache CouchDB and retains its API and interfaces.  After some Googling I discovered a blog about Cloudant replication that explained that it is possible to replicate from Cloudant to any database that supports the API, CouchDB included.

As the data to retain in the UK was a very small subset of the total, it became viable for us to stand up what I call "Bluemix-on-a-box" standalone servers, running WebSphere Liberty and CouchDB, in one of our other UK data centres (the solution had additional non-cloud DCs for payment processing).  It is then a case of changing the replication of the Cloudant instances that needed to be replicated to the UK.
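As a rough sketch of what that replication setup looks like (all URLs, database and host names below are made-up placeholders, not the client's actual configuration), a replication job in the CouchDB replication protocol is just a JSON document POSTed to the target instance's /_replicate endpoint:

```python
# Sketch (not the actual project code) of replicating a subset of data
# from Cloudant to a standalone CouchDB via the CouchDB replication
# protocol. All URLs, database names and hosts are made-up placeholders.
import json
from urllib import request

def replication_body(source_url: str, target_url: str,
                     continuous: bool = True) -> dict:
    """Build the JSON document POSTed to CouchDB's /_replicate endpoint."""
    return {"source": source_url,
            "target": target_url,
            "continuous": continuous}

def start_replication(couch_url: str, source_url: str,
                      target_url: str) -> dict:
    """Submit the replication job to the target CouchDB instance."""
    body = json.dumps(replication_body(source_url, target_url)).encode()
    req = request.Request(couch_url + "/_replicate", data=body,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)

# Hypothetical usage: pull the UK-only databases from Cloudant into the
# on-premise CouchDB running in the UK data centre.
# start_replication("http://uk-dc-couchdb:5984",
#                   "https://account.cloudant.com/uk_only_data",
#                   "http://uk-dc-couchdb:5984/uk_only_data")
```

Because Cloudant speaks the same replication protocol as CouchDB, the on-premise side needs nothing more exotic than a plain CouchDB instance reachable from the cloud.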

Here is a pic:
Data Residency Solution Overview

29 May 2015

Public Sector Cloud

Implementing cloud solutions in a Public Sector environment is tricky.  There are challenges with:

  • Security
  • Data residency
  • Data protection
  • Law

Yet Governments continue to see cloud technology as a way forward to simplify their IT estates and reduce costs.

In order to achieve a viable solution, let alone one that meets the stated benefit objectives, we need some sort of reference architecture.

I have had a go at creating one that I have put onto Slideshare and embedded here:

1 May 2015

Mobile in Government

I was asked last week about my point of view on mobile in Government.  Handily enough, I was able to point the questioner at a white paper I co-wrote a few years ago.  It is a little dated but I think still relevant.

The paper talks about the different use cases of mobile and looks at aspects not only from a Business to Consumer (B2C) perspective but also from a Business to Employee (B2E) point of view.

You can download the PDF from the IBM web site.

30 April 2015

Balancing Agile with the Enterprise

Agile is all the rage in enterprise circles at the moment.  Everything has to be agile.  Project managers are updating their lexicon and calling themselves scrum masters.  Developers are arguing that documentation slows them down (no change there then) and architects are getting frustrated because there is a risk they are being bypassed.

This trendiness in the enterprise space probably means that 'agile' has jumped the shark as a way of delivering projects.

I had some thoughts on how to blend agile with the more traditional world of enterprise applications. I captured them in this slide deck that I put on SlideShare.

Embedded version:

27 April 2015

Read all about it

I do a lot of thinking in my job.  In fact thinking *is* my job to be honest. I have been 'encouraged' to start sharing my thinking with a wider audience.

Let me introduce myself...

My name is Simon Greig and I am an Executive IT Architect working for IBM Global Business Services.  I have worked for IBM pretty much all of my career since the small company I worked for when I first left university was bought by the mega corporation known as IBM very shortly after I started working.

I started my working life as a C++ developer. I enjoyed the challenge of development but soon realised that in a corporate world the developers tend to have less opportunity to be creative than they might do working in a smaller environment.  In order to get more control and influence over the technical creativity process I decided I wanted to be an IT Architect.

I very quickly fell into a sub-category of architecture focussing on integration.  I found integration fascinating: specifically, the problem-solving aspects of connecting a number of heterogeneous systems, applications and platforms together.  It is never as simple and clean as you want it to be.

I have been doing complex systems integration properly since about 1999 and am still at it. I have learned a lot along the way.  In the mid 2000s I was a thought leader in designing, creating and managing Enterprise Service Bus solutions.  I led the creation of what was known inside IBM as the "ESB Asset", a solution pattern that turned into a bit of a de facto standard way to create an ESB with WebSphere Message Broker.  It was used 20-30 times globally and earned me and the team a lot of recognition within the company.

By the late 2000s I was working in Chief Architect roles on large complex systems integration projects.  I say projects in the plural, but it is really only a couple per decade.  It is a bit tricky to work on a vast number of diverse projects when each project lasts 3-5 years!

My current area of interest is cloud integration.  Specifically, how the blazes do we integrate all that legacy good stuff in the enterprise with sub-systems and partners running on cloud platforms.