10 June 2016

Running Containers in a Secure Environment


It is common to see container demos or videos that show how quick and easy it is to take a container from a public registry, extend it with custom content and deploy it in multiple locations. A wonderful step forward for our industry, to be sure. However, just as with the technologies that came before it, it is important not to get carried away and forget about the complexity that non-functional requirements bring to a solution. Often the non-functional requirements completely change the way a solution works because of the constraints they impose. This article is going to focus on one such non-functional requirement: Security Vulnerability Management.

As you will see, this one requirement modifies the original functional solution significantly. Security vulnerability management is only one of the non-functional ‘lenses’ that an enterprise IT system needs to be reviewed against to ensure it works. See Figure 1 below for some other lenses that need to be applied to the solution to validate that it is fit for purpose.

Figure 1: Enterprise IT Non-Functional Lenses

Consider the simple use case: deployment of a container. This is the example seen time and time again in videos and demos; the archetypal pattern for containers. In an environment where application development and hosting are separate functions (i.e. most of the time) the process might look similar to that shown in Figure 2 below.

Figure 2: Container Deployment

The application development team is responsible for packaging up the container and the operations team is responsible for installing it on a host (or cloud platform) and, typically, managing it to achieve and retain certain service levels. The difference between the demonstration and the real-world deployment is that the non-functional requirements are an absolute reality. Not only a reality, but often non-negotiable in a mature IT organisation that has experienced what happens when things go bad! The next section looks in a little more detail at what security controls might be needed to ensure that deployed containers don’t compromise the security of the IT estate.

Container Security

Container security is mostly focussed on spotting, containing and managing security vulnerabilities. Vulnerabilities come from a variety of sources:
  • Vulnerabilities in containers we make ourselves
  • Vulnerabilities in containers we reuse from external sources
  • Vulnerabilities in the underlying host operating system
It is a common misconception that containers are inherently secure because security is built in. It is true that containers appear more secure at the application level because, by default, a container does not open any connection ports, reducing its attack surface. However, containers (at the time of writing) need to run with root-level privileges on the host machine and there is currently very limited protection to stop a malicious container from exploiting this to its advantage.

One of the differences between virtual machines (VM) and containers is that VM technology has matured to a level where VM separation can be protected (though not ensured) by hardware enforced separation. In other words, computer CPUs contain instruction sets to enable VM separation and greatly reduce the attack surface between VMs on the same physical host. Containers do not have this level of maturity yet. See the section below on deployment options for more discussion on this topic.


One of the features of containers is that once you have built them they are immutable. This means that you can move containers between environments and, except for a few environment variables, they do not need to change. This immutability means that once a container has been tested to work on the development environment then it can be moved to other environments with a higher chance of it working. There are still the standard risks of moving between environments such as:
  • External dependencies — things that the container is dependent on but not inside the container (e.g. an API) are installed at different versions between environments
  • Global settings in the environment having a material difference and causing different behaviour (e.g. OS level settings such as memory or disk access methods)
  • Data differences — typically in the areas such as reference data or interface data
Despite the above risks, containers are easier to move between environments than with previous methods.
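To make the immutability point concrete, here is a minimal sketch of how a container's application code might pick up the few per-environment settings from environment variables, leaving the image itself unchanged between environments. The variable names and defaults are invented for illustration; they are not from the article.

```python
import os

# The image is immutable; only these environment variables change
# between development, staging and production. (Names are
# illustrative assumptions, not a real convention.)
def load_config():
    return {
        "api_url": os.environ.get("API_URL", "http://localhost:8080"),
        "db_host": os.environ.get("DB_HOST", "localhost"),
        "log_level": os.environ.get("LOG_LEVEL", "INFO"),
    }

config = load_config()
```

The same image can then be promoted from development to production by changing only the injected environment, which is what makes the external-dependency and global-settings risks listed above the residual ones.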

The core concept of container immutability is that once a container is composed it should always be treated as a locked box. A container must never, ever be modified anywhere other than where it was originally built. This integrity is key to container portability and can be validated via container signing. Signing is a whole different topic and not one that will be explored further here.

The immutability is an advantage from a functional perspective but it is a very large disadvantage from a security management perspective. Imagine the case where a security vulnerability is identified in one of the components inside a production deployed container. Rather than conducting the cardinal sin of patching the deployed container in situ, the container contents need to be patched at source by the application development team.

Once the patching of the container contents is completed, the container needs to be rebuilt, tested and then re-deployed. Finally, the system as a whole needs to be regression tested in a staging environment to make sure something didn’t break. This is where an automated build pipeline helps significantly as it will take a lot of the effort out of this process.

The conclusion here is that, in a perfect world, immutable containers would be frequently refreshed from source to make sure that the latest patches are included in the live containers. In reality it is unlikely that there would be the need or desire to do this refresh except when there is a direct need, for example when a security vulnerability has been found in a production component. Therefore it is very important that live running containers, as well as development containers, are frequently checked for new vulnerabilities, as new vulnerabilities may have been discovered (and patched) after the container was deployed to production.
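The periodic re-check described above can be sketched as a scan of a deployed container's recorded component list against a vulnerability feed. The feed, the manifest shape and the CVE entry here are stand-ins for a real scanner or CVE database query; only the Heartbleed CVE identifier is real.

```python
# Hypothetical vulnerability feed mapping (component, version) pairs
# to CVE identifiers published after the container was built. A real
# implementation would query a scanner or a CVE database.
VULN_FEED = {
    ("openssl", "1.0.1f"): ["CVE-2014-0160"],
}

def check_container(manifest):
    """Return known vulnerabilities for a container's component list.

    `manifest` is a list of (component, version) tuples recorded at
    build time -- a stand-in for a real image scan.
    """
    findings = []
    for component in manifest:
        findings.extend(VULN_FEED.get(component, []))
    return findings

# A container deployed before the CVE was published is still flagged
# on the next scheduled re-scan.
deployed = [("openssl", "1.0.1f"), ("nginx", "1.9.0")]
findings = check_container(deployed)
```

The key design point is that the scan runs against the stored manifest of the live container, not just against images in the build pipeline, so vulnerabilities discovered after deployment are still caught.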

Public Containers

The other source of security risks is the re-use of containers from public container registries. Public container repositories are a very important source of innovation to the code base of systems. They allow developers to share and reuse to improve productivity and reduce functional risk. However public containers are also an excellent source of security vulnerabilities.

In order to provide an element of control it is worth considering adding a vetting process to create a trusted source repository of security checked ‘parent’ containers. The trusted source would be the master repository for deployments to controlled environments rather than the public registry. This process would provide an element of control for security but it is recognised that it can constrain the velocity of a project being delivered via agile methods. Where the middle ground is will depend on the organisation’s Architecture Entropy level (see further reading).
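The vetting process above amounts to a gate on which parent images a build may start from. A minimal sketch, assuming a hypothetical internal registry name and allow-list (both invented for illustration):

```python
# Hypothetical allow-list of vetted 'parent' containers. In practice
# this would be a private trusted registry populated only after the
# security checking process has passed.
TRUSTED_PARENTS = {
    "registry.internal/base/ubuntu:16.04",
    "registry.internal/base/nginx:1.9",
}

def resolve_parent(requested_image):
    """Allow builds only from the trusted internal registry, never
    directly from a public one."""
    if requested_image in TRUSTED_PARENTS:
        return requested_image
    raise PermissionError(
        requested_image + " is not in the vetted parent registry; "
        "submit it for security review first"
    )
```

The friction this gate adds to pulling a brand-new public image is exactly the velocity trade-off mentioned above; where the middle ground sits is an organisational choice.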

Container Deployment Process

Applying the suggested modifications to the process shown in Figure 2, of course, adds in complexity. The core pattern of deployment still exists but the increased controls and process dwarf the original pattern. This is illustrated in Figure 3 below.

Remember this complexity arose from just applying one non-functional lens to the deployment. Many of the other non-functional lenses will add their own complexity to the process and may fundamentally change the pattern due to the constraints the non-functional requirements bring.

Figure 3: Secure Container Deployment Process

The next section goes on to explain the deployment rules and policy aspects of determining the most appropriate hosting locations for containers.

Deployment Options

Container deployment requires careful management and control. This might be:
  • To apply organisation policy to enforce a delivery model — for example supplier X is not allowed to deploy in environment Y
  • More granular control — for example containers with data storage are not allowed to be deployed on a host situated in a DMZ network zone
  • A combination
Regardless of the situation there needs to be some policy and control. The deployment is all about balancing security and flexibility. For instance Figure 4 shows perhaps the most secure container deployment option. Clearly this is taken to the extreme but the policy decision was that no cohabitation between containers is allowed on the same host. The physical separation enforces security but also removes all of the benefits of containers.

Figure 4: High Security Container Deployment
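The two kinds of rule listed above (an organisation-level delivery-model rule and a more granular placement rule) can be sketched as a simple policy check evaluated before a container is scheduled onto a host. The supplier name, zone names and attribute fields are invented for the sketch.

```python
# Illustrative policy check combining an organisational rule
# (supplier X may not deploy to environment Y) with a granular rule
# (no data-storing containers in the DMZ). All names are assumptions.
def violates_policy(container, host):
    """Return a reason string if deployment is forbidden, else None."""
    if container["supplier"] == "supplier-x" and host["environment"] == "production":
        return "supplier-x may not deploy to production"
    if container["stores_data"] and host["zone"] == "dmz":
        return "data-storing containers may not run in the DMZ"
    return None
```

A scheduler or deployment pipeline would call this check for every candidate (container, host) pairing and refuse placements that return a reason, which is one way to sit between the two extremes of Figures 4 and 5.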

Going to the other extreme it is feasible that all containers could be deployed on any node in any combination. In other words there are no constraints on cohabitation. Figure 5 shows this type of flexible configuration. Although it is an extreme example it is highly likely that there are a number of deployments in the real world that look like this in production. This is not a good idea for a secure system as can be seen by the variety of security threats at each level of the stack (Figure 7).

Figure 5: Highly Flexible Container Deployment

The reality is that there will be a compromise between security separation and the need for flexibility. The cohabitation rules will likely be different for different environments depending on the security levels that environment is running. The good news is that a container shouldn’t care what host it is deployed on as long as there is a network path to the containers and resources that it needs to communicate with.

To determine the rules it is important to understand, for each security level, what constraints are in place for splitting virtualisation layers across different boundaries. For instance, can VMs (or container hosts) in different network zones be on the same physical node? If so, at what security classification levels is this allowed?

Figure 6: Hybrid Container Deployment

Figure 7: Security Levels

Differentiating Container Security Levels

Containers themselves are immutable but their security status is constantly changing: unchecked containers are validated, previously acceptable containers are found to have vulnerabilities, and problems are fixed.

To keep track of the security position, a security status needs to be allocated to each container. It is proposed that the following levels are used:
  • Grey — unknown
  • Blue — validated public
  • Red — ready for production verification
  • Amber — formerly suitable for production but currently ‘in doubt’
  • Green — suitable for production operation

The relationships between the statuses are shown in Figure 8 below. The status is not a static concept and therefore the containers must be continuously validated.

Figure 8: Container Security Status Levels
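One way to enforce these statuses is a small state machine that rejects illegal status changes. The transition graph below is an assumption standing in for Figure 8: containers enter as grey, are validated to blue, promoted through red to green, drop to amber when a new vulnerability is suspected, and must be fixed and re-verified (back through red) before returning to green.

```python
# Assumed transition graph for the five statuses listed above.
# The exact relationships are in Figure 8 and not reproduced here.
ALLOWED = {
    "grey": {"blue"},
    "blue": {"red"},
    "red": {"green"},
    "green": {"amber"},
    "amber": {"red"},  # fix, then re-verify before production use
}

def transition(current, new):
    """Apply a status change, rejecting anything the policy forbids."""
    if new not in ALLOWED.get(current, set()):
        raise ValueError("illegal status change: %s -> %s" % (current, new))
    return new
```

Encoding the lifecycle this way makes the "continuously validated" point operational: a container can never jump from unknown straight to production-ready without passing through the intermediate checks.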

Security Management Process

The continuous security testing performed as part of the validation process will be altering the security state of containers. The larger the estate, the higher the frequency of change. It is important to put in place a container security management process to keep on top of the security problems in the estate.

A very high level process is shown in Figure 9 below. The concept is that there is a continuous container security management capability sitting between the application development and operations teams. The security management capability is responsible for ensuring that only “green” containers are running in production and that any containers that go “amber” are fixed as soon as possible.

The security management capability’s workload is helped by instrumentation that is mandated to be added to the containers. The inclusion of the instrumentation is checked in the continuous security testing process and only containers with the instrumentation implemented correctly will be allowed to go “green”.

The instrumentation helps by automatically sending callback notifications to a central deployment register. These callbacks are sent by the containers themselves at key points:
  • When a container is deployed to a node
  • When a container is started (or stopped) on a node
  • When a container is suspended on a node
The callbacks allow the deployment register, a form of CMDB, to keep up to date on which containers are deployed where. More importantly, it also tracks when containers are started, stopped and suspended. This latter part is important for policy enforcement: to ensure that requests to suspend or stop a container that has been found to have a problem have actually been complied with.

Figure 9: Container Security Management Process
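The instrumentation callbacks described above can be sketched as a small JSON payload that each container emits at its lifecycle events, with the deployment register recording the latest known state per container. The payload shape, container name and register class are invented for illustration.

```python
import json
from datetime import datetime, timezone

# Sketch of the mandated instrumentation: each container reports its
# own lifecycle events to a central deployment register. The payload
# fields are assumptions, not a real protocol.
def build_callback(container_id, node, event):
    assert event in {"deployed", "started", "stopped", "suspended"}
    return json.dumps({
        "container": container_id,
        "node": node,
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

class DeploymentRegister:
    """A minimal CMDB-style register tracking each container's last
    known node and lifecycle state."""
    def __init__(self):
        self.state = {}

    def receive(self, payload):
        record = json.loads(payload)
        self.state[record["container"]] = (record["node"], record["event"])

register = DeploymentRegister()
register.receive(build_callback("billing-svc:1.4", "node-07", "started"))
```

Because the register always holds the last reported event per container, a policy engine can later check that a container ordered to stop has actually reported a “stopped” event.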


Pulling together all of the threads mentioned in this article makes it possible to start to understand the end-to-end aspects of secure container management. Of course, containers still need hosts to run on and so similar processes are required to manage and check the host operating systems. It is interesting to see how much complexity is added when looking through a single non-functional lens. Imagine the complexity when all of the non-functional lenses have been applied to a design. This is one of the reasons why enterprise IT is actually very hard to do well!
Do containers simplify everything? Not really when you look at the big picture. They certainly help simplify certain areas and of course create new areas of complexity. I am watching with interest if “Containers as a Service” (CaaS) takes off in the market. A CaaS host will worry about a lot of this for you. Hopefully!

Further Reading

  1. NCC Group whitepaper: “Understanding and Hardening Linux Containers”; 29 June 2016
  2. Chris Milsted: “Patching, Anti-Virus and Configuration management when adopting docker format containers”; April 2016
  3. Simon Greig blog: “Architecture Entropy”; November 2015

11 April 2016

Micro Services Will Not Change the World!

Micro services will not change the world; the tech trends that came before micro services (SOA, object orientation, object brokers, XML, client/server, etc.) didn’t either. That said, I think that, like the trends that came before it, micro services will make an incremental improvement to how we design and build software and systems. Just not fundamentally change it. I don’t have anything against micro services, in fact I like the model, but this is my two pence worth point of view.

At the moment there is a lot of talk about micro services being the answer to all known problems in IT, including ‘fixing’ big, complex, hard-to-change enterprise systems, hereafter referred to as ‘monoliths’. In my eyes these monoliths are in fact already made up of micro services, except the services are called modules, packages, components and scripts. The real monolith is the fact that all of those modules, packages, components and scripts are written, integrated and tested by a single system integrator supplier, in a typically opaque manner.

I suggest that the fix to the monolith is not the adoption of micro services but a change to the way the system is contracted. However, it is worth noting that those single contract delivery models, although seen as cumbersome, do provide a lot of intangible benefits.  The largest benefit is the ability to delegate making sure that the non-functional aspects of the system (performance, capacity, stability, guaranteed service levels, etc) are covered and put measurable targets in place to get a consistent level of service.

Service Contracts

Micro services will provide the ability to make changes quickly and, if the overall architecture is supportive, those changes will be isolated to small areas of the system thereby limiting the testing.  Right?

A number of years ago I worked on a large integrated system with tricky NFRs around performance and availability.  The system was based on a services oriented architecture and designed to be flexible.  There is a strong argument that says that micro services are just an evolution of SOA and that the same principles apply.  I tend to agree.  In this system we could process a change request and make the technical change in 5 minutes.  Great eh?  However it would take us up to 100 days effort to regression test and deploy a change such as this – much to everyone’s annoyance (mine included).

The problem was not that we were poor at testing; it was because the contract included service penalties of up to £1,000,000 a month if performance and availability of the system did not meet the contracted targets.  Numbers like that focus the mind and drive a risk averse behaviour when it comes to implementing changes!

Where there are risks such as the service penalties then the mitigation is to add rigorous controls and processes. The consequence is that these change processes and testing regimes impact the speed and agility that changes can be implemented in the live environment. Couple that with industry regulation, third party assessments and licensing then it makes changing any live and business (or safety or national security) critical IT system, no matter how small a change, quite a risky undertaking.

What does this have to do with micro services?

The system referred to above was built around an SOA architecture, and when we do a traditional SOA-based design we aim for coarse-grained abstract services that encapsulate the complexity of the back end. Micro services are similar but tend to be much finer grained, and therefore there are going to be many more micro services in a system compared to the equivalent SOA platform.

In other words, micro services generate more moving parts. Of course this code logic needs to exist regardless of whether it is exposed as a micro service or not, but the point of the micro service is that it is visible, flexible and reusable. Therefore, if a code function is moved from being wrapped and protected by its surroundings to being visible and usable in a variety of ways, this exposure creates moving parts. As every engineer knows, the more moving parts a system has, the more fragile it will get.

It is this fragility that will mean that the unbounded flexibility promised by micro services is at risk of not being realised.  Where fragility exists, then change uncertainty and change risk start to materialise.  The obvious mitigation to this risk is to up the amount of testing. Automated testing will help but there will still need to be an element of manual testing to mitigate the risks in high stakes systems.

Additionally, because micro services create a loosely coupled environment with potentially many alternate paths the change impacts are less understood. This is because the interactions between services are not always evident and in some cases not consistent. This uncertainty is amplified in an environment where changes are split between multiple suppliers and/or contract boundaries.  Therefore, as we saw in SOA, it will be possible to make changes quickly but will still take time and effort to assure all of the stakeholders and service management teams that those changes didn’t break something important.

I still think micro services are good however!

What micro services have done, like object oriented design, SOA and others before it, is formalise leading application architecture thinking into simple patterns that everyone can understand, debate and share. Micro services also push the envelope and increase the significance and importance of good APIs. By ‘good’ I mean usable and reasonably static in terms of change; API changes are the source of the most complex impact assessments. At the end of the day, though, we are building complex IT systems and there is always going to be a level of risk generated from that complexity. No matter how simple it is to change the code, the higher the criticality of the system, the more rigorous the demands of the users and sponsors of the system to make sure that changes are impact assessed properly and the risks are mitigated with sufficient testing.

Using DevOps means that automation allows small changes to be deployed more frequently, reducing the risk of individual changes causing unknown outcomes and failures. This is true, but most organisations are not yet in a place where they can safely implement little-and-often changes with no manual test requirements.

The point I am trying to make is that IT technology by itself cannot be seen as the solution to all IT problems. The commercial and legislative aspects also need to be modified in order to achieve more agility. Some organisations are doing this by taking the integration responsibility back in house and essentially letting suppliers off service levels (because it is just too hard to identify root cause and therefore who is at fault). I wouldn’t be surprised to hear these organisations claiming that it is micro services that have allowed them to be more agile, when in fact it is more likely that the contractual goal posts have moved with their supply chain.

So I don’t think micro services will change the world.  Complex IT will stay complex and every time we create a new way of doing things with a new style, paradigm or technology we make it just a little bit more complicated!