Running Containers in a Secure Environment


It is common to see container demos or videos that show how quick and easy it is to take a container from a public registry, extend it with custom content and deploy it in multiple locations. A wonderful step forward for our industry, to be sure. However, just as with the technology advances that came before it, it is important not to get carried away and forget about the complexity that non-functional requirements bring to a solution. Non-functional requirements often completely change the way a solution works because of the constraints they bring. This article focuses on one such non-functional requirement: Security Vulnerability Management.

As you will see, this one requirement modifies the original functional solution significantly. Security vulnerability management is only one of the non-functional ‘lenses’ through which an enterprise IT system needs to be reviewed to ensure it works. See Figure 1 below for some of the other lenses that need to be applied to a solution to validate that it is fit for purpose.


Figure 1: Enterprise IT Non-Functional Lenses

Consider the simple use case: deployment of a container. This is the example seen time and time again in videos and demos, the archetype pattern for containers. In an environment where application development and hosting are separate functions (i.e. most of the time), the process might look similar to that shown in Figure 2 below.


Figure 2: Container Deployment

The application development team is responsible for packaging up the container and the operations team is responsible for installing it on a host (or cloud platform) and, typically, managing it to achieve and retain certain service levels. The difference between the demonstration and the real-world deployment is that the non-functional requirements are an absolute reality. Not only a reality, they are often non-negotiable in a mature IT organisation that has experience of what happens when things go bad! The next section looks in a little more detail at what security controls might be needed to ensure that the deployed containers don’t compromise the security of the IT estate.

Container Security

Container security is mostly focused on spotting, containing and managing security vulnerabilities. Vulnerabilities come from a variety of sources:
  • Vulnerabilities in containers we make ourselves
  • Vulnerabilities in containers we reuse from external sources
  • Vulnerabilities in the underlying host operating system
It is a common misconception that containers are inherently secure because security is built in. It is true that containers at the application level appear to be more secure because, by default, containers don’t open any connection ports, thereby reducing the attack surface of the container. However, containers (at the time of writing) need to run with root-level privileges on the host machine, and there is currently very limited protection to stop a malicious container from exploiting this to its advantage.
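As an illustration, the snippet below is a minimal sketch of how an operations team might flag containers that default to root. It assumes the Docker CLI is installed and uses “my-app” as a purely hypothetical container name; an empty Config.User field in the docker inspect output means the container runs as root.

```python
import json
import subprocess

def runs_as_root(container: str) -> bool:
    """Return True if the container has no non-root user configured.

    Relies on the Docker CLI being installed; 'docker inspect' returns a
    JSON document whose Config.User field is empty when the container
    defaults to root.
    """
    raw = subprocess.run(
        ["docker", "inspect", container],
        capture_output=True, text=True, check=True,
    ).stdout
    config = json.loads(raw)[0]["Config"]
    user = (config.get("User") or "").strip()
    # An empty user, "root" or UID 0 all mean the container runs as root.
    return user in ("", "root", "0")

if __name__ == "__main__":
    print(runs_as_root("my-app"))  # "my-app" is a hypothetical container name
```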

One of the differences between virtual machines (VMs) and containers is that VM technology has matured to a level where VM separation can be protected (though not guaranteed) by hardware-enforced separation. In other words, computer CPUs contain instruction sets that enable VM separation and greatly reduce the attack surface between VMs on the same physical host. Containers do not have this level of maturity yet. See the section below on deployment options for more discussion on this topic.

Immutability

One of the features of containers is that once you have built them they are immutable. This means that you can move containers between environments and, apart from a few environment variables, they do not need to change. This immutability means that once a container has been tested and shown to work in the development environment, it can be moved to other environments with a higher chance of it working. There are still the standard risks of moving between environments, such as:
  • External dependencies — things that the container is dependent on but not inside the container (e.g. an API) are installed at different versions between environments
  • Global settings in the environment having a material difference and causing different behaviour (e.g. OS level settings such as memory or disk access methods)
  • Data differences — typically in the areas such as reference data or interface data
Despite the above risks, containers are easier to move between environments than artefacts packaged with previous methods.
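To make the environment-variable point concrete, the following is a minimal sketch of a containerised process reading its per-environment settings from the environment rather than from configuration baked into the image. The variable names (DATABASE_URL, LOG_LEVEL, FEATURE_FLAGS) are purely illustrative assumptions.

```python
import os

# The image is identical in every environment; only these variables differ.
# Variable names below are illustrative, not taken from the article.
DATABASE_URL = os.environ["DATABASE_URL"]
LOG_LEVEL = os.environ.get("LOG_LEVEL", "INFO")
FEATURE_FLAGS = os.environ.get("FEATURE_FLAGS", "").split(",")

def describe_environment() -> dict:
    """Collect the per-environment settings injected at deploy time."""
    return {
        "database_url": DATABASE_URL,
        "log_level": LOG_LEVEL,
        "feature_flags": [flag for flag in FEATURE_FLAGS if flag],
    }
```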

The core concept of container immutability is that, once a container is built, it is always treated as a locked box. A container must never be modified anywhere other than where it was originally built. This integrity is key to container portability and can be validated via container signing. Signing is a whole different topic and not one that will be explored further here.

Immutability is an advantage from a functional perspective, but it is a very large disadvantage from a security management perspective. Imagine the case where a security vulnerability is identified in one of the components inside a container deployed to production. Rather than committing the cardinal sin of patching the deployed container in situ, the container contents need to be patched at source by the application development team.

Once the patching of the container contents is complete, the container needs to be rebuilt, tested and then redeployed. Finally, the system as a whole needs to be regression tested in a staging environment to make sure nothing has broken. This is where an automated build pipeline helps significantly, as it takes a lot of the effort out of this process.
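As a rough illustration of what one such pipeline step might look like, the sketch below rebuilds a patched image (pulling refreshed base layers) and pushes it for downstream testing. The image, tag and registry names are assumptions, and the test and deployment stages are only indicated as comments.

```python
import subprocess

def run(cmd: list[str]) -> None:
    """Run a pipeline step, failing fast if it returns non-zero."""
    print("->", " ".join(cmd))
    subprocess.run(cmd, check=True)

def rebuild_and_push(image: str, tag: str, registry: str) -> None:
    """Rebuild a patched container at source and publish it for testing.

    The registry, image and tag values are illustrative; a real pipeline
    would also trigger the test harness and orchestrated rollout.
    """
    full_name = f"{registry}/{image}:{tag}"
    run(["docker", "build", "--pull", "-t", full_name, "."])  # pick up patched base layers
    run(["docker", "push", full_name])                         # publish the rebuilt image
    # Regression testing in staging and the production redeployment would
    # follow here, triggered by the pipeline rather than by hand.

# Example: rebuild_and_push("orders-api", "1.4.3", "registry.internal")
```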

The conclusion here is that immutable containers would, in a perfect world, be frequently refreshed from source to make sure that the latest patches are included in the live containers. In reality it is unlikely that there would be the need or desire to do this refresh except when there is a direct reason, for example when a security vulnerability has been found in a production component. It is therefore very important that live running containers, as well as development containers, are frequently checked for new vulnerabilities, as new vulnerabilities may have been discovered (and patched) after the container was deployed to production.
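A minimal sketch of such a recurring check is shown below. It assumes a scanner such as Trivy is installed (exact flags may vary between versions) and that the image list would really come from the deployment register rather than being hard-coded; the image names are illustrative.

```python
import subprocess

# Images currently deployed to production plus those still in development;
# in practice this list would come from the deployment register (CMDB).
IMAGES_TO_RECHECK = [
    "registry.internal/orders-api:1.4.2",     # illustrative image names
    "registry.internal/billing-worker:2.0.0",
]

def rescan(image: str) -> bool:
    """Re-scan an already-approved image for newly published CVEs.

    Assumes the Trivy scanner is available; '--exit-code 1' makes it
    return non-zero when findings at the given severities are present.
    """
    result = subprocess.run(
        ["trivy", "image", "--severity", "HIGH,CRITICAL", "--exit-code", "1", image]
    )
    return result.returncode == 0

if __name__ == "__main__":
    for image in IMAGES_TO_RECHECK:
        status = "still clean" if rescan(image) else "new vulnerabilities found"
        print(f"{image}: {status}")
```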

Public Containers

The other source of security risk is the reuse of containers from public container registries. Public container repositories are a very important source of innovation for the code base of systems. They allow developers to share and reuse work, improving productivity and reducing functional risk. However, public containers are also an excellent source of security vulnerabilities.

In order to provide an element of control, it is worth considering adding a vetting process to create a trusted source repository of security-checked ‘parent’ containers. The trusted source would be the master repository for deployments to controlled environments, rather than the public registry. This process provides an element of control for security, but it is recognised that it can constrain the velocity of a project delivered via agile methods. Where the middle ground lies will depend on the organisation’s Architecture Entropy level (see further reading).
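As a sketch of what the vetting step could look like, the snippet below pulls a public ‘parent’ image, leaves a placeholder for whatever security checks the organisation mandates, and then retags and pushes it into a trusted internal registry. The registry hostname and example image are assumptions, not part of the article.

```python
import subprocess

def vet_and_promote(public_image: str, trusted_registry: str) -> str:
    """Pull a public 'parent' container, check it, and copy it into the
    trusted internal registry that controlled environments build from.

    The scan step is a placeholder for whatever vetting the organisation
    mandates; 'trusted_registry' is an illustrative internal hostname.
    """
    subprocess.run(["docker", "pull", public_image], check=True)
    # ... security scanning / manual review of the pulled image goes here ...
    promoted = f"{trusted_registry}/{public_image}"
    subprocess.run(["docker", "tag", public_image, promoted], check=True)
    subprocess.run(["docker", "push", promoted], check=True)
    return promoted

# Example: vet_and_promote("python:3.12-slim", "registry.trusted.internal")
```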

Container Deployment Process

Applying the suggested modifications to the process shown in Figure 2, of course, adds complexity. The core pattern of deployment still exists, but the increased controls and process dwarf the original pattern. This is illustrated in Figure 3 below.

Remember that this complexity arose from applying just one non-functional lens to the deployment. Many of the other non-functional lenses will add their own complexity to the process and may fundamentally change the pattern due to the constraints those requirements bring.


Figure 3: Secure Container Deployment Process

The next section goes on to explain the deployment rules and policy aspects of determining the most appropriate hosting locations for containers.

Deployment Options

Container deployment requires careful management and control. This might be:
  • To apply organisation policy to enforce a delivery model — for example supplier X is not allowed to deploy in environment Y
  • More granular control — for example containers with data storage are not allowed to be deployed on a host situated in a DMZ network zone
  • A combination
Regardless of the situation, there needs to be some policy and control. Deployment is all about balancing security and flexibility. For instance, Figure 4 shows perhaps the most secure container deployment option. Clearly this is taken to the extreme, but the policy decision was that no cohabitation of containers is allowed on the same host. The physical separation enforces security but also removes all of the benefits of containers.
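To make the policy idea concrete, here is a minimal sketch of a placement check that mirrors the two example rules above. The supplier names, zones and rule structure are illustrative assumptions; a real estate would drive this from a policy store rather than hard-coded values.

```python
from dataclasses import dataclass

@dataclass
class Container:
    name: str
    supplier: str
    stores_data: bool

@dataclass
class Host:
    name: str
    environment: str
    network_zone: str   # e.g. "dmz", "internal"

# Illustrative policy rules mirroring the examples above.
BLOCKED_SUPPLIERS_BY_ENVIRONMENT = {"production": {"supplier-x"}}

def deployment_allowed(container: Container, host: Host) -> bool:
    """Return True if policy permits this container on this host."""
    if container.supplier in BLOCKED_SUPPLIERS_BY_ENVIRONMENT.get(host.environment, set()):
        return False                      # delivery-model rule
    if container.stores_data and host.network_zone == "dmz":
        return False                      # granular data-placement rule
    return True
```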


Figure 4: High Security Container Deployment

Going to the other extreme, it is feasible that all containers could be deployed on any node in any combination. In other words, there are no constraints on cohabitation. Figure 5 shows this type of flexible configuration. Although it is an extreme example, it is highly likely that a number of real-world production deployments look like this. This is not a good idea for a secure system, as can be seen from the variety of security threats at each level of the stack (Figure 7).


Figure 5: Highly Flexible Container Deployment

The reality is that there will be a compromise between security separation and the need for flexibility. The cohabitation rules will likely differ between environments depending on the security level each environment is running at. The good news is that a container shouldn’t care what host it is deployed on, as long as there is a network path to the containers and resources it needs to communicate with.

To determine the rules it is important to understand, for each security level, what constraints are in place for splitting virtualisation layers across different boundaries. For instance, can VMs (or container hosts) in different network zones be on the same physical node? If so, at what security classification levels is this allowed?


Figure 6: Hybrid Container Deployment


Figure 7: Security Levels

Differentiating Container Security Levels

Containers themselves are immutable, but their security status is constantly changing. Unchecked containers become validated, containers previously thought to be fine are found to have vulnerabilities, and problems get fixed.

To keep track of the security position, a security status needs to be allocated to each container. It is proposed that the following levels are used:
  • Grey — unknown
  • Blue — validated public
  • Red — ready for production verification
  • Amber — formerly suitable for production but currently ‘in doubt’
  • Green — suitable for production operation

The relationships between the statuses are shown in Figure 8 below. A container’s status is not static, and therefore containers must be continuously validated.
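The sketch below encodes the five statuses and a set of permitted transitions between them. Because Figure 8 is not reproduced in the text, the transition map is an assumption inferred from the level descriptions above, not the author’s exact model.

```python
from enum import Enum

class SecurityStatus(Enum):
    GREY = "unknown"
    BLUE = "validated public"
    RED = "ready for production verification"
    AMBER = "formerly suitable, currently in doubt"
    GREEN = "suitable for production operation"

# Permitted transitions: an assumption based on the level descriptions above.
ALLOWED_TRANSITIONS = {
    SecurityStatus.GREY: {SecurityStatus.BLUE, SecurityStatus.RED},
    SecurityStatus.BLUE: {SecurityStatus.RED},
    SecurityStatus.RED: {SecurityStatus.GREEN, SecurityStatus.AMBER},
    SecurityStatus.GREEN: {SecurityStatus.AMBER},
    SecurityStatus.AMBER: {SecurityStatus.RED, SecurityStatus.GREEN},
}

def transition(current: SecurityStatus, new: SecurityStatus) -> SecurityStatus:
    """Apply a status change, rejecting moves the model does not allow."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Cannot move from {current.name} to {new.name}")
    return new
```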


Figure 8: Container Security Status Levels

Security Management Process

The continuous security testing performed as part of the validation process will keep altering the security state of containers. The larger the estate, the higher the frequency of change. It is important to put in place a container security management process to keep on top of the security problems in the estate.

A very high-level process is shown in Figure 9 below. The concept is that there is a continuous container security management capability sitting between the application development and operations teams. The security management capability is responsible for ensuring that only “green” containers are running in production and that any containers that go “amber” are fixed as soon as possible.

The security management capability’s workload is eased by instrumentation that is mandated to be added to the containers. The inclusion of the instrumentation is checked in the continuous security testing process, and only containers with the instrumentation implemented correctly will be allowed to go “green”.

The instrumentation helps by automatically sending callback notifications to a central deployment register. These callbacks are sent by the containers themselves at key points:
  • When a container is deployed to a node
  • When a container is started (or stopped) on a node
  • When a container is suspended on a node
The callbacks allow the deployment register, a form of CMDB, to keep up to date on which containers are deployed where. More importantly, it also tracks when containers are started, stopped and suspended. This latter part is important for policy enforcement: it makes it possible to verify that requests to suspend or stop a container found to have a problem have actually been complied with.
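The snippet below is a minimal sketch of what such an instrumentation callback might look like from inside a container. The register URL, payload fields and event names are assumptions; the article only states that callbacks are sent at deploy, start/stop and suspend.

```python
import json
import socket
import urllib.request
from datetime import datetime, timezone

# The register endpoint and payload fields are illustrative assumptions.
REGISTER_URL = "https://deployment-register.internal/api/events"

def notify_register(container_id: str, event: str) -> None:
    """Send a deploy/start/stop/suspend callback to the deployment register."""
    payload = {
        "container_id": container_id,
        "event": event,  # "deployed", "started", "stopped" or "suspended"
        "host": socket.gethostname(),
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    request = urllib.request.Request(
        REGISTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(request, timeout=5)

# Example: notify_register("orders-api-7f3c", "started")
```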



Figure 9: Container Security Management Process

Conclusion

Pulling together all of the threads mentioned in this paper makes it possible to start to understand the end-to-end aspects of secure container management. Of course, containers still need hosts to run on, and so similar processes are required to manage and check the host operating systems. It is interesting to see how much complexity is added when looking through a single non-functional lens. Imagine the complexity when all of the non-functional lenses have been applied to a design. This is one of the reasons why enterprise IT is actually very hard to do well!
Do containers simplify everything? Not really, when you look at the big picture. They certainly help simplify certain areas and, of course, create new areas of complexity. I am watching with interest to see whether “Containers as a Service” (CaaS) takes off in the market. A CaaS host will worry about a lot of this for you. Hopefully!

Further Reading

  1. NCC Group whitepaper: “Understanding and Hardening Linux Containers”; 29 June 2016
  2. Chris Milsted: “Patching, Anti-Virus and Configuration management when adopting docker format containers”; April 2016
  3. Simon Greig blog: “Architecture Entropy”; November 2015
