Docker is the talk of the software development industry right now. Though some, like @monkchips, think that the smart money is on Otto, users and vendors alike are looking at containerization. Early adopters of Docker are experiencing a mix of fervor, frustration, and ultimately, for some, enlightenment.
I explore three Docker anti-patterns I stumbled on in my own journey towards Docker nirvana. These discoveries all stem from a core misunderstanding of containerization. My goal is to provide readers with a solid understanding of the reason for and the nature of containers.
Docker as a virtual host – Multiple concerns per container
Docker containers are not virtual machines; they are units of encapsulation that do one job well. If you follow the micro-services trend, you’re likely to adopt an organization blueprint matching that architecture. Docker containers are the deployment infrastructure equivalent of micro-service architecture and its single-concern philosophy.
Many of us engineers enjoy running virtual machines on our workstations. They allow us to run old versions of Internet Explorer (or any version, for us Mac users) without needing a dedicated physical machine. We can run various Linux distributions, test software, and so on. Virtualization in data centers (and AWS) has given us more control over our infrastructure, our scalability options, and more.
To the untrained eye, Docker and containerization look a lot like virtualization, but the differences between them have real consequences.
Beyond Dockerfile syntax, we need to understand Docker’s philosophy of composition and how containerization is different from virtualization. For this purpose, it is helpful to think of containers as role-based VMs.
The first Docker containers I wrote exhibited a typical anti-pattern, which can be described as ‘multiple concerns’ per container. For example, one container would install and use MySQL, WordPress, possibly even phpMyAdmin, nginx and an ssh daemon. That sounds a lot like a virtual machine, one you can set up with a config file (wait, isn’t that what Vagrant does for developers?), and that’s essentially what that container was.
IMPORTANT: Containers are meant to host the smallest possible software set to address one concern, isolating features as micro-services (for more on that topic, read Microservices — Not a free lunch!) to allow for scaling horizontally.
In today’s increasingly complex architectures, the ability to update discrete items independently of others is key.
Each container should do one thing, such as:
- host a database
- run an app
- backup the database
- ship the logs, etc.
These containers are then stitched together via shared volumes, linked containers, and so on. Having a single concern per container doesn’t necessarily mean that each container runs only one main process, but rather that each container has only one role.
We’re very likely to succumb to this anti-pattern on a developer workstation because Docker offers the best way to run an application (and its dependent services) without dirtying up your system. So we look at Docker as a way to create a VM (not everyone knows about or is comfortable with Vagrant) and run it with all the services nicely packaged up in one container.
Proper containerization is still preferable, possible, and advisable — especially as Docker is getting better at going all the way from developer workstations to production environments.
Takeaway: As you build your systems on multiple single-concern containers, you’ll need a tool to orchestrate and launch them: learn Docker Compose.
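As a minimal sketch of what that orchestration looks like, the Python script below emits a Compose file for the WordPress example above, split into single-concern services: one hosts the database, one runs the app, and one backs up the database. The service names, image tags, the db-data volume and the throwaway password are all hypothetical placeholders, not taken from a real setup; check Docker Hub for the tags you actually want to pin. The script prints JSON rather than YAML only so that it needs nothing beyond the Python standard library; JSON is essentially a subset of YAML 1.2, so Compose should accept the output.

```python
"""Emit a docker-compose file with one service per concern (a sketch).

All service names, image tags, the volume name and the password are
hypothetical placeholders chosen for illustration.
"""
import json


def single_concern_stack() -> dict:
    """Build a Compose (file format v2) definition as a plain dict."""
    db_password = "change-me"  # hypothetical; keep real secrets out of the file
    return {
        "version": "2",
        "services": {
            # Role: host the database. Its data lives in a named volume,
            # so the container itself stays disposable.
            "db": {
                "image": "mysql:5.7",
                "environment": {"MYSQL_ROOT_PASSWORD": db_password},
                "volumes": ["db-data:/var/lib/mysql"],
            },
            # Role: run the app. It reaches the database by its service name.
            "app": {
                "image": "wordpress:4",  # hypothetical pin
                "depends_on": ["db"],
                "environment": {
                    "WORDPRESS_DB_HOST": "db:3306",
                    "WORDPRESS_DB_PASSWORD": db_password,
                },
                "ports": ["8080:80"],
            },
            # Role: back up the database, reusing the mysql image only for its
            # mysqldump client. depends_on orders startup but does not wait for
            # MySQL to accept connections, so run this one on demand with
            # `docker-compose run backup` once db is up.
            "backup": {
                "image": "mysql:5.7",
                "depends_on": ["db"],
                "volumes": ["./backups:/backup"],
                "command": [
                    "sh", "-c",
                    f"mysqldump -h db -uroot -p{db_password} "
                    "--all-databases > /backup/all.sql",
                ],
            },
        },
        "volumes": {"db-data": {}},
    }


if __name__ == "__main__":
    # Usage: python make_compose.py > docker-compose.yml
    print(json.dumps(single_concern_stack(), indent=2))
```

Because each role is its own service, you can rebuild or upgrade the app image without touching the database container or its data.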
Latest doesn’t mean best
An advantage of containerization is the ability to quickly rebuild your images (in the case of a security issue, for example) and roll out a whole new set of containers. And because containers are single-concern, you’re not redeploying your whole infrastructure every time.
As a beginner or intermediate Docker user, it’s important to understand and decide which OS versions and dependencies you’re using.
It is incredibly tempting when writing your Dockerfile to grab the latest version of every dependency. Many public images on the Docker Hub have only one tag, latest, and even when they offer more deterministic versioning, we, the inexperienced, still tend to use latest.
The golden rule, though, is to build containers on known, stable versions of the system and dependencies that you know your software works with.
Let’s look at what the NodeBB/NodeBB Dockerfile used to look like:
FROM node:0.10-onbuild
(Ignore the onbuild for now.) The node:0.10-onbuild Dockerfile has a FROM of node:0.10.40-onbuild (think of node:0.10.x-onbuild). Reading its Dockerfile, you’ll see that node:0.10.40 does a good job of version pinning, installing node.js 0.10.40 (très specific) and npm 2.11.3 (equally specific) on a system built FROM buildpack-deps:jessie, which is a recent stable release of Debian (Jessie is Debian 8.x).
Recently a pull request came in, titled “newer node = faster node” which started with:
The pull request seems innocuous and well-intentioned, as newer versions of node have been markedly faster than 0.10.x. However, a bare node reference resolves to node:latest, since latest is the tag Docker uses by default. node:latest, as we can see on the Docker Hub, currently points to node:5.0 (it is an inherently moving target, a pointer to the newest release of node, LTS or not). And 5.x is a non-LTS release line of node.js, while 4.x is the most recent LTS (Long Term Support) version. The result is that FROM node and its cousin FROM node:latest break version pinning.
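To make that default explicit, here is a small Python sketch that reports which tag an image reference resolves to and flags FROM lines that end up on latest. The helper names are hypothetical, and the parsing is a simplification of Docker’s reference format (optional registry host and port, repository path, optional :tag or @digest).

```python
def effective_tag(image_ref: str) -> str:
    """Return the tag (or digest) Docker would use for an image reference."""
    if "@" in image_ref:
        # A digest such as node@sha256:... pins one exact image.
        return image_ref.split("@", 1)[1]
    # Only the last path component can carry a tag; an earlier ":" may be
    # a registry port, as in localhost:5000/node.
    last = image_ref.rsplit("/", 1)[-1]
    if ":" in last:
        return last.split(":", 1)[1]
    return "latest"  # no tag given: Docker falls back to latest


def unpinned_bases(dockerfile: str) -> list[str]:
    """List FROM images in a Dockerfile that resolve to latest."""
    flagged = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        args = [p for p in parts[1:] if not p.startswith("--")]  # skip --platform=...
        if args and args[0] != "scratch" and effective_tag(args[0]) == "latest":
            flagged.append(args[0])
    return flagged


if __name__ == "__main__":
    print(unpinned_bases("FROM node\nRUN npm install\n"))  # ['node']
    print(unpinned_bases("FROM node:0.10-onbuild\n"))      # []
```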
Version numbers are important. The SemVer versioning model is “MAJOR.MINOR.PATCH”, but it helps to think of it as BREAKING.FEATURE.FIX. For 99% of production and testing tasks, you want BREAKING (MAJOR) and FEATURE (MINOR) to stay static, and to change FIX (PATCH) only once tested.
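The BREAKING.FEATURE.FIX reading of SemVer translates into a small check. The Python sketch below (function names hypothetical) classifies a proposed version change; a policy following the rule above would accept only FIX changes, and only after testing. Following npm’s caret-range convention, it treats a MINOR change below 1.0.0 as BREAKING, since SemVer makes no compatibility promises for 0.x releases.

```python
def parse_semver(version: str) -> tuple[int, int, int]:
    """Parse "MAJOR.MINOR.PATCH"; a leading "v" and any -prerelease or
    +build suffix are ignored in this sketch."""
    core = version.lstrip("v").split("+", 1)[0].split("-", 1)[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)
    return major, minor, patch


def change_kind(current: str, candidate: str) -> str:
    """Classify a version change as BREAKING, FEATURE, FIX or SAME."""
    cur, new = parse_semver(current), parse_semver(candidate)
    if new[0] != cur[0]:
        return "BREAKING"
    if new[1] != cur[1]:
        # Below 1.0.0 SemVer promises nothing, so treat MINOR as BREAKING,
        # as npm's ^0.x ranges do.
        return "BREAKING" if cur[0] == 0 else "FEATURE"
    if new[2] != cur[2]:
        return "FIX"
    return "SAME"


def within_pinned_line(current: str, candidate: str) -> bool:
    """True if only PATCH differs, i.e. the change stays on the pinned line."""
    return change_kind(current, candidate) in ("SAME", "FIX")


if __name__ == "__main__":
    print(change_kind("0.10.40", "5.0.0"))  # BREAKING
```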
I don’t need or want to test my app on all the versions of node.js out there (some are really broken). This is not front-end browser code where I have no control over the customer platform of choice; I can have a deterministic approach to the environment where I will run my software. That includes the Linux distribution, all dependencies (libraries, node modules & others) as well as my node.js runtime & npm versions.
One cause of failure is pulling latest, either directly or indirectly through a seemingly fixed-version image whose own FROM uses latest. Another is indiscriminately pulling assorted software versions in your Dockerfile, something you’ll want to be careful with. A better alternative here would be FROM node:4-onbuild, which pulls the most recent release of the node.js 4.x LTS line, something I am more comfortable with.
So, in summary: when you write your Dockerfile, decide knowingly which version of each dependency you pull in (this is called version pinning, and is typically done in RUN instructions) and which base image (FROM) it builds on.
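Note that a tag like node:4-onbuild still floats within the 4.x line: it is re-pointed as new 4.x releases are published. The Python sketch below mimics that resolution against a made-up list of tags (the list, the version numbers and the function name are hypothetical; the image maintainers manage the real aliases on Docker Hub), which shows why the same FROM line can give you a different image next month.

```python
def resolve_floating(prefix: str, tags: list[str]) -> str:
    """Return the highest MAJOR.MINOR.PATCH tag that starts with `prefix`.

    "4" matches any 4.x.y and "4.2" any 4.2.y. Versions are compared
    numerically, so 4.2.10 ranks above 4.2.9 (a string sort would not).
    """
    want = tuple(int(p) for p in prefix.split("."))
    matches = []
    for tag in tags:
        parts = tag.split(".")
        if len(parts) != 3 or not all(p.isdigit() for p in parts):
            continue  # skip variants such as 4-onbuild or latest
        version = tuple(int(p) for p in parts)
        if version[: len(want)] == want:
            matches.append(version)
    if not matches:
        raise LookupError(f"no tag matches {prefix!r}")
    return ".".join(str(p) for p in max(matches))


if __name__ == "__main__":
    tags = ["0.10.40", "4.2.1", "4.2.9", "5.0.0", "latest", "4-onbuild"]  # made up
    print(resolve_floating("4", tags))  # 4.2.9
    tags.append("4.2.10")               # a new 4.x release is published
    print(resolve_floating("4", tags))  # 4.2.10
```

A floating major tag is a reasonable compromise when you trust the project’s FIX releases; for fully reproducible builds, pin the full version instead.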
Takeaway: When you’re looking to base your image on another image, climb up the FROM chain to really understand what each image pulls in, all the way up to scratch.
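Climbing the FROM chain by hand means opening one Dockerfile after another. The Python sketch below automates that walk over Dockerfiles you have already collected, and flags every link that resolves to latest, which is how a seemingly pinned image can still pull latest indirectly. The acme image names and Dockerfile contents are invented for illustration, the function names are hypothetical, and the parser assumes simple single-stage Dockerfiles.

```python
def normalize(ref: str) -> str:
    """Make an implicit :latest explicit (scratch and digests are left as-is)."""
    if ref == "scratch" or "@" in ref or ":" in ref.rsplit("/", 1)[-1]:
        return ref
    return ref + ":latest"


def base_image(dockerfile: str) -> str:
    """Return the image named by the last FROM line (single-stage assumption)."""
    base = None
    for line in dockerfile.splitlines():
        parts = line.split()
        if parts and parts[0].upper() == "FROM":
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args:
                base = args[0]
    if base is None:
        raise ValueError("no FROM instruction found")
    return base


def walk_from_chain(image: str, dockerfiles: dict[str, str]) -> tuple[list[str], list[str]]:
    """Follow FROM lines up to scratch; return (chain, links resolving to latest)."""
    image = normalize(image)
    chain, unpinned = [image], []
    while image != "scratch":
        if image.endswith(":latest"):
            unpinned.append(image)
        if image not in dockerfiles:
            raise KeyError(f"no Dockerfile collected for {image}")
        image = normalize(base_image(dockerfiles[image]))
        if image in chain:
            raise ValueError(f"FROM cycle at {image}")
        chain.append(image)
    return chain, unpinned


EXAMPLE = {  # invented Dockerfiles keyed by fully tagged image name
    "acme/app:1.2.0": "FROM acme/runtime:3.1\nCOPY . /app\n",
    "acme/runtime:3.1": "FROM acme/os-base\nRUN ./install-runtime.sh\n",
    "acme/os-base:latest": "FROM scratch\nADD rootfs.tar /\n",
}

if __name__ == "__main__":
    chain, unpinned = walk_from_chain("acme/app:1.2.0", EXAMPLE)
    print(" -> ".join(chain))
    # acme/app:1.2.0 -> acme/runtime:3.1 -> acme/os-base:latest -> scratch
    print(unpinned)  # ['acme/os-base:latest']
```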
Docker containers with SSH
A related and equally unfortunate practice is to bake an ssh daemon into your image. This is a normal impulse, as we feel powerless at the thought of not being able to ssh into our “VM” to tinker, make last-minute changes, and so on. However, this goes against the idea of immutable infrastructure: having an ssh daemon in your container may encourage you to make undocumented, untraceable changes to your infrastructure without the means to replay and automate them. This makes sense with VMs (though obviously you have to lock things down pretty tight), but here we’re talking about single-concern containers, right?
Let’s explore a few use cases for being able to SSH into your container:
- Update the OS, services or dependencies
- Git pull or update your application in some other fashion
- Check logs
- Backup some files
- Restart a service
Try to determine if you really need ssh or if you could do the following instead:
- Make the change in the container Dockerfile, rebuild image, deploy the container
- Use an environment variable or configuration files accessible via volume sharing to make the change and possibly restart the container
- As a last resort, use docker exec
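On the application side, the environment-variable option from that list can be as simple as the Python sketch below. The variable names, defaults and allowed log levels are hypothetical. The point is that changing a setting means recreating the container with a different value (for example, passing -e LOG_LEVEL=debug to docker run) rather than logging in and editing a file, so the change stays visible and repeatable.

```python
import os
from typing import Mapping

# Hypothetical settings and defaults for an app running in a container.
DEFAULTS = {"LOG_LEVEL": "info", "DB_HOST": "db", "DB_PORT": "3306"}
LOG_LEVELS = {"debug", "info", "warning", "error"}


def load_config(environ: Mapping[str, str] = os.environ) -> dict:
    """Read settings from the environment, falling back to defaults.

    Values are validated at startup, so a bad -e flag fails fast instead
    of turning into a mystery someone wants to ssh in and debug.
    """
    cfg = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    if cfg["LOG_LEVEL"] not in LOG_LEVELS:
        raise ValueError(f"unsupported LOG_LEVEL {cfg['LOG_LEVEL']!r}")
    cfg["DB_PORT"] = int(cfg["DB_PORT"])
    return cfg
```

Passing a plain dict instead of os.environ makes the function easy to test without touching the real environment.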
Takeaway: Abstain from allowing access to your container via ssh altogether and find a better way to do what you need to do. You’ll discover better patterns along the way.
For more thoughts on Docker by Erik Dasque, check out his blog.