I started getting into Docker just under a year ago. It obviously had promise, but I couldn’t find many people using it successfully. Since then Docker has matured, and I’ve been recommending it to everyone building web services with CI/CD.
When the IT services industry first moved to the dynamic virtual machine resources we now call “the cloud,” I realized the benefits go beyond separation, isolation, and organization: dynamic resources let us scale on demand instead of specifying an environment a year in advance. Then infrastructure-as-code freed new developers from following a 20-page install guide when they onboarded, and solved the problem of a production environment that was largely impossible to reproduce.
My primary goal, from an Ops perspective, is to increase our confidence in our product and our deployments. If you aren’t thinking about using Docker containers to do this, it’s time to start. The first reason people wanted to move from standard virtual machines to Docker containers is resource efficiency: at the very least, we can recover some of the overhead we lost by decomposing large machines into smaller clusters. So some think of Docker as an even thinner VM containing just your application. That is true, but it is more: because a container is so much smaller on disk, and so much faster to start up and shut down, it enables entirely different paradigms.
Differences between Docker and a normal VM
1. Docker runs one process
When you create your VM, your kernel hands off control to one process, but that is the init system (pid 1), which then kicks off a whole host of other processes. One way Docker helps us is by eliminating a lot of that duplication at the OS level. In Docker we try to run only a single process per container. From a design standpoint, the “Docker way” forces us to evolve how we think and to better organize our container roles. In addition, since we are not going to start sshd, it makes us think about security differently than in the current Linux cloud world, where security tools like Nessus ssh into all your boxes to scan them from the inside.
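As a sketch of the one-process rule, a minimal Dockerfile might look like this (the base image is real, but using nginx here is just an illustration):

```dockerfile
# Inherit from a maintained base image rather than building the OS yourself
FROM nginx:alpine

# No init system, no sshd -- the container's pid 1 is the application itself.
# The process must stay in the foreground; if it daemonizes, the container exits.
CMD ["nginx", "-g", "daemon off;"]
```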
2. Docker is less stateful
Docker is like having a VM that you reset back to a snapshot instead of stopping or shutting down (this should feel very familiar to Vagrant users). It feels painful at first, but it simply forces us to have mature infrastructure-as-code (Chef/Ansible) to build our images. We can still have stateful data: in the simplest cases we either map a container mount to a host directory, or the application talks directly to an external database.
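The two simplest approaches can be sketched as below, assuming a hypothetical `myapp` image (the paths, hostnames, and environment variable are illustrative):

```shell
# Map a host directory into the container so the data outlives the container
docker run -d -v /srv/myapp/data:/var/lib/myapp myapp

# Or keep the container stateless and point it at an external database
docker run -d -e DB_HOST=db.internal.example.com myapp
```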
TIP: Think of Docker-izing the way you think about organizing your environment into different git repositories (each builds one “thing”). A Docker container does one thing well: it runs one process. So great uses of Docker include application servers like Tomcat backed by an external database, and message brokers like Kafka or RabbitMQ.
How to start using Docker?
Once you’ve decided to actually build a Docker image you need to:
- Install the tools on your machine
- Decide on the base image your container will inherit from (don’t start from scratch!)
- Read/bookmark the Dockerfile reference documentation
I recommend playing with building and running images to familiarize yourself with the toolset. Some introductory things to do are:
- Sign up for a Docker Hub account and push an image there
- Start a container from your image, and:
  - Learn how to map disk space to the host machine
  - Learn how to map ports to the host machine
  - “Log in” to your container by starting bash interactively inside it
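The steps above can be sketched with the docker CLI like so (image choices and the `<username>` placeholder are illustrative):

```shell
# Pull a small public image to experiment with
docker pull nginx:alpine

# Map a host directory (-v) and a port (-p) when starting a container
docker run -d --name web -v /srv/www:/usr/share/nginx/html -p 8080:80 nginx:alpine

# "Log in" by starting a shell interactively inside the running container
docker exec -it web sh

# Tag and push an image to your Docker Hub account (replace <username>)
docker tag nginx:alpine <username>/nginx-demo
docker push <username>/nginx-demo
```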
Immediate Docker Challenges you will face
Just because you start using Docker doesn’t mean you won’t run into problems. Here are a couple of common ones to look out for:
1. Image bloat
Docker images are impressively small, but when your “one application” ends up depending on half the OS, you really haven’t saved much. Be wary of programs that shell out to external processes or depend on common services: on a minimally installed operating system those dependencies might already be present, but they can drastically increase the footprint of a Docker image. The most common newbie mistake is not cleaning out your package cache before committing and publishing the container.
2. Stateful Data
Our databases must persist, but even if we’ve solved our database problem, we still need to think about all the other parts of our environment that have stateful data. Our web application is usually the easiest part (from this standpoint) but in every other environment I’ve worked in, I’ve seen infrastructure that has persistent data or initialization needs.
TIP: Learn your package managers and clean your caches. On Red Hat systems, “yum install …; yum clean all” can save you a lot of space.
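One subtlety worth knowing: each Dockerfile instruction produces its own image layer, so cleaning the cache in a *later* step does not shrink the image; the install and the cleanup have to happen in the same RUN. A sketch on a Red Hat style base (the package chosen is just an illustration):

```dockerfile
FROM centos:7

# Install and clean in ONE layer -- a separate "RUN yum clean all" would
# still leave the cached packages baked into the earlier install layer.
RUN yum install -y httpd && yum clean all

# Run the one process in the foreground
CMD ["httpd", "-DFOREGROUND"]
```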
The Extended Docker Ecosystem
If you are still hesitating over adopting Docker, take a look at all the supporting tooling that has emerged. Vendors and projects across the industry are rising to meet the demand for a Docker-aware ecosystem:
- Red Hat has switched to Docker/Kubernetes for their PaaS framework
- Jenkins has some cool plugins for supporting a Docker based workflow
- The most basic of Docker clusters is now quite easy with Docker Compose
- Sonatype’s Nexus Repository has support for Docker containers, so now you can have a Private Docker Registry
- Artifactory (the only other repository I’ve seen people use) has support too
- New Relic (our local monitoring software) has support
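For example, pushing to your own private registry (whether Nexus, Artifactory, or the stock registry) is just a matter of tagging the image with the registry’s host and port (the hostname and tag here are illustrative):

```shell
# Tag the image with the private registry's address, then push it there
docker tag myapp registry.internal.example.com:5000/myapp:1.0
docker push registry.internal.example.com:5000/myapp:1.0
```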
Hit me up next month for some tips on reducing Docker image bloat!