
Docker: featherweight virtual machines

In technical terms, Docker is a manager of "featherweight virtual machines". They are so light that novel methods of software development and distribution have stemmed from this technology.

A VM host like VMware ESXi allows several virtual machines to run on a single physical computer. This is very useful and saves a lot on hardware. But there are some disadvantages: the VMs impose a performance hit, and the RAM reserved for each VM cannot be reused while that VM is idle.

The Docker equivalent of a VM is a container. It is virtualized at the level of operating-system features: filesystems, network connections, CPUs, permissions, etc. A container shares the operating system kernel with the host.

The virtualization and isolation features of a container are built on capabilities that have been present in Linux for a long time, like cgroups and unionfs. This means that, in principle, both the host and the containers must run Linux. (Docker for Windows and for Mac run the Linux host as a traditional virtual machine.)

Linux-only containers are, in principle, a major limitation that traditional VMs do not have. But most cloud services offer Linux-only virtual machines, so everybody has grown accustomed to them.

Figure 1: First run of an Ubuntu container, on a Mac. The `ubuntu` image is automatically downloaded from Docker Hub if not found locally.
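A command along the lines below reproduces what Figure 1 shows; the only name involved is the official `ubuntu` image from Docker Hub.

```
# Start an interactive shell in an Ubuntu container.
# Docker downloads the "ubuntu" image from Docker Hub if it is not cached locally.
docker run -i -t ubuntu /bin/bash
```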

Compared to traditional VMs, containers are much "lighter" and use the hardware much more efficiently. There is no need to reserve RAM or define a maximum RAM amount, and no container fails because its preallocated RAM turned out to be too small.

Many apps, mostly network servers like Web or e-mail servers, already use these same kernel features to contain themselves for security purposes. But Docker offers a general, easy-to-use and well-documented interface.

The extra mile of Docker

Though containers are a powerful concept, it is not terribly difficult to create one from scratch without Docker. The thing about Docker is its standard interface and "batteries included" approach, that is, the extra tools that are very useful to IT and developers alike.

Tool: A Docker container can be created and run on all major operating systems.
Advantages: The developer can run Linux apps on their own machine, whatever their preferred operating system. Windows servers can readily run Linux apps and services.

Tool: Docker implements a script language (the Dockerfile) that standardizes image generation. (An image is the initial filesystem contents of a container.) It is still possible to configure a container as if it were a pet server, but this is discouraged.
Advantages: Container generation can become part of the app's source code and unit-testing routines; it can be delegated to the developers and automated, lowering the IT workload. Delegating containerization to the developer also mitigates the "works for me" problem, in which an app runs in the development environment but fails in production. (A minimal sketch of a Dockerfile appears right after this table.)

Tool: Docker Hub is a public image repository that allows easy download and publishing of images. All major apps and Linux distros already make official images available through Docker Hub.
Advantages: Most images and containers are largely based on third-party images found on Docker Hub. Many services that would be complicated to configure offer "ready to run" images, which means near-zero IT effort to make them available. The smart developer will choose the most appropriate third-party image as the basis of their own, so the effort spent writing the Dockerfile is minimal.

Tool: An image may be based on another image, and the disk size of the child image is just the difference between parent and child. Likewise, a container only uses disk space as its contents diverge from the initial image.
Advantages: It is feasible to keep every old version of a container, with few worries about disk space, in case an older version must be returned to production.
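As a minimal sketch of such a Dockerfile (the Python base image, `app.py` and the `myapp` tag are hypothetical, chosen only for illustration):

```
# Write a minimal Dockerfile describing how the image is generated.
cat > Dockerfile <<'EOF'
# Base the new image on an official third-party image from Docker Hub.
FROM python:3.12-slim
# Copy the application code into the image.
COPY app.py /app/app.py
# The single main process of the container.
CMD ["python", "/app/app.py"]
EOF

# Build the image; it can then be run locally or pushed to a registry.
docker build -t myapp:1.0 .
```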

If we had to synthesize the whole table in a single sentence: Docker follows the "infrastructure as code" trend, empowering the developer and unchaining apps from specific production environments.

Docker philosophy

In my opinion, the most opinionated characteristic of Docker is the single main process per container. Yes, there are tricks that allow many processes to run in one container, but they are discouraged and have no tooling support.

If your app needs a Web server and a database server, and also runs two scripts periodically, there will be four containers: two running all the time and two intermittent. The log of a container is simply the standard output of its main process.
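On the command line, that scenario might look roughly like the sketch below; the container names, the `myscripts` image and the script paths are illustrative, not from any real setup.

```
# Long-running containers, one main process each.
docker run -d --name web nginx
docker run -d --name db -e MYSQL_ROOT_PASSWORD=example mysql:8
# Intermittent containers for the periodic scripts.
docker run --rm myscripts /scripts/cleanup.sh
docker run --rm myscripts /scripts/report.sh
# The log of a container is the standard output of its main process.
docker logs web
```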

Old-style apps may take some contortions to adapt to Docker. For example, if a service has many independent processes that exchange data via files (Qmail is an example that comes to mind), we need to create a Docker volume. This volume is mounted by every container that needs to "see" the same folder.
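A sketch of a shared volume; the volume name and mount point are hypothetical.

```
# Create a named volume managed by Docker.
docker volume create maildata
# Every container that mounts "maildata" sees the same files.
docker run --rm -v maildata:/var/spool/mail ubuntu \
    sh -c 'date > /var/spool/mail/stamp'
docker run --rm -v maildata:/var/spool/mail ubuntu cat /var/spool/mail/stamp
```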

By default, the contained process runs as root inside the container. This is not an immediate security threat, because the process is "jailed". Of course, it is still good practice to drop unnecessary privileges, and world-class software like Apache, NGINX, MySQL, etc. does this even when running inside a container.
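One simple way to avoid root, as a sketch; `nobody` is an unprivileged user that exists in the stock Ubuntu image.

```
# Run the container's main process as an unprivileged user instead of root.
docker run --rm --user nobody ubuntu id
```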

We have already mentioned images and containers. The difference between them is a bit confusing. In essence, an image is the model for a container, and a container is an image put to run.

Figure 2: Images in my Docker installation. The admin interface is Portainer; it is not part of the Docker bundle, but it is easily installed, since it is itself a third-party container.
Figure 3: Containers in my Docker installation. (Find the Easter egg.) Docker gives random human-readable names to containers that we have not named ourselves.

This relationship between images and containers has some wrinkles:

Figure 4: Querying active and inactive containers via command line.
Figure 5: Reusing a stopped Ubuntu container. It is necessary to use the parameters -i -t to get an interactive prompt.
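The corresponding commands, roughly as in Figures 4 and 5; the container name stands for whatever name `docker ps -a` reports.

```
# List running containers, then all containers including stopped ones.
docker ps
docker ps -a
# Restart a stopped container and attach to it interactively.
docker start -a -i gallant_turing
```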

Even though it is possible to stop and restart a container as many times as needed, keeping its contents, the container should be considered volatile and disposable. Files that need to "survive" the container should be in volumes, or in a database.

Apart from files in volumes, the ideal dockerized app never writes data to files. This way, the effective container size is zero, since its contents stay identical to the initial image.
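Two commands help to check how far a container has drifted from its image; `web` stands for any container name.

```
# List files created, changed or deleted inside the container since it started.
docker diff web
# Show the size of each container's writable layer.
docker ps -s
```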

Servers as cattle

The handling of containers as disposable things is part of a greater trend: treating servers as cattle, worthless as individuals and managed automatically. Historically, the rule has been to treat servers as pets: individually installed and maintained by dedicated system administrators, with expectations of a long life.

Obviously, a Docker container needs an underlying physical computer to run on, but it is far easier to configure a single Docker server than to configure fifteen servers, even virtual ones. And of course many physical servers have given way to cloud services, some of which can run containers directly.

Personally, I don't see this trend as an IT killer; it is more like an absorption of some IT tasks into the development process. Instead of the traditional separation between development and production environments, the source code itself describes the infrastructure it needs.

Clusters and orchestras

Docker offers tools to create and manipulate containers, but it does not make administrative decisions. Yes, it is possible to flag a container for continuous running, even across reboots (--restart always), but that is as far as Docker goes. If the container is moved to another Docker server, the flag is lost and must be reinstated.
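For reference, the flag is given when the container is created; the name and image below are illustrative.

```
# Keep this container running, restarting it after crashes and host reboots.
docker run -d --restart always --name web nginx
```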

Likewise, scheduling the execution of intermittent containers must be done with some external tool, like crontab.
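A sketch of such scheduling via crontab; the `myscripts` image and the script path are hypothetical.

```
# Append a crontab entry that runs a cleanup container every night at 02:00.
( crontab -l 2>/dev/null; \
  echo '0 2 * * * docker run --rm myscripts /scripts/cleanup.sh' ) | crontab -
```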

The ideal would be to buy a factory-installed Docker server, or use a cloud service, and just upload a "recipe" listing which containers are to be run, and when.

Another challenge with similar roots is scalability. It may be necessary to have a cluster of Docker servers to handle the workload, and/or to run several instances of the same container (on separate machines, or separate CPUs) to handle the many users of a given service.

This set of problems is called orchestration. There are basically two options to consider: Docker Swarm (written by the same team as Docker) and Kubernetes (written by Google).
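As a small taste of what orchestration looks like in practice, a minimal Docker Swarm sketch; the service name and image are illustrative.

```
# Turn this Docker server into a single-node Swarm cluster.
docker swarm init
# Run three replicas of a web service; Swarm keeps them running and
# spreads them across the nodes of the cluster.
docker service create --name web --replicas 3 -p 80:80 nginx
```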

Kubernetes is compatible with many other container and VM technologies, and it is capable of orchestrating heterogeneous environments. Perhaps because of that, it is becoming the industry standard; even Docker for Mac bundles Kubernetes.