Docker, Alpine, Garbage and You

Today I had the misfortune of wasting some time reading this panicked attempt at spin-doctoring.

For those who are on the sidelines and haven’t been using Docker much yet, allow me to provide a bit of context and clarification before moving on to demonstrate how Canonical’s “business and technical point of contact with Docker Inc” manages to do a poor job of the business task of PR (spin) due in no small part to a number of significant technical errors.

The Pain of Base-Layer Bloat

A significant operational concern that people seem to encounter over and over is filesystem bloat due to a huge array of enormous base images upon which various application containers – both third-party and in-house – get built. The primary cause is that so many images use Ubuntu (or, to a lesser extent, Debian) images as their starting point. After all, if you can get a package installed on Ubuntu, you can build a container of that application!

Postfix may only be a few megabytes by itself, but when the most widely used image on DockerHub is built on top of Ubuntu, the reality is that almost all the code in that mail server winds up being… not the mail server!

Here’s a short list of some of the practical problems that arise as a result:

  • Docker hosts gradually running out of disk space unless you are incredibly proactive about GC-ing no-longer-referenced image layers. Every time your developers build a new version of their image, they risk pulling in a new version of the base image – unless they’re locked to a specific base image, which most aren’t.
  • Download speed can be atrocious if you are using a private registry. Private registries are important for security, reliability, and reproducibility.
  • Greatly amplified legal-compliance workload when checking the licenses of installed software packages. Remember: Your developers are building their images against, say, ubuntu:latest (or perhaps something like ubuntu:14.10), so every release of their image risks pulling in a different Ubuntu base image, which means starting your audit over.
  • Greatly amplified and greatly complicated security workload when ensuring installed software is current with security patches. It’s the same nightmare as the legal-compliance problem, but with the added complexity of forcing you to inject your operations concerns into developers’ workflows, as the only clean entry point for you to apply patches is your developers’ Dockerfiles. This interferes with the exact separation-of-concerns between development and operations that containerization is meant to help with!
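
To make the point about unpinned base images concrete, here is a minimal sketch, assuming a plain Dockerfile as input, of a check that reports how each base image is referenced: no tag or latest ("floating"), a named tag such as ubuntu:14.10 (which can still be re-pushed with new contents), or an immutable digest. The function name classify_bases and the three labels are hypothetical and not part of any Docker tooling, and the parsing is deliberately simplistic.

```python
import re

# Matches "FROM [--platform=...] image [AS alias]" at the start of a line.
FROM_RE = re.compile(
    r"^\s*FROM\s+(?:--platform=\S+\s+)?(?P<image>\S+)(?:\s+AS\s+(?P<alias>\S+))?",
    re.IGNORECASE,
)


def classify_bases(dockerfile_text):
    """Return (image, kind) pairs for every external base image.

    kind is one of the hypothetical labels:
      "digest"   - pinned by sha256 digest, cannot change underneath you
      "tag"      - explicit tag, but the tag itself may be re-pushed
      "floating" - no tag, or the "latest" tag
    """
    results = []
    stages = set()  # aliases of earlier build stages, not external images
    for line in dockerfile_text.splitlines():
        match = FROM_RE.match(line)
        if match is None:
            continue
        image = match.group("image")
        alias = match.group("alias")
        if image.lower() != "scratch" and image not in stages:
            if "@sha256:" in image:
                kind = "digest"
            else:
                # Only the last path component can carry a tag; an earlier
                # colon would be a registry port such as registry:5000/...
                tag = image.rpartition("/")[2].partition(":")[2]
                kind = "floating" if tag in ("", "latest") else "tag"
            results.append((image, kind))
        if alias:
            stages.add(alias)
    return results


if __name__ == "__main__":
    example = "FROM ubuntu\nRUN apt-get update\nFROM ubuntu:14.10 AS build\n"
    for image, kind in classify_bases(example):
        print(f"{image}: {kind}")
```

Anything reported as "floating" or "tag" can pick up a different base layer on the next build, which is exactly when the disk, licensing, and patching problems above start over.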

In short: Base-layer bloat presents one of the largest and most pressing practical pain-points in the adoption of containerization today, especially for larger organizations looking to make their developers’ lives easier with containerization.

A Light on the Horizon

Thankfully, Docker Inc recently announced they’re moving “official” Docker images to a foundation of Alpine Linux.

The Alpine Linux images hover in the ~5MiB range. They’re like the BusyBox image, but include a package manager and a significant array of available packages. If you want to strip them down to reduce the code available for an attacker to take advantage of, there’s simply far, far less to remove than in Ubuntu-land.

Misleading To The Point of Dishonesty?

So let’s talk about what Mr. Kirkland – Canonical’s rep – has to say about the announcement, point by point.

  1. “Check DockerHub and you may notice that while Busybox (Alpine Linux) has surpassed Ubuntu in the number downloads (66M to 40M), Ubuntu is still by far the most “popular” by number of “stars” – likes, favorites, +1’s, whatever, (3.2K to 499).”
    1. BusyBox is not Alpine. I’m kinda blown away that Canonical’s “technical” contact to Docker Inc would be so unaware of such an important distinction.
    2. In fact, the former’s DockerHub page links to the latter’s when talking about alternatives!
    3. Trying to conflate “likes” (stars) with downloads is disingenuous. If you’re looking for a robust project, do you care about how many people saw it and thought it looked interesting – or do you care about how many people have used it?
  2. “Ubuntu’s compressed, minimal root tarball is 59 MB, which is what is downloaded over the Internet. That’s different from the 188 MB uncompressed root filesystem, which has been quoted a number of times in the press.”
    1. That’s still ~30x Alpine’s download size!
    2. Alpine Linux has, at the time of this writing, 5 different release tags on DockerHub and 2 floating tags. Ubuntu has 69 release tags and – I am not making this up – 25 floating tags. This may not be a totally fair comparison: The Alpine Linux folks might be treating their release tags as floating tags. That said, even if they are, it’s a lot easier for me to deal with legal compliance, security maintenance, and disk bloat with 7 lines of development than with 25.
  3. “The real magic of Docker is such that you only ever download that base image, one time! And you only store one copy of the uncompressed root filesystem on your disk! Just once, sudo docker pull ubuntu, on your laptop at home or work, and then launch thousands of images at a coffee shop or airport lounge with its spotty wifi. Build derivative images, FROM ubuntu, etc. and you only ever store the incremental differences.”
    1. Is Mr. Kirkland under the assumption that Docker is only used on laptops?
    2. That download-reuse is great – right up until you’re spinning up an instance and the current release of your developers’ application uses different base image layers than the ones baked into the machine image. Suddenly that fleet of 50 spot instances you spun up is gonna spend a decidedly non-trivial amount of time just getting to running code again. Or your application cluster is just gonna have to struggle with that spike in load while new instances take considerably longer to spin up.
    3. This problem tends to be amplified – a lot – in the context of private registries, as I noted earlier.
  4. “Importantly, that includes a dedicated security team that has an outstanding track record over the last 12 years, keeping Ubuntu servers, clouds, desktops, laptops, tablets, and phones up-to-date and protected against the latest security vulnerabilities. I don’t know personally Natanael, but I’m intimately aware of what a spectacular amount of work it is to maintain and secure an OS distribution, as it makes its way into enterprise and production deployments. Good luck!”
    1. This is just plain FUD. While it’s true Canonical has a ton of expertise and talent in this area, it’s due in no small part to the fact that they need to have a ton of labor available because of the immense amount of code they are curating. As a rule, less code means fewer things that need to be kept up to date and fewer possible interactions. Further, both Canonical and the Alpine project benefit in very large part from the security work of upstream application owners. Security work for a distro really boils down to three things:
      1. Staying on top of, and integrating, security updates from upstream sources – while validating that the fixes don’t break the applications or assumptions elsewhere in the system about those applications. This one is, perhaps, a wash – fewer packages means less to track and integrate, which means less labor is needed.
      2. Fixing security issues that result from how the system is integrated (SELinux configuration, for example). This is primarily an issue for host OSs, not containerized userlands. Or rather, it shouldn’t be, but given how much of Ubuntu is bundled into their image, it could very well be a significant source of risk that is virtually non-existent for Alpine.
      3. Authoring original security patches when upstream patches aren’t available. This is the only area where Canonical may in fact have a real edge, but anything they author is also available to the Alpine project – albeit briefly delayed.
  5. “There are currently 5,854 packages available via apk in Alpine Linux (sudo docker run alpine apk search -v). There are 8,862 packages in Ubuntu Main (officially supported by Canonical), and 53,150 binary packages across all of Ubuntu Main, Universe, Restricted, and Multiverse, supported by the greater Ubuntu community.”
    1. Prefab package count is vastly less important for base container images than it is for a host OS: I can more easily do a source install of a thing I need that isn’t available in package form. As I will tend to have a single logical application in a given image, I have far less to worry about due to build-time complexities caused by the needs of other components of the application. Frankly, I only really care that I have a decent set of build tools available via package.
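
For a rough sense of scale on the download-size and cold-fleet points above, here is a back-of-the-envelope sketch in Python. The 59 MB figure and the 50-instance fleet come from the text; the Alpine size is an assumption, back-derived from the "~30x" comparison rather than measured, and the function name fleet_pull_megabytes is hypothetical. It counts only the base layer and ignores application layers, registry throughput, and parallelism.

```python
def fleet_pull_megabytes(instances, base_image_mb):
    """Total compressed megabytes pulled when every instance misses its layer cache."""
    return instances * base_image_mb


UBUNTU_MB = 59       # compressed download size quoted by Kirkland
ALPINE_MB = 59 / 30  # assumption: implied by the "~30x" ratio, not a measured value
FLEET = 50           # spot instances in the example above

ubuntu_total = fleet_pull_megabytes(FLEET, UBUNTU_MB)
alpine_total = fleet_pull_megabytes(FLEET, ALPINE_MB)

print(f"Ubuntu base: {ubuntu_total:.0f} MB across the fleet")
print(f"Alpine base: {alpine_total:.0f} MB across the fleet")
```

Under those assumptions the Ubuntu base layer alone accounts for roughly 3 GB of registry traffic for a single cold scale-up, against roughly 100 MB for Alpine.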

Ultimately though, Kirkland’s piece is a desperate attempt to keep people building images atop the Ubuntu base image – to keep it relevant.

Bottom line: Ubuntu has been dragging Docker down. Now, everyone knows it, and how to fix it.



Copyright © 2021 - Jon Frisby - Powered by Octopress