X-47 Industries

Exploring tools and infrastructure for systems administration.

Driving Docker Safely

Arguably, the golden age of car design was the 1950s and 1960s. Tail-finned automobiles were cast as rocket-ships, blasting us boldly into the future. And this was fine, as long as that future had no unforeseen, unplanned incidents. “Unsafe at Any Speed” by Ralph Nader painted the gruesome picture of an industry that failed to provide adequate safety mechanisms.

Docker offers a similar promise today: wrap up your app so you can accelerate from the starting line, going from 1 to 100 instances with little more than a nudge on the throttle; easily shift gears from the development on-ramp to the fast lanes of production. That’s a nice enough premise, if you’re only worried about getting from point A to point B. However, this picture fails to contend with a host of operational issues. Let’s look at three:

Measuring application performance will be an ongoing challenge. This is akin to needing to know how fast your car is travelling. Container frameworks, like Mesosphere and Kubernetes, give readings of resource utilization across the fleet, but they cannot tell you how many queries per second the application is serving. For web applications, some of this can be measured at the load-balancer level, but beware: it’s not safe to assume that every instance is performing equally well.
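One way to get a speedometer on each instance is to have the application count its own requests. What follows is only a minimal sketch, not a prescribed approach: the class name `RequestRateMeter`, its `record` and `qps` methods, and the 60-second window are all invented for illustration, and a real deployment would more likely export such a number to a monitoring system than keep it in memory.

```python
import threading
import time
from collections import deque


class RequestRateMeter:
    """Hypothetical helper: estimates queries per second over a sliding window."""

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the meter can be tested
        self._events = deque()       # timestamps of recent requests
        self._lock = threading.Lock()

    def record(self):
        """Call once per request served by this instance."""
        now = self._clock()
        with self._lock:
            self._events.append(now)
            self._trim(now)

    def qps(self):
        """Average requests per second over the window ending now."""
        now = self._clock()
        with self._lock:
            self._trim(now)
            return len(self._events) / self.window_seconds

    def _trim(self, now):
        # Drop timestamps that have aged out of the window.
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()
```

Because each instance reports its own figure, a single slow container shows up as an outlier instead of being averaged away at the load balancer.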

Logging is another concern. Whether logs are kept for compliance reasons or for later troubleshooting, the ephemeral ideal of containers spinning up and spinning down on demand exacerbates the challenges of log collection: when a container disappears, any logs stored inside it disappear too.
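A common way to soften this problem is to write nothing to the container's own filesystem and instead emit each log entry as a line on standard output, tagged with the instance that produced it, so that whatever collector runs outside the container can pick it up. The sketch below assumes that pattern; `JsonLineFormatter` and `build_logger` are hypothetical names, and the use of the hostname as an instance identifier is an assumption (Docker sets it to the short container ID by default, but that can be overridden).

```python
import json
import logging
import socket
import sys


class JsonLineFormatter(logging.Formatter):
    """Hypothetical formatter: renders each record as one JSON object per line."""

    def __init__(self, instance_id):
        super().__init__()
        self.instance_id = instance_id

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "instance": self.instance_id,   # which container wrote this line
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name="app", stream=None, instance_id=None):
    """Return a logger that writes tagged JSON lines to stdout (or `stream`)."""
    target = stream if stream is not None else sys.stdout
    handler = logging.StreamHandler(target)
    # Assumption: the hostname identifies the container unless told otherwise.
    handler.setFormatter(JsonLineFormatter(instance_id or socket.gethostname()))
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Calling `build_logger().info("listening on port %d", 8080)` would then print a single JSON line that survives for as long as the external collector keeps it, regardless of what happens to the container.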

Debugging is probably the hardest challenge. Basic monitoring and logging provide some insight into application internals, but only to the extent that the designers anticipated the need; all other application state is lost. Likewise, because of the minimized nature of the container environment, tooling for ad-hoc investigations is often missing.

When installing applications into a generic operating system environment, default choices for most of these challenges exist: syslog, sysstat, tcpdump, and top all offer some ability to address these concerns. However, in reimagining our infrastructure as containers, we have to reinvent the wheel. Some tools already enable these capabilities: Sysdig Cloud offers container monitoring and alerting along with trace-driven troubleshooting; GitLab is working hard to provide an application lifecycle approach to containers, starting with code development and orchestration, and in future releases integrating Prometheus monitoring into deployed containers.

Nader’s book, and the legislation and oversight that followed it, all led to safer transportation. Similarly, in delivering containers, let’s not only meet the goal of delivering them quickly and efficiently, but also operate them in an easier, safer, more sustainable way.