X-47 Industries

Exploring tools and infrastructure for systems administration.

Driving Docker Safely

Arguably, the golden age of car design was the 1950s and 1960s. Tail-finned automobiles were cast as rocket-ships, blasting us boldly into the future. And this was fine, as long as that future had no unforeseen, unplanned incidents. “Unsafe at Any Speed” by Ralph Nader painted the gruesome picture of an industry that failed to provide adequate safety mechanisms.

Docker offers a similar promise today: wrap up your app so you can accelerate from the starting line, going from 1 to 100 instances with little more than a nudge on the throttle; easily shift gears from the development on-ramp to the fast lanes of production. That’s a nice enough premise, if you’re only worried about getting from point A to point B. However, this picture fails to contend with a host of operational issues. Let’s look at three:

Measuring application performance will be an ongoing challenge. This is akin to needing to know at what speed your car is travelling. Container frameworks, like Mesosphere and Kubernetes, give readings of resource utilization across the fleet, but they cannot answer the question of how many queries-per-second the application is serving. For web applications, some of this can be done at the load-balancer level, but beware, for it’s not safe to assume that instances are operating uniformly well.

Logging is another concern. Whether for compliance reasons or for later troubleshooting, the ephemeral ideal of containers spinning up and spinning down on demand exacerbates the challenges of log collection.

Debugging is probably the hardest challenge. Basic monitoring and logging provide some insight into application internals, but only to the extent the designers anticipated needing it; all other application state is lost. Likewise, because of the minimized nature of the container environment, tooling for ad-hoc investigation is often missing.

When installing applications into a generic operating system environment, default choices for most of these challenges exist: syslog, sysstat, tcpdump, and top all offer some ability to address these concerns. However, in reimagining our infrastructure as containers, we have to reinvent the wheel. Some tools already enable these capabilities: Sysdig Cloud offers container monitoring and alerting along with trace-driven troubleshooting; GitLab is working hard to provide an application-lifecycle approach to containers, starting with code development and orchestration, and in future releases integrating Prometheus monitoring into deployed containers.
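
For a rough sense of what the container-native equivalents look like today, here is a sketch using Docker's own built-in commands; the container name webapp-01 is hypothetical:

# Resource readings across running containers (CPU, memory, network I/O)
docker stats
# Whatever the application wrote to stdout/stderr
docker logs webapp-01
# Ad-hoc investigation inside the (minimal) container environment
docker exec -it webapp-01 sh

These cover the basics, but each maps to only a slice of what syslog, sysstat, and tcpdump give us on a full host.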

Nader’s book, and the legislation and oversight that ensued, all led to safer transportation. Similarly, in delivering containers, let’s not only meet the goal of delivering them quickly and efficiently; let’s also operate them in an easier, safer, more sustainable way.

JSON, jq, and gron

JSON has become a lingua franca. The modern world of API-driven, -as-a-Service platforms puts JSON front and center, using it as the default structure for passing around large data objects. If, like me, you live in unix-land full-time, you may have been able to ignore it. The AWS CLI, in particular, offers 'text' and 'table' choices for its --output flag, and these go a long way toward delaying the need to read JSON directly. But the day will come, and what’s a unix-speaker to do?
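
As a quick illustration, here is one such invocation; the --query expression is just an example of trimming the output down:

# Ask for plain text instead of JSON; --query narrows the output to just the instance IDs
aws ec2 describe-instances --output text --query 'Reservations[].Instances[].InstanceId'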

jq is the common tool for this. It follows the unix precept of a small tool that does one thing well and can be chained with other tools. It has its own language for specifying which part of the JSON structure to parse, and that language itself echoes unix’s step-wise, composable nature. For example, in the command jq -r '.[] | .name' listing.json, .name is simply the attribute we want to read from whatever’s stored in listing.json. While I do think it’s a pretty easy tool to get started with, there are better sources to walk you through the ins and outs of jq: JQ is sed for json; Bash that json.

The challenge I find in working with large, highly nested JSON objects is that it’s very easy to get lost. When the nesting is five or six levels deep, and when screens of data have paged by, all the pretty-printing in the world doesn’t help me keep track of where I am. In those instances, I reach for gron. While its stated purpose is to make JSON greppable (and it does a fine job of that), I appreciate it more because it describes the structure of the JSON. For example:

$ gron flintstones.json
json = [];
json[0] = {};
json[0].name = "fred";
json[0].surname = "flintstone";
json[1] = {};
json[1].name = "wilma";
json[1].surname = "flintstone";
json[2] = {};
json[2].name = "barney";
json[2].surname = "rubble";
json[3] = {};
json[3].name = "betty";
json[3].surname = "rubble";

Here we have a JSON array of objects, each with name and surname attributes. In a trivial example like this, it’s not hard to write the jq filter '.[].name' to list just the first names. However, with gron, I can look at any of the characters and see in an instant how the structure is nested. Writing the needed jq is almost cut-and-paste.
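
To make that concrete, here is a sketch of the round trip: grep the gron output to find the path, then express the same path as a jq filter.

$ gron flintstones.json | grep '\.name'
json[0].name = "fred";
json[1].name = "wilma";
json[2].name = "barney";
json[3].name = "betty";
$ jq -r '.[].name' flintstones.json
fred
wilma
barney
betty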

Hopefully, you are beginning to see how JSON can work alongside everything else in the world of unix. jq is a powerful, accessible tool for manipulating and parsing JSON. gron, too, is helpful, especially if you’re getting lost in highly nested JSON. These tools quickly help you wrangle the gnarliest of JSON, bringing it into the realm of your tried-and-true unix toolset.

The Wide World of Tools

There’s a long list of technologies operations staff have to know. Networking, storage, web servers, email, DNS, kernels, monitoring: the list is far from complete. It’s formidable, and enough to keep anyone busy, but let’s take a moment to look in some other corners; there, we’ll find other topics that will well reward the investment of study.

You probably already know a programming language (more likely, multiple languages), so why invest more time here? Automation is a great reason to learn Bash, PowerShell, or Python. Being able to encapsulate common remediation steps into a repeatable formula means never having to worry about a problem again. Atop that, serious study of a programming language will reward you with some insight into how larger applications are constructed. This aids you when applications inevitably fail and you’re the one who has to figure out why.
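
As a minimal sketch of what encapsulating a remediation can look like (the service name and log tag here are hypothetical):

#!/usr/bin/env bash
# Restart a service if it has stopped, and leave a note in syslog that we did.
SERVICE="webapp"    # assumption: the service we keep getting paged about
if ! systemctl is-active --quiet "$SERVICE"; then
    systemctl restart "$SERVICE"
    logger -t auto-remediation "restarted $SERVICE"
fi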

Pay attention to the shell you’re in and the tools it offers. On linux, a knowledge of readline will have you navigating the command line more efficiently. Vi, with its near-universal availability, offers powerful editing capabilities. Finally, OpenSSH is not just a mechanism for reaching servers; it offers interesting and fun capabilities for accessing remote ports and redirecting traffic.
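
A couple of small examples of what’s on offer (the hosts and ports here are made up):

# readline: Ctrl-R reverse-searches your history; Ctrl-A / Ctrl-E jump to the start/end of the line.
# OpenSSH: pull a remote service's port back to your workstation...
ssh -L 8080:localhost:80 admin@web01.example.com
# ...or turn a remote host into a SOCKS proxy for your browser.
ssh -D 1080 admin@bastion.example.com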

Finally, pay attention to diagnostics. The chaos of a broken production system is not the best time to learn how things work: you’re unlikely to get a clear picture, and you’re motivated to fix things as quickly as possible. Instead, explore diagnostic tools like lsof and others before the fact; learn the wide range of situations in which they can be used.
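
For instance, a few lsof invocations worth having in muscle memory before the outage (the port and PID are arbitrary):

# Which process is listening on port 443?
lsof -i :443
# What does process 1234 have open right now?
lsof -p 1234
# Which deleted files are still held open? (a classic cause of "df says full, du disagrees")
lsof +L1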

We started off acknowledging a formidable list of concepts to know. If I’ve done my job, you’ve added an item or two to follow up on. However, a word of warning before you do: treat these all as seasonings and add a little at a time. Vi itself is Turing complete, so don’t try to learn it all at once. Start with one or two little snippets, and every other week or so, learn one or two more. This not only cuts down on the overload but also deepens the learning.

There’s plenty going on in an admin’s life. In the hustle and bustle, it’s easy to focus only on the task directly in front of you. Periodically, though, take a moment to look up and investigate the wider system you inhabit; getting comfortable with these other tools makes life easier.

What’s in an Editor?

I’m young enough that I missed the editor wars. I’ve heard the tales of the righteous crusades, where mobs rallied around the banners of Emacs or Vi and set aflame large swaths of Usenet. Assuming they weren’t just curmudgeons fighting for irascibility’s sake alone, why would anyone get worked up about a text editor? I imagine three main reasons.

First, there is the accomplishment. These tools have many esoteric and cryptic key sequences that, by themselves, sound like nonsense but for the advanced user represent powerful incantations. Mastery of these sequences is hard-won knowledge, built over years. There is pride in having learned it; even the master once opened Vi and had no idea how to exit the editor. Now, they think nothing of issuing a Vi key sequence such as "bct)<ctrl-r>a.

Secondly, a powerful editor allows an expressiveness and precision when working with text. While commands for these editors may sound like arcane utterances, they represent an ability to fluidly manipulate text. This skill is especially important because editing code is not writing an essay. Your high-school English teacher may have marked up your whole essay in red ink, but a compiler is more finicky still, dying with the code ID-10-T on the first error and insisting you fix it before telling you about any remaining errors. In such an environment, making changes effectively saves hours of later frustration.

These days, we’re used to talking about computer commands, but at one point ‘command’ was a term chosen simply because it was descriptive. Literally, we command the program to do something: delete a word, move text, &c. These strings of characters are the command language of the editor. While "bct)<ctrl-r>a looks rather cryptic, it instructs the editor to cut and paste text, with the added side effect of saving the string thus replaced. Being a language, this sequence is composed of several parts, each adding something to the overall command. Understanding this command language lets the programmer edit code fluently, transcending the mechanics of any particular key sequence. And this is my final point: these programs have idioms that, once mastered, reward people for thinking in a certain way. For once they think in this way, they can create arbitrary sequences to perform new actions at will.
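
As a rough decomposition of that sequence (my reading of the standard Vi/Vim grammar):

"b          use register b for the next change
ct)         change from the cursor up to the next ')'; the removed text lands in register b
<ctrl-r>a   while in insert mode, paste in the contents of register a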

Accomplishment, precision, and a composable, idiomatic language: this is a heady mix. Together they give a strong sense of power over the editor. And isn’t that what has caused most wars, disagreements over who wields power?

Deciphering DevOps

I recently had the opportunity to attend AutomaCon, an exciting new conference dedicated to all things automation. It was three days of sharing and discussion around systems automation. While attending, one question floated in the back of my mind: what is DevOps?

I don’t subscribe to the school of thought that DevOps is a job title. Likewise, DevOps can’t be distilled to a set of tools (though certainly its practitioners espouse configuration management, continuous delivery, and performance-based monitoring). To say that it is a job title, or a set of tools, divorces it from the true change it’s trying to accomplish.

“Empathy” is one distillation for the goal of DevOps. This ideal comes from bridging the divide between developer and operations staff to make sure software delivery is a harmonious process. It’s a fine goal; noble in intent. However, I think it fails in providing helpful, practical direction. Witness: though I lead with one word, it took this paragraph’s second sentence — twenty-one words — to explain how it relates to our industry’s technical endeavors.

Rather, I prefer this distillation: DevOps seeks to reduce the penalty of failure to as near minimum as possible. Why is this important? Developers represent innovation; operations staff represent stability — usually these are at odds. By acknowledging each and building a system that copes with the inevitable tension, we enable each to perform better.

Buzzwords abound in tech. Our job as technologists is to sift the useful from the distracting and apply them to our benefit. DevOps is a revolution in our industry, but making it relatable and practical is still a challenge. I’ve shared a definition of DevOps that’s useful to me; what, in DevOps, is useful to you?

Cascadia & Test Driven Systems Administration

There’s a spirit of adventure at conferences. It’s putting on a pith helmet and encountering big, hairy beasts as strange, new industry problems are described. It’s foraging deep into the jungle of new ideas and bringing back delicious, new fruits. It’s discovering new friends and allies, similarly interested in mapping the trails.

Seattle hosts Cascadia IT Conference, and poetry aside, my experience there was fruitful. It’s a journey I’ve made twice now, and each has been rewarding. This year the big idea I bagged was offered up by Paul English: Test Driven Systems Administration.

Developers offer us new code to deploy, yet we refuse to host it unless it passes certain qualifications. As systems administrators, we want to know it operates to a minimum standard and is free of previously identified bugs. Yet what similar discipline do we practice to demonstrate we hold ourselves to the same standard? This is the first reason we should operate our systems with testing in mind: uniform standards of quality.

More importantly, testing should give us peace of mind. How often do we take the offered code and just implement it, trusting that we’ve logged in to the right server, or that dependencies are in place? Testing reduces failures. It’s a lesson we should have learned from The Checklist Manifesto; this is a much-needed reminder.

The most intriguing advantage from the testing world is the freedom to refactor. Unless we specify a minimum operating environment (e.g. services on ports, page availability, or even requests per second), how do we guarantee that the changes we make do not harm the production environment? Certainly, better admins verify that the website is still up after a change, but an automated test takes the drudgery out of this.
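
A minimal sketch of such a post-change check, assuming a hypothetical site and application host:

#!/usr/bin/env bash
set -e
# The site still answers, and answers successfully
curl -fsS -o /dev/null https://www.example.com/
# The application port is still listening
nc -z app01.example.com 8080
echo "post-change checks passed"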

To most shops today these practices seem a fabulous, outrageous tale, but let us return from the wilds with these better practices that, to me, resonate with common sense. I’m glad to have visited the Cascadia IT Conference this year, to find these words, and I look forward to returning next year.

Sysadmins Play Defense

On my drive in this morning, a couple of recent conversations merged together into one startling, succinct fact: sysadmins play defense. Given our desire for high availability, and redundancy, we have to. How else are we going to save the users from themselves? But we also do this in management, and this, I feel, is a mistake.

Let me note two particular cases: money and reputation.

A good friend recently said to me, “Jess, saving the company money is a limited proposition. There’s a bound to how much you can save.” He went on to point out that owners, CEOs, and presidents want to hear ideas that generate growth; done right, growth potential is unlimited. I’ve heard more than one gripe from fellow sysadmins about how money is being wasted; given the cost of our equipment, it’s right to invest capital prudently. However, to make management salivate, let’s show them how our tech choices can actually grow the business and offer new capabilities.

Reputation is a precious commodity. As sysadmins we are custodians of the email and of the network. Every outage tarnishes; minor outages may just dull the gleam, but major outages will blacken it with ash. Likewise, good reputation is earned around the water cooler or over lunch; those little bits of shared humanity that allow our colleagues to see us and to enjoy our company. A defensive attitude focuses on minimizing outages so that good reputation is never completely depleted. The alternate strategy, turning to offense, is to check on colleagues and make sure they’re doing ok; ask if they have issues and find easy solutions. Likewise, engage with other departments, listen to their pain and work on addressing their needs.

It’s easy to get bogged down in the day-to-day cares of machines. As we do, we cede the initiative and let the conversation slip out of our hands. When we do talk, we come across as the curmudgeon in rags, standing at a busy city street corner, holding a sign that reads, “The end is nigh.” Let’s not be that guy. Let’s choose to be the friendly neighbor who has tools to borrow, who has advice on how to get a greener lawn, who pitches in when you need help replacing the roof. Let’s play offense, rather than defense.

A Look at Server Provisioning

At work, I’ve had the opportunity to look at Cobbler. It’s magical software that takes much of the pain out of repository management and server deployment. It handles the latter by providing a web UI for managing a kickstart server. The result? Your server install experience should be: rack the hardware, run cables, work with the network team for appropriate access, and then boot the machine. You’ll then be greeted with a kickstart menu to pick the appropriate OS to install. Your kickstart file will run, leaving you wherever it finishes. At some point, we’ll want to tie the newly built system into configuration management.

In my case, hostnames and MAC addresses aren’t automatically generated or harvested. I see these as convenient unique identifiers that, if I had them, I could use to automate OS selection and the configuration management install. Cobbler works with neither of these. With effort, kickstart could be made to interpret them, but then you’re off building your own infrastructure, nurturing careful pieces of kickstart that talk to an external database to build systems appropriately. Automation is our goal, and so building this kickstart-level magic would be a noble effort if ready-built systems were not already in place.

The Foreman is one such system. It is a web UI for provisioning and managing systems, but it does so with one big assumption: Puppet. The software is really focused on complete lifecycle management, and it does that well. If you can accept this integration, then our chain is complete: power on a box, give it some time, and then log into a fully configured system. (Though with everything installed and configured, why log in?)

Also available is Razor. It too comes with strings into Puppet; literally, as it’s a joint project from Puppet Labs and EMC. It gives your running puppetmaster a way to drive server creation. It’s all command line, and meant more as an API for automating resource provisioning. This is all well and good; I like tools that do one thing and do it well. That said, I’m going to let this one lie for a little bit. Ultimately, my end goal for automation is a set of buttons I can turn over to non-sysadmins so they can provision new resources without me, and quite a bit of work would be needed to get Razor there.

Both of these tools come from the Puppet camp; I’m ignorant of tools for other configuration management platforms.

The challenge of automated provisioning is keenly felt in the cloud arena. CloudStack breaks this problem in twain: orchestration and provisioning. My understanding comes from the slides of Alex Huang’s presentation. Orchestration seems to cover all the coordinating events that allow a machine to work (storage assignment, network assignments, &c.), while provisioning the actual machine seems to be pushed down to the hypervisor. There are accommodations for bare-metal machines, so using this as the base of a deploy process is feasible. However, you’re left with your configuration management platform reaching out and using the CloudStack API.

There’s more research to be done here, obviously; but I wanted to get my understanding down. As well, there doesn’t seem to be much discussion about specific tools for this need. I’m left to believe each cloud platform and each configuration management product is doing it all for itself. That’s a shame, as it doesn’t allow for interoperability or code reuse.

Build Your Own Repos

Repos are good. They elevate civilized operating systems above the neolithic systems that expect you to hunt down software and its dependencies, club them over the head, and then drag them back to your server for the final install. Truly, we’re fortunate to be beyond such primitive practices.

But what happens if you’re behind a firewall, isolated to what you can dig out of the earth? What happens if you have limited bandwidth and wish to feed a large population of servers, each hungry for the latest and greatest software updates?

Rsync is the traditional tool, and it does a grand job. It works intelligently, analyzing the remote source and the local copy and picking out only the differences; thus it is bandwidth efficient. Also because of this, it can resume if interrupted. Finally, passed the right options (--delay-updates), it will hold all its updates until done, so that the repo remains consistent until everything is in place.
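
A typical mirror invocation looks something like this (the upstream module and local path are placeholders):

rsync -av --delete --delay-updates rsync://mirror.example.com/centos/7/ /var/www/html/centos/7/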

Unfortunately, there are times when the remote repo you wish to mirror doesn’t offer rsync access. The yum-utils rpm package already has a solution to this: reposync. It’s a straightforward little utility that’s easy to use. It’s smart about its work, detecting when local copies already exist. But coming from yum-utils, it carries some assumptions with it. These assumptions manifest as quirks we’ll have to learn to live with.

Assumption one: it takes a repoid as its source. As such, the local system you’re operating from will need to have the repo installed. This assumption bothers me for two reasons. First, hygiene: what if you want to mirror a repo on a system that doesn’t need the software? Yes, the repo can be set to disabled, but still, I like to keep things clean; I’d much rather not install it in the first place. Second, portability: what if you’re constructing a centralized repo server in a heterogeneous environment? What if I wanted to use Ubuntu as the base OS? Yeah, I could probably mess around with alien to get the software installed, but that’s a lot of effort for an operation that should be relatively simple.

This second point is probably more my assumption than the utility’s. With a name like reposync, it seems that given a repo it should output a repo. It does not: it dumps packages to a directory, and so we’ve got a little bit of prep work to do. As such, yum install createrepo.

So what did I do with all of the above? I generated a local copy of VMware’s OS Specific Packages.

Cheat Sheet:

# Install the vendor's repo definition so reposync has a repoid to pull from
rpm -ivh http://packages.vmware.com/tools/esx/latest/repos/vmware-tools-repo-RHEL6-9.0.5-1.el6.x86_64.rpm
# Mirror the packages beneath the web root
reposync -r vmware-tools-collection -p /var/www/html/vmware/
# Generate the repo metadata so clients can consume the mirror
createrepo /var/www/html/vmware/vmware-tools-collection/
# Publish a repo file for clients, then edit it to point at the local mirror
cp /etc/yum.repos.d/vmware-osps.repo /var/www/html/yum.repos.d/vmware-osps-local.repo
vi vmware-osps-local.repo

As a last step, create a cron job to pull in updates.

Can You Ever Go Home Again?

Terminal sessions are easy enough to spawn if you’re in a GUI, but pretend for a moment you’re on a server with no X. You’ve SSH’ed a couple of servers away from your starting point before you realize that you need to attend to something on that first server.

SSH gives you two basic options, both accessible through its escape character. By default this is ~, but you can customize it should you find it interferes with some other key combination.

Our first option is to suspend the ssh session entirely. Hit ~^Z; that’s two keystrokes, a '~' followed by '<ctrl>-Z'. This backgrounds the session, and at this point you can control it as you would any other job: fg, bg, jobs -l. When you’re done with whatever commands needed attending, simply return to the ssh session with fg.
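
A quick sketch of what that looks like in practice (the hostnames and job number are illustrative):

user@remote$ ~^Z
[1]+  Stopped                 ssh remote.example.com
user@local$ hostname -f
workstation.example.com
user@local$ fg %1
ssh remote.example.com
user@remote$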

The second option relies on SSH’s command mode. Hit ~C and you’re presented with an ssh command-line prompt. From here, you can invoke commands on the local machine; simply precede the command with !. For example:

ssh> !hostname -f
startingpoint.example.com

If you can’t get your business done in a one-liner, try !bash. You’re now in a complete subshell on the original host. Do what needs doing, and when you’ve completed your business, exit as normal; you’re back inside the ssh process and at the remote host.

The premise of this exercise suggested we were a couple of servers away. The first SSH session will hold on to the ~, making it awkward to communicate with subsequent shells. You can hit ~~, which causes the first shell to pass the character on through to the next machine. To pass a tilde to the third shell in line, you’d have to hit ~~~~. This quickly becomes impractical; consider using screen or tmux.

And that’s it, you can go home again; if just to visit.