An appeal for simple infrastructure architectures

It’s two in the morning. Your phone has vibrated itself off the nightstand — dozens of notifications, none of them particularly helpful.

You stumble into the living room, stubbing your toe on the couch. You curse and hobble over to where you left your laptop.

It wakes from sleep mode. You login, pull up monitoring. The app is down. Everything is down. The world is on fire and you have no idea where to start throwing water.

Past you screwed present you. Past you thought it would be neat to use containers in production. Past you thought it would be awesome to build a 1000-piece delivery pipeline. Past you bolted every tool you could find onto the world’s dumbest sysadmin robot.

Past you forgot to keep it simple, stupid.


I keep seeing charts of “DevOps” tools that look like the linnaean tree of life.

While having dozens of options for each step of a config management or a CI/CD pipeline is awesome, it doesn’t come without issues. These aren’t the fault of the tools themselves, but how they are used.

As it’s become code and tool sets have matured, infrastructure (particularly delivery-focused infra) has been kneecapped by the same problems that plague code, dependency hell being one. Ops engineers tend to forget that every piece added is not one dependency, but a family of dependencies. It’s turtles all the way down, and soon you will find yourself troubleshooting into infinity.

Often the push back to use more tools is that the current tool isn’t “great” at something. This is usually true, but is it good enough? Nine times out of ten, using one “not great” tool is going to be less of a headache than adding another tool that you’ll have to monitor, troubleshoot, and manage.

Some attempt to build “best of breed” solutions, which is misguided to begin with. There is no “best of breed” for CI/CD or reliability engineering. “Best” is what works for your particular apps and what you can count on to not lose its mind in the middle of the night.

Search Google and you will find architectures that leverage 500 different tools to get code from commit to live production and keep the app running.

You’ll see Puppet stacked on Chef, stacked on Docker, stacked on Kubernetes (I have seriously seen this in wild.), via CloudFormation templates generated by Troposphere, being fed by Jenkins, Artifactory, and Subversion, stacked on Rundeck, stacked on ServiceNow, plugged into countless other things.

This is insane. Please don’t mirror these architectures. They are the fever dreams of engineers who build automation for the sake of automation. This is how people end up designing in thousands of hidden failure points while trying to stamp out single points of failure.

Most of all, it is infrastructure and deployment that is not reliable, which counters the main goal of building automation in the first place. There are too many moving pieces, too many opportunities for something to go wrong. It’s also usually too complex for any one person to really understand.

Infrastructure-as-code also inherited a tendency towards over-abstraction. It’s one thing to use IaaS or PaaS, another to build black-box abstractions that you have to support on your own. There are trade-offs, of course, but they’re rarely considered when there’s too strong a focus on “neat” or “novel”. Containers (which have some excellent use cases) are a good example:

To be reliable, your tool set and configurations have to be, not necessarily simple, but as simple as possible.

The Space Shuttles weren’t simple, they required multiple layers of redundancy and had inherent complexity, but I guarantee you NASA engineers weren’t adding extra sprockets just because they read a blog about them one time.

An as-simple-as-possible solution might look like… just Jenkins. It might be Ansible, Jenkins, and CodeDeploy. It might be 10 well-justified tools, but it certainly isn’t 50.

Any pride from an architecture you’ve designed should come from how little you use, not how much. Building simple is hard, way harder than leveraging all the shortcuts that layering tools on top of each other provides. But, unless you want to build monsters that wake you up in the middle of the night, simplicity is required.