An appeal for simple infrastructure architectures

It’s two in the morning. Your phone has vibrated itself off the nightstand — dozens of notifications, none of them particularly helpful.

You stumble into the living room, stubbing your toe on the couch. You curse and hobble over to where you left your laptop.

It wakes from sleep mode. You log in, pull up monitoring. The app is down. Everything is down. The world is on fire and you have no idea where to start throwing water.

Past you screwed present you. Past you thought it would be neat to use containers in production. Past you thought it would be awesome to build a 1000-piece delivery pipeline. Past you bolted every tool you could find onto the world’s dumbest sysadmin robot.

Past you forgot to keep it simple, stupid.

I keep seeing charts of “DevOps” tools that look like the Linnaean tree of life.

While having dozens of options for each step of a config management or a CI/CD pipeline is awesome, it doesn’t come without issues. These aren’t the fault of the tools themselves, but how they are used.

As infrastructure has become code and tool sets have matured, infrastructure (particularly delivery-focused infra) has been kneecapped by the same problems that plague software, dependency hell chief among them. Ops engineers tend to forget that every piece added is not one dependency, but a family of dependencies. It’s turtles all the way down, and soon you will find yourself troubleshooting into infinity.

Often the pushback against sticking with the current tool is that it isn’t “great” at something. This is usually true, but is it good enough? Nine times out of ten, using one “not great” tool is going to be less of a headache than adding another tool that you’ll have to monitor, troubleshoot, and manage.

Some attempt to build “best of breed” solutions, which is misguided to begin with. There is no “best of breed” for CI/CD or reliability engineering. “Best” is what works for your particular apps and what you can count on to not lose its mind in the middle of the night.

Search Google and you will find architectures that leverage 500 different tools to get code from commit to live production and keep the app running.

You’ll see Puppet stacked on Chef, stacked on Docker, stacked on Kubernetes (I have seriously seen this in the wild.), via CloudFormation templates generated by Troposphere, being fed by Jenkins, Artifactory, and Subversion, stacked on Rundeck, stacked on ServiceNow, plugged into countless other things.

This is insane. Please don’t mirror these architectures. They are the fever dreams of engineers who build automation for the sake of automation. This is how people end up designing in thousands of hidden failure points while trying to stamp out single points of failure.

Worst of all, the resulting infrastructure and deployment are not reliable, which defeats the main goal of building automation in the first place. There are too many moving pieces, too many opportunities for something to go wrong. It’s also usually too complex for any one person to really understand.

Infrastructure-as-code also inherited a tendency towards over-abstraction. It’s one thing to use IaaS or PaaS, another to build black-box abstractions that you have to support on your own. There are trade-offs, of course, but they’re rarely considered when there’s too strong a focus on “neat” or “novel”. Containers (which have some excellent use cases) are a good example.

To be reliable, your tool set and configurations have to be, not necessarily simple, but as simple as possible.

The Space Shuttles weren’t simple; they required multiple layers of redundancy and had inherent complexity. But I guarantee you NASA engineers weren’t adding extra sprockets just because they read a blog about them one time.

An as-simple-as-possible solution might look like… just Jenkins. It might be Ansible, Jenkins, and CodeDeploy. It might be 10 well-justified tools, but it certainly isn’t 50.

Any pride from an architecture you’ve designed should come from how little you use, not how much. Building simple is hard, way harder than leveraging all the shortcuts that layering tools on top of each other provides. But, unless you want to build monsters that wake you up in the middle of the night, simplicity is required.

The Dreyfus Model of Ops Engineering

You spend the first part of your career implementing simple designs, because that’s all you know how to do. It’s what you learned on a blog. It’s how the senior engineer taught you.

You get frustrated by how long it takes you to do stuff that others around you are flying through. You feel like you’re drawing in crayon.

Then, as you learn, you get faster and bolt on complexity. Checkbox here, change from the default there. You’re getting the hang of this.

You start to feel confident, bordering on cocky. Your diagrams have more lines and boxes. Soon, you’re teaching others and talking about the bleeding edge of the field.

Look at you, you genius, you’re building technology for the ages, even though the old guy in the corner of the room keeps saying “This seems a little complex…” Screw him. He doesn’t understand your brilliance.

Then, maybe, your world slowly gets bigger. Maybe you aren’t so brilliant after all. You start thinking more about consequences and downstream effects, margins and trade-offs. You see your Rube Goldberg machines floating in a sea of chaos.

Your approach changes again.

Maybe one less box would be OK. Maybe we don’t need that line. Your designs begin to look more like the ones you started your career with. You solve for 80% and push back on things that shouldn’t be fixed with technology instead of saying “yes” just so you can come up with something impressive.

You spend more time removing things from your Visio drawings than adding them.

You nod your head and smile when someone draws their precious maze on a whiteboard. You sit in the corner of the room and occasionally ask “Do you think this might be a little too complex?”

Required reading

3 Questions to ask before automating

If you’re a DevOps engineer who is constantly fighting fires and trying to keep your head down, it’s easy to get stuck in a rut of “You ask, I build”. Heck, it’s easy to think that way even when everything is running smoothly.

But that’s not what the job should be, or at least what the job can be. Not asking the right questions leads to more firefighting down the road and smooth operations devolving into mediocrity.

If you have a desire to be better at your job and build truly helpful, effective tools, before any code is written, there are (at least!) three questions that must be answered.

1. Are we solving the right problem?

Unfortunately, this question almost never gets asked, and when it is asked, the answer is often “no”. A Dev or PM will send an email that says “We need to automate X,” and the DevOps engineer will ask a couple of technical questions before starting the build.

If you’re a DevOps engineer and you aren’t asking questions about process and goals, you are a bit-pusher, not an engineer. You have unique insight into how the systems you work on interoperate and feed one another; you think in terms of systems and data flows. You have a contribution to make that’s more than writing playbooks and scripts.

If the teams you’re working with aren’t volunteering their goals, you need to ask for them. “What are you trying to accomplish?” Having this context is critical if you want to provide real value. What you were asked to build is often not the right solution to the problem that needs solving.

Before you build tools, build relationships with the developers you work with and get in the habit of asking questions that aren’t just “what version of this plugin do you need?” Ask questions even when you’re offering feedback: “What if we did it this way? Would that work?”

2. Is building a tool the right solution?

If you’re confident in the process, the next question you should ask is “Is building a tool the solution to this problem?” In a lot of cases the answer is “yes”, but thousands of custom tech fixes have been built to solve problems that could have been solved more appropriately with an email, a Google form, or God forbid, two people actually talking to one another.

This is when it helps to get someone in the room who isn’t an engineer and can provide a sanity check against the “when you have a hammer, every problem looks like a nail” problem.

Being competent with Python or Ruby is not an excuse to try and solve every problem with a script. I don’t know how many discussions I’ve sat in on where people were arguing about how to build a custom tech fix and I wanted to scream “JUST HAVE THEM GO SIGN UP FOR A DOCUSIGN ACCOUNT! WE DON’T NEED TO SOLVE THIS PROBLEM WITH CODE!” or something similar.

Find someone who will call you on your BS and tell you when you’re being myopic. That person is your new best friend.

3. Who is this tool for?

Hopefully the answer to this question will come out of defining the process, but that’s not always the case.

If you find that you’re only building tools that you or your DevOps team can use, you have a problem. There are certainly circumstances where having an internal-only tool is appropriate, but the majority of what you build should be usable by the people you’re trying to help.

Don’t build a script that you run on your local machine to deploy servers that the developers ask for. Build a portal that allows the devs to deploy without your involvement. That local script may be a first step, but fight tooth and nail to remove you and your team as a dependency for getting something done.

Some engineers think they’re being helpful by responding to requests with “Sure, I’ll build that.” But if you’re doing the same thing over and over (deployments, updates, etc.) you are not being helpful. Likely, you are slowing everyone down and have the potential to bring the dev process to a complete halt.

If you want to be a SysAdmin, keep on keepin’ on. If you want to be a DevOps engineer, build tools that others can use, then get out of the way.


Automation is not DevOps

A few years ago, if you had asked me to define DevOps, my answer would have sounded something like “Mumble mumble automation, mumble, automation, mumble mumble, infrastructure-as-code, mumble mumble, strategery.”

Thinking that DevOps equated to automation mostly came from the fact that the DevOps people I talked to and the articles I read spoke almost exclusively about automation, with only indirect references to anything else.

Implementing automation is definitely a part of what it means to be practicing DevOps, but it’s maybe 5-10%.

The reason that so much of what’s been written about DevOps is focused on automation is that it’s easy to write and talk about tools. Relative to implementing a true DevOps practice, automation is the easy part.

What DevOps actually represents

I’ll use Gartner’s definition to provide a common basis:

DevOps represents a change in IT culture, focusing on rapid IT service delivery through the adoption of agile, lean practices in the context of a system-oriented approach. DevOps emphasizes people (and culture), and seeks to improve collaboration between operations and development teams. DevOps implementations utilize technology — especially automation tools that can leverage an increasingly programmable and dynamic infrastructure from a life cycle perspective.

Notice that automation is the last thing mentioned. The main components of DevOps are people and process.

This is the hard stuff, the boring stuff, the stuff no one likes talking about because it involves dealing with people and managing work. But it’s also the stuff that gets shit done.

Automation is a force-multiplier, a lever. That’s all computers and software are in general — levers. Without people and process, automation is just a rusted-up socket wrench: you can use it as a blunt instrument, but you’re not getting its full utility.

Automation alone can have an impact, but mostly in that it allows you to do stupid stuff faster.


At the heart of a successful DevOps practice is a culture that accepts failure, doesn’t focus blame, and demands collaboration and knowledge sharing.

If your team members are afraid of how their manager reacts when they make a mistake, they will not attempt new ways of doing things and the team and business will not move forward. Full stop.

Screwing up is a necessary part of learning and improving, but most companies have a culture that doesn’t allow even minor failure. I don’t know how many times I’ve heard executives earnestly say “Failure is not an option,” completely missing that they were driving their company into the ground as the industry changed and competitors sprinted past them, having embraced failure as an opportunity to learn and do something new.

Accepting failure doesn’t mean not having high expectations or accepting mediocrity. Accepting failure prevents mediocrity.

The expectation for a DevOps team member should be: “Mistakes happen, don’t try to hide your screw ups. Communicate what happened, work on fixing it, and most of all… learn from it.”

Accepting failure directly feeds into reducing the desire to place blame. Worrying about the repercussions of failure and working to deflect blame take away from doing actual work and fixing the original problem, aside from poisoning the work environment.

When co-workers trust that they’re not going to get thrown under the bus, they collaborate and are more productive. Playing the politics of avoiding and placing blame is a distraction that needs to be snuffed out. If you’re a manager and you allow your team to sit around pointing fingers at one another, shame on you. If you encourage it (and oh, God have I met those managers), I hope your house gets invaded by bees.

If your goal with implementing DevOps was to speed up delivery and become more agile, you have to aggressively remove roadblocks. Anything that stands in the way of the team collaborating with the business, developers, and each other needs to be bulldozed. Fear, politics, mistrust — gone.

It might feel like wrangling a kindergarten class, but you have to get your team to share. There is no such thing as “too busy to train” or “too busy to document”. If only one person knows how to do a thing, they become a bottleneck that can shut down your DevOps factory.

Knowledge sharing has to be the expectation. If there is someone on your team who can’t be coached out of keeping things secret, they need to work somewhere else.


You can have amazing automation and culture and still not be practicing DevOps, because practicing DevOps is almost entirely about process.

If you want to “do DevOps”, process is where you start and will give you a much higher return on investment than the automation that follows. Without process, there will be too much chaos to maintain a healthy culture and any tooling you build will likely be solving the wrong problems.

Start by getting visibility into the work your team is being asked to perform. This isn’t just for managers; the entire team needs insight into the work backlog, if for nothing else than to prevent duplicated work (“I already have a script for that.”) and provide time-saving context (“Doing X will cause Y to fail.”).

If there are team members who won’t share what they’re working on or only offer vague details, that has to end. Letting people slide by with “I’m working on server things…” during your daily standups (and yes, you should be doing standups) doesn’t fly. “I’m doing X, Y, and Z today.” is the answer you’re looking for.

Without that transparency, work goes into a vortex of suck. If work is (or isn’t) being done, the team needs to know, because every unknown status compounds delays, reduces quality, and plants the seeds of distrust. A successful DevOps practice requires accountability, not for the sake of holding someone’s feet to the fire, but just so everyone knows WTF is going on and can plan accordingly.

All this requires commitment and continual reinforcement from teammates and management. There is no “We set up a kanban and no one used it so we stopped.” Failure is an option. Not knowing the answer is an option. Working ad-hoc and being lazy about process is NOT an option.

To summarize, if you are not managing work in an Agile way, using Scrum or something similar, you are not practicing DevOps.

It turns out that there are lots of great resources on the non-automation facets of DevOps. Two good places to start are The Phoenix Project and Effective DevOps. The reviews for Effective DevOps are particularly fun because most of the complaints are “the author didn’t talk about automation”, which means she got it right.

The trick when you’re doing research on DevOps is to not search for “DevOps”, because you’ll mostly get 1) articles that are about automation, and 2) “DevOps” job postings that are really just sysadmin jobs.

Instead, read about Agile and Lean. Read about Personal Kanban. Read things that make you a better person who can treat other humans with empathy. That’s what will make you a “DevOps Ninja”. Focusing entirely on automation just makes you a code monkey.


A week with Puppet

Prior to last week, I hadn’t done much with Puppet. Most of my config management experience is with Microsoft tools and Ansible.

Puppet was a contender the last time I was involved in picking a CM tool, but was ultimately ruled out. Compared to some of the newer CM tools, it felt clunky and, compared to Ansible specifically, the Puppet documentation sucks.

A week in, I can’t say that I’m a fan yet, but I’m starting to see some of Puppet’s strengths more clearly.

So far, the things I like:

Extensibility. It appears that you can integrate pretty much anything with Puppet (and that pretty much everything has been integrated with Puppet).

You don’t have to be a Ruby expert to use it. Enough said.

Model-driven. This is personal preference. I get why people like procedural config, but I feel like I have to spend way more time figuring out what is going on in a Chef cookbook or SCCM/SCOM task sequence vs Puppet or Ansible.

ERB templates. None of the Jinja2 crap that Ansible uses.

Some things I don’t like:

No stop on failure. If a step in your Ansible playbook throws an error, the whole playbook stops. I like this; it gives me more confidence that the end state has actually been achieved. I’m sure you can probably integrate something with Puppet to mirror this behavior, but straight out of the box, if something errors, it just keeps rolling.

Random ordering. Ansible plays run from the top of the YAML doc down. Puppet just tries everything in random order unless you explicitly chain tasks together.

Sub-par cloud modules. Ansible’s modules for AWS and Azure are easier to use and seem more mature, which is odd considering how much older Puppet is. Defining and configuring a cloud stack in Ansible is more intuitive to me than what I’ve found with Puppet.

Sometimes hard to follow. As long as you’re just referencing Facter data (Puppet’s system inventory) or variables within Puppet manifests, it’s pretty simple to figure out what’s going on. Throw in Hiera, Puppet’s hierarchical key/value lookup tool, which may in turn be referencing other data sources, and things start to get confusing.
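On the random-ordering point above: it isn’t actually random so much as graph-driven. Puppet orders resources by their declared dependencies, not by file position, which is conceptually a topological sort. Here’s a toy Python sketch of that idea (the resource names are invented for illustration, and this is not Puppet’s actual implementation):

```python
# Toy sketch of dependency-graph ordering, the way Puppet conceptually
# schedules resources: a resource runs only after everything it requires.
from graphlib import TopologicalSorter  # Python 3.9+

# resource -> set of resources it requires (like Puppet's `require =>`)
requires = {
    "Service[nginx]": {"File[nginx.conf]", "Package[nginx]"},
    "File[nginx.conf]": {"Package[nginx]"},
    "Package[nginx]": set(),
}

order = list(TopologicalSorter(requires).static_order())
print(order)  # ['Package[nginx]', 'File[nginx.conf]', 'Service[nginx]']
```

Without those explicit edges, any ordering is fair game, which is exactly why un-chained Puppet resources can run in surprising sequences.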

If I was building something from scratch, I still think I’d use Ansible, but (again, only a week in) Puppet is starting to feel like a better option than it has in the past.

Reading things like Lyft’s experience with Puppet and moving away from it have dampened my expectations somewhat, but I’m hopeful I’ll find more to like than dislike as I get further along.


Three days, two tech conferences

It is 104 degrees, 120 on the sidewalk, but less humid than I am used to, which is nice.

As always, Las Vegas’ kaleidoscope of people is disorienting.

It’s one of the most interesting places in the world for people watching — desperate to prove Bill Langewiesche’s “You should not see the desert simply as some faraway place of little rain. There are many forms of thirst.”

I am always uncomfortable here.

I know myself well enough to know I can’t go straight into a convention and not experience psychic pain. If I just leap into it, the ‘peopling’ parts of my brain throw sparks and scream like twisted steel. So I practiced being social from the time I left my house.

I chatted up the airport employees, my Lyft driver, the hotel staff — everyone who presented me with an opportunity for dialog. It gets easier with each person, but never frictionless.

Part of it is my personality. Part of it is in reaction to the empty (often passive-aggressive) small talk of the South I grew up surrounded by. Part of it is a battle between curiosity and a desire to “mind my own business”, both of which have served me well.

By the time I’ve got my badge and swag bag, I can approximate the social skills of a normal, functioning adult. This trip I actually have two badges, because I am attending two conventions at the same time.

This is stupid. Never do this. It will leave you exhausted and hurting, even without following the Hunter S. Thompson event playbook.

I spend three days bouncing back and forth between Mandalay Bay and the Aria for VMWorld and Oktane, respectively.

Walking around VMWorld’s vendor floor and listening in on keynotes confirms a thought I had on the plane ride — this will be my last VMWorld, there’s nothing here for me anymore. That’s partially because of where I’m focusing my career (cloud) and partially because of VMware.

There are groups within VMware doing interesting things (or at least wanting to), but the company as a whole struggles to execute and is moving much too slowly (and randomly) to remain relevant. Their leadership communicates a new idea of “who VMware is” every year even as the company hasn’t meaningfully aligned to whatever identity they were supposed to be several years prior.

While Pat Gelsinger was telling his audience that the tipping point in enterprise cloud is still five years away, Google’s Diane Greene (one of the founders of VMware, ironically), was telling the Oktane crowd that the tipping point has already come.

Obviously they each have their reasons for spinning a specific vision of the market, but one of those visions is “come on everybody, it’s time to move”, the other is “we’ll catch up with you later”.

Watching other VMWorld attendees furiously take notes about news and technical concepts that would be quaint or old hat somewhere like AWS re:Invent seems to support Gelsinger’s take, that VMware is right in slowly building bridges to the future. But they may be building the wrong bridges.

With all the talk of VMware enabling customers to migrate their existing VMs to the cloud, I can’t shake the sense that VMware management either really doesn’t understand cloud or is hoping customers don’t.

Moving VMs from on-prem to cloud or between clouds isn’t a thing people should be doing. It’s OK as a short-term tactic, but migrating VMs is really just moving old problems and creating new ones; yet VMware seems to have focused a significant portion of their latest strategy around the idea.

At this point, it feels like VMware is throwing spaghetti at the wall and hoping the long tail of legacy tech lasts longer than anyone is expecting. This isn’t just a VMware problem (Look at the entire new DellEMC federation, for example.), but it does make me a little sad, because VMware had an opportunity to lead and be more than the shrinking funnel for hardware sales that they’ll become.

I spend most of my time at Oktane, talking to other customers and the more future-focused vendors there.

The first part of Oktane’s opening keynote runs long before they bring Malcolm Gladwell onstage, with what I assume is the hope that he will compress his talk into what remains of the keynote timeslot.

He does not. Malcolm Gladwell does not care about “only having 15 minutes left”. Malcolm Gladwell is honey badger, and provides the spectacle of watching hundreds of people who need to be somewhere else fidget and anxiously figure out what to do.

Gladwell gives a 30-minute talk that leads with a description of childhood leukemia in the 1950s and the explosively hemorrhagic deaths of small children. In this moment I forgive him for his past half-baked theories.

Those extra 15 minutes have the effect of throwing the entire rest of the morning off.

A customer panel I am part of starts with the presenters scrambling to set up their A/V. Nothing works right, and one of the presenters starts the talk only to get flustered and abandon the podium, looking desperately at his co-presenter to save him.

I feel bad for both of them. Fortunately, the heckling is kept to a minimum.

This is the fun stuff you see at smaller conferences.

Okta does a good job of building on the vision they shared at last year’s Oktane, where you could see the rough shape of something coming together.

They want to be the glue that ties SaaS services together and extend their platform further into devices and infrastructure. It’s a good plan, and no one else is really executing on it in a similar way. There are API integration platforms (MuleSoft, Apigee, whatever) that let companies easily plug all their apps together, but Okta is doing it with identity.

They’re becoming the Active Directory of the cloud, which is impressive considering that Microsoft literally makes an Active Directory product for the cloud.

Where VMWorld felt like the past struggling to reach into the future, Oktane was the future.

After three days of having to be “on”, I am worn out. I make a last sprint of being social on the car ride to the airport, spending what is left of my socializing fuel. I can’t imagine what the people running vendor booths must feel like after a week of feigning interest and pitching their product.

I used to think the point of going to conventions was to learn things. Then I started going to conventions and figured out there really wasn’t much there to learn outside of customer-led sessions.

It’s easy to wander from session to session, never engaging with anyone, but there’s little value in that. As much as I hate the concept of “networking” as it relates to people, it is necessary.

Establishing relationships with other customers gives you resources to help solve problems and get advice. Strengthening relationships with vendors helps you get things done, especially with the big vendors to which you are by default just an account number.

If it weren’t for forcing myself to be social I wouldn’t know as many escalation managers, product managers, and engineers as I do now. These relationships are invaluable, because they’re the people who can actually help you if you’re trying to get traction with a support ticket or feature request.

Meeting these people is what makes going to cities you don’t particularly like and getting out of your social comfort zone worth it. In many cases, these aren’t just relationships of utility either. You’ll meet a lot of legitimately interesting people doing interesting things. Some of them may even become friends.


Automation isn’t just for scale

A few years back, in a sidebar discussion at a tech conference, one of Netflix’s engineering managers asked me if I was using any automation tools at work.

I said, “Not really. It’s a small environment and we’re not delivering any web apps that require automation for scale.”

She gave me an amused/sympathetic look and replied, “Dealing with scale isn’t the only reason to automate things.” It wasn’t condescension; she was being kind — dropping some knowledge, but I didn’t know how to respond.

A little embarrassed, I mumbled some other excuses for why automation wasn’t a good fit, said ‘nice to meet you’, and wandered off.

I cycled through my excuses, trying to figure out if they were valid. Most of the automation and config management stuff I had used in the past had been imperative, task-sequence-based stuff, like what you’d find in Microsoft System Center. When you have to play the “walk forward five steps, now extend left hand at 30 degrees, close fingers around the peanut butter jar” programming game for smaller, legacy environments, it definitely feels “not worth it.”

Days after, the conversation still bugged me. “Why do people automate their infra? Why, really?” Even after reading a ton of articles, blog posts, and whitepapers, I still couldn’t come up with anything that wasn’t ultimately a scale use-case.

I had confirmed my bias and probably would have stopped there in similar circumstances, but what the Netflix employee said had a feeling of truth that I couldn’t let go of. I kept digging.

In order to understand the benefits and justification for automation, I started automating things.

Turns out, that engineering manager had a gift for understatement.

Livestock, not pets

I grew up in a culture of IT where servers, even PCs, were treated as special snowflakes. It took a long time to reinstall Windows + drivers + software, so you did a lot of care, feeding, and troubleshooting to make sure you didn’t have to start over from scratch.

We named servers after hobbits and constellations. We got attached to them and treated each like a pet.

“Bilbo-01 just crashed?! NOOOOOOO!”

In some ways, virtualization worsened that philosophy. Things were more abstracted, but not enough to force a shift in mindset. You could now move your pet servers between different hardware, reducing the reasons you would have to rebuild a particular server. At great cost, effort, and risk (“You can never patch my preciousssss.”), there are businesses running VMs that are old enough to drive.

So we ended up with thousands of VMs running thousands of apps that were set up by people who have retired, switched jobs 10 times since, or stayed and now act like fancy wizards, holding their knowledge tight to their chest.

Automation is the documentation

Let’s tackle the issue of tribal and secret knowledge first.

A big component of DevOps (and the Lean concepts that inspire it) is identifying and removing bottlenecks. Sometimes those bottlenecks are people. This doesn’t mean you have to get rid of people, but you do need to (where possible) remove any one individual as a core dependency for getting something done.

“Bob is the only person who knows how to install that app.”

“Those are Jane’s servers, you’ll have to check with her.”

“We can’t change any of this because no one knows how it works.”

At the end of the day, this is a scale problem. It’s scaling your IT to be larger than one person. Part of the solution to this problem is cross-training, but automation can also help (and prevent future stupidity).

If you use a configuration tool like Ansible or Chef, the playbooks/cookbooks become the documentation for the environment. They detail dependencies, configuration changes, and service hooks that were realistically never going to be documented otherwise. If you’ve subscribed to a declarative model of automation, the playbooks not only detail what the app stack should look like; if they’re run again, they can enforce that the stack matches what’s in the playbook.
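To make the declarative idea concrete, here’s a toy Python sketch (not any real tool’s API; the state keys are invented): you describe the desired end state, the tool applies only the differences, and re-running against an already-correct system does nothing.

```python
# Sketch of declarative config management: state what the stack *should*
# look like, diff against what's actually there, apply only the differences.
# Running it twice changes nothing the second time (idempotence).

desired = {"nginx": "installed", "app_user": "present", "debug_mode": "absent"}

def converge(actual):
    """Return the actions needed to make `actual` match `desired`."""
    actions = []
    for key, want in desired.items():
        if actual.get(key) != want:
            actions.append(f"set {key} -> {want}")
            actual[key] = want
    return actions

server = {"nginx": "installed", "debug_mode": "enabled"}
print(converge(server))  # ['set app_user -> present', 'set debug_mode -> absent']
print(converge(server))  # [] -- already converged, nothing to do
```

The `desired` dict plays the role of the playbook here: it is both the documentation of the stack and the thing that enforces it.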

Change control

Things generally break because something changed. Maybe it’s a hardware or network failure. Maybe the software is buggy and there was a memory overrun or a cascading service failure. Maybe somebody touched something they shouldn’t have.

In olden times, a sysadmin would be tasked with troubleshooting the broken thing, wasting hours on Google searches and trial and error. Meanwhile, the app is down.

If you’re automating your infrastructure, that’s less of a thing. App stopped working? Re-run the playbook for the stack. Want to know why the app stopped working? Look at your run logs. Troubleshooting is still needed sometimes, but there is a lot less fire fighting when you can push a simple reset button to get things back up and running. Turn it off and on again.

Automation also requires that approved changes be well defined, which is a big positive: everyone knows what’s happening and what to expect.

This type of state enforcement could equally be considered a security measure. Some people schedule plays that run through app stacks and repair/report anything that doesn’t match the expected norm.
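A simple form of that scheduled check is comparing files on a host against a known-good manifest. This is a hypothetical sketch — the path, contents, and manifest here are invented, and a real play would pull expected hashes from version control:

```python
# Hypothetical drift report: flag files whose contents no longer
# match the hashes recorded in a known-good manifest.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Expected hashes would normally come from the playbook repo.
expected = {"/etc/app/app.conf": sha256(b"port=8080\n")}

def drift_report(expected, read_file):
    """Return the paths whose contents don't match the manifest."""
    drifted = []
    for path, want in expected.items():
        if sha256(read_file(path)) != want:
            drifted.append(path)
    return drifted

# Simulate a host where someone hand-edited the config:
fake_fs = {"/etc/app/app.conf": b"port=9999\n"}
print(drift_report(expected, lambda p: fake_fs[p]))
```

From there it’s a policy choice whether the scheduled play merely reports the drifted files or repairs them back to the expected state.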

NO MORE (or maybe less) PATCHING!

Not everyone is able to get there, but having fully automated stacks often means you can do away with OS patching. Just rebuild the stack once a month with the newest patched OS image. Boom!

If you do have to patch, you can significantly reduce your patching and service confirmation work by building the patch installs, reboots, and health checks into your automation. This helps prevent the post-patch-night “my app doesn’t work” emails.
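The shape of that automation is straightforward: patch, reboot, then poll health checks until they pass or you give up. This Python sketch assumes you supply your own patch, reboot, and check functions — every name in it is hypothetical:

```python
# Sketch of a patch run with built-in health checks.
import time

def patch_and_verify(host, apply_patches, reboot, health_checks,
                     retries=5, wait=1.0):
    """Patch a host, reboot it, then poll each health check until it
    passes or we run out of retries. Returns (host, failed_check, status)."""
    apply_patches(host)
    reboot(host)
    for name, check in health_checks.items():
        for _ in range(retries):
            if check(host):
                break
            time.sleep(wait)  # give services time to come back up
        else:
            # Surface the failure before the morning emails do.
            return (host, name, "FAILED")
    return (host, None, "OK")
```

Loop it over the hosts in your inventory and patch night produces a pass/fail report instead of a surprise.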

Fewer backups

Even with de-dupe, I can’t imagine how many petabytes of backup data are made up of OS volumes and full VMs. If you’re automating deployment and config management, the scope of what you need to back up is greatly decreased (and so is your time to recover).

You’ll really just be concerned with backing up application data. Compute and the VMs your app runs on become disposable, so all you need is your playbooks and configs in version control plus some method to back up databases and storage blobs.

This rolls into DR and failover as well. In many instances, automation will enable you to do away with failover systems. Depending on your SLAs, a recovery plan could be as simple as “re-run the playbook with a different datacenter target.”

Integration tests… for infrastructure

If you truly are treating your infrastructure as code, you can write unit and integration tests for it that go past “well, the server responds to ping”. You can also deploy into test environments very easily, and run those environments more cheaply because you don’t have to maintain 1:1 infrastructure full-time.

Turns out, if you make testing easier, people actually test things and you end up with better infrastructure.

This stuff is important

I get that none of these things feel very sexy, but in practice, they are game changing. As you start automating, you’ll discover that your infrastructure doesn’t work exactly like you thought it did, you’ll figure out what different apps actually need, and you’ll take the weight of being the only person who knows something about a particular server or app off your shoulders.

Some people like keeping secrets. They think being the only person who can do something gives them job security.

Those people are idiots. Maybe they will keep their job, but that’s not a good thing. They’ll never advance, never do anything more interesting than their current responsibilities.

Automating your infrastructure — opening up the secret knowledge to the entire team and doing away with the idea of being a hero who fights constant fires — is how you free yourself up to do better things. So build the robot, let it take over your job, and keep peeling back the layers of the onion to find work that’s more meaningful and interesting than installing patches, troubleshooting IIS, and getting griped at because “the server” is down.

You don’t have to work for a web company or be in the cloud to do this stuff (although some of the cloud toolsets are better). If you have even a small number of servers, it’s worth it. You don’t need “scale”, you just need a desire for your infrastructure not to suck.

Originally posted on

How to lead without authority

I spent a lot of time being angry when I started my career. My employers and bosses frustrated me. My coworkers frustrated me. End users, customers, everyone frustrated me.

I got angry about decisions that made no sense to me. Most of my complaints fell into the theme of:

“If I was in charge, we’d never do X.”

If only they’d asked me first. If only I was the boss. If only…

I probably had a few legitimate criticisms and good ideas, but most of my frustration was based on the ignorance of youth and inexperience — thinking I knew more than I knew.

When I did want something changed or disagreed with a decision, my first course of action was to complain to my boss.

“This is stupid. You should fix it.”

I had no sense of agency and thought I couldn’t change anything because I didn’t have the power to. I could come up with ideas (Fun fact: ideas are easy.), but I needed someone else’s permission and authority to put them into motion.

I thought I needed control and a mandate to lead and effect change. More often than not I thought, “I can’t do anything about this, other than complain.”

I was wrong.

Over time I discovered three things:

  1. True leadership is not based on authority.
  2. It’s possible (even preferable in many situations) to lead sideways.
  3. The degree to which anyone has actual control over anything or anyone is comically small.

Getting people to follow you

I’ve been very lucky in my career to have worked for good managers, although I often took them for granted. Even though I looked to their authority for solutions, only a couple of them ever told me, “Do this… because I said so.”

Rather than dictating specific action, they presented a vision of what needed to be accomplished (goals), and provided me with support and breathing room to get it done. They trusted and empowered me. They made me feel important and that they genuinely cared about my well-being and personal progress.

I still thought I needed their mandate to change things, but I was able to move out of my comfort zone and build confidence in my skills and judgement.

My motivation to do well flowed from a desire to not let those managers down. I didn’t want to betray their trust or make them look bad. None of that dedication came from fear of losing my job or a respect for authority — it was because they inspired me to care. If any of them called me today and asked for help, personally or professionally, I’d be there in a heartbeat.

On the flip side, I’ve had a couple of bosses that micromanaged me (making me feel like they didn’t trust me at all) or leaned heavily on their authority to drive me and coworkers to action. I respected neither of them, although I have some sympathy for them in hindsight.

I’ve come to believe that there is no surer sign of a person’s self-perceived inadequacy — feeling in over their head or simply out of control — than when they feel the need to declare themselves “the boss”. The moment a person asserts their authority as the reason to follow them is the same moment they’ve proven they aren’t worth following.

I’ve seen that behavior in pimply-faced kids who get promotions at fast food restaurants. I’ve seen it in 60-year-old CEOs of large companies. Every time I see it, I want to pull those people aside and tell them, “Shhh… Shhh… You’re OK. It will all be OK.”

Then I’d tell them three things about what real leaders do:

  1. They provide a vision of something greater than day-to-day tasks.
  2. They spend the time and emotional effort to discover what the people they’re leading care about.
  3. They trust and empower the people they’re leading, even when the stakes are high.

The scope of what can be accomplished by people who are inspired and care about the person leading them is far greater than what is done out of fear of losing one’s job or being reprimanded.

The power of soft influence

Even working for good bosses, I remained under the impression for a long time that my power to drive change had to come from them.

Yet again, I was wrong.

Without really being conscious of it, I started copying some of the behavior I saw in those I admired. I spent more time building relationships with co-workers, learning what motivated them, and sharing a little of myself in turn. I started trusting others a little more and let go of tasks and control of conversations I would have normally tried to hold tight to my chest.

I made conscious changes as well. I started asking for other people’s opinions more. Although it doesn’t come naturally to my personality, I started asking for help.

I started sharing more of my vision for the things I wanted to build and the changes I wanted to make. I worked to build consensus, soft-selling my ideas and compromising when necessary. I started letting others take ownership of my ideas as well.

And a curious thing began to happen. A lot of the things I was frustrated about and wanted to change — started changing.

Much to my surprise, it was entirely possible to lead and effect change among peers without any authority at all.

Also to my surprise and frustration, the hardest thing for me to do was also the most effective in getting others to follow my lead: asking for help.

It’s one of those head-slapping things that you feel dumb about when you realize how well it works on you, but when someone asks for your help (and really means it), it makes you feel important, which in turn, makes you want to help.

Asking for help is a little like rolling over and showing your soft underbelly. Some of us have a hard time doing it because of ego and vulnerability, but if you can get past that and have confidence in your end goals, asking for help is straight up magic.

You’re saying to the other person, “You, specifically you, have the power, skills, knowledge, etc… to help me accomplish this thing. You are important to our success. You are important to me.” That’s hard to turn away from.

If this sounds a little like manipulation, it absolutely can be, but that’s fairly transparent when it happens. I think most people can tell when someone else is buttering them up or asking for something just to mooch.

If you ask for help and can get past yourself to believe that you really do need the other person’s help, you and all the others you’re leading will be able to build spaceships and cure diseases. You’ll be tapping into the real social network, the type of collaboration that got humans out of scraping by, living in caves, and into planting wheat and building cities.

Control is an illusion

I am a control freak and used to be much worse than I am now.

I thought I needed control for things to be the way I wanted them to be. I thought I needed control to change things, and I didn’t really change much because, once again, I thought I needed someone else to provide me with that control.

Seeking out control is a good way to make yourself unhappy, because you’re never going to get it and those that think they do have control tend to look like idiots to everyone around them (see teenage fast food manager above).

It’s hard to admit you don’t have control. It’s really scary too. That anything can happen at any time and you can’t really do anything about it is a good way to give yourself nightmares.

But it’s the truth. The most we have control of is ourselves and how we react to things, and even that’s limited.

You can get mad and yell and try to change someone’s mind about something, but you can’t control what they think. Never mind that desiring that type of control is borderline psychopathic.

You can buy insurance and build your house into a fortress, but you can’t stop the freak electrical fire from burning it down while you’re out of town.

The best you can do is manage your reactions and maybe give a nudge here and there. That seems to be true for both leadership and life in general.

You don’t need control or authority to lead, because those aren’t real things. Instead what you need is empathy, vision, and a realistic understanding of what you can and cannot influence to direct your efforts.

We seek out control because it feels like an easy fix. We just need that promotion, or to be our own boss and then everything will be better. Control gives us the authority to lead the charge and get stuff done.

That’s just not how life works. Actual leadership is hard because having empathy, vision, and a detachment from control is hard. The sooner you give up chasing after control and put your efforts into building those other muscles, the sooner you’ll actually accomplish something.

Originally posted on

How to take legacy apps to the cloud

Some apps are easy to move to the cloud. Some apps are born there. Some… aren’t. A scenario a lot of us see as we migrate workloads is that we end up with shiny automated hotness in the cloud and then a bunch of old busted apps running on-prem.

Alternatively, if you work with more traditional IT and don’t know where to start, migrating workloads can feel daunting. Designing for cloud requires a lot more abstraction than on-prem. You’re much further from the metal.

Multi-tiered apps are usually an easy fit for cloud. But what do you do with all the client-server junk? — off-the-shelf apps that run on a single server, can’t be scaled horizontally, and are supported by vendors who still freak out about virtualization.

Read the rest on

The scary leap from SysOps to DevOps

I have at least one existential crisis a week during which I stress out about how much there is to learn and the finite amounts of time and attention I have to do that learning.

Someone releases a new cloud service, a new programming language, a new OS, a new micro-nano-pico-container-magic-going-to-change-the-world-thing seemingly every five minutes. Never mind all the stuff I need to learn in everyday life — how to raise a kid, how to be a better husband, how to get a good deal buying dish soap in bulk.

Read the rest on