
Designing Ansible for flexibility and simplicity

When I was first getting started with Ansible, I struggled a lot with the layout for my projects. The Ansible best practices docs were a little bit of help (they’ve gotten way better), but like most “best practices” they solve for a very simple use case.

Google generally wasn’t much help either. Most of what I found solved for “I am running a two tier application on a couple of machines”.

It sounds like a simple problem to solve, but once you think through all the requirements, there is a lot to consider.

I need a layout that is flexible with some areas of enforced consistency, covering multiple app stacks in different clouds, accounts, and environments. I also need it to be as simple and easy-to-understand as possible, so I can hand it off to others. Finally, I need something that can be self-contained for easy portability and testing.

So I came up with my own layout, and so far, it seems to work pretty well.

└── ansible
    ├── ansible.cfg
    ├── callback_plugins
    ├── library
    ├── environments
    │   ├── biz_unit_1
    │   │   ├── dev
    │   │   │   ├── group_vars
    │   │   │   │   ├── dev.yml
    │   │   │   │   └── secret.yml
    │   │   │   └── host_vars
    │   │   │       ├── api_server.yml
    │   │   │       ├── db.yml
    │   │   │       ├── router.yml
    │   │   │       └── webserver.yml
    │   │   ├── prod
    │   │   ├── qa
    │   │   ├── shared
    │   │   ├── api_server.yml
    │   │   ├── db.yml
    │   │   ├── router.yml
    │   │   └── webserver.yml
    │   ├── biz_unit_2
    │   └── biz_unit_3
    ├── global_vars
    ├── inv
    │   ├── ec2.ini
    │   ├── ec2.py
    │   └── hosts
    └── roles
        ├── apache2
        ├── general_server_config
        ├── java_jdk
        ├── mysql
        ├── nginx
        ├── postfix
        └── varnish

* a very pared-down and sanitized version of what I work in day to day

Let’s break this down a little to explain the rationale.

The root directory (I’d recommend tossing it in /opt/ansible) and its contents are all stored in source control, including ansible.cfg, a callback_plugins directory, and a library directory. This means ALL Ansible config and customizations can be version controlled and made portable. The library directory may, in turn, reference git sub-projects for different modules, but can be included in the main repo either way.
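
The ansible.cfg at the repo root is what wires all of this together. A minimal sketch, assuming you run Ansible from the repo root (the paths are illustrative):

[defaults]
# keep everything relative to the repo so a fresh clone just works
inventory        = ./inv
library          = ./library
callback_plugins = ./callback_plugins
roles_path       = ./roles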

I’ve seen a lot of guides that say to put everything in /etc/ansible with the Ansible binary. This will make your Ansible install fragile and difficult to port/share. I’d recommend against it.

Environments contains folders for each business unit or application family (whichever is appropriate). The biz unit folders have sub-folders for their environments (dev, qa, prod). These folders are where the Ansible group_vars and host_vars live and allow for consistent references from roles and the CLI in the form of /environments/{{ bu_name }}/{{ env }}/host_vars/{{ app_role }}.yml

A secret.yml file can be added at whatever level makes sense to hold the Ansible Vault-encrypted variables for that environment or host.
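
To make that concrete, here’s an illustrative (not prescriptive) pair of files for dev; the variable names are made up. One common pattern is to keep the real values in the vaulted file and reference them indirectly from the plain one:

# environments/biz_unit_1/dev/group_vars/dev.yml
env_name: dev
db_password: "{{ vault_db_password }}"   # indirection into secret.yml

# environments/biz_unit_1/dev/group_vars/secret.yml (encrypt after editing)
vault_db_password: changeme

Encrypt it in place with ansible-vault encrypt environments/biz_unit_1/dev/group_vars/secret.yml and the plaintext never hits source control.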

The root of each biz unit’s folder contains the playbooks for each host/stack. These are not variables, but the references to roles which in turn consume the environmental variables.

Example: /environments/biz_unit_1/webserver.yml would be something like:

---
- hosts: tag_Role_webserver
  # env is passed in at runtime, e.g. -e 'env=dev'
  vars_files:
    - '{{ env }}/host_vars/webserver.yml'
    - '{{ env }}/group_vars/{{ env }}.yml'
  roles:
    - general_server_config
    - java_jdk
    - tomcat

Global_vars contains what it says on the tin. These are variables that apply to all hosts and/or environments.

Inv – Dynamic inventory scripts for cloud providers live here, as does a static inventory file with a couple of localhost-specific settings in it (mostly done as a workaround to some buggy Ansible variable parsing).
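
That static file is tiny; something like this, though exactly which localhost settings you need depends on the quirk you’re working around:

# inv/hosts
localhost ansible_connection=local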

You will find docs that suggest that you put host vars into an inventory file. This doesn’t make sense unless you are dealing with a static, traditional environment, and even then it can get ugly.

Doing as much as you can with the dynamic inventories will minimize the number of vars you need to keep track of and prevent situations where you are working against old data (that’s the dynamic part).
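
Tying the pieces together, a typical run from the repo root looks roughly like this (the env extra-var is this layout’s convention, not anything Ansible requires, and the vault password path is just an example):

ansible-playbook -i inv/ environments/biz_unit_1/webserver.yml \
  -e 'env=dev' --vault-password-file ~/.vault_pass.txt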

Roles is, of course, where roles live. I generally use ansible-galaxy init rolename to create roles since it builds out the directory scaffolding and boilerplate files. This does create some extra cruft you might not want (similar to running rails generate), but I like the consistency it provides.
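
For reference, ansible-galaxy init rolename lays down scaffolding roughly like this (the exact contents vary a bit between Ansible versions):

rolename
├── README.md
├── defaults
│   └── main.yml
├── files
├── handlers
│   └── main.yml
├── meta
│   └── main.yml
├── tasks
│   └── main.yml
├── templates
├── tests
│   ├── inventory
│   └── test.yml
└── vars
    └── main.yml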

Outside of defaults, no variables should exist in roles. These are generic plays that should be able to run against any of your environments by referencing variables in the environmental vars. Putting environment-specific variables in your roles is a good way to make things break and creates a mess if you’d like to open source your role later.
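
As a sketch (the variable name is hypothetical): the role ships a sane default, and an environment file overrides it where needed. Role defaults sit at the bottom of Ansible’s variable precedence, so anything in the environmental vars wins:

# roles/nginx/defaults/main.yml
nginx_worker_processes: 2

# environments/biz_unit_1/prod/host_vars/webserver.yml
nginx_worker_processes: 8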

Triggering plays

Setting up Ansible in this way makes it easy to share the same Ansible repo across an entire enterprise and trigger parameterized plays by passing different values to Jenkins (or Ansible Tower, or Rundeck, or whatever you use to schedule/execute your Ansible plays).

For example, you could set up an nginx deployment in any of your environments by adding a vars file with any environment-specific customization and feeding a Jenkins job the correct parameters for the destination environment.

It might look like this:

  1. User (could basically be a trained monkey) triggers the Jenkins job for “Build Infra” with params like “Sales”, “Prod”, “Router”. These params could be hard-coded if you wanted to make a true “one-click” deployment/re-deployment. Going further, you could also auto-trigger this job with a git webhook. (Sorry, monkey, no job for you.)
  2. Jenkins checks out the latest commit of the Ansible repo and executes a play using the parameters it was fed (the command itself is sketched after this list).
  3. The Ansible play calls AWS APIs and SSH to spin up an instance, install and configure nginx, save the instance as an AMI, create a load balancer, create an autoscaling group with the customized AMI, and update DNS to direct traffic to the new load balancer. (If the stack already exists, it would be updated with the new settings.)
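
The command behind step 2 might look roughly like this; the BU_NAME/ENV/APP_ROLE parameter names and the vault password path are illustrative, not part of anything above:

ansible-playbook -i inv/ \
  environments/${BU_NAME}/${APP_ROLE}.yml \
  -e "env=${ENV}" \
  --vault-password-file /etc/ansible/.vault_pass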

This is for an immutable workflow. A more traditional config-management scenario would be even simpler.

Other considerations

Another design I considered was using git submodules to slice up the different environments. Ultimately, this felt a little clunky and added unnecessary complexity for my use-cases. If the teams using Ansible can’t play well together or you don’t want all your vars and roles in one repo, this path might make sense.

Related to the above, this layout does require testing discipline for the core roles that are shared across environments. Borking up a shared “launch_EC2” role and checking it into master could annoy your teammates when they’re trying to figure out why their plays are failing.

I continue to look for ways to make the layout simpler, with fewer files and directories. At the moment, this is the simplest setup I could figure out that met all requirements. I’m definitely open to feedback though.


Dirty secrets of DevOps

I’ve read dozens of DevOps success stories, tales of bold IT leaders transforming their business and steering big corporate ships into the future. It’s hard to avoid all these stories about “DevOps transformation” and how some guy named Jim pulled the IT department out of the stone age. The trades, blogs, and conference presentations are filled with them.

No one talks about the failures, though, and very few even write about their struggles implementing DevOps. It’s all sunshine and rainbows, which sucks, because that isn’t real.

Success teaches us little other than survivorship bias and how to feel bad about what we haven’t achieved. Failure and hardship are where we learn. That’s where the good, meaty stuff is.

So here are a few dirty secrets of DevOps.

Most companies that say they are “doing DevOps” aren’t.

Because of all the success stories (real or imagined) that have wiggled their way into the minds of CXOs, “we should be doing DevOps” became an empty corporate directive that inspired thousands of executives to start calling their IT infrastructure groups “DevOps” instead of “Infrastructure” or “Systems” or “Operations”.

Unfortunately, this seems to often play out as “We’ve renamed the group, so we’re going to be letting most of the team go, because we’re DevOpsing all the things now and being lean and mean. Also, the developers are still a separate group and they’ll be throwing more stuff over the wall to you.”

So you end up with a few overworked, traditional Ops folks trying to keep the wheels on the bus with zero changes to the way work is managed or how the IT group functions day-to-day. Their manager is shouting down the hall about automation while the poor Ops team is trying to pivot a SAN-installing, server-racking skill set into something that looks like a caveman version of coding.

The only metric that improves in this scenario is “personnel cost”, and only temporarily because burnout and churn spike, driving up staffing costs a few quarters down the road. But it looks good long enough for someone to say “See, we did it!” and feel validated.

Even if you get the “IT folks” on board, getting plugged into the business so DevOps practices can benefit other groups and the overall bottom line comes with its own challenges.

Fixing this issue requires a lot of skill managing upward and sideways. Oftentimes it’s not worth trying to change things, and moving on is a better option. Your mileage may vary.

Implementing DevOps is Really Fucking Hard™

DevOps is all about people and process: getting everyone working together to do fewer dumb things, and to do smart things faster.

Historically, getting people to work together and not be jerks to one another has been a bit of a challenge. Humans achieve awesome things when we collaborate (like spaceships and lasers), but we usually suck at working together. Because of that, I’m always impressed when I come across an excellent people-whisperer, someone who can motivate different groups to work towards a common goal without burning down each other’s village.

Problem is, there are like five of those people on the face of the planet and chances are, they don’t work for your company. You might have lucked out and have one or two folks who are kind of OK at people-wrangling and peace-keeping, but most businesses (especially the bigger ones) seem about three seconds and a passive-aggressive sticky note in the office kitchen away from an all-out bloodbath.

Assuming you can get people working together, you’re now faced with the challenge of implementing process. You probably have one person on your team who loves process. Everyone else hates process and that person.

You’re never finished

There’s no such thing as “we achieved DevOps”. It’s a practice, like healthy living or Buddhism. There has to be a champion (or several) on your team pushing every day to make things better.

When someone talks about DevOps success, what they’re really talking about is achieving flow: having a functional work process in place that is continually measured and improved upon. It’s an ouroboros value pipeline.

That’s not something you can arrive at and stop tending to. Without constant care and feeding, the processes you worked so hard to implement will start to die off.

No champion, no DevOps.

All that being said…

None of this means that DevOps isn’t worth doing, just that you need to be realistic about the challenges you’re going to face. I’ve leaned on hyperbole pretty hard to swing the pendulum away from sunshine and rainbows, because reality is somewhere in the middle.

Getting Dev and Ops (and Security), groups who have traditionally walked down the hall waving their middle fingers at each other, to 1) work together and 2) implement and adhere to process will likely be one of the most difficult things you’ve attempted in your career. You have to put a lot of work into making the right things the easy things, reducing friction wherever you can. Setting mandates or badgering doesn’t work; you have to sell the value.

Getting top-down buy-in (and understanding!) of true DevOps/Agile practices is hard. It requires reorganizing groups and a sustained sales pitch to all involved. The need for this trails off a little once the business and IT staff start seeing value, but expect it to be a long, sustained effort. I’m always a bit dubious when I hear something like “we transformed IT in three months” – either that group really has their shit together, someone is lying, or we’re not using the same dictionary.

For practitioners and evangelists, these are the things we need to start talking about more. There’s a slick consultant vibe woven throughout discussions about DevOps that glosses over the practical and prescriptive. Too many of these conversations focus on high-level what-to-dos and not enough on concrete how-tos and context, especially when it comes to people-centric issues.