Because spinning up VMs in the cloud is so easy, it’s equally easy for your monthly bill to scale up as well. What may have started as a charge of a few hundred dollars a month for a few VMs can quickly ratchet up into tens of thousands of dollars.
In larger businesses, I’ve seen this get so bad that management freaks out and moves everything back on-premises, even when the growth was legitimate and not the result of waste or mismanagement.
There is also a tendency for devs and ops folks to treat the financial aspects of their infrastructure and applications as “not my job”, which is an idea that makes my brain hurt from all the angry it creates.
Three tables flipped, and I still can’t write about the “not my job” angle. It will have to wait for another blog post. For now, let’s just assume that you, dear reader, are a minimally-functional adult who understands the concept of owning the things you build.
As you’re building your infrastructure, it’s always a good idea to bake in instrumentation to understand utilization. Most people view this as a must-have for troubleshooting and monitoring, but there’s also a cost angle.
Assuming you already have infrastructure, the first step in lowering your AWS EC2 cost is identifying what you actually need.
You can either roll your own solution for right-sizing your infrastructure, using the performance and inventory data you should already be collecting with AWS (or your cloud provider of choice), or you can use a service like CloudHealth Tech or CloudCheckr that is purpose-built for cost optimization.
The SaaS solutions do a lot of stuff related to cost optimization, but one of the main things they provide is utilization reporting and recommendations.
Maybe you thought you needed 50 m4.2xlarges for your app. Maybe what you really need are 25 t2.mediums (a $14k a month difference). I have seen gaps that large revealed when running utilization reports for the first time.
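To make that arithmetic concrete, here is a rough sketch of the comparison. The hourly prices are assumptions (ballpark Linux on-demand rates, not live AWS pricing), so plug in the current numbers for your region before trusting the result.

```python
# Rough monthly cost comparison between two fleet sizes.
# The hourly prices below are illustrative assumptions, not live AWS pricing.
HOURS_PER_MONTH = 730  # average number of hours in a month

ASSUMED_HOURLY_PRICE = {
    "m4.2xlarge": 0.40,
    "t2.medium": 0.0464,
}


def monthly_cost(instance_type: str, count: int) -> float:
    """On-demand cost of running `count` instances for a full month."""
    return ASSUMED_HOURLY_PRICE[instance_type] * count * HOURS_PER_MONTH


before = monthly_cost("m4.2xlarge", 50)
after = monthly_cost("t2.medium", 25)
print(f"before: ${before:,.0f}/month")
print(f"after:  ${after:,.0f}/month")
print(f"saved:  ${before - after:,.0f}/month")
```

With those assumed prices the gap lands a bit under $14k a month, which is the kind of number a first utilization report can surface.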
You may be thinking “someone who understands their app would never let that happen”, and that is true most of the time. The problem is:
- Very few people actually understand the apps they run.
- Very few people consider the cost of running those apps as being something they own.
- Related to the first point, many people don’t have the luxury of running apps that they have built themselves. They are handed someone else’s shitty app and told to keep the lights on.
Getting a good grip on actual utilization tends to also help with conversations like “maybe we should spend the time to make our app stateless so we can auto-scale servers” or “maybe it’s time to convert this app to PaaS.”
Note: If you do end up switching a bunch of servers to t2-tier, make sure you monitor CPU credits. This tier has a CPU-usage quota and if you go over that, your server is throttled until the credits rebuild. I’ve seen people flip over to t2s thinking “it’s got the same specs as my m4/5” and then wonder why their high-CPU-usage app is crawling a few hours later.
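If you want a feel for how fast that throttling can sneak up on you, here is a back-of-the-envelope sketch. The defaults assume a t2.medium (2 vCPUs, 24 credits earned per hour, a maximum balance of 576 credits) and the function name is made up for illustration. It ignores launch credits and T2 Unlimited, so treat it as an estimate and check the numbers for your instance type.

```python
def hours_until_throttled(
    avg_cpu_utilization,            # 0.0-1.0, averaged across all vCPUs
    vcpus=2,                        # assumption: t2.medium
    credits_earned_per_hour=24.0,   # assumption: t2.medium
    starting_balance=576.0,         # assumption: a full t2.medium credit balance
):
    """Estimate hours until the CPU credit balance hits zero.

    One CPU credit is one vCPU running at 100% for one minute.
    Returns None when credits are earned at least as fast as they are spent.
    """
    credits_spent_per_hour = avg_cpu_utilization * vcpus * 60
    net_drain = credits_spent_per_hour - credits_earned_per_hour
    if net_drain <= 0:
        return None  # at or below baseline, the balance never runs out
    return starting_balance / net_drain


# An app that sits at 80% CPU on both vCPUs
print(f"{hours_until_throttled(0.80):.1f} hours until throttled")
```

Under those assumptions, an app holding 80% CPU burns through a full credit balance in roughly 8 hours, which lines up with the “crawling a few hours later” story. In real life you would watch the CPUCreditBalance metric in CloudWatch instead of guessing.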
Buy a baseline
An argument that is normally used to justify hybrid cloud is “we have a baseline on-prem, so we’ll just use the public cloud for burst elasticity”, which is reasonable. However, you can do the same thing in-cloud.
Once you get a grip on your baseline needs, reserved instances (RIs) become an option. It is entirely realistic to cut costs in half with RIs.
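One way to turn “get a grip on your baseline” into a number is a simple break-even rule: the Nth instance is worth reserving if it runs for a larger share of hours than the RI costs relative to on-demand. This is a minimal sketch of that rule, not a full RI planner; the function name and the sample data are hypothetical.

```python
def reserved_count(hourly_instance_counts, ri_to_on_demand_ratio):
    """Number of instances worth covering with RIs.

    hourly_instance_counts: running-instance count for each observed hour.
    ri_to_on_demand_ratio: effective RI price / on-demand price (0.5 = 50% off).
    """
    if not 0 < ri_to_on_demand_ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    if not hourly_instance_counts:
        return 0
    total_hours = len(hourly_instance_counts)
    n = 0
    while True:
        # Hours in which an (n+1)th instance was actually running
        hours_needed = sum(1 for count in hourly_instance_counts if count > n)
        if hours_needed / total_hours < ri_to_on_demand_ratio:
            return n
        n += 1


# Hypothetical day: 10 instances overnight, 25 during an 8-hour peak
sample_day = [10] * 16 + [25] * 8
print(reserved_count(sample_day, 0.5))
```

In this made-up example the 10 always-on instances clear the bar at a 50% discount, while the 15 peak-only instances (busy a third of the time) do not and are better left on-demand or spot.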
RIs take away some flexibility in return for lower instance prices and are similar to financial instruments you see elsewhere in the world, like cell phone plans.
“You can pay month-to-month with no contract, or you can save X% by signing up for a minimum term.”
Unlike a cell phone plan, there is an RI market where you can resell your RIs if you need to get out of them, and there are also lots of options around converting them to different instance sizes. Converting RIs to different instance families is also possible, but a bit trickier.
Note: There are limits around selling your RIs and the marketplace doesn’t have a high volume in many regions, so selling opportunities may be limited. Don’t purchase RIs on the assumption that you’ll definitely be able to resell them.
RI discounts are primarily tied to term (the length of the contract) and percentage paid up front (none, partial, or full). Three-year full-upfront RIs are going to be the cheapest option.
A trick I’ve learned in the past year is that many resellers like SHI and CDW offer financing for AWS RIs. So instead of paying for 1 or 3 years fully upfront, you can amortize that “upfront” cost over a term with your reseller. Obviously, you’re paying interest as part of the financing, but if the decision was between a 10%-off no-upfront RI purchase and a 60%-off full-upfront purchase, paying a few points in interest is a no-brainer.
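Here is a quick sketch of why the financed option wins. The 10% and 60% discounts are the hypothetical ones from the paragraph above; the $10k-a-month baseline, the 6% interest rate, and the assumption that the reseller uses a standard amortized loan with equal monthly payments are all mine, so swap in your real quote.

```python
def financed_total(principal, annual_rate, months):
    """Total repaid on a standard amortized loan with equal monthly payments."""
    if annual_rate == 0:
        return principal
    monthly_rate = annual_rate / 12
    payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** -months)
    return payment * months


MONTHS = 36
on_demand_total = 10_000 * MONTHS                 # assumed $10k/month baseline
no_upfront_total = on_demand_total * (1 - 0.10)   # 10%-off no-upfront RIs
full_upfront_price = on_demand_total * (1 - 0.60) # 60%-off full-upfront RIs
financed = financed_total(full_upfront_price, annual_rate=0.06, months=MONTHS)

print(f"no-upfront RIs:        ${no_upfront_total:,.0f}")
print(f"financed full-upfront: ${financed:,.0f}")
```

Even after interest, the financed full-upfront purchase comes out at less than half the cost of the no-upfront one in this example.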
Fun fact: RIs can also be purchased for RDS instances and this is almost always an easy win because of how “permanent” your database servers likely are in comparison to the rest of your infrastructure.
Spot instances are your friend
Assuming you’ve purchased a baseline of RIs, any additional instances above that will be either at the “on-demand” or “spot instance” price.
With spot instances, you are effectively bidding on AWS’ excess capacity, so pricing is highly dynamic based on demand. Because ‘spots’ are excess, they will be pulled back into AWS’ inventory when that capacity is needed by other users paying the on-demand or RI rate.
Spot instances are awesome when they’re feasible for what you’re doing (test environments, fault-tolerant apps, EMR processing like Spark). Where you might get an RI for 50% off, I’ve seen spot instance discounts as high as 80-90%.
Some folks have done an amazing job of analyzing historical spot prices for the types of instances they need and have purchasing algorithms that help them both track price trends and buy & run their instances when it’s cheapest to do so.
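A toy version of that kind of analysis might look like the sketch below. It takes (hour of day, price) samples and ranks the hours by average price. The sample data is made up, and the function name is hypothetical; real history is available from the EC2 DescribeSpotPriceHistory API (`describe_spot_price_history` in boto3), which you would need to reshape into this form.

```python
from collections import defaultdict
from statistics import mean


def cheapest_hours(price_samples, top_n=3):
    """Return the `top_n` hours of day with the lowest average spot price.

    price_samples: iterable of (hour_of_day, price) tuples.
    """
    prices_by_hour = defaultdict(list)
    for hour, price in price_samples:
        prices_by_hour[hour].append(price)
    averages = {hour: mean(prices) for hour, prices in prices_by_hour.items()}
    return sorted(averages, key=averages.get)[:top_n]


# Made-up samples for a single instance type
samples = [(2, 0.10), (2, 0.12), (14, 0.30), (14, 0.28), (20, 0.15)]
print(cheapest_hours(samples, top_n=2))
```

Real purchasing algorithms also look at trends by availability zone and instance type, but the core idea is the same: find when the price is reliably low and schedule interruptible work there.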
Note: A word of caution here – spot purchasing is something you want to keep an eye on. Because it’s demand-driven, spot pricing can sometimes (though not often) spike above the on-demand price. So you’ll want to make sure you’ve accounted for that in any automation you build.
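The simplest guard is a price ceiling in whatever automation does the buying. This is a minimal sketch with a hypothetical function name and an assumed threshold (only use spot when it is at least 10% cheaper than on-demand); tune the margin to how much interruption risk you are willing to take for the savings.

```python
def pick_purchase_option(spot_price, on_demand_price, max_spot_ratio=0.9):
    """Choose 'spot' only when it is meaningfully cheaper than on-demand.

    max_spot_ratio is an assumed safety margin: spot must cost no more than
    this fraction of the on-demand price to be worth the interruption risk.
    """
    if spot_price <= on_demand_price * max_spot_ratio:
        return "spot"
    return "on-demand"


print(pick_purchase_option(0.05, 0.40))  # normal day, spot is far cheaper
print(pick_purchase_option(0.45, 0.40))  # spike above the on-demand price
```

You can get a similar effect by setting a maximum price on the spot request itself, so a spike leaves you without spot capacity instead of with a surprise bill.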
You’re not stuck
A common assumption that people make with their EC2 environments is that the cost “is-what-it-is” until the apps running there can be sunset or migrated to PaaS. Fortunately, that is rarely true. By leveraging some financial tools in your config, it’s highly likely you’ll be able to bring the cost of your EC2 environment down.
If you’re working in enterprise, this stuff will make the biz folks love you, and they’re likely the ones who control your salary.
If you’re running a startup, this stuff can make the difference between you being able to hire more people, or even stay in business at all.
None of it’s hard; it just needs to be accounted for.