Cloud Auto-Scaling: Simple Concept, Hard to Get Right

And expensive if you get it wrong.

Welcome to AutoScalr’s blog. We are going to be diving deep into this one concept in this blog series. If this headline grabbed your attention, please follow along and join in the conversation. You probably have some battle scars and knowledge to contribute.

AutoScalr was founded on the belief that Auto-Scaling in the cloud should not be so “hard to get right.” We believe innovative machine learning technologies can be brought to bear on the problem to make it easier to “get it right,” especially for organizations that simply cannot afford a dedicated team working on this problem for each and every cloud app they deploy.

What makes Auto-Scaling in the cloud so difficult to get right?

Because, in the end, it comes down to predicting the future, which for all but the most trivial systems is hard to do with any degree of accuracy. What exactly needs to be predicted? I would define it as:

The minimal amount of computational resources required to process the load of an application for the next 5 to 10 minutes with adequate performance.

The 5 to 10 minutes part is driven by the fact that there is latency from the time you request more capacity, e.g. a new EC2 instance, until that capacity is fully spun up and actually contributing to your application. If there were no latency, Auto-Scaling would be easy: you would simply add resources the moment you needed them. That brings up tip #1: pay attention to your instance start-up time. Anything you can do to shorten it will make your Auto-Scaling strategy more effective and efficient, because it shrinks the prediction timeframe, and a shorter timeframe is easier to predict accurately.
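
To make tip #1 actionable, here is a minimal sketch, using boto3, of one way to measure the launch-to-ready portion of that latency. The region, AMI ID, and instance type are placeholders, and application-level readiness (bootstrapping, warming caches, passing load balancer health checks) typically adds more time on top of what this measures.

```python
import time
import boto3

# Minimal sketch: measure the time from RunInstances to the instance passing
# EC2 status checks. The region, AMI ID, and instance type are placeholders.
ec2 = boto3.client("ec2", region_name="us-east-1")

start = time.time()
resp = ec2.run_instances(
    ImageId="ami-xxxxxxxx",    # placeholder AMI
    InstanceType="m5.large",   # placeholder instance type
    MinCount=1,
    MaxCount=1,
)
instance_id = resp["Instances"][0]["InstanceId"]

# Wait until the instance passes both system and instance status checks.
ec2.get_waiter("instance_status_ok").wait(InstanceIds=[instance_id])
print(f"Launch-to-status-OK took {time.time() - start:.0f} seconds")

# Note: this is a lower bound. What really matters for Auto-Scaling is how
# long until the instance is serving traffic, which depends on your bootstrap
# scripts and health checks.
```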

Let’s consider what a generic optimal Auto-Scaling system would have to do from a systems point of view. The application load is the primary driver of this type of system, so the first task would be to predict the load for the next 5 to 10 minutes. That alone can be challenging for many applications. The next step would be to map the expected load to the actual computational resources required. This is where it really starts to get tough: even with extensive performance testing or historical heuristics, it typically is not practical to build models with enough predictive ability to be effective. And do not forget the “with adequate performance” requirement, which is often hard to define exactly and even harder to predict from computational resources alone. Taken altogether, it should come as no surprise that Auto-Scaling is hard to get exactly right.
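
To make that concrete, here is a hypothetical sketch of the control loop such a system would run. The function names and the 10-minute lookahead are illustrative; the hard, application-specific parts (load prediction, the load-to-capacity model, and the notion of current capacity) are deliberately left as placeholders.

```python
import time

# Hypothetical sketch of the generic Auto-Scaling control loop described above.
# predict_load, capacity_for_load, current_capacity, and set_capacity are
# placeholders for the hard, application-specific parts.

LOOKAHEAD_MINUTES = 10  # roughly the spin-up latency you need to cover


def predict_load(horizon_minutes: int) -> float:
    """Forecast application load over the next few minutes (hard problem #1)."""
    raise NotImplementedError


def capacity_for_load(load: float) -> int:
    """Map expected load to instances needed for adequate performance (hard problem #2)."""
    raise NotImplementedError


def current_capacity() -> int:
    """Read current capacity, e.g. the desired capacity of an Auto Scaling group."""
    raise NotImplementedError


def set_capacity(instances: int) -> None:
    """Request or release capacity, e.g. update the group's desired capacity."""
    raise NotImplementedError


def autoscaling_loop() -> None:
    while True:
        expected_load = predict_load(LOOKAHEAD_MINUTES)
        needed = capacity_for_load(expected_load)
        if needed != current_capacity():
            set_capacity(needed)
        time.sleep(60)  # re-evaluate every minute
```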

The traditional approach to handling the predictive complexity of Auto-Scaling has been to pick the application’s most limiting resource, typically CPU, use it as an aggregate representation of system load, and scale solely on it: when CPU goes above a threshold, add instances; when it drops below a threshold, remove instances. The challenge is finding the right balance for those thresholds, so that the system scales up early enough that it is never caught under-provisioned, but does not keep too much spare capacity around, because spare capacity costs money. At AutoScalr, we believe technology has advanced enough that we can begin to deliver alternative approaches to Auto-Scaling that are more effective and easier to use.
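
In plain Python, not tied to any particular cloud API, the classic rule looks something like the sketch below. The thresholds and instance bounds are illustrative values, not recommendations, and tuning them is exactly where the difficulty lies.

```python
# Minimal sketch of the classic threshold-based CPU scaling rule.
# The thresholds and bounds are illustrative, not recommendations.
SCALE_UP_THRESHOLD = 70.0    # % average CPU above which we add an instance
SCALE_DOWN_THRESHOLD = 30.0  # % average CPU below which we remove an instance


def desired_change(avg_cpu_percent: float, current_instances: int,
                   min_instances: int = 2, max_instances: int = 20) -> int:
    """Return the instance-count change the threshold rule would make.

    A real policy would also enforce a cooldown between actions so that
    newly launched instances have time to affect the CPU metric.
    """
    if avg_cpu_percent > SCALE_UP_THRESHOLD and current_instances < max_instances:
        return +1
    if avg_cpu_percent < SCALE_DOWN_THRESHOLD and current_instances > min_instances:
        return -1
    return 0
```

The hidden trade-off lives in those two numbers: a lower scale-up threshold buys safety margin at the price of idle capacity, while a higher scale-down threshold gives money back at the price of risk.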

Cost is a key metric we will come back to over and over. If cost were not an issue, we would simply over-provision to the maximum expected load and not worry about it. Effective Auto-Scaling is about finding the balance between never being under-provisioned and minimizing cost. Given two Auto-Scaling systems that both avoid the under-provisioned state, the one that results in lower cost is the “better” of the two. The trade-off between cost and capacity will vary some between applications, but the general premise remains the same.
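
A quick back-of-the-envelope calculation shows why that balance matters. The hourly price and fleet sizes below are purely illustrative numbers, not real AWS pricing.

```python
# Illustrative arithmetic only: price and fleet sizes are made-up examples.
HOURLY_PRICE = 0.10      # assumed on-demand price per instance-hour
HOURS_PER_MONTH = 730

peak_instances = 20      # capacity needed at peak load
average_instances = 8    # average capacity actually needed over the month

over_provisioned_cost = peak_instances * HOURLY_PRICE * HOURS_PER_MONTH
well_scaled_cost = average_instances * HOURLY_PRICE * HOURS_PER_MONTH

print(f"Provision for peak: ${over_provisioned_cost:,.0f}/month")
print(f"Track actual load:  ${well_scaled_cost:,.0f}/month")
print(f"Potential savings:  ${over_provisioned_cost - well_scaled_cost:,.0f}/month")
```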

The cost metric drives several other dimensions of consideration for Auto-Scaling. The first is instance type selection: not only which instance family, but also which individual instance type is used can change the cost profile of an application significantly. The second is the payment model, and in particular the spot instance options in AWS. The cost savings potential of spot usage is tremendous, upwards of 80%, but spot instances come with the nasty caveat of being taken away from you if the market price goes above your bid. That makes spot very difficult to use in most systems that have availability requirements without specific application design changes. There are strategies to combat and overcome this caveat, and we will definitely be diving deeper into this topic during this blog series.
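
As a rough illustration of the upside, here is the same kind of arithmetic for a fleet that keeps a small on-demand baseline and runs the rest on spot. The price, fleet split, and the 80% discount (the savings ceiling mentioned above) are assumptions for the example, not quotes.

```python
# Illustrative only: the discount, fleet size, and baseline split are assumptions.
ON_DEMAND_PRICE = 0.10   # assumed on-demand price per instance-hour
SPOT_DISCOUNT = 0.80     # spot often trades at a deep discount, up to roughly 80%
HOURS_PER_MONTH = 730

fleet_size = 10
on_demand_baseline = 3   # baseline capacity that cannot be reclaimed
spot_instances = fleet_size - on_demand_baseline

all_on_demand = fleet_size * ON_DEMAND_PRICE * HOURS_PER_MONTH
mixed_fleet = (on_demand_baseline * ON_DEMAND_PRICE
               + spot_instances * ON_DEMAND_PRICE * (1 - SPOT_DISCOUNT)) * HOURS_PER_MONTH

print(f"All on-demand: ${all_on_demand:,.0f}/month")
print(f"Mixed fleet:   ${mixed_fleet:,.0f}/month")
```

Diversifying the spot portion across instance types and Availability Zones further reduces the chance of losing it all at once, which is the kind of strategy this series will dig into.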

Hopefully these topics excite you as much as they do me. If so, follow us and let us know your thoughts and experiences on these topics, including suggestions for Auto-Scaling topics you would like covered in future posts.
