In our last post, Cloud Auto-Scaling: Simple Concept, Hard to Get Right, we dove into why Auto-Scaling in the cloud is so difficult to get right. Much of that difficulty comes from the fact that it amounts to predicting the future of an unpredictable system. The other main conclusion of that post was that cost is one of the key metrics against which to measure any Auto-Scaling system. This is where AWS Spot Instances enter the picture as an important player.
Spot instances have exactly the same computational capability as their On-Demand counterparts, but are typically 80-90% cheaper to operate. They come with a caveat: Amazon reserves the right to take them away from you if it needs the capacity to fulfill a higher-priority request. When you request a Spot instance you have to specify an instance type, an availability zone, and a Spot bid. This is analogous to your “Best Offer” bid for an eBay item you covet: it is the most you are willing to pay per hour for that instance. If the market price stays below your Spot bid, you pay only the market price. The difference between a Spot bid and an eBay bid is that everyone who has bid over the current market price gets a Spot instance, not just the highest bidder. But if the market price rises above your Spot bid while your instance is running, you lose the instance and everything it was working on. This works fine for easy-to-interrupt workloads, such as batch processing or map-reduce jobs, but it makes Spot instances very hard to use for systems with high availability requirements, which is why the majority of those workloads on AWS still run on On-Demand or Reserved instances.
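To make the bidding rule concrete, here is a minimal sketch in Python. It assumes a simplified model in which the market price changes once per hour and billing is per whole hour. The function name `run_spot_instance` and the prices are hypothetical, chosen only for illustration, and are not part of any AWS API.

```python
from typing import Optional, Sequence, Tuple


def run_spot_instance(
    bid: float, hourly_market_prices: Sequence[float]
) -> Tuple[float, Optional[int]]:
    """Simulate one Spot instance against a series of hourly market prices.

    Returns (total_cost, lost_at_hour). lost_at_hour is None if the
    instance survived every hour in the series.
    """
    total_cost = 0.0
    for hour, market_price in enumerate(hourly_market_prices):
        if market_price > bid:
            # The market price rose above the bid: the instance is reclaimed.
            return total_cost, hour
        # Otherwise you pay the market price for the hour, not your bid.
        total_cost += market_price
    return total_cost, None


# Hypothetical prices in dollars per hour, for illustration only.
cost, lost_at = run_spot_instance(
    bid=0.10, hourly_market_prices=[0.02, 0.03, 0.12, 0.02]
)
print(f"paid ${cost:.2f}, lost at hour index {lost_at}")
```

In this example the instance runs for two hours at the market price and is lost in the third hour, when the price climbs above the bid.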
But what defines this market price and what drives its fluctuation?
Generally speaking, it is supply and demand for excess compute capacity in AWS data centers. When you run a c4.large instance, for example, your virtual machine is running on a physical machine somewhere in one of the AWS data centers for the region and availability zone you selected. AWS has a lot of compute capacity, but it is not infinite. Only a subset of the physical hardware in a single availability zone is capable of running a particular instance type, so there is, for example, a finite number of c4.large instances that AWS can run simultaneously in us-east-1a. AWS does a good job of staying ahead of the demand curve in the vast majority of cases, so there is usually considerable extra c4.large capacity lying around that is not being used for On-Demand or Reserved Instances. That is what you are bidding on with Spot instances. You, and all the other AWS clients, are making a bid for compute capacity that is not currently being used.
Which brings us to the definition of a Spot market. A Spot market is defined as the combination of an instance type and an availability zone. AWS may have more excess compute capacity for a c4.large in us-east-1b than in us-east-1a, so it makes sense that supply and demand in us-east-1a might drive the price higher there. There are three primary drivers of what the Spot price will be in a particular Spot market at a particular point in time:
Spot Market Price Drivers
- the total amount of compute capacity in the availability zone capable of delivering that instance type
- the amount of current On-Demand and Reserved usage of that instance type in that availability zone
- the bidding activity for Spot instances of that instance type in that availability zone
For example, if a large AWS customer starts a large analytics job that consumes a huge number of c4.large On-Demand instances in us-east-1a, AWS may need to reclaim some Spot instances currently running in that availability zone in order to fulfill that request. This would shift the supply curve and move the price up given the same demand level. AWS reclaims Spot instances in priority order based upon the Spot bid price of each instance: the lowest bid is reclaimed first, followed by successively higher bids. Once it has reclaimed enough to meet the demand, the bid price that it climbed up to while reclaiming instances becomes the new market price. You will not be able to start an instance unless you bid above this price, effectively booting someone else off their instance so that AWS can give it to you.
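The reclaim order described above can be sketched in a few lines of Python. This is a toy model of the process as described in this post, not AWS's actual implementation. The function name `reclaim_capacity` and the bid values are hypothetical.

```python
from typing import List, Optional, Tuple


def reclaim_capacity(
    running_bids: List[float], needed: int
) -> Tuple[List[float], List[float], Optional[float]]:
    """Reclaim `needed` Spot instances, lowest bid first.

    Returns (reclaimed_bids, surviving_bids, new_market_price). The new
    market price is the highest bid that had to be reclaimed, or None if
    nothing was reclaimed.
    """
    ordered = sorted(running_bids)       # lowest bids are reclaimed first
    count = max(0, min(needed, len(ordered)))
    reclaimed = ordered[:count]
    survivors = ordered[count:]
    new_market_price = reclaimed[-1] if reclaimed else None
    return reclaimed, survivors, new_market_price


# Hypothetical bids in dollars per hour for four running Spot instances.
reclaimed, survivors, price = reclaim_capacity([0.05, 0.03, 0.10, 0.04], needed=2)
print(reclaimed, survivors, price)
```

Here the two lowest bids (0.03 and 0.04) are reclaimed, the instances bid at 0.05 and 0.10 keep running, and 0.04 becomes the price a new bid has to beat.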
In a similar fashion, given exactly the same supply, i.e. the same amount of excess capacity beyond what is currently being used for On-Demand and Reserved instances, the Spot market price will also increase if demand for a particular instance type in a particular availability zone rises because more people are making more and higher bids for that resource. Both drivers can also work in the opposite direction and push prices down, e.g. if more excess capacity becomes available or if fewer and/or lower bids are being placed for instances of that type.
Each Spot market operates completely independently of the others. So even though there might be a shortage of c4.larges in us-east-1a that drives the price through the roof, c4.larges in us-east-1c might be 3 to 4 times cheaper, or a c4.xlarge in us-east-1a might be cheaper. As of this writing there are approximately 200 different instance types available in AWS. So in the region us-east-1, which has 5 availability zones, there are roughly 1,000 different Spot markets for you to choose from in us-east-1 alone! Globally AWS operates 42 availability zones in 16 regions, so the total number of Spot markets in existence today is close to 8,400! That’s more than the stock listings of the New York Stock Exchange and the NASDAQ combined!
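The market counts above are simple multiplication, and picking the cheapest market is a minimum over (instance type, availability zone) pairs. The sketch below assumes every instance type is offered in every availability zone, which makes the counts an upper bound. The function names and the prices are hypothetical and for illustration only.

```python
from itertools import product
from typing import Dict, Iterable, List, Tuple

Market = Tuple[str, str]  # (instance type, availability zone)


def spot_markets(instance_types: Iterable[str], zones: Iterable[str]) -> List[Market]:
    """Enumerate every (instance type, availability zone) combination."""
    return list(product(instance_types, zones))


def cheapest_market(prices: Dict[Market, float]) -> Market:
    """Return the market with the lowest current price."""
    return min(prices, key=prices.get)


# Approximate figures from the text: ~200 instance types, 5 zones in
# us-east-1, 42 zones globally.
types = [f"type-{i}" for i in range(200)]
print(len(spot_markets(types, [f"zone-{i}" for i in range(5)])))   # us-east-1
print(len(spot_markets(types, [f"zone-{i}" for i in range(42)])))  # global

# Hypothetical current prices in dollars per hour.
prices = {
    ("c4.large", "us-east-1a"): 0.090,
    ("c4.large", "us-east-1c"): 0.025,
    ("c4.xlarge", "us-east-1a"): 0.060,
}
print(cheapest_market(prices))
```

With these made-up prices, the c4.large in us-east-1c is the cheapest of the three markets, even though the same instance type is expensive one zone over.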
This entire process introduces some fascinating dynamics and potential optimization strategies for the “optimal Auto-Scaling system” we discussed in the last post. We will be going into those strategies in more detail in upcoming posts, so follow along and join in the conversation!