There were a lot of big announcements at re:Invent 2017.
Some of the most significant to me were:
- a managed Kubernetes service (EKS)
- multi-master capabilities for Aurora and DynamoDB
- SageMaker for building AI apps
- and much more.
One announcement that did not get as much press was a change to how AWS prices Spot instances. According to the announcement, AWS “simplified the Amazon EC2 Spot instance pricing by moving to a model that delivers low, predictable prices that adjust gradually based on long-term trends in supply and demand”.
The Spot market exchange floors are effectively closed. How does this affect you?
This new pricing model has been out for a little over a month, and we have been assessing how it affects the practical use of Spot instances. Here is what we have uncovered so far:
As the announcement said, the new model does deliver more predictable prices with lower volatility. This reduces the risk of paying exorbitant prices, sometimes even above the On-Demand price, when prices swing sharply, which is definitely a good thing.
The m4.10xlarge is an instance type in fairly high demand, and you can see that prior to the change on November 28 it often spiked to 10 times the On-Demand price, multiple times a day on every weekday. After the change, the price has hovered between $0.60 and $0.65 without much volatility.
What could possibly be bad about more predictable and stable low prices?
The problem is that the published price is no longer really a true price. As discussed in AWS Spot Instances and Spot Market Price Drivers, the entire Spot concept is built around bidding on the excess capacity of a particular instance type in a particular availability zone. So what happens when demand exceeds capacity? It used to mean that the price would rise until enough capacity was reclaimed from the lower bids and a new equilibrium was reached. With the new model, this does not happen. Instead the price stays roughly the same, but some instances are terminated, not because of price, but with a capacity-oversubscribed status. At that point you can no longer buy the instance at the published price, because it is an artificial price that does not reflect the actual supply and demand curve. While a Spot market is in this state, a couple of counter-intuitive things can and do occur:
- bidding over the current price does not result in receiving an instance
- an existing instance may be lost even though the bid price was over the current price
You rarely get something for free. In this case, we have essentially traded one problem for another: predictability of getting and keeping an instance for the predictability of price.
The ugly part is that AWS has not (to date) published how instances are now selected for termination when the capacity-oversubscribed state is reached. We have run numerous tests at AutoScalr and have determined that selection no longer appears to be based on bid price, at least not exclusively. We have seen multiple cases where an instance was terminated while another instance in the same Spot market with a lower bid price continued to run. This makes predicting and adjusting to the loss of an instance that much more difficult, because you are forced to look at longer-term trends.
Here is one example where you can see that two different instances were terminated with a capacity-oversubscribed status, one at a bid of $0.40 and another at a bid of $0.50, even though the listed price was $0.25, while the one at the lower bid of $0.30 continued to run.
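The check behind that observation can be sketched in a few lines of Python. Under purely price-based eviction, no terminated instance should have bid more than an instance that kept running, so any such pair is evidence that something other than bid price decided the outcome. The `Observation` record, the instance labels, and the function name are hypothetical names chosen for this illustration; only the three bids come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    instance_id: str   # hypothetical label, not a real instance ID
    bid: float         # bid (maximum price) in USD per hour
    terminated: bool   # True if lost with a capacity-oversubscribed status


def bid_order_violations(observations):
    """Return (terminated, survivor) pairs where the terminated instance
    had the higher bid, which price-based eviction would not produce."""
    survivors = [o for o in observations if not o.terminated]
    terminated = [o for o in observations if o.terminated]
    return [(t, s) for t in terminated for s in survivors if t.bid > s.bid]


# The three instances from the example above (listed price was $0.25).
observations = [
    Observation("i-a", 0.40, True),
    Observation("i-b", 0.50, True),
    Observation("i-c", 0.30, False),
]

violations = bid_order_violations(observations)
for lost, kept in violations:
    print(f"{lost.instance_id} (bid ${lost.bid:.2f}) was terminated while "
          f"{kept.instance_id} (bid ${kept.bid:.2f}) kept running")
```

Both terminated instances show up as violations here, which is what leads to the conclusion that bid price alone no longer decides which instance is reclaimed.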
A Step Forward or A Step Back?
It depends on your perspective, but I think it is a step back. In almost every free market scenario where price-fixing or heavy price regulation is put into place, the negatives usually end up outweighing the positives. It would appear that this is a move by AWS to match how Google prices its preemptible VMs, with the goal of making Spot easier to use and increasing adoption. While it may do that in the short term, I think in the longer term it will have a negative effect because of the problems outlined above, and in the end, all the cloud providers will migrate to pricing excess capacity based on true supply and demand.
Another option would be for AWS to publish some sort of metric on how much excess capacity is currently available in a particular Spot market, analogous to “How many seats are available for booking on an airline flight.” This would at least give you some indication of whether a bid over the current price will be successful (if there are a lot of open seats), and of the risk of losing an existing instance (if the flight is filling up).
Strategies for New Spot Pricing Model
Diversification across many Spot markets of different instance types was always a core pillar of any strategy for using Spot, but with this new pricing model it becomes even more important. Since it is now harder than ever to predict when you might lose an instance, you need to diversify enough that the loss of any one Spot market is small and your application can absorb it.
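As a rough illustration of why diversification limits the damage, the sketch below spreads a target instance count as evenly as possible across several Spot markets and reports the fraction of capacity that would disappear if the largest single market were lost. The function names and the list of markets are assumptions made for this example; a real allocation would also weigh price and instance size.

```python
def spread_capacity(target, markets):
    """Split `target` instances as evenly as possible across `markets`."""
    if not markets:
        raise ValueError("at least one Spot market is required")
    base, extra = divmod(target, len(markets))
    # The first `extra` markets each take one additional instance.
    return {m: base + (1 if i < extra else 0) for i, m in enumerate(markets)}


def worst_case_loss(allocation):
    """Fraction of total capacity lost if the largest single market goes away."""
    total = sum(allocation.values())
    return max(allocation.values()) / total if total else 0.0


# Hypothetical (instance type, availability zone) pairs.
markets = [
    ("m4.10xlarge", "us-east-1a"),
    ("m4.10xlarge", "us-east-1b"),
    ("c4.8xlarge", "us-east-1a"),
    ("r4.8xlarge", "us-east-1c"),
]

allocation = spread_capacity(10, markets)
print(allocation)
print(f"worst-case loss from one market: {worst_case_loss(allocation):.0%}")
```

With ten instances in a single market, one capacity-oversubscribed event can take out everything; spread over four markets, the same event removes at most three of the ten.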
Expect slower start times in oversubscribed Spot markets
Spot instances are now allocated on a first-come, first-served basis, regardless of your bid. If you are using Spot Fleet, be prepared for longer start times if one of the Spot markets used is oversubscribed. We ran one test where it took over 20 minutes for the fleet to be fully provisioned, because contention for that instance type left many instances in the capacity-oversubscribed state.
Remember that a low price does not necessarily mean you can launch an instance in that Spot market
If your application is under-provisioned, consider adding capacity into multiple Spot markets to lower the chance of being stuck waiting for an instance in the first-come-first-served queue of an oversubscribed Spot market. If you are severely under-provisioned, fall back to On-Demand instances, which typically launch faster.
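One way to express that decision is sketched below, assuming a simple threshold on the share of desired capacity that is missing. The 50% cut-off for "severely under-provisioned", the function name, and the market list are all assumptions for illustration, not values taken from AWS or from our tests.

```python
def plan_launch(desired, running, spot_markets, severe_shortfall=0.5):
    """Return (spot_plan, on_demand_count) for the missing capacity.

    spot_plan maps each Spot market to the number of instances to request.
    `severe_shortfall` is an assumed threshold: once the missing share of
    desired capacity reaches it, everything is launched On-Demand instead.
    """
    missing = max(desired - running, 0)
    if missing == 0:
        return {}, 0
    if not spot_markets or missing / desired >= severe_shortfall:
        return {}, missing
    # Mild shortfall: spread the requests so that one oversubscribed
    # market cannot hold up the whole top-up.
    base, extra = divmod(missing, len(spot_markets))
    plan = {m: base + (1 if i < extra else 0) for i, m in enumerate(spot_markets)}
    return {m: n for m, n in plan.items() if n > 0}, 0


# Hypothetical (instance type, availability zone) pairs.
spot_markets = [
    ("m4.4xlarge", "us-east-1a"),
    ("c4.4xlarge", "us-east-1b"),
    ("r4.4xlarge", "us-east-1c"),
]

print(plan_launch(10, 8, spot_markets))  # mild shortfall: spread across Spot
print(plan_launch(10, 3, spot_markets))  # severe shortfall: use On-Demand
```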
Track oversubscribed Spot markets
The only clue that capacity is not available in a particular Spot market is the capacity-oversubscribed status you receive when an instance fails to start or an existing one is terminated. Record that information and try to avoid that Spot market for some period of time.
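A minimal sketch of such a record is shown below: it remembers when each Spot market last reported capacity-oversubscribed and filters those markets out until a cool-down has passed. The class name, the 30-minute default, and the injectable clock are assumptions for this example; how long to avoid a market is something you would need to tune yourself.

```python
import time


class OversubscribedTracker:
    """Remember which Spot markets recently reported capacity-oversubscribed."""

    def __init__(self, cooldown_seconds=1800, clock=time.monotonic):
        self.cooldown = cooldown_seconds   # assumed default of 30 minutes
        self.clock = clock                 # injectable so it can be tested
        self._last_seen = {}               # market -> time of last report

    def record(self, market):
        """Call when a launch fails or an instance is terminated with
        a capacity-oversubscribed status in `market`."""
        self._last_seen[market] = self.clock()

    def is_avoided(self, market):
        seen = self._last_seen.get(market)
        return seen is not None and self.clock() - seen < self.cooldown

    def usable(self, markets):
        """Return the markets that are not currently in their cool-down."""
        return [m for m in markets if not self.is_avoided(m)]


# Hypothetical (instance type, availability zone) pairs.
tracker = OversubscribedTracker()
hot = ("m4.10xlarge", "us-east-1a")
other = ("c4.8xlarge", "us-east-1b")
tracker.record(hot)
print(tracker.usable([hot, other]))
```

Feeding the `usable` list into whatever selects Spot markets for new requests keeps the application from queuing repeatedly in a market that has just signalled it is full.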
Use AutoScalr Service
If you just managed to get your application Spot-aware and cost-effective, and are not looking forward to adjusting to all these new changes, consider using a service like AutoScalr to do it for you. It seems likely this will not be the last change you see in the Spot market pricing model.
If you are already using AutoScalr, there is nothing you need to do. We have already adjusted the back end machine learning algorithms to accommodate this change.