ElastiCache Redis can easily scale to support millions of queries per second or terabytes of stored data. Chances are, however, that you do not need to provision capacity for your largest anticipated spike 24/7. If you know the general shape of your spikes, you may be able to take advantage of the autoscaling features in ElastiCache Redis.
Understanding Scaling Dimensions
ElastiCache Redis gives you full control of your capacity planning. As such, it is important to understand the dimensions along which you may consider scaling. Simply put, you are likely to scale your cluster based on RAM (memory pressure), CPU (TPS pressure), and network limits (bandwidth or packets per second).
If your working set is larger than the available memory in your ElastiCache cluster, your cache hit rate will start to suffer. More memory lets you handle a larger working set without compromising your cache hit rate. Your memory utilization and eviction metrics tell you whether you are under memory pressure and can inform your autoscaling policies. Similarly, your CPU utilization metrics tell you whether you are likely to need to scale for CPU.
Things get meaningfully trickier with network bandwidth on ElastiCache Redis. This is where we see most teams get in trouble. EC2 instances can burst for a few minutes, but they generally cannot sustain the "Up to" bandwidth numbers advertised on the EC2 site. Accordingly, if your spikes are short-lived, EC2 burst capacity can save the day. If the spike persists, however (often at around the 5-minute mark), EC2 will start to throttle your bandwidth, bringing your node to a harsh stop. This can be particularly troubling because the load you were previously sustaining suddenly stops working. For example, an r6g.large offers up to 10 Gbps of burst bandwidth, but its sustained bandwidth is limited to only 0.75 Gbps.
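The three pressure dimensions above map onto concrete CloudWatch metrics. Here is a minimal Python sketch, assuming boto3: the metric names (DatabaseMemoryUsagePercentage, Evictions, EngineCPUUtilization, NetworkBandwidthOutAllowanceExceeded) are real ElastiCache CloudWatch metrics, but the thresholds and cluster ID are illustrative placeholders, not official guidance.

```python
# Sketch: classify which scaling dimension is under pressure, given recent
# values of ElastiCache's CloudWatch metrics. Thresholds are illustrative.

def pressure_dimensions(memory_usage_pct, evictions, engine_cpu_pct,
                        bandwidth_out_exceeded):
    """Return the dimensions that look under pressure.

    Inputs correspond to the ElastiCache CloudWatch metrics
    DatabaseMemoryUsagePercentage, Evictions, EngineCPUUtilization,
    and NetworkBandwidthOutAllowanceExceeded.
    """
    pressures = []
    if memory_usage_pct > 80 or evictions > 0:
        pressures.append("memory")    # working set outgrowing RAM
    if engine_cpu_pct > 70:
        pressures.append("cpu")       # TPS pressure on the engine thread
    if bandwidth_out_exceeded > 0:
        pressures.append("network")   # EC2 is already throttling you
    return pressures


def fetch_metric_average(metric_name, cluster_id, minutes=5):
    """Fetch a recent average for one node-level metric.

    Requires AWS credentials; kept separate so the pure logic above can be
    exercised without an AWS account.
    """
    import datetime
    import boto3  # assumes boto3 is installed and credentials are configured

    cw = boto3.client("cloudwatch")
    now = datetime.datetime.utcnow()
    resp = cw.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName=metric_name,
        Dimensions=[{"Name": "CacheClusterId", "Value": cluster_id}],
        StartTime=now - datetime.timedelta(minutes=minutes),
        EndTime=now,
        Period=60,
        Statistics=["Average"],
    )
    points = resp["Datapoints"]
    return sum(p["Average"] for p in points) / len(points) if points else 0.0
```

In practice you would feed `fetch_metric_average` results into `pressure_dimensions` on a schedule; the point is that a single metric rarely tells the whole story.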
What can you actually scale?
There are two units of scaling in ElastiCache Redis: shards and replicas. Shards determine how your keyspace's hash slots are divided across nodes, while replicas determine how many copies of each shard you keep.
There are a couple of reasons why you may need multiple shards in your ElastiCache Redis cluster. First, if your data simply does not fit on a single node, shards allow you to divide the data across multiple nodes. Second, if your dataset fits on a single node but a single node cannot support the write throughput you require, sharding lets you multiply write throughput across the shards in your cluster.
There are more reasons to have replicas in your system. First, if you have a primarily read-heavy workload, more replicas allow you to divide reads across multiple machines. This can help you mitigate CPU or network pressure on your fleet. Second, replicas provide redundancy against node failures. Specifically, you do not want your cache hit rates to be impacted if an instance dies, so having replicas helps you degrade gracefully in such scenarios.
As a cheatsheet: if you are trying to store a very large amount of data or handle high write throughput, you need more shards. If you are just trying to increase read throughput, more replicas will help you scale your fleet better.
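To make the cheatsheet concrete, here is a sketch of the two knobs via boto3. `modify_replication_group_shard_configuration` and `increase_replica_count` are real ElastiCache API operations; the replication group name and target counts are placeholders.

```python
# Sketch: the two scaling knobs in ElastiCache Redis, expressed as the
# boto3 calls that drive them. Names like "my-redis" are placeholders.

def scale_request(kind, replication_group_id, target):
    """Build the kwargs for the matching boto3 ElastiCache call."""
    if kind == "shards":      # more total memory / more write throughput
        return dict(
            ReplicationGroupId=replication_group_id,
            NodeGroupCount=target,      # desired number of shards
            ApplyImmediately=True,
        )
    if kind == "replicas":    # more read throughput / redundancy
        return dict(
            ReplicationGroupId=replication_group_id,
            NewReplicaCount=target,     # replicas per shard
            ApplyImmediately=True,
        )
    raise ValueError(f"unknown scaling kind: {kind}")


def apply_scale(kind, replication_group_id, target):
    """Issue the call. Requires AWS credentials."""
    import boto3  # assumes boto3 is installed and credentials are configured

    ec = boto3.client("elasticache")
    kwargs = scale_request(kind, replication_group_id, target)
    if kind == "shards":
        return ec.modify_replication_group_shard_configuration(**kwargs)
    return ec.increase_replica_count(**kwargs)
```

Note that resharding moves hash slots between nodes, so it is a much heavier operation than adding a replica; this is part of why scaling events can take a while.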
Official Best Practices on ElastiCache Redis Autoscaling (and what's wrong with them)
AWS publishes official best practices for autoscaling your ElastiCache Redis cluster. A few of these I found particularly troublesome.
- Disable Scale-In. Nope - this is not us misinterpreting the guidelines. The guide literally states that you ought to “disable scale-in” as best practice number 4. It goes on to state “you can start with scale-in disabled and later you can always manually scale-in to your need.”
- Scale on only one metric. The reality is that your fleet can come under memory, CPU, or network pressure. However, the ElastiCache best practices recommend against applying multiple autoscaling policies, as they may conflict with each other and can cause an infinite loop of scaling out and scaling in.
- Use only if you have uniform distribution. This is an important caveat. If your workload has a hot shard, you have to vertically scale all shards in your fleet (or add more replicas to every shard). This can get incredibly inefficient, especially as the number of shards in your fleet grows.
- Best Suited for Gradual Spikes. Unless using scheduled scaling policies, it is important to internalize that scaling an ElastiCache Redis cluster can take more than 10 minutes. In some cases, your spikes may be gone by the time your cluster scales up.
Is autoscaling worth it?
ElastiCache Redis autoscaling is like cruise control in a car. It was super useful when it first came out, even impressive. But like cruise control, autoscaling is over a decade old, and managing capacity - picking instance types, instance counts, shards vs. replicas, and recognizing memory vs. CPU vs. network pressure - is just way too tedious in today's serverless-first world. Developers should not have to get distracted by the nitty-gritty details here. Unprecedented scale usually arrives exactly when your business or app has hit it big and is having its biggest, most visible moment. That is not the time to learn that your autoscaling and capacity planning were subpar. Teams should not have to deal with the aftermath of outages in these highly visible moments, when their full focus should be on delivering value to end consumers.
Momento is like autopilot. You do not have to do capacity management with Momento. Hot shards? Hot keys? Momento automatically makes more copies of your hottest data to relieve your cluster when demand is on fire. Similarly, Momento can scale to millions of TPS without you needing to manually test it. We know we can handle it because we do this every day.