I learned key lessons during my time at Amazon, where I ran a fleet of thousands of EC2 instances across multiple regions. In this blog, I walk through my lessons around caching, most of which I learned the hard way.
I helped build a personalized ad insertion service that had to handle millions of transactions per second (TPS). The goal of the service was to give each viewer a personalized experience, but we had a tight time budget for doing the personalization. If we took too long, our customers' videos would buffer, eroding the end-user experience. This workload was also very spiky and somewhat unpredictable: you never knew when a football game would get interesting and lots of fans would start tuning in. The first thought for solving this? Add a cache!! Now that I am an engineering leader at a caching company, my response to adding a cache may puzzle you...
Are you sure you want to add a cache?
Believe me: when working on Amazon-scale services, my own instinct was often to reach for a cache as the de facto tool in my toolbox to improve performance. Time and time again, my team (and other teams whose designs I reviewed) ran into the same pitfalls.
Slowing down (instrument and size your cache)
First and foremost, it takes time to add a cache. You cannot rush into it. Adding a cache required our teams to provision Amazon ElastiCache clusters, size them, benchmark them, and instrument them. This planning was easily a sprint or more of work, and that was before we even started integrating the cache into our services.
Instrumentation is frequently overlooked when teams rush into caching. If implemented incorrectly, a cache doesn't even help you go faster. In fact, in some cases, a cache can actually slow you down or increase your error rates. Without instrumentation, you'd never know it, and you would keep working on the assumption that the cache is speeding you up instead of slowing you down.
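As a minimal sketch of what "instrument your cache" can mean, the wrapper below counts hits, misses, and errors and records per-lookup latency. The class name `InstrumentedCache` is hypothetical, and the in-memory counters stand in for a real metrics system. The useful comparison is this latency and error data against the uncached path.

```python
import time
from collections import Counter
from typing import Callable, List, Optional


class InstrumentedCache:
    """Wraps a cache lookup and records hits, misses, errors, and latency."""

    def __init__(self, cache_get: Callable[[str], Optional[bytes]]):
        self._cache_get = cache_get
        self.counts: Counter = Counter()
        self.latencies_ms: List[float] = []

    def get(self, key: str) -> Optional[bytes]:
        start = time.perf_counter()
        try:
            value = self._cache_get(key)
        except Exception:
            # Count cache failures separately; they are neither hits nor misses.
            self.counts["error"] += 1
            raise
        finally:
            # Record latency for every lookup, including failed ones.
            self.latencies_ms.append((time.perf_counter() - start) * 1000)
        self.counts["hit" if value is not None else "miss"] += 1
        return value

    def hit_rate(self) -> float:
        lookups = self.counts["hit"] + self.counts["miss"]
        return self.counts["hit"] / lookups if lookups else 0.0


# Usage with a plain dict standing in for a real cache client.
store = {"ad:1": b"creative-a"}
cache = InstrumentedCache(store.get)
cache.get("ad:1")
cache.get("ad:2")
print(cache.counts, cache.hit_rate())
```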
Similarly, sizing and benchmarking the cache is critical. If you overprovision, you are wasting money. If you underprovision, a bunch of screens buffer, or your cache ends up being slower and more error-prone.
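Sizing usually starts with a back-of-envelope estimate that benchmarking then validates. The sketch below is one such estimate. The per-item overhead and the 25% headroom are assumptions, since real overhead depends on the cache engine and data structures, and the workload numbers are hypothetical.

```python
def estimate_cache_gib(items: int, avg_key_bytes: int, avg_value_bytes: int,
                       per_item_overhead_bytes: int, headroom: float = 0.25) -> float:
    """Back-of-envelope memory estimate for a cached working set, in GiB."""
    raw_bytes = items * (avg_key_bytes + avg_value_bytes + per_item_overhead_bytes)
    # Headroom for growth, fragmentation, and spikes (assumed 25% by default).
    return raw_bytes * (1 + headroom) / 2**30


# Hypothetical workload: 10M items, 40-byte keys, 1 KiB values, ~60 bytes overhead each.
print(round(estimate_cache_gib(10_000_000, 40, 1024, 60), 1))
```

Treat the result as a starting point. Load tests against a provisioned cluster tell you whether the estimate holds under real access patterns.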
Now, I'm obviously going to plug that Momento Serverless Cache has your cache ready in mere seconds, you don't have to size it, and developers save sprint time on setup and instrumentation. But just because caching is easier doesn't mean you should blindly add a cache everywhere! It still requires deliberation.
Recognize the modality a cache adds
We spend a lot of time reducing cache miss rates (CMR) for our customers, but really low CMRs have a downside. The DynamoDB paper talks about cache hit rates (CHR) of 99.75%, which means an astonishingly low CMR of 0.25%. If the CHR drops by 0.75 percentage points to 99%, the CMR goes from 0.25% to 1%, which at face value may look like a small increase. However, it is actually a 4X spike, and it results in a sudden 4X increase in load on your backend database. This bimodal behavior can be catastrophic. A cold cache can wreak havoc on your system during a cold start (for a real-world example, see this Roblox outage postmortem).
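The arithmetic behind the 4X spike is short enough to check directly. Every miss falls through to the backend, so backend load is traffic times (1 - CHR). The 1M requests/second figure is a hypothetical traffic level, and exact fractions avoid floating-point noise.

```python
from fractions import Fraction


def backend_load(requests_per_sec: int, hit_rate: Fraction) -> Fraction:
    # Every cache miss falls through to the backend database.
    return requests_per_sec * (1 - hit_rate)


rps = 1_000_000  # Hypothetical traffic level.
healthy = backend_load(rps, Fraction(9975, 10000))  # 99.75% hit rate
degraded = backend_load(rps, Fraction(99, 100))     # 99% hit rate
cold = backend_load(rps, Fraction(0))               # empty cache after a cold start
print(healthy, degraded, degraded / healthy, cold / healthy)  # 2500 10000 4 400
```

The same formula shows why a fully cold cache is so dangerous. At 0% hits, the backend sees 400X its healthy load.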
Explicitly plan around staleness
As soon as you add a cache to your system, you are likely compromising its consistency semantics. You can get old data without realizing it, and this is really hard to debug. Developers often end up pulling their hair out because the response they are getting just doesn't add up based on what's in the database. Eventually, they realize a small subset of the response is driven by a stale cached value for data they believed was not cached.
Some important questions to consider when planning:
- How do you update values in the cache?
- Do you use a TTL (Time To Live), or do you need another component which is in charge of deciding when to update the cached data?
- Is it up to those who read and write data to kick in the update process, or do you have a background process to refresh the data? If you have a background process, what happens if the process stops running?
- If you are not able to fetch a new value, do you return a stale value? For how long can you return the stale value?
- How do you handle items missing from the cache: do you put the item in the cache before returning success or after returning success?
- How do you deal with errors when the item is not in the cache and downstream returns an error? Do you cache the error response?
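The questions above have no universal answers. The sketch below shows one possible set of answers, under these assumptions:

- Every successful value gets an explicit TTL.
- Readers trigger refreshes themselves (there is no background process).
- A stale value may be served only for a bounded window when downstream fails.
- Errors are negatively cached for a short TTL.
- The cache is populated before success is returned.

`ReadThroughCache` and its parameters are hypothetical. The injectable `clock` exists only to make the behavior testable. The sketch is single-threaded and does no request coalescing.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Entry:
    value: Any
    is_error: bool
    fresh_until: float  # Serve directly from cache until this time.
    stale_until: float  # Last moment this entry may be served if a refresh fails.


class ReadThroughCache:
    """One possible policy for TTLs, staleness, and errors (hypothetical)."""

    def __init__(self, fetch: Callable[[str], Any], ttl_s: float, max_stale_s: float,
                 error_ttl_s: float, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch              # Downstream source of truth.
        self._ttl_s = ttl_s              # Explicit TTL for every successful value.
        self._max_stale_s = max_stale_s  # How long past TTL a stale value may be served.
        self._error_ttl_s = error_ttl_s  # Short TTL for cached (negative) failures.
        self._clock = clock
        self._store: Dict[str, Entry] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry: Optional[Entry] = self._store.get(key)
        if entry is not None and now < entry.fresh_until:
            if entry.is_error:
                # Negative cache hit: fail fast instead of hammering downstream.
                raise LookupError(f"recent downstream failure for {key!r}")
            return entry.value
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None and not entry.is_error and now < entry.stale_until:
                return entry.value  # Serve stale, but only inside a bounded window.
            # Cache the failure briefly, then propagate the original error.
            expires = now + self._error_ttl_s
            self._store[key] = Entry(None, True, expires, expires)
            raise
        # Populate the cache before returning success to the caller.
        self._store[key] = Entry(value, False, now + self._ttl_s,
                                 now + self._ttl_s + self._max_stale_s)
        return value
```

Each choice here trades something away. For example, serving stale data hides outages from users, but only if the stale window is short enough that the data is still acceptable.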
Gradually, your solution just gets more complex, and the cost you were trying to reduce is replaced by the higher cost of maintaining this spaghetti code. Having seen this pattern so many times, I always insist on explicit TTLs on all items in the cache, even when the engineers insist the data won't change.
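One way to make "explicit TTLs on all items" hard to forget is to bake it into the cache interface. In this hypothetical `ExplicitTtlCache`, `set` takes a keyword-only `ttl_s` with no default. Omitting it is a `TypeError`, and a non-positive value is rejected. The injectable `clock` is an assumption added for testability.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ExplicitTtlCache:
    """A cache whose set() cannot be called without an explicit, positive TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._items: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, *, ttl_s: float) -> None:
        # ttl_s is keyword-only with no default: omitting it raises TypeError.
        if ttl_s <= 0:
            raise ValueError("ttl_s must be positive; 'never changes' is not a TTL")
        self._items[key] = (value, self._clock() + ttl_s)

    def get(self, key: str) -> Optional[Any]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # Expired: drop it and report a miss.
            return None
        return value
```

Even a long TTL, such as a day for "static" configuration, guarantees that a bad value eventually ages out on its own.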
Location, Location, Location: Where to Cache What?
We often use a cache simply to hide performance and scalability problems. However, this approach has two major flaws. First, sooner or later, performance and scalability become a problem again. For example, local caching works really well in a small web-server fleet, but as the number of web-servers grows, your backend starts to suffer a new type of load: each server keeps its own copy of the cache, so every server has to fetch the same keys from the backend independently. In these situations, you have to think hard about the consequences of doubling your web-server fleet size.
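A rough model makes the fleet-size effect concrete. Assume every server caches every hot key locally with the same TTL. Then each server misses on each hot key once per TTL, so steady-state backend reads grow linearly with the number of servers. The key count and TTL are hypothetical, and evictions and cold starts, which only add load, are ignored.

```python
def local_refresh_load_rps(servers: int, hot_keys: int, ttl_s: int) -> float:
    """Steady-state backend reads/s when every server caches every hot key locally.

    Each server misses on each hot key once per TTL, so load scales with fleet size.
    """
    return servers * hot_keys / ttl_s


# Hypothetical: 12,000 hot keys with a 60-second TTL.
for servers in (10, 20):
    print(servers, "servers ->", local_refresh_load_rps(servers, 12_000, 60), "backend reads/s")
# A single shared cache behaves like one "server" in this model: 12_000 / 60 = 200 reads/s.
```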
And this, in turn, leads to the second major flaw: caches add complexity to your code. In the simplest case, you store a value and get it back when you need it, and evicting stale data is easy because there are no dependencies. With local caching, however, stale values can live in many places at once, and clearing a poison pill could require bouncing the entire web-server fleet.
Local caches add more cold starts. Each time you add a new web-server (during deployments and scale-outs), you take a latency hit until its cache warms up. This adds variability, and it is hard to pinpoint without deep per-web-server metrics on your dashboards.
Caching in a separate fleet
A look-aside caching fleet has meaningful advantages over local caching. For instance, web-server deployments don't erode cache hit rates or result in cold caches. Furthermore, because the cache is shared across your web-server fleet, hit rates improve as the cache builds collective intelligence across your servers. This is particularly useful when using AWS Lambda as your web-server layer: each Lambda execution environment handles only one request at a time, so you run many small environments whose local caches would rarely be warm. The separate fleet is also more efficient. Instead of wasting memory on each web-server, you can size the caching fleet and reason about it on its own (separation of concerns).
Nevertheless, separate cache fleets have their own challenges. The cache fleet needs to be monitored, managed, and scaled, and you are accountable for its security patches and maintenance windows. Scaling up (adding nodes), scaling down (removing nodes), and deployments (replacing nodes) can all result in cold caches, so you have to think hard about your scaling and deployment story.
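Why do scaling events cause cold caches? One common reason, assumed here rather than taken from the post, is how keys are mapped to nodes. With naive modulo sharding, adding a fifth node to four moves about 80% of keys to a different node, so those lookups all start cold. A consistent-hash ring moves only about a fifth. The node names, key format, and vnode count below are hypothetical.

```python
import bisect
import hashlib
from typing import Callable, List


def _hash(value: str) -> int:
    # Stable 64-bit hash so results are reproducible across runs (unlike hash()).
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_owner(key: str, nodes: List[str]) -> str:
    # Naive sharding: node index = hash mod node count.
    return nodes[_hash(key) % len(nodes)]


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: List[str], vnodes: int = 100):
        self._ring = sorted((_hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def owner(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def remapped_fraction(keys: List[str], before: Callable[[str], str],
                      after: Callable[[str], str]) -> float:
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


old_nodes = [f"cache-{i}" for i in range(4)]
new_nodes = old_nodes + ["cache-4"]
keys = [f"user:{i}" for i in range(10_000)]

mod_moved = remapped_fraction(keys, lambda k: modulo_owner(k, old_nodes),
                              lambda k: modulo_owner(k, new_nodes))
old_ring, new_ring = HashRing(old_nodes), HashRing(new_nodes)
ring_moved = remapped_fraction(keys, old_ring.owner, new_ring.owner)
print(f"modulo: {mod_moved:.0%} of keys moved; ring: {ring_moved:.0%} moved")
```

Even with consistent hashing, the keys that move still start cold on the new node. The approach shrinks the blast radius of a scaling event but does not eliminate it.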
One of the benefits of local caching is that it gets reset with each deployment. You do not have to worry about cached items whose format is incompatible across web-server versions. With a separate cache fleet, web-server deployments get more challenging, especially if they change the way you cache data. Now you have to deal with stale data schemas, which can end up being poison pills, and clearing those can be quite challenging.
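A common mitigation, sketched here as an assumption rather than something the post prescribes, is to namespace cache keys by schema version. Old and new web-servers then never read each other's payloads during a rolling deployment. `CACHE_SCHEMA_VERSION` and `cache_key` are hypothetical names, and a dict stands in for the shared cache fleet.

```python
from typing import Dict, Optional

CACHE_SCHEMA_VERSION = 3  # Hypothetical: bump whenever the cached format changes.


def cache_key(entity: str, entity_id: str, version: int = CACHE_SCHEMA_VERSION) -> str:
    """Namespace every key by schema version so old and new formats never collide."""
    return f"v{version}:{entity}:{entity_id}"


# Simulate a rolling deployment sharing one cache fleet.
shared_cache: Dict[str, str] = {}
# Old web-servers (schema v2) keep writing their format under v2 keys.
shared_cache[cache_key("ad", "42", version=2)] = "old-format-payload"
# New web-servers (schema v3) look up v3 keys, so they miss instead of
# deserializing an incompatible v2 payload (a potential poison pill).
new_lookup: Optional[str] = shared_cache.get(cache_key("ad", "42"))
print(new_lookup)
```

This approach has costs. Each version bump starts the new keyspace cold, and old-version entries linger until they expire, which is another argument for explicit TTLs.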
Separate caching fleets add an RPC call to the critical path of your request, which is much more involved than a local in-memory lookup. With RPC calls, you have to deliberately tune your clients' connection pools, timeouts, retries, and so on. We have a detailed blog on this coming soon, but here is a quick example of how nuanced this can get: caching clients typically set timeouts at 60 seconds, which is meaningfully longer than an Amazon DynamoDB query takes to return. This timeout may make sense for you, but it has key implications. During maintenance windows, your connections may hang until they time out, which can lead to backpressure on your web-server threads. Tinder's blog on their experience with Amazon ElastiCache covers this really well.
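To illustrate the timeout point at the application level, the sketch below bounds the cache lookup with `asyncio.wait_for` and treats a slow or failed cache as a miss. A hung cache call therefore cannot hold a request for the full client timeout. The 50 ms budget, the stand-in cache and database functions, and `get_with_cache_budget` are all hypothetical. Real cache clients expose their own timeout settings, and this does not show any particular client's API.

```python
import asyncio
from typing import Awaitable, Callable, Optional

CACHE_TIMEOUT_S = 0.05  # Hypothetical budget, far below a 60-second client default.


async def slow_cache_get(key: str) -> Optional[str]:
    # Stand-in for a cache RPC that hangs, e.g. during a maintenance window.
    await asyncio.sleep(5)
    return None


async def database_get(key: str) -> str:
    # Stand-in for the source of truth (e.g. a DynamoDB query).
    await asyncio.sleep(0.01)
    return f"db-value-for-{key}"


async def get_with_cache_budget(key: str,
                                cache_get: Callable[[str], Awaitable[Optional[str]]],
                                db_get: Callable[[str], Awaitable[str]],
                                timeout_s: float = CACHE_TIMEOUT_S) -> str:
    try:
        # wait_for cancels the cache call if it exceeds the budget.
        cached = await asyncio.wait_for(cache_get(key), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError):
        cached = None  # Treat a slow or failed cache as a miss instead of hanging.
    if cached is not None:
        return cached
    return await db_get(key)


result = asyncio.run(get_with_cache_budget("ad:42", slow_cache_get, database_get))
print(result)  # db-value-for-ad:42
```

Falling back to the database keeps requests moving, but it also shifts load onto the backend. During a cache outage, the database sees something close to the cold-cache load described earlier.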
You need to ensure the clients you use to interact with the fleet are configured to handle timeouts, retries, and failover scenarios correctly. You also need to keep cost under control, which was the original reason for adding a cache in the first place. But now you must support a caching fleet and a caching team to manage it, which can actually make things more expensive.
Caching is one of the best tools for solving scalability problems. It can help you with hot keys, throttling, or the cost of your underlying databases. This blog outlined the core considerations to weigh before you jump into caching, and the lessons I learned along the way.
Instrumenting your cache keeps you and your cache accountable. Being explicit about TTLs can save you a ton of headaches debugging stale cache data. Strategically choosing what to cache where (locally, in a look-aside cache fleet, etc.) will help you make the most of your cache. If you apply these core lessons, you will be happier with your cache, and for longer.