Ten years ago, we made a tough decision: instead of investing cycles in shipping faster, adding more tests to existing features, or delivering new features, we chose to invest in a caching facade in our backend Java Spring (Boot) library, which is shared across several applications. That effort continues to pay dividends today. First, it let any Java developer rapidly add caching throughout their stack with a simple annotation on any public method.
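Here is a minimal sketch of what annotation-driven caching looks like. It illustrates the general idea and is not our library's actual API. The `@Cached` annotation, `ScoreService`, and `CachingFacade` are hypothetical names. A JDK dynamic proxy plays the role that Spring AOP proxies play in a Spring application, and an in-memory map stands in for Memcached.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical annotation: marks a method whose return value should be cached.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Cached {
    String namespace();
}

// Hypothetical service interface; callers only ever see this type.
interface ScoreService {
    @Cached(namespace = "scores")
    String getScore(String gameId);
}

final class CachingFacade {
    // In-memory stand-in for a remote cache such as Memcached.
    private static final Map<String, Object> STORE = new ConcurrentHashMap<>();

    // Wraps a service so that every @Cached method consults the cache first.
    static <T> T wrap(Class<T> iface, T target) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (proxy, method, args) -> {
                    Cached cached = method.getAnnotation(Cached.class);
                    if (cached == null) {
                        return call(method, target, args); // not annotated: pass through
                    }
                    // Key = namespace + method name + arguments.
                    String key = cached.namespace() + ":" + method.getName() + ":" + Arrays.toString(args);
                    Object hit = STORE.get(key);
                    if (hit != null) {
                        return hit; // cache hit: skip the real method
                    }
                    Object value = call(method, target, args);
                    if (value != null) {
                        STORE.put(key, value); // populate the cache on a miss
                    }
                    return value;
                }));
    }

    private static Object call(Method method, Object target, Object[] args) throws Throwable {
        try {
            return method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // rethrow the service's own exception
        }
    }
}
```

If you wrap a real `ScoreService` with `CachingFacade.wrap(ScoreService.class, real)` and call `getScore("game-1")` twice, the real implementation runs only once. The second call is served from the map. Spring's own cache abstraction (`@Cacheable`) relies on the same proxy-interception idea by default.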
Second, it made our prolific use of caching more manageable. Instead of writing basic caching tests for each use case throughout our applications, we got more leverage by putting caching tests around the backend library. The same applied to bugs: because of the facade, we never had to fix the same bug in each cache separately. Likewise, upgrading our cache implementation across *all* of our Java applications that use caching became much simpler. Third, it made our bet on Memcached a two-way door. If Redis (or a new engine) became better suited to our performance or availability needs, we could quickly swap it in.
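The sketch below illustrates the second and third points under some assumptions. It uses a hypothetical `CacheProvider` interface that sits behind the facade, two interchangeable providers, and one contract suite (`CacheContract`) that is written once and run against every provider. Both providers are in-memory stand-ins rather than real Memcached or Redis clients, and our library's actual interface may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical provider interface behind the facade. Application code depends
// only on this type, so swapping cache engines never touches call sites.
interface CacheProvider {
    Optional<String> get(String key);
    void set(String key, String value);
    void delete(String key);
}

// Stand-in for one engine: an unbounded concurrent map.
final class MapCacheProvider implements CacheProvider {
    private final Map<String, String> data = new ConcurrentHashMap<>();

    @Override public Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
    @Override public void set(String key, String value) { data.put(key, value); }
    @Override public void delete(String key) { data.remove(key); }
}

// Stand-in for a different engine: a bounded LRU cache.
final class LruCacheProvider implements CacheProvider {
    private final Map<String, String> data;

    LruCacheProvider(int capacity) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.data = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity; // evict the LRU entry once over capacity
            }
        };
    }

    @Override public synchronized Optional<String> get(String key) { return Optional.ofNullable(data.get(key)); }
    @Override public synchronized void set(String key, String value) { data.put(key, value); }
    @Override public synchronized void delete(String key) { data.remove(key); }
}

// One contract suite, written once and run against every provider.
final class CacheContract {
    static void verify(CacheProvider cache) {
        check(cache.get("missing").isEmpty(), "a miss returns empty");
        cache.set("k", "v1");
        check(cache.get("k").equals(Optional.of("v1")), "a hit returns the stored value");
        cache.set("k", "v2");
        check(cache.get("k").equals(Optional.of("v2")), "set overwrites");
        cache.delete("k");
        check(cache.get("k").isEmpty(), "delete removes the key");
    }

    private static void check(boolean ok, String rule) {
        if (!ok) {
            throw new AssertionError("contract violated: " + rule);
        }
    }
}
```

Adding a new engine in this model means writing one more `CacheProvider` and passing it through `CacheContract.verify`. Call sites stay unchanged, and a bug fixed in the facade is fixed for every cache at once.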
We anticipated plenty of innovation in the caching space over the years, but we never expected to be able to hand off the entire management of our caches to a serverless cache! There are more options than ever for databases, caching, and broader backend platforms. To deliver the best possible customer experience, the best teams are getting nimbler at evaluating and adopting new technologies.
This post covers how this facade let us quickly evaluate and deploy a new caching service with simple configuration changes and no code changes. We will go over our existing solution, the evaluation criteria, and our results. Ultimately, we were able to meaningfully enhance our ElastiCache-backed caching setup with Momento Serverless Cache. The application we tested this with is a Java application built on Spring Boot and running on AWS. It is performance-sensitive, and availability matters.
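Here is one way "configuration changes and no code changes" can work: the facade picks its provider from a property at startup. This is a minimal sketch, assuming a hypothetical `cache.provider` property and a hypothetical `CacheProviderFactory`. The providers are in-memory stand-ins, not real Memcached or Momento clients.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical facade interface, trimmed to what this sketch needs.
interface CacheProvider {
    String name();
    String get(String key);
    void set(String key, String value);
}

// In-memory stand-in; a real provider would wrap a Memcached or Momento client.
final class StandInProvider implements CacheProvider {
    private final String name;
    private final Map<String, String> data = new ConcurrentHashMap<>();

    StandInProvider(String name) { this.name = name; }

    @Override public String name() { return name; }
    @Override public String get(String key) { return data.get(key); }
    @Override public void set(String key, String value) { data.put(key, value); }
}

final class CacheProviderFactory {
    // Known engines, keyed by the value of the hypothetical "cache.provider" property.
    private static final Map<String, Supplier<CacheProvider>> REGISTRY = Map.of(
            "memcached", () -> new StandInProvider("memcached"),
            "momento", () -> new StandInProvider("momento"));

    // Picks an engine from configuration; defaults to memcached when the property is unset.
    static CacheProvider fromConfig(Properties config) {
        String choice = config.getProperty("cache.provider", "memcached")
                .trim()
                .toLowerCase(Locale.ROOT);
        Supplier<CacheProvider> supplier = REGISTRY.get(choice);
        if (supplier == null) {
            throw new IllegalArgumentException("Unknown cache.provider: " + choice);
        }
        return supplier.get();
    }

    // Parses properties-file text, e.g. the contents of an application.properties file.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With this design, switching engines or rolling back is a one-line property change followed by a redeploy. In a Spring Boot application, the same switch would more typically be a property combined with conditional bean definitions (for example, Spring Boot's `@ConditionalOnProperty`). This sketch does not show how our library is actually wired.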
Facades matter! Baeldung says it best: "a facade encapsulates a complex subsystem behind a simple interface. It hides much of the complexity and makes the subsystem easy to use." Facades become a very powerful tool when combined with an elegant framework like Spring (Boot).
The challenges with our existing caching setup
Slow scale-up. Our load is bursty, with most of it arriving during high-profile events. Infrastructure autoscaling often takes more than 10 minutes to detect the need for capacity and add it, and we often find that the burst is over before autoscaling kicks in.
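Cold caches during scale-up. Scaling changes the cache topology, so some keys suddenly map to nodes that do not hold them yet, and those lookups miss. The resulting cold caches are particularly tough on our systems during peak load, which is exactly when we need to scale.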
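Why does a topology change cool the cache at all? The sketch below compares two common key-placement schemes when a cluster grows from 4 to 5 nodes: naive `hash mod N`, and a consistent-hash ring with virtual nodes. This post does not say which scheme our Memcached clients use, so treat the sketch as a generic illustration. The class name, the key names, and the choice of 100 ring points per node are all assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

final class TopologyChange {
    // MurmurHash3's 32-bit finalizer (fmix32), used to spread String.hashCode() bits.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Naive placement: node = hash mod N, so changing N reshuffles most keys.
    static int moduloNode(String key, int nodes) {
        return Math.floorMod(mix(key.hashCode()), nodes);
    }

    // Consistent-hash ring: each node owns many points ("virtual nodes") on the ring.
    static TreeMap<Integer, Integer> ring(int nodes, int pointsPerNode) {
        TreeMap<Integer, Integer> ring = new TreeMap<>();
        for (int n = 0; n < nodes; n++) {
            for (int p = 0; p < pointsPerNode; p++) {
                ring.put(mix(("node-" + n + "#" + p).hashCode()), n);
            }
        }
        return ring;
    }

    // A key belongs to the first ring point at or after its hash, wrapping around.
    static int ringNode(TreeMap<Integer, Integer> ring, String key) {
        Map.Entry<Integer, Integer> owner = ring.ceilingEntry(mix(key.hashCode()));
        return (owner != null ? owner : ring.firstEntry()).getValue();
    }

    // Fraction of keys that change node when growing from 4 to 5 nodes.
    // Every moved key starts out as a miss on its new node.
    static double movedWithModulo(int keys) {
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "key-" + i;
            if (moduloNode(key, 4) != moduloNode(key, 5)) {
                moved++;
            }
        }
        return (double) moved / keys;
    }

    static double movedWithRing(int keys) {
        TreeMap<Integer, Integer> before = ring(4, 100);
        TreeMap<Integer, Integer> after = ring(5, 100);
        int moved = 0;
        for (int i = 0; i < keys; i++) {
            String key = "key-" + i;
            if (ringNode(before, key) != ringNode(after, key)) {
                moved++;
            }
        }
        return (double) moved / keys;
    }
}
```

Under modulo placement, a key stays put only when its hash leaves the same remainder mod 4 and mod 5, so roughly 80% of keys move. On the ring, only keys claimed by the new node's points move, which is roughly one fifth. Consistent hashing therefore limits the damage, but it still leaves a slice of the key space cold after every topology change.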
Overprovisioned cache fleets. Because scaling up during events is risky, and scaling down is manual, error-prone, and carries meaningful availability risk, we typically end up leaving our clusters provisioned for peak load.
Hot keys and hot shards cause outages. We define a hot key as a key that generates enough load to overwhelm a shard. A hot shard is a shard that is forced to serve more requests than it can handle without compromising availability or latency. A shard can be hot because of a single hot key or because of a collection of colocated keys that together overwhelm the node. In our case, hot keys routinely saturated the network bandwidth of a single node and caused cascading failures across our clusters: we would overwhelm a node, reshard, and then overwhelm the next node.
As a short-term fix, we have had to move every shard to the largest instance type. That gives us more network bandwidth per node, but it forces us to scale the entire cluster even though only a single node needs the extra bandwidth. The traditional solutions to these problems are application-level sharding or adding more replicas. This is cumbersome work that has sat on our backlog. Because we are busy adding customer-facing improvements to our apps, we have so far accepted the additional infrastructure cost in exchange for a simpler architecture.
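The arithmetic behind hot shards is easy to check. In the minimal sketch below, one key receives 30% of 10,000 requests, and hash-based sharding places that key's entire load on a single shard. The hot shard therefore serves at least 30% of traffic whether the cluster has 8, 16, or 64 shards. This is why adding shards, or upsizing every node, does not address the actual problem. The sketch also shows one common application-level fix, sometimes called key splitting: the hot key is stored under several suffixed copies and reads are spread evenly across them. The workload, the key names, and the `#n` suffix scheme are all hypothetical, and this is not something we built.

```java
import java.util.HashMap;
import java.util.Map;

final class HotKeyLoad {
    // MurmurHash3 fmix32 finalizer, spreading String.hashCode() bits across shards.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    static int shardOf(String key, int shards) {
        return Math.floorMod(mix(key.hashCode()), shards);
    }

    // Synthetic workload: one hot key plus many cold keys with one request each.
    // With copies > 1, the hot key's requests are split evenly across the
    // hypothetical suffixed keys "hot#0", "hot#1", ... (key splitting).
    static Map<String, Long> workload(long hotRequests, int coldKeys, int copies) {
        Map<String, Long> requests = new HashMap<>();
        for (int c = 0; c < copies; c++) {
            long share = hotRequests / copies + (c < hotRequests % copies ? 1 : 0);
            requests.put("hot#" + c, share);
        }
        for (int i = 0; i < coldKeys; i++) {
            requests.put("key-" + i, 1L);
        }
        return requests;
    }

    // Largest fraction of total requests that any single shard must serve.
    static double maxShardShare(Map<String, Long> requests, int shards) {
        long[] load = new long[shards];
        long total = 0;
        for (Map.Entry<String, Long> e : requests.entrySet()) {
            load[shardOf(e.getKey(), shards)] += e.getValue();
            total += e.getValue();
        }
        long max = 0;
        for (long l : load) {
            max = Math.max(max, l);
        }
        return (double) max / total;
    }
}
```

Key splitting has a cost: every write must update all copies, and readers can briefly see different versions while an update is in flight. That is part of why this kind of work stays on the backlog.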
Maintenance windows are frequent and cool our caches. Amazon ElastiCache has a one-hour maintenance window every week. We know exactly when our maintenance windows are, and even if we forget, our dashboards remind us when nodes are rebooted for maintenance: cold caches, higher application server and database load, and elevated latencies for our customers. This is not much of a problem for caches that support short-duration events, such as a football game, but it is particularly disruptive for the majority of our app-based experiences, which our customers use throughout the year.
How does Momento Serverless Cache help?
By switching to Momento Serverless Cache, we addressed the operational issues we were having with our legacy caching cluster in the following ways.
Instant scaling to quickly address traffic spikes. Momento has no concept of provisioned capacity. Because it is a serverless service, the Momento team keeps warm capacity available behind the scenes, and that capacity can be added to our bursty caches instantly. We no longer wait minutes for new instances to be provisioned and added to our clusters.
Cache stays warm during scaling events. Momento scales up and down gracefully, warming up new nodes as they are introduced into the cluster. This allows us to sustain high cache hit rates and low latencies, especially for our most popular objects.
Intelligent scaling to handle hot keys. Momento samples for hot keys and automatically makes more copies of them, so we no longer have to worry about a hot key overwhelming a single node. Likewise, hot shards are no longer our problem: Momento automatically detects them, reshards, and gracefully warms up new capacity.
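We don't know the details of how Momento samples for hot keys. The minimal sketch below illustrates the general sampling idea and is not Momento's implementation. It counts a random fraction of requests per key, scales the counts back up to estimate true traffic, and flags keys whose estimate crosses a threshold. The class name, the 5% sample rate, and the threshold in the test are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.Set;
import java.util.stream.Collectors;

// Generic sampling-based hot key detector (illustrative only; not thread-safe).
final class HotKeySampler {
    private final double sampleRate;   // fraction of requests inspected, e.g. 0.05
    private final long hotThreshold;   // estimated requests per window that count as hot
    private final Random random;
    private final Map<String, Long> sampledCounts = new HashMap<>();

    HotKeySampler(double sampleRate, long hotThreshold, long seed) {
        this.sampleRate = sampleRate;
        this.hotThreshold = hotThreshold;
        this.random = new Random(seed);
    }

    // Called on every request; only a random sample is counted, keeping overhead low.
    void record(String key) {
        if (random.nextDouble() < sampleRate) {
            sampledCounts.merge(key, 1L, Long::sum);
        }
    }

    // Scales sampled counts back up to estimate true traffic per key.
    Set<String> hotKeys() {
        return sampledCounts.entrySet().stream()
                .filter(e -> e.getValue() / sampleRate >= hotThreshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Starts a new measurement window.
    void reset() {
        sampledCounts.clear();
    }
}
```

With sampling, each request costs only one random draw. The trade-off is that the estimates are reliable only for keys with substantial traffic, which are exactly the keys that threaten a node. A detected key could then be served from extra copies, as in the key-splitting idea.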
No planned downtime. Thanks to the cache-warming techniques the Momento team has deployed, upgrading software on a fleet is not much different from a scale-up or scale-down event. This lets the Momento team continuously patch and optimize the software behind the scenes without affecting our latencies, load, or cache hit rates.
How did it go?
When we first met with the Momento team, we spent a few minutes showing them our facade. We set a goal of minimizing application changes and asked for a “drop-in” solution to enhance our current caching setup. Much to our surprise, the solution they came up with could not have been a better fit. Together, we built support for Momento as a provider in our existing Spring Memcached facade. The facade lets developers use annotations to cache objects in Memcached, and through the Momento provider we were able to swap Memcached for Momento across all services that use it. The library is now available as open source under the Apache license.
Once we had the new Momento provider, we were able to quickly swap in Momento across multiple workflows and assess its performance and scale. On our workflows, Momento delivered 15% lower latencies than ElastiCache, and it can automatically handle approximately 12x our current peak load. It does this without wasteful overprovisioned infrastructure and without pulling our engineers into scaling work. Instant scaling for bursts, automatic handling of hot shards and hot keys, and the absence of maintenance windows dramatically improved our operational posture and our overall customer experience.
Momento may be a great serverless caching service, but the real winner here is facades. They let us quickly evaluate the performance and scale of a new service, make deployment painless, and make adoption a two-way door. With facades, we can easily roll back to ElastiCache Memcached if we change our minds, or move to a new service if a better one comes along.
A deeper dive into CBS Sports facades and Momento is coming soon.
Edwin Rivera is a Principal Architect at CBS Interactive.