July 26, 2022 - 7 Min Read

Finally, a serverless cache that delivers on the promise of the cloud era

It's time to demand more from your cache.

Alex DeBrie


It is an incredible time to be a developer.

The tools available to quickly build, scale, and operate a new application are miles ahead of what they were just a decade ago. Even for early adopters of the cloud, who were running most applications and infrastructure on self-managed EC2 instances, rapid spikes in traffic could lead to user downtime and a long night.

Today, we have AWS Lambda, a serverless, pay-per-use compute service that can scale to infinity and runs only when needed. We have serverless databases like Amazon DynamoDB and fully managed systems like Google Spanner and MongoDB Atlas that can dynamically handle whatever workloads we throw at them. We have CDNs like Cloudflare that can speed up response times and block DDoS attacks of over 17 million requests per second.

These services are available to any developer with a credit card and an idea. They provision in seconds and scale dynamically to an application’s needs. They take full advantage of the scalability and elasticity of the modern cloud environment.

And yet, somehow our caches are stuck in the 20th century. They’re slow to provision and require careful planning of cluster size or extreme overprovisioning to avoid incidents. They provide unpredictable tail latencies that lead to bad experiences for users or downtime for our applications. And they charge you full price for your instances even when your utilization is low.

This is why Momento built its serverless cache service. It’s a cache for the 21st century—for the serverless, cloud-enabled era—that doesn’t slow you down while you’re developing a new idea and holds up during peak demand.

Below, we’ll discuss problems with previous-generation caching options and how Momento solves them. But if you’re already sold, go ahead and get started with Momento now.

The autoscaling cache you need

Instance-based provisioning is one of the worst parts about legacy infrastructure.

With instance-based provisioning, you are responsible for picking and managing a specific virtual instance. You need to choose CPU, RAM, disk, and network configurations. You need to monitor low-level system metrics to ensure you’re not overloading the instance. You need to think about failovers, redundancies, and backups.

This is a datacenter with an API. It’s an improvement over running your own datacenter, but there’s still work to be done. You’re still susceptible to and responsible for incidents that can affect your specific instances and lead to downtime for your application.

Worst of all, you’ll face a tradeoff inherent to inelastic infrastructure. You either overprovision your cache to handle peak traffic and pay for it in your bill at the end of the month, or you underprovision to save a few dollars but leave your users to deal with the implications when your cache can’t keep up.

The modern infrastructure we love is not instance-based infrastructure. It’s resource-based infrastructure.

With resource-based infrastructure, your infrastructure provider is sharing a giant pool of resources across many customers in a particular datacenter. 

With AWS Lambda, this means a giant pool of worker instances that are hosting Lambda functions for customers. For Amazon DynamoDB, this means region-wide request router and metadata services, along with a pool of storage instances for managing customer data.

By sharing one pool of resources across customers, infrastructure providers are able to supply a level of dynamism and resiliency that is impossible with instance-based infrastructure. A single customer can’t send enough requests to take down the DynamoDB request router for a region, nor can a single customer overload the worker pool for AWS Lambda in a region.

Furthermore, by designing for a pool of shared resources up front, infrastructure providers have to plan for and handle failure in a different way. At that level of shared infrastructure, instance failure is a fact of life, and individual failures can affect multiple customers. Providers need to design for failover situations and API limits to prevent one customer from affecting another.

Momento is the first cache that uses resource-based infrastructure. This cloud-first design allows us to provide a dynamically scaling cache that doesn’t require pre-provisioning or coarse-grained “autoscaling” policies.

For you as a user, this means less time in Excel spreadsheets doing capacity planning based on clumsy load tests. It’s fewer outages based on an unpredicted spike in traffic. It’s a cache that just works, rather than one that gets in your way.

The pay-per-use cache pricing you want

Resource-based infrastructure not only means that Momento handles fluctuations in traffic better. It also means you get a pricing model that better aligns with the value you’re getting from your infrastructure.

With instance-based infrastructure, you pay for your provisioned instances no matter what. If it’s nighttime, your users are sleeping, and your utilization rate is in the single digits, you’ll still be paying the full rate for your instance. And because instance-based infrastructure is harder to scale up and down, you’ll likely spend a lot of time overprovisioned in anticipation of your traffic peaks.

With Momento’s resource-based infrastructure, you are charged based on the resources you actually use. Pricing is simple—you pay per GB of data that you write to or read from Momento. That’s it—no always-on monthly cost, no data tiering calculations, and no memory limits.
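To make the difference between the two billing models concrete, here is a rough sketch in Python. Every rate and volume in it is a made-up number chosen for illustration, not Momento’s or any other provider’s actual pricing, and the function names are hypothetical.

```python
HOURS_PER_MONTH = 730  # common approximation for one month of always-on time


def instance_monthly_cost(hourly_rate: float, instance_count: int) -> float:
    """Instance-based billing: every provisioned hour is billed, busy or idle."""
    return hourly_rate * instance_count * HOURS_PER_MONTH


def usage_monthly_cost(price_per_gb: float, gb_transferred: float) -> float:
    """Usage-based billing: only data written to or read from the cache is billed."""
    return price_per_gb * gb_transferred


# Hypothetical inputs: three cache nodes sized for peak traffic,
# versus a per-GB rate applied to a quiet month and a busy month.
provisioned = instance_monthly_cost(hourly_rate=0.25, instance_count=3)
quiet_month = usage_monthly_cost(price_per_gb=0.20, gb_transferred=100)
busy_month = usage_monthly_cost(price_per_gb=0.20, gb_transferred=2000)

print(f"instance-based, any month:  ${provisioned:,.2f}")
print(f"usage-based, quiet month:   ${quiet_month:,.2f}")
print(f"usage-based, busy month:    ${busy_month:,.2f}")
```

The point is the shape of the two bills rather than the specific totals: the instance-based bill is identical in quiet and busy months, while the usage-based bill tracks the traffic you actually served.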

This pricing model makes a tighter connection between the value you get and the price you pay. It also makes it easy to have Momento Caches in staging and developer environments, as you won’t be paying for always-on infrastructure in ephemeral test environments.

The unparalleled cache performance your clients deserve

Billing, scaling, and management are important, but caches are ultimately about performance. 

When you add a cache to an application, you want ultra-low latency. You want it when your application needs it the most: at periods of high traffic. And you want predictability and consistency so you can plan effectively.

Momento shines here as well. In thinking about performance testing, they’ve focused on two key metrics:

  • Throughput, measured in terms of requests per second (RPS); and
  • Client-side latency, measured in terms of the number of milliseconds (ms) to receive a response. 

When thinking about latency, Momento focuses on client-side latency (how long it takes the client to receive a response), as server-side latencies miss network variability and the impact of cross-AZ traffic. The network and connection semantics are an unavoidable part of accessing your data, so it’s best to see how quickly your cached data will be available in your application.
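Measuring client-side latency simply means starting the clock before the request leaves your application and stopping it once the response is in hand. Here is a minimal sketch; `fake_cache_get` is a hypothetical stand-in that sleeps for about a millisecond, not a real cache client, and you would swap in your own request function.

```python
import time
from typing import Callable, List


def measure_client_latency_ms(request: Callable[[], object], samples: int) -> List[float]:
    """Time each call from the caller's side, in milliseconds.

    Because the timer wraps the whole call, the result includes network
    and connection overhead, not just the server's processing time.
    """
    latencies = []
    for _ in range(samples):
        start = time.perf_counter()
        request()
        latencies.append((time.perf_counter() - start) * 1000.0)
    return latencies


def fake_cache_get() -> str:
    """Hypothetical stand-in for a cache read; replace with a real client call."""
    time.sleep(0.001)
    return "value"


latencies = measure_client_latency_ms(fake_cache_get, samples=20)
print(f"slowest of {len(latencies)} requests: {max(latencies):.2f} ms")
```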

Additionally, latency is a distribution, and metrics are generally reported at a specific percentile. Momento strongly believes that tail latencies matter and thus focuses on “p999” latency: the 99.9th percentile of your latency measurements. In other words, only 0.1% of requests will be slower than your p999 latency. Note that this measures individual requests to your cache. If a single web request results in multiple requests to your cache, more than 0.1% of your web requests, and therefore your users, will be affected.
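A quick sketch shows why the tail deserves attention. It uses synthetic latencies (not measurements from any real cache) and assumes that cache calls within one web request are slow independently of each other, which is a simplification. The helper name `fraction_hitting_tail` is hypothetical.

```python
import statistics

# Synthetic data: 9,980 fast responses at 1 ms and 20 slow ones at 50 ms.
samples_ms = [1.0] * 9980 + [50.0] * 20

median_ms = statistics.median(samples_ms)
mean_ms = statistics.mean(samples_ms)
# quantiles(n=1000) returns 999 cut points; the last one is the 99.9th percentile.
p999_ms = statistics.quantiles(samples_ms, n=1000)[-1]

print(f"median: {median_ms:.2f} ms, mean: {mean_ms:.2f} ms, p999: {p999_ms:.2f} ms")


def fraction_hitting_tail(cache_calls_per_request: int, tail_fraction: float = 0.001) -> float:
    """Chance that at least one of N independent cache calls lands in the slow tail."""
    return 1.0 - (1.0 - tail_fraction) ** cache_calls_per_request


for calls in (1, 10, 100):
    print(f"{calls:>3} cache calls per web request: "
          f"{fraction_hitting_tail(calls):.2%} of web requests see a tail-latency call")
```

The median and mean both look healthy here, while the p999 exposes the slow responses. Under the independence assumption, a web request that makes 10 cache calls has roughly a 1% chance of hitting at least one slower-than-p999 response, and one that makes 100 calls has about a 9.5% chance.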

Momento customers regularly load test the service and consistently report sub-5ms p999 latencies at the client. Recently, the team completed a routine soak test (500K RPS load) with 2.5ms p99 latencies at the client.

Bottom line: Momento is ready and able to handle your caching needs at a scale that other caches have a hard time matching.


The Momento team loves how infrastructure options have progressed over the past decade, and they want your cache infrastructure to catch up. They’ve built a cache that’s specifically designed for the unique properties of the cloud rather than for instance-based, self-hosted infrastructure. The result is a dynamic cache with no pre-provisioning of instances, pay-per-use pricing, and unparalleled performance.

But don’t take my word for it—try it out yourself. You can provision a Momento Cache in seconds, and you can integrate with your application with just 5 lines of code.

For more serverless content, developer perspectives, customer stories, and product updates, follow Momento on Twitter and LinkedIn—or start a conversation on Discord!

Alex DeBrie is a consultant focused on helping people using cutting-edge, cloud-native technologies. He specializes in serverless technologies on AWS, including DynamoDB, Lambda, API Gateway, and more.