This post is the fourth entry in a series detailing how Momento thinks about our developer experience. Getting started with our serverless cache is shockingly simple, but that is only half of the story.
We believe a delightful user experience includes writing and running code that interacts with your cache, and making our client libraries shockingly simple takes deliberate effort. To say it another way: our cache clients are a first-class concern.
If you’re up for some nerdery related to gRPC clients and identifying performance bottlenecks, read on! However… if you are MORE interested in just getting up and running with a free serverless cache in mere minutes… you should feel free to forego my nerdery and get straight to the fun part. Visit our Getting Started guide and we’ll have you set up with a free cache in a jiffy. Then head to our GitHub org to download the Momento client for your favorite language!
Last time, on Shockingly Simple
In case you haven’t read the previous posts in this series, here’s a recap of season 1:
- Momento clients are built using gRPC, a high-performance framework for remote procedure calls, created by Google.
- Therefore, gRPC client tuning makes up a big chunk of the work for tuning each Momento client.
- Managing the number of concurrent requests in flight at one time on a single client node has proven to be one of the best ways to maximize throughput and avoid CPU thrashing.
- In Python, where the gRPC layer is written in C, there are usually no meaningful performance benefits to using more than one gRPC channel.
Tuning the Momento .NET cache client
As we did in the tuning work for the previous clients, we start off with the super naive approach of just spamming way too many concurrent requests by spawning a bunch of C# async Tasks in a loop.
Why? Because it’s fun!
This led to our first interesting observation!
Not all gRPC libraries are created equal
This is not the case in C#! The .NET gRPC library is a good deal more sophisticated in how it manages connections to the server.
So, if I ran my code in a configuration that produced 100 or fewer concurrent requests, I would observe exactly one connection open via lsof. But if I increased it to 101 concurrent requests, lsof would show a second connection magically opened for me by the .NET gRPC library! At 201 requests, a third connection appeared, at 301 requests a fourth connection appeared, and so on.
This is great! Or is it?
This behavior is really helpful… to a point. It’s wonderful not to have to manage creating and destroying those connections explicitly, and especially wonderful not to have to worry about how to route requests to one of the open connections. But there is a down side…
This implementation meant that in my initial pathological code where I was bombarding the client with 5,000 concurrent requests at all time, the gRPC library was opening as many as 50 connections to the server. There is a point after which having so many connections open and so many requests on the wire yields diminishing returns, and the client’s overhead of dealing with all of those channels actually reduces performance and throughput.
Not to mention that it forces the server to deal with a bunch of extra connections that aren’t providing any value. One client with 50 connections open isn’t a big deal, but if the ideal number of connections is 5 or fewer, and every client is potentially opening 10 times that many, that just bogs everything down for everyone, for no good reason.
So the next step was to figure out how many connections (and concurrent requests) gave the best performance.
Laptop environment: Varying the number of concurrent requests
Here are some graphs showing the performance we got from running tests on a laptop with varying numbers of concurrent requests.
It’s clear that we get diminishing returns well before the 5,000 mark, and likewise that there is no good reason to ever allow the client to create so many connections in this environment.
To address that, we added some code to the Momento .NET cache client that would allow us to configure an internal limit for the number of concurrent requests in flight. We did this via .NET’s usual async Task API, so it’s invisible to you as a user of the Momento client, but it gives us the ability to cap the maximum number of connections that the .NET gRPC library creates to the server.
Even with that code change in place, there’s nothing we can do to prevent user code from issuing 5,000 concurrent requests against the Momento client; we can limit how many are on the wire at once, but not how many async calls are issued. Interestingly enough, running the test code again with 5,000 concurrent async calls, but with the Momento SDK limiting us to 100 concurrent requests on the wire at once, we still get much more consistent throughput and latency numbers due to less context switching. So capping the number of connections is a win all around.
Observations on throughput and resource utilization
Moving from the laptop to the cloud: Tuning goals
Alright, the laptop environment is interesting, but now let’s move to a more realistic production environment: an EC2 instance running in the same region as the Momento service. This should eliminate most of the network latency, so we expect the client-side latency numbers to look much closer to the server-side numbers. (We use a c6i.4xlarge instance type because we have observed more consistent network performance in this class than with smaller instances.)
We’ll start by establishing some latency goals for our tuning:
- Client-side p999 latency of 20ms: This is a reasonable target for applications which are caching data that is extremely expensive to compute (e.g. a big JOIN query in a complex relational database schema) and when the application’s own latency requirements are lenient.
- Client-side p999 latency of 5ms: This is a better target for an application whose own latency requirements are absolutely critical.
Cloud environment: Revisiting the number of concurrent requests
Here are some graphs showing the performance of the client in the cloud environment given various numbers of concurrent requests:
From these charts we can see we are able to hit our first latency target (p999 latency less than 20ms) with 200 concurrent requests.
For an application whose latency requirements are more stringent, we are able to hit our target of a p999 latency of less than 5ms with 25 concurrent requests. Our throughput goes down from about 26,000 to about 11,000, but this will still be the right trade-off for environments that are extremely latency-sensitive.
- Python (with uvloop): 9,100 requests per second
- .NET: 26,000 requests per second
With this data in hand we were able to establish good default values for configuring the Momento client for various environments and workloads. In the Momento .NET SDK, you can now choose between these pre-built configurations: Configurations.Laptop, Configurations.InRegion.Default, and Configurations.InRegion.LowLatency. Of course, you can always customize the configuration for your specific needs, but we hope to provide pre-built configs that will cover the 90% cases so that you can spend your time working on things that are unique to your business.
If you try out our .NET cache client and find any situations where the pre-built configs don’t give you what you need, please file a GitHub issue or reach out to us and let us know! We’d love to fill in whatever gaps are missing.
We enjoy pulling back the curtain a bit to give a glimpse into how we are approaching our client software—taking it just as seriously as our server products. But this is all just for fun and insight, and our position is that you shouldn’t actually need to worry about this stuff with Momento.