Prefill and Decode Want Different Chips. The Economics Finally Agree.

Splitting inference across specialized hardware can cut costs dramatically, but making it work in production depends on better scheduling and data movement.
Why Scaling Looks Different at Uber, Apple, and Mercado Libre

Three companies, three scaling challenges, three completely different solutions. Why the best scaling advice from Uber, Apple, and Mercado Libre might not work for you, and what to look for instead.
Reduce TTFT by >50% with LMCache + Momento Accelerator

How distributed KV caching with LMCache and Momento Accelerator enables unified access to remote token storage, improving inference efficiency at scale.
What Is Real-Time Data Processing?

Learn what real-time data processing is and why it is mission-critical for businesses.
How we turned up the heat on Node.js Lambda cold starts

We reduced a customer’s Lambda cold starts by 90%, and then did the same for ourselves!
Quick Primer on ElastiCache Redis Maintenance Windows

When is your window?
Turbocharging Pelikan Cache on Google Cloud’s latest Arm-based T2A VMs

Momento exceeded throughput and latency goals for its serverless cache by 25% with Google’s latest Arm-based T2A VMs.
Oops, Momento ate 98% of my GCP Cloud Run and Firestore latencies!

Serverless caching makes it easy to reduce your Cloud Run and Firestore latencies.
Shockingly simple: Tuning the Momento JavaScript cache client

We optimized our Node.js gRPC client so you don’t have to.
Faster APIs, faster developers: API Gateway custom authorizers

How to add a custom authorizer and reduce API latencies with remote caching.
Oops, Momento ate 60% of my Lambda latencies!

Accelerate your app in minutes with a serverless cache.
Making Pelikan fly on Arm: Diving deeper into our adventures with Tau T2A VMs

Discover how Momento tripled throughput on Google’s Arm-based VMs with simple optimizations and zero code changes.