Split illustration showing Mo struggling to carry a chaotic pile of uneven blocks on the left, contrasted with Mo calmly organizing blocks into neat, separate lanes on the right.

Disaggregated Inference, Part 1: When & Where to Route

Hien Luu

The Concurrency Cliff is a Memory Limit

Khawaja Shams

Your KV Cache Benchmark Is “hi hi hi”

Khawaja Shams

vLLM’s Hash Chain, SGLang’s Radix Tree

Khawaja Shams

Disaggregation Makes KV Cache a System Primitive

Khawaja Shams

KV Caching Pays Off Under Load

Khawaja Shams

Beyond the Goals, Three Ways Momento Scales the Football World Cup in Real Time

A New Live Streaming Origin Built for Global Scale

Introducing valkey-lab: Stop Guessing When Your Cache Hits Its Limit

Khawaja Shams

Why Snap Was Willing to Fork, and Why They Still Came Back

Why Large Payloads Break Caches at Scale

Disaggregated LLM Inference, Part 3: Why Your Networking Stack May Not Be Ready

Hien Luu

Disaggregated Inference, Part 2: Moving the KV Cache Without Stalling the Decode

Hien Luu

The Snowflake Moment for Inference

Khawaja Shams

Disaggregated Inference, Part 1: When & Where to Route

Hien Luu

Prefill and Decode Want Different Chips. The Economics Finally Agree.

Hien Luu

1-Bit Models Just Moved the Pareto Frontier

Khawaja Shams
Hien Luu

Your AI Remembers Everything Except the Thing You Keep Telling It

KV Cache Isn’t a Caching Problem

The Rise of the Internal Cache Platform

A Roadmap for KV Cache Offloading at Scale