If you run an OTT service, you probably obsess over picture quality, rebuffering rates, and the next rights deal. But there’s another metric quietly eating your margin: how many people are really watching your content right now on a single set of credentials. By ‘people,’ I’m including the password-sharers, bot-scrapers, and multi-device power users who don’t show up on your subscriber growth charts but definitely show up on your server load.
This is the domain of concurrency tracking. And when done right, concurrency tracking is one of the most practical tools you have to fight a rapidly growing problem: CDN leeching.
In this post, we’ll unpack what CDN leeching is, why it’s worse than “just” password sharing with friends and family, and how a real‑time concurrency service backed by a low‑latency data plane like Momento can help you shut it down, without breaking your user experience.
CDN leeching: piracy powered by your infrastructure
At a high level, CDN leeching is piracy that runs on your dime. Instead of spinning up their own infrastructure, pirates:
- Obtain valid URLs and decryption keys (often via stolen or “resold” legitimate accounts).
- Automate access with bots and restreaming platforms.
- Drive large numbers of unauthorized viewers through your CDN endpoints.
From the CDN’s perspective, all of this traffic looks legitimate. Tokens validate. Segments are cached. Edge nodes happily serve gigabits of premium content… to viewers who aren’t paying for your service!
According to analyst firm Kearney, online video piracy is already costing the global media industry around $75 billion annually, with losses projected to reach $125 billion by 2028, and CDN leeching is one of the techniques feeding that growth.
To make matters worse, modern pirate services look and feel like real streaming platforms: slick UX, login systems, promo codes, “support” channels. But behind the scenes, they’re often just reselling your content and your infrastructure capacity.
Why CDN leeching hurts OTT businesses so much
CDN leeching is brutal because it attacks you in three dimensions at once:
- Lost subscriptions and ARPU
When a “vampire” service offers your full catalog for a fraction of your price, backed by your own CDN quality, some users will choose the pirate instead of your subscription. You lose revenue, while they pocket it.
- Inflated CDN and infrastructure bills
Every unauthorized stream inflates:
  - CDN egress and HTTP request costs
  - Origin load (for cache misses)
  - Downstream services (authentication, recommendations, analytics…)
- Degraded quality of experience
As a direct consequence, your legitimate paying users risk seeing:
  - More buffering
  - Lower renditions / ABR downshifts
  - Timeouts during big live events
In the worst case, leeching can overload specific streams or regions, causing visible service degradation right when you can least afford it: during tentpole sports, premieres, or live concerts. The net effect: you’re paying more, earning less, and frustrating the customers you care about most.
DRM alone isn’t enough
Most premium OTT providers already use Digital Rights Management (DRM), secure players, and tokenized URLs. That’s necessary, but increasingly not sufficient.
CDN leeching thrives in the gaps between those layers:
- Once a token is valid, the CDN trusts it. If that token is reused by an entire pirate service, the CDN will serve them content until the token expires.
- DRM protects bits, not business rules. It ensures only authorized devices can decrypt content, but it doesn’t inherently enforce “only two concurrent streams per household” or “only this geography.”
- Credential sharing is a feature pirates exploit. A single legitimate account can be resold or shared widely, and as long as each device passes DRM checks independently, your infrastructure can’t distinguish “family account” from “pirate reseller.”
This is where server‑side concurrency enforcement becomes powerful. Instead of just asking “is this device allowed to decrypt this segment?”, you also ask “should this account still be allowed access to the content, given everything else it’s doing right now?”
Concurrency tracking as an anti‑leeching control plane
Think of a concurrency service as your real‑time policy engine for “who gets to keep watching.” It doesn’t replace DRM, tokenization, or watermarking: it orchestrates them.
At a high level, an effective architecture looks like this:
- A player requests a segment (HLS or DASH) for a given piece of content.
- Your API gateway or edge function authenticates the request (JWT, OAuth, session cookie, etc.).
- Before issuing or honoring the signed CDN URL, your backend calls a concurrency service:
  - Inputs: account_id, content_id, device_id (or fingerprint), possibly ip/geography, and segment_id or timestamp.
  - Logic: increment a counter, validate thresholds, and return an allow/deny decision in milliseconds.
- If allowed, the service records the session (or segment‑level entry) with a tight TTL and returns a valid signed URL / 200 OK.
- If denied, the backend refuses the request and signals the player to stop (e.g., custom error code, UX message: “Too many devices streaming at once”) and can optionally record telemetry for fraud analytics or customer support.
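To make that flow concrete, here is a minimal Python sketch of the allow/deny step. It is an illustration under stated assumptions, not a production design: the ConcurrencyGate class, the handle_segment_request function, the sign_url callback, the limit of two streams, the 20-second TTL, and the 429 status code are all hypothetical, and a plain dictionary stands in for the low‑latency cache.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class ConcurrencyGate:
    """In-memory stand-in for the concurrency service (all names hypothetical)."""

    max_streams: int = 2           # assumed per-account limit
    ttl_seconds: float = 20.0      # assumed TTL, refreshed by every allowed request
    clock: Callable[[], float] = time.monotonic
    # account_id -> {(device_id, content_id): expiry time}
    _sessions: Dict[str, Dict[Tuple[str, str], float]] = field(default_factory=dict)

    def check(self, account_id: str, content_id: str, device_id: str) -> Tuple[bool, str]:
        now = self.clock()
        active = self._sessions.setdefault(account_id, {})
        # Forget sessions whose TTL has lapsed.
        for key in [k for k, expiry in active.items() if expiry <= now]:
            del active[key]
        session = (device_id, content_id)
        if session not in active and len(active) >= self.max_streams:
            return False, "too_many_streams"
        active[session] = now + self.ttl_seconds  # record or refresh the session
        return True, "ok"


def handle_segment_request(
    gate: ConcurrencyGate,
    account_id: str,
    content_id: str,
    device_id: str,
    sign_url: Callable[[str], str],
) -> Tuple[int, dict]:
    """Called after authentication, before a signed CDN URL is issued."""
    allowed, reason = gate.check(account_id, content_id, device_id)
    if not allowed:
        # The status code and payload shape are assumptions for this sketch.
        return 429, {"error": reason, "message": "Too many devices streaming at once"}
    return 200, {"url": sign_url(content_id)}
```

Because every allowed request refreshes the TTL, a session that stops fetching segments simply ages out, and the slot frees itself without any explicit "stop" call from the player.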
To be effective, this concurrency check must run inline, on every request or on every few segments, without becoming a bottleneck. For a large service operator, that means handling millions of operations per second at single-digit millisecond latencies.
The key properties are:
- Real time: Decisions are made in milliseconds, not minutes. When a pirate spins up 200 devices on one stolen account, you detect and clamp down almost immediately.
- Globally consistent enough: You’re not trying to maintain banking‑grade ACID semantics. You’re enforcing policy over short sliding windows, which is a perfect match for fast, ephemeral, in‑memory data.
- Entirely server‑side: No client modifications are required beyond normal auth. Pirates can’t bypass the check without also finding a way to bypass your API.
Modeling concurrency: from accounts to segments
There are many ways to model concurrency; the right choice depends on your product and risk tolerance.
Per‑account, per‑content concurrency
A common pattern:
- Primary key: account_id
- Secondary dimensions in the value: content_id, device_id, session_id, timestamp
You maintain an in‑memory structure that tracks active sessions for that account:
- Max total streams per account: e.g., 2, 4, or 10 depending on tier.
- Max streams per piece of content: e.g., prevent restreaming a single live sports match out to hundreds of devices on the same account.
- Optional per‑device or per‑IP caps.
On each check, you:
- Drop expired entries (based on last segment timestamp).
- Count active sessions according to your rules.
- Decide whether to admit a new session or segment request.
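The three steps above might look like the following Python sketch. Everything here is an assumption made for illustration: the AccountConcurrency and Session names, the caps of four streams per account and two per title, and the 30-second idle timeout. It also assumes that one session_id identifies one playback of one title.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Session:
    content_id: str
    device_id: str
    last_seen: float  # time of the most recent segment request


class AccountConcurrency:
    """Tracks active sessions per account in memory (hypothetical sketch)."""

    def __init__(
        self,
        max_total: int = 4,
        max_per_content: int = 2,
        idle_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_total = max_total
        self.max_per_content = max_per_content
        self.idle_timeout = idle_timeout
        self.clock = clock
        # account_id -> {session_id: Session}
        self._by_account: Dict[str, Dict[str, Session]] = {}

    def admit(self, account_id: str, session_id: str, content_id: str, device_id: str) -> bool:
        now = self.clock()
        sessions = self._by_account.setdefault(account_id, {})

        # 1. Drop expired entries, based on the last segment timestamp.
        stale = [sid for sid, s in sessions.items() if now - s.last_seen > self.idle_timeout]
        for sid in stale:
            del sessions[sid]

        # A session that is already counted only needs its timestamp refreshed.
        if session_id in sessions:
            sessions[session_id].last_seen = now
            return True

        # 2. Count active sessions according to the rules.
        total = len(sessions)
        same_content = sum(1 for s in sessions.values() if s.content_id == content_id)

        # 3. Decide whether to admit the new session.
        if total >= self.max_total or same_content >= self.max_per_content:
            return False
        sessions[session_id] = Session(content_id, device_id, now)
        return True
```

Per-device or per-IP caps would follow the same shape: one more count in step 2 and one more condition in step 3.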
Segment‑level granularity
For more surgical control, for example when you operate high-profile live events such as major sports competitions, you can push enforcement down to the segment:
- Key: account_id:content_id:segment_sequence
- Value: set of device_ids currently fetching that segment
- TTL: longer than the segment duration, e.g., 2 or 3 times the length of a segment.
This lets you detect anomalous patterns like:
- Hundreds of devices fetching the same segment within a tiny window using the same account.
- A sudden explosion of geographically dispersed IP addresses all requesting identical content immediately after a legitimate login.
Because each segment key naturally expires, the system self‑garbage‑collects: you’re always enforcing concurrency over “what’s happening right now,” not trying to maintain a giant historical database.
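A rough Python sketch of this segment-level model follows. The SegmentTracker name, the 6-second segment length, the TTL factor of 3, and the cap of two devices per segment are assumptions chosen for illustration. The manual expiry sweep only exists because a dictionary stands in for the cache here; a cache with native TTLs would drop those keys on its own.

```python
import time
from typing import Callable, Dict, Set, Tuple


class SegmentTracker:
    """Per-segment device sets with TTL expiry (hypothetical in-memory sketch)."""

    def __init__(
        self,
        segment_seconds: float = 6.0,
        ttl_factor: float = 3.0,
        max_devices: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.ttl = segment_seconds * ttl_factor
        self.max_devices = max_devices
        self.clock = clock
        # "account_id:content_id:segment_sequence" -> (expires_at, device_ids)
        self._entries: Dict[str, Tuple[float, Set[str]]] = {}

    def record(self, account_id: str, content_id: str, segment_sequence: int, device_id: str) -> bool:
        now = self.clock()
        # Emulate TTL expiry; a real cache would do this without a sweep.
        for key in [k for k, (expires_at, _) in self._entries.items() if expires_at <= now]:
            del self._entries[key]

        key = f"{account_id}:{content_id}:{segment_sequence}"
        _, devices = self._entries.setdefault(key, (now + self.ttl, set()))
        if device_id in devices:
            return True   # retry by a device that is already counted
        if len(devices) >= self.max_devices:
            return False  # too many devices on this segment for one account
        devices.add(device_id)
        return True
```

One consequence of this design is worth noting: each segment starts with an empty set, so the cap applies to whichever devices request that segment first. Pairing it with the per-account session model keeps the same devices admitted from one segment to the next.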
Why does a millisecond‑grade cache matter?
Concurrency enforcement only works if it’s fast enough to sit in the hot path.
An OTT provider handling millions of requests per second cannot afford:
- Round trips to distant databases for every segment request.
- Heavy joins or complex transactions just to say “yes/no” on a stream.
A low‑latency data plane like Momento is a good fit because:
- Throughput: It’s built to sustain millions of operations per second across regions.
- Latency: Single‑digit millisecond p99s keep you comfortably under your segment budget.
- Ephemerality as a feature: You want the concurrency state to be short‑lived and automatically expiring. TTL‑driven data models avoid manual cleanup and keep costs predictable.
- Simplicity for app teams: Concurrency checks look like simple GET/SET/INCR patterns rather than bespoke, hand‑rolled distributed systems.
In other words, you can treat concurrency as a straightforward service call, not a brand‑new platform you have to build and maintain.
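As a sketch of how small such a check can be, the Python below builds a per-account request budget on top of a single increment-with-TTL operation. The TtlCache class is a tiny in-memory stand-in written for this example and is not the API of Momento or any other real client; the key format, window size, and limits are likewise assumptions. The idea: N concurrent streams request roughly N segments per segment duration, so counting segment requests per time window approximates a concurrency cap.

```python
from typing import Dict, Tuple


class TtlCache:
    """In-memory stand-in for a remote cache. NOT a real client API."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[float, int]] = {}  # key -> (expires_at, value)

    def increment(self, key: str, ttl_seconds: float, now: float) -> int:
        """INCR-style operation: create the key with a TTL, or bump its counter."""
        expires_at, value = self._data.get(key, (0.0, 0))
        if expires_at <= now:
            # Key is missing or expired: start a fresh counter.
            expires_at, value = now + ttl_seconds, 0
        value += 1
        self._data[key] = (expires_at, value)
        return value


def allow_segment(
    cache: TtlCache,
    account_id: str,
    now: float,
    max_streams: int = 2,
    segment_seconds: float = 6.0,
    window_seconds: float = 30.0,
) -> bool:
    """Allow the request while the account stays within its window budget."""
    window = int(now // window_seconds)
    # Requests that max_streams players would make in one window.
    budget = int(max_streams * window_seconds / segment_seconds)
    key = f"segreq:{account_id}:{window}"
    count = cache.increment(key, ttl_seconds=2 * window_seconds, now=now)
    return count <= budget
```

With the defaults, two streams of 6-second segments give a budget of 10 requests per 30-second window. The eleventh request in the same window is refused, and old window keys expire on their own.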
How concurrency tracking helps fight CDN leeching specifically
So how does all of this tie back to CDN leeching? CDN leeching relies on a few core assumptions:
- Once a token is valid, the CDN will serve anyone who can present it.
- Operators won’t notice if bandwidth usage and license counts drift apart.
- Credential sharing is effectively unbounded until a human intervenes.
A robust concurrency service breaks these assumptions:
- Hard caps on concurrent usage per account. If a pirate tries to resell one account to hundreds of users, they’ll quickly hit concurrency limits. Your backend will start rejecting their segment requests, and their “service” becomes unusable.
- Anomaly detection at the account and content level. Because you aggregate real‑time usage by account, content, and device, you can spot suspicious patterns—like one account watching 40 streams of the same live match from dozens of countries—and feed that data to your anti‑piracy or fraud systems.
- Alignment between licenses and bandwidth. When you couple DRM license issuance with concurrency tracking, you can enforce “no segment without an active, counted session.” That shrinks the gap pirates exploit when they replay keys or tokens beyond their intended scope.
- Low blast radius. Instead of globally blacklisting an IP range or tearing down a CDN endpoint—both risky moves during big events—you can surgically terminate abusive accounts and their associated streams, while legitimate users continue unaffected.
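The anomaly-detection point can be illustrated with a short Python sketch that aggregates recent stream events by account and content, then flags pairs with too many distinct devices or countries. The StreamEvent fields, the suspicious_accounts function, and both thresholds are hypothetical, and a real system would feed the output to fraud tooling rather than act on it blindly.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, NamedTuple, Set, Tuple


class StreamEvent(NamedTuple):
    account_id: str
    content_id: str
    device_id: str
    country: str


def suspicious_accounts(
    events: Iterable[StreamEvent],
    max_devices: int = 4,
    max_countries: int = 2,
) -> List[Tuple[str, str, int, int]]:
    """Return (account_id, content_id, device_count, country_count) for outliers."""
    devices: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    countries: DefaultDict[Tuple[str, str], Set[str]] = defaultdict(set)
    for event in events:
        key = (event.account_id, event.content_id)
        devices[key].add(event.device_id)
        countries[key].add(event.country)

    flagged = []
    for key, device_ids in devices.items():
        device_count = len(device_ids)
        country_count = len(countries[key])
        # Thresholds are assumptions; tune them per subscription tier.
        if device_count > max_devices or country_count > max_countries:
            flagged.append((key[0], key[1], device_count, country_count))
    return sorted(flagged)
```

Feeding it only the events from the current TTL window keeps the check aligned with the "what's happening right now" model used for enforcement.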
Put simply: CDN leeching turns your service into a free CDN for pirates. Concurrency tracking gives you a kill switch that operates at the exact granularity you care about: per account, per title, or per segment.
Concurrency tracking as part of a layered defense
Concurrency enforcement is not a silver bullet; it’s a high‑leverage component in a multi‑layered strategy that also includes:
- Strong DRM implementations and regular key rotation.
- Short‑lived, tightly scoped CDN tokens.
- Application hardening to resist reverse engineering.
- Forensic watermarking to trace leaks.
- AI‑driven anomaly detection.
- Legal takedowns and enforcement.
But it has a unique advantage: it is entirely under your control, implemented on your servers, and it directly governs whether a given request gets content.
With a low‑latency, highly scalable data plane like Momento behind it, a concurrency service can make those decisions in real time, at massive scale, without complicating your player or bloating your infrastructure.
Lionel Bringuier

