system design · system-design
Design a Distributed Rate Limiter
Token bucket, sliding window, Redis-backed. Asked at Meta E5+, Amazon, Stripe, Cloudflare.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
Limit users to N requests per second. Single-node trivial; distributed requires shared counter that survives concurrent updates without becoming bottleneck.
Token Bucket: bucket holds N tokens; refilled R/sec; request consumes 1. Implemented in Redis via Lua script for atomicity: GET state, compute refill delta, decrement, SET. Sliding Window: count requests in last second window; trades memory for accuracy. Sharded by user/IP key; failure mode = fail-open or fail-closed (deliberate choice).
When to use
API quotas, abuse prevention, fair-use throttling, login attempt limits.
When not to
Tiny scale (in-memory map). Strict SLA needs (hard limit at gateway, not app).
flowchart LR
Client([Client]) --> GW[API Gateway]
GW --> RL{Rate Limit Check}
RL -->|Lua atomic| Redis[(Redis · token bucket)]
RL -->|allow| App[App]
RL -->|deny| Block[429 Too Many Requests]Key insights
- Lua script atomicity is the trick, without it, race conditions allow over-limit.
- Token bucket allows bursts up to bucket size; leaky bucket smooths.
- Sliding window counter (e.g. Cloudflare's "5-min count + interpolation") balances accuracy + memory.
- Fail-open during Redis outage if service availability matters more than strict limit.
- Layer at edge (CDN/gateway) cheapest; app-layer for granular per-resource limits.