system design · system-design

Design a Distributed Rate Limiter

Token bucket, sliding window, Redis-backed. Asked at Meta E5+, Amazon, Stripe, Cloudflare.

medium3hgeneralredissystem-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

Limit users to N requests per second. Single-node trivial; distributed requires shared counter that survives concurrent updates without becoming bottleneck.

Token Bucket: bucket holds N tokens; refilled R/sec; request consumes 1. Implemented in Redis via Lua script for atomicity: GET state, compute refill delta, decrement, SET. Sliding Window: count requests in last second window; trades memory for accuracy. Sharded by user/IP key; failure mode = fail-open or fail-closed (deliberate choice).

When to use

API quotas, abuse prevention, fair-use throttling, login attempt limits.

When not to

Tiny scale (in-memory map). Strict SLA needs (hard limit at gateway, not app).

flowchart LR
  Client([Client]) --> GW[API Gateway]
  GW --> RL{Rate Limit Check}
  RL -->|Lua atomic| Redis[(Redis · token bucket)]
  RL -->|allow| App[App]
  RL -->|deny| Block[429 Too Many Requests]

Key insights

Lua script atomicity is the trick, without it, race conditions allow over-limit.
Token bucket allows bursts up to bucket size; leaky bucket smooths.
Sliding window counter (e.g. Cloudflare's "5-min count + interpolation") balances accuracy + memory.
Fail-open during Redis outage if service availability matters more than strict limit.
Layer at edge (CDN/gateway) cheapest; app-layer for granular per-resource limits.