Design a URL Shortener (TinyURL / bit.ly)

Classic system-design starter. Tests capacity estimation, ID generation, read-heavy storage, caching, and analytics. Asked at Amazon, Microsoft, Uber, LinkedIn.

medium · 3h · general · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

A URL shortener is a glorified key-value store with two extra problems: choosing keys that are short, unique, and unguessable; and serving redirects at very low p99 latency. Nearly every component is read-optimized because reads outnumber writes by ~100:1.

Three layers: (1) Write path: generate a short code and persist the mapping short_code → long_url, plus a reverse index long_url → short_code (for example, so a repeat submission can reuse its existing code). (2) Read path: look up short_code → long_url, served from cache and falling back to the durable store on a miss. (3) Analytics path: an async event pipeline records click metadata without blocking the redirect.
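The three paths can be sketched in a few lines. This is a minimal illustration, not a production design: the `Shortener` class and its attribute names are hypothetical, plain dicts stand in for the cache and the durable store, `queue.Queue` stands in for the event bus, and the hex-of-a-counter code is only a placeholder for real ID generation.

```python
import itertools
import queue


class Shortener:
    """Toy model of the write, read, and analytics paths (all names are illustrative)."""

    def __init__(self):
        self.store = {}                 # durable-store stand-in: short_code -> long_url
        self.reverse = {}               # reverse index: long_url -> short_code
        self.cache = {}                 # cache stand-in: short_code -> long_url
        self.events = queue.Queue()     # event-bus stand-in (unbounded)
        self._counter = itertools.count(1)

    def shorten(self, long_url):
        """Write path: reuse an existing code or mint, persist, and prime the cache."""
        existing = self.reverse.get(long_url)
        if existing is not None:
            return existing
        code = format(next(self._counter), "x")  # placeholder code generation
        self.store[code] = long_url
        self.reverse[long_url] = code
        self.cache[code] = long_url
        return code

    def resolve(self, code):
        """Read path: cache first, durable store on a miss, then enqueue a click event."""
        long_url = self.cache.get(code)
        if long_url is None:
            long_url = self.store.get(code)
            if long_url is None:
                return None             # unknown code; the caller would answer 404
            self.cache[code] = long_url  # repopulate the cache after a miss
        # Analytics path: enqueue and return; a separate consumer drains the queue.
        self.events.put_nowait({"code": code})
        return long_url
```

Calling `resolve` never waits on analytics: the event is only enqueued, and whatever consumes `events` can be slow or offline without affecting the redirect.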

When to use

Whenever you need stable, sharable, traceable handles for opaque payloads (URLs, document IDs, share links). Same template applies to share-link services in Drive, Photos, Notion.

When not to

When the underlying content is mutable and the short code must reflect the latest version (use versioned IDs instead). When attribution/security requires per-user resolution (sign the link instead).

flowchart LR
  Client([Client]) --> LB[Load Balancer]
  LB --> RS{{Redirect Service}}
  RS -->|cache hit 95%| Cache[(Redis Cluster)]
  RS -->|cache miss| KV[(KV Store / DynamoDB)]
  Cache -.warm.-> KV
  RS -.fire-and-forget.-> Q[[Event Bus / Kafka]]
  Q --> AW[(Analytics Warehouse)]
  Writer[Write API] --> IDG{{ID Generator base62+counter}}
  IDG --> KV
  KV -.prime.-> Cache

Key insights

  • Read:write ratio is the dominant force; every choice should favor reads.
  • Counter + base62 gives the shortest codes and no collisions, but it needs a central counter and sequential codes are guessable; hash-based generation needs no central counter but requires collision resolution.
  • 301 (permanent) vs 302 (temporary) redirects: browsers cache a 301, so repeat clicks can skip your servers and go uncounted. Use 302 when click analytics matter.
  • Short code length: ⌈log62(N)⌉ characters. At 1B URLs that is 6 (62^5 ≈ 0.9B falls just short; 62^6 ≈ 56.8B leaves ample headroom).
  • Cache is the read tier; the database is the durability tier. Most production shorteners serve >95% from cache.
  • Analytics must not block the redirect; fire-and-forget to a queue.
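The counter + base62 and code-length points can be made concrete with a short sketch. The function names and the alphabet ordering (digits, then lowercase, then uppercase) are assumptions chosen for illustration; any fixed 62-character alphabet works the same way.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(n: int) -> str:
    """Convert a non-negative counter value to its base62 string."""
    if n < 0:
        raise ValueError("counter value must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, remainder = divmod(n, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))  # most significant digit first


def decode(code: str) -> int:
    """Convert a base62 string back to the counter value."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


def min_code_length(total_urls: int) -> int:
    """Smallest length L such that 62**L >= total_urls."""
    length = 1
    while BASE ** length < total_urls:
        length += 1
    return length


print(encode(62))                      # "10": first two-character code
print(min_code_length(1_000_000_000))  # 6, since 62**5 = 916,132,832 < 1B
```

Because each counter value maps to exactly one string, uniqueness comes from the counter rather than from any collision check. The trade-off is that consecutive codes are predictable, so a design that needs unguessable codes would have to scramble the counter or choose a different scheme.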