
Design YouTube Watch History

Google L4+ favorite. Tests append-only event ingestion, time-partitioned storage, materialized views for "Continue Watching," and read-time aggregations under privacy constraints.

hard · 4h · general · kafka · sql · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

Watch history is an append-only timeline that gets queried two ways: "what did this user watch recently?" (last 50) and "resume where I left off" (most recent unfinished). Writes arrive at millions per second globally; reads mostly touch the most recent 1% of a user's history. Optimize for write throughput and recent-read freshness, and let cold storage handle the long tail.

Four components:
(1) Ingest: every watch event (user_id, video_id, progress_ms, ts) is published to a Kafka log partitioned by user_id. Consumers fan out to the write store, the recent-events cache, and the analytics warehouse.
(2) Hot store: sharded by user_id and partitioned by date. It stores the last 90 days, indexed by (user_id, ts DESC) for timeline reads and by (user_id, video_id) for resume lookups.
(3) Cold store: events older than 90 days live in an archive tier (Spanner-archive or BigQuery) rather than the hot store, and are accessed only by scan, never on the hot read path.
(4) Materialized views: "Continue Watching" maintains user_id → top-K unfinished videos via stream processing on the same Kafka topic.
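The ingest and hot-store keys described above can be sketched as follows. This is a minimal illustration, not a production design: the names `WatchEvent`, `partition_for`, and `timeline_row_key` are hypothetical, the MD5-based hash merely stands in for whatever partitioner the Kafka client actually uses, and the reversed-timestamp row key is one common way to get (user_id, ts DESC) ordering from a store that sorts keys lexicographically.

```python
import hashlib
from dataclasses import dataclass

# Assumption: timestamps are non-negative milliseconds that fit in 63 bits.
MAX_TS_MS = 2**63 - 1


@dataclass(frozen=True)
class WatchEvent:
    user_id: str
    video_id: str
    progress_ms: int
    ts: int  # event time, milliseconds since epoch


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to a partition with a stable hash.

    The same user_id always yields the same partition, so all of one
    user's events stay in a single ordered log.
    """
    digest = hashlib.md5(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def timeline_row_key(event: WatchEvent) -> str:
    """Build a row key that sorts newest-first within one user.

    Subtracting ts from a fixed maximum reverses the order; zero-padding
    to 19 digits keeps lexicographic order equal to numeric order.
    Assumes user_id never contains the '#' separator.
    """
    reversed_ts = MAX_TS_MS - event.ts
    return f"{event.user_id}#{reversed_ts:019d}"
```

With keys shaped like this, "last 50 for a user" becomes a prefix scan on `user_id#` that stops after 50 rows.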

When to use

Any append-only per-user activity log: watch history, search history, page-view log, podcast listen history, exercise sessions. Identical pattern.

When not to

Mutable user state (profile, settings): use a regular KV store with last-write-wins. Real-time fraud detection: it needs windowed aggregations, not historical timelines. Cross-user analytics: pivot to an OLAP cube rather than row-by-row history.

Time: Write O(1), Read last-K O(K), Resume O(1) · Space: O(events) ~petabytes
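To make these bounds concrete, here is a small in-memory model with the two access paths. It is only a sketch under assumptions: `HotStoreModel` is a hypothetical name, events are assumed to arrive in timestamp order per user (which per-user Kafka partitioning would give), and a real hot store would be a sharded database rather than Python dictionaries.

```python
from collections import defaultdict, deque
from itertools import islice


class HotStoreModel:
    """Toy model of the hot store's two indexes."""

    def __init__(self):
        # user_id -> events, newest first (timeline index)
        self._timeline = defaultdict(deque)
        # (user_id, video_id) -> latest progress_ms (resume index)
        self._resume = {}

    def append(self, user_id, video_id, progress_ms, ts):
        # O(1): one push to the front of a deque and one dict write.
        self._timeline[user_id].appendleft((ts, video_id, progress_ms))
        self._resume[(user_id, video_id)] = progress_ms

    def last_k(self, user_id, k):
        # O(K): walk only the first k entries of the newest-first deque.
        events = self._timeline.get(user_id, ())
        return list(islice(events, k))

    def resume(self, user_id, video_id):
        # O(1): single dict lookup; None if the user never watched it.
        return self._resume.get((user_id, video_id))
```

The space cost is still O(events) for the timeline, which is why the long tail moves to a cheaper tier.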

flowchart LR
  Player([Player]) -->|watch event| Edge[Edge Ingest]
  Edge --> Kafka[[Kafka · partition by user_id]]
  Kafka --> C1[Hot Writer]
  Kafka --> C2[Resume Materializer]
  Kafka --> C3[Warehouse Sink]
  C1 --> HotDB[(Hot Store · 90d · Bigtable)]
  C2 --> ResumeKV[(Continue Watching KV)]
  C3 --> Cold[(BigQuery · Archive)]
  ReadAPI{{Read API}} --> HotDB
  ReadAPI --> ResumeKV
  ReadAPI -.range scan.-> Cold
  Privacy[Privacy Service] -.tombstone.- HotDB
  Privacy -.tombstone.- ResumeKV
  Privacy -.tombstone.- Cold
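
The Read API's choice between the hot store and the cold range scan in the diagram can be sketched as a small planning function. This is an illustrative assumption about how the routing might work: `plan_read` and `HOT_RETENTION` are hypothetical names, and the 90-day cutoff is taken from the hot-store retention stated earlier.

```python
from datetime import datetime, timedelta, timezone

# Retention of the hot store, as stated in the design.
HOT_RETENTION = timedelta(days=90)


def plan_read(start, end, now):
    """Split the half-open range [start, end) into per-tier sub-ranges.

    Returns a list of (tier, sub_start, sub_end) tuples, oldest first.
    Anything older than now - HOT_RETENTION is served by the cold tier.
    """
    if start >= end:
        return []
    cutoff = now - HOT_RETENTION
    plan = []
    if start < cutoff:
        plan.append(("cold", start, min(end, cutoff)))
    if end > cutoff:
        plan.append(("hot", max(start, cutoff), end))
    return plan


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    # A "last week" read touches only the hot tier.
    print(plan_read(now - timedelta(days=7), now, now))
    # A "full year" read is split at the 90-day cutoff.
    print(plan_read(now - timedelta(days=365), now, now))
```

Keeping the split in one place means the common recent-history request never pays for a cold scan.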

Key insights

  • Append-only writes scale linearly with shards. Make user_id the shard key so all writes for one user land on one partition, which gives sequential disk writes and cache-friendly access.
  • Hot/cold split lets you optimize storage cost. 99% of reads target events less than 30 days old; fewer than 1% need the archive. Cold tier costs ~10x less per GB.
  • Continue Watching is a separate materialized view; do not derive it at read time. Stream processing on Kafka emits one update per video to a KV store, so reads are O(1).
  • Privacy = first-class component, not afterthought. Incognito mode = client never emits the event. Delete = tombstone written to Kafka; consumers apply soft-delete + scheduled compaction.
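  • GDPR right to erasure = irrevocable delete, completed without undue delay; a 30-day end-to-end budget is a common target, since GDPR allows one month to respond to a request. Tombstones propagate to hot + cold + warehouse + materialized views via the same Kafka topic.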
  • Read p99 budget: ~50 ms for "recent watch list" (one short Bigtable row-range scan on the user's key prefix), ~5 ms for "resume" (1 KV lookup). Cold reads can take seconds and are only used for the "see full history" UI.
  • Event ordering matters within a user partition (Kafka preserves it). Across users, ordering does not matter. Pick the partition key accordingly.
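
The Continue Watching and tombstone bullets above can be sketched as a single consumer that folds events into a per-user view. This is a minimal in-memory illustration under several assumptions: the names `Event` and `ContinueWatching` are hypothetical, `duration_ms` is not part of the event tuple given earlier and is assumed to be available (for example, joined from video metadata), and the 95% "finished" threshold and top-K size are arbitrary choices. A real deployment would run this in a stream processor and write to a KV store.

```python
from dataclasses import dataclass
from typing import Optional

FINISHED_RATIO = 0.95  # assumption: >= 95% watched counts as finished
TOP_K = 20             # assumption: size of the Continue Watching row


@dataclass(frozen=True)
class Event:
    user_id: str
    video_id: Optional[str]  # None on a tombstone means "erase the whole user"
    ts: int
    progress_ms: int = 0
    duration_ms: int = 0     # assumption: not in the original event tuple
    tombstone: bool = False


class ContinueWatching:
    """Folds an ordered per-user event stream into a top-K unfinished view."""

    def __init__(self, top_k=TOP_K):
        self.top_k = top_k
        self._view = {}  # user_id -> {video_id: (ts, progress_ms)}

    def apply(self, e):
        videos = self._view.setdefault(e.user_id, {})
        if e.tombstone:
            # Deletes travel on the same topic as watch events.
            if e.video_id is None:
                videos.clear()
            else:
                videos.pop(e.video_id, None)
            return
        finished = (e.duration_ms > 0
                    and e.progress_ms >= FINISHED_RATIO * e.duration_ms)
        if finished:
            videos.pop(e.video_id, None)
            return
        videos[e.video_id] = (e.ts, e.progress_ms)
        if len(videos) > self.top_k:
            # Evict the least recently watched entry to stay at top-K.
            oldest = min(videos, key=lambda v: videos[v][0])
            del videos[oldest]

    def read(self, user_id):
        """Return [(video_id, progress_ms)], most recently watched first."""
        videos = self._view.get(user_id, {})
        items = sorted(videos.items(), key=lambda kv: kv[1][0], reverse=True)
        return [(vid, progress) for vid, (_ts, progress) in items]
```

Because the view is updated on write, the read path is a single lookup per user, and a tombstone removes the entry from the view through the same code path that added it.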