Design Netflix Video Streaming End-to-End

Encoding pipeline → multi-bitrate manifest → Open Connect CDN → ABR playback → resume, sub-2s start, 99.99% availability.

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

A stack of layers, each optimized independently: encode once offline, fan out to the edge (Open Connect), serve the manifest and segments adaptively, and let the client switch bitrate on the fly. Goals: startup under 2 s, smooth playback, and 99.99% availability.
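To make the 99.99% target concrete, the arithmetic below converts an availability fraction into a downtime budget. It is a small worked calculation, not a description of how any real platform measures availability; the function name and the chosen periods are illustrative assumptions.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(availability: float, days: float) -> float:
    """Seconds of allowed unavailability for a target fraction over `days` days."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * days * SECONDS_PER_DAY


if __name__ == "__main__":
    # Periods are illustrative; pick whatever window the SLO is defined over.
    for label, days in (("day", 1), ("30-day month", 30), ("365-day year", 365)):
        budget = downtime_budget_seconds(0.9999, days)
        print(f"99.99% over one {label}: {budget:.1f} s ({budget / 60:.2f} min)")
```

At 99.99% the budget works out to roughly 8.6 seconds per day, 4.3 minutes per 30-day month, and 52.6 minutes per year, which is why a single prolonged regional outage can consume the whole annual allowance.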

Ingest feeds an encoder farm that produces a bitrate ladder tuned per title (per-title encoding optimization). The packager wraps the ladder in HLS and DASH manifests, and the segments are pushed to Open Connect Appliances (OCAs) hosted inside ISPs. The client first requests the manifest from the control plane, which authenticates it, and then fetches segments from the nearest OCA. The ABR algorithm picks a rung of the ladder based on measured throughput and switches rungs without rebuffering. The resume position is synced through a per-user state service.
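The throughput-based rung selection described above can be sketched as follows. This is a minimal illustration of one common ABR heuristic (a smoothed throughput estimate plus a safety margin and a low-buffer fallback), not the algorithm any production player is known to use. The ladder values, `safety` factor, and `low_buffer_s` threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rung:
    height: int        # vertical resolution in pixels
    bitrate_kbps: int  # average encoded bitrate


# Hypothetical ladder; with per-title encoding each title gets its own.
LADDER = [Rung(240, 235), Rung(360, 560), Rung(480, 1050),
          Rung(720, 3000), Rung(1080, 5800)]


def ewma(samples_kbps, alpha=0.3):
    """Exponentially weighted moving average of per-segment throughput samples."""
    samples = list(samples_kbps)
    if not samples:
        raise ValueError("need at least one throughput sample")
    estimate = samples[0]
    for sample in samples[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def pick_rung(ladder, throughput_kbps, buffer_s, safety=0.8, low_buffer_s=5.0):
    """Pick the highest rung that fits under a safety-scaled throughput estimate.

    If the playback buffer is nearly empty, fall back to the lowest rung so the
    next segment arrives as quickly as possible.
    """
    ordered = sorted(ladder, key=lambda r: r.bitrate_kbps)
    if buffer_s < low_buffer_s:
        return ordered[0]
    budget_kbps = throughput_kbps * safety
    chosen = ordered[0]  # the lowest rung is always a valid choice
    for rung in ordered:
        if rung.bitrate_kbps <= budget_kbps:
            chosen = rung
    return chosen


if __name__ == "__main__":
    estimate = ewma([4200, 3900, 4100])  # kbps measured over recent segments
    print(pick_rung(LADDER, estimate, buffer_s=20.0))
```

The decision is made once per segment request, so a switch only ever takes effect at a segment boundary and the segment already playing is never interrupted.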

When to use

Premium SVOD platforms.

When not to

Live streaming, which needs a different design optimized for latency.

flowchart LR
  Master[Master] --> Encode[Per-Title Encoder]
  Encode --> Ladder[Bitrate Ladder]
  Ladder --> Pkg[HLS+DASH Packager]
  Pkg --> Origin[(S3 Origin)]
  Origin --> OCA[Open Connect Appliances · in ISP]
  Client([Client]) --> Control[Control Plane · auth + manifest]
  Control --> Origin
  Client --> OCA
  Client -.ABR.-> OCA
  Resume[(Resume State)] -.cross-device.-> Client
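As a rough illustration of the Ladder → Packager step in the diagram, the sketch below renders a bitrate ladder as an HLS multivariant (master) playlist. The `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes come from the HLS specification; the `Variant` type, the URIs, and the ladder values are hypothetical, and a production packager would also emit attributes such as `CODECS` that are omitted here.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    width: int
    height: int
    bandwidth_bps: int  # peak bits per second, the unit BANDWIDTH expects
    uri: str            # hypothetical path to this rung's media playlist


def build_multivariant_playlist(variants):
    """Render one EXT-X-STREAM-INF entry per rung, lowest bandwidth first."""
    lines = ["#EXTM3U"]
    for v in sorted(variants, key=lambda v: v.bandwidth_bps):
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={v.bandwidth_bps},"
            f"RESOLUTION={v.width}x{v.height}"
        )
        lines.append(v.uri)  # the URI line must directly follow its tag
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    ladder = [
        Variant(1280, 720, 3_000_000, "720p/index.m3u8"),
        Variant(640, 360, 560_000, "360p/index.m3u8"),
    ]
    print(build_multivariant_playlist(ladder), end="")
```

The client reads this playlist once, then its ABR logic chooses among the listed variants for each segment request.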

Key insights

  • Per-title encoding cuts bandwidth by roughly 20% at the same perceived quality.
  • Sub-2s startup requires the manifest to be preloaded and the first segment to be cached at the edge.
  • ABR switching happens segment by segment and never blocks playback.
  • Resume state must sync across devices, with eventual consistency converging in under 2 s.
  • 99.99% availability is achieved via multi-region deployment plus chaos engineering that tests every failure mode.
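One simple way to get eventually consistent resume state across devices is a last-writer-wins merge, sketched below. This is an assumption about how such a per-user state service could reconcile updates, not a description of any real implementation; the `ResumePoint` fields and the tie-breaking rule are hypothetical, and the approach relies on device timestamps being reasonably well synchronized.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResumePoint:
    title_id: str
    position_s: float   # playback offset in seconds
    updated_at_ms: int  # wall-clock time of the update, in milliseconds
    device_id: str


def merge(a: ResumePoint, b: ResumePoint) -> ResumePoint:
    """Last-writer-wins merge of two resume points for the same title.

    Ties on timestamp are broken by device_id and then position, so every
    replica picks the same winner regardless of the order updates arrive in.
    """
    if a.title_id != b.title_id:
        raise ValueError("can only merge resume points for the same title")
    return max(a, b, key=lambda p: (p.updated_at_ms, p.device_id, p.position_s))


if __name__ == "__main__":
    tv = ResumePoint("title-1", 1312.0, 1_700_000_005_000, "tv")
    phone = ResumePoint("title-1", 845.5, 1_700_000_001_000, "phone")
    print(merge(tv, phone))  # the TV update is newer, so it wins
```

Because the merge is commutative, associative, and idempotent, replicas can exchange updates in any order and still converge; the under-2 s goal then depends on how quickly updates propagate, not on the merge itself.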