
Design Teams Voice & Video (WebRTC + Signaling)

WebRTC, signaling, TURN/STUN, recording. Real-time low-latency media.


Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

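WebRTC provides browser-native, low-latency audio and video. A signaling channel negotiates the session, and in group calls an SFU (Selective Forwarding Unit) routes media between participants. Recording works by duplicating the streams. NAT traversal uses STUN to discover a public address, with TURN as the relay when a direct path cannot be established.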
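To make the role of signaling concrete, here is a minimal in-memory sketch of a relay that forwards offer, answer, and candidate messages between two peers. The class name `SignalingRelay`, the peer IDs, and the placeholder payloads are all hypothetical; a real deployment would typically run this over WebSockets or a similar transport, which WebRTC itself does not prescribe.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass
class SignalingRelay:
    """Toy in-memory relay: forwards opaque signaling messages between peers."""

    # peer_id -> queue of (sender_id, kind, payload) waiting to be delivered
    inboxes: dict = field(default_factory=lambda: defaultdict(deque))

    def send(self, sender: str, recipient: str, kind: str, payload: str) -> None:
        if kind not in ("offer", "answer", "candidate"):
            raise ValueError(f"unknown signaling message kind: {kind}")
        # The relay never parses SDP or ICE candidates; it only routes them.
        self.inboxes[recipient].append((sender, kind, payload))

    def receive(self, peer: str) -> list:
        """Drain and return every message queued for `peer`, oldest first."""
        inbox = self.inboxes[peer]
        messages = list(inbox)
        inbox.clear()
        return messages


# Assumed two-party flow: the caller offers, the callee answers,
# and both sides trickle ICE candidates through the same relay.
relay = SignalingRelay()
relay.send("caller", "callee", "offer", "<caller SDP>")
relay.send("caller", "callee", "candidate", "<caller ICE candidate>")
relay.send("callee", "caller", "answer", "<callee SDP>")
relay.send("callee", "caller", "candidate", "<callee ICE candidate>")

callee_inbox = relay.receive("callee")
caller_inbox = relay.receive("caller")
```

The point of the sketch is that the signaling server stays out of the media path: it only shuttles small text messages, so it can be scaled and secured like any other request-routing service.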

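The signaling server carries the SDP offer/answer and the ICE candidate exchange. Each client queries a STUN server to discover its public address and falls back to a TURN relay when a direct path fails, typically behind a symmetric NAT. Group calls go through an SFU, which forwards packets without decoding them and is therefore cheap to run. With simulcast, the client sends three quality layers and the SFU picks one per receiver based on that receiver's bandwidth. For recording, the SFU mirrors all streams to a recorder service, which transcodes and uploads them.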
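The per-receiver simulcast choice can be sketched as a small selection function. This is an illustration under stated assumptions, not how any particular SFU does it: the layer names, the bitrates, and the `headroom` factor are made up for the example, and production SFUs also react to loss, keyframe timing, and the size at which the video is displayed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    rid: str           # simulcast layer identifier (hypothetical names)
    bitrate_kbps: int  # assumed target bitrate for the layer


# Illustrative layers; real bitrates depend on codec, resolution, and content.
LAYERS = [
    Layer("low", 150),
    Layer("mid", 500),
    Layer("high", 1500),
]


def pick_layer(available_kbps: float, layers=LAYERS, headroom: float = 0.8) -> Layer:
    """Return the highest-bitrate layer that fits within the receiver's budget.

    `headroom` reserves part of the bandwidth estimate for audio and
    estimation error. If nothing fits, fall back to the lowest layer.
    """
    budget = available_kbps * headroom
    ordered = sorted(layers, key=lambda layer: layer.bitrate_kbps)
    chosen = ordered[0]
    for layer in ordered:
        if layer.bitrate_kbps <= budget:
            chosen = layer
    return chosen


# Hypothetical receivers with estimated downlink bandwidth in kbps.
receivers = {"laptop_on_wifi": 2500, "phone_on_3g": 400, "tablet": 800}
selection = {name: pick_layer(kbps).rid for name, kbps in receivers.items()}
```

Because the sender always uploads every layer, the SFU can switch a receiver between layers without renegotiating with the sender.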

When to use

Real-time voice/video products: meetings, telehealth, gaming voice chat.

When not to

Pre-recorded streaming (use HLS). Sub-50 ms latency targets (use a custom UDP-based protocol).

flowchart LR
  A([Caller]) --> Sig[Signaling Server]
  B([Callee]) --> Sig
  Sig -->|SDP+ICE| A
  Sig -->|SDP+ICE| B
  A -->|STUN| Stun[STUN]
  A <-->|P2P or via TURN| B
  Group[Group Call] --> SFU[SFU Media Server]
  SFU --> Rec[Recorder]
  Rec --> Blob[(Recording Blob)]

Key insights

  • The SFU is the cost optimization: it forwards packets and never decodes media in the cloud.
  • Simulcast lets each receiver get an appropriate quality level without involving the sender.
  • TURN relays cost real bandwidth, so minimize their use by making direct (STUN-discovered) paths succeed as often as possible.
  • Recording is a side channel: adding or removing it does not affect live participants.
  • Direct ICE connectivity fails in roughly 5% of sessions, so TURN fallback is not optional.
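The cost points above can be checked with back-of-envelope arithmetic. The sketch below compares per-client uplink streams in a full mesh versus an SFU, and estimates how much traffic a TURN fleet carries if about 5% of sessions need a relay. The function names, the session count, and the 1000 kbps per-session bitrate are assumptions chosen for illustration.

```python
def mesh_uplinks_per_client(n: int) -> int:
    """Full mesh: each client sends a separate copy to every other participant."""
    return n - 1


def sfu_uplinks_per_client(simulcast_layers: int = 1) -> int:
    """SFU: each client uploads once per simulcast layer, whatever the call size."""
    return simulcast_layers


def sfu_forwarded_streams(n: int) -> int:
    """Streams the SFU forwards when every participant receives every other one."""
    return n * (n - 1)


def turn_relay_gb_per_hour(sessions: int, relay_fraction: float,
                           kbps_per_session: float) -> float:
    """Media relayed during one hour of calls, in gigabytes (10^9 bytes).

    Counts the payload once; a relay both receives and re-sends it, so the
    billed ingress plus egress may be higher.
    """
    relayed_sessions = sessions * relay_fraction
    bits = relayed_sessions * kbps_per_session * 1000 * 3600
    return bits / 8 / 1e9


# A 10-person call: 9 uplinks per client in a mesh versus 3 with
# three-layer simulcast to an SFU, which then forwards 90 streams.
mesh = mesh_uplinks_per_client(10)
sfu = sfu_uplinks_per_client(simulcast_layers=3)
forwarded = sfu_forwarded_streams(10)

# Assumed load: 10,000 concurrent sessions, 5% relayed, 1000 kbps each.
relay_gb = turn_relay_gb_per_hour(10_000, 0.05, 1000)
```

With these assumed numbers the relayed volume comes to about 225 GB per hour, which is why even a small relay fraction shows up on the bandwidth bill.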