Design Microsoft Teams (Chat + Presence + Voice/Video)
Channels, persistent chat, presence, voice/video calls, and meetings. Microsoft's most-asked system design interview (SDI) question.
Theory
Explanation
Think of it as a Slack/Discord-style product: persistent channels, threads, mentions, file sharing, and voice and video calls. It is multi-tenant SaaS, so every enterprise gets its own isolated data. Messages are durable, presence is ephemeral, and meeting media flows through an SFU.
Chat: messages are persisted in Cosmos DB, isolated per tenant and partitioned by channel_id, then fanned out over SignalR / WebSocket to the members who are online.
Presence: each user sends heartbeats to an in-memory store, which tracks several keys per user (chat-available, calling, in-meeting).
Meetings: a signaling server connects participants; an SFU (Selective Forwarding Unit) routes the media streams; recordings are uploaded to blob storage.
Files: shared through the OneDrive integration.
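The chat path can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how Teams is implemented: the `ChatService` class, its method names, and the per-channel `seq` counter are assumptions made for this example, and plain dictionaries stand in for Cosmos DB and the SignalR / WebSocket connections.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass
class Message:
    tenant_id: str
    channel_id: str
    seq: int      # per-channel sequence number (hypothetical ordering scheme)
    sender: str
    body: str


class ChatService:
    """In-memory stand-in for the chat path: persist first, then fan out."""

    def __init__(self):
        # (tenant_id, channel_id) plays the role of the partition key.
        self._partitions = defaultdict(list)
        self._next_seq = defaultdict(lambda: count(1))
        self._members = defaultdict(set)   # partition key -> user ids
        self._connections = {}             # user id -> push callback (online users only)

    def join(self, tenant_id, channel_id, user_id):
        self._members[(tenant_id, channel_id)].add(user_id)

    def connect(self, user_id, on_message):
        self._connections[user_id] = on_message

    def disconnect(self, user_id):
        self._connections.pop(user_id, None)

    def send(self, tenant_id, channel_id, sender, body):
        key = (tenant_id, channel_id)
        if sender not in self._members[key]:
            raise PermissionError(f"{sender} is not a member of {channel_id}")
        msg = Message(tenant_id, channel_id, next(self._next_seq[key]), sender, body)
        self._partitions[key].append(msg)  # the durable write comes first
        for user_id in self._members[key]:
            push = self._connections.get(user_id)
            if push is not None:           # offline members catch up via history()
                push(msg)
        return msg

    def history(self, tenant_id, channel_id, after_seq=0):
        # Reads never cross a partition, so another tenant's data is unreachable.
        stored = self._partitions.get((tenant_id, channel_id), [])
        return [m for m in stored if m.seq > after_seq]
```

A client that reconnects would call `history(tenant_id, channel_id, after_seq=last_seen)` to fetch what it missed while offline. Because all writes for one channel go through one counter and one list, ordering within a channel comes for free in this sketch.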
When to use
Enterprise chat and meeting products. The same pattern applies to Slack and Zoom Chat.
When not to
Tiny teams, where this architecture is overkill. Pure VoIP products, which have a different focus.
flowchart LR
  Client([Client]) --> WS[WebSocket / SignalR]
  WS --> Chat[Chat Service]
  Chat --> DB[(Cosmos DB · per tenant)]
  Client --> Pres[Presence Service]
  Pres --> Redis[(In-memory presence)]
  Client --> Sig[Meeting Signaling]
  Sig --> SFU[SFU Media Server]
  SFU --> Record[(Recording → Blob)]
  Files[File Share] --> OD[OneDrive]
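The presence path in the diagram (client heartbeats into an in-memory store) can be sketched as one expiring key per user and state. This is an illustration under stated assumptions: the `PresenceStore` class, the 30-second TTL, the "offline" fallback, and the rule that "in-meeting" outranks "calling" outranks "chat-available" are all choices made for this example, not details from the design above.

```python
import time

# Assumed precedence: the highest-ranked live key decides the displayed status.
PRIORITY = {"chat-available": 1, "calling": 2, "in-meeting": 3}


class PresenceStore:
    """In-memory presence with one expiring key per (user, state)."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so tests can control time
        self._expiry = {}            # (user_id, state) -> expiry timestamp

    def heartbeat(self, user_id, state):
        if state not in PRIORITY:
            raise ValueError(f"unknown presence state: {state}")
        # Each heartbeat only pushes the expiry forward; nothing is persisted.
        self._expiry[(user_id, state)] = self._clock() + self._ttl

    def status(self, user_id):
        now = self._clock()
        live = [
            state for state in PRIORITY
            if self._expiry.get((user_id, state), float("-inf")) > now
        ]
        if not live:
            return "offline"         # no fresh heartbeat for any key
        return max(live, key=PRIORITY.__getitem__)
```

A user whose client crashes simply stops sending heartbeats and drops to "offline" once the TTL passes, which is the best-effort behaviour the design relies on. A production store such as Redis would typically handle expiry itself through key TTLs.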
Key insights
- The channel is the partition key: a single channel must stay on a single shard so that its messages keep a consistent order.
- Presence is best-effort and eventually consistent; it is never persisted long-term.
- An SFU forwards media without decoding or re-encoding it, so it costs less to run than an MCU (Multipoint Control Unit).
- Recording is asynchronous: the recording is uploaded after the meeting ends and then post-processed.
- Per-tenant data isolation is non-negotiable in enterprise products, so each tenant gets separate DB partitions.
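The SFU versus MCU trade-off above can be made concrete by counting streams. The sketch below is back-of-the-envelope arithmetic under simplifying assumptions: every participant sends exactly one stream, there is no simulcast or audio-only optimisation, and the MCU produces one mixed stream per participant. The `stream_counts` function and its field names are made up for this example.

```python
def stream_counts(n):
    """Stream counts for an n-party call where each participant sends one stream."""
    if n < 2:
        raise ValueError("a call needs at least two participants")
    others = n - 1
    return {
        # SFU: forwards each incoming stream to every other participant, untouched.
        "sfu": {
            "client_down": others,       # each client receives one stream per peer
            "server_in": n,
            "server_out": n * others,
            "server_decodes": 0,         # no decode or re-encode on the server
        },
        # MCU: decodes every input, then mixes and encodes one stream per participant.
        "mcu": {
            "client_down": 1,
            "server_in": n,
            "server_out": n,
            "server_decodes": n,
            "server_encodes": n,
        },
    }


if __name__ == "__main__":
    for size in (4, 10, 50):
        counts = stream_counts(size)
        print(size, counts["sfu"], counts["mcu"])
```

Under these assumptions the SFU pays in outbound bandwidth, which grows roughly with the square of the participant count, while the MCU pays in CPU for decoding and encoding. That is the sense in which the SFU is the cheaper server to run, and it is also why large meetings usually limit how many video streams each client receives.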