Design Netflix Recommendation Engine

Offline candidate generation + online ranking, "row-of-rows" homepage, A/B testing infra, personalization signals.

expert · 5h · general · ml-ai · system-design

Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

The homepage is a stack of rows of titles. Each row is generated by a different "row algorithm" (Continue Watching, Because You Watched X, Trending Now, etc.). Each row algorithm picks and orders its own candidates, then a meta-ranker orders the rows themselves. Personalization compounds through both layers.
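
The two layers described above can be sketched in a few lines. This is a minimal illustration, not the production design: the title ids, the `SIMILAR` and `TRENDING` tables, the `row_affinity` field, and the function names are all hypothetical, and the meta-ranker here is just a sort by a per-profile row affinity score.

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative data only; title ids and similarity lists are made up.
SIMILAR = {"t1": ["t2", "t3", "t1"]}
TRENDING = ["t4", "t1", "t5"]


@dataclass
class Row:
    name: str
    titles: list[str]  # ordered by the row algorithm
    score: float       # assigned by the meta-ranker


def continue_watching(profile: dict) -> list[str]:
    return list(profile["in_progress"])


def because_you_watched(profile: dict) -> list[str]:
    watched = profile["watched"]
    if not watched:
        return []
    seen = set(watched)
    # Titles similar to the most recent watch, minus anything already seen.
    return [t for t in SIMILAR.get(watched[-1], []) if t not in seen]


def trending_now(profile: dict) -> list[str]:
    seen = set(profile["watched"])
    return [t for t in TRENDING if t not in seen]


ROW_ALGORITHMS: dict[str, Callable[[dict], list[str]]] = {
    "Continue Watching": continue_watching,
    "Because You Watched": because_you_watched,
    "Trending Now": trending_now,
}


def build_homepage(profile: dict, max_rows: int = 2, per_row: int = 3) -> list[Row]:
    rows = []
    for name, algo in ROW_ALGORITHMS.items():
        titles = algo(profile)[:per_row]          # layer 1: row picks candidates
        if titles:                                # skip rows with nothing to show
            rows.append(Row(name, titles, profile["row_affinity"].get(name, 0.0)))
    rows.sort(key=lambda r: r.score, reverse=True)  # layer 2: meta-ranker orders rows
    return rows[:max_rows]


profile = {
    "in_progress": ["t9"],
    "watched": ["t1"],
    "row_affinity": {"Continue Watching": 0.9, "Because You Watched": 0.5, "Trending Now": 0.7},
}
for row in build_homepage(profile):
    print(row.name, row.titles)
# Continue Watching ['t9']
# Trending Now ['t4', 't5']
```

In a real system the row score would come from a learned model rather than a static affinity table; the point is only that ranking happens twice, once inside each row and once across rows.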

Offline: a nightly batch job ranks all titles per profile using matrix factorization plus a neural collaborative filtering model. The results are stored per (profile_id, row_id) in a materialized cache. Online: at homepage load, the service fetches the materialized rows, applies real-time signals (recency, fresh releases), and picks the top-K rows and the top-K titles within each row. An A/B framework allocates traffic to experiments; metrics dashboards track long-term retention, not just CTR.
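
As a rough sketch of the online step, the snippet below reads one materialized row from a cache keyed by (profile_id, row_id), overlays two real-time signals, and keeps the top K. The cache layout, the additive `fresh_boost`, the seven-day freshness window, and the rule that titles watched since the batch ran are dropped are all assumptions chosen for illustration.

```python
# Hypothetical cache layout: (profile_id, row_id) -> [(title_id, offline_score), ...]
RowCache = dict[tuple[str, str], list[tuple[str, float]]]


def assemble_row(
    cache: RowCache,
    profile_id: str,
    row_id: str,
    release_ts: dict[str, float],      # title_id -> release time (epoch seconds)
    watched_since_batch: set[str],     # titles watched after the nightly job ran
    now: float,
    k: int = 10,
    fresh_window_s: float = 7 * 86400,
    fresh_boost: float = 0.2,
) -> list[str]:
    """Return the top-k title ids for one row after the real-time overlay."""
    cached = cache.get((profile_id, row_id), [])  # cache miss -> empty row

    def online_score(item: tuple[str, float]) -> float:
        title_id, offline_score = item
        released = release_ts.get(title_id)
        is_fresh = released is not None and 0 <= now - released < fresh_window_s
        return offline_score + (fresh_boost if is_fresh else 0.0)

    # Recency signal (assumed): hide anything the profile already watched today.
    candidates = [item for item in cached if item[0] not in watched_since_batch]
    candidates.sort(key=online_score, reverse=True)
    return [title_id for title_id, _ in candidates[:k]]


now = 1_700_000_000.0
cache: RowCache = {("p1", "trending"): [("a", 0.9), ("b", 0.8), ("c", 0.75)]}
top = assemble_row(
    cache, "p1", "trending",
    release_ts={"c": now - 86400},     # "c" was released one day ago
    watched_since_batch={"a"},
    now=now,
    k=2,
)
print(top)  # ['c', 'b']
```

The expensive scoring stays offline; the online path is a cache read plus a cheap re-sort, which is what keeps homepage latency low.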

When to use

Content discovery, e-commerce homepages.

When not to

When results must follow a strict ordering, such as intent-driven search where the query, not the profile, determines relevance.

flowchart LR
  Watch[Watch History] --> Offline[Offline Pipeline · nightly]
  Offline --> Models[Row Models · CF, content, trending]
  Models --> Cache[(Materialized Row Cache · per profile)]
  HomeAPI[Homepage API] --> Cache
  HomeAPI --> RT[Real-time Signals]
  HomeAPI --> Meta[Meta-Ranker · row order]
  Meta --> Client[Client]
  AB[A/B Framework] -.flag.-> Meta
  AB --> Metrics[(Long-term Retention)]
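
The A/B Framework node in the flowchart needs a stable way to assign a profile to a variant so the same profile always sees the same meta-ranker. One common approach, shown here as an assumption rather than the documented mechanism, is to hash the experiment name together with the profile id. The function and experiment names are hypothetical.

```python
import hashlib


def assign_variant(profile_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a profile to one variant of an experiment."""
    if not variants:
        raise ValueError("variants must not be empty")
    # Salting with the experiment name decorrelates buckets across experiments.
    digest = hashlib.sha256(f"{experiment}:{profile_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


variants = ["control", "new_meta_ranker"]
first = assign_variant("profile-42", "row-order-v2", variants)
second = assign_variant("profile-42", "row-order-v2", variants)
print(first == second)  # True: assignment is stable across requests
```

Because the assignment is a pure function of its inputs, no lookup table is needed at serving time, and long-running retention metrics can be joined back to the variant offline.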

Key insights

  • Two-stage: each row algorithm picks its candidates, then the meta-ranker orders the rows. Both stages are personalized.
  • Offline materialization saves online compute; a real-time overlay keeps results fresh.
  • A/B success metrics are retention and engagement, not just CTR.
  • Diversity injection counters the filter bubble (always show one row outside the profile's comfort zone).
  • Cold start is handled with content-based recommendations plus an onboarding survey.
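
The diversity-injection bullet can be made concrete with a small post-processing step on the meta-ranker's output. This is one possible sketch under stated assumptions: each row is tagged with a single genre, the "comfort zone" is a set of genres, and the lowest visible slot is the one that gets swapped. None of these details come from the source.

```python
def inject_diversity(
    ranked_rows: list[str],        # row ids, best first, from the meta-ranker
    row_genre: dict[str, str],     # row id -> genre tag (assumed one per row)
    comfort_genres: set[str],      # genres the profile already watches heavily
    visible: int,                  # number of rows shown on the homepage
) -> list[str]:
    """Guarantee at least one visible row outside the comfort zone, if one exists."""
    if visible <= 0:
        return []
    top = ranked_rows[:visible]

    def is_outside(row_id: str) -> bool:
        return row_genre[row_id] not in comfort_genres

    if any(is_outside(r) for r in top):
        return top                  # already diverse, leave the ranking alone
    for row_id in ranked_rows[visible:]:
        if is_outside(row_id):
            return top[:-1] + [row_id]  # swap out the weakest visible row
    return top                      # no exploratory row available


ranked = ["r1", "r2", "r3", "r4"]
genres = {"r1": "drama", "r2": "drama", "r3": "thriller", "r4": "documentary"}
print(inject_diversity(ranked, genres, {"drama", "thriller"}, visible=3))
# ['r1', 'r2', 'r4']
```

Swapping only the last visible slot limits the short-term engagement cost of exploration, which matters when the success metric is long-term retention rather than immediate clicks.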