
Design an Amazon Fulfillment Center System

Inventory tracking, robotics control, picking/packing optimization, order routing. Tests cyber-physical workflow design + real-time coordination at warehouse scale.


Theory

Explanation

Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.

A fulfillment center is a real-time control plane over physical inventory. Every SKU has a location; every robot has a task; every order has a deadline. Software must orchestrate thousands of robots, humans, conveyor belts, and sortation machines so that an item moves from shelf to truck within hours and no item is lost along the way.

Five subsystems: (1) Inventory Service: the authoritative state of every SKU × bin location, eventually consistent across replicas; (2) Robotics Dispatcher: accepts pick tasks, plans routes with A* or other heuristics over the floor graph, and assigns each task to the nearest free drive unit; (3) Workstation Service: choreographs human pickers and barcode scans, and validates picks; (4) Sortation Engine: assigns each package to a chute leading to an outbound dock, based on carrier and truck schedule; (5) Exception Handling: covers damage, missing items, and mis-picks, and escalates to the operator UI. A Kafka event log ties them together: every state change is an event appended to an immutable log, and service state is derived from that log.
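
The Robotics Dispatcher's "route over the floor graph, then pick the nearest free drive unit" step can be illustrated with a minimal sketch. It assumes the floor is a small 4-connected grid of strings where `#` marks a blocked cell; the names `astar_distance` and `assign_nearest`, the grid encoding, and the drive-unit IDs are hypothetical and chosen only for illustration. A real dispatcher would also weigh deadlines, battery, and congestion.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, int]  # (row, col) on the floor grid


def astar_distance(grid: List[str], start: Cell, goal: Cell) -> Optional[int]:
    """Return the shortest path length from start to goal, or None if unreachable.

    Assumes unit-cost moves in four directions; '#' cells are blocked.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell: Cell) -> int:
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost = {start: 0}
    frontier = [(heuristic(start), 0, start)]  # (f = g + h, g, cell)
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def assign_nearest(grid: List[str], free_units: Dict[str, Cell], pick_cell: Cell) -> Optional[str]:
    """Pick the free drive unit with the shortest routed distance to pick_cell."""
    best_unit, best_dist = None, None
    for unit_id in sorted(free_units):  # sorted for deterministic tie-breaking
        dist = astar_distance(grid, free_units[unit_id], pick_cell)
        if dist is not None and (best_dist is None or dist < best_dist):
            best_unit, best_dist = unit_id, dist
    return best_unit
```

Routed distance matters more than straight-line distance here: a unit that looks close can sit on the wrong side of a shelf row, which is why the sketch runs A* per candidate instead of comparing coordinates.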

When to use

Any high-throughput physical-goods flow: warehouses, ports, postal sorting, parts manufacturing.

When not to

Small inventories with manual ops, where the overhead exceeds the benefit. Purely digital goods.

Time: p99 dispatch <100ms · Space: O(SKUs × locations + active tasks)

flowchart TB
  Order[Order Stream] --> Plan[Wave Planner]
  Plan --> Pick[Pick Tasks]
  Pick --> Dispatch{{Robotics Dispatcher}}
  Dispatch --> Floor[Drive Units Fleet]
  Floor --> WS[Workstation]
  WS --> Pack[Pack Station]
  Pack --> Sort[Sortation Engine]
  Sort --> Dock[Outbound Dock]
  Floor -.events.-> Kafka[[Kafka Event Log]]
  WS -.events.-> Kafka
  Kafka --> Inv[(Inventory Service)]
  Kafka --> Exc[Exception Handler]
  Exc --> OpsUI[Operator UI]
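
The Wave Planner node at the top of the flowchart batches orders before pick tasks are created. One way this could work is a greedy grouping by shared aisles, sketched below. The heuristic, the `Wave` class, the `plan_waves` function, and the `max_orders` cap are all assumptions made for illustration, not a method stated in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Wave:
    order_ids: List[str] = field(default_factory=list)
    aisles: Set[str] = field(default_factory=set)


def plan_waves(order_aisles: Dict[str, Set[str]], max_orders: int) -> List[Wave]:
    """Greedily group orders that share aisles into waves of at most max_orders."""
    waves: List[Wave] = []
    # Place orders touching the most aisles first so they seed the waves.
    ordered = sorted(order_aisles, key=lambda oid: (-len(order_aisles[oid]), oid))
    for order_id in ordered:
        aisles = order_aisles[order_id]
        # Only waves with spare capacity and at least one shared aisle qualify.
        candidates = [
            w for w in waves
            if len(w.order_ids) < max_orders and (w.aisles & aisles)
        ]
        if candidates:
            # Join the wave with the largest aisle overlap.
            target = max(candidates, key=lambda w: len(w.aisles & aisles))
        else:
            target = Wave()
            waves.append(target)
        target.order_ids.append(order_id)
        target.aisles |= aisles
    return waves
```

For example, with orders `o1` (aisles A1, A2), `o2` (A2), `o3` (B7), `o4` (B7, B8) and `max_orders=2`, the sketch puts `o1` and `o2` in one wave and `o4` and `o3` in another, so each aisle group is swept once.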

Key insights

  • Event sourcing is mandatory: a single lost pick event corrupts inventory permanently, because the state can no longer be rebuilt from the log. Run Kafka with a replication factor of 3 or more.
  • Inventory is eventually consistent globally but strongly consistent per bin (compare-and-swap on bin version).
  • Robot dispatch is constrained optimization: minimize total drive distance subject to deadline + battery + congestion.
  • The exception rate is ~0.5% of picks; the operator UI must surface each exception within 30 s or wave throughput collapses.
  • Wave planning amortizes setup cost: group orders that share aisles into a single floor sweep.
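
The first two insights (rebuilding inventory from the event log, and compare-and-swap on a bin version) can be sketched together. This is an in-memory stand-in only: the `InventoryEvent` fields, the `BinStore` class, and the `replay` function are hypothetical names, and a real system would read events from Kafka and perform the conditional write in a database rather than in a Python dict.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class InventoryEvent:
    offset: int  # position in the log (assumed unique and ordered)
    bin_id: str
    sku: str
    delta: int   # +n for a stow, -n for a pick


class VersionConflict(Exception):
    """Raised when a writer's expected bin version is stale."""


class BinStore:
    def __init__(self) -> None:
        self._qty: Dict[Tuple[str, str], int] = {}  # (bin_id, sku) -> quantity
        self._version: Dict[str, int] = {}          # bin_id -> version counter

    def version(self, bin_id: str) -> int:
        return self._version.get(bin_id, 0)

    def qty(self, bin_id: str, sku: str) -> int:
        return self._qty.get((bin_id, sku), 0)

    def compare_and_swap(self, bin_id: str, sku: str, expected_version: int, new_qty: int) -> int:
        """Write new_qty only if the bin is still at expected_version."""
        if self.version(bin_id) != expected_version:
            raise VersionConflict(f"{bin_id}: expected v{expected_version}, found v{self.version(bin_id)}")
        if new_qty < 0:
            raise ValueError(f"{bin_id}/{sku}: quantity would go negative")
        self._qty[(bin_id, sku)] = new_qty
        self._version[bin_id] = expected_version + 1
        return expected_version + 1


def replay(events: Iterable[InventoryEvent], store: Optional[BinStore] = None) -> BinStore:
    """Rebuild bin state by applying events in log order."""
    if store is None:
        store = BinStore()
    for ev in sorted(events, key=lambda e: e.offset):
        current_version = store.version(ev.bin_id)
        new_qty = store.qty(ev.bin_id, ev.sku) + ev.delta
        store.compare_and_swap(ev.bin_id, ev.sku, current_version, new_qty)
    return store
```

Replaying the same log always yields the same state, which is what makes a dropped event so damaging: every later replay silently produces the wrong count. The version check makes two concurrent writers to the same bin fail loudly instead of overwriting each other, while writes to different bins never contend.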