Design Tesla Autopilot Data Pipeline (Auto-labeling + Retraining)
Auto-labeling, shadow mode, rare-event mining, training data curation, reprocessing.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
Autopilot improves through a closed-loop data engine: the deployed model produces predictions, shadow mode logs cases where a prediction differs from reality, the mined cases become training data, and the retrained model is deployed. The pipeline is the moat.
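The logging step of this loop can be sketched as a rolling buffer that is snapshotted whenever the deployed prediction diverges from a reference signal. This is a minimal illustration, not Tesla's implementation: `Frame`, `DisagreementLogger`, the scalar outputs, the 5-second window, and the threshold value are all hypothetical names and numbers chosen for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    t_ms: int     # timestamp in milliseconds
    prod: float   # production model output (assumed scalar, e.g. distance in m)
    ref: float    # reference output to compare against (e.g. a shadow model)


class DisagreementLogger:
    """Keeps the last `window_ms` of frames and saves a copy on disagreement."""

    def __init__(self, window_ms=5000, threshold=2.0):
        self.window_ms = window_ms
        self.threshold = threshold
        self.buffer = deque()
        self.clips = []  # stand-in for an upload queue to the data lake

    def push(self, frame):
        self.buffer.append(frame)
        # Drop frames that have fallen out of the rolling window.
        while frame.t_ms - self.buffer[0].t_ms > self.window_ms:
            self.buffer.popleft()
        # Snapshot the window when the two outputs diverge.
        if abs(frame.prod - frame.ref) > self.threshold:
            self.clips.append(list(self.buffer))


logger = DisagreementLogger()
for i in range(100):                      # 10 s of frames at 10 Hz
    t = i * 100
    prod = 35.0 if t == 7000 else 30.0    # one injected disagreement at t = 7 s
    logger.push(Frame(t_ms=t, prod=prod, ref=30.0))

print(len(logger.clips), len(logger.clips[0]))  # 1 51
```

A real trigger would likely add a cooldown so that a run of consecutive disagreeing frames produces one clip rather than many overlapping ones.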
Shadow mode: in some vehicles the production model runs alongside a larger shadow model, and disagreements between the two are logged. Auto-labeling: an offline pipeline runs a heavy ground-truth model, often using future frames to look back at past frames. Rare-event mining: query the data lake for under-represented scenarios (for example, rain + intersection + pedestrian). Curate the training set with class balancing, retrain, validate on the regression suite, then stage the OTA rollout.
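To make the "future frames" point concrete, here is a toy comparison between an estimate that may only look backwards (what an online model can do) and one that may also look forwards (what an offline labeler can do). Plain moving averages stand in for the heavy ground-truth model; the function names and the window size `k` are assumptions made for this sketch.

```python
def causal_estimate(obs, k):
    """Average of the current and up to k previous observations."""
    out = []
    for i in range(len(obs)):
        window = obs[max(0, i - k): i + 1]
        out.append(sum(window) / len(window))
    return out


def offline_label(obs, k):
    """Average over up to k past and k future observations (hindsight)."""
    out = []
    for i in range(len(obs)):
        window = obs[max(0, i - k): i + k + 1]
        out.append(sum(window) / len(window))
    return out


# An object moving at constant speed: position 0, 1, 2, ... per frame.
positions = [float(i) for i in range(10)]

print(causal_estimate(positions, 2)[5])  # 4.0, lags behind the true value 5.0
print(offline_label(positions, 2)[5])    # 5.0, the symmetric window removes the lag
```

The backward-only average trails the moving object, while the symmetric window recovers the true position at frame 5. The same idea, applied with far heavier models, is why labels produced after the fact can be more accurate than real-time inference.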
When to use
Any closed-loop ML system where deployment data improves future models.
When not to
Static models without retraining.
flowchart LR
Vehicle[Production Model] -->|prediction| Compare{Shadow disagree?}
Shadow[Shadow Model · larger] --> Compare
Compare -->|disagree| Clip[Save 5s clip]
Clip --> Lake[(Data Lake)]
Lake --> Mine[Rare-Event Miner]
Mine --> Curate[Curated Training Set]
Curate --> AutoLabel[Auto-Labeling · ground-truth model]
AutoLabel --> Train[Training]
Train --> Eval[Regression Suite]
Eval -->|pass| OTA[OTA Stage Rollout]
OTA --> Vehicle
Key insights
- Auto-labeling can use future frames (looking back), far more accurate than real-time inference.
- Shadow mode is the primary signal source and adds negligible cost to run in the field.
- Rare-event mining handles class imbalance: 99% of driving is boring; rare cases drive most learning.
- The regression suite catches behavioral changes between model versions; subjective driving feel matters as well as aggregate metrics.
- A staged OTA rollout lets you measure the real-world delta before a fleet-wide release.
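The rare-event mining and class-balancing steps can be sketched as a tag query followed by inverse-frequency sampling weights. This is an illustrative sketch under assumptions: the clip records, the `tags` and `scenario` fields, and both function names are hypothetical, and a production data lake would be queried through its own index rather than a Python list.

```python
from collections import Counter


def mine(clips, required_tags):
    """Return the clips whose tag set contains every required tag."""
    required = set(required_tags)
    return [c for c in clips if required <= set(c["tags"])]


def balance_weights(clips, key="scenario"):
    """Inverse-frequency weights: every scenario gets equal total weight."""
    counts = Counter(c[key] for c in clips)
    n_classes = len(counts)
    return [1.0 / (n_classes * counts[c[key]]) for c in clips]


# Toy data lake: 8 routine highway clips and 2 rare ones.
clips = [{"scenario": "highway", "tags": ["dry", "highway"]} for _ in range(8)]
clips += [
    {"scenario": "rain_crossing", "tags": ["rain", "intersection", "pedestrian"]}
    for _ in range(2)
]

rare = mine(clips, ["rain", "intersection", "pedestrian"])
weights = balance_weights(clips)

print(len(rare))               # 2
print(weights[0], weights[-1])  # 0.0625 0.25
```

Passing these weights to a sampler such as `random.choices(clips, weights=weights, k=batch_size)` would draw the rare scenario about as often as the common one, even though it makes up only a fifth of the stored clips.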