system design · system-design
Design Amazon Customer-Order-Product DB
Relational + NoSQL hybrid, OLTP vs OLAP separation, global tables. Tests data-modeling decisions at multi-region scale.
Theory
Explanation
Intuition first, formal definition second. Skim the bullets if you already know this; read the prose if you don't.
No one DB does everything. Customer profile (low write, joined reads) fits relational. Orders (high write, sharded reads) fit DynamoDB. Search uses denormalized index. Analytics uses columnar warehouse. Glue together with CDC.
Polyglot persistence: Aurora MySQL for customer/account (ACID, joins). DynamoDB for orders (sharded by customer_id, secondary index by order_id, global table for multi-region). OpenSearch for product search (CDC from catalog). Redshift / S3 + Athena for analytics (loaded via Kinesis Firehose from CDC streams). Each store owns its access pattern; CDC keeps them in sync.
When to use
Multi-tenant SaaS, marketplaces, anything with mixed OLTP + OLAP + search workloads.
When not to
Small data, one access pattern, single Postgres beats polyglot complexity.
flowchart LR App[App Servers] --> Aur[(Aurora MySQL · Customers)] App --> Dy[(DynamoDB · Orders · sharded)] App --> OS[(OpenSearch · Catalog)] Aur -.CDC.-> Stream[[Kinesis Stream]] Dy -.CDC.-> Stream Stream --> FH[Kinesis Firehose] FH --> S3[(S3 Data Lake)] S3 --> RS[(Redshift)] S3 --> Ath[Athena]
Key insights
- Choose DB per access pattern, not per "we standardized on X".
- OLTP and OLAP are different beasts. Co-locating tanks analytics under prod load.
- CDC is the spine, must be reliable, idempotent, and ordered per key.
- Global tables for orders give regional read locality; multi-master writes require conflict resolution rules.
- Schema-on-read for analytics (Parquet on S3 + Athena) is cheaper than rigid warehouse schemas during exploration.