My Brain Cells

© 2026 My Brain Cells


System Design: Complete Guide for Interviews

Anthony Sandesh

Introduction

System design interviews assess your ability to architect large-scale services. Unlike algorithm questions, they focus on high-level thinking: gathering requirements, making trade-offs, and balancing non-functional needs (scale, reliability, maintainability). In this guide you’ll learn a repeatable framework, key building blocks, common patterns, and a worked example.

1. Clarify Requirements

Before sketching any boxes, ask clarifying questions:
  • Scope & Features
    • What functionality is in/out of scope? (e.g. “Design a URL shortener that supports custom aliases but no analytics.”)
  • Scale & Constraints
    • Expected traffic (QPS), data size, peak vs average load.
  • Performance & SLAs
    • Latency targets, consistency needs (strong vs eventual), availability requirements.
  • Data & Operations
    • Read/write ratio, retention policy, batch vs real-time processing.
Tip: Frame each requirement you elicit as an assumption you’ll validate later.

2. Define the API Interface

Describe the core endpoints and their request/response shapes. This keeps the discussion concrete.
Example: URL Shortener
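A hedged sketch of what the URL-shortener API might look like, with a toy in-memory implementation (the endpoint names, base62 aliasing, and the `Shortener` class are illustrative assumptions, not a prescribed design):

```python
# Possible endpoint shapes (assumed for illustration):
#   POST /shorten   {"long_url": "...", "custom_alias": "promo"}  -> {"short_url": "..."}
#   GET  /{alias}   -> 302 redirect to the stored long URL
import string

BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase

def encode_base62(n):
    """Encode a numeric ID as a compact base62 alias."""
    if n == 0:
        return BASE62[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(BASE62[r])
    return "".join(reversed(out))

class Shortener:
    """Toy in-memory stand-in for the shorten/redirect service."""
    def __init__(self):
        self._by_alias = {}
        self._next_id = 1

    def shorten(self, long_url, custom_alias=None):
        alias = custom_alias or encode_base62(self._next_id)
        if alias in self._by_alias:
            raise ValueError("alias already taken")
        self._by_alias[alias] = long_url
        self._next_id += 1
        return alias

    def resolve(self, alias):
        return self._by_alias[alias]
```

Auto-generated and custom aliases share one namespace here, which is why `shorten` rejects collisions; a real service would enforce that with a unique key in the datastore.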

3. High-Level (Black-Box) Design

Draw a block diagram showing:
  1. Clients
    • Browser/mobile apps calling your APIs.
  2. API Gateway / Load Balancer
    • Distributes requests across stateless application servers.
  3. Application Servers
    • Your business-logic services, horizontally scalable.
  4. Datastore(s)
    • Primary database, cache layer, and any async/message queues.
  5. External Services
    • CDN for static content, email/SMS gateways, analytics pipelines.
Keep it simple at first; you’ll dive deeper component by component.

4. Component Deep Dive

4.1 Load Balancer

  • Purpose: Distribute traffic, handle SSL termination, health checks.
  • Options: HAProxy, Nginx, AWS ALB/ELB, GCP Cloud Load Balancing.

4.2 Application Servers

  • Stateless: So you can scale horizontally.
  • Tech Choice: Java/Spring Boot, Node.js/Express, Go, etc.

4.3 Database Layer

  • Primary Store: Relational (PostgreSQL/MySQL) vs NoSQL (Cassandra, DynamoDB).
  • Schema Design: Keep the hot lookup path simple (e.g. a table keyed by the short alias mapping to the long URL).
  • Scaling:
    • Replication for read-scaling.
    • Sharding (range or hash) when writes exceed a single node.
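A minimal version of the schema and hash-sharding ideas above, sketched with SQLite (table and column names are my assumptions; `hashlib.md5` stands in for any stable hash function):

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE urls (
        alias      TEXT PRIMARY KEY,          -- short code, e.g. 'aZ3x9'
        long_url   TEXT NOT NULL,
        created_at TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")

def shard_for(alias, num_shards=4):
    """Hash sharding: route an alias to one of N shards using a stable hash."""
    digest = hashlib.md5(alias.encode()).hexdigest()
    return int(digest, 16) % num_shards
```

A stable hash matters here: Python's built-in `hash()` changes between runs, which would scatter keys across shards on every restart.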

4.4 Caching

  • Use Case: Offload frequent reads (popular URLs).
  • Tech: Redis or Memcached.
  • Pattern: Cache-aside
      1. Application checks the cache first.
      2. On a miss, fetch from the DB, then populate the cache.
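The cache-aside steps above as a function (`DictCache` is a toy stand-in for Redis or Memcached, and `db_lookup` for the primary store):

```python
class DictCache:
    """Stand-in for Redis/Memcached (TTL is ignored in this toy version)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        return self._data.get(key)
    def set(self, key, value, ttl_seconds):
        self._data[key] = value

def get_url(alias, cache, db_lookup, ttl_seconds=3600):
    cached = cache.get(alias)                  # 1. check the cache first
    if cached is not None:
        return cached
    value = db_lookup(alias)                   # 2. on a miss, read the database
    if value is not None:
        cache.set(alias, value, ttl_seconds)   # 3. populate the cache for next time
    return value
```

Note that misses for nonexistent keys are not cached here; a real system might cache a negative result briefly to blunt repeated lookups of bad aliases.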

5. Handling Scale and Traffic Patterns

  • Rate Limiting / Throttling: Protect backend during spikes (e.g. token bucket via API gateway).
  • Autoscaling: Based on CPU/RPS.
  • CDN: Serve static assets, offload traffic.
  • Backpressure & Queueing: Use Kafka or RabbitMQ for async tasks (e.g. logging, analytics).
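A minimal token-bucket limiter like the one mentioned above (the rate and capacity numbers would come from your gateway policy; this single-node version ignores the distributed case):

```python
import time

class TokenBucket:
    """Refill `rate` tokens per second up to `capacity`; each request costs one."""
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        # Refill based on elapsed time, capped at capacity (allows short bursts).
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```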

6. Consistency, Availability & Trade-Offs

  • CAP Theorem: When a network partition occurs, you must choose between consistency and availability (partition tolerance isn't optional in a distributed system); "pick two of three" is the common shorthand.
  • Consistency Models:
    • Strong: Reads always return latest write.
    • Eventual: Better latency/availability, but clients may see stale data.
  • Use Cases: Analytics can be eventually consistent; payment systems usually need strong consistency.

7. Monitoring, Logging & Alerting

  • Metrics: QPS, latency percentiles (p50/p95/p99), error rates.
  • Logging: Structured logs (JSON), correlation IDs for tracing.
  • Distributed Tracing: OpenTelemetry, Jaeger, Zipkin.
  • Alerts: On thresholds (e.g. error rate >1%, latency >200 ms p95).

8. Security Considerations

  • Authentication & Authorization: JWT, OAuth2.
  • Input Validation: Prevent URL injection, XSS.
  • Encryption: TLS in transit, encryption at rest for sensitive data.
  • Secrets Management: Vault, AWS KMS.

9. Sample Case Study: Designing a Chat Service

  1. Requirements: 1:1 and group chat, message history, online presence.
  2. API Sketch:
  3. High-Level:
      • Clients ↔️ WebSocket Gateway ↔️ Chat Service.
      • Messages → Kafka → Storage Service.
      • Read from database (e.g., Cassandra).
  4. Real-Time Delivery:
      • Use Pub/Sub channels (Redis Pub/Sub or MQTT).
      • Maintain user-to-connection map in a distributed cache.
  5. Scaling:
      • Partition chat rooms by shard key.
      • Autoscale WebSocket nodes.
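A hedged sketch of the chat endpoints and the user-to-connection map mentioned under Real-Time Delivery (all names are assumptions; in production the map would live in a distributed cache such as Redis, not in process memory):

```python
# Assumed endpoint shapes:
#   POST /rooms/{room_id}/messages   {"sender_id": ..., "text": ...}
#   GET  /rooms/{room_id}/messages?before=<ts>&limit=50   (history)
#   WS   /connect                    (presence + real-time delivery)

class ConnectionRegistry:
    """Maps user IDs to their active WebSocket connection IDs."""
    def __init__(self):
        self._conns = {}   # user_id -> set of connection ids

    def connect(self, user_id, conn_id):
        self._conns.setdefault(user_id, set()).add(conn_id)

    def disconnect(self, user_id, conn_id):
        self._conns.get(user_id, set()).discard(conn_id)

    def is_online(self, user_id):
        """Presence: a user is online while at least one connection is open."""
        return bool(self._conns.get(user_id))
```

Tracking a set of connections per user handles the multi-device case (phone plus laptop) without extra bookkeeping.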

10. Interview Tips & Best Practices

  • Communicate Clearly: Narrate your thought process; don't work in silence.
  • Draw Diagrams: A quick sketch on the whiteboard boosts clarity.
  • Discuss Alternatives: Show you know trade-offs.
  • Focus on Non-Functional: Latency, throughput, cost, maintainability.
  • Time Management: If you get stuck, loop back—summarize what you’ve covered.

Conclusion

System design interviews reward structured thinking, clear communication, and trade-off analysis. By following this framework—requirements, API, high-level design, deep dives, scale considerations, and monitoring—you’ll present a robust solution. Practice with real-world case studies (URL shortener, chat, social feed) to internalize these patterns and go into your next interview confident and prepared.

 
Here are three ready-to-use system-design templates—one each for video streaming, ride-sharing, and social media. You can adapt these to any similar service by swapping in your own requirements, tech choices, and scale numbers.

1. Video Streaming Service (e.g. Netflix)

1.1 Clarify Requirements

  • Features In-Scope:
    • On-demand video playback (VOD)
    • User profiles & recommendations
    • Search & browse catalogs
    • DRM / geo-restriction
  • Scale:
    • 10M active users, 100K concurrent streams
    • Average video size: 1.5 GB; daily traffic: ~150 PB
  • Performance SLAs:
    • Start-up latency < 2 s
    • 99th-percentile buffer time < 1 s

1.2 API Definition
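One hedged way the playback API could look, including the HMAC-signed URLs mentioned under Core Components below (the endpoint names, CDN domain, and secret handling are all assumptions):

```python
# Assumed endpoint shapes:
#   GET /videos?query=...    -> catalog search results
#   GET /videos/{id}/play    -> {"url": "<short-lived signed manifest URL>"}
import hashlib
import hmac
import time

SECRET = b"demo-secret"  # in production, fetched from a secrets manager

def playback_token(video_id, expires):
    """HMAC over the video ID and expiry; tampering with either breaks the sig."""
    payload = f"{video_id}:{expires}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def signed_url(video_id, ttl_seconds=300, now=None):
    expires = int((now if now is not None else time.time()) + ttl_seconds)
    sig = playback_token(video_id, expires)
    return f"https://cdn.example.com/{video_id}/manifest.m3u8?exp={expires}&sig={sig}"

def verify_token(video_id, expires, sig, now=None):
    current = now if now is not None else time.time()
    if current > expires:
        return False  # link has expired
    return hmac.compare_digest(sig, playback_token(video_id, expires))
```

The CDN edge (or a lightweight auth service in front of it) runs `verify_token`, so playback entitlement checks never touch the application tier.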

1.3 High-Level Architecture

1.4 Core Components

  • CDN & Streaming
    • HLS/DASH segments in S3 → Edge caching (CloudFront)
  • Catalog Service
    • Read-heavy: DynamoDB + global replication
  • Playback Service
    • Issues signed URLs; enforces DRM via token service
  • Recommendation Engine
    • Batch Spark jobs + real-time feature store (Redis)
  • Analytics Pipeline
    • Clickstream → Kafka → Flink → S3 / BigQuery

1.5 Data Model (example SQL for user-video watch history)
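One possible watch-history table, sketched in SQLite for portability (the column names and upsert-on-resume pattern are assumptions; at this scale the real store might be DynamoDB or Cassandra):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE watch_history (
        user_id    TEXT NOT NULL,
        video_id   TEXT NOT NULL,
        position_s INTEGER NOT NULL DEFAULT 0,   -- resume point, in seconds
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, video_id)          -- one row per user per video
    )
""")

# As the player reports progress, upsert the resume position:
conn.execute("""
    INSERT INTO watch_history (user_id, video_id, position_s)
    VALUES (?, ?, ?)
    ON CONFLICT(user_id, video_id) DO UPDATE SET position_s = excluded.position_s
""", ("u1", "v42", 930))
```

The composite key keeps history writes idempotent per (user, video), which matters when progress beacons arrive out of order or are retried.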

1.6 Scaling & Resilience

  • Auto-scaling on ALB based on CPU/RPS
  • Sharding catalog DB by video ID prefix
  • Multi-AZ object storage + cross-region replication

2. Ride-Sharing Service (e.g. Uber)

2.1 Clarify Requirements

  • Features In-Scope:
    • Real-time driver matching
    • Dynamic pricing (surge)
    • In-trip tracking & ETA updates
    • Payment processing
  • Scale:
    • 1M daily rides, peak QPS for matching: 5K/s
  • SLAs:
    • Match latency < 200 ms
    • Location update latency < 1 s

2.2 API Definition
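A hedged sketch of the ride API plus the fare formula described under Core Components (endpoint shapes and the per-minute/per-km rates are invented for illustration):

```python
# Assumed endpoint shapes:
#   POST /rides              {"rider_id": ..., "pickup": {...}, "dropoff": {...}}
#   GET  /rides/{ride_id}    -> status, matched driver, ETA
#   POST /rides/{ride_id}/cancel

def estimate_fare(base, minutes, km, per_min=0.3, per_km=1.1, surge=1.0):
    """Base fare + time + distance, multiplied by the current surge factor."""
    return round((base + minutes * per_min + km * per_km) * surge, 2)
```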

2.3 High-Level Architecture

2.4 Core Components

  • Geospatial Index
    • Geo-hash grid in Redis for “nearest drivers”
  • Matching Service
    • k-nearest neighbor lookup + surge multiplier
  • Pricing Service
    • Base fare + time + distance + dynamic surge factor
  • Tracking
    • WebSockets or MQTT for real-time location updates
  • Payments
    • Stripe / Braintree integrations; idempotent charge flow
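A toy version of the geospatial index and "nearest drivers" lookup above; a coarse lat/lng grid stands in for the Redis geo-hash structure, and the 0.01-degree cell size is an arbitrary assumption:

```python
from collections import defaultdict

CELL = 0.01  # grid cell size in degrees (~1 km at the equator)

def cell_of(lat, lng):
    return (int(lat // CELL), int(lng // CELL))

class DriverIndex:
    """In-memory stand-in for a geo-hash grid of driver positions."""
    def __init__(self):
        self.grid = defaultdict(set)   # cell -> driver ids
        self.pos = {}                  # driver id -> (lat, lng)

    def update(self, driver_id, lat, lng):
        old = self.pos.get(driver_id)
        if old is not None:
            self.grid[cell_of(*old)].discard(driver_id)  # leave the old cell
        self.pos[driver_id] = (lat, lng)
        self.grid[cell_of(lat, lng)].add(driver_id)

    def nearby(self, lat, lng):
        """Candidate drivers in the rider's cell and its 8 neighbors."""
        cx, cy = cell_of(lat, lng)
        found = set()
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                found |= self.grid[(cx + dx, cy + dy)]
        return found
```

The grid only produces candidates; the Matching Service would then rank them by true distance or ETA before applying the surge multiplier.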

2.5 Data Model (schema for ride requests)
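One possible ride_requests schema, sketched in SQLite (the names, status enum, and surge column are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ride_requests (
        ride_id     TEXT PRIMARY KEY,
        rider_id    TEXT NOT NULL,
        driver_id   TEXT,                     -- NULL until matched
        pickup_lat  REAL NOT NULL,
        pickup_lng  REAL NOT NULL,
        dropoff_lat REAL NOT NULL,
        dropoff_lng REAL NOT NULL,
        status      TEXT NOT NULL DEFAULT 'requested',
                    -- requested | matched | in_trip | completed | cancelled
        surge       REAL NOT NULL DEFAULT 1.0,  -- multiplier frozen at request time
        created_at  TEXT DEFAULT CURRENT_TIMESTAMP
    )
""")
```

Freezing the surge multiplier on the row at request time keeps the quoted price stable even if demand shifts mid-trip.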

2.6 Scaling & Resilience

  • Partition Redis grid by region
  • Circuit Breakers on payment gateways
  • Event Sourcing for audit trails via Kafka

3. Social Media Feed (e.g. Reddit)

3.1 Clarify Requirements

  • Features In-Scope:
    • Subreddits (topics), posts, comments, upvotes/downvotes
    • Personalized front page
    • Notifications & moderation
  • Scale:
    • 500M monthly active users, 50K posts/min, 500K comments/min
  • SLAs:
    • Feed retrieval < 100 ms
    • Vote propagation < 1 s

3.2 API Definition
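A hedged sketch of the feed and vote endpoints, plus the score arithmetic for changing a vote (names and shapes are assumptions):

```python
# Assumed endpoint shapes:
#   GET  /r/{subreddit}/hot?limit=25
#   POST /r/{subreddit}/posts      {"title": ..., "body": ...}
#   POST /posts/{post_id}/vote     {"direction": 1, 0, or -1}
#   GET  /posts/{post_id}/comments

def apply_vote(score, old_direction, new_direction):
    """Idempotently move a user's vote from old to new (-1, 0, or +1)."""
    if new_direction not in (-1, 0, 1):
        raise ValueError("direction must be -1, 0, or 1")
    return score - old_direction + new_direction
```

Computing the delta from the user's previous vote (rather than blindly incrementing) is what makes retried vote requests safe.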

3.3 High-Level Architecture

3.4 Core Components

  • Feed Generation
    • Push model: on post/vote, push IDs into followers’ sorted sets in Redis
    • Pull model: query recent posts in subscribed subreddits + apply ranking ML
  • Ranking Algorithm
    • “Hot” score = (upvotes − downvotes) / age^1.5, so newer posts outrank older ones with equal votes.
  • Comment Threads
    • Materialized path or adjacency list in a document store (MongoDB)
  • Notifications
    • Fan-out via Kafka → worker pools → push notifications

3.5 Data Model (votes table example)
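A possible votes table, sketched in SQLite; the composite primary key enforces one vote per user per post, so changing a vote is an upsert (names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE votes (
        user_id   TEXT NOT NULL,
        post_id   TEXT NOT NULL,
        direction INTEGER NOT NULL CHECK (direction IN (-1, 1)),
        voted_at  TEXT DEFAULT CURRENT_TIMESTAMP,
        PRIMARY KEY (user_id, post_id)   -- one vote per user per post
    )
""")

# Casting or changing a vote is the same statement:
conn.execute("""
    INSERT INTO votes (user_id, post_id, direction) VALUES (?, ?, ?)
    ON CONFLICT(user_id, post_id) DO UPDATE SET direction = excluded.direction
""", ("u1", "p9", 1))
```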

3.6 Scaling & Resilience

  • Hot Partition Mitigation: shard active subreddits across multiple DB partitions
  • Cache Invalidation on vote changes
  • Rate Limiting on comment/post endpoints

How to Use These Templates
  1. Plug in Your Numbers: Adjust QPS, data volumes, TTLs.
  2. Choose Technologies: Swap in your cloud provider or open-source stack.
  3. Draw & Explain: Sketch the boxes, call out trade-offs (consistency vs latency, read vs write scaling).
  4. Dive Deeper: For any “hot” component (caching, geoindex, feed generation) be ready with alternatives.
Happy designing and good luck in your interviews!
