Twitter handles 500 million tweets per day. Netflix streams to more than 230 million subscribers around the world. Google processes 8.5 billion searches daily. How do these systems actually work? More importantly: can you explain how to build one in 45 minutes? Candidates routinely fail system design interviews at FAANG companies because they dive into details without a structured approach. Here's the framework that actually works.
The Framework: 6 Steps in 45 Minutes
System design interviews are conversations, not coding tests. Your goal is to demonstrate structured thinking, make reasonable trade-offs, and communicate clearly.
Step 1: Clarify Requirements (5 minutes)
Never start designing without asking questions.
"Before I dive in, I'd like to understand the requirements better..."
Functional requirements - What should the system do?
- Core features only (don't over-scope)
- User actions and flows
- Input/output expectations
Non-functional requirements - How well should it do it?
- Scale: How many users? Requests per second?
- Performance: Acceptable latency?
- Availability: What uptime is required?
- Consistency: Can data be eventually consistent or must it be strong?
Constraints:
- Are we building from scratch or integrating with existing systems?
- Any technology preferences or restrictions?
- Budget or team size considerations?
Step 2: Estimate Scale (5 minutes)
Back-of-envelope calculations show you think about real-world constraints.
Example: Designing Twitter
Users: 500M monthly active users
Daily active: 200M (40%)
Tweets per day: 500M (avg 2.5 per active user)
Reads per day: 200M users × 100 tweets viewed = 20B reads
Tweets per second: 500M / 86400 ≈ 6000 TPS (write)
Reads per second: 20B / 86400 ≈ 230,000 QPS (read)
Storage per tweet: 280 chars + metadata ≈ 500 bytes
Daily storage: 500M × 500 bytes = 250GB
Yearly storage: 250GB × 365 = ~90TB (just text, media is much more)
Key insight: This is a read-heavy system (230K reads vs 6K writes). Design accordingly.
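If you want to sanity-check these numbers quickly, the arithmetic is simple enough to script. A throwaway sketch using the rough assumptions above:

```javascript
// Back-of-envelope estimates for a Twitter-like system (rough assumptions, not measured data)
const SECONDS_PER_DAY = 86_400;

const dailyActiveUsers = 200e6;  // 200M DAU
const tweetsPerDay = 500e6;      // 500M tweets/day
const tweetsViewedPerUser = 100; // assumed average views per DAU
const bytesPerTweet = 500;       // text + metadata

const writeTps = Math.round(tweetsPerDay / SECONDS_PER_DAY);                             // ~5,800
const readQps = Math.round((dailyActiveUsers * tweetsViewedPerUser) / SECONDS_PER_DAY);  // ~231,000
const dailyStorageGB = (tweetsPerDay * bytesPerTweet) / 1e9;                             // 250 GB
const yearlyStorageTB = (dailyStorageGB * 365) / 1e3;                                    // ~91 TB

console.log({ writeTps, readQps, dailyStorageGB, yearlyStorageTB });
```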
Step 3: Define API and Data Model (5 minutes)
API Design:
POST /tweets
body: { text, media_ids }
returns: { tweet_id, created_at }
GET /timeline
params: ?cursor=xxx&limit=20
returns: { tweets: [...], next_cursor }
GET /users/{id}/tweets
returns: { tweets: [...] }
POST /follow/{user_id}
DELETE /follow/{user_id}
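To make the contract concrete, the write endpoint might look like this as an Express handler. This is only a sketch: tweetService and the auth-derived req.userId are hypothetical and would come from your own service layer and middleware.

```javascript
const express = require('express');
const app = express();
app.use(express.json());

// POST /tweets - create a tweet (tweetService and req.userId are hypothetical)
app.post('/tweets', async (req, res) => {
  const { text, media_ids } = req.body;
  if (!text || text.length > 280) {
    return res.status(400).json({ error: 'text is required and must be at most 280 characters' });
  }
  const tweet = await tweetService.create({ userId: req.userId, text, mediaIds: media_ids });
  res.status(201).json({ tweet_id: tweet.id, created_at: tweet.createdAt });
});

app.listen(3000);
```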
Data Model:
User
- id (PK)
- username
- email
- created_at
Tweet
- id (PK)
- user_id (FK)
- text
- created_at
- media_urls
Follow
- follower_id (PK, FK)
- followee_id (PK, FK)
- created_at
Step 4: High-Level Design (10 minutes)
Draw the main components and explain data flow.
┌─────────────┐
│ CDN │ (static assets)
└──────┬──────┘
│
┌──────────┐ ┌──────────────┐ ┌──────┴──────┐
│ Client │────▶│ Load Balancer│──▶│ API Servers │
└──────────┘ └──────────────┘ └──────┬──────┘
│
┌────────────────────────────┼────────────────────────────┐
│ │ │
┌──────┴──────┐ ┌───────┴───────┐ ┌───────┴───────┐
│ Cache │ │ Tweet Service │ │ Timeline Svc │
│ (Redis) │ └───────┬───────┘ └───────┬───────┘
└─────────────┘ │ │
┌──────┴──────┐ ┌───────┴───────┐
│ Tweet DB │ │ Timeline Cache │
│ (Sharded) │ │ (Redis) │
└─────────────┘ └───────────────┘
Walk through the flow:
"When a user posts a tweet: request hits the load balancer, goes to an API server, which writes to the Tweet database. Then we need to update timelines - this is where it gets interesting.
For reading timelines, we want to avoid expensive database queries, so we pre-compute timelines and store them in Redis. When you open the app, we just read from cache.
The challenge is: when should we update these cached timelines?"
Step 5: Deep Dive (15 minutes)
Choose 2-3 components to discuss in detail. The interviewer may guide you, or you can pick based on what's most interesting.
Timeline Generation: Fan-out on Write vs Fan-out on Read
This is the classic Twitter design problem.
Fan-out on Write (Push model):
When user posts tweet:
1. Write tweet to DB
2. Get all followers (could be millions)
3. Push tweet to each follower's timeline cache
Pros: Fast reads - timeline is pre-computed
Cons: Slow writes for users with many followers (celebrities)
High storage - tweet duplicated N times
Fan-out on Read (Pull model):
When user reads timeline:
1. Get list of who they follow
2. Fetch recent tweets from each
3. Merge and sort
Pros: Fast writes - just store the tweet once
Cons: Slow reads - must query multiple users
High compute at read time
Hybrid approach (what Twitter actually uses):
- Regular users: Fan-out on write
- Celebrities (>10K followers): Fan-out on read
When building timeline:
1. Read pre-computed timeline (regular users' tweets)
2. Merge with celebrity tweets fetched on-demand
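A rough sketch of that hybrid read path (the cache, follow service, and tweet store names are all hypothetical, and timestamps are assumed to be numeric):

```javascript
// Hybrid timeline read: pre-computed feed for regular accounts,
// on-demand fetch for high-follower ("celebrity") accounts. All service names are illustrative.
async function getTimeline(userId, limit = 20) {
  // 1. Pre-computed timeline built by the fan-out-on-write path (e.g. a Redis list)
  const precomputed = await timelineCache.get(`timeline:${userId}`);

  // 2. Celebrity tweets fetched at read time (fan-out on read)
  const celebrityIds = await followService.getFollowedCelebrities(userId);
  const celebrityTweets = (
    await Promise.all(celebrityIds.map(id => tweetStore.getRecentTweets(id, limit)))
  ).flat();

  // 3. Merge and sort by recency, then trim to the requested page size
  return [...(precomputed || []), ...celebrityTweets]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, limit);
}
```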
Database Sharding:
"With 500M tweets per day, a single database won't scale. We need to shard.
Shard key options:
- User ID: All tweets from a user on same shard. Good for user profile pages, bad for timeline (must query all shards)
- Tweet ID: Even distribution. Good for single tweet lookups, bad for user's tweet list
- Time-based: Recent data on 'hot' shards. Good for recent queries, needs re-sharding over time
I'd recommend user_id sharding with a separate index for tweet_id lookups."
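For example, routing by user_id might look like the minimal sketch below. The shard count and hashing scheme are assumptions; production systems typically use consistent hashing so that re-sharding moves less data.

```javascript
const crypto = require('crypto');

// Minimal user_id-based shard routing (illustrative; real systems usually use consistent hashing)
const NUM_SHARDS = 16;

function shardFor(userId) {
  const hash = crypto.createHash('md5').update(String(userId)).digest();
  return hash.readUInt32BE(0) % NUM_SHARDS;
}

// All tweets for a user land on one shard, so profile pages hit a single shard,
// while building a timeline may need to touch several shards.
console.log(shardFor('user_12345')); // e.g. 7
```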
Step 6: Address Bottlenecks (5 minutes)
Scaling:
- Horizontal scaling of API servers behind load balancer
- Database read replicas for read-heavy workload
- Sharding for write scaling
Reliability:
- Multiple data centers for geographic redundancy
- Database replication (primary-replica)
- Circuit breakers for failing services
Monitoring:
- Request latency percentiles (p50, p95, p99)
- Error rates
- Database query times
- Cache hit rates
Common System Design Patterns
Caching Strategy
Read Request
│
▼
┌─────────────┐
│ Check Cache │
└──────┬──────┘
│
┌─────────┴─────────┐
│ │
Cache Hit Cache Miss
│ │
▼ ▼
Return Data Query Database
│
▼
Update Cache
│
▼
Return Data
Cache-aside (Lazy Loading):
```javascript
async function getUser(userId) {
  // Try cache first
  let user = await cache.get(`user:${userId}`);
  if (!user) {
    // Cache miss - load from DB
    user = await db.query('SELECT * FROM users WHERE id = ?', userId);
    // Update cache
    await cache.set(`user:${userId}`, user, { ttl: 3600 });
  }
  return user;
}
```

Write-through:
```javascript
async function updateUser(userId, data) {
  // Update DB
  await db.query('UPDATE users SET ... WHERE id = ?', userId);
  // Update cache immediately
  await cache.set(`user:${userId}`, data);
}
```

Rate Limiting
Request
│
▼
┌─────────────────┐
│ Rate Limiter │
│ (Token Bucket) │
└────────┬────────┘
│
┌────────┴────────┐
│ │
Allowed Rejected
│ │
▼ ▼
Process Return 429
Request
Token Bucket Algorithm:
```javascript
class RateLimiter {
  constructor(capacity, refillRate) {
    this.capacity = capacity;     // Max tokens
    this.tokens = capacity;       // Current tokens
    this.refillRate = refillRate; // Tokens per second
    this.lastRefill = Date.now();
  }

  allowRequest() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }

  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}
```

Message Queues for Async Processing
┌─────────┐ ┌─────────────┐ ┌──────────┐
│ Producer│───▶│ Message │───▶│ Consumer │
│ │ │ Queue │ │ │
└─────────┘ │ (Kafka/SQS) │ └──────────┘
└─────────────┘
Use cases:
- Decoupling services (post service → notification service)
- Handling traffic spikes (queue absorbs burst)
- Async processing (image resizing, email sending)
- Event sourcing (log all changes for replay)
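A minimal sketch of the decoupling idea, using an in-memory array as a stand-in for Kafka/SQS. The point is that the producer returns immediately while the consumer drains the queue at its own pace:

```javascript
// In-memory stand-in for a message queue (Kafka/SQS would play this role in production)
const queue = [];

// Producer: the post service enqueues an event and returns immediately
function publishTweetCreated(tweet) {
  queue.push({ type: 'TWEET_CREATED', payload: tweet });
}

// Consumer: the notification service drains the queue at its own pace,
// so a burst of posts just makes the queue longer instead of overloading downstream services
async function consume() {
  while (queue.length > 0) {
    const event = queue.shift();
    if (event.type === 'TWEET_CREATED') {
      await notifyFollowers(event.payload); // stand-in for the real notification fan-out
    }
  }
}

async function notifyFollowers(tweet) {
  console.log(`notify followers of ${tweet.userId} about tweet ${tweet.id}`);
}

publishTweetCreated({ id: 't1', userId: 'u42', text: 'hello' });
setInterval(consume, 1000); // poll once per second; real consumers use long polling or push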
Database Replication
Writes
│
▼
┌─────────────┐
│ Primary │
│ Database │
└──────┬──────┘
│ Replication
┌──────┴──────┐
│ │
▼ ▼
┌─────────┐ ┌─────────┐
│ Replica │ │ Replica │
│ 1 │ │ 2 │
└─────────┘ └─────────┘
▲ ▲
│ │
└──────┬──────┘
│
Reads
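In application code, replication usually shows up as read/write splitting. A minimal sketch is below; createPool stands in for whatever driver you use (e.g. pg or mysql2), and keep in mind that replica reads can lag the primary slightly.

```javascript
// createPool is a stand-in for your driver's pool factory
const primary = createPool(process.env.PRIMARY_DB_URL); // all writes go here
const replicas = [
  createPool(process.env.REPLICA_1_URL),                // reads are spread across replicas
  createPool(process.env.REPLICA_2_URL),
];

// Pick a replica for read traffic; real setups add health checks and lag awareness
function readPool() {
  return replicas[Math.floor(Math.random() * replicas.length)];
}

async function getTweet(id) {
  return readPool().query('SELECT * FROM tweets WHERE id = ?', [id]);
}

async function createTweet({ userId, text }) {
  return primary.query('INSERT INTO tweets (user_id, text) VALUES (?, ?)', [userId, text]);
}
```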
Classic Interview Problems
Design a URL Shortener
Requirements:
- Shorten long URLs
- Redirect short URLs
- Custom short codes (optional)
- Analytics (optional)
Scale:
- 100M URLs created per month
- 10:1 read:write ratio
- 7-character short codes: 62^7 ≈ 3.5 trillion combinations
Design:
POST /shorten
body: { long_url, custom_code? }
returns: { short_url }
GET /{short_code}
returns: 302 redirect to long_url
Key decisions:
- ID generation: Counter + base62 encode, or random generation with collision check
- Database: NoSQL (Cassandra/DynamoDB) - simple key-value, high write throughput
- Caching: Hot URLs in Redis
- Analytics: Async logging to Kafka → Analytics DB
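A sketch of the counter + base62 approach. The numeric ID itself is assumed to come from a distributed counter or ticket server, which is not shown here:

```javascript
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'; // 62 chars

// Encode a numeric ID (e.g. from a distributed counter) as a short base62 code
function toBase62(id) {
  if (id === 0) return ALPHABET[0];
  let code = '';
  while (id > 0) {
    code = ALPHABET[id % 62] + code; // prepend the next most significant digit
    id = Math.floor(id / 62);
  }
  return code;
}

console.log(toBase62(125));           // "21"
console.log(toBase62(3521614606207)); // "ZZZZZZZ" - the largest 7-character code
```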
Design a Chat System
Requirements:
- 1-on-1 and group messaging
- Online status
- Message history
- Push notifications
Key components:
- WebSocket servers for real-time communication
- Message queue for delivery guarantee
- User presence service (heartbeat-based)
- Push notification service (APNs/FCM)
Message flow:
User A sends message
│
▼
WebSocket Server
│
▼
Message Queue (Kafka)
│
├──▶ Message Store (Cassandra)
│
└──▶ Delivery Service
│
├──▶ User B online → WebSocket push
│
└──▶ User B offline → Push notification
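The delivery decision at the end of that flow might look like this. It is only a sketch: the connection registry and push client are hypothetical components.

```javascript
// Delivery service: push over an open WebSocket if the recipient is connected,
// otherwise fall back to a mobile push notification. All names are illustrative.
async function deliverMessage(message) {
  // hypothetical registry mapping userId -> open WebSocket connection
  const socket = connectionRegistry.get(message.recipientId);

  if (socket) {
    // Recipient online: push in real time over the existing WebSocket
    socket.send(JSON.stringify({ type: 'message', data: message }));
  } else {
    // Recipient offline: hand off to APNs/FCM via a push service (hypothetical client)
    await pushService.notify(message.recipientId, {
      title: message.senderName,
      body: message.text,
    });
  }
}
```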
Design a Rate Limiter
Requirements:
- Limit requests per user/IP
- Different limits for different APIs
- Distributed (multiple servers)
Algorithms:
- Token Bucket: Smooth rate limiting, allows burst
- Leaky Bucket: Fixed rate output
- Fixed Window: Simple but edge case at window boundaries
- Sliding Window: Most accurate, more complex
Distributed implementation:
Using Redis:
```
count = INCR rate_limit:{user_id}
if count == 1:
    EXPIRE rate_limit:{user_id} 60   # the first request starts a new 60-second window
if count > limit:
    reject request
else:
    allow request
```
In practice the INCR and EXPIRE are wrapped in a Lua script or MULTI/EXEC so they run atomically, and the fixed-window counter can be swapped for a sliding-window log if bursts at window boundaries are a concern.
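The same logic in JavaScript, assuming an ioredis-style client where incr and expire return promises:

```javascript
// Fixed-window limiter on Redis (sketch; assumes an ioredis-style promise client)
const WINDOW_SECONDS = 60;
const LIMIT = 100;

async function isAllowed(redis, userId) {
  const key = `rate_limit:${userId}`;
  const count = await redis.incr(key);       // atomic increment; creates the key at 1
  if (count === 1) {
    await redis.expire(key, WINDOW_SECONDS); // first request in the window sets the TTL
  }
  return count <= LIMIT;
}
```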
Common Follow-Up Questions
"How would you handle a celebrity posting a tweet?"
"Celebrities have millions of followers, so fan-out on write is too slow. I'd use a hybrid approach: for accounts over a threshold (say 10K followers), we don't fan out immediately. Instead, when a user loads their timeline, we fetch their pre-computed feed AND query recent tweets from celebrities they follow, then merge them. This trades off some read latency for much better write latency."
"What happens if the database goes down?"
"We need to plan for this. First, database replication - a primary with multiple replicas in different availability zones. If the primary fails, we promote a replica. For read traffic, we can route to replicas immediately. For writes, we might need brief downtime during failover.
For critical data, we could use a message queue as a write-ahead log - even if the database is down, we accept writes to the queue and process them when the database recovers."
"How do you ensure consistency in a distributed system?"
"It depends on the use case. For financial transactions, we need strong consistency - I'd use a distributed transaction or the Saga pattern with compensating transactions.
For social media features like like counts, eventual consistency is fine - the count might be slightly off for a few seconds, but it will converge. We can use async replication and accept temporary inconsistencies.
The CAP theorem tells us we have to choose. During a network partition, do we want availability (serve potentially stale data) or consistency (reject requests)? Most social applications choose availability."
What Interviewers Are Really Testing
When I conduct system design interviews, I'm checking:
- Structured thinking - Do you have a framework or do you ramble?
- Clarifying questions - Do you dive in or understand requirements first?
- Scale awareness - Do you think about real-world numbers?
- Trade-off analysis - Can you articulate pros/cons of decisions?
- Communication - Can you explain your design clearly?
- Depth where needed - Can you go deep on specific components?
A candidate who asks good questions, draws clear diagrams, explains trade-offs, and drives the conversation will stand out.
Quick Reference Card
| Component | When to Use |
|---|---|
| Load Balancer | Multiple servers, high availability |
| CDN | Static assets, global users |
| Cache (Redis) | Read-heavy, acceptable staleness |
| Message Queue | Async processing, decoupling |
| Database Sharding | Single DB can't handle write load |
| Read Replicas | Single DB can't handle read load |
| NoSQL | Simple queries, high scale, flexible schema |
| SQL | Complex queries, transactions, relationships |
| Concept | Remember |
|---|---|
| CAP Theorem | During a network partition, choose consistency or availability |
| Horizontal scaling | Add more machines (preferred at scale) |
| Vertical scaling | Bigger machine (simple, but hits a ceiling) |
| Fan-out on write | Pre-compute, fast reads, slow writes |
| Fan-out on read | Compute on demand, fast writes, slow reads |
Practice Questions
Test yourself before your interview:
1. Design a parking lot system. What are the key components and how do you handle multiple entry/exit points?
2. Design Instagram. How would you handle image storage and delivery at scale?
3. You're designing a notification system. How do you ensure notifications are delivered even if the user's device is offline?
4. Design a web crawler. How do you avoid crawling the same page twice?
Related Articles
If you found this helpful, check out these related guides:
- Complete Node.js Backend Developer Interview Guide - comprehensive preparation guide for backend interviews
- SQL JOINs Interview Guide - Master JOIN types with visual examples
- REST API Interview Guide - API design principles and best practices
- Node.js Advanced Interview Guide - Event loop, streams, and Node.js internals
- Complete DevOps Engineer Interview Guide - comprehensive preparation guide for DevOps interviews
- Docker Interview Guide - Containers, images, and production-ready Dockerfiles
- Kubernetes Interview Guide - Container orchestration, pods, and deployments
Ready for More System Design Questions?
This is just one topic from our complete system design interview prep guide. Get access to 20+ full system design problems covering:
- Social media (Twitter, Instagram, Facebook)
- Messaging (WhatsApp, Slack)
- Streaming (Netflix, YouTube)
- E-commerce (Amazon, payment systems)
- Infrastructure (rate limiter, URL shortener, search)
Get Full Access to All System Design Questions →
Or try our free System Design preview to see more questions like this.
Written by the EasyInterview team, based on real interview experience from 12+ years in tech and hundreds of technical interviews conducted at companies like BNY Mellon, UBS, and leading fintech firms.
