System Design Interview: A Framework That Actually Works

12 min read
system-design · interview-questions · architecture · scalability · backend

Twitter handles 500 million tweets per day. Netflix streams to more than 230 million subscribers around the world. Google processes 8.5 billion searches daily. How do these systems actually work? More importantly: could you explain how to build one in 45 minutes? System design interviews at FAANG companies reportedly fail around 60% of candidates, often because they dive into details without a structured approach. Here's the framework that actually works.

The Framework: 6 Steps in 45 Minutes

System design interviews are conversations, not coding tests. Your goal is to demonstrate structured thinking, make reasonable trade-offs, and communicate clearly.

Step 1: Clarify Requirements (5 minutes)

Never start designing without asking questions.

"Before I dive in, I'd like to understand the requirements better..."

Functional requirements - What should the system do?

  • Core features only (don't over-scope)
  • User actions and flows
  • Input/output expectations

Non-functional requirements - How well should it do it?

  • Scale: How many users? Requests per second?
  • Performance: Acceptable latency?
  • Availability: What uptime is required?
  • Consistency: Can data be eventually consistent or must it be strong?

Constraints:

  • Are we building from scratch or integrating with existing systems?
  • Any technology preferences or restrictions?
  • Budget or team size considerations?

Step 2: Estimate Scale (5 minutes)

Back-of-envelope calculations show you think about real-world constraints.

Example: Designing Twitter

Users: 500M monthly active users
Daily active: 200M (40%)
Tweets per day: 500M (avg 2.5 per active user)
Reads per day: 200M users × 100 tweets viewed = 20B reads

Tweets per second: 500M / 86400 ≈ 6000 TPS (write)
Reads per second: 20B / 86400 ≈ 230,000 QPS (read)

Storage per tweet: 280 chars + metadata ≈ 500 bytes
Daily storage: 500M × 500 bytes = 250GB
Yearly storage: 250GB × 365 = ~90TB (text only; media adds far more)

Key insight: This is a read-heavy system (230K reads vs 6K writes). Design accordingly.

Step 3: Define API and Data Model (5 minutes)

API Design:

POST /tweets
  body: { text, media_ids }
  returns: { tweet_id, created_at }

GET /timeline
  params: ?cursor=xxx&limit=20
  returns: { tweets: [...], next_cursor }

GET /users/{id}/tweets
  returns: { tweets: [...] }

POST /follow/{user_id}
DELETE /follow/{user_id}
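
To make the contract concrete, here's a minimal sketch of the two busiest endpoints as Express handlers. The in-memory storage and cursor arithmetic are assumptions for illustration; only the routes and response shapes come from the API above:

const express = require('express');
const app = express();
app.use(express.json());

// In-memory stand-in for the real tweet service (assumption).
const tweets = [];
let nextId = 1;

// POST /tweets - the write path
app.post('/tweets', (req, res) => {
  const { text, media_ids } = req.body;
  const tweet = { id: nextId++, text, media_ids, created_at: new Date().toISOString() };
  tweets.push(tweet);
  res.status(201).json({ tweet_id: tweet.id, created_at: tweet.created_at });
});

// GET /timeline - the read path, with cursor-based pagination
app.get('/timeline', (req, res) => {
  const limit = Number(req.query.limit) || 20;
  const cursor = Number(req.query.cursor) || 0;
  const page = tweets.slice(cursor, cursor + limit);
  res.json({ tweets: page, next_cursor: cursor + page.length });
});

app.listen(3000);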

Data Model:

User
  - id (PK)
  - username
  - email
  - created_at

Tweet
  - id (PK)
  - user_id (FK)
  - text
  - created_at
  - media_urls

Follow
  - follower_id (PK, FK)
  - followee_id (PK, FK)
  - created_at

Step 4: High-Level Design (10 minutes)

Draw the main components and explain data flow.

                                    ┌─────────────┐
                                    │     CDN     │ (static assets)
                                    └──────┬──────┘
                                           │
┌──────────┐     ┌──────────────┐   ┌──────┴──────┐
│  Client  │────▶│ Load Balancer│──▶│ API Servers │
└──────────┘     └──────────────┘   └──────┬──────┘
                                           │
              ┌────────────────────────────┼────────────────────────────┐
              │                            │                            │
       ┌──────┴──────┐             ┌───────┴───────┐            ┌───────┴───────┐
       │   Cache     │             │  Tweet Service │            │ Timeline Svc  │
       │  (Redis)    │             └───────┬───────┘            └───────┬───────┘
       └─────────────┘                     │                            │
                                    ┌──────┴──────┐             ┌───────┴───────┐
                                    │  Tweet DB   │             │ Timeline Cache │
                                    │  (Sharded)  │             │    (Redis)    │
                                    └─────────────┘             └───────────────┘

Walk through the flow:

"When a user posts a tweet: request hits the load balancer, goes to an API server, which writes to the Tweet database. Then we need to update timelines - this is where it gets interesting.

For reading timelines, we want to avoid expensive database queries, so we pre-compute timelines and store them in Redis. When you open the app, we just read from cache.

The challenge is: when should we update these cached timelines?"

Step 5: Deep Dive (15 minutes)

Choose 2-3 components to discuss in detail. The interviewer may guide you, or you can pick based on what's most interesting.

Timeline Generation: Fan-out on Write vs Fan-out on Read

This is the classic Twitter design problem.

Fan-out on Write (Push model):

When user posts tweet:
1. Write tweet to DB
2. Get all followers (could be millions)
3. Push tweet to each follower's timeline cache

Pros: Fast reads - timeline is pre-computed
Cons: Slow writes for users with many followers (celebrities)
      High storage - tweet duplicated N times

Fan-out on Read (Pull model):

When user reads timeline:
1. Get list of who they follow
2. Fetch recent tweets from each
3. Merge and sort

Pros: Fast writes - just store the tweet once
Cons: Slow reads - must query multiple users
      High compute at read time

Hybrid approach (what Twitter actually uses):

- Regular users: Fan-out on write
- Celebrities (>10K followers): Fan-out on read

When building timeline:
1. Read pre-computed timeline (regular users' tweets)
2. Merge with celebrity tweets fetched on-demand
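
A minimal sketch of the hybrid read path; timelineCache, followGraph, and tweetStore are injected stand-ins (assumptions) for the Redis timeline cache, the follow service, and the tweet store:

// Merge the pre-computed feed with on-demand celebrity tweets.
async function buildTimeline(userId, { timelineCache, followGraph, tweetStore }, limit = 50) {
  // 1. Entries pushed at write time for regular accounts (fan-out on write)
  const precomputed = await timelineCache.getTimeline(userId, limit);

  // 2. Celebrities are skipped at write time, so fetch their recent tweets now
  const celebrityIds = await followGraph.getFollowedCelebrities(userId);
  const celebrityTweets = (
    await Promise.all(celebrityIds.map((id) => tweetStore.getRecentTweets(id, limit)))
  ).flat();

  // 3. Merge both sources, newest first, and trim to one page
  return [...precomputed, ...celebrityTweets]
    .sort((a, b) => b.created_at - a.created_at)
    .slice(0, limit);
}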

Database Sharding:

"With 500M tweets per day, a single database won't scale. We need to shard.

Shard key options:

  • User ID: All of a user's tweets land on the same shard. Good for user profile pages, bad for timelines (must query all shards)
  • Tweet ID: Even distribution. Good for single tweet lookups, bad for user's tweet list
  • Time-based: Recent data on 'hot' shards. Good for recent queries, needs re-sharding over time

I'd recommend user_id sharding with a separate index for tweet_id lookups."
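
A minimal sketch of user_id routing with a fixed shard count (an assumption for illustration; production systems use many logical shards plus consistent hashing so data can move without rehashing everything):

const crypto = require('crypto');

const SHARD_COUNT = 16; // assumption for illustration

// All of a user's tweets land on one shard, so profile pages hit a single node.
function shardFor(userId) {
  const hash = crypto.createHash('md5').update(String(userId)).digest();
  return hash.readUInt32BE(0) % SHARD_COUNT;
}

// Usage: pick the connection for this user's shard before querying.
// const db = shardConnections[shardFor(userId)];
// const rows = await db.query('SELECT * FROM tweets WHERE user_id = ?', userId);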

Step 6: Address Bottlenecks (5 minutes)

Scaling:

  • Horizontal scaling of API servers behind load balancer
  • Database read replicas for read-heavy workload
  • Sharding for write scaling

Reliability:

  • Multiple data centers for geographic redundancy
  • Database replication (primary-replica)
  • Circuit breakers for failing services
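
Of these, the circuit breaker is the one candidates most often hand-wave. A minimal sketch; the threshold and timeout are arbitrary illustrative choices:

class CircuitBreaker {
  constructor(fn, { failureThreshold = 5, resetTimeoutMs = 30000 } = {}) {
    this.fn = fn;                     // the downstream call being protected
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.state = 'CLOSED';            // CLOSED -> OPEN -> HALF_OPEN -> CLOSED
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open - failing fast');
      }
      this.state = 'HALF_OPEN';       // allow one trial request through
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = 'CLOSED';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'HALF_OPEN' || this.failures >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}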

Monitoring:

  • Request latency percentiles (p50, p95, p99)
  • Error rates
  • Database query times
  • Cache hit rates

Common System Design Patterns

Caching Strategy

                Read Request
                     │
                     ▼
              ┌─────────────┐
              │ Check Cache │
              └──────┬──────┘
                     │
           ┌─────────┴─────────┐
           │                   │
      Cache Hit           Cache Miss
           │                   │
           ▼                   ▼
     Return Data        Query Database
                              │
                              ▼
                       Update Cache
                              │
                              ▼
                        Return Data

Cache-aside (Lazy Loading):

async function getUser(userId) {
  // Try cache first
  let user = await cache.get(`user:${userId}`);
 
  if (!user) {
    // Cache miss - load from DB
    user = await db.query('SELECT * FROM users WHERE id = ?', userId);
 
    // Update cache
    await cache.set(`user:${userId}`, user, { ttl: 3600 });
  }
 
  return user;
}

Write-through:

async function updateUser(userId, data) {
  // Update DB
  await db.query('UPDATE users SET ... WHERE id = ?', userId);
 
  // Update cache immediately
  await cache.set(`user:${userId}`, data);
}

Rate Limiting

           Request
              │
              ▼
    ┌─────────────────┐
    │ Rate Limiter    │
    │ (Token Bucket)  │
    └────────┬────────┘
             │
    ┌────────┴────────┐
    │                 │
 Allowed           Rejected
    │                 │
    ▼                 ▼
 Process         Return 429
 Request

Token Bucket Algorithm:

class RateLimiter {
  constructor(capacity, refillRate) {
    this.capacity = capacity;      // Max tokens
    this.tokens = capacity;        // Current tokens
    this.refillRate = refillRate;  // Tokens per second
    this.lastRefill = Date.now();
  }
 
  allowRequest() {
    this.refill();
 
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
 
    return false;
  }
 
  refill() {
    const now = Date.now();
    const elapsed = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
  }
}
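
Usage is one check per incoming request; the capacity and refill rate below are arbitrary:

const limiter = new RateLimiter(10, 5); // bursts up to 10, refills 5 tokens/second

function handleRequest(req, res) {
  if (!limiter.allowRequest()) {
    res.statusCode = 429;
    return res.end('Too Many Requests');
  }
  // ...process the request
}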

Message Queues for Async Processing

┌─────────┐    ┌─────────────┐    ┌──────────┐
│ Producer│───▶│ Message     │───▶│ Consumer │
│         │    │ Queue       │    │          │
└─────────┘    │ (Kafka/SQS) │    └──────────┘
               └─────────────┘

Use cases:

  • Decoupling services (post service → notification service)
  • Handling traffic spikes (queue absorbs burst)
  • Async processing (image resizing, email sending)
  • Event sourcing (log all changes for replay)
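
A minimal sketch of the decoupling pattern using the kafkajs client; the broker address, topic name, and what the worker does with each event are assumptions:

const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'tweet-service', brokers: ['localhost:9092'] });

// Producer side: the API server publishes an event and returns immediately.
// (In production you'd reuse one connected producer instead of reconnecting per call.)
async function publishTweetCreated(tweet) {
  const producer = kafka.producer();
  await producer.connect();
  await producer.send({
    topic: 'tweet-created',
    messages: [{ key: String(tweet.user_id), value: JSON.stringify(tweet) }],
  });
  await producer.disconnect();
}

// Consumer side: a fan-out worker drains the queue at its own pace.
async function runFanoutWorker() {
  const consumer = kafka.consumer({ groupId: 'timeline-fanout' });
  await consumer.connect();
  await consumer.subscribe({ topics: ['tweet-created'] });
  await consumer.run({
    eachMessage: async ({ message }) => {
      const tweet = JSON.parse(message.value.toString());
      // push the tweet into each follower's cached timeline here
    },
  });
}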

Database Replication

         Writes
            │
            ▼
     ┌─────────────┐
     │   Primary   │
     │   Database  │
     └──────┬──────┘
            │ Replication
     ┌──────┴──────┐
     │             │
     ▼             ▼
┌─────────┐   ┌─────────┐
│ Replica │   │ Replica │
│    1    │   │    2    │
└─────────┘   └─────────┘
     ▲             ▲
     │             │
     └──────┬──────┘
            │
         Reads
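
Application code typically exploits this topology with read/write splitting. A minimal sketch; the pool objects are assumptions for whichever database client you use:

// Route writes to the primary, spread reads across replicas.
function makeDb(primaryPool, replicaPools) {
  let next = 0;
  return {
    write(sql, params) {
      return primaryPool.query(sql, params);
    },
    read(sql, params) {
      // Round-robin across replicas. Remember that replicas can lag
      // slightly behind the primary when replication is asynchronous.
      const pool = replicaPools[next++ % replicaPools.length];
      return pool.query(sql, params);
    },
  };
}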

Classic Interview Problems

Design a URL Shortener

Requirements:

  • Shorten long URLs
  • Redirect short URLs
  • Custom short codes (optional)
  • Analytics (optional)

Scale:

  • 100M URLs created per month
  • 10:1 read:write ratio
  • 7-character codes: 62^7 ≈ 3.5 trillion combinations

Design:

POST /shorten
  body: { long_url, custom_code? }
  returns: { short_url }

GET /{short_code}
  returns: 302 redirect to long_url

Key decisions:

  • ID generation: Counter + base62 encode, or random generation with collision check
  • Database: NoSQL (Cassandra/DynamoDB) - simple key-value, high write throughput
  • Caching: Hot URLs in Redis
  • Analytics: Async logging to Kafka → Analytics DB
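
The counter + base62 option is just repeated division by 62; IDs below 62^7 (about 3.5 trillion) encode to at most 7 characters. A minimal sketch:

const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Encode a numeric ID (e.g. from a distributed counter) as a short code.
function toBase62(id) {
  let n = BigInt(id);
  if (n === 0n) return ALPHABET[0];
  let code = '';
  while (n > 0n) {
    code = ALPHABET[Number(n % 62n)] + code;
    n /= 62n;
  }
  return code;
}

// toBase62(125)        -> '21'
// toBase62(1000000007) -> a 6-character code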

Design a Chat System

Requirements:

  • 1-on-1 and group messaging
  • Online status
  • Message history
  • Push notifications

Key components:

  • WebSocket servers for real-time communication
  • Message queue for delivery guarantee
  • User presence service (heartbeat-based)
  • Push notification service (APNs/FCM)

Message flow:

User A sends message
    │
    ▼
WebSocket Server
    │
    ▼
Message Queue (Kafka)
    │
    ├──▶ Message Store (Cassandra)
    │
    └──▶ Delivery Service
              │
              ├──▶ User B online → WebSocket push
              │
              └──▶ User B offline → Push notification
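
A minimal sketch of the WebSocket tier using the ws package. The in-memory connection map, the query-string identification, and the queue hand-off are assumptions standing in for the real presence and delivery components:

const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
const connections = new Map(); // userId -> socket, for this server instance only

wss.on('connection', (socket, req) => {
  // Assumption: the client identifies itself after auth, e.g. ?userId=alice
  const userId = new URL(req.url, 'http://localhost').searchParams.get('userId');
  connections.set(userId, socket);

  socket.on('message', (data) => {
    const message = JSON.parse(data);
    // Hand the message to the queue (Kafka) so storage and delivery are
    // decoupled from this server; omitted here.
    // If the recipient happens to be connected to this instance, push directly:
    const recipient = connections.get(message.to);
    if (recipient) recipient.send(JSON.stringify(message));
  });

  socket.on('close', () => connections.delete(userId));
});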

Design a Rate Limiter

Requirements:

  • Limit requests per user/IP
  • Different limits for different APIs
  • Distributed (multiple servers)

Algorithms:

  • Token Bucket: Smooth rate limiting, allows burst
  • Leaky Bucket: Fixed rate output
  • Fixed Window: Simple, but allows traffic bursts at window boundaries
  • Sliding Window: Most accurate, more complex

Distributed implementation:

Using Redis (fixed-window counter):

count = INCR rate_limit:{user_id}
if count == 1:
  EXPIRE rate_limit:{user_id} 60   # first request in the window starts the countdown

if count > limit:
  reject request
else:
  allow request

In production, run the INCR and EXPIRE atomically (a small Lua script, or EXPIRE's NX option on Redis 7+) so two servers can't race on setting the window.
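
A hedged sketch of the same counter using the ioredis client; the key prefix, window, and limit are assumptions:

const Redis = require('ioredis');
const redis = new Redis(); // assumes Redis reachable on localhost:6379

const WINDOW_SECONDS = 60;
const LIMIT = 100;

async function isAllowed(userId) {
  const key = `rate_limit:${userId}`;
  const count = await redis.incr(key);
  if (count === 1) {
    // First request in this window: start the 60-second countdown
    await redis.expire(key, WINDOW_SECONDS);
  }
  return count <= LIMIT;
}

Because every API server talks to the same Redis, the limit is enforced across the whole fleet rather than per server.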

Common Follow-Up Questions

"How would you handle a celebrity posting a tweet?"

"Celebrities have millions of followers, so fan-out on write is too slow. I'd use a hybrid approach: for accounts over a threshold (say 10K followers), we don't fan out immediately. Instead, when a user loads their timeline, we fetch their pre-computed feed AND query recent tweets from celebrities they follow, then merge them. This trades off some read latency for much better write latency."

"What happens if the database goes down?"

"We need to plan for this. First, database replication - a primary with multiple replicas in different availability zones. If the primary fails, we promote a replica. For read traffic, we can route to replicas immediately. For writes, we might need brief downtime during failover.

For critical data, we could use a message queue as a write-ahead log - even if the database is down, we accept writes to the queue and process them when the database recovers."

"How do you ensure consistency in a distributed system?"

"It depends on the use case. For financial transactions, we need strong consistency - I'd use a distributed transaction or the Saga pattern with compensating transactions.

For social media features like like counts, eventual consistency is fine - the count might be slightly off for a few seconds, but it will converge. We can use async replication and accept temporary inconsistencies.

The CAP theorem tells us we have to choose. During a network partition, do we want availability (serve potentially stale data) or consistency (reject requests)? Most social applications choose availability."

What Interviewers Are Really Testing

When I conduct system design interviews, I'm checking:

  1. Structured thinking - Do you have a framework or do you ramble?
  2. Clarifying questions - Do you dive in or understand requirements first?
  3. Scale awareness - Do you think about real-world numbers?
  4. Trade-off analysis - Can you articulate pros/cons of decisions?
  5. Communication - Can you explain your design clearly?
  6. Depth where needed - Can you go deep on specific components?

A candidate who asks good questions, draws clear diagrams, explains trade-offs, and drives the conversation will stand out.

Quick Reference Card

Component           When to Use
Load Balancer       Multiple servers, high availability
CDN                 Static assets, global users
Cache (Redis)       Read-heavy, acceptable staleness
Message Queue       Async processing, decoupling
Database Sharding   Single DB can't handle write load
Read Replicas       Single DB can't handle read load
NoSQL               Simple queries, high scale, flexible schema
SQL                 Complex queries, transactions, relationships

Concept             Remember
CAP Theorem         Under a network partition, choose consistency or availability
Horizontal scaling  Add more machines (scales further)
Vertical scaling    Bigger machine (hits a hard limit)
Fan-out on write    Pre-compute, fast reads, slow writes
Fan-out on read     Compute on demand, fast writes, slow reads

Practice Questions

Test yourself before your interview:

1. Design a parking lot system. What are the key components and how do you handle multiple entry/exit points?

2. Design Instagram. How would you handle image storage and delivery at scale?

3. You're designing a notification system. How do you ensure notifications are delivered even if the user's device is offline?

4. Design a web crawler. How do you avoid crawling the same page twice?




Ready for More System Design Questions?

This is just one topic from our complete system design interview prep guide. Get access to 20+ full system design problems covering:

  • Social media (Twitter, Instagram, Facebook)
  • Messaging (WhatsApp, Slack)
  • Streaming (Netflix, YouTube)
  • E-commerce (Amazon, payment systems)
  • Infrastructure (rate limiter, URL shortener, search)

Get Full Access to All System Design Questions →

Or try our free System Design preview to see more questions like this.


Written by the EasyInterview team, based on real interview experience from 12+ years in tech and hundreds of technical interviews conducted at companies like BNY Mellon, UBS, and leading fintech firms.

Ready to ace your interview?

Get 550+ interview questions with detailed answers in our comprehensive PDF guides.

View PDF Guides