A news feed is two systems wearing one name. There is the moment a user posts, which must be cheap and instant for them - and there is the moment a follower opens the app, which must assemble a relevant, ranked feed in under a couple of hundred milliseconds. Connecting the two is a fan-out problem, and it is dominated by one brutal asymmetry: most accounts have a few hundred followers, and a handful have a hundred million. Any design that ignores that asymmetry falls apart on the first celebrity post.
This walkthrough assumes the 6-step system design framework and applies it at senior depth. It is Part 5 of a system design series.
Table of Contents
- The Problem
- Step 1 - Clarify Requirements
- Step 2 - Estimate Scale
- Step 3 - API and Data Model
- Step 4 - High-Level Design
- Step 5 - Deep Dive: Fan-Out on Write, on Read, and the Hybrid
- Step 6 - Bottlenecks and Trade-offs
- Reference Architecture
- Common Mistakes in the Interview
- Quick Reference
- Related Articles
The Problem
We are designing the home feed of a social platform: when a user opens the app, they see recent posts from the accounts they follow, in a useful order. The canonical examples are the Twitter/X home timeline and the Instagram feed.
The senior framing is that this is a read-heavy aggregation over a producer set with an extreme skew. Feed reads dominate posts, so the read path must be fast - which argues for precomputing each feed. But precomputing a feed means doing work proportional to a poster's follower count, and that count ranges over six orders of magnitude. The entire design is the search for a strategy that keeps both the write cost and the read cost bounded.
Step 1 - Clarify Requirements
Functional requirements:
- A user can publish a post.
- A user can view their home feed: recent posts from accounts they follow.
- The feed is paginated for infinite scroll.
Out of scope (name, then defer): the follow/social-graph service itself (we assume getFollowers(userId) and getFollowees(userId) exist), the media storage for post content, and the machine-learning ranking model, which we treat at the system level only.
Non-functional requirements:
- Read-heavy. Feed opens vastly outnumber posts - assume a ratio of 50:1 or higher.
- Low read latency. A feed open should complete in roughly 200 ms.
- Eventual consistency is fine. A post taking a few seconds to reach followers' feeds is acceptable.
- Extreme fan-out skew. Most accounts have hundreds of followers; a few have tens of millions, and the largest reach about a hundred million. This is the defining constraint.
Two questions to settle: chronological or ranked? Modern feeds are ranked, so we design for a ranked feed and note that chronological is the simpler subset. And followed accounts only, or recommendations too? We design the classic followed-accounts feed.
Step 2 - Estimate Scale
The arithmetic here is what exposes the celebrity problem.
Reads. Assume 500 million daily active users opening the feed ~10 times/day: 5 billion feed reads/day ≈ ~58,000 reads/sec average, perhaps ~250,000/sec at peak.
Posts. At ~0.2 posts/user/day, that is 100 million posts/day ≈ ~1,200 posts/sec average.
Fan-out amplification. With an average of ~200 followers per account, fan-out on write turns 100M posts into 100M x 200 = 20 billion feed insertions/day ≈ ~230,000 inserts/sec. That is the routine cost of pushing.
The celebrity number. An account with 100 million followers, posting once, generates 100 million feed insertions from that single post. No amount of averaging hides this - at the ~230,000 inserts/sec average above, one celebrity post is about seven minutes of the entire platform's fan-out volume (100M / 230,000 ≈ 430 s), arriving as a single burst on top of normal traffic. This single figure is why a pure push design fails.
Storage. Feeds store post IDs, not post bodies: ~800 entries x ~16 bytes ≈ ~13 KB per user, x 500M users ≈ ~6 TB of precomputed feed data, which lives in a fast store.
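The estimates above are easy to re-derive. The short script below simply replays the arithmetic with the inputs stated in this step; the variable names are illustrative, and the inputs are the article's assumptions rather than measured data.

```python
SECONDS_PER_DAY = 86_400

# Inputs taken from the estimates in this step (assumptions, not measurements)
dau = 500_000_000
feed_opens_per_user_per_day = 10
posts_per_user_per_day = 0.2
avg_followers = 200
celebrity_followers = 100_000_000
feed_entries = 800
bytes_per_entry = 16

reads_per_sec = dau * feed_opens_per_user_per_day / SECONDS_PER_DAY
posts_per_day = dau * posts_per_user_per_day
posts_per_sec = posts_per_day / SECONDS_PER_DAY
inserts_per_sec = posts_per_day * avg_followers / SECONDS_PER_DAY

# Seconds of platform-wide average fan-out that one celebrity post represents
celebrity_burst_seconds = celebrity_followers / inserts_per_sec

feed_bytes_per_user = feed_entries * bytes_per_entry
total_feed_terabytes = feed_bytes_per_user * dau / 1e12

print(f"feed reads/sec       ~ {reads_per_sec:,.0f}")
print(f"posts/sec            ~ {posts_per_sec:,.0f}")
print(f"fan-out inserts/sec  ~ {inserts_per_sec:,.0f}")
print(f"one celebrity post   ~ {celebrity_burst_seconds:,.0f} s of average fan-out")
print(f"feed store           ~ {total_feed_terabytes:.1f} TB")
```

Changing any single input (for example the average follower count) shows quickly which numbers the design is most sensitive to.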
Step 3 - API and Data Model
POST /api/posts
body: { "authorId": "...", "content": "..." }
202 Accepted { "postId": "..." }
GET /api/feed?cursor=<opaque>
200 OK { "items": [ ... ], "nextCursor": "<opaque>" }

The core entities:
| Entity | Key fields |
|---|---|
| Post | postId, authorId, content, createdAt - the source of truth |
| Social graph | follower-followee edges; accessed via the follow service |
| Feed | userId -> a bounded, ordered list of (postId, score) - derived, rebuildable |
The feed is best held as a per-user sorted set (post ID keyed by score), capped at ~800 entries so memory stays bounded and old items fall off.
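To make the capped sorted set concrete, here is a minimal in-memory stand-in. The class name `CappedFeed` and its methods are hypothetical; a production system would more likely use a store with native sorted sets (Redis, for instance, offers ZADD and ZREMRANGEBYRANK), so treat this only as a sketch of the behaviour: insert by score, evict the lowest-scored entries once the cap is exceeded.

```python
import bisect


class CappedFeed:
    """In-memory stand-in for a per-user sorted set capped at max_entries."""

    def __init__(self, max_entries: int = 800):
        self.max_entries = max_entries
        self._entries: list[tuple[float, str]] = []  # ascending by (score, post_id)
        self._scores: dict[str, float] = {}          # post_id -> current score

    def add(self, post_id: str, score: float) -> None:
        if post_id in self._scores:
            # Re-inserting a post replaces its old score instead of duplicating it
            self._entries.remove((self._scores[post_id], post_id))
        bisect.insort(self._entries, (score, post_id))
        self._scores[post_id] = score
        while len(self._entries) > self.max_entries:
            _, evicted = self._entries.pop(0)  # the lowest-scored entry falls off
            del self._scores[evicted]

    def newest(self, limit: int) -> list[tuple[str, float]]:
        """Return up to `limit` entries, highest score first."""
        if limit <= 0:
            return []
        return [(pid, s) for s, pid in reversed(self._entries[-limit:])]


feed = CappedFeed(max_entries=3)
for i in range(5):
    feed.add(f"p{i}", float(i))
top_after_fill = feed.newest(10)   # only the three highest-scored posts remain
feed.add("p2", 9.0)                # rescoring moves p2 to the head
top_after_rescore = feed.newest(1)
```

The list-based eviction is O(n) per insert, which is acceptable at 800 entries but is one reason a dedicated sorted-set store is the usual choice.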
Pagination is cursor-based, never offset-based. An offset (skip N) is meaningless on a feed whose head shifts every second - new posts push items down, so consecutive pages overlap and skip. A cursor encodes the (score, postId) of the last item seen and asks for items strictly after it, which stays correct under head insertions.
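A small sketch of that cursor, assuming the feed is available as a list of (score, postId) pairs sorted highest first. The function names and the base64-JSON encoding are illustrative choices, not a prescribed wire format; the point is only that the page boundary is defined by the last item seen, not by a position.

```python
import base64
import json


def encode_cursor(score: float, post_id: str) -> str:
    """Pack the last-seen (score, post_id) into an opaque token."""
    raw = json.dumps([score, post_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> tuple[float, str]:
    score, post_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return score, post_id


def page(feed, cursor, limit):
    """feed: list of (score, post_id), highest first. Returns (items, next_cursor)."""
    remaining = feed
    if cursor is not None:
        last_seen = decode_cursor(cursor)
        # Keep only items strictly after the last one seen (lower in the ordering)
        remaining = [entry for entry in feed if entry < last_seen]
    items = remaining[:limit]
    has_more = len(remaining) > limit
    next_cursor = encode_cursor(*items[-1]) if items and has_more else None
    return items, next_cursor


feed = [(5.0, "e"), (4.0, "d"), (3.0, "c"), (2.0, "b"), (1.0, "a")]
page1, cursor1 = page(feed, None, 2)
feed.insert(0, (6.0, "f"))            # a new post arrives at the head mid-scroll
page2, cursor2 = page(feed, cursor1, 2)
page3, cursor3 = page(feed, cursor2, 2)
```

The head insertion between the first and second request does not shift the second page, which is exactly where an offset would have repeated an item. This holds as long as scores of already-seen items do not change between requests.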
Step 4 - High-Level Design
The write path and the read path are separated by a fan-out stage and a queue.
flowchart TD
Client([Client]) -->|POST post| PS[Post Service]
PS --> PStore[(Posts Store)]
PS -->|new-post event| Q[Fan-Out Queue]
Q --> FW[Fan-Out Workers]
FW -->|getFollowers| Graph[(Social Graph)]
FW -->|insert postId| Feeds[(Feed Store<br/>per-user sorted sets)]
Client -->|GET feed| FS[Feed Service]
FS -->|read precomputed| Feeds
FS -->|pull celebrity posts| PStore
FS -->|hydrate IDs| PCache[(Post Cache)]
FS -->|rank + merge| Client

Figure 1. The architecture separates the write path (post -> store -> fan-out queue -> fan-out workers -> per-user feeds) from the read path (read precomputed feed -> pull celebrity posts -> hydrate -> rank -> return). Posting returns 202 immediately; all of the expensive fan-out work runs asynchronously through the queue, the same backpressure-friendly pattern as Part 3.
Posting is cheap: store the post, emit an event, return 202. Fan-out workers consume the event asynchronously - the same durable-queue pattern from Part 3. The feed service reads the precomputed feed, pulls recent posts from the few celebrities the user follows, hydrates post IDs into bodies, ranks the merged set, and returns the page.
Step 5 - Deep Dive: Fan-Out on Write, on Read, and the Hybrid
This is the core. The question is when the feed is assembled, and the answer is "it depends on the poster" - which is the hybrid model.
Fan-out on write (push)
When a user posts, immediately insert that post's ID into the precomputed feed of every follower.
The feed read becomes trivial - the feed is already assembled, so reading it is a single sorted-set lookup. The cost moves to the write: a post with F followers is F insertions, done asynchronously by fan-out workers off a queue. For the overwhelming majority of accounts - hundreds of followers - this is exactly the right trade, because it spends cheap, deferrable write work to make the hot, latency-critical read path nearly free.
It breaks on two cases. Celebrities: 100M followers means 100M insertions per post - enormous write amplification and minutes of propagation lag. And inactive followers: pushing into the feeds of users who will not open the app for weeks is pure wasted work.
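The push step itself is small. The sketch below shows what one fan-out worker might do for a single new-post event, using a plain dict in place of the feed store. The function name, the `is_active` predicate, and the choice to skip inactive followers outright are assumptions made for illustration.

```python
from typing import Callable, Iterable


def fan_out_on_write(
    post_id: str,
    score: float,
    followers: Iterable[str],
    is_active: Callable[[str], bool],
    feeds: dict[str, list[tuple[float, str]]],
    max_entries: int = 800,
) -> int:
    """Insert post_id into each active follower's feed; return the insertion count."""
    inserted = 0
    for follower_id in followers:
        if not is_active(follower_id):
            # Assumption: inactive users get their feed rebuilt by pull when they return
            continue
        feed = feeds.setdefault(follower_id, [])
        feed.append((score, post_id))
        feed.sort(reverse=True)   # highest score first
        del feed[max_entries:]    # enforce the per-user cap
        inserted += 1
    return inserted


feeds: dict[str, list[tuple[float, str]]] = {}
first = fan_out_on_write("p1", 100.0, ["a", "b", "c"], lambda u: u != "b", feeds)
second = fan_out_on_write("p2", 200.0, ["a"], lambda u: True, feeds, max_entries=1)
```

The return value is the write amplification for that post: F insertions for F active followers, which is the number that becomes untenable for a celebrity.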
Fan-out on read (pull)
When a user opens their feed, query the recent posts of every account they follow, then merge and rank.
Now the write is trivial - just store the post - and celebrities cost nothing extra to post. But the read explodes: a user following 1,000 accounts triggers a 1,000-way scatter-gather on the hot path, every single feed open. Since reads dominate by 50:1 or more, paying the cost there is backwards as a default.
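For contrast, a pull-based read can be sketched as a scatter (one timeline query per followed account) followed by a k-way merge. The `recent_posts` callable is a hypothetical stand-in for a posts-store query that returns (createdAt, postId) pairs newest first.

```python
import heapq
from itertools import islice
from typing import Callable


def fan_out_on_read(
    followee_ids: list[str],
    recent_posts: Callable[[str], list[tuple[float, str]]],
    limit: int,
) -> list[tuple[float, str]]:
    """Assemble a chronological feed at read time from every followee's timeline."""
    # Scatter: one query per followed account, on the hot path
    timelines = [recent_posts(author_id) for author_id in followee_ids]
    # Gather: k-way merge of timelines that are each already sorted newest first
    merged = heapq.merge(*timelines, reverse=True)
    return list(islice(merged, limit))


timelines_by_author = {
    "a": [(9.0, "a2"), (3.0, "a1")],
    "b": [(7.0, "b1")],
    "c": [],
}
queries: list[str] = []


def recent_posts(author_id: str) -> list[tuple[float, str]]:
    queries.append(author_id)  # count the queries to make the read cost visible
    return timelines_by_author[author_id]


result = fan_out_on_read(["a", "b", "c"], recent_posts, limit=2)
```

Even in this toy run, returning two items costs three queries; with 1,000 followees the query count is 1,000 per feed open, regardless of page size.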
flowchart LR
subgraph Push["Fan-out on write (push)"]
P1[User posts] -->|F insertions now| P2[Every follower feed]
P3[Reader opens feed] -->|1 read| P4[Done - precomputed]
end
subgraph Pull["Fan-out on read (pull)"]
L1[User posts] -->|1 write| L2[Posts store]
L3[Reader opens feed] -->|N queries| L4[Merge + rank now]
end

Figure 2. The two pure strategies side by side and where each one breaks. Push pays the cost at write time and is fast on read; pull pays it on read and is slow there. Since reads outnumber writes by 50:1 or more, pull is wrong as a default - and push is wrong for a celebrity. Neither alone works, which is what motivates the hybrid model.
The hybrid model
Neither pure strategy works; the senior answer combines them by poster type:
- Normal accounts (below a follower threshold) use fan-out on write. Their posts push into followers' precomputed feeds.
- Celebrity accounts (above the threshold) are skipped by fan-out. Their posts are not pushed anywhere.
- At feed-read time, the feed service reads the user's precomputed feed and pulls recent posts from the handful of celebrities that user follows, then merges and ranks the two sets.
This bounds both costs. Writes never explode, because the accounts with explosive follower counts are exactly the ones excluded from push. Reads never explode, because a user follows only a few celebrities - a pull of ~5 celebrity timelines plus one precomputed-feed read, not a 1,000-way scatter-gather.
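The two halves of the hybrid can be sketched as one routing decision on the write side and one merge on the read side. The threshold value, function names, and `recent_posts` callable below are placeholders for illustration; in particular the threshold is not a recommendation.

```python
import heapq
from itertools import islice
from typing import Callable

CELEBRITY_THRESHOLD = 100_000  # placeholder value, purely illustrative


def should_push(follower_count: int, threshold: int = CELEBRITY_THRESHOLD) -> bool:
    """Write side: only accounts below the threshold are fanned out on write."""
    return follower_count < threshold


def hybrid_candidates(
    precomputed: list[tuple[float, str]],
    followed_celebrities: list[str],
    recent_posts: Callable[[str], list[tuple[float, str]]],
    limit: int,
) -> list[tuple[float, str]]:
    """Read side: merge the pushed feed with pulls from the few followed celebrities."""
    celebrity_timelines = [recent_posts(c) for c in followed_celebrities]
    merged = heapq.merge(precomputed, *celebrity_timelines, reverse=True)
    return list(islice(merged, limit))


precomputed_feed = [(8.0, "n2"), (5.0, "n1")]           # filled earlier by push
celebrity_posts = {"star": [(9.0, "s2"), (6.0, "s1")]}  # never pushed anywhere
candidates = hybrid_candidates(
    precomputed_feed, ["star"], lambda c: celebrity_posts[c], limit=3
)
```

The number of read-time queries is one plus the number of followed celebrities, which stays small for almost every user. The merged list is only the candidate set; ranking is applied to it afterwards.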
sequenceDiagram
participant U as User
participant FS as Feed Service
participant F as Feed Store
participant P as Posts Store
U->>FS: GET /feed
FS->>F: read precomputed feed (push portion)
F-->>FS: post IDs from normal accounts
FS->>P: pull recent posts from followed celebrities
P-->>FS: celebrity post IDs
Note over FS: merge + rank both sets
FS-->>U: ranked, hydrated feed page

Figure 3. The hybrid feed read in action. The reader's precomputed feed (filled by normal-account fan-out) is combined with on-demand pulls from the handful of celebrities they follow, then ranked and returned. A user with five celebrity follows pays five extra reads, not a thousand-way scatter-gather - this bounded cost is what makes the hybrid work.
The threshold is the tuning knob. Lower it and fewer giant fan-outs occur, but more accounts are pulled at read time; raise it and the reverse. It is set from the platform's actual follower distribution, and a senior answer says so rather than quoting a magic number.
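One way to reason about the knob is to replay a sample of the follower distribution against candidate thresholds and watch both sides of the trade move. The function below is a sketch of that idea; the four sample accounts are made-up toy values, not real platform data.

```python
from typing import Iterable


def evaluate_threshold(
    accounts: Iterable[tuple[int, float]], threshold: int
) -> dict[str, float]:
    """accounts: (follower_count, posts_per_day) pairs sampled from the platform."""
    push_inserts_per_day = 0.0
    pulled_accounts = 0
    largest_single_fan_out = 0
    for followers, posts_per_day in accounts:
        if followers < threshold:
            push_inserts_per_day += followers * posts_per_day
            largest_single_fan_out = max(largest_single_fan_out, followers)
        else:
            pulled_accounts += 1  # these are fetched at read time instead
    return {
        "push_inserts_per_day": push_inserts_per_day,
        "pulled_accounts": pulled_accounts,
        "largest_single_fan_out": largest_single_fan_out,
    }


# Toy sample for illustration only
sample = [(100, 1), (200, 2), (50_000, 1), (10_000_000, 3)]
high = evaluate_threshold(sample, threshold=100_000)
low = evaluate_threshold(sample, threshold=10_000)
```

Lowering the threshold shrinks the largest single fan-out and the total push volume, while the count of accounts that must be pulled at read time grows.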
Ranking and feed assembly
A chronological feed just sorts by createdAt. A ranked feed scores each candidate - recency, the viewer's affinity with the author, predicted engagement - and orders by score. Ranking runs at read time over a bounded candidate set (the precomputed entries plus celebrity pulls), so the score can use fresh engagement data; the precomputation's job is only to keep that candidate set small, not to finalise the order. The author's own just-posted item is injected directly into their feed response so they get read-your-writes consistency without waiting for fan-out.
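A toy version of read-time ranking might look like the following. The three signals come from the paragraph above, but the weights, the six-hour recency half-life, and the choice to pin the author's own new posts at the top are assumptions for the sketch; a real system would take the engagement prediction from the ranking model that is out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    post_id: str
    author_id: str
    created_at: float            # unix seconds
    predicted_engagement: float  # 0..1, supplied by the (out-of-scope) model


def score(c: Candidate, affinity: dict[str, float], now: float,
          half_life_s: float = 6 * 3600.0) -> float:
    """Weighted sum of recency decay, viewer-author affinity, predicted engagement."""
    recency = 0.5 ** (max(0.0, now - c.created_at) / half_life_s)
    return (0.5 * recency
            + 0.3 * affinity.get(c.author_id, 0.0)
            + 0.2 * c.predicted_engagement)


def rank_feed(candidates: list[Candidate], own_new_posts: list[Candidate],
              affinity: dict[str, float], now: float) -> list[str]:
    """Rank the bounded candidate set; inject the viewer's own new posts first."""
    own_ids = [c.post_id for c in own_new_posts]
    seen = set(own_ids)
    ranked = sorted(
        (c for c in candidates if c.post_id not in seen),
        key=lambda c: score(c, affinity, now),
        reverse=True,
    )
    return own_ids + [c.post_id for c in ranked]


now = 1_700_000_000.0
candidates = [
    Candidate("p2", "stranger", now, 0.0),            # brand new, no affinity
    Candidate("p1", "friend", now - 6 * 3600, 0.0),   # six hours old, close friend
]
mine = [Candidate("mine", "me", now, 0.0)]
order = rank_feed(candidates, mine, {"friend": 1.0}, now)
```

Here the six-hour-old post from a high-affinity author outranks the brand-new post from a stranger, which is the behaviour a purely chronological sort cannot express.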
Consistency model
The feed is eventually consistent: fan-out workers drain their queue over seconds, so a post reaches followers shortly after publication, not instantly - and for a feed that is fine. The deliberate exception is the author's own view, handled by direct injection as above. The feed store itself is derived data: it can always be rebuilt by pulling from the posts store and social graph, so it is a rebuildable cache, not a source of truth.
Failure modes
- Fan-out backlog. A surge of posting grows the fan-out queue and propagation lag rises. The hybrid model already removes the worst offenders (celebrities) from the queue; beyond that, the queue absorbs the spike and workers autoscale - the Part 3 backpressure pattern.
- Feed store node loss. Because the feed is derived, a lost shard degrades affected users to pull-based assembly while their feeds are rebuilt - degradation, not data loss.
- Celebrity post thundering herd. Millions opening their feeds right after a celebrity posts all pull and hydrate the same post. That is a hot key on the post object, solved by the post cache - the caching and hot-key techniques from Part 4.
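For the thundering-herd case, one commonly used hot-key technique is request coalescing: when many readers miss on the same post at once, only one of them loads it and the rest wait for that result. The sketch below illustrates the idea with a per-key lock; the class name `PostCache` is hypothetical, there is no eviction or TTL, and this is not necessarily the approach Part 4 describes.

```python
import threading
import time
from typing import Any, Callable


class PostCache:
    """Read-through cache where concurrent misses on one key trigger a single load."""

    def __init__(self, load: Callable[[str], Any]):
        self._load = load
        self._values: dict[str, Any] = {}
        self._locks: dict[str, threading.Lock] = {}
        self._guard = threading.Lock()  # protects the lock table itself

    def get(self, post_id: str) -> Any:
        if post_id in self._values:
            return self._values[post_id]
        with self._guard:
            lock = self._locks.setdefault(post_id, threading.Lock())
        with lock:
            # Re-check after acquiring: another caller may have loaded it meanwhile
            if post_id not in self._values:
                self._values[post_id] = self._load(post_id)
        return self._values[post_id]


store_reads: list[str] = []


def slow_load(post_id: str) -> dict[str, str]:
    time.sleep(0.05)             # simulate a posts-store round trip
    store_reads.append(post_id)  # runs under the per-key lock, so appends do not race
    return {"postId": post_id}


cache = PostCache(slow_load)
threads = [threading.Thread(target=cache.get, args=("viral",)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Twenty concurrent readers result in a single posts-store read. Within one process this protects the store; across many feed-service instances the same idea is usually combined with a shared cache tier.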
Multi-region
Feeds are regional - each region holds its users' feed stores, with region affinity by userId. The posts store and social graph are globally replicated. Fan-out workers in each region consume a global new-post event stream, so a post by an author in one region reaches followers in every region. Celebrity pulls hit the local posts replica.
Evolution path
| Stage | Approach |
|---|---|
| Launch | Pure fan-out on read - simplest, fine while users follow few accounts and traffic is low |
| Growth | Add fan-out on write so the hot read path is precomputed |
| Scale | Hybrid by poster type, ranked feed, multi-region feed stores |
Adopt cursor-based pagination from day one - offset pagination is a trap that is painful to undo - and keep posts and feeds as separate stores. Defer the celebrity hybrid, the ranking model, and multi-region until follower skew and traffic force them.
Observability
Track feed-load p99 (the headline), fan-out queue depth and lag (this is the propagation-delay metric), fan-out write rate, feed-store hit ratio, ranking latency, and the post count per celebrity tier. Reasonable SLOs: 99% of feed loads under 200 ms, and 99% of posts visible to followers within 30 seconds.
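Both SLOs are percentile checks over a window of samples. A minimal sketch using the nearest-rank method follows; the sample values are invented for the example, and a real deployment would normally read percentiles from its metrics system rather than compute them by hand.

```python
import math


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def slo_met(samples: list[float], p: int, budget: float) -> bool:
    """True when the p-th percentile is within the budget."""
    return percentile(samples, p) <= budget


# Invented sample windows, for illustration only
feed_load_ms = [120.0] * 98 + [180.0, 450.0]
propagation_s = [5.0] * 95 + [40.0] * 5

feed_slo_ok = slo_met(feed_load_ms, 99, budget=200.0)          # 99% under 200 ms
propagation_slo_ok = slo_met(propagation_s, 99, budget=30.0)   # 99% within 30 s
```

In the first window a single 450 ms outlier does not breach the p99 target; in the second, five slow propagations out of a hundred do.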
Step 6 - Bottlenecks and Trade-offs
- Celebrity fan-out is the defining bottleneck - unbounded write amplification, resolved only by excluding celebrities from push.
- Feed read latency is kept low by precomputing the common case, which is the entire reason push exists.
- Feed store memory is bounded by capping each feed at ~800 entries.
- Ranking cost is contained by ranking only a bounded candidate set, never the whole follow graph.
- Pagination stability under a shifting feed head requires cursors, and a ranked feed may additionally snapshot the candidate set per session.
Reference Architecture
The pattern this problem teaches, reusable well beyond feeds:
Precompute the expensive read for the common case (fan-out on write), compute on read for the skewed tail (fan-out on read), and merge the two - a hybrid that keeps both the write cost and the read cost bounded.
flowchart LR
subgraph Common["Common case - precomputed"]
C1[Post] -->|push, async| C2[(Per-user feeds)]
end
subgraph Tail["Skewed tail - computed on read"]
T1[Celebrity post] --> T2[(Posts store)]
end
Read[Feed read] --> C2
Read --> T2
Read --> Merge[Merge + rank]

Figure 4. The reference architecture as two flows ending at a merge. The common case is precomputed for cheap reads; the skewed tail is computed on demand; the merge brings them together at read time. Whenever a read-heavy aggregation has a bimodal producer distribution - a few prolific producers and many small ones - this shape applies.
The same shape recurs wherever a read-heavy aggregation faces a skewed producer distribution: a notification inbox, an activity stream, a "latest from people you follow" panel. Precompute for the many, compute on demand for the few, and merge - rather than forcing one strategy onto a workload that has two populations.
Common Mistakes in the Interview
- Choosing pure push or pure pull and never confronting the celebrity problem.
- Offset-based pagination on a feed whose head shifts continuously, producing duplicates and gaps.
- Pushing into inactive users' feeds, spending write work on feeds no one will read.
- Fanning out synchronously in the post request path instead of off a queue.
- Treating the feed store as the source of truth rather than rebuildable derived data.
- Forgetting read-your-writes for the author's own post.
- Storing full post bodies in every feed instead of IDs hydrated at read time.
Quick Reference
| Topic | Key Point |
|---|---|
| Core pattern | Hybrid fan-out: push for normal accounts, pull for celebrities, merge at read |
| Fan-out on write | Fast reads, write cost proportional to follower count - the common-case default |
| Fan-out on read | Cheap writes, expensive scatter-gather reads - right only for the skewed tail |
| Celebrity problem | One post = millions of writes; fix by excluding celebrities from push |
| Threshold | Follower count that splits push from pull; tuned from the real distribution |
| Ranking | At read time, over a bounded candidate set so scores stay fresh |
| Pagination | Cursor-based on (score, postId); offset breaks on a shifting head |
| Feed store | Per-user capped sorted set; derived, rebuildable - not the source of truth |
| Consistency | Eventually consistent; inject the author's own post for read-your-writes |
| Multi-region | Regional feed stores; global post-event stream feeds fan-out everywhere |
Related Articles
- System Design Interview Problems: A Senior's Roadmap - the full series index and pattern library.
- System Design Interview Guide: The 6-Step Framework - the method this walkthrough applies.
- Design a Notification Service - Part 3; the durable-queue fan-out pattern reused here.
- Design a Distributed Cache - Part 4; post hydration and hot-key handling for viral posts.
- Design a Chat System - Part 6; the large-group fan-out reappears as broadcast channels.
This is Part 5 of a 12-part system design series where each post solves one problem around one core pattern. Next: Design a Chat System.
