Case Study: Twitter / X

Newsfeed generation: the crown jewel of system design.

Elon Musk posts a tweet. How does it reach the timelines of 150 million followers quickly? And how does a timeline still load in 200ms, even for a new user? Twitter's architecture is the most dramatic case study in system design interviews.

Step 1: Requirements

Functional

  • Post a tweet (280 chars + media).
  • Follow/unfollow.
  • Home timeline (tweets from followees).
  • User timeline (the user's own tweets).
  • Search / hashtag.
  • Retweet, like, reply.

Non-Functional

  • Read-heavy (100:1 read-to-write ratio).
  • Timeline load <200ms.
  • Eventual consistency is acceptable (1-2 sec delay).
  • High availability.

Step 2: Capacity Estimation

DAU: 250M
Tweets/day: 500M (avg 2/user)
Tweets/sec: 500M / 86400 ≈ 5,800 writes/sec
Read QPS: 5,800 × 100 = 580K timeline reads/sec
Per tweet: ~1KB
Daily tweet storage: 500GB
Timeline cache (per user 800 tweets): 1KB × 800 × 250M = 200TB
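The estimates above can be rechecked with a few lines of arithmetic. This is a rough sketch using decimal units (1KB = 1,000 bytes). The last figure is an assumption that is not in the text: if the cache holds only 8-byte tweet IDs (64-bit Snowflake IDs) rather than full 1KB tweets, the footprint is far smaller than 200TB.

```python
# Back-of-envelope check of the capacity numbers (decimal units).
DAU = 250_000_000
TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 86_400
READ_WRITE_RATIO = 100
TWEET_BYTES = 1_000      # ~1KB per tweet
TIMELINE_LEN = 800       # cached entries per user
ID_BYTES = 8             # assumption: one 64-bit ID per cached entry

writes_per_sec = TWEETS_PER_DAY / SECONDS_PER_DAY
reads_per_sec = writes_per_sec * READ_WRITE_RATIO
daily_storage_gb = TWEETS_PER_DAY * TWEET_BYTES / 1e9
cache_full_tweets_tb = TWEET_BYTES * TIMELINE_LEN * DAU / 1e12
cache_ids_only_tb = ID_BYTES * TIMELINE_LEN * DAU / 1e12

print(f"writes/sec: {writes_per_sec:,.0f}")
print(f"timeline reads/sec: {reads_per_sec:,.0f}")
print(f"daily tweet storage: {daily_storage_gb:,.0f} GB")
print(f"timeline cache, full tweets: {cache_full_tweets_tb:,.0f} TB")
print(f"timeline cache, IDs only: {cache_ids_only_tb:.1f} TB")
```

The exact write rate is slightly under the rounded 5,800/sec used above, which is fine for an interview-style estimate.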

Step 3: API Design

POST /tweet { text, mediaUrls? }
GET /timeline?cursor=X&limit=20
POST /follow { userId }
GET /user/:id/tweets
GET /search?q=...
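The `cursor` and `limit` parameters of `GET /timeline` suggest cursor-based pagination. A minimal sketch of how that might behave, assuming the cursor is the ID of the last tweet the client already received and that tweet IDs are time-ordered; `get_timeline_page` is a hypothetical helper, not part of any real Twitter API.

```python
from typing import List, Optional, Tuple


def get_timeline_page(
    tweet_ids: List[int], cursor: Optional[int] = None, limit: int = 20
) -> Tuple[List[int], Optional[int]]:
    """Return one page of tweet IDs (newest first) and the next cursor.

    tweet_ids must be sorted newest first. cursor is the last ID the client
    already received; None means "start from the newest tweet".
    """
    if cursor is None:
        candidates = tweet_ids
    else:
        # Time-ordered IDs: anything older than the cursor has a smaller ID.
        candidates = [t for t in tweet_ids if t < cursor]
    page = candidates[:limit]
    # A next cursor exists only if older tweets remain beyond this page.
    next_cursor = page[-1] if len(candidates) > limit else None
    return page, next_cursor
```

Unlike offset-based paging, a cursor stays stable when new tweets arrive at the head of the timeline between requests.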

Step 4: Data Model

User: { id, name, handle, follower_count, ... }
Tweet: { id (Snowflake), user_id, text, media_urls[], created_at, reply_to, retweet_of }
Follow: { follower_id, followee_id, ts }
Timeline (cache): { user_id, tweet_ids[] (latest 800) }
Engagement: { tweet_id, likes, retweets, replies }
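As an illustration, the Tweet, Follow, and Timeline entities could be written as typed records. This is only a sketch: the field types, the defaults, and the 280-character check (taken from the functional requirements) are assumptions, not a schema from the source.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tweet:
    id: int                      # Snowflake ID, sortable by time
    user_id: int
    text: str
    created_at: float            # assumption: Unix timestamp in seconds
    media_urls: List[str] = field(default_factory=list)
    reply_to: Optional[int] = None     # ID of the tweet being replied to
    retweet_of: Optional[int] = None   # ID of the original tweet

    def __post_init__(self) -> None:
        if len(self.text) > 280:
            raise ValueError("tweet text exceeds 280 characters")


@dataclass
class Follow:
    follower_id: int
    followee_id: int
    ts: float


@dataclass
class Timeline:
    user_id: int
    tweet_ids: List[int] = field(default_factory=list)  # newest first, max 800
```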

The Big Question: Timeline Generation

Three approaches:

Approach 1: Pull (Read-time fan-out)

When a user views the timeline, fetch the recent tweets of every followee and sort them.

  • Pros: low storage, fast writes.
  • Cons: slow reads, since data must be fetched for N users (one per followee).
  • Use case: inactive users, users who follow few accounts.
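A minimal sketch of the pull approach, assuming each followee's recent tweets are already stored newest first. `pull_timeline` and the in-memory `tweets_by_user` dict are hypothetical stand-ins for the real storage lookups.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# (tweet_id, author_id); each author's list is sorted newest first.
TweetRef = Tuple[int, int]


def pull_timeline(
    followees: List[int],
    tweets_by_user: Dict[int, List[TweetRef]],
    limit: int = 20,
) -> List[TweetRef]:
    """Merge the followees' tweet lists at read time, newest first."""
    per_user = (tweets_by_user.get(u, []) for u in followees)
    # heapq.merge performs a k-way merge of already-sorted inputs;
    # reverse=True because every list is in descending ID order.
    merged = heapq.merge(*per_user, reverse=True)
    return list(islice(merged, limit))
```

The read cost grows with the number of followees, which is why this approach suits users who follow few accounts.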

Approach 2: Push (Write-time fan-out)

When a tweet is posted, inject it into the timeline cache of every follower.

  • Pros: very fast reads, because the timeline is pre-computed.
  • Cons: expensive writes; a disaster for celebrities.
  • Celebrity problem: one Elon Musk tweet → 150M timeline writes.

Approach 3: Hybrid (Twitter's Choice)

Most users push; celebrities pull.

  • User with < 1M followers → push.
  • Celebrity (1M+ followers) → pull at read time.
  • A user's timeline = pre-computed timeline + live fetch of followed celebrities' tweets, merged together.
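The read side of the hybrid approach can be sketched as a merge of two sources. The 1M threshold comes from the list above; `hybrid_timeline`, `is_celebrity`, and the `recent_by_celebrity` dict are hypothetical names used only for illustration.

```python
import heapq
from itertools import islice
from typing import Dict, List

CELEBRITY_THRESHOLD = 1_000_000  # followers; the cutoff given above


def is_celebrity(follower_count: int) -> bool:
    return follower_count >= CELEBRITY_THRESHOLD


def hybrid_timeline(
    precomputed: List[int],
    followed_celebrities: List[int],
    recent_by_celebrity: Dict[int, List[int]],
    limit: int = 20,
) -> List[int]:
    """Merge the pushed timeline with celebrity tweets pulled at read time.

    Every list holds time-ordered tweet IDs sorted newest first.
    """
    sources = [precomputed]
    sources += [recent_by_celebrity.get(c, []) for c in followed_celebrities]
    merged = heapq.merge(*sources, reverse=True)
    return list(islice(merged, limit))
```

Because a user typically follows only a handful of celebrities, the read-time merge touches a few extra lists instead of one per followee.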

Step 5: Architecture

                     [Client]
                        ↓
                   [CDN] [LB]
                        ↓
                  [API Gateway]
        ↓               ↓                  ↓
 [Tweet Service] [Timeline Service]  [User Service]
        ↓               ↓                  ↓
     [Kafka]   [Redis Timeline Cache]  [User DB]
        ↓
 [Fan-out Workers] → [Followers' Timelines]
        ↓
 [Tweet Storage] (Cassandra/Manhattan)
 [Search Index] (Elasticsearch)

Step 6: Components

Tweet Storage

  • Cassandra or Manhattan (Twitter's internal database).
  • Sharded by user_id.
  • Snowflake ID: time-ordered.
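To show why Snowflake IDs sort by time, here is a sketch of a generator using the commonly described layout: a millisecond timestamp in the high bits, then a 10-bit worker ID, then a 12-bit per-millisecond sequence. The class name and the simplified error handling are assumptions for illustration; this is not Twitter's production code.

```python
import threading
import time

EPOCH_MS = 1288834974657   # custom epoch used by the open-source Snowflake
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeGenerator:
    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )
```

Since the timestamp occupies the most significant bits, sorting by ID is equivalent to sorting by creation time, so no separate timestamp index is needed for timeline ordering.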

Timeline Cache (Redis)

  • Latest 800 tweet IDs per user.
  • Sorted by time.
  • Eviction: inactive users (no login for 3 days).
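The capped, evictable timeline can be sketched in memory as below. This is a stand-in for Redis, not a Redis client: the class and method names are hypothetical. In Redis itself, one common way to get the same capping behavior is a list maintained with LPUSH followed by LTRIM.

```python
from collections import deque
from typing import Deque, Dict, List

MAX_TIMELINE_LEN = 800
INACTIVE_AFTER_SECONDS = 3 * 24 * 3600   # 3 days without a login


class TimelineCache:
    def __init__(self) -> None:
        self.timelines: Dict[int, Deque[int]] = {}
        self.last_login: Dict[int, float] = {}

    def record_login(self, user_id: int, now: float) -> None:
        self.last_login[user_id] = now

    def push(self, user_id: int, tweet_id: int) -> None:
        # A deque with maxlen drops the oldest entry from the far end
        # once the cap is reached.
        timeline = self.timelines.setdefault(
            user_id, deque(maxlen=MAX_TIMELINE_LEN)
        )
        timeline.appendleft(tweet_id)

    def read(self, user_id: int, limit: int = 20) -> List[int]:
        return list(self.timelines.get(user_id, ()))[:limit]

    def evict_inactive(self, now: float) -> List[int]:
        """Drop timelines of users with no login in the last 3 days."""
        stale = [
            u for u in self.timelines
            if now - self.last_login.get(u, float("-inf")) > INACTIVE_AFTER_SECONDS
        ]
        for u in stale:
            del self.timelines[u]
        return stale
```

An evicted user's timeline would have to be rebuilt on their next login, for example by falling back to the pull approach once.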

Fan-out Service

  1. New tweet → Kafka event.
  2. A worker fetches the followers of the tweet's author.
  3. The worker prepends the tweet ID to each follower's timeline cache.
  4. Celebrity authors are skipped; their tweets are merged at read time.
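The four steps above can be sketched as a worker loop. Here a standard-library `queue.Queue` stands in for the Kafka topic and plain dicts stand in for the follower store and the timeline cache; all names are hypothetical, and a real worker would block on the topic rather than exit when the queue is empty.

```python
import queue
from typing import Dict, List, NamedTuple


class TweetEvent(NamedTuple):
    tweet_id: int
    author_id: int


def run_fanout_worker(
    events: "queue.Queue[TweetEvent]",
    followers_of: Dict[int, List[int]],
    timelines: Dict[int, List[int]],
    celebrity_threshold: int = 1_000_000,
) -> int:
    """Drain the queue, fan out each tweet, return the number of cache writes."""
    writes = 0
    while True:
        try:
            event = events.get_nowait()                    # step 1: consume event
        except queue.Empty:
            return writes
        followers = followers_of.get(event.author_id, [])  # step 2: fetch followers
        if len(followers) >= celebrity_threshold:          # step 4: skip celebrities
            continue
        for follower_id in followers:                      # step 3: prepend tweet ID
            timelines.setdefault(follower_id, []).insert(0, event.tweet_id)
            writes += 1
```

The returned write count makes the cost visible: it equals the total follower count of all non-celebrity authors in the batch.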

Search

  • Tweet → Elasticsearch index।
  • Hashtag, full-text search।

Celebrity Problem in Detail

When Elon Musk posts a tweet:

Pure Push

  • 150M timeline write
  • Massive backend load
  • Slow follower delivery
  • Storage explosion

Hybrid Approach

  • Celebrity tweet → no fan-out
  • Followers' read-time merge
  • Cache hit at celebrity level
  • Manageable cost

Trending Topics

  • Stream processing (Storm/Heron).
  • Hashtag counts over a sliding window.
  • Real-time counts combined with a decay function.
  • Geographic trending.
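Counting hashtags over a sliding window, with a decay function that favors recent activity, can be sketched as follows. This is a single-process illustration, not a Storm or Heron topology; the class name, the half-life decay formula, and the requirement that timestamps arrive in non-decreasing order are all assumptions.

```python
from collections import Counter, deque
from typing import Deque, List, Tuple


class SlidingWindowTrends:
    """Count hashtags seen in the last window_seconds."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # (timestamp, hashtag), oldest first.
        self.events: Deque[Tuple[float, str]] = deque()
        self.counts: Counter = Counter()

    def add(self, hashtag: str, ts: float) -> None:
        self.events.append((ts, hashtag))
        self.counts[hashtag] += 1
        self._expire(ts)

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_seconds:
            _, old = self.events.popleft()
            self.counts[old] -= 1
            if self.counts[old] == 0:
                del self.counts[old]

    def top(self, k: int, now: float) -> List[Tuple[str, int]]:
        self._expire(now)
        return self.counts.most_common(k)

    def decayed_score(self, hashtag: str, now: float, half_life: float) -> float:
        """Sum of per-event weights that halve every half_life seconds."""
        return sum(
            0.5 ** ((now - ts) / half_life)
            for ts, tag in self.events
            if tag == hashtag
        )
```

Geographic trending could reuse the same structure with one instance per region, at the cost of memory proportional to the number of regions.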

Scale Considerations

Read Path

  • Multi-tier cache (CDN → Redis → DB).
  • Connection pooling.
  • Pagination.
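The multi-tier read path can be illustrated with read-through tiers, where each tier answers from its own store and falls back to the next tier on a miss. Dicts stand in for the CDN, Redis, and the database here; the names are hypothetical, and TTLs and invalidation are deliberately left out.

```python
from typing import Callable, Dict, List, Optional

Loader = Callable[[str], Optional[str]]


class CacheTier:
    def __init__(self, name: str, next_tier: Loader) -> None:
        self.name = name
        self.next_tier = next_tier
        self.store: Dict[str, str] = {}
        self.hits = 0

    def get(self, key: str) -> Optional[str]:
        if key in self.store:
            self.hits += 1
            return self.store[key]
        value = self.next_tier(key)      # miss: ask the slower tier
        if value is not None:
            self.store[key] = value      # populate on the way back
        return value


db: Dict[str, str] = {"tweet:1": "hello"}
db_reads: List[str] = []


def read_db(key: str) -> Optional[str]:
    db_reads.append(key)                 # track how often the DB is touched
    return db.get(key)


redis_tier = CacheTier("redis", read_db)
cdn_tier = CacheTier("cdn", redis_tier.get)
```

After the first request for a key, later requests are served by the outermost tier and never reach the database.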

Write Path

  • Async fan-out via Kafka.
  • Eventual consistency with a 1-2 sec delay is acceptable.

Storage

  • Tweets sharded by user_id.
  • Old tweets archived to cold storage.

Real World

  • 250M+ DAU.
  • 500M+ tweets/day.
  • Manhattan: Twitter's internal distributed DB.
  • Heron: stream processing.
  • Mesos: container orchestration.

Trade-offs

  • Push: write-heavy, fast reads, celebrity disaster.
  • Pull: read-heavy, scalable writes, slow reads for active users.
  • Hybrid: more complexity, but handles both cases well.
  • Eventually consistent: a tweet arriving 1-2 sec late is fine.

Engineering Lessons

  1. A hybrid approach is often best for skewed distributions.
  2. Pre-computation trades storage for speed.
  3. Eventual consistency is acceptable for social feeds.
  4. Identify edge cases (celebrities).
  5. Multi-tier caching is essential.

📌 Chapter Summary

  • Twitter timeline = read-heavy (100:1).
  • Pull, Push, Hybrid: three strategies.
  • Hybrid: push for normal users, pull for celebrities.
  • Redis timeline cache + Cassandra storage.
  • Async fan-out via Kafka.