High Level Design for Twitter System Design

This section discusses the high-level design for Twitter.

6.1 Architecture:

For Twitter, we use a microservices architecture, since it makes it easier to scale horizontally and to decouple our services. Each service owns its own data model. We will divide the system into the following core services.

6.2 User Services

This service handles user-related concerns such as authentication and user information. The Login page, Sign Up page, Profile page, and Home page are handled by the User Service.

6.3 Newsfeed Service:

This service handles the generation and publishing of user newsfeeds. The newsfeed seems easy enough to implement, but many details can make or break this feature, so we discuss it in more detail. Let’s divide the problem into two parts:

6.3.1 Generation:

Let’s assume we want to generate the feed for user A. We will perform the following steps:

  • Retrieve the IDs of all the users and entities (hashtags, topics, etc.) that user A follows.
  • Fetch the relevant tweets for each of the retrieved IDs.
  • Use a ranking algorithm to rank the tweets based on parameters such as relevance, time, engagement, etc.
  • Return the ranked tweets data to the client in a paginated manner.

Feed generation is an intensive process and can take quite a lot of time, especially for users who follow a lot of people. To improve performance, the feed can be pre-generated and stored in a cache. We can then have a mechanism that periodically updates the feed and applies our ranking algorithm to the new tweets.
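
The generation steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the `Tweet` fields, the in-memory `tweets_by_source` dictionary standing in for the tweet store, and the `recency_and_likes` scoring function are all hypothetical, and step 1 (looking up what user A follows) is assumed to have already produced `followed_ids`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    author: str
    created_at: float  # Unix timestamp in seconds
    likes: int = 0


def generate_feed(followed_ids, tweets_by_source, score, page=0, page_size=2):
    """Return one page of ranked tweets for a user."""
    # Step 2: fetch the candidate tweets for every followed user or entity.
    candidates = [
        tweet
        for source_id in followed_ids
        for tweet in tweets_by_source.get(source_id, [])
    ]
    # Step 3: rank the candidates, highest score first.
    ranked = sorted(candidates, key=score, reverse=True)
    # Step 4: return only the requested page.
    start = page * page_size
    return ranked[start:start + page_size]


def recency_and_likes(tweet):
    # Hypothetical score: newer and more-liked tweets rank higher.
    return tweet.created_at + 10 * tweet.likes


# Hypothetical data: user A follows the user "bob" and the hashtag "#python".
tweets_by_source = {
    "bob": [Tweet(1, "bob", 100.0, likes=5), Tweet(2, "bob", 300.0)],
    "#python": [Tweet(3, "carol", 200.0, likes=50)],
}

first_page = generate_feed(["bob", "#python"], tweets_by_source, recency_and_likes)
print([tweet.tweet_id for tweet in first_page])
```

In a pre-generated setup, the ranked list would be computed ahead of time and cached, so that a request only has to perform the final pagination step.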

6.3.2 Publishing

Publishing is the step where the feed data is pushed to each specific user. This can be quite a heavy operation, as a user may have millions of friends or followers. To deal with this, we have three different approaches:

  • Pull Model (or Fan-out on load)
    • When a user creates a tweet, and a follower reloads their newsfeed, the feed is created and stored in memory.
    • The most recent feed is loaded only when the user requests it. This approach reduces the number of write operations on our database.
    • The downside of this approach is that users will not be able to view recent feeds unless they “pull” the data from the server, which increases the number of read operations on the server.

  • Push Model (or Fan-out on write)
    • In this model, once a user creates a tweet, it is pushed to all of the followers’ feeds.
    • This prevents the system from having to go through the entire list of accounts a user follows to check for updates.
    • However, the downside of this approach is that it increases the number of write operations on the database.

  • Hybrid Model:
    • A third approach is a hybrid model between the pull and push model.
    • It combines the beneficial features of the above two models and tries to provide a balanced approach between the two.
    • The hybrid model allows only users with a small number of followers to use the push model.
    • For users with a large number of followers, such as celebrities, the pull model is used.
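
The hybrid model can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `FeedStore` class, the tuple shape `(timestamp, author, text)` used for tweets, and the `CELEBRITY_THRESHOLD` cutoff, which a real system would tune against its own traffic.

```python
from collections import defaultdict

CELEBRITY_THRESHOLD = 3  # assumed cutoff, kept tiny so the example is easy to follow


class FeedStore:
    def __init__(self):
        self.followers = defaultdict(set)  # author -> users following them
        self.following = defaultdict(set)  # user -> authors they follow
        self.inbox = defaultdict(list)     # user -> tweets pushed at write time
        self.outbox = defaultdict(list)    # author -> their own tweets

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= CELEBRITY_THRESHOLD

    def post(self, author, timestamp, text):
        tweet = (timestamp, author, text)
        self.outbox[author].append(tweet)
        # Push model (fan-out on write) only for authors with few followers.
        if not self.is_celebrity(author):
            for follower in self.followers[author]:
                self.inbox[follower].append(tweet)

    def read_feed(self, user):
        # Start from the tweets that were pushed to this user.
        feed = set(self.inbox[user])
        # Pull model (fan-out on load) for the celebrities the user follows.
        for author in self.following[user]:
            if self.is_celebrity(author):
                feed.update(self.outbox[author])
        # Newest first; the set removes tweets that were both pushed and pulled.
        return sorted(feed, reverse=True)


store = FeedStore()
store.follow("alice", "bob")
for fan in ("alice", "dave", "erin"):
    store.follow(fan, "star")
store.post("bob", 1, "hi")       # pushed to alice's inbox
store.post("star", 2, "hello")   # kept only in star's outbox
print(store.read_feed("alice"))
```

The write cost for `star` stays constant no matter how many followers they have, while ordinary users still get cheap reads from their pre-filled inbox.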

6.4 Tweet service:

The tweet service handles tweet-related use cases such as posting a tweet, favorites, etc.

6.5 Retweets:

Retweets are one of our extended requirements. To implement this feature, we can simply create a new tweet with the user ID of the user retweeting the original tweet, and then set the type enum and content property of the new tweet so that it links to the original tweet.
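
As a rough sketch of this idea, the snippet below models a tweet with a type enum and a content property. The field names, the `TweetType` values, and the choice to store the original tweet's ID as a string in `content` are assumptions made for illustration, not a fixed schema.

```python
from dataclasses import dataclass
from enum import Enum


class TweetType(Enum):
    TWEET = "tweet"
    RETWEET = "retweet"


@dataclass
class Tweet:
    id: int
    user_id: int
    type: TweetType
    content: str  # text for a tweet; the original tweet's ID for a retweet


def retweet(original, user_id, new_id):
    """Create a new tweet owned by user_id that links back to the original."""
    # Retweeting a retweet links to the root tweet rather than chaining.
    if original.type is TweetType.RETWEET:
        root_id = original.content
    else:
        root_id = str(original.id)
    return Tweet(id=new_id, user_id=user_id, type=TweetType.RETWEET, content=root_id)


original = Tweet(id=1, user_id=10, type=TweetType.TWEET, content="hello world")
shared = retweet(original, user_id=20, new_id=2)
print(shared)
```

When rendering a feed, a tweet of type `RETWEET` would be resolved by looking up the tweet whose ID is stored in `content`.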

6.6 Search Service:

This service is responsible for handling search-related functionality. The search service returns results such as top posts and latest posts, which are ordered by the ranking algorithm.

6.7 Media Service:

This service will handle media (images, videos, files, etc.) uploads.

6.8 Analytics Service:

This service will be used for metrics and analytics use cases.

6.9 Ranking Algorithm:

We will need a ranking algorithm to rank each tweet according to its relevance to each specific user.

Example: Facebook used to utilize an EdgeRank algorithm. Here, the rank of each feed item is described by:

Rank = Affinity * Weight * Decay

Where,

  • Affinity: the “closeness” of the user to the creator of the edge. If a user frequently likes, comments on, or messages the edge creator, then the value of affinity will be higher, resulting in a higher rank for the post.
  • Weight: the value assigned to each type of edge. A comment can have a higher weight than a like, and thus a post with more comments is more likely to get a higher rank.
  • Decay: a measure of the age of the edge. The older the edge, the smaller the value of decay and, eventually, the rank.

Nowadays, algorithms are much more complex, and ranking is done using machine learning models that can take thousands of factors into consideration.
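
To make the formula concrete, here is a small sketch that scores a post from its edges. The edge weights, the exponential decay with a 24-hour half-life, and the choice to sum the per-edge product over all edges of a post are assumptions for illustration; they are not Facebook's actual values.

```python
# Assumed weights per edge type: a comment counts more than a like.
EDGE_WEIGHTS = {"like": 1.0, "comment": 2.0}
HALF_LIFE_HOURS = 24.0  # assumed: an edge loses half its value every 24 hours


def decay(age_hours):
    """Return a value in (0, 1] that shrinks as the edge gets older."""
    return 0.5 ** (age_hours / HALF_LIFE_HOURS)


def edge_rank(edges, affinity):
    """Score a post for one viewer.

    edges: list of (creator, edge_type, age_hours) tuples on the post.
    affinity: mapping of creator -> closeness of the viewer to that creator.
    """
    return sum(
        affinity.get(creator, 0.0) * EDGE_WEIGHTS[edge_type] * decay(age_hours)
        for creator, edge_type, age_hours in edges
    )


# Hypothetical viewer who is close to bob and barely knows carol.
affinity = {"bob": 0.5, "carol": 0.1}
fresh_like = [("bob", "like", 0.0)]
old_comment = [("carol", "comment", 48.0)]
print(edge_rank(fresh_like, affinity), edge_rank(old_comment, affinity))
```

A fresh like from a close friend outranks a two-day-old comment from a distant account here, which matches the intuition behind the three factors.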

6.10 Search Service

  • Sometimes a traditional DBMS is not performant enough. We need something that allows us to store, search, and analyze huge volumes of data in near real time and return results within milliseconds. Elasticsearch can help us with this use case.
  • Elasticsearch is a distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is built on top of Apache Lucene.
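
Lucene-based engines answer text queries quickly because they maintain an inverted index, which maps each term to the documents containing it. The toy version below only illustrates that idea; it is not Elasticsearch's API, and the whitespace tokenizer and function names are assumptions.

```python
from collections import defaultdict


def build_index(tweets):
    """Map each lowercase token to the set of tweet IDs that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for token in text.lower().split():
            index[token].add(tweet_id)
    return index


def search(index, query):
    """Return the IDs of tweets that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical tweets keyed by ID.
tweets = {
    1: "System design interview",
    2: "Twitter system design",
    3: "Cooking tips",
}
index = build_index(tweets)
print(search(index, "system design"))
```

A lookup touches only the posting sets of the query terms instead of scanning every tweet, which is why this structure scales better than a full table scan.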

6.11 How do we identify trending topics?

  • Trending functionality will be built on top of the search functionality.
  • We can cache the most frequently searched queries, hashtags, and topics in the last N seconds and update them every M seconds using some sort of batch job mechanism.
  • Our ranking algorithm can also be applied to the trending topics to give them more weight and personalize them for the user.
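
One way to keep counts for "the last N seconds" is a sliding window, sketched below. The `TrendingTracker` class and its method names are hypothetical, and the sketch assumes events are recorded in timestamp order; a batch job running every M seconds could call `top` and cache the result.

```python
from collections import Counter, deque


class TrendingTracker:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()    # (timestamp, term), oldest first
        self.counts = Counter()  # term -> occurrences inside the window

    def record(self, term, now):
        """Register one search or hashtag occurrence at time `now`."""
        self.events.append((now, term))
        self.counts[term] += 1

    def _evict(self, now):
        # Drop events that have fallen out of the last `window` seconds.
        while self.events and self.events[0][0] <= now - self.window:
            _, term = self.events.popleft()
            self.counts[term] -= 1
            if self.counts[term] == 0:
                del self.counts[term]

    def top(self, k, now):
        """Return the k most frequent terms within the window."""
        self._evict(now)
        return self.counts.most_common(k)


tracker = TrendingTracker(window_seconds=60)
for timestamp, term in [(0, "#a"), (10, "#b"), (20, "#b"), (30, "#a"), (50, "#b")]:
    tracker.record(term, timestamp)
print(tracker.top(2, now=65))
```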

6.12 Notifications Service:

  • Push notifications are an integral part of any social media platform.
  • We can use a message queue or a message broker such as Apache Kafka with the notification service to dispatch requests to Firebase Cloud Messaging (FCM) or the Apple Push Notification service (APNs), which handle the delivery of push notifications to user devices.
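
The flow above can be sketched with Python's standard `queue` module standing in for the message broker. The `send_fcm` and `send_apns` functions are hypothetical stubs that only format a string; they are not the real FCM or APNs client APIs, and the event fields are assumptions.

```python
import queue


def send_fcm(device_token, message):
    # Hypothetical stub: a real implementation would call Firebase Cloud Messaging.
    return f"FCM -> {device_token}: {message}"


def send_apns(device_token, message):
    # Hypothetical stub: a real implementation would call the Apple push service.
    return f"APNs -> {device_token}: {message}"


# Route each notification to a provider based on the device platform.
PROVIDERS = {"android": send_fcm, "ios": send_apns}


def dispatch_all(events):
    """Drain the queue and hand each event to the matching provider."""
    delivered = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        sender = PROVIDERS[event["platform"]]
        delivered.append(sender(event["device_token"], event["message"]))
    return delivered


# Other services enqueue events; the notification service consumes them.
events = queue.Queue()
events.put({"platform": "android", "device_token": "tok-1", "message": "bob liked your tweet"})
events.put({"platform": "ios", "device_token": "tok-2", "message": "carol followed you"})
for line in dispatch_all(events):
    print(line)
```

Putting a queue between the producers and the notification service means a slow or unavailable push provider delays delivery without blocking the services that generate the events.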

11. Conclusion

Twitter handles thousands of tweets per second, so a single big system or table cannot handle all the data; it has to be handled through a distributed approach. Twitter uses a scatter-and-gather strategy, in which it sets up multiple servers or data centers that allow indexing. When Twitter gets a query (let’s say #geeksforgeeks), it sends the query to all the servers or data centers, and the query runs against every Early Bird shard. Every Early Bird shard that matches the query returns its results. The results are then sorted, merged, and reranked. The ranking is based on the number of retweets, replies, and the popularity of the tweets....