Low-Level Design of Zoom System Design

Let us now discuss the low-level design of the Zoom system.

The low-level design diagram above is explained below:

  • Everything in orange is a user interface; most likely these will be mobile applications.
  • Everything in grey is the load balancer, reverse proxy, and authentication/authorisation layer.
  • Everything in blue is a web service or UDP service that we have developed.
  • Everything in pink is a database, data store, or cluster that we will use.

When a user, say U1, wants to start a call with another user, U2, several backend components work together as follows.

1. WebSocket Handler

  • It maintains live connections with active users and enables bidirectional communication between users and the server.
  • It uses WebSocket technology for persistent connections, handles incoming messages, and routes them to the appropriate recipients.
  • Multiple WebSocket Handlers can be deployed behind a load balancer to distribute incoming connections evenly. Connection pooling and efficient message-routing algorithms help them cope with high traffic.
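The routing role of a single handler can be sketched in a few lines. This is a minimal in-memory sketch, assuming each live WebSocket connection is represented by an `asyncio.Queue` of outbound messages (a stand-in for a real socket) and that messages are JSON objects with hypothetical `to` and `payload` fields; the class and method names are illustrative, not part of the original design.

```python
import asyncio
import json


class WebSocketHandler:
    """Hypothetical sketch: tracks live connections on one handler machine and routes messages."""

    def __init__(self) -> None:
        # user_id -> outbound queue standing in for that user's WebSocket
        self.connections: dict[str, asyncio.Queue] = {}

    def connect(self, user_id: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self.connections[user_id] = queue
        return queue

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    async def on_message(self, sender: str, raw: str) -> bool:
        """Route an incoming JSON message; return False if the recipient is not connected here."""
        message = json.loads(raw)
        queue = self.connections.get(message["to"])
        if queue is None:
            # In the full design, the WebSocket Manager would say which handler owns this user.
            return False
        await queue.put({"from": sender, "payload": message["payload"]})
        return True


async def demo() -> dict:
    handler = WebSocketHandler()
    handler.connect("U1")
    u2_inbox = handler.connect("U2")
    await handler.on_message("U1", json.dumps({"to": "U2", "payload": "call-request"}))
    return await u2_inbox.get()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

When the recipient lives on another handler machine, `on_message` returns `False`; that is the point where the WebSocket Manager's user-to-handler mapping comes in.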

2. WebSocket Manager

  • It maintains the mapping between WebSocket Handler machines and users.
  • It ensures that messages are routed correctly between users and WebSocket Handlers.
  • It keeps these mappings in a distributed data store and uses consistent hashing, or another suitable algorithm, to spread them evenly.
  • It is designed for horizontal scalability so that it can handle a growing number of users and WebSocket Handlers.
  • It implements strategies for fault tolerance and recovery in case of failures.
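Consistent hashing, one of the options named above, can be sketched as follows. This assumes users are keyed by a string user ID and handler machines by hypothetical names such as `ws-1`; SHA-256 and 100 virtual nodes per machine are illustrative choices, not values from the original design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process)
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Maps user IDs to WebSocket Handler machines; adding or removing a machine only remaps nearby keys."""

    def __init__(self, handlers: list[str], vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for handler in handlers:
            self.add_handler(handler)

    def add_handler(self, handler: str) -> None:
        # Virtual nodes scatter each machine around the ring for a more even load
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{handler}#{i}"), handler))

    def remove_handler(self, handler: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != handler]

    def handler_for(self, user_id: str) -> str:
        # First virtual node clockwise from the user's hash, wrapping around at the end
        idx = bisect.bisect(self._ring, (_hash(user_id),))
        return self._ring[idx % len(self._ring)][1]
```

The design payoff is that when a handler fails, only the users it owned move elsewhere; everyone else keeps their existing mapping, so the Manager does not have to rebuild the whole store.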

3. Signalling Service

  • The Signalling Service initiates and coordinates communication between users.
  • It checks call conditions, coordinates with the User Service, and implements APIs for call initiation, termination, and status updates.
  • It integrates with the User Service for user authentication and authorization.
  • It must handle concurrent call requests efficiently, so it uses asynchronous processing and message queues for scalability and fault tolerance.
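The call-condition checks and status updates can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the three states, the "online and not already in a call" conditions, and the in-memory `online_users` set (standing in for a User Service lookup) are all hypothetical, and a production service would put these operations behind a message queue rather than call them directly.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class CallState(Enum):
    RINGING = "ringing"
    ACTIVE = "active"
    ENDED = "ended"


# Allowed status updates; anything else is rejected
TRANSITIONS = {
    CallState.RINGING: {CallState.ACTIVE, CallState.ENDED},
    CallState.ACTIVE: {CallState.ENDED},
    CallState.ENDED: set(),
}


@dataclass
class Call:
    call_id: str
    caller: str
    callee: str
    state: CallState = CallState.RINGING


class SignallingService:
    def __init__(self, online_users: set[str]) -> None:
        self.online = online_users  # stand-in for a User Service / presence lookup
        self.calls: dict[str, Call] = {}

    def _busy(self, user: str) -> bool:
        return any(user in (c.caller, c.callee) and c.state != CallState.ENDED
                   for c in self.calls.values())

    def initiate(self, caller: str, callee: str) -> Call:
        # Call conditions: both users online and neither already in a call
        for user in (caller, callee):
            if user not in self.online:
                raise ValueError(f"{user} is offline")
            if self._busy(user):
                raise ValueError(f"{user} is busy")
        call = Call(str(uuid.uuid4()), caller, callee)
        self.calls[call.call_id] = call
        return call

    def update(self, call_id: str, new_state: CallState) -> Call:
        call = self.calls[call_id]
        if new_state not in TRANSITIONS[call.state]:
            raise ValueError(f"cannot go from {call.state.value} to {new_state.value}")
        call.state = new_state
        return call
```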

4. User Service

  • The User Service is the repository for user data. It handles user authentication, authorization, and access control.
  • It uses a database to store user information securely.
  • It implements APIs for user registration, login, and profile management.
  • It can be designed for horizontal scalability to handle a growing user base, and caching reduces database load and improves performance.
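The caching point above can be illustrated with a read-through cache in front of the user database. This is a hypothetical sketch: `load_from_db` stands in for whatever query the User Service actually runs, the 60-second TTL is an assumed default, and the injectable `clock` exists only so expiry can be tested.

```python
import time
from typing import Callable, Optional


class ReadThroughCache:
    """Hypothetical cache for user records: serve hits from memory, load misses from the DB."""

    def __init__(self, load_from_db: Callable[[str], Optional[dict]],
                 ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict[str, tuple[float, Optional[dict]]] = {}  # user_id -> (expiry, value)

    def get(self, user_id: str) -> Optional[dict]:
        now = self.clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]                    # cache hit: no database round trip
        value = self.load_from_db(user_id)     # miss or expired: read from the database
        self._entries[user_id] = (now + self.ttl, value)
        return value

    def invalidate(self, user_id: str) -> None:
        # Call after a profile update so readers do not see stale data until the TTL expires
        self._entries.pop(user_id, None)
```

Because each User Service instance keeps its own cache in this sketch, scaling horizontally means every instance may briefly serve stale data; a shared cache tier would trade that for an extra network hop.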

5. Connector (STUN Server)

  • It helps users discover their publicly accessible IP addresses, which facilitates peer-to-peer connection establishment.
  • It implements the STUN protocol for IP address discovery and integrates with the WebSocket Handler and Signalling Service for communication.
  • Multiple STUN Server instances are deployed for redundancy and load distribution, and resources can be monitored and scaled with demand to ensure availability.
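To make the STUN step concrete, here is a sketch of the client side of a STUN Binding exchange following the RFC 5389/8489 message layout: a 20-byte header with the magic cookie `0x2112A442` and a 96-bit transaction ID, and an `XOR-MAPPED-ADDRESS` attribute in the reply. Only IPv4 is handled, and the helper names are illustrative; a real client would also handle retransmission and error responses.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def new_transaction_id() -> bytes:
    return os.urandom(12)  # 96-bit random transaction ID


def build_binding_request(txn_id: bytes) -> bytes:
    # Header: message type, attribute length (0), magic cookie, transaction ID
    return struct.pack("!HHI12s", BINDING_REQUEST, 0, MAGIC_COOKIE, txn_id)


def parse_xor_mapped_address(msg: bytes, txn_id: bytes) -> tuple[str, int]:
    msg_type, length, cookie, rx_txn = struct.unpack("!HHI12s", msg[:20])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE or rx_txn != txn_id:
        raise ValueError("not a matching Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", msg[offset:offset + 4])
        value = msg[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS:
            _, family, xport = struct.unpack("!BBH", value[:4])
            if family != 0x01:
                raise ValueError("only IPv4 is handled in this sketch")
            port = xport ^ (MAGIC_COOKIE >> 16)  # port is XORed with the cookie's top 16 bits
            (xaddr,) = struct.unpack("!I", value[4:8])
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute")


def discover_public_address(server: str, port: int = 3478,
                            timeout: float = 2.0) -> tuple[str, int]:
    """Ask a STUN server (3478 is the default STUN port) for this client's public address."""
    txn = new_transaction_id()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_binding_request(txn), (server, port))
        data, _ = sock.recvfrom(2048)
    return parse_xor_mapped_address(data, txn)
```

The address returned is what the client then advertises to the other peer through the Signalling Service.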

6. Handshake for Connection Details

  • Exchanges information about available bitrate, codec support, and bandwidth between users.
  • Defines protocols and message formats for exchanging connection details.
  • This ensures compatibility between different clients and devices.
  • We can optimize message formats and protocols for efficiency to handle high message throughput.
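The handshake can be illustrated with a simplified negotiation. Real WebRTC clients carry this information in SDP offers and answers; the `ConnectionDetails` message below is a hypothetical, much smaller stand-in, and the rule "offerer's codec preference order, lowest limit on either side wins" is an assumed policy.

```python
from dataclasses import dataclass


@dataclass
class ConnectionDetails:
    """Hypothetical handshake message exchanged through the Signalling Service."""
    user_id: str
    codecs: list[str]       # supported codecs, in order of preference
    max_bitrate_kbps: int   # highest bitrate this client is willing to use
    bandwidth_kbps: int     # currently measured available bandwidth


def negotiate(offer: ConnectionDetails, answer: ConnectionDetails) -> dict:
    # Keep the offerer's preference order, limited to codecs both sides support
    common = [c for c in offer.codecs if c in answer.codecs]
    if not common:
        raise ValueError("no common codec; clients are incompatible")
    # The session can go no faster than the most constrained limit on either side
    bitrate = min(offer.max_bitrate_kbps, answer.max_bitrate_kbps,
                  offer.bandwidth_kbps, answer.bandwidth_kbps)
    return {"codec": common[0], "bitrate_kbps": bitrate}


offer = ConnectionDetails("U1", ["AV1", "VP9", "H264"], 2500, 4000)
answer = ConnectionDetails("U2", ["H264", "VP9"], 3000, 1200)
print(negotiate(offer, answer))  # {'codec': 'VP9', 'bitrate_kbps': 1200}
```

Failing loudly when no codec is shared is what "ensures compatibility" means in practice: the call either proceeds with settings both clients understand or is rejected before any media is sent.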

7. Establishing Peer-to-Peer Connection

  • A direct connection is established between users for real-time communication, enabling the packet exchange that carries the video call.
  • It uses WebRTC technology for peer-to-peer communication.
  • It implements NAT traversal techniques for connectivity across different network configurations.
  • It monitors the connection and adjusts resources dynamically to maintain optimal performance.
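WebRTC's NAT traversal is built on ICE, which gathers candidate addresses (local "host", STUN-discovered "srflx", TURN "relay") and tries the most direct pairs first. The sketch below uses the candidate and pair priority formulas from RFC 8445 with its recommended type preferences; it simplifies ICE by ignoring components, address families, and connectivity checks, and the addresses are placeholders.

```python
from dataclasses import dataclass

# RFC 8445 recommended type preferences: direct paths beat relayed ones
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


@dataclass
class Candidate:
    kind: str                    # "host", "prflx", "srflx" (via STUN) or "relay" (via TURN)
    address: str
    local_preference: int = 65535
    component_id: int = 1        # 1 = RTP


def candidate_priority(c: Candidate) -> int:
    return ((2 ** 24) * TYPE_PREFERENCE[c.kind]
            + (2 ** 8) * c.local_preference
            + (256 - c.component_id))


def pair_priority(g: int, d: int) -> int:
    # g = controlling agent's candidate priority, d = controlled agent's
    return (2 ** 32) * min(g, d) + 2 * max(g, d) + (1 if g > d else 0)


def order_pairs(local: list[Candidate], remote: list[Candidate]) -> list[tuple[str, str]]:
    """Check order for the controlling agent: highest-priority pair first."""
    pairs = [(pair_priority(candidate_priority(lc), candidate_priority(rc)), lc.address, rc.address)
             for lc in local for rc in remote]
    return [(la, ra) for _, la, ra in sorted(pairs, reverse=True)]


local = [Candidate("relay", "L-relay"), Candidate("host", "L-host")]
remote = [Candidate("srflx", "R-srflx"), Candidate("relay", "R-relay")]
print(order_pairs(local, remote))
```

Relay-to-relay pairs always sort last, which is how the fallback to TURN described next falls out of the same ordering.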

8. Fallback to TURN Server

  • The TURN Server acts as an intermediary that relays messages between users when a peer-to-peer connection fails.
  • It integrates with the WebSocket Handler and Signalling Service to provide this fallback mechanism.
  • TURN Server instances are deployed in geographically distributed locations for low latency.
  • Server load should be monitored and resources scaled as needed to handle increased traffic.
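The fallback decision can be sketched as "try the direct path first, otherwise relay through the closest healthy TURN server". In this hypothetical sketch, `try_p2p` stands in for the ICE connectivity checks, and the server names and RTT probe results are illustrative; measuring round-trip time is one assumed way to pick among geographically distributed instances.

```python
from typing import Callable, Optional


def pick_turn_server(rtt_ms: dict[str, Optional[float]]) -> str:
    """Choose the closest healthy TURN server; None means the health probe failed."""
    healthy = {server: rtt for server, rtt in rtt_ms.items() if rtt is not None}
    if not healthy:
        raise RuntimeError("no TURN server reachable")
    return min(healthy, key=healthy.get)


def establish_media_path(try_p2p: Callable[[], bool],
                         rtt_ms: dict[str, Optional[float]]) -> tuple[str, Optional[str]]:
    # Prefer a direct peer-to-peer path; relay through TURN only when it fails
    if try_p2p():
        return ("p2p", None)
    return ("relay", pick_turn_server(rtt_ms))


probes = {"turn-us-east": 80.0, "turn-eu-west": 12.5, "turn-ap-south": None}
print(establish_media_path(lambda: False, probes))  # ('relay', 'turn-eu-west')
```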

9. Handling Bandwidth Changes

  • When bandwidth fluctuates during a call, events are logged into Kafka for processing.
  • Event schemas and topics are defined for logging bandwidth changes.
  • Kafka producers publish these events.
  • Partitioning and replication strategies provide fault tolerance and high availability.
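A bandwidth-change event and its Kafka record can be sketched without a broker. The schema fields and the topic name `call-bandwidth-changes` are hypothetical. The key design choice is using `call_id` as the record key: Kafka's default partitioner sends records with the same key to the same partition, so all events for one call stay in order.

```python
import json
import time
from dataclasses import asdict, dataclass

TOPIC = "call-bandwidth-changes"  # hypothetical topic name


@dataclass
class BandwidthChangeEvent:
    """Hypothetical event schema for a bandwidth change observed during a call."""
    call_id: str
    user_id: str
    previous_kbps: int
    current_kbps: int
    timestamp_ms: int


def to_kafka_record(event: BandwidthChangeEvent) -> tuple[str, bytes, bytes]:
    # Keying by call_id routes every event for one call to the same partition,
    # so consumers see that call's bandwidth changes in order.
    key = event.call_id.encode()
    value = json.dumps(asdict(event)).encode()
    return TOPIC, key, value


event = BandwidthChangeEvent("call-42", "U1", 2500, 900, int(time.time() * 1000))
topic, key, value = to_kafka_record(event)
```

With a client such as kafka-python, the record could then be published with `producer.send(topic, key=key, value=value)` on a `KafkaProducer`; replication for fault tolerance is configured on the topic itself (its replication factor), not in the producer code.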

Important Scenarios

  1. Group Conversations:
    • Small groups use peer-to-peer connections; large groups go through a Call Server. Streams are transcoded to suit different user bandwidths and codecs, analytics events are logged, and bandwidth is adjusted dynamically.
    • Clients can dynamically switch from peer-to-peer to the Call Server.
  2. Recording:
    • A Logger service records the conversation in chunks. The resulting file is stored in a distributed file system, and users receive a notification with a link to the recording.
  3. Live Video:
    • Video and audio inputs from cameras and microphones are aggregated, and transcoders convert the input streams for different devices.
    • Call Servers receive the transcoded streams, distribute them via Content Delivery Networks (CDNs), and handle session management, user authentication, and so on.
    • The WebSocket Manager coordinates communication between different Call Servers and manages the WebSocket connections between clients and servers.
    • It also provides real-time data exchange, fault tolerance, and load balancing to maintain performance.
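Why small groups stay peer-to-peer while large groups move to a Call Server comes down to upload cost: in a peer-to-peer mesh each client sends a separate copy of its stream to every other participant, whereas with a Call Server it sends one. The switching policy below is a hypothetical sketch; the four-participant limit and the bitrates are assumed values, not figures from the original design.

```python
def upstream_streams(participants: int, topology: str) -> int:
    """Number of video streams each client must upload."""
    if topology == "p2p":
        return participants - 1  # a separate copy to every other peer
    if topology == "call_server":
        return 1                 # one stream; the Call Server fans it out
    raise ValueError(f"unknown topology: {topology}")


def choose_topology(participants: int, uplink_kbps: int, stream_kbps: int,
                    p2p_max_participants: int = 4) -> str:
    # Hypothetical policy: stay peer-to-peer only while the group is small and
    # every client can afford to upload n-1 copies of its stream.
    needed_kbps = upstream_streams(participants, "p2p") * stream_kbps
    if participants <= p2p_max_participants and needed_kbps <= uplink_kbps:
        return "p2p"
    return "call_server"


print(choose_topology(3, uplink_kbps=3000, stream_kbps=1000))  # p2p
print(choose_topology(3, uplink_kbps=1500, stream_kbps=1000))  # call_server
```

Re-evaluating this check when participants join or bandwidth drops is one way a client could decide to switch from peer-to-peer to the Call Server mid-call.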

Here the Call Server is placed close to the users so that latency is minimized. There can be many hops between the Call Server and the users, and we want to keep that number low, because this is where the data is replicated multiple times.

Designing Zoom | System Design

Creating an app like Zoom may seem simple from the user’s perspective, but in reality it is a complex task involving hundreds of software engineers working for years. Zoom, like other similar apps, requires careful planning and design to provide seamless video conferencing services worldwide. This article explains how Zoom works and how it handles a range of scenarios.

Important Topics for the Zoom System Design

  • Requirements of Zoom System Design
  • Capacity Estimation
  • High-Level Design of Zoom System Design
  • Low-Level Design of Zoom System Design
  • Microservices used in Zoom System Design
  • API Design of Zoom System Design
  • Database Design of Zoom System Design
  • How does Zoom handle scalability?


1. Requirements of Zoom System Design

1.1 Functional Requirements of Zoom System Design...

2. Capacity Estimation

Assuming 1 billion users and 100 million group video calls daily, the Zoom app needs to handle approximately 58,000 requests per second to provide a scalable backend....

3. High-Level Design of Zoom System Design

At the heart of Zoom’s success is its robust infrastructure, which includes key features like Zoom clients, distributed data centers, web infrastructure, and newer technologies like HTTP tunnels. Let’s explore how each feature has contributed to Zoom’s impressive growth and how it has overcome its challenges....

4. Low-Level Design of Zoom System Design

Let us now discuss the low-level design of the Zoom system...

5. Microservices used in Zoom System Design

Zoom’s architecture includes a user management service, a meeting scheduler service, video streaming services, chat services, recording management services, notification services, and so on. Some of them are described below:...

6. API Design of Zoom System Design

...

7. Database Design of Zoom System Design

Zoom’s database design centers on user management and recording functionality. A user table stores the necessary user information, while a separate recording table manages recorded sessions and their related information. A record-access table holds the permissions that control which users can access which recordings....
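The three tables described above can be sketched with SQLite. All table names, column names, and the `view`/`download` permission values are hypothetical, and the storage path is a placeholder for a location in the distributed file system; the point is the shape of the access check: owners always have access, and everyone else needs a row in the access table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    user_id  TEXT PRIMARY KEY,
    email    TEXT UNIQUE NOT NULL,
    name     TEXT NOT NULL
);
CREATE TABLE recordings (
    recording_id  TEXT PRIMARY KEY,
    owner_id      TEXT NOT NULL REFERENCES users(user_id),
    meeting_id    TEXT NOT NULL,
    storage_path  TEXT NOT NULL,  -- location in the distributed file system
    created_at    TEXT NOT NULL
);
CREATE TABLE recording_access (
    recording_id  TEXT NOT NULL REFERENCES recordings(recording_id),
    user_id       TEXT NOT NULL REFERENCES users(user_id),
    permission    TEXT NOT NULL CHECK (permission IN ('view', 'download')),
    PRIMARY KEY (recording_id, user_id)
);
"""


def can_access(conn: sqlite3.Connection, user_id: str, recording_id: str) -> bool:
    # Owners always have access; everyone else needs a row in recording_access
    row = conn.execute(
        """SELECT 1 FROM recordings r
           WHERE r.recording_id = ?
             AND (r.owner_id = ?
                  OR EXISTS (SELECT 1 FROM recording_access a
                             WHERE a.recording_id = r.recording_id AND a.user_id = ?))""",
        (recording_id, user_id, user_id),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("U1", "u1@example.com", "Alice"),
                  ("U2", "u2@example.com", "Bob"),
                  ("U3", "u3@example.com", "Carol")])
conn.execute("INSERT INTO recordings VALUES (?, ?, ?, ?, ?)",
             ("R1", "U1", "M1", "/recordings/2024/R1.mp4", "2024-01-01T10:00:00Z"))
conn.execute("INSERT INTO recording_access VALUES (?, ?, ?)", ("R1", "U2", "view"))
```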

8. How does Zoom handle scalability?

Zoom’s architecture distributes meetings across its data center network, allowing users to join meetings via the closest data center, ensuring scalability and a reliable video experience for large gatherings.

Unlike legacy systems that rely on resource-intensive Multipoint Control Units (MCUs), Zoom’s multimedia routing delivers multiple video streams directly to clients, reducing computing requirements and enabling scalability for meetings with thousands of participants. Each video stream in Zoom can adjust to multiple resolutions, eliminating the need for separate encoding and decoding processes for each endpoint. This optimization enhances performance and scalability while providing varying levels of video quality based on device capabilities and network conditions.

Zoom’s quality-of-service application layer optimizes video, audio, and screen-sharing experiences based on each device’s capabilities and available bandwidth. This proactive approach ensures the best possible user experience across diverse network conditions. With support for distributed architecture and multimedia routing, Zoom can accommodate meetings with thousands of participants, ensuring seamless video and audio communication for large-scale events....
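The idea that one stream is available at several resolutions, with the router choosing per receiver instead of re-encoding, can be sketched as a simulcast-style layer selection. The article does not specify Zoom’s actual layering scheme; the three layers and their bitrates below are assumptions for illustration.

```python
# Hypothetical layers one sender publishes: (name, bitrate in kbps), highest first
LAYERS = [("720p", 1500), ("360p", 500), ("180p", 150)]


def pick_layer(receiver_bandwidth_kbps: int, layers=LAYERS) -> str:
    """Highest-quality layer that fits the receiver's available bandwidth."""
    for name, kbps in layers:
        if kbps <= receiver_bandwidth_kbps:
            return name
    return layers[-1][0]  # below the lowest layer: still send the smallest one


def route(receivers: dict[str, int]) -> dict[str, str]:
    # The router forwards an already-encoded layer to each receiver; it never re-encodes.
    return {user: pick_layer(bw) for user, bw in receivers.items()}


print(route({"U2": 4000, "U3": 800, "U4": 100}))
# {'U2': '720p', 'U3': '360p', 'U4': '180p'}
```

Compared with an MCU, which decodes, mixes, and re-encodes for every participant, the per-receiver work here is only a lookup, which is why this style of routing scales to large meetings.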

9. Conclusion

Zoom’s design encompasses numerous components and strategies to ensure seamless, reliable, and scalable video communication services for its extensive user base. Its key focus on efficiency, fault tolerance, and adaptability positions Zoom as a leading platform in modern video conferencing, providing an outstanding communication experience....