Low-Level Design of Zoom System Design
Let us now discuss the low-level design of the Zoom system.
Below is an explanation of the low-level design image:
- Everything in orange is a user interface. Most likely these will be mobile applications.
- Things in grey are the load balancers, reverse proxies, and the authentication and authorisation layer.
- Things in blue are the web services or UDP services that we have developed.
- Things in pink are the databases, data stores, or clusters that we will use.
When a user, let’s say U1, wants to start a call with another user, U2, the process involves several backend components working together seamlessly.
1. WebSocket Handler
- It maintains live connections with active users and facilitates bidirectional communication between users and the server.
- It uses WebSocket technology for persistent connections, handles incoming messages, and routes them to the appropriate recipients.
- We can deploy multiple WebSocket Handlers behind a load balancer to distribute incoming connections evenly. It is also better to implement connection pooling and efficient message routing to handle high traffic.
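As a rough illustration of the routing step, the sketch below keeps an in-memory map from user ID to connection and forwards a message to its recipient. The `Connection` class is a stand-in for a real WebSocket connection, and every name here is hypothetical rather than taken from Zoom.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Stand-in for a live WebSocket connection (hypothetical)."""
    user_id: str
    outbox: list = field(default_factory=list)

    def send(self, message: dict) -> None:
        # A real handler would write a frame to the socket here.
        self.outbox.append(message)


class WebSocketHandler:
    """Tracks the users connected to this machine and routes messages."""

    def __init__(self) -> None:
        self.connections = {}  # user_id -> Connection

    def register(self, conn: Connection) -> None:
        self.connections[conn.user_id] = conn

    def unregister(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def route(self, sender: str, recipient: str, payload: str) -> bool:
        """Deliver to a locally connected recipient; return False otherwise."""
        conn = self.connections.get(recipient)
        if conn is None:
            # The recipient is offline or connected to another handler; the
            # caller would ask the WebSocket Manager where to forward it.
            return False
        conn.send({"from": sender, "payload": payload})
        return True
```

Returning `False` instead of raising keeps the "recipient is elsewhere" case cheap, since it is expected to be common once several handlers run behind the load balancer.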
2. WebSocket Manager
- It manages the mapping between WebSocket Handler machines and users.
- It ensures that messages are routed correctly between users and WebSocket Handlers.
- It maintains a distributed data store to hold the mappings efficiently and uses consistent hashing or another suitable algorithm to distribute them evenly.
- It should be designed for horizontal scalability to handle increasing numbers of users and WebSocket Handlers.
- It should implement strategies for fault tolerance and recovery in case of failures.
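One way to spread the user-to-handler mappings across data store nodes is a consistent-hash ring. The following is a minimal sketch under that assumption; the node names, the use of MD5, and the number of virtual nodes are illustrative choices, not details from the design.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # MD5 is used only as a stable, well-spread hash, not for security.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ShardRing:
    """Consistent-hash ring that picks the store node holding a user's mapping."""

    def __init__(self, nodes, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each node gets several virtual positions to even out the load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        self._ring = [entry for entry in self._ring if entry[1] != node]

    def lookup(self, user_id: str) -> str:
        if not self._ring:
            raise LookupError("no store nodes registered")
        # First virtual node clockwise from the user's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (_hash(user_id), ""))
        return self._ring[idx % len(self._ring)][1]
```

When a node is removed, only the users that were assigned to it move to a different node; everyone else keeps their placement, which is what makes adding or losing machines cheap.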
3. Signalling Service
- The Signalling Service initiates and coordinates communication between users.
- It checks call conditions in coordination with the User Service and implements APIs for call initiation, termination, and status updates.
- It integrates with the User Service for user authentication and authorization.
- It should handle concurrent call requests efficiently, using asynchronous processing and message queues for scalability and fault tolerance.
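The call-condition checks and the initiation, termination, and status APIs could look roughly like the sketch below. It is synchronous and in-memory for brevity; the class, method names, and the specific conditions checked are assumptions for illustration.

```python
import itertools
from enum import Enum


class CallStatus(Enum):
    RINGING = "ringing"
    ACTIVE = "active"
    ENDED = "ended"


class SignallingService:
    def __init__(self, known_users, online_users) -> None:
        # In the design these would come from the User Service and the
        # WebSocket Manager; plain sets keep the sketch self-contained.
        self.known_users = set(known_users)
        self.online_users = set(online_users)
        self.calls = {}  # call_id -> {"caller", "callee", "status"}
        self._ids = itertools.count(1)

    def _busy(self, user: str) -> bool:
        return any(
            user in (c["caller"], c["callee"])
            and c["status"] is not CallStatus.ENDED
            for c in self.calls.values()
        )

    def initiate(self, caller: str, callee: str) -> int:
        if caller not in self.known_users or callee not in self.known_users:
            raise ValueError("unknown user")
        if callee not in self.online_users:
            raise ValueError("callee is offline")
        if self._busy(caller) or self._busy(callee):
            raise ValueError("user already in a call")
        call_id = next(self._ids)
        self.calls[call_id] = {
            "caller": caller, "callee": callee, "status": CallStatus.RINGING,
        }
        return call_id

    def accept(self, call_id: int) -> None:
        self.calls[call_id]["status"] = CallStatus.ACTIVE

    def terminate(self, call_id: int) -> None:
        self.calls[call_id]["status"] = CallStatus.ENDED

    def status(self, call_id: int) -> CallStatus:
        return self.calls[call_id]["status"]
```

In a production service each of these methods would sit behind an API endpoint and hand work to a queue rather than mutate a dictionary directly.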
4. User Service
- User Service is a repository for user data. It handles user authentication, authorization, and access control.
- It utilizes a database to store user information securely.
- It implements APIs for user registration, login, and profile management.
- We can design it for horizontal scalability to handle a growing user base, and use caching to reduce database load and improve performance.
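The caching idea can be sketched with the cache-aside pattern: read from the cache first, fall back to the database on a miss, and invalidate on writes. The dictionaries below stand in for a real database and cache, and the method names are hypothetical.

```python
class UserService:
    """Cache-aside reads in front of a simulated user database."""

    def __init__(self, database: dict) -> None:
        self.database = database  # stand-in for the real user database
        self.cache = {}           # stand-in for an in-memory cache
        self.db_reads = 0         # counts how often the database is hit

    def get_profile(self, user_id: str):
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        profile = self.database.get(user_id)
        if profile is not None:
            self.cache[user_id] = profile
        return profile

    def update_profile(self, user_id: str, profile: dict) -> None:
        self.database[user_id] = profile
        # Invalidate rather than update, so the next read reloads fresh data.
        self.cache.pop(user_id, None)
```

Repeated reads of the same profile touch the database only once until the profile changes.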
5. Connector (STUN Server)
- It helps users discover their publicly accessible IP addresses, which facilitates peer-to-peer connection establishment.
- It implements the STUN protocol for IP address discovery and integrates with the WebSocket Handler and Signalling Service for communication.
- We can deploy multiple instances of the STUN Server for redundancy and load distribution, and monitor and scale resources based on demand to ensure availability.
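To make the IP discovery step concrete, here is a minimal sketch of the two STUN messages involved, following the wire format in RFC 5389: the client sends a Binding Request, and the server's response carries the client's public address in an XOR-MAPPED-ADDRESS attribute. Only IPv4 is handled, and the function names are illustrative.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id: bytes = b"") -> bytes:
    """Return a 20-byte STUN Binding Request with no attributes."""
    transaction_id = transaction_id or os.urandom(12)
    if len(transaction_id) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    # Header: message type, message length (0), magic cookie, transaction ID.
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def parse_xor_mapped_address(response: bytes):
    """Return (ip, port) from an IPv4 XOR-MAPPED-ADDRESS attribute, or None."""
    offset = 20  # skip the fixed STUN header
    while offset + 4 <= len(response):
        attr_type, attr_len = struct.unpack_from("!HH", response, offset)
        value = response[offset + 4: offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack_from("!HI", value, 2)
            # Port is XORed with the top 16 bits of the cookie,
            # the address with the whole cookie.
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes.
        offset += 4 + attr_len + (-attr_len % 4)
    return None
```

A client would send the request over UDP to a STUN server and pass the reply to `parse_xor_mapped_address`; the resulting address is what it shares with the other user through the Signalling Service.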
6. Handshake for Connection Details
- It exchanges information about available bitrate, codec support, and bandwidth between users.
- It defines protocols and message formats for exchanging connection details.
- This ensures compatibility between different clients and devices.
- We can optimize message formats and protocols for efficiency to handle high message throughput.
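A simplified view of this handshake is that each side advertises its capabilities and both settle on a codec and bitrate they can each support. The sketch below assumes a hypothetical `Capabilities` message; real WebRTC clients carry this information in an SDP offer/answer exchange, which is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Capabilities:
    codecs: list            # in order of preference, e.g. ["VP9", "VP8"]
    max_bitrate_kbps: int   # highest bitrate the device can encode or decode
    bandwidth_kbps: int     # currently available network bandwidth


def negotiate(caller: Capabilities, callee: Capabilities) -> dict:
    """Pick a codec both sides support and a bitrate neither side exceeds."""
    # Honour the caller's preference order among the shared codecs.
    common = [codec for codec in caller.codecs if codec in callee.codecs]
    if not common:
        raise ValueError("no common codec")
    bitrate = min(
        caller.max_bitrate_kbps, callee.max_bitrate_kbps,
        caller.bandwidth_kbps, callee.bandwidth_kbps,
    )
    return {"codec": common[0], "bitrate_kbps": bitrate}
```

For example, a caller preferring VP9 talking to a callee that only supports H264 and VP8 ends up on VP8, capped at the slowest link between them.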
7. Establishing Peer-to-Peer Connection
- We establish a direct connection between users for real-time communication, so that video call packets are exchanged directly.
- It uses WebRTC for peer-to-peer communication.
- It implements NAT traversal techniques for connectivity across different network configurations.
- It monitors the connection and adjusts resources dynamically to maintain optimal performance.
8. Fallback to TURN Server
- The TURN Server acts as an intermediary that relays messages between users when the peer-to-peer connection fails.
- TURN Server instances are deployed for relaying messages.
- It integrates with the WebSocket Handler and Signalling Service to provide the fallback mechanism.
- TURN Server instances should be deployed in geographically distributed locations for low latency.
- We should monitor server load and scale resources as needed to handle increased traffic.
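The order "try a direct path first, relay through TURN only if that fails" can be sketched as a simple preference list over candidate addresses. The candidate type names follow ICE terminology (host, srflx for an address learned via STUN, relay for TURN); the `can_connect` callback and the addresses are placeholders for real connectivity checks.

```python
# Lower number = tried first: local address, then STUN-discovered, then TURN.
PREFERENCE = {"host": 0, "srflx": 1, "relay": 2}


def choose_path(candidates, can_connect):
    """Return the first (kind, address) pair that passes the connectivity check.

    candidates:  iterable of (kind, address) tuples
    can_connect: callable(kind, address) -> bool, a stand-in for a real check
    """
    for kind, address in sorted(candidates, key=lambda c: PREFERENCE[c[0]]):
        if can_connect(kind, address):
            return kind, address
    raise ConnectionError("no usable path between peers")
```

```python
candidates = [
    ("relay", "198.51.100.9:3478"),
    ("host", "192.168.1.5:5000"),
    ("srflx", "203.0.113.7:54321"),
]
# Simulate a network where direct paths are blocked, so only the relay works.
print(choose_path(candidates, lambda kind, addr: kind == "relay"))
```

Because the relay carries every media packet, it is deliberately the last choice, which is also why the TURN fleet needs its own load monitoring.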
9. Handling Bandwidth Changes
- When bandwidth fluctuates during a call, events are logged into Kafka for processing.
- It defines event schemas and topics for logging bandwidth changes.
- It implements Kafka producers for publishing events.
- It uses partitioning and replication strategies for fault tolerance and high availability.
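A bandwidth-change event and its partitioning key might be shaped as in the sketch below. No Kafka client is used, so the block stays self-contained; the topic name, partition count, field names, and the SHA-256-based partition choice are all assumptions, and a real Kafka producer would apply its own key hashing.

```python
import hashlib
import json
from dataclasses import asdict, dataclass

TOPIC = "call.bandwidth-changes"  # hypothetical topic name
NUM_PARTITIONS = 12               # hypothetical partition count


@dataclass
class BandwidthChanged:
    call_id: str
    user_id: str
    old_kbps: int
    new_kbps: int
    timestamp_ms: int


def to_record(event: BandwidthChanged, num_partitions: int = NUM_PARTITIONS):
    """Return (topic, partition, key, value) ready to hand to a producer."""
    key = event.call_id.encode("utf-8")
    # Keying by call ID sends all events of one call to the same partition,
    # so a consumer sees them in the order they were produced.
    digest = hashlib.sha256(key).digest()
    partition = int.from_bytes(digest[:4], "big") % num_partitions
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return TOPIC, partition, key, value
```

Replication is configured on the Kafka topic itself rather than in producer code, so it does not appear in this sketch.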
Important Scenarios
- Group Conversations:
  - Peer-to-peer is used for small groups and a Call Server for large groups. Streams are transcoded for different user bandwidths and codecs, analytics events are logged, and bandwidth is adjusted dynamically.
  - Clients can dynamically switch from peer-to-peer to the Call Server.
- Recording:
  - A Logger service records chunks of the conversation. A file is created and stored in a distributed file system, and users receive a notification with a link to the recording.
- Live video:
  - Video and audio inputs from cameras and microphones are aggregated, and transcoders convert the input streams for different devices.
  - Call Servers receive the transcoded streams and distribute them via Content Delivery Networks (CDNs). They also handle session management, user authentication, etc.
  - The WebSocket Manager coordinates communication between different Call Servers and manages the WebSocket connections between clients and servers.
  - It provides real-time data exchange, as well as fault tolerance and load balancing to maintain performance.
Here the Call Server is placed close to the users so that latency is minimized. From the Call Server to the users there could be many hops, and we want to minimize the number of hops here because this is where the data is replicated multiple times.
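To see why small groups can stay peer-to-peer while large groups move to a Call Server, it helps to count streams. In a full mesh every client uploads its stream to every other participant, whereas with a Call Server each client uploads once and the server replicates. The sketch below does this arithmetic; the `p2p_limit` threshold of 4 is an arbitrary illustrative value, not a figure from the design.

```python
def mesh_connections(participants: int) -> int:
    """Number of pairwise connections in a full peer-to-peer mesh."""
    return participants * (participants - 1) // 2


def uploads_per_client(participants: int, topology: str) -> int:
    """How many copies of its own stream each client must upload."""
    if topology == "p2p":
        return participants - 1
    if topology == "call_server":
        return 1
    raise ValueError(f"unknown topology: {topology}")


def choose_topology(participants: int, p2p_limit: int = 4) -> str:
    """Use the mesh for small groups and the Call Server beyond the limit."""
    return "p2p" if participants <= p2p_limit else "call_server"
```

With 4 participants the mesh needs 6 connections and 3 uploads per client; with 10 it needs 45 connections and 9 uploads per client, which is where a client's uplink typically becomes the bottleneck and switching to the Call Server pays off.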
Designing Zoom | System Design
Creating an app like Zoom may seem simple from the user’s perspective, but in reality, it’s a complex task involving hundreds of software engineers working for years. Zoom, like other similar apps, requires careful planning and design to provide seamless video conferencing services worldwide. This article explains how Zoom works and how it handles scenarios such as group conversations, recording, and live video.
Important Topics for the Zoom System Design
- Requirements of Zoom System Design
- Capacity Estimation
- High-Level Design of Zoom System Design
- Low-Level Design of Zoom System Design
- Microservices used in Zoom System Design
- API Design of Zoom System Design
- Database Design of Zoom System Design
- How Does Zoom Handle Scalability?