What it takes to build real-time voice and video infrastructure

In this series, WebRTC expert Tsahi Levent-Levi provides an overview of the essential parts of a real-time voice and video infrastructure, from network to software updates. Check out his informative videos and read how Agora’s platform solves the challenges so you can focus on your innovation.

8.1 Putting It All Together

Category: Chapter 8: Summary

Bring all of the ideas from this video series together.

Dive deeper: Read how Agora’s RTE platform provides real-time, reliable, worldwide coverage with our intelligent network and cross-platform SDKs.


We are now at the end of our course and it is time to put everything together and see what we’ve learned. Let’s look at the chapters that we had. 

We started with an introduction, understanding what real-time engagement is, how it works, and what the main requirements are. We went through the challenges of building such a platform: the things that we need to deal with when developing the infrastructure and maintaining it. We discussed clients and devices, what users actually use on a day-to-day basis to communicate with each other in a real-time engagement platform. From there, we started looking at the backend infrastructure, beginning with the media servers. These are the servers that are needed in order to support group calls, for example, or recording, or broadcasts. From there, we went to media processing: how we actually communicate and send media, what that does in the network, and how we maintain quality. We then went on to discuss signaling: how we find each other in such a network, and how we signal the things that we want to do. Last but not least, we talked about maintenance: what happens in the long run when we have a platform and now need to maintain it over time. Let’s see what we covered in each of these chapters.

We started by looking at the architecture and trying to understand the components there. We had the devices: mobile devices, embedded devices, or browsers. Web browsers are especially important if we plan to use WebRTC, or, the other way around, WebRTC is very important if we plan to have our services supported by a browser. We then looked at the servers. We have an application server that deals with the application logic for our real-time engagement application. Then there needs to be a signaling server, usually decoupled from the application server, although it can be at the same location or on the same server. We have NAT traversal servers that are in charge of making sure that the media we’re sending can reach its destination and pass through firewalls and NATs. NATs, if you remember, are network address translators. And then we have media servers.

In most production settings, we will need media servers, especially if what we’re doing is group calling, broadcasting, or recording. We started by looking at the clients. We had the WebRTC clients in the browsers and native clients on devices, embedded PCs, or wherever we want them running outside of a browser. We’ve seen the differences between them, the challenges that they bring, and what we need to think about and do in order to make them work well for us. For example, on native devices, we must bring them up to pace and keep them at pace with what WebRTC in the browser is running to make sure that they are interoperable, and we need to deal with multiple types of devices.

We looked at media processing. We’ve seen that there are essentially three main architectures. The first is a mesh architecture, where everyone talks to everyone else directly, at the same time. That is a bad architecture because it doesn’t scale within a single call. We’ve seen an MCU architecture, where we mix the content in a centralized server, which is great for the devices but very expensive on the server side, in addition to being less flexible. Then we’ve seen an SFU, or selective forwarding unit. Here, all of the clients send their media to the server, and the media server routes it to the clients as needed, treating the session as a whole. It’s a lot more complicated than the other two, but it’s also a lot more flexible, and this is the way people build their platforms today.
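To make the scaling difference between the three architectures concrete, here is a minimal sketch that counts how many media streams a single client sends and receives in each one. The function name `streams_per_client` is hypothetical, and the counts assume one stream per participant with no simulcast, which is a simplification rather than how any particular platform behaves.

```python
from typing import Tuple


def streams_per_client(topology: str, participants: int) -> Tuple[int, int]:
    """Return (uplink_streams, downlink_streams) for one client in a call.

    Simplifying assumption: every participant publishes exactly one
    stream and no simulcast or layered encoding is in use.
    """
    if participants < 2:
        raise ValueError("a call needs at least two participants")
    others = participants - 1

    if topology == "mesh":
        # Each client sends its media to every other client and
        # receives one stream from each of them.
        return others, others
    if topology == "mcu":
        # The server mixes everything, so each client sends one stream
        # and receives one mixed stream.
        return 1, 1
    if topology == "sfu":
        # Each client sends one stream to the server and receives one
        # forwarded stream per other participant.
        return 1, others
    raise ValueError(f"unknown topology: {topology}")


if __name__ == "__main__":
    for name in ("mesh", "mcu", "sfu"):
        up, down = streams_per_client(name, 5)
        print(f"{name}: {up} up, {down} down")
```

With five participants, the mesh client has to encode and upload four streams, while the SFU client uploads only one. The MCU looks cheapest for the client, but the mixing cost moves to the server.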

Then we talked about media servers: how where we locate them affects the distances and the latencies, and how we need to make sure that we route to them properly. We’ve seen the variety of devices that media servers need to deal with, with different capabilities and on different networks. We’ve looked at bandwidth estimation, the fact that if we don’t estimate well, media quality is going to suffer, and the challenges of estimating well. We’ve also dealt with capacity planning: how do we take these sessions and allocate them to the different media servers in our deployment, because we’re going to have a lot more than one.
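The allocation question in capacity planning can be sketched as a small placement routine. This is only an illustration under stated assumptions: the `MediaServer` class, the `allocate` function, and the idea of measuring capacity in participants are all hypothetical, and a real deployment would also weigh location, latency, and bandwidth.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MediaServer:
    name: str
    capacity: int  # assumed unit: maximum concurrent participants
    load: int = 0  # participants currently placed on this server


def allocate(servers: List[MediaServer], session_size: int) -> Optional[MediaServer]:
    """Place a whole session on the least-utilized server that can fit it.

    Returns the chosen server, or None when no server has enough room.
    """
    candidates = [s for s in servers if s.capacity - s.load >= session_size]
    if not candidates:
        return None
    # Pick the server with the lowest fraction of its capacity in use.
    chosen = min(candidates, key=lambda s: s.load / s.capacity)
    chosen.load += session_size
    return chosen


if __name__ == "__main__":
    fleet = [MediaServer("a", capacity=10), MediaServer("b", capacity=20)]
    for size in (5, 8, 4, 20):
        placed = allocate(fleet, size)
        print(size, "->", placed.name if placed else "no capacity")
```

Keeping a session on a single server is itself an assumption here. Large sessions may need to be spread across several servers, which this sketch does not attempt.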

When we talked about signaling, it was mainly about features and scale. Features means the things that we need before the session, during the session, and after the session, and also things unrelated to the session, like chat capabilities. With scale, it was about handling the different kinds of asynchronous messaging, and what we need to deal with when we scale out signaling servers, which are stateful by nature.
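Because signaling servers hold session state, one concern when scaling out is getting every participant of a session to the same server. The sketch below shows one possible approach, a stable hash of the room identifier. It is not something the course prescribes, and the function name `pick_signaling_server` and the server names are hypothetical.

```python
import hashlib
from typing import List


def pick_signaling_server(room_id: str, servers: List[str]) -> str:
    """Map a room to one signaling server in a repeatable way.

    Every client that joins the same room computes the same server, so
    the session state lives in one place. Caveat: with plain modulo
    hashing, changing the number of servers remaps most rooms.
    """
    if not servers:
        raise ValueError("at least one signaling server is required")
    digest = hashlib.sha256(room_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


if __name__ == "__main__":
    pool = ["sig-1", "sig-2", "sig-3"]  # hypothetical server names
    print(pick_signaling_server("room-42", pool))
```

A built-in hash is avoided on purpose: Python randomizes `hash()` for strings per process, so two server processes would not agree on the mapping.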

We finished off with maintenance, where the things that we touched on were monitoring and optimization of the platform, browser changes and the need to keep up with them, and new devices coming out to market that we are asked to make sure we run on. There are also the hassles of updates, upgrades, and security patches that we need to apply to our platform, both inside the infrastructure and in our native applications, along with the browser changes that we have. So if you are going to build a real-time engagement platform, there’s a lot of work ahead of you. A lot of the work is really, really fun and really, really challenging. So, I’d like to thank you for joining me, and good luck with your journey with real-time engagement.