What it takes to build real-time voice and video infrastructure

In this series, WebRTC expert Tsahi Levent-Levi provides an overview of the essential parts of a real-time voice and video infrastructure, from network to software updates. Check out his informative videos and read how Agora’s platform solves the challenges so you can focus on your innovation.

2.1 Network

Category: Chapter 2: Challenges

The biggest challenge in delivering a quality real-time experience is often the internet itself. Explore the potential pitfalls of network transmission.

Dive deeper: Read how a software-defined and managed virtual network can significantly outperform peer-to-peer connections over the public internet.


Let us talk about networks and how they fit into the real-time communication challenges that we’ve got. Here’s what we’re going to do in this lesson: understand networking basics and what it means to use digital networks, and review the effects on media quality.


Okay, we’re going to look at what is going to affect the media quality in our networks. The first thing we need to understand is bitrate and bandwidth. When we’re talking about bitrate, we’re talking about how many bits per second we can send or receive over the network. This number fluctuates dynamically over time, and there’s a different value for incoming and outgoing. When I send data, there’s an amount of bitrate that I can send; when I receive data, there is an amount of bitrate that I can receive, and they’re not the same. So, bitrate is the number of bits per second that we can send at a given point in time, and bandwidth is the answer to the question: What is the available bitrate?
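To make the two terms concrete, here is a minimal sketch in Python. The function names and the measurement numbers are made up for illustration; they are not part of any particular platform. It computes a bitrate from bytes moved in a time window, separately for each direction, and keeps a desired send bitrate at or below an estimate of the available bandwidth.

```python
def bitrate_bps(num_bytes: int, seconds: float) -> float:
    """Average bitrate, in bits per second, over a measurement window."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_bytes * 8 / seconds


def target_bitrate_bps(desired_bps: float, estimated_bandwidth_bps: float) -> float:
    """Never aim above the estimate of the available bitrate."""
    return min(desired_bps, estimated_bandwidth_bps)


# Hypothetical one-second measurements: outgoing and incoming differ.
outgoing = bitrate_bps(125_000, 1.0)   # 1,000,000 bit/s
incoming = bitrate_bps(500_000, 1.0)   # 4,000,000 bit/s

# Wanting 2.5 Mbit/s on a link estimated at 1 Mbit/s gets capped at the estimate.
capped = target_bitrate_bps(2_500_000, outgoing)
```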

In real-time communication networks, usually what we will see is that there’s an estimate of what the bandwidth is. The platform or the solution will then try to use a bitrate that is either up to that point or below it; doing anything above that means that we’re going to lose some of the data. Why do we need all that? Well, the answer is that the networks that we use are messy and unpredictable. If I’m going to send packets to someone, I’m going to send packets 1, 2, 3, 4, 5. Some of them, as in this example, might get lost: what we see here is that packet number two got lost along the way. We might have packet duplication: the same packet, packet number two here, was sent twice. There might be reordering of the packets. Here, what was sent was 1, 2, 3, 4, 5, but what was received on the other end was 1, 4, 3, 2, 5. So, someone might need to reorder these packets if the order is important to us.
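The three cases above (loss, duplication, reordering) can be told apart just by comparing what was sent with what arrived. The following is a small illustrative sketch; the function name `describe_delivery` is hypothetical, and it assumes every packet carries a number the receiver can read.

```python
from collections import Counter


def describe_delivery(sent, received):
    """Compare sent and received packet numbers and report what went wrong."""
    counts = Counter(received)
    lost = [seq for seq in sent if counts[seq] == 0]
    duplicated = [seq for seq in sent if counts[seq] > 1]
    # Ignore repeats (keep the first arrival of each packet), then check order.
    first_arrivals = list(dict.fromkeys(received))
    reordered = first_arrivals != sorted(first_arrivals)
    return {"lost": lost, "duplicated": duplicated, "reordered": reordered}


sent = [1, 2, 3, 4, 5]
loss_case = describe_delivery(sent, [1, 3, 4, 5])            # packet 2 never arrived
duplicate_case = describe_delivery(sent, [1, 2, 2, 3, 4, 5])  # packet 2 arrived twice
reorder_case = describe_delivery(sent, [1, 4, 3, 2, 5])       # same packets, wrong order
```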

Then there’s jitter. I’m sending packets at a given pace, with a given interval between each packet. But they get received on the other side with a bit of jitteriness, not at the exact time differences in which I’ve sent them. So, what can we do about these problems? Here are a few of the traditional, most commonsense ways of dealing with them. The first one is, well, let’s do some delivery acknowledgment. That’s how the internet solves this problem.
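The jitter described above can be put into a number by comparing the spacing between packets at the sender with the spacing at the receiver. The sketch below smooths that difference with a gain of 1/16, in the style of the interarrival jitter estimate from RFC 3550; the function name and the timestamps (in milliseconds) are assumptions made for illustration.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate from matching send and receive timestamps."""
    jitter = 0.0
    for i in range(1, len(send_times)):
        # How much the gap between two consecutive packets changed in transit.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Move the estimate one sixteenth of the way toward the new deviation.
        jitter += (abs(d) - jitter) / 16
    return jitter


send_times = [0, 20, 40, 60]         # one packet every 20 ms
steady = [50, 70, 90, 110]           # constant delay: spacing preserved
jittery = [50, 75, 90, 110]          # second packet arrived 5 ms late

no_jitter = interarrival_jitter(send_times, steady)      # 0.0
some_jitter = interarrival_jitter(send_times, jittery)   # greater than 0
```

Note that a constant delay, however large, produces no jitter; only variation in the delay does.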

We’ve got two machines here. The first machine, on the left, is going to send packet number one. The second machine, which received the message, is going to send a reply. It is going to say, “I’ve got packet number one.” This is an acknowledgment. Another solution is to retransmit; it’s part of the first solution, because it depends on seeing that acknowledgment. The first machine is going to send the packet, and this time it never gets to the other end. At some point in the future, the first machine is going to see that it got no acknowledgment, and it’s going to retransmit the first message, and then it will get the acknowledgment that it wanted.
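The acknowledge-and-retransmit exchange above can be sketched as a simple loop. This is a toy model, not a real network stack: `channel` is a hypothetical callable standing in for "send the packet and wait for the acknowledgment", and `drop_first` is a made-up helper that simulates losses.

```python
def send_reliably(packet, channel, max_attempts=5):
    """Send one packet, retransmitting until an acknowledgment comes back.

    `channel(packet)` returns True when the acknowledgment arrived and
    False when the wait for it timed out. Returns the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        if channel(packet):
            return attempt
    raise TimeoutError(f"no acknowledgment for {packet!r} after {max_attempts} attempts")


def drop_first(n):
    """Build a toy channel that loses the first n transmissions."""
    remaining = n

    def channel(packet):
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return False  # packet (or its acknowledgment) was lost
        return True       # acknowledgment received

    return channel


clean_attempts = send_reliably("packet 1", drop_first(0))  # acknowledged first time
lossy_attempts = send_reliably("packet 1", drop_first(1))  # one retransmission needed
```

Every failed attempt costs at least one timeout of waiting, which is the time budget a real-time session does not have.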

So that’s acknowledgments and retransmissions. How do we deal with ordering? Well, the answer is we put some kind of a number, a sequence number, in the messages. So, if we’re sending message number one and message number two, and the receiving end receives message number two before message number one, it is going to reorder them before passing them to the application above. Then message number three will arrive and will be received. So, what have we seen? We’ve seen delivery acknowledgments, we’ve seen retransmissions, and we’ve seen how we handle ordered delivery. Now the problem is that these require time. I need time in order to wait for an acknowledgment, to do a retransmission, to order things. But time is exactly what we don’t have when we’re talking about real-time communications or real-time engagement. So, in these cases, besides reordering, we use other solutions. And the mechanism that we see in real-time communications is, first and foremost, packet loss concealment.
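Reordering by sequence number, as described above, can be sketched with a small buffer that holds early arrivals until the gap before them is filled. The class name `ReorderBuffer` and its interface are assumptions for illustration only.

```python
class ReorderBuffer:
    """Hold out-of-order messages and release them in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # the sequence number the application expects next
        self.pending = {}          # early arrivals, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one message; return the payloads that can now be delivered."""
        if seq < self.next_seq or seq in self.pending:
            return []  # already delivered or already buffered: a duplicate
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buffer = ReorderBuffer()
step1 = buffer.receive(2, "two")    # held back: message 1 has not arrived yet
step2 = buffer.receive(1, "one")    # releases "one" and then "two"
step3 = buffer.receive(3, "three")  # in order, delivered immediately
```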

Say I’m sending packets and the other side is receiving packets number 1, 2, 3, 4, 5, 6. Then packet number seven got lost, packet number eight also got lost, and then we received packet number nine. What do we do? What we need to do is use PLC, packet loss concealment. We’re going to conceal the fact that we couldn’t receive packets seven and eight. So, if this is voice, for example, I might take packet number six and replay it to the user, maybe reduce the volume a bit, or try to extrapolate what should be there. If we’re doing video, we might just drop that frame or do something else. All of these solutions for dealing with what happens when we didn’t receive a packet, and we don’t have that packet anymore, are packet loss concealment. And in this case, it’s the responsibility of the receiver, after the fact, after it saw that the packets were lost.
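The simplest voice concealment mentioned above, replaying the last good packet at a reduced volume, might look like the sketch below. It is only an illustration: frames are plain lists of samples, `None` marks a lost frame, and the function name and the 0.5 attenuation factor are arbitrary assumptions. Real codecs use far more elaborate concealment.

```python
def conceal_losses(frames, attenuation=0.5):
    """Replace each lost frame (None) with a quieter copy of the last good frame."""
    output = []
    last_good = None
    gain = 1.0
    for frame in frames:
        if frame is not None:
            last_good = frame
            gain = 1.0  # a good frame resets the fade
            output.append(frame)
        elif last_good is not None:
            gain *= attenuation  # fade further with every consecutive loss
            output.append([sample * gain for sample in last_good])
        else:
            output.append([])  # nothing received yet, so nothing to repeat
    return output


# Two frames lost in a row between two good frames.
received = [[0.8, -0.4], None, None, [0.2, 0.2]]
played = conceal_losses(received)
# played == [[0.8, -0.4], [0.4, -0.2], [0.2, -0.1], [0.2, 0.2]]
```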

There’s another solution called FEC, forward error correction. What we do here is send each packet more than once. So, at the beginning here, I send packet number one. Then I send packet number two, but also retransmit packet number one; that’s the forward error correction. Then I’m sending packet number three, with packets number two and one, and I go on like this. Here, what I do is simply use redundancy encoding, and I’m sending each packet three times. It takes more bitrate and will use more of the bandwidth that I have available. But it means that now, if packets are lost, like we see with some of the packets here in the middle, I can still solve that problem. So, I receive packet number one, packet number two, and packet number three, and that’s fine. Then I find out that there’s a copy of packet number four somewhere in the future, so I take that one, and I get packet number five there as well. And from there, I can continue with the rest of the packets. What’s different here from packet loss concealment is that I’ve got all the packets; I just need to find them. And the responsibility here falls on the sender, before the fact: I’m going to send more than I need in order to keep robustness in the network.
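The redundancy scheme described above, where every transmission also carries the two previous packets, can be sketched as follows. The function names are hypothetical, and this shows only the plain repetition variant from the example; other forward error correction schemes send parity data instead of full copies.

```python
def encode_with_redundancy(payloads, copies=3):
    """Build datagrams that carry the newest payload plus the ones before it.

    Each datagram is a list of (sequence number, payload) pairs, so every
    payload ends up being sent up to `copies` times.
    """
    datagrams = []
    for i in range(len(payloads)):
        start = max(0, i - copies + 1)
        datagrams.append([(n + 1, payloads[n]) for n in range(start, i + 1)])
    return datagrams


def recover(received_datagrams):
    """Collect every payload found in any surviving datagram, in order."""
    recovered = {}
    for datagram in received_datagrams:
        for seq, payload in datagram:
            recovered.setdefault(seq, payload)
    return [recovered[seq] for seq in sorted(recovered)]


payloads = ["p1", "p2", "p3", "p4", "p5", "p6"]
datagrams = encode_with_redundancy(payloads)

# Datagrams 4 and 5 are lost in transit; datagram 6 still carries p4, p5, p6.
survivors = datagrams[:3] + datagrams[5:]
restored = recover(survivors)  # all six payloads, in order
```

With three copies, up to two consecutive datagrams can disappear without losing any payload; a third consecutive loss would leave a gap for concealment to handle.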

There are different characteristics that are going to affect the quality of our solution, or of the media that we’re going to send. The first one is bandwidth: how much bandwidth we have available. The more bandwidth, the higher the quality should be. Then there’s packet loss. With packet loss, it’s the opposite: the higher the packet loss, the worse the quality is going to be, because we’re going to lose some of the packets and we’re going to need to devise ways to overcome that. And the third one is latency and jitter. Latency means the time it takes for a packet from the moment I send it until it gets received on the other end. The longer the time, the harder it is for me to do things in real time.

And if we want to talk about latency, we need to talk also about distances. Distance is going to play a huge part in the quality of the media that we’re going to have. Let’s say that we have a call. For that call, I’ve got someone from India, and that someone allocated the machine, the media server, that is going to be used for the session. We’ll talk a lot more about media servers later. But, you know, we got the machine allocated closer to us, because it makes sense. And now the other people join this meeting, all of them from the US. If they join the exact same media server that I have, they’re going to have a lot of distance between them and the media server, which is going to add to the latency and the packet loss. So, we need to be able to adjust for that or to do something about that.

Another problem that we have is network types. The type of network that we’re going to use is going to affect quality. If we have ADSL or Wi-Fi, that’s going to be different from Fiber to the Home (FTTH), or connecting directly over Ethernet, or using cellular networks. So the network we are on is going to play a factor, and we’ll need to test and optimize for that as well.
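Coming back to the distance example with the India and US participants, a rough calculation shows why server placement matters. The sketch below assumes signals travel through fiber at roughly 200,000 km per second (about two thirds of the speed of light in vacuum) and ignores routing detours, queuing, and processing time, so real latencies are higher. The distances, server names, and function names are invented for illustration.

```python
SPEED_IN_FIBER_KM_PER_S = 200_000  # rough figure, about 2/3 of light speed in vacuum


def one_way_delay_ms(distance_km):
    """Lower bound on one-way latency from distance alone, in milliseconds."""
    return distance_km * 1000 / SPEED_IN_FIBER_KM_PER_S


def pick_server(distances_km):
    """Choose the server with the lowest average distance to all participants.

    `distances_km` maps a server name to the distance from each participant.
    """
    return min(distances_km, key=lambda name: sum(distances_km[name]) / len(distances_km[name]))


# Hypothetical call: one participant in India, three in the US.
distances = {
    "server-india": [500, 13_000, 13_000, 13_000],
    "server-us": [13_000, 1_000, 1_500, 2_000],
}

far = one_way_delay_ms(13_000)   # 65.0 ms before any other overhead
near = one_way_delay_ms(1_000)   # 5.0 ms
best = pick_server(distances)    # "server-us" for this set of participants
```

Averaging is only one possible policy; a service might instead minimize the worst participant’s delay or connect each user to a nearby server.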

So, what did we see? We saw that networks are messy and noisy. We’ve got packet loss, reordering, jitter, a lot of different aspects that affect our networks. Many of these network-related factors are out of our control. We don’t control where the user is located or what network he’s on, and these are going to affect media quality. What we’re going to see in the next lessons is how we deal with and overcome these issues of the network. Thank you.