What it takes to build real-time voice and video infrastructure

In this series, WebRTC expert Tsahi Levent-Levi of BlogGeek.me provides an overview of the essential parts of a real-time voice and video infrastructure—from network to software updates. Check out his informative videos and read how Agora’s platform solves the challenges so you can focus on your innovation.

4.3 Broadcast

Category: Chapter 4: Media Processing

Learn what broadcasting is and what it means to broadcast live and interactively.

Dive deeper: Check out Agora’s Developer Resources for tutorials on how to stream interactive live audio and video.

Transcript

When we’re talking about media processing, we need to also talk about broadcast. Here’s what we’re going to do in this lesson: we’re going to understand what live means in broadcasting, because that’s what we’re going to deal with in real time engagement, and we’re going to list the various new live broadcast scenarios that are available out there. 


One thing to remember when we look at broadcast, or processing in general: the faster we want to go, the less latency we want to have in a solution, and the more complex that solution is going to be. In this graph, on the x axis we've got the time: the latency we want to deal with, from the point in time we have the data until it gets viewed on the other end. On the y axis, we've got the complexity: the higher the value, the bigger the complexity. If we place broadcast on this graph, we've got the sub-second approach and then the one-to-five-second approach. If we look at real-time engagement, it sits within the sub-second domain. So, we're going to need to deal with a lot of complexity, because we want sub-second latency.

One more thing is that the bigger the audience, the higher the complexity. This is another component of live broadcasting. So, on the x axis, we've got audience size, and on the y axis, we've got complexity. Again, as we grow the audience size, the complexity is going to rise with it.

Let's see what type of use cases and scenarios we've got out there. Besides the single broadcaster, where there's one live person speaking, we have multiple broadcasters. Here we have three people in green boxes on the left-hand side, and we have the viewers on the right side. The green boxes speak to each other. This is usually done today using an SFU, a selective forwarding unit, where we route media around. So, each user here, each broadcaster, is going to send its data to our media servers in the cloud. The media servers in the cloud are going to route that data between the three people so they can speak to each other. So, each one has one outgoing stream of their own data and two incoming streams, one for each other person in the session.
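
To make that stream counting concrete, here is a minimal TypeScript sketch of an SFU routing table. The `Sfu` class and its methods are hypothetical names used for illustration, not an actual API.

```typescript
// Minimal sketch of how an SFU fans out broadcaster streams (hypothetical types).
interface Stream { ownerId: string; }

class Sfu {
  private streams = new Map<string, Stream>();

  // A broadcaster publishes exactly one outgoing stream to the SFU.
  publish(broadcasterId: string): void {
    this.streams.set(broadcasterId, { ownerId: broadcasterId });
  }

  // Each broadcaster subscribes to every other broadcaster's stream,
  // so with N broadcasters each one receives N - 1 incoming streams.
  subscriptionsFor(participantId: string): Stream[] {
    return [...this.streams.values()].filter(s => s.ownerId !== participantId);
  }
}

const sfu = new Sfu();
["alice", "bob", "carol"].forEach(id => sfu.publish(id));
console.log(sfu.subscriptionsFor("alice").length); // 2 incoming streams
```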

How are we going to send all that to the viewers? One approach is to send a single media stream towards the viewer. So now the media servers in the cloud will need to mix the data from all three participants that are speaking into a single stream, which is then sent out to the viewer. Another approach would be to send it the same way as we do with the broadcasters themselves: we're going to send multiple streams, one for each participant that is speaking. Now the viewer has access to all of these different data streams. This is a more flexible approach. It's a bit more complex to get done, but it's more flexible, because now I can change the layout per viewer, differently from what I'm doing for the others. And I can deal with different bitrates. Different bitrates are usually handled using simulcast, where the cloud service has access to multiple bitrates received from the broadcasters directly, or it can generate its own. I'll have something like low bitrate, medium bitrate, and high bitrate. Each one can go to different viewers depending on what the viewer has: what type of device it is, the CPU it is running, the display size, the network capabilities. All of these are going to be factored in. The cloud will decide what to send per user to give them the best experience.
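
As a rough illustration of how a server might pick one simulcast layer per viewer, here is a small TypeScript sketch. The thresholds and the `Viewer` fields are invented assumptions for the example, not Agora's actual logic.

```typescript
// Hypothetical sketch: picking one simulcast layer per viewer.
type Layer = "low" | "medium" | "high";

interface Viewer {
  downlinkKbps: number;   // estimated network capacity
  displayHeight: number;  // display size in pixels
  cpuConstrained: boolean;
}

// The thresholds below are illustrative only.
function pickLayer(v: Viewer): Layer {
  if (v.cpuConstrained || v.downlinkKbps < 500 || v.displayHeight < 360) return "low";
  if (v.downlinkKbps < 1500 || v.displayHeight < 720) return "medium";
  return "high";
}

console.log(pickLayer({ downlinkKbps: 2000, displayHeight: 1080, cpuConstrained: false })); // "high"
```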

Now the thing is that this decision is not a one-time decision per user. It needs to be adaptive and change dynamically over time. So, during the course of a session, a user might start with a medium bitrate, then switch to high bitrate when the network allows it, then move back to medium bitrate if the network turns poor or something else is running on the machine of the user that is viewing the data. So over time, we need to adapt: play with the bitrate and change it in order to get the best quality for each and every viewer.
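
Here is a hedged sketch of that adaptive behavior: re-evaluating the layer whenever a new bandwidth estimate arrives, with a bit of hysteresis so the layer doesn't flip on every small fluctuation. The threshold numbers are purely illustrative.

```typescript
// Hypothetical sketch: re-picking the layer as bandwidth estimates change over time.
type Layer = "low" | "medium" | "high";
const upThreshold: Record<Layer, number> = { low: 700, medium: 1800, high: Infinity };
const downThreshold: Record<Layer, number> = { low: 0, medium: 500, high: 1400 };

function adapt(current: Layer, estimatedKbps: number): Layer {
  if (estimatedKbps > upThreshold[current]) {
    return current === "low" ? "medium" : "high";   // step up one layer
  }
  if (estimatedKbps < downThreshold[current]) {
    return current === "high" ? "medium" : "low";   // step down one layer
  }
  return current; // stay put inside the hysteresis band
}

// Over a session: medium, then high when the network improves, then back to medium.
let layer: Layer = "medium";
for (const kbps of [1200, 2500, 2200, 900]) layer = adapt(layer, kbps);
console.log(layer); // "medium"
```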

Then we want to think about the backend and how we're going to scale it. Let's say that I have a live broadcast with 500 people viewing. So, you've got a few hundred viewers that watch the same stream. I can definitely do that with a single media server: one stream, put it on one media server, send it out to everyone. That can be done. But what happens if what I have is 10,000 viewers for the same stream, or I want to have multiple sessions being viewed in parallel? All of that brings us to the question of scaling. How do we scale such a backend? The solution for that is cascading.

We take our machines, our media servers, and cascade them one to another. So, the one on the left here routes media towards other media servers, which in turn can send it to the viewers or send it to another tier of additional media servers. It all depends on the scale that we want to provide.
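
As a back-of-the-envelope illustration of why cascading scales, here is a small TypeScript sketch. The assumption that a single media server can fan out to roughly 500 connections is invented for the example.

```typescript
// Back-of-the-envelope sketch: how many media servers a cascade needs,
// assuming (hypothetically) each server can fan out to at most 500 connections.
function serversNeeded(viewers: number, fanOutPerServer = 500): number {
  let servers = 0;
  let leaves = Math.ceil(viewers / fanOutPerServer); // edge servers facing viewers
  while (leaves > 1) {
    servers += leaves;
    leaves = Math.ceil(leaves / fanOutPerServer);    // the tier that feeds the one below
  }
  return servers + 1; // plus the single origin server receiving the broadcasters
}

console.log(serversNeeded(500));   // 1: a single server is enough
console.log(serversNeeded(10000)); // 21: 20 edge servers fed by one origin
```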

The other thing that we need to think about when we talk about scaling is geographic location. Where exactly are our viewers around the globe? Wherever they are, we would like to get our media servers as close to them as possible. Why? Because then we can control the latency and the quality a lot better than if the machine is on one end of the globe, the viewer is on the other end of the globe, and the data is sent over the open Internet. We have no control over the routing in that case, so we'd like to be closer to our users. We need to understand where they are, launch machines, and add them to the broadcast dynamically as the broadcast evolves and users join.
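
A minimal sketch of the region-selection idea, assuming we can probe the round-trip time from the viewer to each candidate region; the region names and numbers are made up.

```typescript
// Hypothetical sketch: choosing the media server region closest to a viewer,
// based on measured round-trip times to each candidate region.
interface RegionProbe { region: string; rttMs: number; }

function nearestRegion(probes: RegionProbe[]): string {
  return probes.reduce((best, p) => (p.rttMs < best.rttMs ? p : best)).region;
}

const probes = [
  { region: "us-west", rttMs: 140 },
  { region: "eu-central", rttMs: 35 },
  { region: "ap-southeast", rttMs: 220 },
];
console.log(nearestRegion(probes)); // "eu-central": route this viewer there
```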

To summarize, when we look at live broadcasts, we need to clearly define the requirements that we've got. How many viewers do we want? How many broadcasting sessions will run in parallel? Where are our viewers? Where are our broadcasters? How do we want to deliver the experience: a single stream going out to everyone, or multiple streams for more flexibility? What kind of adaptive bitrate mechanisms are we going to offer? All of these requirements need to be written down and factored into the decisions that we're going to make. Once we have these requirements, we're going to put thought into the infrastructure architecture that we are going to have. The way we design and build our infrastructure is going to affect the media quality and user experience that our end users are going to have. Thank you.