Ben Weekes is a Senior Architect at Agora. A pioneer and innovator of WebRTC technologies, Ben was Founder and CTO of Requestec, which was acquired by Blackboard in 2014 for its expertise in WebRTC conferencing. After serving as Chief Architect at Blackboard, he joined Agora to continue his journey in pushing the boundaries of video telephony.
I recently set out to see whether I could tackle the problem of displaying large video call grids in a browser. Until now, the general consensus has been that grids as large as 7×7 cannot be done, at least not in a way that would deploy reliably across a range of devices with varying CPU/GPU capacity. In this blog post and the accompanying video of my recent Kranky Geek session, I outline the challenges associated with large grids and demonstrate a solution.
This type of large WebRTC grid could be very useful for events where the energy of a large group has the potential to contribute to the experience. Audience “fan walls” for virtual concerts, talk shows, or other live programming come to mind, as do large company meetings. Walls and large video grids like these are typically built by compositing all of the video boxes into one stream instead of delivering each individual live video stream. The technology has long existed for prioritizing the audio stream (speaker recognition and following), but processing high numbers of video streams poses a number of significant challenges.
There are three primary challenges with displaying large grids (where each participant maintains full interactivity):
- Large multi-stream grids require a lot of CPU/GPU to decode
- Browsers don’t provide much information about the hardware they are running on and there are no real-time CPU/GPU stats
- Network conditions are constantly changing
Let’s take a look at the solution for each of these challenge areas:
The secret to solving processor load issues is to ensure you are not delivering any more pixels than are required. The individual streams should be adjusted on the fly to maintain an overall resolution of 720 x 960 regardless of the number of participants displayed, so each stream's resolution shrinks as the grid grows.
Consider these two examples:
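As a rough illustration of the fixed pixel budget, the sketch below splits an overall resolution of 720 x 960 evenly across a 2×2 grid and a 7×7 grid. The helper name `tileSizeForGrid` and the even-split rule are assumptions made for this example. They are not taken from the SDK.

```typescript
// Hypothetical helper (not from the SDK): divide a fixed overall
// resolution evenly across a grid of video tiles.
interface TileSize {
  width: number;
  height: number;
}

function tileSizeForGrid(
  totalWidth: number,
  totalHeight: number,
  cols: number,
  rows: number
): TileSize {
  if (cols < 1 || rows < 1) {
    throw new RangeError("grid needs at least one column and one row");
  }
  // Round down so the sum of all tiles never exceeds the overall budget.
  return {
    width: Math.floor(totalWidth / cols),
    height: Math.floor(totalHeight / rows),
  };
}

const smallGrid = tileSizeForGrid(720, 960, 2, 2); // 360 x 480 per stream
const largeGrid = tileSizeForGrid(720, 960, 7, 7); // 102 x 137 per stream
```

In both cases the total number of decoded pixels stays at or below 720 x 960. Only the share given to each participant changes.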
While browsers don’t provide information about the underlying CPU/GPU, it is possible to deduce when a device is beginning to struggle and take immediate action. This can be done by monitoring the volatility in the render (display) frame rate of all the remote streams together with the volatility in the outgoing frame rate from the local camera stream encoder. When a device starts to run out of CPU/GPU, these volatilities increase and it is possible to detect and react to a struggling CPU/GPU before a user notices.
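One way to picture this volatility check is the sketch below. It assumes that "volatility" is measured as the standard deviation of recent frames-per-second samples and that a single hypothetical threshold, `VOLATILITY_LIMIT`, separates a healthy device from a struggling one. Both choices are assumptions for illustration, and the real algorithm may measure and weigh these signals differently.

```typescript
// Standard deviation of recent frame-rate samples (frames per second).
// Assumption: this is one plausible definition of "volatility".
function fpsVolatility(samples: number[]): number {
  if (samples.length < 2) return 0;
  const mean = samples.reduce((a, b) => a + b, 0) / samples.length;
  const variance =
    samples.reduce((acc, s) => acc + (s - mean) * (s - mean), 0) /
    samples.length;
  return Math.sqrt(variance);
}

// Hypothetical threshold in fps; a real deployment would tune this.
const VOLATILITY_LIMIT = 3;

// renderFps: one array of recent render-fps samples per remote stream.
// encoderFps: recent outgoing-fps samples from the local camera encoder.
function isStruggling(renderFps: number[][], encoderFps: number[]): boolean {
  const perStream = renderFps.map(fpsVolatility);
  const avgRenderVolatility =
    perStream.length > 0
      ? perStream.reduce((a, b) => a + b, 0) / perStream.length
      : 0;
  return (
    avgRenderVolatility > VOLATILITY_LIMIT ||
    fpsVolatility(encoderFps) > VOLATILITY_LIMIT
  );
}
```

A caller could sample frame rates once per second, keep a short sliding window per stream, and lower the per-stream resolution or frame rate as soon as `isStruggling` returns true.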
Changing Network Conditions
We can monitor real-time estimates of available downlink bandwidth, along with the average NACK rate, to stay one step ahead of changing network conditions. It is possible to stay very close to the congestion limit without exceeding it, again reacting before a user notices.
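The sketch below shows one possible decision rule built on these two signals. The names `NetworkSample` and `nextAction`, the 85% headroom factor, the NACK limit, and the 25% step size are all hypothetical values chosen for illustration. In a browser, inputs like these might come from `RTCPeerConnection.getStats()` where the relevant fields are exposed, but that mapping is an assumption and is not shown here.

```typescript
type Action = "step_down" | "hold" | "step_up";

interface NetworkSample {
  availableKbps: number; // estimated available downlink bandwidth
  nackRate: number; // average NACKs per second over a recent window
}

// Hypothetical tuning values, not taken from the SDK.
const HEADROOM = 0.85; // use at most 85% of the estimated bandwidth
const MAX_NACK_RATE = 5; // above this, treat the link as congested
const STEP_UP_FACTOR = 1.25; // size of the next quality step

function nextAction(sample: NetworkSample, currentKbps: number): Action {
  const budget = sample.availableKbps * HEADROOM;
  // Back off when loss rises or we are already over the budget.
  if (sample.nackRate > MAX_NACK_RATE || currentKbps > budget) {
    return "step_down";
  }
  // Step up only if the next tier still fits inside the budget.
  if (currentKbps * STEP_UP_FACTOR <= budget) {
    return "step_up";
  }
  return "hold";
}
```

Keeping a headroom margin is what lets the receiver sit close to the congestion limit without crossing it. The NACK check acts as a second signal for cases where the bandwidth estimate lags behind reality.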
In this video of my recent Kranky Geek session, I demonstrate the strategy I have outlined here:
To learn more about this large WebRTC grid experiment, including the open-source algorithm that I wrote to make this possible, check out the advanced SDK on GitHub: Agora Multichannel SDK.