What it takes to build real-time voice and video infrastructure

In this series, WebRTC expert Tsahi Levent-Levi provides an overview of the essential parts of a real-time voice and video infrastructure—from network to software updates. Check out his informative videos and read how Agora’s platform solves the challenges so you can focus on your innovation.

3.3 Interoperability

Category: Chapter 3: Clients

Learn about client and device interoperability and why it is such an important consideration in the delivery of real-time experiences.

Dive deeper: Read how Agora makes it easy to build apps with interoperability.


Let’s talk a bit about interoperability. What we’re going to do is first understand what exactly interoperability is, and then review interoperability in the scope and context of real-time engagement.


If we want to talk about interoperability, we need to start from standardization. Standardization, for me, is having a bit that fits the screw. We’ve got a box of bits here, and there is one screw, and we need to fit a certain bit to that screw. So there is a standard that says what the screw looks like, and then some vendor goes and builds bits that fit that screw. Interoperability is having a matching bit and screw. Given a screw or a set of screws from one vendor, I want them to be interoperable with the bits that I’m going to use, which might be manufactured by another vendor.

Let’s take this to the world of real-time engagement and communications as a whole. The interoperability dimensions that we’ve got here include the media codecs and how we’re going to fit them: both devices must support at least one common media codec in order to communicate with each other. We’re also going to look at client-to-client behavior, client-to-server behavior, and devices as a whole. Let’s start with media codecs.

First and foremost, we want two things. The first is that we want both clients to be able to use the same media codec. If they don’t use the same media codec, they can’t communicate. But we also want the implementations of these media codecs to be interoperable. It’s not enough that both clients support H.264, which is a video codec. I want the encoder on one end to be able to send data that the decoder on the other end will actually understand. I need the implementations of these to be interoperable.
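At its simplest, the first requirement is just finding a codec both sides have in common. Here is a minimal sketch, assuming simplified codec-name lists (real WebRTC negotiation exchanges this information via SDP offer/answer, not hand-rolled lists like these):

```javascript
// Hypothetical sketch: picking a common codec between two clients.
// The codec lists below are illustrative, not a real capability exchange.
function pickCommonCodec(localCodecs, remoteCodecs) {
  // Keep the local preference order; take the first codec both sides support.
  return localCodecs.find((codec) => remoteCodecs.includes(codec)) ?? null;
}

const caller = ["VP8", "H264", "AV1"];
const callee = ["H264", "VP9"];

console.log(pickCommonCodec(caller, callee)); // "H264"
console.log(pickCommonCodec(caller, ["VP9"])); // null — no common codec, no call
```

Note that this only covers the first requirement; the second one—that the two implementations of "H264" actually understand each other's bitstreams—can only be verified by testing real encoder/decoder pairs.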

Why am I saying this and putting my finger on that specific topic? Because we have two types of codecs, software codecs and hardware codecs, and the implementations are not the same. Software codecs are more flexible because, hey, it’s software. Hardware codecs come with the device, and we have no control over them. Software is going to require more CPU from us, but it’s in our control; for hardware, it’s out of our control. So, a hardware implementation of a codec might include, or force the inclusion of, high profile H.264. Or it might not support high profile H.264 at all. Some hardware codecs have issues and challenges in decoding or encoding low bitrates. So, there are different challenges here when we use hardware codecs that don’t exist in software codecs, while for software codecs, the challenges we’ve got are memory and CPU issues. We need to take care of that in our implementation of the media codecs.
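The hardware-versus-software trade-off described above often ends up as a runtime decision: prefer hardware where it works, fall back to software where it doesn't. A hedged sketch, where the capability flags and bitrate threshold are invented for illustration (a real app would query the platform's actual encoder capabilities):

```javascript
// Hypothetical sketch: choosing between a hardware and a software encoder.
// `hw` describes an assumed hardware encoder; the fields are illustrative.
function chooseEncoder(hw, targetBitrateKbps) {
  // Prefer hardware when it exists and can handle the target bitrate —
  // some hardware encoders struggle at low bitrates, as noted above.
  if (hw && hw.available && targetBitrateKbps >= hw.minBitrateKbps) {
    return "hardware";
  }
  // Fall back to software: more flexible, but it costs us CPU and memory.
  return "software";
}

console.log(chooseEncoder({ available: true, minBitrateKbps: 300 }, 800)); // "hardware"
console.log(chooseEncoder({ available: true, minBitrateKbps: 300 }, 100)); // "software"
```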

Then there’s client-to-client behavior. Let’s say that we’re using browsers and we have two browsers running, one Chrome and the other Safari. You need to understand that different browsers implement WebRTC differently, so the behavior of these browsers is different as well. If we want to make sure that the browsers are capable of communicating with each other, and exchanging that WebRTC information for our real-time engagement application, then we need to put some kind of shim or abstraction layer on top of it that allows the two to work well together. The differences are nuanced, but they are going to cause our application to break. So, we need to take care of that.
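The idea behind such a shim is that application code never branches on the browser directly; the layer translates per-browser quirks into one uniform shape. Here is a toy sketch of that pattern—the quirks table and stats field names are invented for illustration, not a real compatibility matrix (in practice, libraries like adapter.js play this role for WebRTC):

```javascript
// Hypothetical sketch of a shim/abstraction layer: hide per-browser
// differences behind one normalizing function. All entries are illustrative.
const quirks = {
  chrome: { statsFormat: "standard" },
  safari: { statsFormat: "standard" },
  legacy: { statsFormat: "legacy" },
};

function normalizeStats(browser, rawStats) {
  const q = quirks[browser] ?? quirks.legacy;
  // Translate the legacy-shaped stats into the shape the app expects,
  // so the rest of the application stays browser-agnostic.
  if (q.statsFormat === "legacy") {
    return { bitrate: rawStats.oldBitrateField, packetsLost: rawStats.oldLossField };
  }
  return { bitrate: rawStats.bitrate, packetsLost: rawStats.packetsLost };
}

console.log(normalizeStats("legacy", { oldBitrateField: 500, oldLossField: 2 }));
// { bitrate: 500, packetsLost: 2 }
```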

Then there is client-to-server behavior. We have different browser implementations, and these implementations are also going to behave differently in front of our servers, both the signaling servers and the media servers. So again, we need to take care of that. Whatever we put on the infrastructure when we deploy our service needs to be thoroughly tested against different browser implementations. That means different browsers: Safari, Firefox, Chrome. But also different versions of the same browser: a newer browser version might behave differently than an older one of the exact same browser.

The other angle that we have is devices, and we’ve seen that in the past. Devices are vastly different from one another, especially the mobile ones. They have different hardware, they run on different networks, they offer different screen and camera resolutions, and they offer different performance. Different CPUs, different memory—all of these differences need to be factored in. So, if you want to support more devices, you will need to test against more devices for their interoperability: their ability to speak to each other in an RTE session.

To summarize, interoperability in real-time engagement is key. We want to allow and enable our users to use the devices they are comfortable with and that they want to use. This means that, on one hand, we need to have as many devices as possible supported. On the other hand, we want to reduce the amount of work that we need to do. All of this means that we need to understand and figure out what our clients want to use, create a kind of matrix of the client devices that we want to support, and then test all the permutations in there, prioritizing those that are more important to us and more common in our use cases with our users. Thank you.
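The matrix-and-prioritize step above can be sketched mechanically: enumerate every client pairing and rank it by how common each client is among your users. The client names and usage shares below are made up for illustration:

```javascript
// Hypothetical sketch: build a client test matrix and prioritize the
// permutations by estimated usage share. All figures are illustrative.
const clients = [
  { name: "Chrome (desktop)", share: 0.5 },
  { name: "Safari (iOS)", share: 0.3 },
  { name: "Android WebView", share: 0.2 },
];

function buildTestMatrix(list) {
  const pairs = [];
  // Every ordered pair of clients is a permutation we may need to test
  // end to end (caller side and callee side can behave differently).
  for (const a of list) {
    for (const b of list) {
      pairs.push({ pair: `${a.name} <-> ${b.name}`, priority: a.share * b.share });
    }
  }
  // Test the most common combinations first.
  return pairs.sort((x, y) => y.priority - x.priority);
}

const matrix = buildTestMatrix(clients);
console.log(matrix.length); // 9
console.log(matrix[0].pair); // "Chrome (desktop) <-> Chrome (desktop)"
```

Even a small client list produces a quadratic number of permutations, which is exactly why prioritizing by real-world usage matters.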