What it takes to build real-time voice and video infrastructure

In this series, WebRTC expert Tsahi Levent-Levi of BlogGeek.me provides an overview of the essential parts of a real-time voice and video infrastructure—from network to software updates. Check out his informative videos and read how Agora’s platform solves the challenges so you can focus on your innovation.

6.1 Signaling

Category: Chapter 6: Signaling

Learn about the differences between messaging and signaling, and the requirements for each.

Dive deeper: Agora provides cross-platform Real-Time Messaging and Chat SDKs.

Transcript

One of the things that we need to think of in signaling is the feature list. Here’s what we’re going to do today: we’re going to understand the difference between signaling and messaging. We’re also going to review the various features that are needed, especially when we talk about signaling.


So, signaling or messaging, what is it that we really need or deal with? If you take a media session, then we need certain types of messages, or signals, to be passed in order to succeed with running that session. First and foremost, we need session negotiation, the ability for me to join a session: what are my capabilities, which codecs I’m supporting, whether I want to do a voice call or a video call, who I want to reach out to, etc. In large sessions or groups, there’s also the matter of joining and leaving the session. Then there are media updates. I’m muting my audio, I’m adding audio, removing audio: all of these things that happen throughout the session itself and relate to the media negotiation. In most cases today, we also have chat, the ability to send text messages or rich messages between participants in the session, either publicly to all participants in the session, or privately to someone in the session. Sometimes, we’ll even send chat messages to a smaller group within the bigger chat. Think about hosts in a large broadcast. They want to synchronize between themselves only.
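To make the in-session message types above concrete, here is a minimal Python sketch of a message envelope. Everything in it is an assumption made for illustration: the names `Kind`, `SignalMessage`, and `negotiate_codecs`, the JSON encoding, and the rule that an empty recipient list means "everyone" are not taken from the talk or from any particular SDK.

```python
import json
from dataclasses import dataclass, asdict, field
from enum import Enum
from typing import Optional


class Kind(str, Enum):
    """Hypothetical message kinds covering the in-session cases described above."""
    OFFER = "offer"                # session negotiation: proposed capabilities
    ANSWER = "answer"              # session negotiation: accepted capabilities
    JOIN = "join"                  # participant enters the session
    LEAVE = "leave"                # participant exits the session
    MEDIA_UPDATE = "media_update"  # mute, unmute, add or remove media
    CHAT = "chat"                  # text or rich messages between participants


@dataclass
class SignalMessage:
    kind: Kind
    session_id: str
    sender: str
    # None means "everyone in the session"; a list means a private or sub-group message.
    recipients: Optional[list] = None
    payload: dict = field(default_factory=dict)

    def to_json(self) -> str:
        data = asdict(self)
        data["kind"] = self.kind.value  # store the plain string on the wire
        return json.dumps(data)


def negotiate_codecs(offered: list, supported: list) -> list:
    """Keep the offerer's preference order, dropping codecs the answerer lacks."""
    return [codec for codec in offered if codec in supported]


# Usage: an offer, a media update, and a chat message restricted to one host.
offer = SignalMessage(Kind.OFFER, "room-1", "alice",
                      payload={"codecs": ["opus", "vp8", "h264"]})
agreed = negotiate_codecs(offer.payload["codecs"], ["h264", "opus"])
mute = SignalMessage(Kind.MEDIA_UPDATE, "room-1", "alice", payload={"audio": "muted"})
hosts_only = SignalMessage(Kind.CHAT, "room-1", "alice",
                           recipients=["bob"], payload={"text": "ready?"})
```

Using one envelope for negotiation, membership, media updates, and chat is only one possible design; a real platform may well separate these onto different channels.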

Then there are the types of signals or messages that we need to send before the session. Things like registration: I want to register with the service, I want to say that I’m online and present. This is important if I want someone to be able to reach me in the service. Sometimes we’ll want scheduling, the ability to schedule a session in advance.

There is a part of messaging or signaling that comes after the session. Things like collecting the summary and spreading it out among the participants, follow-ups, the history of the session, all of these topics. There are also a lot of other things that are unrelated to the session itself that we still need within our real-time engagement platform. The main thing here is probably chat. I want to be able to chat with someone over text, asynchronously, without having a media session at all. I don’t want audio, voice and video most of the time, but then I might want that at some point in the future. So, there is no media session, but I still need messaging.

There are other types of messages and signals that we would like to pass on top of all these, and these are related to our specific application. If this is a healthcare application, for example, we might want to pass telemetry of, you know, health-related metrics that we are collecting through a sensor. If this is an education platform, then we might want to deal with whiteboard collaboration, and the collaboration part goes through messaging as well. In a way, session signaling is a small part of what the messaging requirements are. In a real-time engagement platform, there is a lot more to it than just opening up a group call and starting a video. Let’s see what exactly messaging means: what are the atoms or primitives that we need? We’ll look at the basics.

At the heart of it, I want to be able to send and receive messages. But once we start to unravel it, sending can mean sending to a specific device. Or I can decide that sending goes to a user who has multiple devices and is registered on all of them. So which device am I sending to? Or which user am I sending this to? This can also be sending to a channel, to multiple users at the same time. There is presence, and with presence I can say things such as being online or offline: am I registered to the service and reachable now? Also a status, such as being in do-not-disturb mode. There’s also presence in terms of membership: which groups, sessions, or channels am I a part of, or present in at the moment?
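The three addressing modes (device, user, channel) and the two flavors of presence (reachability and membership) can be sketched as a small in-memory routing table. This is an illustrative sketch only: the `Router` class, its method names, and the `"dnd"` status string are hypothetical, and a real service would back this with a shared store and not with Python dictionaries.

```python
from collections import defaultdict


class Router:
    """In-memory routing table: users own devices, channels contain users."""

    def __init__(self):
        self.devices = defaultdict(set)   # user -> device ids registered right now
        self.members = defaultdict(set)   # channel -> users
        self.status = {}                  # user -> explicit status, e.g. "dnd"

    def register(self, user, device):
        self.devices[user].add(device)

    def unregister(self, user, device):
        self.devices[user].discard(device)

    def join(self, channel, user):
        self.members[channel].add(user)

    def presence(self, user):
        """Offline if no device is registered, else the explicit status or 'online'."""
        if not self.devices[user]:
            return "offline"
        return self.status.get(user, "online")

    def channels_of(self, user):
        """Membership presence: which channels is this user part of?"""
        return sorted(c for c, users in self.members.items() if user in users)

    def resolve(self, target_type, target):
        """Return the sorted device ids a message should be delivered to."""
        if target_type == "device":
            return [target]
        if target_type == "user":
            return sorted(self.devices[target])
        if target_type == "channel":
            return sorted(d for u in self.members[target] for d in self.devices[u])
        raise ValueError(f"unknown target type: {target_type}")


# Usage: alice has two devices, bob has one, both are in the "hosts" channel.
router = Router()
router.register("alice", "alice-phone")
router.register("alice", "alice-laptop")
router.register("bob", "bob-phone")
router.join("hosts", "alice")
router.join("hosts", "bob")
router.status["bob"] = "dnd"
```

Here a message addressed to a user fans out to every registered device; delivering only to the most recently active device would be an equally reasonable policy, and it is exactly the kind of decision the question "which device am I sending to?" asks for.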

We’ve got the store-and-forward problem or challenge with messaging. Usually, in the best-case scenario, we have two devices, and a message is sent from one device through the server to the other device. Everyone is happy and the message was received on the other end. But what happens if the device we’re sending the message to is currently offline and cannot receive that message? The server might be willing to store that message and forward it once the other device is online again. Another challenge might be when I’m not connected to the server at the moment, probably because I don’t have a network. It happens to me all the time. For example, I go from my apartment to the basement where the car is, and the elevator has no reception, but that point in time is usually when I will start messaging people, because I’m there and I can start synchronizing things that I need to do. In that case, I would like the RTE platform to be able to store the messages locally and, when the network comes back online, forward these messages to the server and from there to the other participants.
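Both halves of store and forward, the server holding messages for an offline recipient and the client holding messages while it has no network, can be sketched in a few lines of Python. The `Server` and `Client` classes below are hypothetical and purely in-memory; they assume in-order delivery and ignore persistence, retries, acknowledgements, and expiry, all of which a production system would need to address.

```python
from collections import defaultdict, deque


class Server:
    """Delivers to online devices; stores messages for offline ones."""

    def __init__(self):
        self.online = {}                    # device id -> list acting as its inbox
        self.mailbox = defaultdict(deque)   # device id -> messages awaiting delivery

    def send(self, to, message):
        if to in self.online:
            self.online[to].append(message)
        else:
            self.mailbox[to].append(message)  # store now, forward later

    def connect(self, device, inbox):
        self.online[device] = inbox
        while self.mailbox[device]:           # forward everything stored, in order
            inbox.append(self.mailbox[device].popleft())

    def disconnect(self, device):
        self.online.pop(device, None)


class Client:
    """Queues outgoing messages locally while there is no network."""

    def __init__(self, server):
        self.server = server
        self.connected = False
        self.outbox = deque()

    def send(self, to, message):
        self.outbox.append((to, message))
        self.flush()

    def flush(self):
        # Drain the local queue only while the connection is up.
        while self.connected and self.outbox:
            to, message = self.outbox.popleft()
            self.server.send(to, message)

    def set_connected(self, connected):
        self.connected = connected
        self.flush()


# Usage: alice writes a message in the elevator while bob's phone is offline.
server = Server()
alice = Client(server)
alice.send("bob-phone", "in the elevator")   # no network: stays in alice's outbox
queued_locally = len(alice.outbox)
alice.set_connected(True)                    # network is back: forwarded to the server
bob_inbox = []
server.connect("bob-phone", bob_inbox)       # bob comes online: server forwards
```

The message crosses two queues on its way: the sender's local outbox first, then the server-side mailbox, which matches the two failure cases described above.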

After we’ve got the messaging platform with all of these features, we need to think of what items we need on the user interface. I’m sending a message: is this a chat message? Is this some kind of negotiation message that goes between me and the other participant when I’m starting a new call? What is it exactly? If we look only at the chat messages and what they have to do on the user interface, we’ll see a lot of things. There is the typing indicator: when someone is starting to type, you might want to show the others that they are typing. I might want to add read receipts, the ability to know that the other person, or the people in the group, have received and read my message.

There might be a need to do threading and replies to messages. I might want to allow editing or deleting messages after the fact. I might want to add things like support for rich text, emojis, images, and file sharing. All of these things are application-level capabilities that sit on top of the baseline messaging function of me sending something and you receiving something. You need to decide which ones of these your application requires and then see how you’re going to implement them.
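One way to see how these application-level features sit on top of plain send and receive is to treat each of them as just another event type carried over the same messaging primitive. The Python sketch below is an assumption-laden illustration: the event shapes, field names, and the `apply_event` function are invented here, and threading, rich text, and file sharing are left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class ChatMessage:
    message_id: str
    sender: str
    text: str
    read_by: set = field(default_factory=set)  # users who sent a read receipt
    edited: bool = False
    deleted: bool = False


@dataclass
class ChatState:
    messages: dict = field(default_factory=dict)  # message_id -> ChatMessage
    typing: set = field(default_factory=set)      # users currently typing


def apply_event(state: ChatState, event: dict) -> None:
    """Update the chat state in place from one incoming event."""
    kind = event["type"]
    if kind == "typing":
        if event["active"]:
            state.typing.add(event["user"])
        else:
            state.typing.discard(event["user"])
    elif kind == "message":
        state.messages[event["id"]] = ChatMessage(event["id"], event["user"], event["text"])
        state.typing.discard(event["user"])       # sending a message ends typing
    elif kind == "read":
        state.messages[event["id"]].read_by.add(event["user"])
    elif kind == "edit":
        message = state.messages[event["id"]]
        message.text = event["text"]
        message.edited = True
    elif kind == "delete":
        state.messages[event["id"]].deleted = True
    else:
        raise ValueError(f"unknown event type: {kind}")


# Usage: alice types, sends, bob reads, alice edits.
state = ChatState()
for event in [
    {"type": "typing", "user": "alice", "active": True},
    {"type": "message", "id": "m1", "user": "alice", "text": "helo"},
    {"type": "read", "id": "m1", "user": "bob"},
    {"type": "edit", "id": "m1", "text": "hello"},
]:
    apply_event(state, event)
```

Each feature adds one branch on the receiving side, which is a convenient way to count how much work a given feature list implies.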

To summarize, signaling is bigger than messaging. We can deal with chat messages, we can send emojis, but we need signaling for a lot more than just messages in a session. There’s a lot more to it than just being able to send a message and receive it on the other end. So, bear that in mind and list the requirements that you will need in your messaging and signaling solution. Thank you.