The Past, Present, and Future of WebRTC

This blog was written by Shriya Ramakrishnan, an Agora Superstar. The Agora Superstar program empowers developers around the world to share their passion and technical expertise, and create innovative real-time communications apps and projects using Agora’s customizable SDKs. Think you’ve got what it takes to be an Agora Superstar? Apply here.

Real-Time Communication (RTC) is, by definition, the near-simultaneous exchange of information between point A and point B with negligible latency. In the RTC industry, WebRTC is a pioneering technology.

WebRTC (Web Real-Time Communication) is an open-source project pioneered by Google for in-browser RTC. Its browser APIs were later standardized by the World Wide Web Consortium (W3C), and its underlying protocols by the Internet Engineering Task Force (IETF). As the name suggests, it was created as a real-time communication tool for one-to-one video/audio calling or the transmission of any kind of data over the web. When it was first released, many technology experts boldly predicted that WebRTC would become a breakthrough in video communication technology, and it has steadily gained popularity over the years. If you see apps making video calls from a web browser, chances are that WebRTC powers them. WebRTC lets anyone build the video chat app they want, and that freedom makes the technology highly competitive against traditional video chat apps like Skype.

A common scenario is embedding video chat directly into an existing page. For example, integrating WebRTC on a contact sales page, rather than asking customers to reach out via email or phone, instantly connects sales representatives with customers through video calling on that page, which can increase customer engagement and decrease the churn rate.

But today, WebRTC has grown to much more than just video or audio calling. It is presently used for a plethora of use cases, including but not limited to:

  1. Gaming: WebRTC is the mode of communication in many gaming and eSports applications, where people play video games competitively. The screen-sharing features of several well-known telecommunication applications have been used for the same purpose, but their low frame rates and reliance on older technologies, like traditional IP telephony or the outdated RTMP, render the service inefficient. WebRTC-based services can also transcode media for wider connectivity with various VoIP applications and for broadcasting. WebRTC usually uses a STUN or TURN server along with RTCPeerConnection and RTCDataChannel to achieve communication. We elaborate on these in the next section.
  2. File-Sharing: RTCDataChannels are used by several file-sharing applications, one example being ‘ShareDrop’, which lets you share files with others on the same network. Another popular project is WebTorrent, which is, in essence, a peer-to-peer data sharing network implemented over WebRTC.
  3. Internet of Things: Several sensors in IoT projects require data transmission and this can be achieved using WebRTC effectively by:
    1. Reducing ICE (Interactive Connectivity Establishment, a standard method of NAT traversal used in WebRTC) connectivity checks
    2. Sending arbitrary data reliably by encrypting it with standard AES (Advanced Encryption Standard) Encryption
  4. Machine Learning: WebRTC is used to capture raw media from the media inputs and transmit it over the web (MediaStreams and RTCPeerConnection) for processing, and then to send the resulting inferences back to the user using RTCDataChannels. Another ML use case is collecting usage statistics of a certain commodity over WebRTC, performing analytics on them, and sending back the inferences.

How does WebRTC work?

Today, most communication runs over the internet. Originally, every device was addressed using IPv4 (Internet Protocol version 4), which has only a 32-bit address space (about 4.3 billion addresses). As the number of connected devices grew, this address space was exhausted, which led to IPv6 with its much larger 128-bit address space. Because the move to IPv6 has been gradual, NAT (Network Address Translation) became widespread in the meantime. A NAT lets all the nodes inside a local network, each with its own private address, share a single external (public) IPv4 address.

This also gives the nodes in the local network a layer of protection from unknown nodes outside it. The NAT keeps a mapping between internal address:port pairs and external ports in what is called the ‘NAT table’, and it typically only forwards incoming packets that match an entry created by outgoing traffic.

A STUN (Session Traversal Utilities for NAT) server helps a node behind a NAT discover its external (public) IP:port. The node sends a request to the STUN server, which replies with the address and port it saw the request come from. After the IP:port has been obtained with the help of a STUN server, the address is sent to the peer using signaling, and a connection is set up using ICE (Interactive Connectivity Establishment) negotiation.

RTCPeerConnection is an interface in WebRTC that represents a connection between the local computer and a remote peer. By default, it tries to set up a direct connection over UDP. If that fails, it falls back to TCP, and as a last resort TURN (Traversal Using Relays around NAT) servers are used to relay the traffic.
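As a rough illustration of how STUN and TURN servers are handed to an RTCPeerConnection, the sketch below builds the configuration object. The helper name `buildRtcConfig`, the server URLs, and the credentials are placeholders chosen for this example and are not part of the original sample app.

```javascript
// Builds the configuration object passed to `new RTCPeerConnection(config)`.
// `buildRtcConfig` is a hypothetical helper; URLs and credentials are placeholders.
function buildRtcConfig(stunUrl, turn) {
  const iceServers = [{ urls: stunUrl }];
  if (turn) {
    // TURN relays need credentials and are only used when direct paths fail.
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential,
    });
  }
  return { iceServers };
}

const config = buildRtcConfig("stun:stun.example.org:3478", {
  url: "turn:turn.example.org:3478",
  username: "demo-user",
  credential: "demo-pass",
});

// In a browser the connection is created from the config; elsewhere we skip it.
const pc =
  typeof RTCPeerConnection !== "undefined"
    ? new RTCPeerConnection(config)
    : null;
console.log(config.iceServers.length, "ICE servers configured");
```

Listing the STUN server first and the TURN server second mirrors the fallback order described above, although the browser's ICE agent decides which path is actually used.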

ICE is used to find the best path for transmission between peers. The peers exchange information about the ways they can be reached (UDP, which is preferred, TCP, or a TURN relay). Each such address and transport combination is called an ICE candidate.

Signaling is the process by which peers discover each other and exchange the information needed to set up a session. ICE negotiation takes place once the potential peer has been found through signaling. WebRTC is made secure by a few procedures:
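To make the idea of an ICE candidate more concrete, here is a small sketch that pulls the main fields out of a candidate line as it appears in SDP. The function name `parseIceCandidate` and the sample line (which uses documentation-only IP addresses) are made up for illustration.

```javascript
// Parses the leading fields of an ICE candidate line:
// candidate:<foundation> <component> <transport> <priority> <address> <port> typ <type> ...
function parseIceCandidate(line) {
  const parts = line
    .replace(/^a=/, "")
    .replace(/^candidate:/, "")
    .trim()
    .split(/\s+/);
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error("Not a valid ICE candidate line");
  }
  const [foundation, component, protocol, priority, address, port, , type] = parts;
  return {
    foundation,
    component: Number(component),
    protocol: protocol.toLowerCase(),
    priority: Number(priority),
    address,
    port: Number(port),
    type, // "host", "srflx" (learned via STUN), "prflx", or "relay" (via TURN)
  };
}

// Hypothetical server-reflexive candidate, i.e. a public address learned via STUN.
const sample =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154";
const parsed = parseIceCandidate(sample);
console.log(parsed.type, parsed.protocol, parsed.address, parsed.port);
```

The `type` field shows where a candidate came from: a local interface, a STUN lookup, or a TURN relay.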

  • All media transmitted via WebRTC is mandatorily encrypted with SRTP (Secure Real-time Transport Protocol), the secure extension of the network protocol designed for multimedia telephony, which uses standard AES (Advanced Encryption Standard) as its default cipher. The keys are negotiated with DTLS (Datagram Transport Layer Security), which also protects data channels and prevents eavesdropping, modification, replaying, and other such attacks on datagrams.
  • Every user needs to grant access to their camera and microphone before any media is captured, so transmission never starts arbitrarily.
  • WebRTC is sandboxed and has no plug-ins. Plug-ins can be dangerous as they may have malicious extensions.

Let’s take a look at how you can build your own ‘vanilla’ WebRTC app:

Step 1:

The first step is to get the local stream by obtaining permission for MediaStream using MediaDevices.getUserMedia().

MediaDevices.getUserMedia() :

This method prompts the user for access to devices such as the webcam and microphone. The user chooses ‘allow’ or ‘decline’ for the audio and video devices they wish to grant access to. Permission for screen sharing is requested separately, through MediaDevices.getDisplayMedia().

The function returns a Promise that resolves to a MediaStream object. If the user denies permission, or matching media is not available, then the promise is rejected with NotAllowedError or NotFoundError respectively.
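A minimal sketch of this step might look like the following. The element id `localVideo`, the helper names `getLocalStream` and `describeMediaError`, and the optional `mediaDevices` parameter (there only so the function can be tried outside a browser) are assumptions made for this example.

```javascript
// Requests camera and microphone access and resolves to a MediaStream.
// `mediaDevices` defaults to the browser's navigator.mediaDevices.
async function getLocalStream(
  constraints = { audio: true, video: true },
  mediaDevices = typeof navigator !== "undefined" ? navigator.mediaDevices : undefined
) {
  if (!mediaDevices || !mediaDevices.getUserMedia) {
    throw new Error("getUserMedia is not available in this environment");
  }
  return mediaDevices.getUserMedia(constraints);
}

// Maps the rejection reasons described above to a readable message.
function describeMediaError(err) {
  switch (err && err.name) {
    case "NotAllowedError":
      return "Permission to use the camera/microphone was denied.";
    case "NotFoundError":
      return "No matching camera or microphone was found.";
    default:
      return "Could not access media devices: " + (err && err.message);
  }
}

// Hypothetical start() handler: shows the local stream in <video id="localVideo">.
async function start() {
  try {
    const stream = await getLocalStream();
    document.getElementById("localVideo").srcObject = stream;
    return stream;
  } catch (err) {
    console.error(describeMediaError(err));
    return null;
  }
}
```

Wiring `start` to the start button's click handler is enough to show the local preview once the user grants permission.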

The above code written within a function called start() displays the local stream on clicking the start button to start the call.

Step 2:

The next thing to do after obtaining the local stream is to connect to a suitable peer (found by signaling and connected to by ICE negotiation).

An interface is set up between the local computer and the remote peer known as the RTCPeerConnection.

To connect to the peer we click on the call button. The call button triggers the ‘call()’ function. This function carries out the following process (represented diagrammatically) after which it obtains the remote stream and displays it to the local user and displays the local stream to the remote user.
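The offer/answer exchange behind such a ‘call()’ function can be sketched as below. This version assumes both RTCPeerConnection objects live on the same page, so descriptions and ICE candidates are handed over directly. In a real application they would travel through a signaling server, and the parameter names here are illustrative only.

```javascript
// Connects two peer connections and delivers the remote stream to a callback.
// Assumption: both ends are on the same page, so no signaling server is needed.
async function call(localPc, remotePc, localStream, onRemoteStream) {
  // Pass each side's ICE candidates to the other side as they are gathered.
  localPc.onicecandidate = (e) => {
    if (e.candidate) remotePc.addIceCandidate(e.candidate).catch(console.error);
  };
  remotePc.onicecandidate = (e) => {
    if (e.candidate) localPc.addIceCandidate(e.candidate).catch(console.error);
  };

  // Hand incoming media to the caller-supplied handler (e.g. to show it in a <video>).
  remotePc.ontrack = (e) => onRemoteStream(e.streams[0]);

  // Send every local track (audio and video) to the peer.
  localStream.getTracks().forEach((track) => localPc.addTrack(track, localStream));

  // Offer/answer exchange: each description is set locally on one side
  // and remotely on the other.
  const offer = await localPc.createOffer();
  await localPc.setLocalDescription(offer);
  await remotePc.setRemoteDescription(offer);
  const answer = await remotePc.createAnswer();
  await remotePc.setLocalDescription(answer);
  await localPc.setRemoteDescription(answer);
}
```

In a browser this could be invoked as `call(new RTCPeerConnection(), new RTCPeerConnection(), stream, showRemote)`, where `stream` comes from getUserMedia and `showRemote` is a handler of your own.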

[Figure: the overall contribution of RTCPeerConnection]

The RTCPeerConnection has several functions such as:

  1. Signal Processing: Cleaning up the captured audio and video, for example with echo cancellation and noise reduction.
  2. Codec Handling: Finding the best-suited codec for media transmission (codecs: VP8/H.264).
  3. Peer-to-peer Communication: Connecting to the peer.
  4. Security: Ensuring a secure connection (AES encryption).
  5. Bandwidth Management: Adapting the media to the available bandwidth to keep consumption to a minimum.

Step 3:

Finally, after the call, to terminate it, we simply close the RTCPeerConnection using the .close() function on the connection object.
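A hang-up handler along these lines is usually only a few lines long. The sketch below also stops the local tracks so the camera and microphone are released. That extra step and the function name `hangUp` are additions for illustration and are not taken from the original sample.

```javascript
// Ends the call: stops local capture and closes the peer connection.
function hangUp(pc, localStream) {
  if (localStream) {
    // Stopping the tracks releases the camera and microphone.
    localStream.getTracks().forEach((track) => track.stop());
  }
  if (pc) {
    pc.close();
  }
  return null; // assign the result back to drop the stale connection reference
}
```

A typical use would be `pc = hangUp(pc, localStream);` inside the hang-up button's click handler.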

That brings us to the end of the sample app!

For the full code click here!

What are some limitations of ‘vanilla’ WebRTC?

  • Steep Learning Curve: WebRTC is difficult to start working with, even for a good JS developer. Building a scalable solution requires ample domain knowledge and expertise in video solutions, infrastructure, etc.
  • Quality of Experience: WebRTC transmits video/audio data over the public internet, where the quality of experience is hard to guarantee. Although it can maintain good quality most of the time, there will be many instances of poor video quality, high latency, and echo.
  • Scalability/User Limit: Scalability is very poor on group video calls when we use ‘vanilla’ WebRTC due to the peer-to-peer nature of WebRTC. With every new participant on the video call, ingress and egress increase steeply on every node, eating up the bandwidth of the browser client and leaving less of it for each individual stream.
  • Data Privacy: WebRTC uses the public internet to connect two end-users. Because of the non-pluggable nature of WebRTC, it does not support the use of third-party services where users can implement data protection procedures on every node of the network. This is especially important for industries that often deal with highly sensitive information, such as health care and finance. If your organization is subject to regulations such as GDPR or HIPAA, you may want to consider other reliable solutions. In addition to this, there have been many cases where the P2P nature of WebRTC was exploited to expose sensitive information, such as IP addresses protected behind a VPN.
  • Cross-Platform Limitations: Although WebRTC was originally built to transmit video/audio over the Internet through browsers, there are some bindings available today for Android and iOS. But these bindings don’t have first-class support and are mostly community-driven. Also, 1:1 feature parity may not be available between Web and Native.
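The scalability point above is easy to quantify. In a full mesh of n participants, every peer uploads its media n-1 times and the call needs n(n-1)/2 peer-to-peer links in total. The short sketch below works this out. The function name `meshLoad` and the 1,000 kbps per-stream bitrate are arbitrary choices for illustration.

```javascript
// Estimates the load of a full-mesh call where everyone sends to everyone.
function meshLoad(participants, bitrateKbps) {
  const peers = participants - 1; // everyone except yourself
  return {
    connectionsTotal: (participants * peers) / 2, // distinct peer-to-peer links
    uploadsPerPeer: peers,
    uploadKbpsPerPeer: peers * bitrateKbps,
    downloadKbpsPerPeer: peers * bitrateKbps,
  };
}

// Assumed per-stream bitrate of 1,000 kbps.
for (const n of [2, 4, 8]) {
  console.log(n, "participants:", meshLoad(n, 1000));
}
```

With these assumed numbers, going from 2 to 8 participants raises each peer's upload from 1,000 to 7,000 kbps, which is why mesh calls degrade quickly as they grow.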

How has Agora solved these problems?

  • Learning Curve: Agora provides a wrapper that makes WebRTC easy to use, with simplified functions and extensive, instructive documentation. Whereas traditional WebRTC requires you to maintain TURN/STUN servers for relaying data and obtaining IP addresses respectively, Agora does all of this under the hood, leaving the user with very little to do. The user has no hardware overheads to manage, which makes the process hassle-free and drastically cuts down the cost and complexity of implementation. All the user needs to do is sign up for an account with Agora and obtain the generated unique App ID to use its services.
  • Quality of Experience: Agora has 200+ data centers distributed around the globe dedicated to processing real-time audio and video data. These data centers run the proprietary, high-performance SOLO and NOVA codecs for audio and video to ensure smooth and fast transmission of data. Agora also has intelligent dynamic routing algorithms that ensure millisecond-level latency and exhibit extreme resiliency towards packet loss. This overlay network, known as SD-RTN™ (Software Defined Real-time Network), is a virtual, UDP (User Datagram Protocol)-based network architecture designed specifically for real-time communications. By deploying software networking units that work in synergy with one another at different data centers across the Internet, Agora has added a virtual layer on top of it. To ensure stable transmission and low latency, particularly on weak networks, the SD-RTN™ automatically assigns an optimal path according to the following node conditions in real time:
    • Transmission status
    • Load conditions
    • Distance to the users
    • Response time
  • Scalability/User Limit: Agora is built to scale and serve real-time video content to millions of users. Agora supports 17 active co-speakers and up to one million passive listeners (it can also scale up upon request). Redundant connections are avoided by using a channel-based architecture, in which all peers obtain data about one another through the channel, rather than the mesh architecture of connections shown below for traditional WebRTC.

  • As shown in the above diagram, the bandwidth consumed while using Agora is comparatively much lower as the number of participants scales up.

    While in vanilla WebRTC egress grows with every additional participant (n-1 uploads for n participants), it remains constant when using Agora. In other words, you publish your video only once rather than once for every participant in the video call. This leaves you extra bandwidth to accommodate additional users.
  • Data Privacy: Agora provides a high level of security in any application by allowing developers to maintain a self-hosted open-source token server, and also supports end-to-end encryption of every video packet using algorithms like AES128XTS, AES256XTS, and AES128ECB. Agora is also fully GDPR and HIPAA compliant, making it an ideal choice for sensitive businesses, such as healthcare companies.
  • Cross-Platform Limitations: Agora has also made possible the use of this technology on web browsers, Android, iOS, Unity, Mac, Windows, and Linux, among others. You can learn more about the available SDKs here.

Now, let’s run a quick demo of WebRTC with Agora!

Step 1:

The first step to building a video call application is to create and configure a client. ‘Configure’ here means choosing the mode. In ‘live’ mode, the video is a live broadcast in which the host sends and receives voice/video while the audience can only receive it. In ‘rtc’ (communication) mode, which is typically used for one-to-one or group calls, all the users in the channel can talk freely. A codec (short for coder-decoder) is software used to compress and decompress digital media. We specify the codec standard under ‘codec’. Here we are using the H.264 standard, which is a highly efficient standard for video.

We now create a function named ‘myfunction()’, which is called when the ‘join’ button on our HTML webpage is clicked.

This function contains the major portion of the execution.

‘handlefail’ is a variable that holds a function whose sole purpose is to log an error to the console every time one arises.

‘remotecontainer’ and ‘appid’ are variables that refer to the elements with the ids ‘remote’ and ‘appid’ respectively.

‘addremotestream’ is a function that adds a new child div tag to the parent remotecontainer div tag and displays the video of every new user in the channel on your webpage.

For one’s video to be published into the stream, the client object is first initialized using the ‘.init’ function. We pass the App ID as the unique identifier for every client and a logger function to track the activity and flow of the program in the console, as parameters to this function.
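The client setup described in this step could be sketched as follows. It assumes the legacy, callback-style Agora Web SDK (3.x), in which the SDK is exposed as a global `AgoraRTC` object. Here the SDK is passed in as a parameter so the sketch does not depend on it being loaded, and the wrapper name `createAndInitClient` is hypothetical.

```javascript
// Logs any error reported through an SDK failure callback.
const handlefail = (err) => console.error("Error:", err);

// Creates a client in communication ("rtc") mode with the H.264 codec and
// initializes it with the App ID. Assumes the legacy callback-style Web SDK.
function createAndInitClient(AgoraRTC, appId, onReady) {
  const client = AgoraRTC.createClient({ mode: "rtc", codec: "h264" });
  client.init(
    appId,
    () => {
      console.log("AgoraRTC client initialized");
      onReady(client);
    },
    handlefail
  );
  return client;
}
```

In the page itself this might be called as `createAndInitClient(AgoraRTC, appid.value, joinChannel)`, where `joinChannel` stands for whatever should run next.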

Step 2:

After the client has been initialized with an App ID, the ‘.join’ function is used to add the client to a channel, whose name is passed as one of the parameters of the function along with the logger function.

Now, the local stream (held in the ‘localstream’ variable) is created using the ‘.createStream’ function. This function takes a configuration that indicates whether audio, video, and screen sharing are needed, and a unique id by which the stream is identified.

Following the creation of the local stream, it is initialized with a callback that logs the initialization in the console and calls the ‘.play’ function, which plays the video captured from the local user’s webcam in the tag whose id is passed to it as an argument.

The client then publishes, i.e. shares, the local video with the rest of the channel using the ‘.publish’ function, with the local stream and a logger function (‘handlefail’) as parameters.
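Put together, the join, create, play, and publish sequence from this step might look roughly like this. As before, it is a sketch that assumes the legacy callback-style Agora Web SDK (3.x). The wrapper name `joinAndPublish` and its parameters are made up for the example, and `token` can be null for projects that only use App ID authentication.

```javascript
// Logs any error reported through an SDK failure callback.
const handlefail = (err) => console.error("Error:", err);

// Joins the channel, then creates, plays, and publishes the local stream.
// `localElementId` is the id of the element that should show the local video.
function joinAndPublish(AgoraRTC, client, channelName, token, localElementId) {
  // Passing null as the uid lets the server assign one.
  client.join(token, channelName, null, (uid) => {
    const localstream = AgoraRTC.createStream({
      streamID: uid, // unique id that identifies this stream
      audio: true,
      video: true,
      screen: false,
    });
    localstream.init(() => {
      console.log("Local stream initialized");
      localstream.play(localElementId);
      client.publish(localstream, handlefail);
    }, handlefail);
  }, handlefail);
}
```

Each call happens inside the success callback of the previous one, so the stream is only published after the channel has been joined and the camera is ready.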

Step 3:

Now that the local stream (our stream) is created and published, how do we see the video of the other users in the channel?

For this we have enabled event listeners.

‘stream-added’ is an Agora-defined event. It is triggered every time another user creates and publishes a stream into the channel.

The ‘.on’ function registers a listener for the event named in its first argument on the object it is called on (the client object here). Its second argument is a function, which here ‘subscribes’ to the new stream in the channel using the ‘.subscribe’ function.

The next event listener listens to the event ‘stream-subscribed’ and does the job of adding the new stream to our webpage and playing the video published by the remote user for the local user.

‘removeVideoStream’, as the name suggests, is a function that removes a video stream from the webpage and stops the viewing of that stream.

We use another pair of event listeners to do this.

The first listener listens to the event ‘stream-removed’. This event fires when the remote stream is removed; for example, when a peer user calls Client.unpublish to remove their stream from the channel. The listener then calls removeVideoStream.

The second event listener listens to the event ‘peer-leave’. This event fires when the peer user leaves the channel; for example, when the peer user calls Client.leave for the client to leave the channel. The listener then calls removeVideoStream.
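The four listeners described in this step can be sketched as below, again assuming the legacy callback-style Agora Web SDK (3.x). The wrapper `registerRemoteHandlers` is a name chosen for this example. The container element is passed in rather than looked up globally, and the ‘peer-leave’ handler falls back to `evt.uid` when the event carries no stream.

```javascript
// Logs any error reported through an SDK failure callback.
const handlefail = (err) => console.error("Error:", err);

// Adds a <div> for the remote stream inside the container and plays it there.
function addremotestream(remotecontainer, stream) {
  const id = String(stream.getId());
  const div = remotecontainer.ownerDocument.createElement("div");
  div.id = id;
  remotecontainer.appendChild(div);
  stream.play(id);
}

// Removes the <div> that was showing the given stream, if it exists.
function removeVideoStream(remotecontainer, streamId) {
  const div = remotecontainer.ownerDocument.getElementById(String(streamId));
  if (div) div.remove();
}

// Registers the listeners for remote streams appearing and disappearing.
function registerRemoteHandlers(client, remotecontainer) {
  client.on("stream-added", (evt) => client.subscribe(evt.stream, handlefail));
  client.on("stream-subscribed", (evt) => addremotestream(remotecontainer, evt.stream));

  const onGone = (evt) => {
    // Assumption: a stream's id equals its owner's uid, as in the join step.
    const id = evt.stream ? evt.stream.getId() : evt.uid;
    if (evt.stream) evt.stream.stop();
    removeVideoStream(remotecontainer, id);
  };
  client.on("stream-removed", onGone);
  client.on("peer-leave", onGone);
}
```

In the page this would be called once, right after the client is created, for example `registerRemoteHandlers(client, document.getElementById("remote"))`.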

The same functionality takes fewer lines of code with Agora than with plain WebRTC. This brings us to the conclusion that Agora carries out the same task with greater ease and leaves the user fewer tasks to perform in terms of hardware maintenance and software implementation, as mentioned above.

For the full code, click here.

By now you should have a comprehensive knowledge about WebRTC and how this pioneering technology has opened up the possibilities of data and media transfer over the public internet. As with many open-source projects, there are known limitations, especially for a WebRTC novice. The Agora platform offers a cost-effective alternative solution with proprietary networks, and reliable encoding/decoding technologies, professional enterprise-level technical support, and dedicated teams that you can trust for your project or application.

More importantly, it is FREE to start. You are guaranteed to receive 10,000 free minutes EVERY MONTH.

Add high-quality voice, video and streaming to any app with ease.