
What It Takes to Build a Real-time Voice and Video Infrastructure

By Patricia Finlayson | In: Business, Developer

In today’s digital-first business landscape, real-time communication (RTC) via voice and video has become a cornerstone of immediate, impactful communication and user engagement. From live customer support to interactive live streaming, low-latency, real-time experiences are growing in importance across industries.

Building your own infrastructure with WebRTC

When adding real-time communication to their applications, companies must decide whether to build their own real-time communication infrastructure or use a service that already provides it.

WebRTC (Web Real-Time Communication) is an open-source project, originally released by Google and standardized by the W3C and IETF, that enables real-time communication in web browsers and applications. This versatile technology allows developers to build voice, video, and data-sharing solutions, making it a foundational component of many real-time voice and video experiences.

While WebRTC is relatively easy to set up and can support some small-scale use cases, it becomes increasingly complex as the number of participants and sessions grows. Beyond the architecture, developers building WebRTC backends face a set of intertwined technical hurdles, such as mitigating buffering and latency while optimizing video quality. Any one of these can require complex logic and development on its own, and a problem with any of them can severely degrade the user experience.
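
Much of the scaling problem comes from topology. In a full mesh, each browser connects directly to every other participant. A Selective Forwarding Unit (SFU) avoids this: each client sends its media to a server, which forwards it to the others without re-encoding. The sketch below compares the two in a deliberately simplified model. It assumes one video stream per participant and ignores audio, simulcast, and signaling. The function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamLoad:
    uplink_streams: int     # video streams each client uploads
    downlink_streams: int   # video streams each client downloads
    connections: int        # total network connections in the call


def mesh_load(n: int) -> StreamLoad:
    """Full mesh: each client sends its video directly to every other client."""
    return StreamLoad(uplink_streams=n - 1,
                      downlink_streams=n - 1,
                      connections=n * (n - 1) // 2)


def sfu_load(n: int) -> StreamLoad:
    """SFU: each client sends one stream to the server, which forwards it
    to the other n - 1 participants without re-encoding it."""
    return StreamLoad(uplink_streams=1,
                      downlink_streams=n - 1,
                      connections=n)


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        mesh, sfu = mesh_load(n), sfu_load(n)
        print(f"{n:2d} participants: mesh uploads {mesh.uplink_streams:2d} streams "
              f"over {mesh.connections:3d} connections; "
              f"SFU uploads {sfu.uplink_streams} over {sfu.connections:2d}")
```

With 8 participants, a mesh client uploads 7 copies of its video, while an SFU client uploads one. This is why group calling typically moves to server-side forwarding once calls grow beyond a handful of people.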

Let us review some of the challenges developers face when building a DIY WebRTC backend solution.

Addressing Technical Hurdles

Internet Routing

Businesses have limited control over internet routing, which makes it difficult to guarantee smooth video calls across different Internet Service Providers (ISPs). This lack of control stems from the complex, decentralized nature of the internet. Data packets traverse multiple networks, often operated by different ISPs. Each ISP has its own routing policies, traffic management techniques, and peering agreements, all of which affect the quality and consistency of data transmission.

Because video calls and live streaming require stable, high-bandwidth connections, any disruption or bottleneck in the data path can cause latency, jitter, and packet loss, leading to poor video and audio quality. ISPs may also apply traffic shaping or throttling, which further complicates the delivery of real-time communication services. Businesses cannot dictate how data packets are routed once they leave their own network, so they struggle to maintain a consistent user experience across diverse geographic locations and network conditions.
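
Jitter, one of the symptoms described above, is measurable. WebRTC carries media over RTP, and RTP receivers estimate interarrival jitter with a running formula defined in RFC 3550. The sketch below reproduces that formula with timestamps in milliseconds rather than RTP clock units, a simplification made for readability. The sample packet timings are made up.

```python
from typing import Sequence


def interarrival_jitter(send_ms: Sequence[float], recv_ms: Sequence[float]) -> float:
    """Running interarrival jitter estimate using the RFC 3550 formula.

    send_ms: sender timestamps of consecutive packets, in milliseconds
    recv_ms: local arrival times of the same packets, in milliseconds
    """
    if len(send_ms) != len(recv_ms):
        raise ValueError("timestamp sequences must have the same length")
    jitter = 0.0
    for i in range(1, len(send_ms)):
        # D(i-1, i): how much the one-way transit time changed between packets
        d = (recv_ms[i] - send_ms[i]) - (recv_ms[i - 1] - send_ms[i - 1])
        # Smooth with gain 1/16, as RFC 3550 specifies, to damp noise
        jitter += (abs(d) - jitter) / 16
    return jitter


if __name__ == "__main__":
    sent = [0, 20, 40, 60, 80]        # one packet every 20 ms
    steady = [50, 70, 90, 110, 130]   # constant 50 ms transit time
    bursty = [50, 85, 92, 140, 145]   # transit time varies packet to packet
    print(interarrival_jitter(sent, steady))  # 0.0
    print(round(interarrival_jitter(sent, bursty), 2))
```

A constant transit time yields zero jitter, however long the delay itself is. Jitter captures variation, which is what forces receivers to buffer.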

Agora tackles this challenge by deploying Selective Forwarding Units (SFUs) worldwide. It uses machine learning to optimize routing and ensure stable, high-speed connections through its Software Defined Real Time Network™ (SD-RTN™).

Video Codecs

Selecting a video codec is a crucial decision in developing an RTC application, because it requires balancing video quality, compression efficiency, and compatibility.

The primary consideration in codec selection is the quality of the video output. High-quality video ensures a clear, crisp, and visually pleasing experience for users. This is particularly important in RTC applications, where the clarity of visual communication can impact the effectiveness of interactions, such as in teleconferencing, online education, or virtual meetings.

Compression efficiency refers to the codec’s ability to reduce the size of the video data without significantly compromising quality. Efficient compression is vital because it minimizes the bandwidth required to transmit video over the internet. This is essential for smooth, uninterrupted video streams, particularly in environments with limited bandwidth or varying network conditions.
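
A quick calculation shows why compression efficiency dominates the bandwidth budget. The sketch assumes 8-bit YUV 4:2:0 frames, which take 12 bits per pixel. It uses 2.5 Mbps only as an illustrative encoded bitrate for 1080p at 30 fps. Real encoder bitrates vary with codec, content, and settings.

```python
def raw_bitrate_bps(width: int, height: int, fps: float,
                    bits_per_pixel: float = 12.0) -> float:
    """Uncompressed video bitrate in bits per second.

    12 bits per pixel corresponds to 8-bit YUV 4:2:0 sampling: one full-resolution
    luma plane plus two chroma planes at quarter resolution.
    """
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bps / encoded_bps


if __name__ == "__main__":
    raw = raw_bitrate_bps(1920, 1080, 30)
    print(f"raw 1080p30: {raw / 1e6:.1f} Mbps")                     # raw 1080p30: 746.5 Mbps
    print(f"ratio at 2.5 Mbps: {compression_ratio(raw, 2.5e6):.0f}:1")  # ratio at 2.5 Mbps: 299:1
```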

Compatibility ensures that the chosen codec works seamlessly across different devices, operating systems, and platforms. An RTC application needs to be accessible to a wide range of users, each potentially using different hardware and software configurations.
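
In practice, compatibility is enforced through negotiation. In WebRTC, the two endpoints exchange SDP offers and answers listing the codecs each supports, and video can only flow in a codec both sides share. RFC 7742 makes VP8 and H.264 mandatory to implement in WebRTC browsers, so they serve as common fallbacks. The sketch below models only the matching step. The preference order is an illustrative assumption: more efficient codecs come first, and widely supported ones come last.

```python
from typing import Iterable, Optional, Sequence


def negotiate_codec(local_preference: Sequence[str],
                    remote_supported: Iterable[str]) -> Optional[str]:
    """Pick the first codec in our preference order that the remote side also supports.

    Simplified model of SDP offer/answer matching: real negotiation also compares
    payload types, clock rates, and format parameters (e.g. H.264 profiles).
    """
    remote = {codec.upper() for codec in remote_supported}
    for codec in local_preference:
        if codec.upper() in remote:
            return codec
    return None  # no codec in common: video cannot be exchanged


if __name__ == "__main__":
    ours = ["AV1", "VP9", "VP8", "H264"]             # hypothetical preference order
    print(negotiate_codec(ours, ["H264", "VP8"]))    # VP8
```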

The challenge lies in finding the optimal balance among these factors. Agora Coding Technology (ACT) dynamically adjusts codecs based on CPU load and network conditions, maintaining high-quality video streaming despite fluctuations. Our engineers build on publicly available codecs and patents, continually innovating to deliver the best possible quality of experience.
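
To illustrate the general idea of adapting to both CPU load and network conditions, here is a hypothetical sender-side policy. It picks the highest rung of a quality ladder that fits the estimated bandwidth, then steps down one more rung if the CPU is overloaded. The ladder values, the 0.8 CPU threshold, and the function names are all assumptions. This is not a description of how ACT works.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderSettings:
    height: int        # output resolution in lines (e.g. 720 for 720p)
    fps: int           # frames per second
    bitrate_kbps: int  # target encoder bitrate


# Hypothetical quality ladder, highest quality first (illustrative values only).
LADDER = (
    EncoderSettings(1080, 30, 2500),
    EncoderSettings(720, 30, 1200),
    EncoderSettings(480, 30, 600),
    EncoderSettings(360, 15, 300),
)


def choose_settings(available_kbps: float, cpu_load: float,
                    cpu_limit: float = 0.8) -> EncoderSettings:
    """Pick the best rung whose bitrate fits the bandwidth estimate, then drop
    one more rung if CPU load (0.0 to 1.0) is above cpu_limit."""
    index = len(LADDER) - 1  # default: lowest rung when nothing fits
    for i, rung in enumerate(LADDER):
        if rung.bitrate_kbps <= available_kbps:
            index = i
            break
    if cpu_load > cpu_limit:
        # Lower resolution/frame rate reduces encoding work on a busy CPU.
        index = min(index + 1, len(LADDER) - 1)
    return LADDER[index]


if __name__ == "__main__":
    print(choose_settings(3000, 0.40))  # ample bandwidth, idle CPU
    print(choose_settings(3000, 0.95))  # ample bandwidth, overloaded CPU
    print(choose_settings(700, 0.40))   # constrained network
```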

Device Support

The wide range of devices and operating systems, from Intel and ARM architectures to many desktop and mobile operating systems, means an RTC implementation needs comprehensive platform support. This diversity presents unique challenges and opportunities that developers must address to ensure a consistent, high-quality user experience:

  • Device Architecture Diversity: Optimize the application for both Intel (x86/x64) and ARM architectures to ensure efficient performance on desktops, laptops, and mobile devices. 
  • Operating System Variability: Adapt the application to work smoothly on different operating systems, such as Windows, macOS, Linux, iOS, and Android, considering their unique APIs and libraries. 
  • Cross-Platform Compatibility: Utilize cross-platform development tools and thoroughly test the application to ensure uniform functionality across all platforms. 
  • User Experience Consistency: Design an adaptable user interface that offers an intuitive experience across various screen sizes and input methods. 
  • Performance Optimization: Implement techniques like adaptive bitrate streaming and hardware acceleration to manage resource-intensive tasks efficiently on diverse hardware. 
  • Security Considerations: Ensure robust security measures across all platforms, including end-to-end encryption and secure authentication. 
  • Interoperability: Adhere to industry standards and protocols to enable seamless communication between users on different platforms. 

We test thousands of device and OS combinations in our lab to ensure compatibility across platforms, frameworks, and peripherals. Browsers add another moving part. New browser versions often introduce changes that affect WebRTC, so Agora works closely with browser vendors to beta test releases and apply WebRTC fixes.

Server Infrastructure

RTC applications rely on a complex infrastructure of servers, each with its own cost and resource challenges. These include:

Application Servers: These servers handle the core functionality of the RTC application, such as managing user sessions, maintaining state, and handling business logic. They require robust computing resources to ensure smooth and efficient operation, especially under heavy load, leading to increased costs for powerful hardware and scalable cloud solutions.

Signaling Servers: Responsible for setting up and managing communication sessions, signaling servers handle tasks like user discovery, session initiation, and call setup. They must be highly available and respond quickly to real-time signaling messages, which can be resource-intensive and costly to maintain at scale.

NAT Traversal Servers: Network Address Translation (NAT) traversal servers make connections possible across different networks. STUN servers let clients discover their public addresses so they can attempt direct peer-to-peer connections. TURN servers relay media when firewalls or NAT devices block a direct path. TURN servers are especially costly to maintain, because every relayed stream consumes server bandwidth and must remain low-latency.

Media Servers: These servers process and relay media streams (audio, video, and data) between users. They handle tasks like encoding, decoding, mixing, and transcoding media streams, which require significant computational power and bandwidth. The cost of maintaining media servers can be substantial, given the need for high-performance hardware and large-scale data transfer capabilities.
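
Running your own NAT traversal servers also means controlling who can use your TURN relays, since every relayed stream costs bandwidth. One common approach is for the application server to mint short-lived credentials from a secret it shares with the TURN server. The open-source coturn server supports this through its `use-auth-secret` option, based on the IETF draft "A REST API for Access to TURN Services". The sketch below assumes that scheme. The secret, user ID, and default one-hour lifetime are placeholders.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional, Tuple


def turn_rest_credentials(shared_secret: str, user_id: str,
                          ttl_seconds: int = 3600,
                          now: Optional[int] = None) -> Tuple[str, str]:
    """Return (username, password) for time-limited TURN access.

    username: "<expiry as Unix time>:<user id>"
    password: base64(HMAC-SHA1(shared_secret, username))
    The TURN server recomputes the HMAC with the same secret and rejects
    the request once the expiry time has passed.
    """
    issued_at = int(time.time()) if now is None else now
    username = f"{issued_at + ttl_seconds}:{user_id}"
    digest = hmac.new(shared_secret.encode(), username.encode(), hashlib.sha1).digest()
    return username, base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Placeholder secret; a real one is shared only between app and TURN servers.
    user, password = turn_rest_credentials("example-shared-secret", "alice")
    print(user, password)
```

A browser client would then pass these values as the `username` and `credential` fields of an entry in the `iceServers` list of its `RTCPeerConnection` configuration.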

Collectively, these servers must operate with high availability, low latency, and robustness to provide a seamless RTC experience. The associated infrastructure, bandwidth, and operational costs can be significant, posing substantial financial and resource challenges for businesses developing and maintaining RTC applications.
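
To see how bandwidth costs grow, consider the media servers alone. The sketch below estimates the outbound data an SFU-style media server sends for a single room. It assumes every participant publishes one video stream and subscribes to all the others. It ignores audio, packet overhead, and simulcast, so treat the result as an order-of-magnitude estimate. The 1 Mbps stream bitrate is an illustrative assumption.

```python
def sfu_egress_gb_per_hour(participants: int, stream_mbps: float) -> float:
    """Outbound data (decimal GB) an SFU sends in one hour when each participant
    publishes one stream and receives everyone else's.

    Ignores audio, packet overhead, simulcast, and participants joining or leaving.
    """
    forwarded_streams = participants * (participants - 1)
    bits_per_hour = forwarded_streams * stream_mbps * 1_000_000 * 3600
    return bits_per_hour / 8 / 1_000_000_000


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(f"{n:2d} participants at 1 Mbps: {sfu_egress_gb_per_hour(n, 1.0):6.1f} GB/hour")
```

Under these assumptions, a 10-person room sends about 40.5 GB per hour. Because the number of forwarded streams grows as n(n - 1), doubling the room size roughly quadruples the egress.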

Agora’s managed network of servers is strategically deployed worldwide in 200+ countries and regions, minimizing distance and latency and delivering superior quality experiences.

Distance and Latency

Routing decisions made by carriers often make the network distance a packet travels much longer than the physical distance between endpoints, which significantly increases latency. Peering agreements, traffic management policies, network topology, geographic constraints, redundancy measures, and economic considerations can all send data along less direct routes. Carriers favor business and operational efficiency over the shortest or fastest path. For real-time communication applications, the result is higher, less predictable latency and lower overall network efficiency.
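
The gap between physical and network distance translates directly into delay. Here is a rough sketch, assuming light in optical fiber covers about 200 km per millisecond. It ignores queuing, processing, and jitter-buffer delay. The two path lengths are made-up examples.

```python
# Rule of thumb: light in optical fiber covers about 200 km per millisecond
# (roughly two-thirds of its speed in a vacuum).
FIBER_KM_PER_MS = 200.0


def propagation_delay_ms(path_km: float) -> float:
    """One-way propagation delay over fiber; excludes queuing, processing,
    and jitter-buffer delay, which add to it in practice."""
    return path_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    direct_km, routed_km = 4000, 7000  # hypothetical straight-line vs. actual path length
    print(propagation_delay_ms(direct_km))  # 20.0
    print(propagation_delay_ms(routed_km))  # 35.0
```

In this made-up case, the detour adds 15 ms in each direction before any queuing occurs. That is a noticeable share of the roughly 150 ms one-way budget that ITU-T Recommendation G.114 suggests for conversational voice.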

Agora’s server-assisted routing steers traffic along optimized network paths to minimize latency and improve the quality of real-time communication.
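
Server-assisted routing can be framed as a shortest-path problem. Suppose relay servers continuously measure the latency between each other. Media can then follow the path with the lowest total delay instead of whatever route the ISPs choose by default. The sketch below applies Dijkstra's algorithm to a hypothetical latency map. It illustrates the idea only and is not Agora's routing algorithm, which the article describes as machine-learning-based. Node names and latencies are invented.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# node -> {neighbor: measured one-way latency in ms}
LatencyMap = Dict[str, Dict[str, float]]


def lowest_latency_path(graph: LatencyMap, src: str,
                        dst: str) -> Optional[Tuple[float, List[str]]]:
    """Dijkstra's algorithm: return (total latency in ms, path) or None if unreachable."""
    best = {src: 0.0}
    previous: Dict[str, str] = {}
    queue = [(0.0, src)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == dst:
            # Walk predecessor links back to the source.
            path = [dst]
            while path[-1] != src:
                path.append(previous[path[-1]])
            return dist, path[::-1]
        if dist > best.get(node, float("inf")):
            continue  # stale entry; a shorter route was already found
        for neighbor, latency in graph.get(node, {}).items():
            candidate = dist + latency
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    return None


if __name__ == "__main__":
    # Hypothetical measurements: the default ISP route is slower than a relay path.
    latencies = {
        "client-a": {"client-b": 210.0, "relay-1": 8.0},
        "relay-1": {"relay-2": 95.0},
        "relay-2": {"client-b": 6.0},
    }
    print(lowest_latency_path(latencies, "client-a", "client-b"))
    # (109.0, ['client-a', 'relay-1', 'relay-2', 'client-b'])
```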

Maintenance and Support

Keeping pace with changing technologies, user requirements, and behaviors is complex, time-intensive, and costly.

This involves:

  • Adapting to new protocols, standards, and hardware innovations. 
  • Continuously adding new features, optimizing performance, and meeting higher user expectations. 
  • Scaling to handle increased usage and diverse use cases. 
  • Testing and quality assurance for regular software updates and hardware compatibility. 
  • Significant investments in development, maintenance, infrastructure, training, and support. 
  • Implementing new security measures and ensuring regulatory compliance. 

Overall, maintaining an up-to-date real-time communication platform requires continuous development and investment to provide a reliable, high-quality user experience.

Agora’s continuous monitoring and optimization strategies include:

  • Constant Monitoring – Tracking server performance metrics, simulating user activity, and passively monitoring user experiences across different devices to ensure seamless operation. 
  • Ongoing Optimization – Addressing system bottlenecks, managing technical debt, and retaining skilled engineers to refine and enhance the platform over time. 
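
Passive monitoring often comes down to aggregating per-session quality statistics and alerting on tail values rather than averages. A small share of badly affected calls can disappear in a mean. Here is a minimal sketch, assuming per-session round-trip time and packet-loss samples have already been collected. The 300 ms and 5% thresholds are hypothetical examples, not recommendations from the article.

```python
import math
from typing import List, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of samples, for pct in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def quality_alerts(rtt_ms: Sequence[float], loss_pct: Sequence[float],
                   rtt_limit_ms: float = 300.0,
                   loss_limit_pct: float = 5.0) -> List[str]:
    """Compare p95 round-trip time and p95 packet loss across sessions
    against (hypothetical) alert thresholds."""
    alerts = []
    rtt_p95 = percentile(rtt_ms, 95)
    loss_p95 = percentile(loss_pct, 95)
    if rtt_p95 > rtt_limit_ms:
        alerts.append(f"p95 RTT {rtt_p95:.0f} ms exceeds {rtt_limit_ms:.0f} ms")
    if loss_p95 > loss_limit_pct:
        alerts.append(f"p95 packet loss {loss_p95:.1f}% exceeds {loss_limit_pct:.1f}%")
    return alerts


if __name__ == "__main__":
    rtt = [80.0] * 18 + [450.0, 520.0]   # two sessions with very high RTT
    loss = [0.5] * 20
    print(sum(rtt) / len(rtt))            # the mean stays well under 300 ms
    print(quality_alerts(rtt, loss))      # but the p95 check flags the problem
```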

Conclusion: The Strategic Decision for Real-Time Technology

Choosing between an in-house WebRTC solution and a specialized provider like Agora is more than a technical decision; it is a strategic one. By leveraging Agora’s expertise, businesses can focus on their core objectives and innovation rather than on the complexities of real-time communication technology. Agora provides developers with easy-to-use, scalable solutions that let teams quickly add real-time voice and video to their apps.