
Creating Composite AR and Video Experiences with ARVideoKit and Agora


This blog was written by Ahmed Fathi, the CTO of Magic Studio, Inc. Magic Studio is the creator of HMU, the “one stop shop” for watching live gaming streams. Do you want to write for Agora and build your brand? You can reach out to community@agora.io for more information.


Over the past decade, Augmented Reality (AR) has evolved rapidly and become a mainstream technology thanks to the wide range of applications it has enabled. AR is now integrated into many industries, including film, medical research, and entertainment. There are two key principles required to achieve AR:

  1. Real-world object mapping
  2. Displaying digital graphics on real-world objects

Object mapping integrates computer vision techniques such as Simultaneous Localization and Mapping (SLAM) and visual-inertial state estimation (VINS) to scan 2D video frames and reconstruct them into 3D space. Once mapping succeeds and a 3D map begins to form, we use the map's coordinates to attach and display digital graphics on it. A constructed 3D map can represent any kind of real-world object, such as horizontal surfaces, vertical surfaces, human faces, or animal faces.

Source: SLAM 3D Mapping Example
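On iOS, ARKit handles the SLAM and visual-inertial tracking for you, so both principles reduce to a few API calls. Below is a minimal sketch, using standard ARKit/SceneKit APIs, of detecting a horizontal surface (the 3D map) and attaching a digital object to it; the box geometry is just an illustrative placeholder.

```swift
import ARKit
import SceneKit

// A minimal sketch of the two AR principles: map real-world surfaces,
// then display digital graphics on the mapped coordinates.
class ARViewController: UIViewController, ARSCNViewDelegate {
    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.delegate = self

        // Ask ARKit to build a 3D map of horizontal surfaces (SLAM + VIO under the hood).
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal]
        sceneView.session.run(configuration)
    }

    // Called when a new region of the real world has been mapped as a plane anchor.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard anchor is ARPlaneAnchor else { return }

        // Display a digital graphic (a simple box) on the mapped surface.
        let box = SCNBox(width: 0.1, height: 0.1, length: 0.1, chamferRadius: 0.01)
        let boxNode = SCNNode(geometry: box)
        boxNode.position = SCNVector3(0, 0.05, 0)
        node.addChildNode(boxNode)
    }
}
```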

Once the digital graphics are displayed on the 3D map, the camera position may change to view the digital object from different angles and move closer to or further from it. What makes AR/VR so unique, and so immersive, is that the user controls part of their virtual experience. The creator of the experience only has so much control; the rest is open to the user to explore, which adds a new depth of interaction to a virtual experience.

Like everything else in life, people are social, and end users will always want to share what they are seeing. For this reason, the majority of AR applications today require some form of recording or streaming of the AR experience. Because the computation under the hood of an AR experience is complex, streaming or recording the full AR experience is very inefficient. The best and most efficient solution for AR streaming or recording is therefore to composite the camera's video frame and the displayed digital objects into a single video frame.

In 2017, composite AR video rendering was introduced to iOS and iPadOS by ARVideoKit. Since then, ARVideoKit has had a great impact on the mobile AR community and has been integrated into software across the film, medical research, social, and entertainment industries. As the only framework that enables developers to stream and record AR video on iOS and iPadOS, ARVideoKit has been adopted by thousands of developers and hundreds of organizations.

Why use ARVideoKit?

As you develop an Augmented Reality experience in your application, you will likely reach a point where you need to either:

  1. Record the AR experience
  2. Stream the AR experience (e.g., video chat, live streaming)

To achieve efficient AR recording and streaming, the AR content and the video frame must be composited; ARVideoKit introduced this technique in 2017. ARVideoKit handles all the complex computation and processing of compositing the AR content with the video frame under the hood, and it offers many features (a basic recording setup is sketched after this list), including:

  • GIF Capturing
  • Live Photo Capturing
  • Pause / Resume Video Recording
  • Access to both original raw video buffers and composited video buffers
  • Recording / Streaming with background audio player on
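The sketch below shows a basic recording setup with ARVideoKit. The type and method names (`RecordAR`, `prepare`, `record`, `stopAndExport`) reflect the framework's public API as I recall it; verify the exact signatures against the ARVideoKit version you are using.

```swift
import ARKit
import ARVideoKit

// A minimal sketch of recording a composited AR video with ARVideoKit.
// Method names are assumptions based on the framework's public API;
// check the current ARVideoKit documentation for exact signatures.
class RecordingViewController: UIViewController {
    @IBOutlet var sceneView: ARSCNView!
    var recorder: RecordAR?

    override func viewDidLoad() {
        super.viewDidLoad()
        // Attach the recorder to the AR scene view that renders the AR content.
        recorder = RecordAR(ARSceneKit: sceneView)
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
        // Prepare the recorder with the same configuration the AR session uses.
        recorder?.prepare(configuration)
    }

    @IBAction func startRecording(_ sender: UIButton) {
        recorder?.record()          // begin compositing and writing frames
    }

    @IBAction func stopRecording(_ sender: UIButton) {
        recorder?.stopAndExport()   // finish the video file and export it
    }
}
```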

ARVideoKit Under the Hood

ARVideoKit’s core functionality of compositing video frames consists of two core components:

  1. Scene Renderer
  2. Display Link Timer

The scene renderer displays the AR scene in a single frame at a rate of 30 to 60 frames per second. To ensure minimal lag when rendering the AR scene, the renderer's operations are processed on the device's Graphics Processing Unit (GPU), hardware designed to efficiently compute complex graphics operations.

The display link timer synchronizes the compositing of video frames from the scene renderer with the refresh rate of the display. Like the scene renderer, the display link timer runs on the GPU.

ARVideoKit High-Level Architecture
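Conceptually, the pipeline looks something like the sketch below: a display link fires at the display's refresh rate, and an offscreen SceneKit renderer produces a composited snapshot of the scene on each tick. This is an illustration of the idea using standard CADisplayLink and SCNRenderer APIs, not ARVideoKit's actual implementation.

```swift
import SceneKit
import Metal
import QuartzCore
import UIKit

// A conceptual illustration (not ARVideoKit's internal code) of a
// display-link timer driving an offscreen scene renderer to produce
// composited frames at the display's refresh rate.
final class CompositeFrameSource {
    private let renderer: SCNRenderer
    private var displayLink: CADisplayLink?
    private let frameSize = CGSize(width: 1280, height: 720)

    // Callback that receives each composited frame and its timestamp.
    var onFrame: ((UIImage, TimeInterval) -> Void)?

    init(scene: SCNScene, device: MTLDevice?) {
        // SCNRenderer renders the scene offscreen, on the GPU.
        renderer = SCNRenderer(device: device, options: nil)
        renderer.scene = scene
    }

    func start() {
        let link = CADisplayLink(target: self, selector: #selector(tick))
        link.add(to: .main, forMode: .common)
        displayLink = link
    }

    func stop() {
        displayLink?.invalidate()
        displayLink = nil
    }

    @objc private func tick(_ link: CADisplayLink) {
        // Snapshot the scene at the current timestamp into a single frame.
        let frame = renderer.snapshot(atTime: link.timestamp,
                                      with: frameSize,
                                      antialiasingMode: .multisampling4X)
        onFrame?(frame, link.timestamp)
    }
}
```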

Once the composite video frames are generated, we can use them for various operations, such as writing the frames to a video file, streaming them to another device over a local or online network, applying additional image processing techniques, and more.

When it comes to writing the composited frames into a video file, ARVideoKit offers easy-to-use methods that write the frames along with an audio source. To stream or apply additional processing to the composited frames, the framework returns both the original frames and the composited frames for streaming and custom processing.
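For example, ARVideoKit exposes a render delegate that hands you each frame's pixel buffers. The sketch below assumes a `RenderARDelegate` protocol with a `frame(didRender:with:using:)` callback, based on my recollection of the framework's public API; verify the exact signature against the version you use.

```swift
import ARVideoKit
import CoreMedia
import CoreVideo

// Receives per-frame buffers from ARVideoKit for custom processing or streaming.
// The delegate name and callback signature are assumptions based on the
// framework's public API; verify against your ARVideoKit version.
class FrameConsumer: RenderARDelegate {
    // `buffer` is the composited frame (camera + AR content);
    // `rawBuffer` is the original, unmodified camera frame.
    func frame(didRender buffer: CVPixelBuffer, with time: CMTime, using rawBuffer: CVPixelBuffer) {
        // Hand the composited buffer to a video writer, a streaming SDK,
        // or an image-processing pipeline here.
        process(composited: buffer, at: time)
    }

    private func process(composited buffer: CVPixelBuffer, at time: CMTime) {
        // Placeholder for custom handling of the composited frame.
    }
}
```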

Implementing ARVideoKit with Live Streaming

To take the ARVideoKit concept beyond recorded video, the Agora team has developed AgoraARKit, a framework that integrates ARVideoKit's technologies with Agora's RTE (live broadcasting and communications) technologies. Using AgoraARKit, you can stream video with AR content efficiently, allowing users to share their AR experiences in real time. This opens up new use cases, such as AR classrooms in education or remote support in business. Just like AR itself, the possibilities are endless.
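Under the hood, the composited buffers can be pushed to Agora as an external video source. The sketch below assumes the Agora iOS SDK 3.x external-video-source API (`setExternalVideoSource` and `pushExternalVideoFrame`); the exact calls may differ in other SDK versions, and the app ID and channel name are placeholders.

```swift
import AgoraRtcKit
import CoreMedia
import CoreVideo

// A minimal sketch of pushing composited AR frames into an Agora channel.
// Assumes the Agora iOS SDK 3.x external-video-source API; "YourAppId"
// and "demo-channel" are placeholders.
final class ARStreamer: NSObject, AgoraRtcEngineDelegate {
    private var agoraKit: AgoraRtcEngineKit!

    func setup() {
        agoraKit = AgoraRtcEngineKit.sharedEngine(withAppId: "YourAppId", delegate: self)
        agoraKit.enableVideo()
        // Tell Agora we will supply our own (composited) frames instead of the camera feed.
        agoraKit.setExternalVideoSource(true, useTexture: true, pushMode: true)
        agoraKit.joinChannel(byToken: nil, channelId: "demo-channel", info: nil, uid: 0, joinSuccess: nil)
    }

    // Call this with each composited frame produced by the renderer.
    func push(compositedBuffer: CVPixelBuffer, at time: CMTime) {
        let frame = AgoraVideoFrame()
        frame.format = 12               // CVPixelBuffer format on iOS
        frame.textureBuf = compositedBuffer
        frame.time = time
        agoraKit.pushExternalVideoFrame(frame)
    }
}
```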

For more information on how to implement AgoraARKit into your application, check out this article.

Augmented reality applications are growing in number and quickly becoming essential in many existing apps. As a software engineer, it's important to stay up to date with the latest tech. ARKit, ARVideoKit, and AgoraARKit enable you to integrate powerful AR functionality quickly and efficiently!