How to Build a VR Video Chat App with Spatial Audio on Oculus

Recent advances in virtual reality technology make it possible to meet and talk with people in an immersive space through VR headsets. Real-time engagement technology enables video chat and voice chat in the VR world. Many VR headset makers support spatial audio for environmental sounds. However, the audio from a 3D source (e.g., your peer in a video chat) does not always get the same treatment.

In this tutorial, we are going to walk through building a VR chat application that enables spatial sound from the speakers. It is important to understand that the members of the chat group can come from other platforms, like web or Unity Desktop apps, in addition to a VR headset. For this project, we will demonstrate our implementation on an Oculus Quest headset. The same underlying technology should apply to other compatible headsets.

Project Overview

This project consists of four parts. The first part will walk through how to set up the project, integrating the Oculus packages along with the Agora SDK and sample API scripts.

The second part will walk through creating the scene, including setting up the XR Rig with controllers, creating the UI, and implementing the Agora API hooks.

The third part will show how to use another sample Unity mobile app to test the video streaming between VR and non-VR users.

The last part will be a discussion of the technology used for the audio frame and audio source handling. It is for programmers who want to better understand the implementation. Reading this section is not required for getting this tutorial to run.

The link to the complete project or package can be found in the last section of this blog.

Prerequisites

Unity Editor (version 2019 LTS used here)
An Oculus headset and familiarity with the Oculus Get Started guide
An understanding of the Unity Editor and Game Objects
Basic understanding of Unity’s XR Framework
An Agora developer account (see How to Get Started with Agora)
Agora Video SDK for Unity

Part 1: Project Setup

Import the Oculus Integration Package

The Get Started page on the Oculus website provides detailed steps on setting up a Unity project for VR. We will go over how the current project was set up using Unity 2019. For versions 2020 and 2021, minor package import steps may vary.

To begin, open a new project targeting the Android platform.

Next, import the Oculus Integration package from the Asset Store. This will take a bit of time:

After the import, Unity may ask if you want to update to a newer version of the plug-in. This may seem a little strange because the latest has been downloaded. You may choose either Yes or No and continue. Unity will restart and finish the import and compilation:

Now, modify the Build Settings as follows:

Change Texture Compression to ASTC.
In Player Settings, change the color space to Linear, deselect Auto Graphics API, and use only OpenGLES 3 for the Graphics APIs.
Provide the company name and product name information in Player Settings.
More recommended Android settings: Set the Scripting back end to IL2CPP, the target architecture to ARM64, and the minimum API Level to Android 7.0.
In Project Settings > XR Plug-in Management, enable Oculus. Another time-consuming package import and compilation process will occur. (You can take a coffee break.)

Import the Agora Video SDK for Unity

Get the Agora SDK from the Asset Store and import everything:

If this is the first time you’ve used Agora, you should try out the demo app from the Assets folder on the Unity Editor before moving on to the VR project. Check the accompanying README file from the SDK for helpful information about the demo and the SDK. You should be able to run the SDK demo in no time.

You will need an App ID for running Agora applications. Head to your Agora developer console, create a test project, and get the App ID:

For simplicity in this sample project, we will omit the step for token generation. But for better security, you should use an App ID with tokens enabled in your production application.

Copy the Agora API Sample Code

Agora provides a collection of small sample projects that show API examples for common use cases. From the repo, copy the PlaybackAudioFrame folder and its dependent scripts from tools:

Logger.cs
PermissionHelper.cs
RingBuffer.cs

For your convenience, the files are put into this package archive so you don’t have to handpick them from the API-Example. Your project folder hierarchy should now look like the following screenshot:

Part 2: Create the Scene

Clone the Sample Scene

We will create a new scene by reusing a sample scene from one of the Oculus Integration packages. Find the RedBallGreenBall scene from the Project navigator and clone this file. Rename the file AgoraSpatialTest and move it to the Assets/Scenes folder.

AgoraSpatialTest Scene

In the scene, a FirstPersonController prefab was used, as shown above. Remove this object, find the OVRPlayerController prefab in the project, and drag and drop it into the AgoraSpatialTest scene. The FirstPersonController only allows you to see the scene’s objects fixed to the view positions, i.e., the balls move when you turn your head. In contrast, the OVRPlayerController allows you to see the true VR space, and the objects stay at their initial positions while your head is turning.

CheckPoint: Now you can build and run the project and experience the spatial audio that is produced by the audio source from the red ball and the green ball. As you turn your head with the Oculus headset, you should hear the sound come from the direction of the individual balls.

Create the UIs

Ball positions: Modify the Z position of the red ball to -2.1 and the Z position of the green ball to 3.
Log text: Create a Text UI, name it LogText, with position = (0,0,0), width = 600, height = 400, and center the text horizontally and vertically.
Canvas: Change the Canvas position to (0,20,27.7), width = 600, height = 400, scale = (0.1, 0.1, 0.1).
Event Camera: In the Canvas, change the Canvas Render Mode to World Space. Drag the CenterEyeAnchor from OVRPlayerController’s children into the Event Camera field:

Remote user prefab: Create a primitive 3D object Capsule, flip it upside down by setting the rotation to 180 degrees on the Z-axis, and attach an AudioSource component to it. Take a look at the SpatializedSound1 object (the green ball), and copy the values from its AudioSource into the Capsule. Clear the Audio Clip value from the AudioSource component on Capsule. Attach an ONSPAudioSource component to the Capsule as well. (The complete path to this script is Assets/Oculus/Spatializer/scripts/ONSPAudioSource.cs.) Drop this object to the Assets/Resources folder so it becomes a prefab. Delete the Capsule object from the scene.

Control point: Create a GameObject, name it AgoraRoot, and position it at (0.58, -1.17, -4.06), which is near the red ball.
Sample scripts association: Drag the UserAudioFrame2SourceSample script to the AgoraRoot object. Enter your Agora App ID and your channel name. Here, unity3d is the channel name that I will use for testing.
User Prefab: Drag the Capsule prefab from the Resources folder into the User Prefab field of the UserAudioFrame2SourceSample component on AgoraRoot.

This screenshot summarizes the actions:

Awesome! Don’t forget to save the scene – this is always important! We are now ready to do a test.

Part 3: Test VR Spatial Sound

The Oculus headset can now connect to other users via the integrated Agora Video SDK. Since the headset doesn’t produce a video stream in the example, we would like to get the video stream from a non-VR headset source. There are several ways to provide that:

Choice 1: Use the SDK demo app. From the sample project, we can open the demo app from the AgoraEngine folder and run it either from the Editor or from a build on a device. Place the device next to a TV or radio for some random audio input.

Choice 2: Use the Agora Web demo app. This one can be a good setup with a colleague who can help with the test from a remote location.

Choice 3: Use this demo app for a looping sound sample. The code is from the same repo where we downloaded the scripts earlier. I like this choice because it works well for a solo tester, and it has constant sound samples to validate in the test as expected.

We can’t reproduce the spatial audio experience of a VR headset in this tutorial. However, we can share what is being presented in the view of the Oculus Quest. In the test, the helper app is running on a mobile phone, which is placed in front of the VR tester. As a result, a Capsule was spawned near the red ball. The sound of the sample music comes from the back-right position when the test starts. The direction of the sound changes as the tester turns their head to look at the Capsule.

Consideration: The original sound from the green ball, which is vocal1 in the Oculus sample resource, is a bit too loud compared to the testing music sample. Try turning its volume down in the AudioSource component to 0.5 or less.

If there are enough resources, more non-VR remote users can be added to the test. Each of the users will be represented by a Capsule instance in the VR scene. The VR tester can walk around, get closer to any other user in the virtual space, and listen to the amplified sound of that remote audio stream!

That’s so easy for a video chat experience with spatial audio in a VR environment, isn’t it? And just by reusing the sample projects, we haven’t written a single line of code.

This completes the tutorial on how to do video chat with spatial audio on Oculus. But if you want to learn more about the technology that makes this possible, read on.

Part 4: Programmer’s Digest

To understand how this project works, let’s look at its building blocks: the key APIs, data structures, and algorithms. Since quickstart tutorials on Oculus are already available, we will focus on how we use the Agora SDK in this project.

You can find the following scripts in the user-audioframe-audiochannel folder:

UserAudioFrame2SourceSample
IUserAudioFrameDelegate
UserAudioFrameHandler

Here, UserAudioFrame2SourceSample is a controller that sets up an Agora RTC engine, registers event callbacks, and manages users as objects. The setup of the engine is pretty straightforward. If you are new to Agora, see this guide for a quickstart tutorial.

Mixed or Not Mixed

This is the most important API call, which separates the individual audio streams:

_audioRawDataManager.SetOnPlaybackAudioFrameBeforeMixingCallback(OnPlaybackAudioFrameBeforeMixingHandler);

The descriptive method name suggests the following implications:

There is a mixing process in the audio stream that we normally hear from application audio output.
This API callback sends the audio stream from a remote user before the mixing. It is exactly what we needed for setting up the audio source in a spatial audio setting.
You can play the separate audio stream before the mixing, but the audio is also mixed and played in the normal process.

For the third implication, it is important to tell the engine to turn off the normal mixed audio output so that you can take control of what is being played. Here is the second important API call:

mRtcEngine.SetParameter("che.audio.external_render", true);

MainThread Dispatcher

In Unity, a dynamic component attachment or UI update must be done on the main thread. Playing an audio clip on an AudioSource object is considered a UI update. But the OnPlaybackAudioFrameBeforeMixingHandler callback runs on a background thread. So how do we deal with that? The answer is a MainThread Dispatcher. The BlockingCollection data structure gives us a thread-safe queue that holds the actions until the Unity main thread runs them from the Update() function. To send actions to the queue, call the dispatch method like this:

dispatch(()=> { <code statement 1> ; <code statement 2>; <etc.> });

The dispatch method takes a C# System.Action as a parameter, which is also an object, and adds it to the BlockingCollection. Very simple.
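As a rough illustration of the pattern described above, here is a minimal sketch of such a dispatcher in plain C#. The class name MainThreadDispatcher and the method names Dispatch and Pump are hypothetical and chosen for this example; the script shipped with the sample may be organized differently. The sketch assumes that Pump() is called from a MonoBehaviour's Update() so that the queued actions run on the Unity main thread.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper for illustration; not the sample project's actual class.
public sealed class MainThreadDispatcher
{
    // BlockingCollection is thread-safe, so background threads can add to it freely.
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();

    // Safe to call from any thread, e.g. from an audio frame callback.
    public void Dispatch(Action action)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        _queue.Add(action);
    }

    // Call from the main thread (in Unity, from Update()).
    // TryTake does not block, so an empty queue never stalls the frame.
    // Returns the number of actions that were executed.
    public int Pump()
    {
        int executed = 0;
        while (_queue.TryTake(out Action action))
        {
            action();
            executed++;
        }
        return executed;
    }
}
```

With this shape, a background callback would call Dispatch(() => { ... }) and the queued work would run on the next Pump(). Draining with the non-blocking TryTake rather than Take is a deliberate choice in this sketch, because Take would freeze the main thread whenever the queue is empty.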

UserAudioFrameHandler

This handler class runs individually on the Capsule object that is spawned when a remote user joins. The handler converts the audio frames that came from the Agora engine into audio clip data and plays them on the AudioSource component. Of course, we also need the Oculus SDK’s script ONSPAudioSource to extend that capability to spatial sound.
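To make the conversion step more concrete, the following sketch turns raw PCM bytes into the float samples that Unity audio clips work with. It assumes that the frame buffer holds 16-bit signed little-endian PCM, which is an assumption of this example rather than something stated above; check the bytes-per-sample value reported with the actual frame. The PcmConverter class and its method name are hypothetical.

```csharp
using System;

// Hypothetical helper for illustration; assumes 16-bit little-endian signed PCM input.
public static class PcmConverter
{
    // Converts PCM16 bytes to floats in the range [-1, 1).
    public static float[] Pcm16ToFloat(byte[] pcm)
    {
        if (pcm == null) throw new ArgumentNullException(nameof(pcm));

        int sampleCount = pcm.Length / 2; // a trailing odd byte is ignored
        var samples = new float[sampleCount];
        for (int i = 0; i < sampleCount; i++)
        {
            // Combine low and high bytes, then reinterpret as a signed 16-bit value.
            short value = unchecked((short)(pcm[2 * i] | (pcm[2 * i + 1] << 8)));
            samples[i] = value / 32768f;
        }
        return samples;
    }
}
```

For interleaved stereo input, the output stays interleaved in the same order, so the channel count only matters when the samples are handed to the audio clip.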

The properties of the AudioSource component are filled in from the information passed in the first audio frame packet. One technical challenge here is achieving smooth playback on the AudioSource when the audio frames arrive at intervals. It is a classic producer-consumer concurrency model. The answer is a ring buffer data structure, thanks to Joe Osborn, the GitHub contributor who implemented the code. In the sample code, we allocate about 10 seconds of audio data according to the audio frame rate and channels. Basically, the user audio frame callback function (discussed in the previous section) acts as the producer, and the AudioSource’s OnAudioRead function acts as the consumer. If there is enough data, AudioSource will consume the buffered audio from this data structure.
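The producer-consumer idea can be sketched with a small ring buffer in plain C#. This is not the RingBuffer.cs from the sample, which is more complete; the FloatRingBuffer class, its members, and the policy choices here (overwrite the oldest samples when full, pad with silence when empty, a simple lock for thread safety) are assumptions made for this illustration. CapacityFor shows one plausible way to size the buffer for a number of seconds of audio.

```csharp
using System;

// Hypothetical minimal ring buffer for illustration only.
public sealed class FloatRingBuffer
{
    private readonly float[] _data;
    private readonly object _gate = new object();
    private int _head;  // index of the oldest stored sample
    private int _count; // number of samples currently stored

    public FloatRingBuffer(int capacity)
    {
        if (capacity <= 0) throw new ArgumentOutOfRangeException(nameof(capacity));
        _data = new float[capacity];
    }

    // Samples needed for the given duration, e.g. 48000 Hz * 2 channels * 10 s.
    public static int CapacityFor(int sampleRate, int channels, int seconds)
    {
        return sampleRate * channels * seconds;
    }

    public int Count { get { lock (_gate) { return _count; } } }

    // Producer side: appends samples, overwriting the oldest ones when full.
    public void Write(float[] samples)
    {
        lock (_gate)
        {
            foreach (float s in samples)
            {
                int tail = (_head + _count) % _data.Length;
                _data[tail] = s;
                if (_count == _data.Length) _head = (_head + 1) % _data.Length; // drop oldest
                else _count++;
            }
        }
    }

    // Consumer side: fills 'destination' and pads with silence on underrun.
    // Returns how many real samples were copied.
    public int Read(float[] destination)
    {
        lock (_gate)
        {
            int n = Math.Min(destination.Length, _count);
            for (int i = 0; i < n; i++)
            {
                destination[i] = _data[_head];
                _head = (_head + 1) % _data.Length;
            }
            _count -= n;
            Array.Clear(destination, n, destination.Length - n);
            return n;
        }
    }
}
```

In this sketch, the audio frame callback would call Write with converted samples, and the audio clip's read callback would call Read with the array Unity asks it to fill. Padding with zeros on underrun keeps playback silent rather than repeating stale data while the network catches up.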

The following diagram illustrates this architecture:

Considerations

Performance: Programmers should optimize the implementation to their application for the memory usage on the ring buffer and CPU time on the audio frame conversion.
Editor testing: You should run the code in the Unity Editor before launching it on the headset. However, there is a bug in the current Oculus SDK (version 29.0 at the time of writing) that may stop the application. The error looks something like this:

NullReferenceException: Object reference not set to an instance of an object
OculusSampleFrameworkUtil.HandlePlayModeState (UnityEditor.PlayModeStateChange state) (at Assets/Oculus/SampleFramework/Editor/OculusSampleFrameworkUtil.cs:45)
UnityEditor.EditorApplication.Internal_PlayModeStateChanged (UnityEditor.PlayModeStateChange state) (at /Users/bokken/buildslave/unity/build/Editor/Mono/EditorApplication.cs:415)

In this case, you can update the OculusSampleFrameworkUtil.cs script with the following code snippet:

private static void HandlePlayModeState(PlayModeStateChange state)
{
    if (state == PlayModeStateChange.EnteredPlayMode)
    {
        // In the Editor, report a placeholder version (0.0.0) instead of OVRPlugin.wrapperVersion.
        System.Version v0 = new System.Version(0, 0, 0);
        UnityEngine.Debug.Log("V0=" + v0.ToString());
#if UNITY_EDITOR
        OVRPlugin.SendEvent("load", v0.ToString(), "sample_framework");
#else
        OVRPlugin.SendEvent("load", OVRPlugin.wrapperVersion.ToString(), "sample_framework");
#endif
    }
}

All Done!

The tutorial should work best on the Oculus Quest. And since we didn’t write a line of code for this project, there won’t be a separate code sample to maintain in a GitHub repo. However, the scene and the scripts it depends on are archived in a custom package, so you can just import the package into an Oculus project to try it out. See this repo link for the package.

If you are developing for other VR headsets, the technology we described in part 4 should be universal, and you can leverage APIs on the other headset. Best of luck on your immersive journey!

Other Resources

For more information about the Agora Video SDK, see the Agora Unity SDK API Reference. For more information about the Oculus Integration Unity SDK, see the Oculus App Development page.

For technical support with Agora, I invite you to join the Agora Developer Slack community.
