Implementing Spatial Audio Chat in Unity Using Agora

By Joel Thomas, in Developer

Hello everyone. In this tutorial you implement Agora’s spatial audio functionality in a 3D Unity example. With two or more players in the scene, the audio modulates based on the distance between players and pans between the left and right speakers based on which side of the player the speaker is on while talking. In this project, you use Agora for the RTE spatial audio chat and PUN2 for the networking.

The basic functionality:

  • When a new player joins the scene, add them to a list of tracked players.
  • Adjust the gain based on your local player’s distance to remote player X.
  • Adjust the pan based on your local player’s orientation to remote player X.
  • When a player leaves the game, remove them from the list.

To get started, you need a valid Agora account. If you don’t have one, here is a guide to setting up an account.

This project builds on the agora-party-chat demo. If you don’t have experience with that demo, you can download it and check it out to see basic Agora functionality and PUN2 networking in action.

Networking Setup

The AgoraVideoChat.cs script is used as the main driver for the Agora engine. Before you do any major coding, you lay the groundwork for the spatial audio script and add some functions to AgoraVideoChat to call from SpatialAudio.

First, you update the video profile to fit the square aspect ratio for the images used in the left UI panel.

You must call EnableSoundPositionIndication after EnableVideo and before JoinChannel to properly prime the Agora engine for RTE spatial audio. Pay attention to the order in which the Agora initialization functions are called.
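The call order described above can be sketched as follows. This is a minimal sketch, not the demo's actual AgoraVideoChat code: it assumes the 3.x Agora Video SDK for Unity (namespace agora_gaming_rtc), and the class name, appID, and channel values are placeholders.

```csharp
using agora_gaming_rtc; // namespace of the 3.x Agora Video SDK for Unity (assumed SDK version)
using UnityEngine;

// Hypothetical stand-in for the initialization part of AgoraVideoChat.
public class AgoraInitOrderSketch : MonoBehaviour
{
    [SerializeField] private string appID = "your-app-id";     // placeholder
    [SerializeField] private string channel = "your-channel";  // placeholder

    private IRtcEngine mRtcEngine;

    private void Start()
    {
        mRtcEngine = IRtcEngine.GetEngine(appID);

        // 1. Enable video first.
        mRtcEngine.EnableVideo();
        mRtcEngine.EnableVideoObserver();

        // 2. Enable sound position indication after video and before joining.
        mRtcEngine.EnableSoundPositionIndication(true);

        // 3. Join the channel last; a uid of 0 lets Agora assign one.
        mRtcEngine.JoinChannel(channel, null, 0);
    }
}
```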

Next, add a uint variable called networkedUID. This variable mirrors the myUID variable across the network.

private uint networkedUID;
public uint GetNetworkedUID() => networkedUID;

// Returns the engine instance, or null if it has not been created yet.
public IRtcEngine GetRtcEngine()
{
    return mRtcEngine;
}

To update the value across the network, add an UpdateNetworkedPlayerUID RPC function.

// [PunRPC] makes the method callable through photonView.RPC.
[PunRPC]
public void UpdateNetworkedPlayerUID(string newUID) =>
    networkedUID = uint.Parse(newUID);

Add this RPC call to the local and remote user join callbacks. This RPC function fires whenever a local player (you) or a remote player (your friend) joins the Agora channel.

The RPC is called on every PhotonTarget in the scene (in this case, the CharPrefab player under Assets > DemoVikings > Resources) and passes the UID across the network. Photon can’t pass uint variables, so you have to pass the UID as a string and then parse it in the RPC.
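A rough sketch of how the RPC call might sit in the two join callbacks is shown below. It assumes the PUN Classic style API that matches Photon.MonoBehaviour (photonView and PhotonTargets.All; in PUN 2 the equivalents are MonoBehaviourPun and RpcTarget.All). The class name and the BroadcastMyUID helper are hypothetical, and the handlers are assumed to be wired to the engine's OnJoinChannelSuccess and OnUserJoined callbacks elsewhere.

```csharp
using UnityEngine;

// Hypothetical stand-in for the relevant part of AgoraVideoChat.
public class AgoraUidRpcSketch : Photon.MonoBehaviour
{
    private uint myUID;
    private uint networkedUID;

    // Fires when the local player (you) joins the Agora channel.
    private void OnJoinChannelSuccessHandler(string channelName, uint uid, int elapsed)
    {
        myUID = uid;
        BroadcastMyUID();
    }

    // Fires when a remote player joins; re-send your own UID so the newcomer receives it.
    private void OnUserJoinedHandler(uint uid, int elapsed)
    {
        BroadcastMyUID();
    }

    // Hypothetical helper: uint is sent as a string because Photon can't pass uint values.
    private void BroadcastMyUID()
    {
        photonView.RPC("UpdateNetworkedPlayerUID", PhotonTargets.All, myUID.ToString());
    }

    [PunRPC]
    public void UpdateNetworkedPlayerUID(string newUID)
    {
        networkedUID = uint.Parse(newUID);
    }
}
```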

This makes the Agora UIDs visible to the other players in the game. The UIDs tell the Agora engine which pan and gain values to apply to each player’s audio.

Spatial Audio Setup

To properly spatialize the audio, you must know how far a player is from your character, because distance drives the gain (louder when they are closer, quieter when they are farther away). The pan depends on which side of your character the player is on while speaking, and it adjusts how much audio is broadcast through the left and right speakers.

Create a new script called SpatialAudio.cs and attach it to the CharPrefab. Be sure to inherit from Photon.MonoBehaviour.

Remote Player Lists

In order to consistently track these pan and gain values, you must know which players to track. Two lists suffice: one list holding a reference to each player’s transform and one holding a reference to that player’s networked UID.

Create the two lists and serialize them (they can stay private) so you can see them in the editor and check that the players and UIDs are being properly added, removed, and synchronized across the network.
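As a minimal sketch, the two lists could be declared like this. The class and field names are illustrative, not taken from the demo project.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical stand-in for the list declarations in SpatialAudio.cs.
public class SpatialAudioListsSketch : Photon.MonoBehaviour
{
    // [SerializeField] exposes private fields in the Inspector without making them public.
    [SerializeField] private List<Transform> players = new List<Transform>();
    [SerializeField] private List<uint> playerUIDs = new List<uint>();
}
```

Keeping the lists index-aligned (players[i] belongs to playerUIDs[i]) means any add or remove has to touch both lists together.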

Get an IRtcEngine reference, and create callbacks for when a local player and a remote player join.

In the callbacks, you write the code to add and remove players to and from the lists when they join or leave the game.

The network takes a variable amount of time to sync this data, causing a delay in retrieving the networked UIDs. A coroutine is used for adding players to the list, with a 2-second timeout in case the data is missing or dropped. (During my testing, the data was synced after about 0.25 seconds.)

In this coroutine you wait until the UID has been retrieved and then check if the UID is already in the stored player list. If it is not, add the player.
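The coroutine described above might look roughly like the sketch below. It relies on the GetNetworkedUID() accessor added to AgoraVideoChat earlier and assumes that a UID of 0 means "not synced yet". The class name, method names, and the RemovePlayer helper are hypothetical.

```csharp
using System.Collections;
using System.Collections.Generic;
using UnityEngine;

// Hypothetical stand-in for the add/remove logic in SpatialAudio.cs.
public class PlayerTrackerSketch : MonoBehaviour
{
    private const float UID_SYNC_TIMEOUT = 2f; // seconds, as described in the text

    [SerializeField] private List<Transform> players = new List<Transform>();
    [SerializeField] private List<uint> playerUIDs = new List<uint>();

    // Start with: StartCoroutine(AddPlayerWhenUIDSynced(remoteChat));
    private IEnumerator AddPlayerWhenUIDSynced(AgoraVideoChat remoteChat)
    {
        float elapsed = 0f;

        // Wait one frame at a time until the UID arrives or the timeout expires.
        // Assumption: a UID of 0 means the networked value has not been set yet.
        while (remoteChat != null && remoteChat.GetNetworkedUID() == 0 && elapsed < UID_SYNC_TIMEOUT)
        {
            elapsed += Time.deltaTime;
            yield return null;
        }

        // The player left or the data never arrived: give up without adding.
        if (remoteChat == null || remoteChat.GetNetworkedUID() == 0)
        {
            yield break;
        }

        // Only add players that are not tracked already.
        uint uid = remoteChat.GetNetworkedUID();
        if (!playerUIDs.Contains(uid))
        {
            players.Add(remoteChat.transform);
            playerUIDs.Add(uid);
        }
    }

    // Removes a player from both lists so they stay index-aligned.
    private void RemovePlayer(uint uid)
    {
        int index = playerUIDs.IndexOf(uid);
        if (index < 0) return;
        players.RemoveAt(index);
        playerUIDs.RemoveAt(index);
    }
}
```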

Update Spatial Audio

Next, you create the main driver of the audio: a function called UpdateSpatialAudio().

First, look at what this function does as a whole. You work out the pan and gain values in the following sections.

For each player X in the list, get the gain, get the pan, and then spatialize player X’s audio feed.

IMPORTANT: Agora requires a pan value between -1 and 1, and a gain value between 0 and 100.
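The loop can be sketched as follows, assuming the 3.x Agora Video SDK for Unity, where SetRemoteVoicePosition takes a UID, a pan, and a gain. The class name is hypothetical, the pan and gain calculations are left abstract here, and the sketch assumes AgoraVideoChat sits on the same GameObject. Values are clamped to the ranges Agora requires before they are passed on.

```csharp
using System;
using System.Collections.Generic;
using agora_gaming_rtc; // namespace of the 3.x Agora Video SDK for Unity (assumed SDK version)
using UnityEngine;

// Hypothetical sketch: derive from it and supply the pan and gain calculations.
public abstract class SpatialAudioLoopSketch : MonoBehaviour
{
    [SerializeField] private List<Transform> players = new List<Transform>();
    [SerializeField] private List<uint> playerUIDs = new List<uint>();

    private IRtcEngine agoraEngine;

    protected abstract double GetPan(Transform remotePlayer);   // expected range: -1 to 1
    protected abstract double GetGain(Transform remotePlayer);  // expected range: 0 to 100

    private void Update()
    {
        // The engine may not exist yet on the first frames, so fetch it lazily.
        if (agoraEngine == null)
        {
            AgoraVideoChat chat = GetComponent<AgoraVideoChat>();
            agoraEngine = chat != null ? chat.GetRtcEngine() : null;
            if (agoraEngine == null) return;
        }

        UpdateSpatialAudio();
    }

    private void UpdateSpatialAudio()
    {
        // Only walk indices that are valid in both lists.
        int count = Math.Min(players.Count, playerUIDs.Count);
        for (int i = 0; i < count; i++)
        {
            if (players[i] == null) continue; // player object was destroyed

            double pan = Clamp(GetPan(players[i]), -1.0, 1.0);
            double gain = Clamp(GetGain(players[i]), 0.0, 100.0);
            agoraEngine.SetRemoteVoicePosition(playerUIDs[i], pan, gain);
        }
    }

    private static double Clamp(double value, double min, double max)
    {
        return Math.Max(min, Math.Min(max, value));
    }
}
```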

Get Pan By Player Orientation

The pan determines how much audio is broadcast from each speaker. If the remote player is directly in front of or behind the local player, the audio is balanced 50–50 between the left and right speakers. If the remote player is directly to the left or right of the local player, the audio is broadcast only from the corresponding speaker. Any position between those extremes produces a blend of the left and right outputs.

For the Agora engine, a pan value of -1 represents full left output, 0 is balanced left and right, and 1 is a full right output. In practice, our pan value is a gradient between those amounts, and it smoothly updates the volume balance wherever the player is.

Coding Logic:

  • Get a vector pointing from the local player to the remote player.
  • Normalize that vector to a length of 1.
  • Get the dot product of that vector and the local player’s right-pointing vector.
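The three steps above can be sketched as a small helper. The class and method names are hypothetical, and this is not necessarily how the demo project writes it. Because both vectors have length 1, the dot product is the cosine of the angle between them, which already lies in the -1 to 1 range Agora expects.

```csharp
using UnityEngine;

// Hypothetical helper illustrating the pan calculation.
public static class PanSketch
{
    public static float GetPanByPlayerOrientation(Transform localPlayer, Transform remotePlayer)
    {
        // 1. Vector pointing from the local player to the remote player.
        Vector3 toRemote = remotePlayer.position - localPlayer.position;

        // 2. Normalize it to a length of 1 (Unity returns a zero vector if it is too short).
        Vector3 direction = toRemote.normalized;

        // 3. Project it onto the local player's right-pointing vector:
        //    1 = directly right, -1 = directly left, 0 = directly in front or behind.
        float pan = Vector3.Dot(localPlayer.right, direction);

        // Guard against tiny floating-point overshoot.
        return Mathf.Clamp(pan, -1f, 1f);
    }
}
```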

Get Gain By Player Distance

The gain determines how loud a remote player sounds based on how close they are to you: the closer they are, the louder; the farther away, the quieter.

The Agora engine needs a double value between 0 and 100 for gain. Knowing that, you must set a maximum and a minimum distance to smoothly interpolate (lerp) between those values.

For the minimum volume boundary, I use the radius of the spherical trigger volume on my CharPrefab character, which is 6 meters. For the maximum volume boundary, I use 1.5 meters.

Add a variable for the max volume boundary:

private const float MAX_CHAT_PROXIMITY = 1.5f;

Put Simply:

  • Get the distance between the two players and clamp the value between the minimum and maximum bounds.
  • Normalize the clamped distance from the 1.5f–6f range to a value between 0 and 1, inverted so that 1 corresponds to 1.5 meters (loudest) and 0 to 6 meters (quietest).
  • Multiply the normalized gain value by 100 to produce a number between 0 and 100, and pass the number to the Agora engine.
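As a minimal sketch of these steps, the helper below maps a distance to a gain. MIN_CHAT_PROXIMITY is a hypothetical name for the 6 meter trigger radius, and the linear mapping is one reasonable reading of the steps rather than the demo's exact code.

```csharp
using System;

// Hypothetical helper illustrating the gain calculation.
public static class GainSketch
{
    public const float MAX_CHAT_PROXIMITY = 1.5f; // full volume at or inside this distance
    public const float MIN_CHAT_PROXIMITY = 6f;   // hypothetical name: silent at or beyond this distance

    public static double GetGainByPlayerDistance(float distance)
    {
        // 1. Clamp the distance between the two bounds.
        float clamped = Math.Max(MAX_CHAT_PROXIMITY, Math.Min(MIN_CHAT_PROXIMITY, distance));

        // 2. Normalize to 0..1, inverted so that closer means louder.
        float normalized = (MIN_CHAT_PROXIMITY - clamped) / (MIN_CHAT_PROXIMITY - MAX_CHAT_PROXIMITY);

        // 3. Scale to the 0..100 range Agora expects.
        return normalized * 100.0;
    }
}
```

In Unity, the distance argument would typically come from Vector3.Distance between the local and remote player positions.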

With that, you complete the UpdateSpatialAudio() function, which spatializes the audio for every player in the list on each frame.

Now all that’s required is a good test! You need two different clients running this project in a build or in the Unity editor. Unfortunately, an Agora web call does not suffice, because the position of each player is required.

This demo is not optimized for performance. It is intended only to showcase the spatial audio functionality at maximum fidelity. If you have any optimizations or improvements to the code, feel free to submit a pull request and contribute to the Agora community!

Optimization starting points:

  • Disabling Agora audio after the players are out of range of each other.
  • Using trigger volumes to enable or disable functionality.
  • Checking for pan and gain every fourth, tenth, or X frames instead of every single frame.
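As one illustration of the last point, the update could be throttled by frame count. This is a rough sketch with hypothetical names. UpdateSpatialAudio() here is only a stand-in for the real function, and the interval of 4 frames is an arbitrary starting value to tune.

```csharp
using UnityEngine;

// Hypothetical sketch of running the spatial audio update every N frames.
public class ThrottledSpatialAudioSketch : MonoBehaviour
{
    [SerializeField] private int updateEveryNFrames = 4; // arbitrary starting value

    private void Update()
    {
        // Skip frames that are not a multiple of the interval.
        if (updateEveryNFrames > 1 && Time.frameCount % updateEveryNFrames != 0)
        {
            return;
        }

        UpdateSpatialAudio();
    }

    private void UpdateSpatialAudio()
    {
        // Stand-in for the pan and gain update described in this tutorial.
    }
}
```

Larger intervals save work but make the pan and gain respond in visible steps when players move quickly, so the value is a trade-off.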

Thanks for checking out the demo. If you learned something from it, make sure to teach someone else!
— Joel