
Build a Voice Chat App with Live Transcriptions Using React Native and Agora SDK

By David Ubong Ekanem

David Ubong Ekanem is a creative software engineer who enjoys building applications that explore new technologies and the link between learning and writing.



Communication is an essential part of our daily lives, and with the proliferation of mobile technology, voice chat applications keep us connected. But during a video meeting, keeping track of what was said can be challenging.

Come with me as we build a service that students and businesses can use for taking notes: a React Native application that supports speech-to-text transcription. We'll go over the structure, setup, and execution of the app before diving into how it works.

We’ll be using the Agora RTC SDK for React Native for the example below. I’m using v4.1.0 at the time of writing.

Create an Agora account

Go to https://console.agora.io/ to create an account and log in to the dashboard. You can follow this guide for reference: https://www.agora.io/en/blog/how-to-get-started-with-agora.

[Image: Agora Console]

In the sidebar, click the Project Management tab. Under Project Management, click the Create Project button to create a new project.

Once the project is created, retrieve the App ID. Select an App ID with a token, and obtain a temporary token for the project: on the project's edit page, click the link to generate a temporary token. The temporary token is used to authorize requests while you develop the application.

Note: Token authentication is recommended for all RTE apps running in production environments. For more information about token-based authentication on the Agora platform, see this guide: https://docs.agora.io/en/Video/token?platform=All%20Platforms

Create an OpenAI account

Go to https://platform.openai.com/ to create an account, and log in to the dashboard. Click the user section tab and select View API keys. Retrieve the API key, or create a new one specifically for this project.

[Image: API reference OpenAI]

Structure of our application

[Image: Structure of the application]

Let’s run the app

The LTS version of Node.js and NPM need to be installed on your system.

  1. Ensure that you have registered an Agora account, set up a project, and generated an App ID (and temporary token).
  2. Ensure that you have registered an OpenAI account and retrieved your API key.
  3. Download and extract the ZIP file from the master branch.
  4. In the unzipped directory, run yarn or npm install to install the app dependencies.
  5. Open ./App.tsx and enter the App ID that was obtained from the Agora Console: appId: '<YourAppIDHere>'. If you're using tokens, enter your token and channel name as well, along with your OpenAI API key.
  6. If you're building for iOS, open a terminal and execute cd ios && pod install.
  7. Run yarn start or npm run start to start the Metro server.
  8. Once the Metro server starts, you should see the following display in your console.
[Image: React Native loaded successfully]
  • The iOS and Android simulators do not support the camera, so use a physical device instead.
  • The application should start on your physical device.

That’s it. You should be able to join or leave a call. The app uses agoraReactNativeStream as the channel name.

How it works

We export a TranscribedText component that accepts the transcribedText and interimTranscribedText props and displays them. It is used to display the transcribed text from our video call.
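Below is a minimal sketch of what that component might look like. The prop names follow the description above; the markup and styling are assumptions.

```tsx
// TranscribedText.tsx — a minimal sketch; markup and styling are assumptions.
import React from 'react';
import {Text, View} from 'react-native';

type TranscribedTextProps = {
  transcribedText: string; // finalized transcription of the call
  interimTranscribedText: string; // text that is still being transcribed
};

const TranscribedText = ({
  transcribedText,
  interimTranscribedText,
}: TranscribedTextProps) => (
  <View>
    <Text>{transcribedText}</Text>
    <Text>{interimTranscribedText}</Text>
  </View>
);

export default TranscribedText;
```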

The App.tsx file contains the core logic of our video call.

We start by writing the import statements. Next, we define constants for our App ID, token, OpenAI key, and channel name. The other constants (sample rate, number of channels, and samples per call) specify how we want to store our audio file.
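For reference, the constants could look something like this. The names and the sample-rate value are assumptions, and the credentials are placeholders for your own:

```tsx
// Credentials from the Agora Console and OpenAI dashboard (placeholders).
const appId = '<YourAppIDHere>';
const token = '<YourTokenHere>';
const openAIKey = '<YourOpenAIKeyHere>';
const channelName = 'agoraReactNativeStream';

// How the recorded audio is stored (values are assumptions).
const SAMPLE_RATE = 32000; // audio samples per second
const SAMPLE_NUM_OF_CHANNELS = 1; // mono audio
const SAMPLES_PER_CALL = 1024; // samples delivered per audio-frame callback
```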

We define a functional component in which the agoraEngineClass variable stores the IRtcEngine instance. This class provides methods that we can invoke in our application to manage the video call. The useState hook stores our transcribed text.
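A sketch of that component shell, assuming the engine instance is kept in a useRef hook so it survives re-renders:

```tsx
import React, {useRef, useState} from 'react';
import {IRtcEngine} from 'react-native-agora';

const App = () => {
  // Holds the IRtcEngine instance for the lifetime of the component.
  const agoraEngineClass = useRef<IRtcEngine>();
  // Stores the transcribed text displayed in the UI.
  const [transcribedText, setTranscribedText] = useState('');

  // ...setup, join, leave, and render logic follow in the sections below.
  return null; // replaced by the render output shown later
};

export default App;
```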

The setupVideoSDKEngine function initializes the Agora engine on app start. The application requests the needed permissions, and once they are granted, the Agora engine is initialized. We register the audioFrameObserver so that methods such as onPlayAudioFrame are accessible if needed.
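Here is a sketch of that setup, assuming react-native-agora v4 and Android runtime permissions. The empty observer object is an assumption; you would add the frame callbacks you need to it.

```tsx
// createAgoraRtcEngine and IAudioFrameObserver come from 'react-native-agora';
// Platform and PermissionsAndroid come from 'react-native'.

// An empty observer; add callbacks for audio frames as needed.
const audioFrameObserver: IAudioFrameObserver = {};

const setupVideoSDKEngine = async () => {
  // Ask for the permissions the call needs before initializing the engine.
  if (Platform.OS === 'android') {
    await PermissionsAndroid.requestMultiple([
      PermissionsAndroid.PERMISSIONS.RECORD_AUDIO,
      PermissionsAndroid.PERMISSIONS.CAMERA,
    ]);
  }

  const engine = createAgoraRtcEngine();
  engine.initialize({appId});

  // Register the observer so raw audio frames are accessible if needed.
  engine.getMediaEngine().registerAudioFrameObserver(audioFrameObserver);

  agoraEngineClass.current = engine;
};
```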

The join function enables us to join our desired channel. If transcribedText is present in state, we clear it before joining the channel. We then trigger the startAudioRecording method, passing in the desired filePath for storing our audio.

We rely on the React Native File System (react-native-fs) library to access the native file system of our React Native application. Encoding is set to false, and the SAMPLE_RATE constant supplies the sample rate. We set the fileRecordingType to AudioFileRecordingMixed, so the recording captures the mixed audio of the local and all remote users.
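Putting the join flow together, a sketch could look like this; the recording file name is an assumption:

```tsx
// ClientRoleType and AudioFileRecordingType come from 'react-native-agora';
// RNFS is the default export of 'react-native-fs'.
const join = async () => {
  // Clear any transcript left over from a previous call.
  setTranscribedText('');

  const engine = agoraEngineClass.current;
  if (!engine) {
    return;
  }

  engine.joinChannel(token, channelName, 0, {
    clientRoleType: ClientRoleType.ClientRoleBroadcaster,
  });

  // Record the mixed audio of the local and all remote users into the
  // app's document directory.
  engine.startAudioRecording({
    filePath: `${RNFS.DocumentDirectoryPath}/recording.wav`,
    encode: false, // store raw, unencoded audio
    sampleRate: SAMPLE_RATE,
    fileRecordingType: AudioFileRecordingType.AudioFileRecordingMixed,
  });
};
```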

The leave function lets the user exit the channel and stops the audio recording. We unregister the previously registered audioFrameObserver. Once the audio recording ends, the application triggers the startTranscribe function.
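A sketch of the leave flow, mirroring the setup above:

```tsx
const leave = () => {
  const engine = agoraEngineClass.current;
  if (!engine) {
    return;
  }

  // Stop recording before leaving so the file is finalized.
  engine.stopAudioRecording();
  // Unregister the observer we registered during setup.
  engine.getMediaEngine().unregisterAudioFrameObserver(audioFrameObserver);
  engine.leaveChannel();

  // Transcribe the recording we just finished.
  startTranscribe();
};
```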

In the startTranscribe function, the recordingPath and fileName together make up the filePath, which points to the recorded audio data. The Whisper API receives the audio data, and we receive the transcribed text in return.
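A sketch of startTranscribe, assuming react-native-fs supplies the document directory and OpenAI's Whisper transcription endpoint (model whisper-1) performs the transcription:

```tsx
const startTranscribe = async () => {
  // recordingPath and fileName make up the path to the recorded audio.
  const recordingPath = RNFS.DocumentDirectoryPath;
  const fileName = 'recording.wav';
  const filePath = `${recordingPath}/${fileName}`;

  // Send the audio file to the Whisper API as multipart/form-data.
  const body = new FormData();
  body.append('file', {
    uri: `file://${filePath}`,
    type: 'audio/wav',
    name: fileName,
  } as any); // React Native accepts file descriptors in FormData
  body.append('model', 'whisper-1');

  const response = await fetch(
    'https://api.openai.com/v1/audio/transcriptions',
    {
      method: 'POST',
      headers: {Authorization: `Bearer ${openAIKey}`},
      body,
    },
  );
  const result = await response.json();

  // Display the transcribed text returned by the API.
  setTranscribedText(result.text);
};
```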

We define the render function to display the buttons for joining or leaving a call, along with our transcribed text.
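Finally, a sketch of the render output; the layout is an assumption, and interim text is omitted here for brevity:

```tsx
// SafeAreaView, View, and Button come from 'react-native'.
return (
  <SafeAreaView>
    <View>
      <Button title="Join" onPress={join} />
      <Button title="Leave" onPress={leave} />
    </View>
    <TranscribedText
      transcribedText={transcribedText}
      interimTranscribedText={''} // interim text omitted in this sketch
    />
  </SafeAreaView>
);
```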

[Image: Our finished application]

Conclusion

That’s how easy it is to build a transcription application inside a video call application. You can refer to the Agora React Native API Reference to see methods that can help us quickly add additional features to our application.

If you’re deploying your app to production, you can read more about how to use tokens in this blog post.

I invite you to join the Agora Developer Slack Community. Feel free to ask any React Native questions in the #react-native-help-me channel.