Build an Agora Conversational AI Service using Golang

Conversational AI is revolutionizing how people interact with artificial intelligence. Instead of carefully crafting text prompts, users can have natural, real-time voice conversations with AI agents. This opens exciting opportunities for more intuitive and efficient interactions.

Many developers have already invested significant time building custom LLM workflows for text-based agents. Agora’s Conversational AI Engine allows you to connect these existing workflows to an Agora channel, enabling real-time voice conversations without abandoning your current AI infrastructure.

In this guide, I’ll walk you through building a Go server that handles the connection between your users and Agora’s Conversational AI. By the end, you’ll have a production-ready backend that can power voice-based AI conversations for your applications.

Prerequisites

Before getting started, make sure you have:

Go (version 1.18 or higher)
Basic knowledge of Go and the Gin framework
An Agora account — the first 10k minutes each month are free
Conversational AI service activated on your AppID

Project Setup

Let’s start by setting up our Golang project with the necessary dependencies. First, create a new directory and initialize a Go module:

mkdir agora-convo-ai-go-server
cd agora-convo-ai-go-server
go mod init github.com/AgoraIO-Community/convo-ai-go-server

Next, we’ll add the key dependencies for our server:

go get github.com/gin-gonic/gin
go get github.com/joho/godotenv
go get github.com/AgoraIO-Community/go-tokenbuilder

Create the initial directory structure, and as we go through the guide, we’ll fill these directories with the files we need.

mkdir -p convoai token_service http_headers validation
touch .env

Your project directory should now have a structure like this:

├── convoai/
├── token_service/
├── http_headers/
├── validation/
├── .env
├── go.mod
├── go.sum

Server Entry Point

Start by setting up the main application file, which will be the entry point for our server. We’ll then load the environment variables, set up the configuration, and initialize the router with the appropriate middleware and routes.

Create the main.go file:

touch main.go

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	"github.com/gin-gonic/gin"
	"github.com/joho/godotenv"
)

func loadConfig() (*convoai.ConvoAIConfig, error) {
	config := &convoai.ConvoAIConfig{
		// Agora Configuration
		AppID:          os.Getenv("AGORA_APP_ID"),
		AppCertificate: os.Getenv("AGORA_APP_CERTIFICATE"),
		CustomerID:     os.Getenv("AGORA_CUSTOMER_ID"),
		CustomerSecret: os.Getenv("AGORA_CUSTOMER_SECRET"),
		BaseURL:        os.Getenv("AGORA_CONVO_AI_BASE_URL"),
		AgentUID:       os.Getenv("AGENT_UID"),

		// LLM Configuration
		LLMModel: os.Getenv("LLM_MODEL"),
		LLMURL:   os.Getenv("LLM_URL"),
		LLMToken: os.Getenv("LLM_TOKEN"),

		// TTS Configuration
		TTSVendor: os.Getenv("TTS_VENDOR"),
	}

	// Microsoft TTS Configuration
	if msKey := os.Getenv("MICROSOFT_TTS_KEY"); msKey != "" {
		config.MicrosoftTTS = &convoai.MicrosoftTTSConfig{
			Key:       msKey,
			Region:    os.Getenv("MICROSOFT_TTS_REGION"),
			VoiceName: os.Getenv("MICROSOFT_TTS_VOICE_NAME"),
			Rate:      os.Getenv("MICROSOFT_TTS_RATE"),
			Volume:    os.Getenv("MICROSOFT_TTS_VOLUME"),
		}
	}

	// ElevenLabs TTS Configuration
	if elKey := os.Getenv("ELEVENLABS_API_KEY"); elKey != "" {
		config.ElevenLabsTTS = &convoai.ElevenLabsTTSConfig{
			Key:     elKey,
			VoiceID: os.Getenv("ELEVENLABS_VOICE_ID"),
			ModelID: os.Getenv("ELEVENLABS_MODEL_ID"),
		}
	}

	// Modalities Configuration
	config.InputModalities = os.Getenv("INPUT_MODALITIES")
	config.OutputModalities = os.Getenv("OUTPUT_MODALITIES")

	return config, nil
}

func setupServer() *http.Server {
	log.Println("Starting setupServer")
	if err := godotenv.Load(); err != nil {
		log.Println("Warning: Error loading .env file. Using existing environment variables.")
	}

	// Load configuration
	config, err := loadConfig()
	if err != nil {
		log.Fatal("Failed to load configuration:", err)
	}

	// TODO: Validate environment configuration

	// Server Configuration
	serverPort := os.Getenv("PORT")
	if serverPort == "" {
		serverPort = "8080"
	}

	// CORS Configuration
	corsAllowOrigin := os.Getenv("CORS_ALLOW_ORIGIN")

	// Set up router with headers
	router := gin.Default()
	// TODO: Register headers

	// TODO: Initialize services & register routes

	// Register healthcheck route
	router.GET("/ping", Ping)

	// Configure and start the HTTP server
	server := &http.Server{
		Addr:    ":" + serverPort,
		Handler: router,
	}

	log.Println("Server setup completed")
	log.Println("- listening on port", serverPort)
	return server
}

func main() {
	server := setupServer()

	// Start the server in a separate goroutine to handle graceful shutdown.
	go func() {
		if err := server.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("listen: %s\n", err)
		}

	}()

	// Prepare to handle graceful shutdown.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, os.Interrupt, syscall.SIGTERM)

	// Wait for a shutdown signal.
	<-quit
	log.Println("Shutting down server...")

	// Attempt to gracefully shutdown the server with a timeout of 5 seconds.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := server.Shutdown(ctx); err != nil {
		log.Fatal("Server forced to shutdown:", err)
	}

	log.Println("Server exiting")
}

// Ping is a handler function that serves as a basic health check endpoint.
func Ping(c *gin.Context) {
	c.JSON(200, gin.H{
		"message": "pong",
	})
}
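
The `// TODO: Validate environment configuration` placeholder in setupServer can be filled in several ways. The sketch below is one possible approach, not the guide's own implementation: a hypothetical helper, missingEnv, reports which variables are empty. The variable names come from loadConfig above; which of them count as required is my assumption.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// requiredEnv lists the variables this sketch treats as mandatory.
// The names come from loadConfig; the choice of which are required is an assumption.
var requiredEnv = []string{
	"AGORA_APP_ID",
	"AGORA_APP_CERTIFICATE",
	"AGORA_CUSTOMER_ID",
	"AGORA_CUSTOMER_SECRET",
	"AGORA_CONVO_AI_BASE_URL",
	"AGENT_UID",
	"LLM_URL",
	"LLM_TOKEN",
	"TTS_VENDOR",
}

// missingEnv returns the names in keys whose value, as reported by lookup,
// is empty or only whitespace. Passing lookup in makes the helper easy to test.
func missingEnv(keys []string, lookup func(string) string) []string {
	var missing []string
	for _, k := range keys {
		if strings.TrimSpace(lookup(k)) == "" {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	if missing := missingEnv(requiredEnv, os.Getenv); len(missing) > 0 {
		fmt.Println("missing required environment variables:", strings.Join(missing, ", "))
		return
	}
	fmt.Println("environment configuration looks complete")
}
```

In setupServer, a check like this could run right after loadConfig and call log.Fatal when the returned slice is not empty.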

Note: We load PORT from the environment variables; it defaults to 8080 if not set in your .env file.

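Let’s test our basic Go server. Keep in mind that Go won't compile a file with unused variables or unresolved packages, and at this stage main.go still references the convoai package and leaves config and corsAllowOrigin unused, so the build only succeeds once the later sections are in place. When it does, run: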

go run main.go

You should see “Server setup completed” and “- listening on port 8080” in your console.

You can now visit http://localhost:8080/ping in your browser to verify the server is working. You should see {"message":"pong"} as the response.

To test the server using curl, run:

curl http://localhost:8080/ping

You should see the response: {"message":"pong"}

Type Definitions

Next, let's define the types needed for our ConvoAI service. Create a file called convoai-types.go in the convoai directory.

touch convoai/convoai-types.go

Add the following types:

package convoai

// InviteAgentRequest represents the request body for inviting an AI agent
type InviteAgentRequest struct {
	RequesterID      string   `json:"requester_id"`
	ChannelName      string   `json:"channel_name"`
	RtcCodec         *int     `json:"rtc_codec,omitempty"`
	InputModalities  []string `json:"input_modalities,omitempty"`
	OutputModalities []string `json:"output_modalities,omitempty"`
}

// RemoveAgentRequest represents the request body for removing an AI agent
type RemoveAgentRequest struct {
	AgentID string `json:"agent_id"`
}

// TTSVendor represents the text-to-speech vendor type
type TTSVendor string

const (
	TTSVendorMicrosoft  TTSVendor = "microsoft"
	TTSVendorElevenLabs TTSVendor = "elevenlabs"
)

// TTSConfig represents the text-to-speech configuration
type TTSConfig struct {
	Vendor TTSVendor   `json:"vendor"`
	Params interface{} `json:"params"`
}

// AgoraStartRequest represents the request to start a conversation
type AgoraStartRequest struct {
	Name       string     `json:"name"`
	Properties Properties `json:"properties"`
}

// Properties represents the configuration properties for the conversation
type Properties struct {
	Channel          string    `json:"channel"`
	Token            string    `json:"token"`
	AgentRtcUID      string    `json:"agent_rtc_uid"`
	RemoteRtcUIDs    []string  `json:"remote_rtc_uids"`
	EnableStringUID  bool      `json:"enable_string_uid"`
	IdleTimeout      int       `json:"idle_timeout"`
	ASR              ASR       `json:"asr"`
	LLM              LLM       `json:"llm"`
	TTS              TTSConfig `json:"tts"`
	VAD              VAD       `json:"vad"`
	AdvancedFeatures Features  `json:"advanced_features"`
}

// ASR represents the Automatic Speech Recognition configuration
type ASR struct {
	Language string `json:"language"`
	Task     string `json:"task"`
}

// LLM represents the Large Language Model configuration
type LLM struct {
	URL              string          `json:"url"`
	APIKey           string          `json:"api_key"`
	SystemMessages   []SystemMessage `json:"system_messages"`
	GreetingMessage  string          `json:"greeting_message"`
	FailureMessage   string          `json:"failure_message"`
	MaxHistory       int             `json:"max_history"`
	Params           LLMParams       `json:"params"`
	InputModalities  []string        `json:"input_modalities"`
	OutputModalities []string        `json:"output_modalities"`
}

// SystemMessage represents a system message in the conversation
type SystemMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// LLMParams represents the parameters for the Large Language Model
type LLMParams struct {
	Model       string  `json:"model"`
	MaxTokens   int     `json:"max_tokens"`
	Temperature float64 `json:"temperature"`
	TopP        float64 `json:"top_p"`
}

// VAD represents the Voice Activity Detection configuration
type VAD struct {
	SilenceDurationMS   int     `json:"silence_duration_ms"`
	SpeechDurationMS    int     `json:"speech_duration_ms"`
	Threshold           float64 `json:"threshold"`
	InterruptDurationMS int     `json:"interrupt_duration_ms"`
	PrefixPaddingMS     int     `json:"prefix_padding_ms"`
}

// Features represents advanced features configuration
type Features struct {
	EnableAIVAD bool `json:"enable_aivad"`
	EnableBHVS  bool `json:"enable_bhvs"`
}

// InviteAgentResponse represents the response for an agent invitation
type InviteAgentResponse struct {
	AgentID  string `json:"agent_id"`
	CreateTS int64  `json:"create_ts"`
	Status   string `json:"status"`
}

// RemoveAgentResponse represents the response for an agent removal
type RemoveAgentResponse struct {
	Success bool   `json:"success"`
	AgentID string `json:"agent_id"`
}

// ConvoAIConfig holds all configuration for the ConvoAI service
type ConvoAIConfig struct {
	// Agora Configuration
	AppID          string
	AppCertificate string
	CustomerID     string
	CustomerSecret string
	BaseURL        string
	AgentUID       string

	// LLM Configuration
	LLMModel string
	LLMURL   string
	LLMToken string

	// TTS Configuration
	TTSVendor     string
	MicrosoftTTS  *MicrosoftTTSConfig
	ElevenLabsTTS *ElevenLabsTTSConfig

	// Modalities Configuration
	InputModalities  string
	OutputModalities string
}

// MicrosoftTTSConfig holds Microsoft TTS specific configuration
type MicrosoftTTSConfig struct {
	Key       string `json:"key"`
	Region    string `json:"region"`
	VoiceName string `json:"voice_name"`
	Rate      string `json:"rate"`
	Volume    string `json:"volume"`
}

// ElevenLabsTTSConfig holds ElevenLabs TTS specific configuration
type ElevenLabsTTSConfig struct {
	Key     string `json:"key"`
	VoiceID string `json:"voice_id"`
	ModelID string `json:"model_id"`
}

These new types give some insight into all the parts we'll be assembling in the next steps. We'll take the client request, and use it to configure the AgoraStartRequest and send it to Agora's Conversational AI Engine. Agora's Convo AI engine will add the agent to the conversation.

ConvoAI Service

With our types defined, let's implement the agent routes for inviting and removing agents from conversations.

Create the convoai-service.go file:

touch convoai/convoai-service.go

Start by importing gin and our token_service package, because we'll need to generate tokens for the agent. Then we'll register and set up the agent routes. Each route function validates the request before passing it to its respective handler.

package convoai

import (
	"net/http"

	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
	"github.com/gin-gonic/gin"
)

// ConvoAIService handles AI conversation functionality
type ConvoAIService struct {
	config       *ConvoAIConfig
	tokenService *token_service.TokenService
}

// NewConvoAIService creates a new ConvoAIService instance
func NewConvoAIService(config *ConvoAIConfig, tokenService *token_service.TokenService) *ConvoAIService {
	return &ConvoAIService{
		config:       config,
		tokenService: tokenService,
	}
}

// Register the ConvoAI service routes
func (s *ConvoAIService) RegisterRoutes(router *gin.Engine) {
	agent := router.Group("/agent")
	agent.POST("/invite", s.InviteAgent)
	agent.POST("/remove", s.RemoveAgent)
}

// InviteAgent handles the agent invitation request
func (s *ConvoAIService) InviteAgent(c *gin.Context) {
	var req InviteAgentRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Validate the request
	if err := s.validateInviteRequest(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Call the handler
	response, err := s.HandleInviteAgent(req)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	c.JSON(http.StatusOK, response)
}

// RemoveAgent handles the agent removal request
func (s *ConvoAIService) RemoveAgent(c *gin.Context) {
	var req RemoveAgentRequest
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Validate the request
	if err := s.validateRemoveRequest(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// Call the handler
	response, err := s.HandleRemoveAgent(req)
	if err != nil {
		c.JSON(http.StatusInternalServerError, gin.H{"error": err.Error()})
		return
	}

	c.JSON(http.StatusOK, response)
}
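
From the client's side, calling the invite route is a plain JSON POST. The sketch below builds such a request with the standard library only. The newInviteRequest helper is hypothetical, and the base URL assumes the server from this guide is running locally on the default port.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newInviteRequest builds the POST request for the /agent/invite route.
// baseURL is the address of the Go server from this guide.
func newInviteRequest(baseURL, requesterID, channel string) (*http.Request, error) {
	payload := map[string]string{
		"requester_id": requesterID,
		"channel_name": channel,
	}
	body, err := json.Marshal(payload)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/agent/invite", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumes the server runs locally on the default port 8080.
	req, err := newInviteRequest("http://localhost:8080", "1234", "demo-channel")
	if err != nil {
		fmt.Println("build error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```

The remove route works the same way, with a body that carries only agent_id.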

Invite Agent Handler

Next, we'll implement the invite handler, which needs to handle several key tasks:

Generate a token for the AI agent to access the RTC channel.
Configure Text-to-Speech (Microsoft or ElevenLabs).
Define the AI agent's prompt and greeting message.
Configure Voice Activity Detection (VAD), which controls conversation flow.
Send the start request to Agora's Conversational AI Engine.
Return a response to the client containing the AgentID from Agora's Convo AI Engine.

Create the file convoai_handler_invite.go:

touch convoai/convoai_handler_invite.go

Add the following content:

package convoai

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"

	"crypto/rand"

	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
)

// HandleInviteAgent processes the agent invitation request
func (s *ConvoAIService) HandleInviteAgent(req InviteAgentRequest) (*InviteAgentResponse, error) {
	// Generate token for the agent
	tokenReq := token_service.TokenRequest{
		TokenType: "rtc",
		Channel:   req.ChannelName,
		Uid:       "0",
		RtcRole:   "publisher",
	}

	token, err := s.tokenService.GenRtcToken(tokenReq)
	if err != nil {
		return nil, fmt.Errorf("failed to generate token: %v", err)
	}

	// Get TTS config based on vendor
	ttsConfig, err := s.getTTSConfig()
	if err != nil {
		return nil, fmt.Errorf("failed to get TTS config: %v", err)
	}

	// Set up system message for AI behavior
	systemMessage := SystemMessage{
		Role:    "system",
		Content: "You are a helpful assistant. Pretend that the text input is audio, and you are responding to it. Speak fast, clearly, and concisely.",
	}

	// Set default modalities if not provided
	inputModalities := req.InputModalities
	if len(inputModalities) == 0 {
		inputModalities = []string{"text"}
	}

	outputModalities := req.OutputModalities
	if len(outputModalities) == 0 {
		outputModalities = []string{"text", "audio"}
	}

	// Build the request body for Agora Conversation AI service
	agoraReq := AgoraStartRequest{
		Name: fmt.Sprintf("agent-%d-%s", time.Now().UnixNano(), randomString(6)),
		Properties: Properties{
			Channel:         req.ChannelName,
			Token:           token,
			AgentRtcUID:     s.config.AgentUID,
			RemoteRtcUIDs:   getRemoteRtcUIDs(req.RequesterID),
			EnableStringUID: isStringUID(req.RequesterID),
			IdleTimeout:     30,
			ASR: ASR{
				Language: "en-US",
				Task:     "conversation",
			},
			LLM: LLM{
				URL:             s.config.LLMURL,
				APIKey:          s.config.LLMToken,
				SystemMessages:  []SystemMessage{systemMessage},
				GreetingMessage: "Hello! How can I assist you today?",
				FailureMessage:  "Please wait a moment.",
				MaxHistory:      10,
				Params: LLMParams{
					Model:       s.config.LLMModel,
					MaxTokens:   1024,
					Temperature: 0.7,
					TopP:        0.95,
				},
				InputModalities:  inputModalities,
				OutputModalities: outputModalities,
			},
			TTS: *ttsConfig,
			VAD: VAD{
				SilenceDurationMS:   480,
				SpeechDurationMS:    15000,
				Threshold:           0.5,
				InterruptDurationMS: 160,
				PrefixPaddingMS:     300,
			},
			AdvancedFeatures: Features{
				EnableAIVAD: false,
				EnableBHVS:  false,
			},
		},
	}

	// Debug logging (the payload contains the RTC token and API keys, so remove this outside local development)
	prettyJSON, _ := json.MarshalIndent(agoraReq, "", "  ")
	fmt.Printf("Sending request to start agent: %s\n", string(prettyJSON))

	// Convert request to JSON
	jsonData, err := json.Marshal(agoraReq)
	if err != nil {
		return nil, fmt.Errorf("failed to marshal request: %v", err)
	}

	// Create the HTTP request
	url := fmt.Sprintf("%s/%s/join", s.config.BaseURL, s.config.AppID)
	fmt.Printf("URL: %s\n", url)
	httpReq, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
	if err != nil {
		return nil, fmt.Errorf("failed to create request: %v", err)
	}

	// Add headers
	httpReq.Header.Set("Content-Type", "application/json")
	httpReq.Header.Set("Authorization", s.getBasicAuth())

	// Send the request using a client with a timeout
	client := &http.Client{Timeout: 30 * time.Second}
	resp, err := client.Do(httpReq)
	if err != nil {
		return nil, fmt.Errorf("failed to send request: %v (URL: %s)", err, url)
	}
	defer resp.Body.Close()

	// Handle response
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
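		// Leave the request headers out of the error: they carry the Basic auth
		// credentials, and this error is returned to the client by InviteAgent.
		return nil, fmt.Errorf("failed to start conversation: status=%d, body=%s, url=%s",
			resp.StatusCode, string(body), url)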
	}

	// Parse the response
	var agoraResp map[string]interface{}
	if err := json.NewDecoder(resp.Body).Decode(&agoraResp); err != nil {
		return nil, fmt.Errorf("failed to decode response: %v", err)
	}

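	// Create the response, checking that agent_id is present and is a string
	// (an unchecked type assertion would panic on an unexpected payload)
	agentID, ok := agoraResp["agent_id"].(string)
	if !ok {
		return nil, fmt.Errorf("unexpected response: missing agent_id")
	}
	response := &InviteAgentResponse{
		AgentID:  agentID,
		CreateTS: time.Now().Unix(),
		Status:   "RUNNING",
	}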

	return response, nil
}

// getRemoteRtcUIDs returns the appropriate RemoteRtcUIDs array based on the requesterID
func getRemoteRtcUIDs(requesterID string) []string {
	return []string{requesterID}
}

// randomString returns n random lowercase letters, used to keep agent names unique
func randomString(n int) string {
	const letters = "abcdefghijklmnopqrstuvwxyz"
	b := make([]byte, n)
	rand.Read(b)
	for i := range b {
		b[i] = letters[int(b[i])%len(letters)]
	}
	return string(b)
}

Remove Agent Handler

After the agent joins the conversation, we need a way to remove it. This is where the remove handler comes in. It takes the agentID and sends a request to Agora’s Conversational AI Engine to remove the agent from the channel.

Create the file convoai_handler_remove.go:

touch convoai/convoai_handler_remove.go

Add the following:

package convoai

import (
 "fmt"
 "net/http"
 "time"
)

// HandleRemoveAgent processes the agent removal request
func (s *ConvoAIService) HandleRemoveAgent(req RemoveAgentRequest) (*RemoveAgentResponse, error) {
 // Create the HTTP request
 url := fmt.Sprintf("%s/%s/agents/%s/leave", s.config.BaseURL, s.config.AppID, req.AgentID)
 httpReq, err := http.NewRequest("POST", url, nil)
 if err != nil {
  return nil, fmt.Errorf("failed to create request: %v", err)
 }

 // Add headers
 auth := s.getBasicAuth()
 httpReq.Header.Set("Content-Type", "application/json")
 httpReq.Header.Set("Authorization", auth)
 // Send the request using a client with a timeout
 client := &http.Client{Timeout: 10 * time.Second}
 resp, err := client.Do(httpReq)
 if err != nil {
  return nil, fmt.Errorf("failed to send request: %v", err)
 }
 defer resp.Body.Close()
 if resp.StatusCode != http.StatusOK {
  return nil, fmt.Errorf("failed to remove agent: %d", resp.StatusCode)
 }

 // Return success response
 response := &RemoveAgentResponse{
  Success: true,
  AgentID: req.AgentID,
 }
 return response, nil
}
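
Since agent_id arrives from the client and ends up inside a URL path, it may be worth escaping it before building the leave URL. The sketch below shows one way to do that with net/url. buildLeaveURL is a hypothetical helper, and the base URL and IDs are placeholders.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// buildLeaveURL assembles the leave endpoint used by HandleRemoveAgent,
// escaping the IDs so they cannot introduce extra path segments.
func buildLeaveURL(baseURL, appID, agentID string) string {
	return fmt.Sprintf("%s/%s/agents/%s/leave",
		strings.TrimRight(baseURL, "/"), url.PathEscape(appID), url.PathEscape(agentID))
}

func main() {
	// The base URL and IDs below are placeholders.
	fmt.Println(buildLeaveURL("https://api.example.com/convo-ai/", "my-app-id", "agent/../123"))
}
```

url.PathEscape encodes the slashes in the agent ID, so the value stays a single path segment.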

Utility Functions

In both the invite and remove routes, we need to send a Basic Authorization header with our requests, so we’ll set up a utility function to handle this.

Another utility we need is getTTSConfig. One caveat: normally you would have a single TTS config. For demo purposes, I've built it this way to show how to implement the configs for all TTS vendors supported by Agora's Convo AI Engine.

Create the file convoai-utils.go:

touch convoai/convoai-utils.go

Add the following content:

package convoai

import (
 "encoding/base64"
 "errors"
 "fmt"
 "strconv"
)

func (s *ConvoAIService) getBasicAuth() string {
 auth := fmt.Sprintf("%s:%s", s.config.CustomerID, s.config.CustomerSecret)
 return "Basic " + base64.StdEncoding.EncodeToString([]byte(auth))
}

// Helper function to check if the string is purely numeric (false) or contains any non-digit characters (true)
func isStringUID(s string) bool {
 for _, r := range s {
  if r < '0' || r > '9' {
   return true // Contains non-digit character
  }
 }
 return false // Contains only digits
}

// getTTSConfig returns the appropriate TTS configuration based on the configured vendor
func (s *ConvoAIService) getTTSConfig() (*TTSConfig, error) {
 switch s.config.TTSVendor {
 case string(TTSVendorMicrosoft):
  if s.config.MicrosoftTTS == nil ||
   s.config.MicrosoftTTS.Key == "" ||
   s.config.MicrosoftTTS.Region == "" ||
   s.config.MicrosoftTTS.VoiceName == "" ||
   s.config.MicrosoftTTS.Rate == "" ||
   s.config.MicrosoftTTS.Volume == "" {
   return nil, fmt.Errorf("missing Microsoft TTS configuration")
  }
  // Convert rate and volume from string to float64
  rate, err := strconv.ParseFloat(s.config.MicrosoftTTS.Rate, 64)
  if err != nil {
   return nil, fmt.Errorf("invalid rate value: %v", err)
  }
  volume, err := strconv.ParseFloat(s.config.MicrosoftTTS.Volume, 64)
  if err != nil {
   return nil, fmt.Errorf("invalid volume value: %v", err)
  }
  return &TTSConfig{
   Vendor: TTSVendorMicrosoft,
   Params: map[string]interface{}{
    "key":        s.config.MicrosoftTTS.Key,
    "region":     s.config.MicrosoftTTS.Region,
    "voice_name": s.config.MicrosoftTTS.VoiceName,
    "rate":       rate,
    "volume":     volume,
   },
  }, nil
 case string(TTSVendorElevenLabs):
  if s.config.ElevenLabsTTS == nil ||
   s.config.ElevenLabsTTS.Key == "" ||
   s.config.ElevenLabsTTS.ModelID == "" ||
   s.config.ElevenLabsTTS.VoiceID == "" {
   return nil, fmt.Errorf("missing ElevenLabs TTS configuration")
  }
  return &TTSConfig{
   Vendor: TTSVendorElevenLabs,
   Params: map[string]interface{}{
    "api_key":  s.config.ElevenLabsTTS.Key,
    "model_id": s.config.ElevenLabsTTS.ModelID,
    "voice_id": s.config.ElevenLabsTTS.VoiceID,
   },
  }, nil
 default:
  return nil, fmt.Errorf("unsupported TTS vendor: %s", s.config.TTSVendor)
 }
}

// validateInviteRequest validates the invite agent request
func (s *ConvoAIService) validateInviteRequest(req *InviteAgentRequest) error {
 if req.RequesterID == "" {
  return errors.New("requester_id is required")
 }
 if req.ChannelName == "" {
  return errors.New("channel_name is required")
 }

 // Validate channel_name length
 if len(req.ChannelName) < 3 || len(req.ChannelName) > 64 {
  return errors.New("channel_name length must be between 3 and 64 characters")
 }
 return nil
}

// validateRemoveRequest validates the remove agent request
func (s *ConvoAIService) validateRemoveRequest(req *RemoveAgentRequest) error {
 if req.AgentID == "" {
  return errors.New("agent_id is required")
 }
 return nil
}

HTTP Headers

To handle all header-related logic, create the httpHeaders.go file:

touch http_headers/httpHeaders.go

Add the following content:

package http_headers

import (
 "net/http"
 "strings"
 "time"
 "github.com/gin-gonic/gin"
)

// HttpHeaders holds configurations for handling requests, such as CORS settings.
type HttpHeaders struct {
 AllowOrigin string // Comma-separated list of origins allowed to access the resources, or "*" for any origin.
}

// NewHttpHeaders initializes and returns a new HttpHeaders object with the specified CORS settings.
func NewHttpHeaders(allowOrigin string) *HttpHeaders {
 return &HttpHeaders{AllowOrigin: allowOrigin}
}

// NoCache sets HTTP headers to prevent client-side caching of responses.
func (m *HttpHeaders) NoCache() gin.HandlerFunc {
 return func(c *gin.Context) {
  // Set multiple cache-related headers to ensure responses are not cached.
  c.Header("Cache-Control", "private, no-cache, no-store, must-revalidate")
  c.Header("Expires", "-1")
  c.Header("Pragma", "no-cache")
 }
}

// CORShttpHeaders adds CORS (Cross-Origin Resource Sharing) headers to responses and handles pre-flight requests.
// It allows web applications at different domains to interact more securely.
func (m *HttpHeaders) CORShttpHeaders() gin.HandlerFunc {
 return func(c *gin.Context) {
  origin := c.Request.Header.Get("Origin")
  // Check if the origin of the request is allowed to access the resource.
  if !m.isOriginAllowed(origin) {
   // If not allowed, return a JSON error and abort the request.
   c.Header("Content-Type", "application/json")
   c.JSON(http.StatusForbidden, gin.H{
    "error": "Origin not allowed",
   })
   c.Abort()
   return
  }

  // Set CORS headers to allow requests from the specified origin.
  c.Header("Access-Control-Allow-Origin", origin)
  c.Header("Access-Control-Allow-Methods", "GET, POST, DELETE, PATCH, OPTIONS")
  c.Header("Access-Control-Allow-Headers", "Origin, Content-Type")
  // Handle pre-flight OPTIONS requests.
  if c.Request.Method == "OPTIONS" {
   c.AbortWithStatus(http.StatusNoContent)
   return
  }
  c.Next()
 }
}

// isOriginAllowed checks whether the provided origin is in the list of allowed origins.
func (m *HttpHeaders) isOriginAllowed(origin string) bool {
 if m.AllowOrigin == "*" {
  // Allow any origin if the configured setting is "*".
  return true
 }
 allowedOrigins := strings.Split(m.AllowOrigin, ",")
 for _, allowed := range allowedOrigins {
  if origin == strings.TrimSpace(allowed) { // tolerate spaces after commas in CORS_ALLOW_ORIGIN
   return true
  }
 }
 return false
}

// Timestamp adds a timestamp header to responses.
// This can be useful for debugging and logging purposes to track when a request was handled.
func (m *HttpHeaders) Timestamp() gin.HandlerFunc {
 return func(c *gin.Context) {
  // Set the header before calling c.Next(): once a handler has written the
  // response, the headers have already been sent and later changes are ignored.
  timestamp := time.Now().Format(time.RFC3339)
  c.Writer.Header().Set("X-Timestamp", timestamp)
  c.Next() // Proceed to the next middleware/handler.
 }
}

Update Main Server

Let’s update our main.go file to add our headers and register the convoai-service.

Open main.go and add:

import (
	// Previous imports remain the same
	"github.com/AgoraIO-Community/convo-ai-go-server/convoai"
	"github.com/AgoraIO-Community/convo-ai-go-server/http_headers"
	"github.com/AgoraIO-Community/convo-ai-go-server/token_service"
)

// Previous code remains the same..
func setupServer() *http.Server {
    // Previous code remains the same..
    // Set up router with headers
    router := gin.Default()
    // Replace headers TODO:
    var httpHeaders = http_headers.NewHttpHeaders(corsAllowOrigin)
    router.Use(httpHeaders.NoCache())
    router.Use(httpHeaders.CORShttpHeaders())
    router.Use(httpHeaders.Timestamp())

    // Initialize services & register routes
    tokenService := token_service.NewTokenService(config.AppID, config.AppCertificate)
    tokenService.RegisterRoutes(router)
    convoAIService := convoai.NewConvoAIService(config, tokenService)
    convoAIService.RegisterRoutes(router)

// Rest of the code remains the same...

By now, you’ve noticed that we added a token service that doesn’t exist. Ignore the error for now because in the next step, we’ll implement the token service, which will make it easier to test and integrate with front-end applications.

Token Generation

In the convoai-service we use a token service. You could tie this to your auth service and have it generate the tokens, but for this guide we'll implement a token service that serves both the convoai-service and, if needed, our client apps.

Explaining this code is a bit outside the scope of this guide, but if you are new to tokens, I would recommend checking out my guide Building a Token Server for Agora Applications using Golang.

Token Service

Create the token service and handler files:

touch token_service/token-service.go
touch token_service/token_handlers.go

First, add the token service definition in token-service.go:

package token_service

import (
 "encoding/json"
 "net/http"
 "os"
 "github.com/gin-gonic/gin"
)

// TokenService represents the main application token service.
type TokenService struct {
 Server         *http.Server   // The HTTP server for the application
 Sigint         chan os.Signal // Channel to handle OS signals, such as Ctrl+C
 appID          string         // The Agora app ID
 appCertificate string         // The Agora app certificate
}

// TokenRequest is a struct representing the JSON payload structure for token generation requests.
type TokenRequest struct {
 TokenType         string `json:"tokenType"`         // The token type: "rtc", "rtm", or "chat"
 Channel           string `json:"channel,omitempty"` // The channel name (used for RTC and RTM tokens)
 RtcRole           string `json:"role,omitempty"`    // The role of the user for RTC tokens (publisher or subscriber)
 Uid               string `json:"uid,omitempty"`     // The user ID or account (used for RTC, RTM, and some chat tokens)
 ExpirationSeconds int    `json:"expire,omitempty"`  // The token expiration time in seconds (used for all token types)
}

// NewTokenService initializes and returns a TokenService pointer with all configurations set.
func NewTokenService(appIDEnv string, appCertEnv string) *TokenService {
 return &TokenService{
  appID:          appIDEnv,
  appCertificate: appCertEnv,
 }
}

// RegisterRoutes registers the routes for the TokenService.
func (s *TokenService) RegisterRoutes(r *gin.Engine) {
 api := r.Group("/token")
 api.POST("/getNew", s.GetToken)
}

// GetToken handles the HTTP request to generate a token based on the provided TokenRequest.
func (s *TokenService) GetToken(c *gin.Context) {
 var req = c.Request
 var respWriter = c.Writer
 var tokenReq TokenRequest
 // Parse the request body into a TokenRequest struct
 err := json.NewDecoder(req.Body).Decode(&tokenReq)
 if err != nil {
  http.Error(respWriter, err.Error(), http.StatusBadRequest)
  return
 }
 s.HandleGetToken(tokenReq, respWriter)
}
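
To see what a client would send to the /token/getNew route, the sketch below marshals a TokenRequest and prints the resulting JSON. The struct is copied from token-service.go; tokenRequestBody is a hypothetical helper, and the channel and uid values are placeholders.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TokenRequest mirrors the struct defined in token-service.go.
type TokenRequest struct {
	TokenType         string `json:"tokenType"`
	Channel           string `json:"channel,omitempty"`
	RtcRole           string `json:"role,omitempty"`
	Uid               string `json:"uid,omitempty"`
	ExpirationSeconds int    `json:"expire,omitempty"`
}

// tokenRequestBody returns the JSON body a client would POST to /token/getNew.
func tokenRequestBody(req TokenRequest) (string, error) {
	b, err := json.Marshal(req)
	return string(b), err
}

func main() {
	// Placeholder channel and uid values.
	body, err := tokenRequestBody(TokenRequest{
		TokenType:         "rtc",
		Channel:           "demo-channel",
		RtcRole:           "publisher",
		Uid:               "1234",
		ExpirationSeconds: 3600,
	})
	if err != nil {
		fmt.Println("marshal error:", err)
		return
	}
	fmt.Println(body)
}
```

Note that the JSON keys (tokenType, role, expire) differ from the Go field names, so client code should follow the struct tags rather than the field names.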

Next, add the token handlers in token_handlers.go:

package token_service

import (
 "encoding/json"
 "errors"
 "net/http"
 "strconv"
 "github.com/AgoraIO-Community/go-tokenbuilder/chatTokenBuilder"
 rtctokenbuilder2 "github.com/AgoraIO-Community/go-tokenbuilder/rtctokenbuilder"
 rtmtokenbuilder2 "github.com/AgoraIO-Community/go-tokenbuilder/rtmtokenbuilder"
)

// HandleGetToken handles the HTTP request to generate a token based on the provided tokenType.
func (s *TokenService) HandleGetToken(tokenReq TokenRequest, w http.ResponseWriter) {
 var token string
 var tokenErr error
 switch tokenReq.TokenType {
 case "rtc":
  token, tokenErr = s.GenRtcToken(tokenReq)
 case "rtm":
  token, tokenErr = s.GenRtmToken(tokenReq)
 case "chat":
  token, tokenErr = s.GenChatToken(tokenReq)
 default:
  http.Error(w, "Unsupported tokenType", http.StatusBadRequest)
  return
 }
 if tokenErr != nil {
  http.Error(w, tokenErr.Error(), http.StatusBadRequest)
  return
 }
 response := struct {
  Token string `json:"token"`
 }{Token: token}
 w.Header().Set("Content-Type", "application/json")
 w.WriteHeader(http.StatusOK)
 json.NewEncoder(w).Encode(response)
}

// GenRtcToken generates an RTC token based on the provided TokenRequest and returns it.
func (s *TokenService) GenRtcToken(tokenRequest TokenRequest) (string, error) {
 if tokenRequest.Channel == "" {
  return "", errors.New("invalid: missing channel name")
 }
 if tokenRequest.Uid == "" {
  return "", errors.New("invalid: missing user ID or account")
 }
 var userRole rtctokenbuilder2.Role
 if tokenRequest.RtcRole == "publisher" {
  userRole = rtctokenbuilder2.RolePublisher
 } else {
  userRole = rtctokenbuilder2.RoleSubscriber
 }
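 if tokenRequest.ExpirationSeconds <= 0 {
  tokenRequest.ExpirationSeconds = 3600 // default to one hour; also keeps a negative value from wrapping in the uint32 cast
 }
 // Parse with bitSize 32 so a value too large for uint32 falls through to the account path instead of being truncated
 uid64, parseErr := strconv.ParseUint(tokenRequest.Uid, 10, 32)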
 if parseErr != nil {
  return rtctokenbuilder2.BuildTokenWithAccount(
   s.appID, s.appCertificate, tokenRequest.Channel,
   tokenRequest.Uid, userRole, uint32(tokenRequest.ExpirationSeconds),
  )
 }
 return rtctokenbuilder2.BuildTokenWithUid(
  s.appID, s.appCertificate, tokenRequest.Channel,
  uint32(uid64), userRole, uint32(tokenRequest.ExpirationSeconds),
 )
}

// GenRtmToken generates an RTM (Real-Time Messaging) token based on the provided TokenRequest and returns it.
func (s *TokenService) GenRtmToken(tokenRequest TokenRequest) (string, error) {
 if tokenRequest.Uid == "" {
  return "", errors.New("invalid: missing user ID or account")
 }
 if tokenRequest.ExpirationSeconds <= 0 { // also covers negative values, which would wrap in the uint32 cast
  tokenRequest.ExpirationSeconds = 3600
 }
 return rtmtokenbuilder2.BuildToken(
  s.appID, s.appCertificate,
  tokenRequest.Uid,
  uint32(tokenRequest.ExpirationSeconds),
  tokenRequest.Channel,
 )
}

// GenChatToken generates a chat token based on the provided TokenRequest and returns it.
func (s *TokenService) GenChatToken(tokenRequest TokenRequest) (string, error) {
 if tokenRequest.ExpirationSeconds <= 0 { // also covers negative values, which would wrap in the uint32 cast
  tokenRequest.ExpirationSeconds = 3600
 }
 var chatToken string
 var tokenErr error
 if tokenRequest.Uid == "" {
  chatToken, tokenErr = chatTokenBuilder.BuildChatAppToken(
   s.appID, s.appCertificate, uint32(tokenRequest.ExpirationSeconds),
  )
 } else {
  chatToken, tokenErr = chatTokenBuilder.BuildChatUserToken(
   s.appID, s.appCertificate,
   tokenRequest.Uid,
   uint32(tokenRequest.ExpirationSeconds),
  )
 }
 return chatToken, tokenErr
}
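
GenRtcToken chooses between a numeric UID and a string account based on whether the uid field parses as an unsigned integer. That decision can be isolated and tested on its own. The following is a minimal sketch, assuming a hypothetical helper named classifyUID that is not part of the tutorial's code. It parses with a 32-bit limit, so a value that does not fit in uint32 is treated as a string account rather than being truncated.

```go
package main

import (
	"fmt"
	"strconv"
)

// classifyUID (hypothetical helper) reports whether uid fits in an unsigned
// 32-bit integer. When it does, the numeric value is returned; otherwise the
// caller should treat uid as a string account.
func classifyUID(uid string) (numeric uint32, isNumeric bool) {
	// bitSize 32 makes ParseUint return an error for values above 4294967295.
	n, err := strconv.ParseUint(uid, 10, 32)
	if err != nil {
		return 0, false
	}
	return uint32(n), true
}

func main() {
	for _, uid := range []string{"1234", "alice", "4294967296", "-5"} {
		if n, ok := classifyUID(uid); ok {
			fmt.Printf("%q -> numeric UID %d\n", uid, n)
		} else {
			fmt.Printf("%q -> string account\n", uid)
		}
	}
}
```

Of the four sample inputs, only "1234" is classified as numeric. The other three are non-numeric, out of range, or negative, so they take the string-account path.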

With token generation in place, let’s validate the environment configuration so the server fails fast when a required setting is missing.

Environment Validation

Next, add a validation utility that checks that all required environment variables are set. Create the file validation/validation.go:

touch validation/validation.go

Add the following content:

package validation

import (
 "errors"
 "strings"
 "github.com/AgoraIO-Community/convo-ai-go-server/convoai"
)

// ValidateEnvironment checks if all required environment variables are set
func ValidateEnvironment(config *convoai.ConvoAIConfig) error {

 // Validate Agora Configuration
 if config.AppID == "" || config.AppCertificate == "" {
  return errors.New("config error: Agora credentials (APP_ID, APP_CERTIFICATE) are not set")
 }

 if config.CustomerID == "" || config.CustomerSecret == "" || config.BaseURL == "" {
  return errors.New("config error: Agora Conversational AI credentials (CUSTOMER_ID, CUSTOMER_SECRET, BASE_URL) are not set")
 }

 // Validate LLM Configuration
 if config.LLMURL == "" || config.LLMToken == "" {
  return errors.New("config error: LLM configuration (LLM_URL, LLM_TOKEN) is not set")
 }

 // Validate TTS Configuration
 if config.TTSVendor == "" {
  return errors.New("config error: TTS_VENDOR is not set")
 }

 if err := validateTTSConfig(config); err != nil {
  return err
 }

 // Validate Modalities (optional, using defaults if not set)
 if config.InputModalities != "" && !validateModalities(config.InputModalities) {
  return errors.New("config error: Invalid INPUT_MODALITIES format")
 }

 if config.OutputModalities != "" && !validateModalities(config.OutputModalities) {
  return errors.New("config error: Invalid OUTPUT_MODALITIES format")
 }
 return nil
}

// Validates the TTS configuration based on the vendor
func validateTTSConfig(config *convoai.ConvoAIConfig) error {
 switch config.TTSVendor {
 case "microsoft":
  if config.MicrosoftTTS == nil {
   return errors.New("config error: Microsoft TTS configuration is missing")
  }
  if config.MicrosoftTTS.Key == "" ||
   config.MicrosoftTTS.Region == "" ||
   config.MicrosoftTTS.VoiceName == "" {
   return errors.New("config error: Microsoft TTS configuration is incomplete")
  }
 case "elevenlabs":
  if config.ElevenLabsTTS == nil {
   return errors.New("config error: ElevenLabs TTS configuration is missing")
  }
  if config.ElevenLabsTTS.Key == "" ||
   config.ElevenLabsTTS.VoiceID == "" ||
   config.ElevenLabsTTS.ModelID == "" {
   return errors.New("config error: ElevenLabs TTS configuration is incomplete")
  }
 default:
  return errors.New("config error: Unsupported TTS vendor: " + config.TTSVendor)
 }
 return nil
}

// Checks if the modalities string is properly formatted
func validateModalities(modalities string) bool {
 // map of valid modalities
 validModalities := map[string]bool{
  "text":  true,
  "audio": true,
 }

 // split the modalities string and check if each modality is valid
 for _, modality := range strings.Split(modalities, ",") {
  if !validModalities[strings.TrimSpace(modality)] {
   return false
  }
 }
 return true
}

This validation utility ensures that all required environment variables are properly set before the server starts.
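
To see how the modalities check treats different inputs, it can be run outside the project. The sketch below is a standalone copy of the same logic under the hypothetical name validModalities, and the sample inputs are illustrative only.

```go
package main

import (
	"fmt"
	"strings"
)

// validModalities mirrors the validation package's check: a comma-separated
// list in which every entry, after trimming spaces, must be "text" or "audio".
func validModalities(list string) bool {
	allowed := map[string]bool{"text": true, "audio": true}
	for _, m := range strings.Split(list, ",") {
		if !allowed[strings.TrimSpace(m)] {
			return false
		}
	}
	return true
}

func main() {
	for _, in := range []string{"text", "text, audio", "text,video", "text,"} {
		fmt.Printf("%-14q valid=%v\n", in, validModalities(in))
	}
}
```

A trailing comma is rejected, because splitting "text," yields an empty entry that is not in the allowed set. Spaces around entries are accepted.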

Open main.go and update the setupServer function to use the validation utility:

// Just below: Load configuration
// Replace the TODO: comment with the following:
    // Validate environment configuration
    if err := validation.ValidateEnvironment(config); err != nil {
        log.Fatal("FATAL ERROR: ", err)
    }
// Rest of the code remains the same...

Running the Server

Now that we have all the components in place, let’s run the server. First, make sure you have set up the .env file with all the necessary credentials. The server will automatically load these environment variables at startup.

Build and run the server:

go build -o server
./server

If you’ve set up everything correctly, you should see the server starting up and listening on the configured port (default is 8080).

Testing the Server

Before testing the endpoints, make sure you have a client-side app running. You can use any application that implements Agora’s video SDK (web, mobile, or desktop). If you don’t have an app, you can use Agora’s Voice Demo. Just make sure to request a token before joining the channel.

Let’s test our API endpoints using curl:

1. Generate a Token

curl -X POST http://localhost:8080/token/getNew \
  -H "Content-Type: application/json" \
  -d '{
    "tokenType": "rtc",
    "channel": "test-channel",
    "uid": "1234",
    "role": "publisher"
  }'

Expected response:

{
  "token": "007eJxTYBAxNdgrlvnEfm3o..."
}

2. Invite an AI Agent

curl -X POST http://localhost:8080/agent/invite \
  -H "Content-Type: application/json" \
  -d '{
    "requester_id": "1234",
    "channel_name": "test-channel",
    "input_modalities": ["text"],
    "output_modalities": ["text", "audio"]
  }'

Expected response:

{
  "agent_id": "agent-123abc",
  "create_ts": 1665481725000,
  "status": "RUNNING"
}

3. Remove an AI Agent

curl -X POST http://localhost:8080/agent/remove \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent-123abc"
  }'

Expected response:

{
  "success": true,
  "agent_id": "agent-123abc"
}

Customizations

Agora Conversational AI Engine supports several customizations.

Customizing the Agent

In the convoai_handler_invite.go file, you can modify the system message to customize the agent's behavior:

systemMessage := SystemMessage{
    Role:    "system",
    Content: "You are a technical support specialist named Alex. Your responses should be friendly but concise, focused on helping users solve their technical problems. Use simple language but don't oversimplify technical concepts.",
}

You can also update the greeting message to control the initial message the agent speaks when joining the channel:

LLM: LLM{
    // ... other configurations
    GreetingMessage: "Hello! I'm Alex, your technical support specialist. How can I assist you today?",
    FailureMessage:  "I'm processing your request. Please give me a moment.",
    // ... rest of the configuration
}

Customizing Speech Synthesis

Choose the right voice for your application by exploring the voice libraries:

For Microsoft Azure TTS: Visit the Microsoft Azure TTS Voice Gallery
For ElevenLabs TTS: Explore the ElevenLabs Voice Library

Update the .env file with the appropriate voice settings.

Fine-tuning Voice Activity Detection

Adjust VAD settings in convoai_handler_invite.go to optimize conversation flow:

VAD: VAD{
    SilenceDurationMS:   600,   // How long to wait after silence to end turn
    SpeechDurationMS:    10000, // Maximum duration for a single speech segment
    Threshold:           0.6,   // Speech detection sensitivity
    InterruptDurationMS: 200,   // How quickly interruptions are detected
    PrefixPaddingMS:     400,   // Audio padding at the beginning of speech
},

Complete Environment Variables Reference

Here’s a complete list of environment variables for your .env file:

# Server Configuration
PORT=8080
CORS_ALLOW_ORIGIN=*

# Agora Configuration
AGORA_APP_ID=your_app_id
AGORA_APP_CERTIFICATE=your_app_certificate
AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects
AGORA_CUSTOMER_ID=your_customer_id
AGORA_CUSTOMER_SECRET=your_customer_secret
AGENT_UID=Agent

# LLM Configuration
LLM_URL=https://api.openai.com/v1/chat/completions
LLM_TOKEN=your_openai_api_key
LLM_MODEL=gpt-4o-mini

# Input/Output Modalities
INPUT_MODALITIES=text
OUTPUT_MODALITIES=text,audio

# TTS Configuration
TTS_VENDOR=microsoft  # or elevenlabs
# Microsoft TTS Configuration
MICROSOFT_TTS_KEY=your_microsoft_tts_key
MICROSOFT_TTS_REGION=your_microsoft_tts_region
MICROSOFT_TTS_VOICE_NAME=en-US-GuyNeural
MICROSOFT_TTS_RATE=1.0
MICROSOFT_TTS_VOLUME=100.0

# ElevenLabs TTS Configuration
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
ELEVENLABS_MODEL_ID=eleven_monolingual_v1

Next Steps

Congratulations! You’ve built a Go server that integrates with Agora’s Conversational AI Engine. Take this microservice and integrate it with your existing Agora backends.

For more information about Agora’s Conversational AI Engine, check out the official documentation.

For the full source code, check out the GitHub repository.

Happy building!
