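Conversational AI is changing how people interact with artificial intelligence. Instead of trying to describe everything in a single text prompt, users can hold a natural, real-time voice conversation with an AI agent. This opens up exciting opportunities for intuitive and efficient interactions.

Many developers have already invested significant time building custom LLM workflows for text-based agents. Agora's Conversational AI Engine connects these existing workflows to an Agora channel, enabling real-time voice conversations without abandoning your current AI infrastructure.

In this guide, we'll build a Fastify backend server that handles the connection between your users and Agora's Conversational AI. By the end, you'll have a production-ready backend that can power voice-based AI conversations in your application.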

Prerequisites

Before starting, make sure you have:

Node.js (v18 or higher)
Basic knowledge of TypeScript and Fastify
An Agora account (the first 10,000 minutes each month are free)
The Conversational AI service enabled for your App ID

Project Setup

Let's set up a Fastify server with TypeScript. We'll create a new project and install the necessary dependencies.

First, create a new directory and initialize the project:

mkdir agora-convo-ai-server
cd agora-convo-ai-server
npm init -y

Next, install the required dependencies:

npm install fastify @fastify/cors dotenv agora-token
npm install --save-dev typescript ts-node nodemon @types/node

Initialize TypeScript in your project by creating a tsconfig.json file in the root directory:

{
  "compilerOptions": {
    "target": "ES2020",
    "module": "commonjs",
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  },
  "include": ["src/**/*"],
  "exclude": ["node_modules"]
}

Now update the scripts section of your package.json file to use TypeScript:

"scripts": {
  "start": "node dist/index.js",
  "dev": "ts-node src/index.ts",
  "build": "tsc"
}

As we work through this guide, you'll need to create new files in specific directories, so let's create those directories before we start.

From the project root, create the src/routes/, src/types/, and src/utils/ directories, and add a .env file:

mkdir -p src/routes src/types src/utils
touch .env

Your project directory should now have a structure like this:

├── node_modules/
├── src/
│   ├── routes/
│   ├── types/
│   └── utils/
├── .env
├── package.json
├── package-lock.json
└── tsconfig.json

Fastify Server Setup

We'll implement the server entry point with a Fastify instance and include a basic health check endpoint.

Create the src/index.ts file:

touch src/index.ts

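For now, we'll create a basic Fastify app and add functionality step by step as we go through the guide. The code includes comments throughout to help you follow along.

At a high level, we set up a new Fastify app and configure a simple route structure to handle requests. We also create a ping endpoint that can be used for health checks.

Add the following code to src/index.ts: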

import Fastify from 'fastify';
import cors from '@fastify/cors';
import dotenv from 'dotenv';

// Load environment variables
dotenv.config();

// Create Fastify instance
const fastify = Fastify({
  logger: true,
});

// Register CORS to allow cross-origin requests (important for web clients)
fastify.register(cors, {
  origin: '*', // In production, restrict this to your frontend domain
});

// Define a health check route
fastify.get('/ping', async () => {
  return { message: 'pong' };
});

// Start the server
const start = async () => {
  try {
    // Use provided port or default to 3030
    const port = parseInt(process.env.PORT || '3030');

    await fastify.listen({ port, host: '0.0.0.0' });
    console.log(`Server is running on port ${port}`);
  } catch (err) {
    fastify.log.error(err);
    process.exit(1);
  }
};

start();

Note: We load PORT from the environment variables. If it isn't set in your .env file, it defaults to 3030.
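If you want the fallback to also cover invalid values (for example a non-numeric PORT), the lookup could be pulled into a small helper. The following is only a sketch: resolvePort is a hypothetical name that is not part of the server code above, and the 1 to 65535 range check is an assumption of this example.

```typescript
// Hypothetical helper: read PORT from an environment-like object and fall back
// to a default when the value is missing, non-numeric, or out of range.
function resolvePort(
  env: Record<string, string | undefined>,
  fallback = 3030
): number {
  const parsed = Number.parseInt(env.PORT ?? '', 10);
  const isValid = Number.isInteger(parsed) && parsed > 0 && parsed <= 65535;
  return isValid ? parsed : fallback;
}

console.log(resolvePort({ PORT: '8080' })); // 8080
console.log(resolvePort({})); // 3030
```

In the server you would pass process.env to the helper instead of a literal object.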

To test the basic Fastify app, run:

npm run dev

You should see "Server is running on port 3030" in the console. You can now open http://localhost:3030/ping to verify that the server is working.

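Agora Conversational AI Routes

The real power of our server comes from the Agora Conversational AI integration. First, let's get the boring part out of the way and create a file for the types we need to work with Agora's Conversational AI API: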

touch src/types/agora-convo-ai-types.ts

Add the following types to src/types/agora-convo-ai-types.ts:

export enum TTSVendor {
  Microsoft = 'microsoft',
  ElevenLabs = 'elevenlabs',
}

// Response from Agora when adding an agent to a channel
export interface AgentResponse {
  agent_id: string;
  create_ts: number;
  state: string;
}

// Request body for Agora's API to join a conversation
export interface AgoraStartRequest {
  name: string;
  properties: {
    channel: string;
    token: string;
    agent_rtc_uid: string;
    remote_rtc_uids: string[];
    enable_string_uid?: boolean;
    idle_timeout?: number;
    advanced_features?: {
      enable_aivad?: boolean;
      enable_bhvs?: boolean;
    };
    asr: {
      language: string;
      task?: string;
    };
    llm: {
      url?: string;
      api_key?: string;
      system_messages: Array<{
        role: string;
        content: string;
      }>;
      greeting_message: string;
      failure_message: string;
      max_history?: number;
      input_modalities?: string[];
      output_modalities?: string[];
      params: {
        model: string;
        max_tokens: number;
        temperature?: number;
        top_p?: number;
      };
    };
    vad: {
      silence_duration_ms: number;
      speech_duration_ms?: number;
      threshold?: number;
      interrupt_duration_ms?: number;
      prefix_padding_ms?: number;
    };
    tts: TTSConfig;
  };
}

export interface TTSConfig {
  vendor: TTSVendor;
  params: MicrosoftTTSParams | ElevenLabsTTSParams;
}

interface MicrosoftTTSParams {
  key: string;
  region: string;
  voice_name: string;
  rate?: number;
  volume?: number;
}

interface ElevenLabsTTSParams {
  key: string;
  voice_id: string;
  model_id: string;
}

export interface AgoraTokenData {
  token: string;
  uid: string;
  channel: string;
  agentId?: string;
}

Now let's define the client request types.

Create client-request-types.ts:

touch src/types/client-request-types.ts

Add the following interfaces to src/types/client-request-types.ts:

export interface InviteAgentRequest {
  requester_id: string | number;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}

export interface RemoveAgentRequest {
  agent_id: string;
}

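These new types give us a picture of all the pieces we'll assemble in the next steps. We take the client request, use it to build an AgoraStartRequest, and send it to Agora's Conversational AI Engine, which then adds the agent to the conversation.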
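TypeScript interfaces disappear at runtime, so an incoming JSON body is not checked against InviteAgentRequest automatically. As a rough illustration, a type guard like the one below could narrow an unknown value to that shape. The function name isInviteAgentRequest is hypothetical, and the interface is repeated here only to keep the sketch self-contained.

```typescript
interface InviteAgentRequest {
  requester_id: string | number;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}

// Returns true when the value is undefined or an array containing only strings.
function isOptionalStringArray(value: unknown): boolean {
  return (
    value === undefined ||
    (Array.isArray(value) && value.every((item) => typeof item === 'string'))
  );
}

// Hypothetical runtime guard that narrows an unknown value to InviteAgentRequest.
function isInviteAgentRequest(value: unknown): value is InviteAgentRequest {
  if (typeof value !== 'object' || value === null) return false;
  const body = value as Record<string, unknown>;
  const idOk =
    typeof body.requester_id === 'string' ||
    typeof body.requester_id === 'number';
  return (
    idOk &&
    typeof body.channel_name === 'string' &&
    isOptionalStringArray(body.input_modalities) &&
    isOptionalStringArray(body.output_modalities)
  );
}

console.log(isInviteAgentRequest({ requester_id: 1234, channel_name: 'test' })); // true
console.log(isInviteAgentRequest({ requester_id: 1234 })); // false
```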

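Agent Routes

With our types defined, let's implement the agent routes for inviting an agent to a conversation and removing it.

Create the agent route file:

touch src/routes/agent.ts

Start by importing the Fastify types, our new types, and the agora-token library, since we'll need to generate a token for the agent. Then define the agentRoutes function.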

import { FastifyInstance, FastifyRequest, FastifyReply } from 'fastify';
import { RtcTokenBuilder, RtcRole } from 'agora-token';
import {
  AgoraStartRequest,
  TTSConfig,
  TTSVendor,
  AgentResponse,
} from '../types/agora-convo-ai-types';
import {
  InviteAgentRequest,
  RemoveAgentRequest,
} from '../types/client-request-types';

/**
 * Registers agent-related routes for Agora Conversational AI
 */
export async function agentRoutes(fastify: FastifyInstance) {
  // TODO: Add the routes here
}

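Invite Agent Route

First we'll implement the /agent/invite endpoint. This route needs to handle several key tasks:

Parse the client request and use it to build a start request for Agora's Conversational AI Engine.
Generate a token so the AI agent can access the RTC channel.
Configure text-to-speech (Microsoft or ElevenLabs).
Define the AI agent's prompt and greeting message.
Configure Voice Activity Detection (VAD) to control the conversation flow.
Send the start request to Agora's Conversational AI Engine.
Return a response to the client that includes the agent ID from the Conversational AI Engine's response.

Add the following code to the agentRoutes function: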

/**
 * POST /agent/invite
 * Invites an AI agent to join a specified channel
 */
fastify.post<{
  Body: InviteAgentRequest;
}>('/invite', async (request, reply) => {
  try {
    // Extract request parameters
    const {
      requester_id,
      channel_name,
      input_modalities = ['text'],
      output_modalities = ['text', 'audio'],
    } = request.body;

    // Validate environment variables
    if (!process.env.AGORA_APP_ID || !process.env.AGORA_APP_CERTIFICATE) {
      return reply
        .code(500)
        .send({ error: 'Agora credentials not configured' });
    }

    if (
      !process.env.AGORA_CONVO_AI_BASE_URL ||
      !process.env.AGORA_CUSTOMER_ID ||
      !process.env.AGORA_CUSTOMER_SECRET
    ) {
      return reply.code(500).send({
        error: 'Agora Conversational AI credentials not configured',
      });
    }

    // Configure agent and generate token
    const agentUid = process.env.AGENT_UID || 'Agent';
    const timestamp = Date.now();
    const expirationTime = Math.floor(timestamp / 1000) + 3600; // 1 hour

    // Generate a token for the agent to join the channel
    const token = RtcTokenBuilder.buildTokenWithUid(
      process.env.AGORA_APP_ID,
      process.env.AGORA_APP_CERTIFICATE,
      channel_name,
      agentUid,
      RtcRole.PUBLISHER,
      expirationTime,
      expirationTime
    );

    // Configure TTS settings based on environment variables
    const ttsVendor =
      (process.env.TTS_VENDOR as TTSVendor) || TTSVendor.Microsoft;
    const ttsConfig = getTTSConfig(ttsVendor);

    // Convert requester_id to string (Agora API expects string values)
    const requesterUid = requester_id.toString();

    // Create a descriptive name for this conversation instance
    const random = Math.random().toString(36).substring(2, 8);
    const uniqueName = `conversation-${timestamp}-${random}`;

    // Prepare request for Agora's Conversational AI API
    const requestBody: AgoraStartRequest = {
      name: uniqueName,
      properties: {
        channel: channel_name,
        token: token,
        agent_rtc_uid: agentUid,
        remote_rtc_uids: [requesterUid],
        enable_string_uid: /[a-zA-Z]/.test(requesterUid),
        idle_timeout: 30,
        asr: {
          language: 'en-US',
          task: 'conversation',
        },
        llm: {
          url: process.env.LLM_URL,
          api_key: process.env.LLM_TOKEN,
          system_messages: [
            {
              role: 'system',
              content:
                'You are a helpful assistant. Speak naturally and concisely.',
            },
          ],
          greeting_message: 'Hello! How can I assist you today?',
          failure_message: 'Please wait a moment.',
          max_history: 10,
          params: {
            model: process.env.LLM_MODEL || 'gpt-3.5-turbo',
            max_tokens: 1024,
            temperature: 0.7,
            top_p: 0.95,
          },
          input_modalities: input_modalities,
          output_modalities: output_modalities,
        },
        tts: ttsConfig,
        vad: {
          silence_duration_ms: 480,
          speech_duration_ms: 15000,
          threshold: 0.5,
          interrupt_duration_ms: 160,
          prefix_padding_ms: 300,
        },
        // These advanced features require special account permissions
        advanced_features: {
          enable_aivad: false,
          enable_bhvs: false,
        },
      },
    };

    // Send request to Agora Conversational AI API
    const response = await fetch(
      `${process.env.AGORA_CONVO_AI_BASE_URL}/${process.env.AGORA_APP_ID}/join`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Basic ${Buffer.from(
            `${process.env.AGORA_CUSTOMER_ID}:${process.env.AGORA_CUSTOMER_SECRET}`
          ).toString('base64')}`,
        },
        body: JSON.stringify(requestBody),
      }
    );

    // Handle API response
    if (!response.ok) {
      const errorText = await response.text();
      fastify.log.error(
        {
          status: response.status,
          body: errorText,
        },
        'Agora API error'
      );

      return reply.code(response.status).send({
        error: `Failed to start conversation: ${response.status}`,
        details: errorText,
      });
    }

    // Return successful response with agent details
    // Cast the parsed JSON, since json() may be typed as unknown
    const data = (await response.json()) as AgentResponse;
    return reply.send(data);
  } catch (error) {
    // Pass the error as an object so the logger serializes it
    fastify.log.error({ err: error }, 'Error starting conversation');
    return reply.code(500).send({
      error:
        error instanceof Error ? error.message : 'Failed to start conversation',
    });
  }
});
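Two small pieces of the invite route are easy to look at in isolation: the Basic Authorization header built from the customer ID and secret, and the check that decides enable_string_uid. The helpers below are a sketch only. The names buildBasicAuthHeader and needsStringUid are hypothetical and do not appear in the route, which inlines both expressions.

```typescript
// Hypothetical helper: build an HTTP Basic Authorization header value
// from a customer ID and secret, as the route does inline.
function buildBasicAuthHeader(customerId: string, customerSecret: string): string {
  const encoded = Buffer.from(`${customerId}:${customerSecret}`).toString('base64');
  return `Basic ${encoded}`;
}

// Mirrors the route's check: a UID that contains any ASCII letter
// is treated as a string UID.
function needsStringUid(uid: string): boolean {
  return /[a-zA-Z]/.test(uid);
}

console.log(buildBasicAuthHeader('my-id', 'my-secret').startsWith('Basic ')); // true
console.log(needsStringUid('1234')); // false
console.log(needsStringUid('user-1234')); // true
```

Pulling the header into one helper would also let the invite and remove routes share it instead of encoding the credentials twice.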

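Remove Agent

Once the agent has joined the conversation, we need a way to remove it. This is where the /agent/remove route comes in: it takes the agent ID and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.

Add the following code to the agentRoutes function, right below the /invite route: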

/**
   * POST /agent/remove
   * Removes an AI agent from a conversation
   */
  fastify.post<{
    Body: RemoveAgentRequest;
  }>('/remove', async (request, reply) => {
    try {
      const { agent_id } = request.body;

      if (!agent_id) {
        return reply.code(400).send({ error: 'agent_id is required' });
      }

      // Validate Agora credentials
      if (
        !process.env.AGORA_CONVO_AI_BASE_URL ||
        !process.env.AGORA_APP_ID ||
        !process.env.AGORA_CUSTOMER_ID ||
        !process.env.AGORA_CUSTOMER_SECRET
      ) {
        return reply
          .code(500)
          .send({ error: 'Agora credentials not configured' });
      }

      // Create authentication for API request
      const authCredential = Buffer.from(
        `${process.env.AGORA_CUSTOMER_ID}:${process.env.AGORA_CUSTOMER_SECRET}`
      ).toString('base64');

      // Send request to Agora API to remove the agent
      const response = await fetch(
        `${process.env.AGORA_CONVO_AI_BASE_URL}/${process.env.AGORA_APP_ID}/agents/${agent_id}/leave`,
        {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            Authorization: `Basic ${authCredential}`,
          },
        }
      );

      if (!response.ok) {
        const errorText = await response.text();
        fastify.log.error(
          {
            status: response.status,
            body: errorText,
          },
          'Agora API error on agent removal'
        );

        return reply.code(response.status).send({
          error: `Failed to remove agent: ${response.status}`,
          details: errorText,
        });
      }

      return reply.send({ success: true });
    } catch (error) {
      // Pass the error as an object so the logger serializes it
      fastify.log.error({ err: error }, 'Error removing agent');
      return reply.code(500).send({
        error:
          error instanceof Error ? error.message : 'Failed to remove agent',
      });
    }
  });
}

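Helper Function

In the invite route, the ttsConfig variable is set by calling getTTSConfig. This is worth calling out because you would normally use a single TTS configuration. For demo purposes, I implemented it this way to show how to build the configuration for the TTS vendors supported by Agora's Convo AI Engine.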

/**
 * Helper function to generate TTS configuration based on vendor
 */
function getTTSConfig(vendor: TTSVendor): TTSConfig {
  if (vendor === TTSVendor.Microsoft) {
    // Validate Microsoft TTS configuration
    if (
      !process.env.MICROSOFT_TTS_KEY ||
      !process.env.MICROSOFT_TTS_REGION ||
      !process.env.MICROSOFT_TTS_VOICE_NAME
    ) {
      throw new Error('Microsoft TTS configuration missing');
    }

    return {
      vendor: TTSVendor.Microsoft,
      params: {
        key: process.env.MICROSOFT_TTS_KEY,
        region: process.env.MICROSOFT_TTS_REGION,
        voice_name: process.env.MICROSOFT_TTS_VOICE_NAME,
        rate: parseFloat(process.env.MICROSOFT_TTS_RATE || '1.0'),
        volume: parseFloat(process.env.MICROSOFT_TTS_VOLUME || '100.0'),
      },
    };
  }

  if (vendor === TTSVendor.ElevenLabs) {
    // Validate ElevenLabs TTS configuration
    if (
      !process.env.ELEVENLABS_API_KEY ||
      !process.env.ELEVENLABS_VOICE_ID ||
      !process.env.ELEVENLABS_MODEL_ID
    ) {
      throw new Error('ElevenLabs TTS configuration missing');
    }

    return {
      vendor: TTSVendor.ElevenLabs,
      params: {
        key: process.env.ELEVENLABS_API_KEY,
        voice_id: process.env.ELEVENLABS_VOICE_ID,
        model_id: process.env.ELEVENLABS_MODEL_ID,
      },
    };
  }

  throw new Error(`Unsupported TTS vendor: ${vendor}`);
}
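Note that getTTSConfig passes the parsed rate and volume through as-is. The environment variable reference at the end of this guide lists a range of 0.5 to 2.0 for the rate and 0.0 to 100.0 for the volume, so one option is to clamp the parsed values to those ranges. The sketch below assumes a hypothetical parseBounded helper that is not part of the code above.

```typescript
// Hypothetical helper: parse a numeric setting and keep it within [min, max].
// Falls back to the given default when the raw value is missing or not a number.
function parseBounded(
  raw: string | undefined,
  fallback: number,
  min: number,
  max: number
): number {
  const parsed = Number.parseFloat(raw ?? '');
  if (Number.isNaN(parsed)) return fallback;
  return Math.min(max, Math.max(min, parsed));
}

console.log(parseBounded('3.5', 1.0, 0.5, 2.0)); // 2 (clamped to the upper bound)
console.log(parseBounded(undefined, 100.0, 0.0, 100.0)); // 100 (default used)
```

In getTTSConfig this could replace the two parseFloat calls, for example parseBounded(process.env.MICROSOFT_TTS_RATE, 1.0, 0.5, 2.0).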

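The agentRoutes function defines two main endpoints:

POST /agent/invite: Creates an AI agent and adds it to the specified channel by:
- Generating a secure token for the agent
- Configuring TTS (Text-to-Speech) settings
- Setting the AI's behavior through system messages
- Sending a request to Agora's Conversational AI API
POST /agent/remove: Removes the AI agent from the conversation by:
- Taking the agent_id from the request
- Sending a leave request to Agora's API

Note: The agent routes load several environment variables. Make sure to set these in your .env file. A complete list of the environment variables you need to set is included at the end of this guide.

Add the Agent Routes to the Main Server

Update the main index.ts file to register the agent routes. Open src/index.ts and add the following: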

// Previous imports remain the same
import { agentRoutes } from './routes/agent';

// Previous code remains the same..

// Register routes
fastify.register(agentRoutes, { prefix: '/agent' }); // register the agent routes

// Rest of the code remains the same...

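The core Conversational AI functionality is now in place. Next, let's implement a token generation route to make testing and integration with frontend applications easier.

Token Generation

The goal of this guide is to build a standalone microservice that works with existing Agora client applications, so for completeness we'll implement a token generation route.

Create a new file at src/routes/token.ts:

touch src/routes/token.ts

Explaining this code is a bit outside the scope of this guide. If you're new to tokens, see my guide Building a Token Server for Agora Applications.

One unique element of the token route is that if a uid or channel name is not provided, this code uses 0 for the uid and generates a unique channel name. The channel name and UID are returned with each token.

Add the following code to src/routes/token.ts: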

import { FastifyInstance } from 'fastify';
import { RtcTokenBuilder, RtcRole } from 'agora-token';

// Interface for token request parameters
interface TokenQuerystring {
  uid?: string;
  channel?: string;
}

/**
 * Registers routes for Agora token generation
 */
export async function tokenRoutes(fastify: FastifyInstance) {
  // Define validation schema for query parameters
  const schema = {
    querystring: {
      type: 'object',
      properties: {
        uid: { type: 'string', description: 'User ID (optional)' },
        channel: {
          type: 'string',
          pattern: '^[a-zA-Z0-9-]{1,64}$',
          description:
            'Channel name (optional, will be generated if not provided)',
        },
      },
    },
  };

  /**
   * GET /token - Generate an Agora token for RTC communication
   */
  fastify.get<{
    Querystring: TokenQuerystring;
  }>('/', {
    schema,
    handler: async (request, reply) => {
      fastify.log.info('Generating Agora token...');

      // Validate Agora credentials
      if (!process.env.AGORA_APP_ID || !process.env.AGORA_APP_CERTIFICATE) {
        return reply
          .code(500)
          .send({ error: 'Agora credentials not configured' });
      }

      // Get parameters from query or use defaults
      const { uid: uidStr, channel } = request.query;
      const uid = parseInt(uidStr || '0');
      const channelName = channel || generateChannelName();

      // Set token expiration (1 hour from now)
      const expirationTime = Math.floor(Date.now() / 1000) + 3600;

      try {
        fastify.log.info(
          `Building token with UID: ${uid}, Channel: ${channelName}`
        );

        // Generate the Agora token
        const token = RtcTokenBuilder.buildTokenWithUid(
          process.env.AGORA_APP_ID,
          process.env.AGORA_APP_CERTIFICATE,
          channelName,
          uid,
          RtcRole.PUBLISHER, // Allow publishing audio/video
          expirationTime,
          expirationTime
        );

        fastify.log.info('Token generated successfully');

        // Return token data to client
        return {
          token,
          uid: uid.toString(),
          channel: channelName,
        };
      } catch (error) {
        // Pass the error as an object so the logger serializes it
        fastify.log.error({ err: error }, 'Error generating Agora token');
        return reply.code(500).send({
          error: 'Failed to generate Agora token',
          details: error instanceof Error ? error.message : 'Unknown error',
        });
      }
    },
  });
}

/**
 * Generates a unique channel name
 * Format: ai-conversation-{timestamp}-{random}
 */
function generateChannelName(): string {
  const timestamp = Date.now();
  const random = Math.random().toString(36).substring(2, 8);
  return `ai-conversation-${timestamp}-${random}`;
}
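The default handling in the token route (uid 0 and a generated channel name when none is given) can be looked at separately from the token builder. The sketch below is illustrative only: resolveTokenParams is a hypothetical name, and falling back to 0 for a non-numeric uid is an assumption of this example rather than something the route does.

```typescript
// Pattern copied from the route's querystring schema.
const CHANNEL_PATTERN = /^[a-zA-Z0-9-]{1,64}$/;

// Same format as the route's generator: ai-conversation-{timestamp}-{random}
function generateChannelName(): string {
  const timestamp = Date.now();
  const random = Math.random().toString(36).substring(2, 8);
  return `ai-conversation-${timestamp}-${random}`;
}

// Hypothetical helper: apply the route's defaults to optional query values.
function resolveTokenParams(query: { uid?: string; channel?: string }): {
  uid: number;
  channel: string;
} {
  const parsedUid = Number.parseInt(query.uid || '0', 10);
  // Assumption of this sketch: a non-numeric uid falls back to 0.
  const uid = Number.isNaN(parsedUid) ? 0 : parsedUid;
  const channel = query.channel || generateChannelName();
  return { uid, channel };
}

const defaults = resolveTokenParams({});
console.log(defaults.uid); // 0
console.log(CHANNEL_PATTERN.test(defaults.channel)); // true
```

The generated name uses only letters, digits, and hyphens, so it also satisfies the pattern the schema applies to client-supplied channel names.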

Now, update the main index.ts file to register the token routes. Update src/index.ts:

// Previous imports remain the same
import { tokenRoutes } from './routes/token';

// Previous code remains the same...

// Register routes
// - previous agent routes remain the same
fastify.register(tokenRoutes, { prefix: '/token' }); // register the token routes

// Rest of the code remains the same...

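With token generation in place, let's add some validation middleware to make sure our API is robust and secure.

Environment and Request Validation

Proper validation ensures that the API receives correctly formatted requests and helps prevent errors.

Create a file at src/middlewares/validation.ts (the middlewares directory does not exist yet, so create it first):

mkdir -p src/middlewares
touch src/middlewares/validation.ts

This code checks that the environment variables are set correctly and that every incoming request matches the client request types we defined.

Add the following code to src/middlewares/validation.ts: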

import { FastifyRequest, FastifyReply } from 'fastify';
import {
  InviteAgentRequest,
  RemoveAgentRequest,
} from '../types/client-request-types';

/**
 * Validates that required environment variables are configured
 * based on the route being accessed
 */
export async function validateEnvironment(
  request: FastifyRequest,
  reply: FastifyReply
) {
  // All routes need Agora credentials
  if (!process.env.AGORA_APP_ID || !process.env.AGORA_APP_CERTIFICATE) {
    request.log.error('Agora credentials are not set');
    return reply.code(500).send({ error: 'Agora credentials are not set' });
  }

  // Agent routes need additional validation
  if (request.url.startsWith('/agent')) {
    // Conversational AI credentials
    if (
      !process.env.AGORA_CONVO_AI_BASE_URL ||
      !process.env.AGORA_CUSTOMER_ID ||
      !process.env.AGORA_CUSTOMER_SECRET
    ) {
      request.log.error('Agora Conversation AI credentials are not set');
      return reply.code(500).send({
        error: 'Agora Conversation AI credentials are not set',
      });
    }

    // LLM configuration
    if (!process.env.LLM_URL || !process.env.LLM_TOKEN) {
      request.log.error('LLM configuration is not set');
      return reply.code(500).send({ error: 'LLM configuration is not set' });
    }

    // TTS configuration
    const ttsVendor = process.env.TTS_VENDOR;
    if (!ttsVendor) {
      request.log.error('TTS_VENDOR is not set');
      return reply.code(500).send({ error: 'TTS_VENDOR is not set' });
    }

    // Vendor-specific TTS validation
    if (ttsVendor === 'microsoft') {
      if (
        !process.env.MICROSOFT_TTS_KEY ||
        !process.env.MICROSOFT_TTS_REGION ||
        !process.env.MICROSOFT_TTS_VOICE_NAME
      ) {
        request.log.error('Microsoft TTS configuration is incomplete');
        return reply.code(500).send({
          error: 'Microsoft TTS configuration is incomplete',
        });
      }
    } else if (ttsVendor === 'elevenlabs') {
      if (
        !process.env.ELEVENLABS_API_KEY ||
        !process.env.ELEVENLABS_VOICE_ID ||
        !process.env.ELEVENLABS_MODEL_ID
      ) {
        request.log.error('ElevenLabs TTS configuration is incomplete');
        return reply.code(500).send({
          error: 'ElevenLabs TTS configuration is incomplete',
        });
      }
    } else {
      return reply.code(500).send({
        error: `Unsupported TTS vendor: ${ttsVendor}`,
      });
    }
  }
}

/**
 * Check that POST requests use the correct Content-Type header
 */
export async function validateContentType(
  request: FastifyRequest,
  reply: FastifyReply
) {
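  // Accept values such as "application/json; charset=utf-8"
  const contentType = request.headers['content-type'] ?? '';
  if (
    request.method === 'POST' &&
    !contentType.toLowerCase().startsWith('application/json')
  ) {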
    return reply.code(415).send({
      error: 'Unsupported Media Type. Content-Type must be application/json',
    });
  }
}

/**
 * Validates request bodies for specific routes
 */
export async function validateRequestBody(
  request: FastifyRequest,
  reply: FastifyReply
) {
  // Skip validation for non-POST requests
  if (request.method !== 'POST') {
    return;
  }

  // Ensure request has a body
  const body = request.body as any;
  if (!body || Object.keys(body).length === 0) {
    return reply.code(400).send({ error: 'Request body is required' });
  }

  // Route-specific validation
  if (request.url === '/agent/invite') {
    const { requester_id, channel_name } = body as InviteAgentRequest;

    // Required fields check
    if (!requester_id) {
      return reply.code(400).send({ error: 'requester_id is required' });
    }

    if (!channel_name) {
      return reply.code(400).send({ error: 'channel_name is required' });
    }

    // Validate requester_id format
    if (typeof requester_id === 'string') {
      if (requester_id.trim() === '') {
        return reply.code(400).send({
          error: 'requester_id cannot be empty',
        });
      }
    } else if (typeof requester_id === 'number') {
      if (!Number.isInteger(requester_id) || requester_id < 0) {
        return reply.code(400).send({
          error:
            'requester_id must be a positive integer when provided as a number',
        });
      }
    } else {
      return reply.code(400).send({
        error: 'requester_id must be a string or number',
      });
    }

    // Validate channel_name format
    if (typeof channel_name !== 'string') {
      return reply.code(400).send({
        error: 'channel_name must be a string',
      });
    }

    if (channel_name.length < 3 || channel_name.length > 64) {
      return reply.code(400).send({
        error: 'channel_name length must be between 3 and 64 characters',
      });
    }
  } else if (request.url === '/agent/remove') {
    const { agent_id } = body as RemoveAgentRequest;

    if (!agent_id) {
      return reply.code(400).send({ error: 'agent_id is required' });
    }

    if (typeof agent_id !== 'string') {
      return reply.code(400).send({ error: 'agent_id must be a string' });
    }
  }
}
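Fastify can also validate request bodies against a JSON Schema attached to the route, as the token route already does for its query string. As an alternative to the hand-written checks for /agent/invite, a schema along these lines might cover the same rules. This is a sketch under assumptions: inviteBodySchema is a hypothetical name, and because Fastify's default validator may coerce types, treat it as a starting point rather than a drop-in replacement.

```typescript
// Hypothetical JSON Schema for the /agent/invite request body.
// The limits mirror the checks in validateRequestBody.
const inviteBodySchema = {
  type: 'object',
  required: ['requester_id', 'channel_name'],
  properties: {
    requester_id: {
      anyOf: [
        { type: 'string', minLength: 1 },
        { type: 'integer', minimum: 0 },
      ],
    },
    channel_name: { type: 'string', minLength: 3, maxLength: 64 },
    input_modalities: { type: 'array', items: { type: 'string' } },
    output_modalities: { type: 'array', items: { type: 'string' } },
  },
};

// Usage sketch, inside agentRoutes (handler stands for the existing route handler):
// fastify.post('/invite', { schema: { body: inviteBodySchema } }, handler);

console.log(inviteBodySchema.required.join(', ')); // requester_id, channel_name
```

With a schema in place, Fastify rejects non-matching bodies before the handler runs, which would make the route-specific branches in validateRequestBody redundant.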

Now update the main index.ts file to register the validation middleware. Update src/index.ts:

// Previous imports remain the same
import {
  validateEnvironment,
  validateContentType,
  validateRequestBody,
} from './middlewares/validation';

// Previous code remains the same
// - Load environment variables
// - Create Fastify instance
// - Register CORS

// Register global middlewares
fastify.addHook('onRequest', validateContentType);
fastify.addHook('preValidation', validateEnvironment);
fastify.addHook('preValidation', validateRequestBody);

// Register routes remain the same

// Rest of the code remains the same...

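Development Workflow Setup

With the core functionality in place, let's set up a proper development workflow. We'll configure nodemon to automatically restart the server when files change.

Create a nodemon.json file in the project root:

touch nodemon.json

Add the following content: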

{
  "watch": ["src"],
  "ext": ".ts,.js",
  "ignore": [],
  "exec": "ts-node ./src/index.ts"
}

Update the scripts in package.json to use nodemon:

"scripts": {
  "start": "node dist/index.js",
  "dev": "nodemon",
  "build": "tsc"
}

Now let's run the development server:

npm run dev

Testing the Server

Before testing the endpoints, make sure you have a client-side application running. You can use any application (web, mobile, or desktop) that implements Agora's Video SDK. If you don't have one, you can use Agora's Voice Demo; just make sure to make a token request before joining the channel.

Let's verify that the server works. Start it in development mode:

npm run dev

Note: Make sure your .env file is configured with all the required credentials. A complete list of environment variables is included at the end of this guide.

If the server is running correctly, you should see output like this:

Server is running on port 3030

Let's test the API endpoints using curl:

1. Generate a Token

curl http://localhost:3030/token

Expected response (your values will differ):

{
  "token": "007eJxTYBAxNdgrlvnEfm3o...",
  "uid": "0",
  "channel": "ai-conversation-1665481623456-abc123"
}

2. Generate a Token with Specific Parameters

curl "http://localhost:3030/token?channel=test-channel&uid=1234"

3. Invite an Agent

curl -X POST http://localhost:3030/agent/invite \
  -H "Content-Type: application/json" \
  -d '{
    "requester_id": "1234",
    "channel_name": "YOUR_CHANNEL_NAME_FROM_PREVIOUS_STEP",
    "input_modalities": ["text"],
    "output_modalities": ["text", "audio"]
  }'

Expected response (your values will differ):

{
  "agent_id": "agent-abc123",
  "create_ts": 1665481725000,
  "state": "active"
}

4. Remove the AI Agent

curl -X POST "http://localhost:3030/agent/remove" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "agent-abc123"
  }'

Expected response:

{
  "success": true
}
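The invite and remove calls above can also be scripted from TypeScript. The sketch below is one possible shape, not part of the server: postJson and inviteThenRemove are hypothetical names, and the fetch function is passed in as a parameter so the flow can be exercised with a stub.

```typescript
// Minimal shape of the fetch features this sketch relies on.
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: POST a JSON body and return the parsed response.
async function postJson(
  fetchImpl: FetchLike,
  url: string,
  body: unknown
): Promise<unknown> {
  const response = await fetchImpl(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`Request to ${url} failed with status ${response.status}`);
  }
  return response.json();
}

// Invite an agent, then remove it using the agent_id from the invite response.
async function inviteThenRemove(
  fetchImpl: FetchLike,
  baseUrl: string,
  requesterId: string,
  channelName: string
): Promise<string> {
  const invited = (await postJson(fetchImpl, `${baseUrl}/agent/invite`, {
    requester_id: requesterId,
    channel_name: channelName,
  })) as { agent_id: string };
  await postJson(fetchImpl, `${baseUrl}/agent/remove`, {
    agent_id: invited.agent_id,
  });
  return invited.agent_id;
}

// Usage sketch against a running server (Node.js 18+ provides a global fetch):
// inviteThenRemove(fetch, 'http://localhost:3030', '1234', 'test-channel');
```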

Error Handling

The server implements consistent error handling across all routes:

try {
  // Route logic
} catch (error) {
  // Pass the error as an object so the logger serializes it
  fastify.log.error({ err: error }, 'Error description');
  reply.code(500).send({
    error: error instanceof Error ? error.message : 'Error description',
  });
}

Common error responses:

400: Bad Request (invalid parameters)
415: Unsupported Media Type (wrong Content-Type)
500: Server Error (missing configuration or a runtime error)
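Every error response in this guide uses the same shape: an error string and, in some routes, a details string. As a rough sketch, that shape could be produced by one helper so the catch blocks stay consistent. The names ErrorPayload and toErrorPayload are hypothetical and do not appear in the server code.

```typescript
// Shape of the error responses used throughout the routes.
interface ErrorPayload {
  error: string;
  details?: string;
}

// Hypothetical helper: turn any thrown value into the response shape.
// Error instances contribute their message; anything else uses the fallback.
function toErrorPayload(
  error: unknown,
  fallback: string,
  details?: string
): ErrorPayload {
  const message = error instanceof Error ? error.message : fallback;
  return details === undefined
    ? { error: message }
    : { error: message, details };
}

console.log(toErrorPayload(new Error('boom'), 'Failed to start conversation'));
console.log(toErrorPayload('not an Error', 'Failed to remove agent'));
```

Inside a catch block this would read reply.code(500).send(toErrorPayload(error, 'Failed to start conversation')).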

Customizations

Agora's Conversational AI Engine supports a number of customizations.

Customizing the Agent

In the /agent/invite endpoint, modify the system message to customize the agent's prompt:

const systemMessage =
  "You are a technical support specialist named Alex. Your responses should be friendly but concise, focused on helping users solve their technical problems. Use simple language but don't oversimplify technical concepts.";

You can also update the greeting to control the initial message it speaks into the channel.

llm: {
  greeting_message: 'Hello! How can I assist you today?',
  failure_message: 'Please wait a moment.',
}
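If you expect to change the prompt and greeting often, one option is to build those llm fields in a single place. The following is a minimal sketch: buildPromptConfig is a hypothetical helper, and the default strings are the ones used in the invite route.

```typescript
interface SystemMessage {
  role: string;
  content: string;
}

// Hypothetical helper: build the prompt-related llm fields, using the
// defaults from the invite route when no override is given.
function buildPromptConfig(
  options: { prompt?: string; greeting?: string } = {}
): {
  system_messages: SystemMessage[];
  greeting_message: string;
  failure_message: string;
} {
  const content =
    options.prompt?.trim() ||
    'You are a helpful assistant. Speak naturally and concisely.';
  return {
    system_messages: [{ role: 'system', content }],
    greeting_message: options.greeting ?? 'Hello! How can I assist you today?',
    failure_message: 'Please wait a moment.',
  };
}

const supportConfig = buildPromptConfig({
  prompt: 'You are a technical support specialist named Alex.',
  greeting: 'Hi, this is Alex. What can I help you fix?',
});
console.log(supportConfig.system_messages[0].content);
```

The result can be spread into the llm object of the start request alongside url, api_key, and params.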

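Customizing Speech Synthesis

Browse the voice libraries to choose the right voice for your application:

For Microsoft Azure TTS: visit the Microsoft Azure TTS Voice Gallery
For ElevenLabs TTS: explore the ElevenLabs Voice Library

Fine-Tuning Voice Activity Detection (VAD)

Adjust the VAD settings to optimize conversation flow: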

vad: {
  silence_duration_ms: 600,      // How long to wait after silence to end turn
  speech_duration_ms: 10000,     // Maximum duration for a single speech segment
  threshold: 0.6,                // Speech detection sensitivity
  interrupt_duration_ms: 200,    // How quickly interruptions are detected
  prefix_padding_ms: 400,        // Audio padding at the beginning of speech
}
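When experimenting with these values, it can help to keep the defaults from the invite route in one place and override only the fields you are tuning. The sketch below assumes a hypothetical buildVadConfig helper; the VadConfig interface repeats the vad shape from AgoraStartRequest so the example is self-contained.

```typescript
interface VadConfig {
  silence_duration_ms: number;
  speech_duration_ms?: number;
  threshold?: number;
  interrupt_duration_ms?: number;
  prefix_padding_ms?: number;
}

// Defaults copied from the invite route earlier in this guide.
const DEFAULT_VAD: VadConfig = {
  silence_duration_ms: 480,
  speech_duration_ms: 15000,
  threshold: 0.5,
  interrupt_duration_ms: 160,
  prefix_padding_ms: 300,
};

// Hypothetical helper: start from the defaults and apply only the overrides given.
function buildVadConfig(overrides: Partial<VadConfig> = {}): VadConfig {
  return { ...DEFAULT_VAD, ...overrides };
}

const tunedVad = buildVadConfig({ silence_duration_ms: 600, threshold: 0.6 });
console.log(tunedVad.silence_duration_ms); // 600
console.log(tunedVad.interrupt_duration_ms); // 160
```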

Environment Variables Reference

Here's a complete list of environment variables for your .env file:

# Server Configuration
PORT=3030

# Agora Configuration
AGORA_APP_ID=your_app_id
AGORA_APP_CERTIFICATE=your_app_certificate
AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects
AGORA_CUSTOMER_ID=your_customer_id
AGORA_CUSTOMER_SECRET=your_customer_secret
AGENT_UID=Agent

# LLM Configuration
LLM_URL=https://api.openai.com/v1/chat/completions
LLM_TOKEN=your_openai_api_key
LLM_MODEL=gpt-4o-mini

# Input/Output Modalities
INPUT_MODALITIES=text
OUTPUT_MODALITIES=text,audio

# TTS Configuration
TTS_VENDOR=microsoft  # or elevenlabs

# Microsoft TTS Configuration
MICROSOFT_TTS_KEY=your_microsoft_tts_key
MICROSOFT_TTS_REGION=your_microsoft_tts_region
MICROSOFT_TTS_VOICE_NAME=en-US-AndrewMultilingualNeural
MICROSOFT_TTS_RATE=1.0  # Range: 0.5 to 2.0
MICROSOFT_TTS_VOLUME=100.0  # Range: 0.0 to 100.0

# ElevenLabs TTS Configuration
ELEVENLABS_API_KEY=your_elevenlabs_api_key
ELEVENLABS_VOICE_ID=your_elevenlabs_voice_id
ELEVENLABS_MODEL_ID=eleven_flash_v2_5
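Because a missing variable only surfaces when a request hits the validation middleware, it may be useful to check the list once at startup as well. The sketch below is an assumption-laden example: findMissingEnv is a hypothetical helper, and the required names are taken from the checks in validateEnvironment.

```typescript
// Variables that validateEnvironment requires regardless of TTS vendor.
const REQUIRED_ALWAYS: string[] = [
  'AGORA_APP_ID',
  'AGORA_APP_CERTIFICATE',
  'AGORA_CONVO_AI_BASE_URL',
  'AGORA_CUSTOMER_ID',
  'AGORA_CUSTOMER_SECRET',
  'LLM_URL',
  'LLM_TOKEN',
  'TTS_VENDOR',
];

// Additional variables required for each supported TTS vendor.
const REQUIRED_BY_VENDOR: Record<string, string[]> = {
  microsoft: ['MICROSOFT_TTS_KEY', 'MICROSOFT_TTS_REGION', 'MICROSOFT_TTS_VOICE_NAME'],
  elevenlabs: ['ELEVENLABS_API_KEY', 'ELEVENLABS_VOICE_ID', 'ELEVENLABS_MODEL_ID'],
};

// Hypothetical helper: list the required variables that are unset or empty.
function findMissingEnv(env: Record<string, string | undefined>): string[] {
  const vendorVars = REQUIRED_BY_VENDOR[env.TTS_VENDOR ?? ''] ?? [];
  return [...REQUIRED_ALWAYS, ...vendorVars].filter((key) => !env[key]);
}

console.log(findMissingEnv({ TTS_VENDOR: 'microsoft' }).length); // 10
```

Calling findMissingEnv(process.env) right after dotenv.config() and logging the result would show every gap at once instead of one error per request.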

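Next Steps

Congratulations! You've built a Fastify server that integrates with Agora's Conversational AI Engine. You can now integrate this microservice with your existing Agora backend.

For more information about Agora's Conversational AI Engine, see the official documentation.

Happy building!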

