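Conversational AI is one of the hottest trends right now. It lets users talk with an AI agent in real time and actually get things done, without spending time typing out their thoughts and carefully shaping them into prompts. This is a major shift in how people interact with AI.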

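However, developers and businesses have already invested in text-based agents that run on custom LLM workflows, so they hesitate to adopt this new paradigm. That hesitation is strongest when adopting it means abandoning that investment or, worse, wiring the existing workflow in only as a tool/function call with limited capabilities.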

That is exactly why we built the Agora Conversational AI Engine. It connects your existing LLM workflows to Agora channels, enabling real-time conversations with AI agents.

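In this guide, we'll build a real-time audio conversation application that connects users with an AI agent powered by Agora's Conversational AI Engine. The app is built with NextJS, React, and TypeScript. We'll take a step-by-step approach: first we implement the core real-time communication components, then we add Agora's Convo AI Engine.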

By the end of this guide, you'll have a working real-time audio conversation application in which users talk with an AI agent powered by Agora's Conversational AI Engine.

Prerequisites

Before you begin, make sure you have the following:

Node.js (v18 or later)
A basic understanding of React, TypeScript, and NextJS
An Agora account (the first 10,000 minutes each month are free)
The Conversational AI service enabled for your AppID

Project Setup

Let's create a new NextJS project with TypeScript support.

pnpm create next-app@latest ai-conversation-app
cd ai-conversation-app

When prompted, choose the following options:

TypeScript: Yes
ESLint: Yes
Tailwind CSS: Yes
Use src/ directory: No
App Router: Yes
Use Turbopack: No
Customize import alias: Yes (use the default @/*)

Next, install the required Agora dependencies:

Agora's React SDK: agora-rtc-react
Agora's token builder: agora-token

pnpm add agora-rtc-react agora-token

This guide uses shadcn/ui for UI components, but you can use any UI library you prefer or build your own components:

pnpm dlx shadcn@latest init

This guide also uses Lucide icons, so install them as well:

pnpm add lucide-react

As you follow this guide, you'll create new files in specific directories, so let's create those directories before we start.

From the project root, create the app/api/, components/, and types/ directories, and add a .env.local file.

mkdir app/api components types
touch .env.local

Your project directory should now have the following structure:

├── app/
│   ├── api/
│   ├── globals.css
│   ├── layout.tsx
│   └── page.tsx
├── components/
├── types/
├── .env.local
└── (... Existing files and directories)

Landing Page Component

Let's set up a landing page that initializes the Agora client and sets up the AgoraProvider.

Create the LandingPage component file at components/LandingPage.tsx:

touch components/LandingPage.tsx

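For now, we'll keep this component simple and add more functionality as we progress through the guide. Comments throughout the code explain each step. At a high level, we import the Agora React SDK, create an AgoraRTC client, and pass it to the AgoraProvider so that all child components share the same client instance.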

Add the following code to the LandingPage.tsx file:

'use client';

import { useState, useMemo } from 'react';
import dynamic from 'next/dynamic';

// Agora requires access to the browser's WebRTC API,
// - which throws an error if it's loaded via SSR
// Create a component that has SSR disabled,
// - and use it to load the AgoraRTC components on the client side
const AgoraProvider = dynamic(
  async () => {
    // Dynamically import Agora's components
    const { AgoraRTCProvider, default: AgoraRTC } = await import(
      'agora-rtc-react'
    );

    return {
      default: ({ children }: { children: React.ReactNode }) => {
        // Create the Agora RTC client once using useMemo
        const client = useMemo(
          () => AgoraRTC.createClient({ mode: 'rtc', codec: 'vp8' }),
          []
        );

        // The provider makes the client available to all child components
        return <AgoraRTCProvider client={client}>{children}</AgoraRTCProvider>;
      },
    };
  },
  { ssr: false } // Important: disable SSR for this component
);

export default function LandingPage() {
  // Basic setup, we'll add more functionality as we progress through the guide.
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <h1 className="text-4xl font-bold mb-6 text-center">
        Agora AI Conversation
      </h1>

      <div className="max-w-4xl mx-auto">
        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {/* Placeholder for our start conversation button */}
        <div className="flex justify-center mb-8">
          <button className="px-6 py-3 bg-blue-600 text-white rounded-lg">
            Start Conversation
          </button>
        </div>

        <AgoraProvider>
          <div>PLACEHOLDER: We'll add the conversation component here</div>
        </AgoraProvider>
      </div>
    </div>
  );
}

Now update the app/page.tsx file to render this landing page:

import LandingPage from '@/components/LandingPage';

export default function Home() {
  return <LandingPage />;
}

Basic Agora React JS Implementation

With the landing page in place, we can use Agora's React JS SDK to handle the core RTC functionality: joining a channel, publishing and receiving audio, and handling Agora SDK events.

Create a file at components/ConversationComponent.tsx:

touch components/ConversationComponent.tsx

Add the following code:

'use client';

import { useState, useEffect, useCallback } from 'react';
import {
  useRTCClient,
  useLocalMicrophoneTrack,
  useRemoteUsers,
  useClientEvent,
  useIsConnected,
  useJoin,
  usePublish,
  RemoteUser,
  UID,
} from 'agora-rtc-react';

export default function ConversationComponent() {
  // Access the client from the provider context
  const client = useRTCClient();

  // Track connection status
  const isConnected = useIsConnected();

  // Manage microphone state
  const [isEnabled, setIsEnabled] = useState(true);
  const { localMicrophoneTrack } = useLocalMicrophoneTrack(isEnabled);

  // Track remote users (like our AI agent)
  const remoteUsers = useRemoteUsers();

  // Join the channel when component mounts
  const { isConnected: joinSuccess } = useJoin(
    {
      appid: process.env.NEXT_PUBLIC_AGORA_APP_ID!, // Load APP_ID from env.local
      channel: 'test-channel',
      token: 'replace-with-token',
      uid: 0, // Join with UID 0 and Agora will assign a unique ID when the user joins
    },
    true // Join automatically when the component mounts
  );

  // Publish our microphone track to the channel
  usePublish([localMicrophoneTrack]);

  // Set up event handlers for client events
  useClientEvent(client, 'user-joined', (user) => {
    console.log('Remote user joined:', user.uid);
  });

  useClientEvent(client, 'user-left', (user) => {
    console.log('Remote user left:', user.uid);
  });

  // Toggle microphone on/off
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      await localMicrophoneTrack.setEnabled(!isEnabled);
      setIsEnabled(!isEnabled);
    }
  };

  // Clean up when component unmounts
  useEffect(() => {
    return () => {
      client?.leave(); // Leave the channel when the component unmounts
    };
  }, [client]);

  return (
    <div className="p-4 bg-gray-800 rounded-lg">
      <div className="mb-4">
        <p className="text-white">
          {/* Display the connection status */}
          Connection Status: {isConnected ? 'Connected' : 'Disconnected'}
        </p>
      </div>

      {/* Display remote users */}
      <div className="mb-4">
        {remoteUsers.length > 0 ? (
          remoteUsers.map((user) => (
            <div
              key={user.uid}
              className="p-2 bg-gray-700 rounded mb-2 text-white"
            >
              <RemoteUser user={user} />
            </div>
          ))
        ) : (
          <p className="text-gray-400">No remote users connected</p>
        )}
      </div>

      {/* Microphone control */}
      <button
        onClick={toggleMicrophone}
        className={`px-4 py-2 rounded ${
          isEnabled ? 'bg-green-500' : 'bg-red-500'
        } text-white`}
      >
        Microphone: {isEnabled ? 'On' : 'Off'}
      </button>
    </div>
  );
}

This component is the foundation for real-time audio communication, so let's recap the Agora React hooks it uses:

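useRTCClient: Accesses the Agora RTC client from the provider set up in the landing page
useIsConnected: Reports whether the client is currently connected to the channel
useLocalMicrophoneTrack: Creates and manages the user's microphone input
useRemoteUsers: Tracks the other users in the channel (our AI agent will appear here)
useJoin: Joins the channel with the specified parameters
usePublish: Publishes the audio track to the channel so other users can hear it
useClientEvent: Sets up handlers for important client events, such as users joining or leaving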

Note: APP_ID is read from an environment variable with the non-null assertion operator (!), so make sure NEXT_PUBLIC_AGORA_APP_ID is set in your .env.local file.
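The non-null assertion only silences the compiler; if the variable is missing, useJoin still receives undefined at runtime. A minimal fail-fast sketch follows. `requireEnv` is a hypothetical helper, not part of the Agora SDK or Next.js. It takes the value as an argument because Next.js inlines `NEXT_PUBLIC_` variables only when they are referenced by their full literal name, so a dynamic lookup such as `process.env[name]` would not work in client components.

```typescript
// Hypothetical helper (not an Agora or Next.js API): return a required
// configuration value, or throw a descriptive error if it is missing or blank.
function requireEnv(name: string, value: string | undefined): string {
  if (value === undefined || value.trim() === '') {
    throw new Error(
      `Missing environment variable ${name}. Check your .env.local file`
    );
  }
  return value;
}

// Possible usage in ConversationComponent, replacing the `!` assertion:
// appid: requireEnv('NEXT_PUBLIC_AGORA_APP_ID', process.env.NEXT_PUBLIC_AGORA_APP_ID),
```

The trade-off is that a misconfigured app fails immediately with a readable message instead of failing later inside the SDK's join call.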

Next, add this component to LandingPage.tsx: import it dynamically with SSR disabled, then place it inside the AgoraProvider component.

// Previous imports remain the same as before...
// Dynamically import the ConversationComponent with ssr disabled
const ConversationComponent = dynamic(() => import('./ConversationComponent'), {
  ssr: false,
});
// Previous code remains the same as before...
// In the return statement, replace the placeholder inside AgoraProvider:
<AgoraProvider>
  <ConversationComponent />
</AgoraProvider>

Next, we'll implement token authentication to add an extra layer of security to the application.

Token Generation and Management

The Agora team strongly recommends token-based authentication for all applications, especially in production. In this step, we'll create a route that generates these tokens and update LandingPage and ConversationComponent to use them.

Token Generation Route

Here is what the token generation route needs to do:

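Generate a secure Agora token using the App ID and App Certificate
Generate a unique channel name for each conversation
Return the token, the channel name, and the UID used to generate the token to the client
Support token renewal for an existing channel name and UID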
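The uid and channel rules in that list can be sketched as a pure function, which makes them easy to test outside of Next.js. This is a hypothetical helper, not code from the route: it assumes the query values arrive as `string | null` (which is what `URLSearchParams.get` returns) and that a channel-name generator is passed in. It also falls back to 0 for values outside Agora's unsigned 32-bit UID range.

```typescript
// Values the token route needs from the query string
interface TokenRequest {
  uid: number;
  channel: string;
}

const MAX_AGORA_UID = 0xffffffff; // Agora integer UIDs are unsigned 32-bit

// Hypothetical helper mirroring the route's defaults:
// - missing or invalid uid -> 0, so Agora assigns a UID on join
// - missing channel        -> a newly generated name (a new conversation)
// - existing channel       -> reused as-is, which is how renewal works
function resolveTokenRequest(
  uidParam: string | null,
  channelParam: string | null,
  generateChannelName: () => string
): TokenRequest {
  const parsed = parseInt(uidParam ?? '0', 10);
  const uid =
    !isNaN(parsed) && parsed >= 0 && parsed <= MAX_AGORA_UID ? parsed : 0;
  return {
    uid,
    channel: channelParam || generateChannelName(),
  };
}
```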

Create a new file at app/api/generate-agora-token/route.ts:

mkdir app/api/generate-agora-token
touch app/api/generate-agora-token/route.ts

Add the following code:

import { NextRequest, NextResponse } from 'next/server';
import { RtcTokenBuilder, RtcRole } from 'agora-token';

// Access environment variables
const APP_ID = process.env.NEXT_PUBLIC_AGORA_APP_ID;
const APP_CERTIFICATE = process.env.NEXT_PUBLIC_AGORA_APP_CERTIFICATE;
const EXPIRATION_TIME_IN_SECONDS = 3600; // Token valid for 1 hour

// Helper function to generate unique channel names
function generateChannelName(): string {
  // Combine timestamp and random string for uniqueness
  const timestamp = Date.now();
  const random = Math.random().toString(36).substring(2, 8);
  return `ai-conversation-${timestamp}-${random}`;
}

export async function GET(request: NextRequest) {
  console.log('Generating Agora token...');

  // Verify required environment variables are set
  if (!APP_ID || !APP_CERTIFICATE) {
    console.error('Agora credentials are not set');
    return NextResponse.json(
      { error: 'Agora credentials are not set' },
      { status: 500 }
    );
  }

  // Get query parameters (if any)
  const { searchParams } = new URL(request.url);
  const uidStr = searchParams.get('uid') || '0';
  // Parse as base 10; fall back to 0 (Agora assigns a UID) if it isn't a number
  const uid = parseInt(uidStr, 10) || 0;

  // Use provided channel name or generate new one
  const channelName = searchParams.get('channel') || generateChannelName();

  // agora-token's RtcTokenBuilder expects both expirations as a number of
  // seconds from now, not as an absolute Unix timestamp
  const tokenExpireSeconds = EXPIRATION_TIME_IN_SECONDS;
  const privilegeExpireSeconds = EXPIRATION_TIME_IN_SECONDS;

  try {
    // Generate the token using Agora's SDK
    console.log('Building token with UID:', uid, 'Channel:', channelName);
    const token = RtcTokenBuilder.buildTokenWithUid(
      APP_ID,
      APP_CERTIFICATE,
      channelName,
      uid,
      RtcRole.PUBLISHER, // User can publish audio/video
      tokenExpireSeconds,
      privilegeExpireSeconds
    );

    console.log('Token generated successfully');
    // Return the token and session information to the client
    return NextResponse.json({
      token,
      uid: uid.toString(),
      channel: channelName,
    });
  } catch (error) {
    console.error('Error generating Agora token:', error);
    return NextResponse.json(
      // Error objects serialize to {} in JSON, so send a string instead
      { error: 'Failed to generate Agora token', details: String(error) },
      { status: 500 }
    );
  }
}

This route handles token generation for the application. Let's recap its key features:

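Generates unique channel names from a timestamp and a random string to avoid collisions
Generates secure tokens using the App ID and App Certificate
Accepts channel and uid URL parameters so a token can be renewed for an existing channel name and user ID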
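To make the channel-name format concrete, here is the route's `generateChannelName` logic with the clock and random source injected so the result is reproducible. The parameters and the `isWithinChannelNameLimit` check are illustrative additions, not part of the route; the 64-byte figure reflects Agora's documented rule that channel names be shorter than 64 bytes, so verify it against the current docs.

```typescript
// Same format as the route's generateChannelName, with injectable inputs
function generateChannelName(
  now: number = Date.now(),
  random: () => number = Math.random
): string {
  // e.g. 0.5 -> "0.i" in base 36 -> "i"; usually yields up to 6 characters
  const suffix = random().toString(36).substring(2, 8);
  return `ai-conversation-${now}-${suffix}`;
}

// The generated names are ASCII-only, so character count equals byte count
const MAX_CHANNEL_NAME_BYTES = 64;

function isWithinChannelNameLimit(name: string): boolean {
  return name.length < MAX_CHANNEL_NAME_BYTES;
}
```

With a 13-digit millisecond timestamp, a generated name is at most 36 characters (16 for the prefix, 13 for the timestamp, 1 for the separator, 6 for the suffix), well under the limit.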

Note: This route loads APP_ID and APP_CERTIFICATE from environment variables, so make sure both are set in your .env.local file.

Updating the Landing Page to Request Tokens

Now that the token route is set up, let's update the landing page to handle all of the token-fetching logic. First, create a type definition for the token data so that our components can share it.

Create the types/conversation.ts file:

touch types/conversation.ts

Add the following code:

// Types for Agora token data
export interface AgoraLocalUserInfo {
  token: string;
  uid: string;
  channel: string;
  agentId?: string;
}

Open components/LandingPage.tsx, add Suspense to the React import, add an import for the AgoraLocalUserInfo type, and replace the entire LandingPage() function.

We'll wrap the conversation in Suspense because the Agora React SDK is loaded dynamically: the conversation component needs time to load, so it's good practice to show a loading state until it's ready.

'use client';

import { useState, useMemo, Suspense } from 'react'; // added Suspense
// Previous imports remain the same as before...
import type { AgoraLocalUserInfo } from '../types/conversation';

export default function LandingPage() {
  // Manage conversation state
  const [showConversation, setShowConversation] = useState(false);
  // Manage loading state, while the agent token is generated
  const [isLoading, setIsLoading] = useState(false);
  // Manage error state
  const [error, setError] = useState<string | null>(null);
  // Store the token data for the conversation
  const [agoraLocalUserInfo, setAgoraLocalUserInfo] =
    useState<AgoraLocalUserInfo | null>(null);

  const handleStartConversation = async () => {
    setIsLoading(true);
    setError(null);

    try {
      // Request a token from our API
      console.log('Fetching Agora token...');
      const agoraResponse = await fetch('/api/generate-agora-token');

      if (!agoraResponse.ok) {
        throw new Error('Failed to generate Agora token');
      }

      const responseData = await agoraResponse.json();
      console.log('Token response:', responseData);

      // Store the token data for the conversation
      setAgoraLocalUserInfo(responseData);

      // Show the conversation component
      setShowConversation(true);
    } catch (err) {
      setError('Failed to start conversation. Please try again.');
      console.error('Error starting conversation:', err);
    } finally {
      setIsLoading(false);
    }
  };

  const handleTokenWillExpire = async (uid: string) => {
    try {
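      // Renewal must target the current channel; without it the route
      // would return a token for a different channel
      if (!agoraLocalUserInfo?.channel) {
        throw new Error('No active channel to renew the token for');
      }
      // Request a new token using the channel name and uid (URL-encoded)
      const response = await fetch(
        `/api/generate-agora-token?channel=${encodeURIComponent(
          agoraLocalUserInfo.channel
        )}&uid=${encodeURIComponent(uid)}`
      );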
      const data = await response.json();

      if (!response.ok) {
        throw new Error('Failed to generate new token');
      }

      return data.token;
    } catch (error) {
      console.error('Error renewing token:', error);
      throw error;
    }
  };

  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <div className="max-w-4xl mx-auto">
        <h1 className="text-4xl font-bold mb-6 text-center">
          Agora Conversational AI
        </h1>

        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {!showConversation ? (
          <div className="flex justify-center mb-8">
            <button
              onClick={handleStartConversation}
              disabled={isLoading}
              className="px-6 py-3 bg-blue-600 text-white rounded-lg disabled:opacity-50"
            >
              {isLoading ? 'Starting...' : 'Start Conversation'}
            </button>
          </div>
        ) : agoraLocalUserInfo ? (
          <Suspense
            fallback={<p className="text-center">Loading conversation...</p>}
          >
            <AgoraProvider>
              <ConversationComponent
                agoraLocalUserInfo={agoraLocalUserInfo}
                onTokenWillExpire={handleTokenWillExpire}
                onEndConversation={() => setShowConversation(false)}
              />
            </AgoraProvider>
          </Suspense>
        ) : (
          <p className="text-center text-red-400">
            Failed to load conversation data.
          </p>
        )}

        {error && <p className="text-center text-red-400 mt-4">{error}</p>}
      </div>
    </div>
  );
}

Don't worry about any errors or warnings currently shown for ConversationComponent; we'll fix them in the next step.
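One thing the compiler cannot catch here: `agoraResponse.json()` returns an untyped value, so `setAgoraLocalUserInfo(responseData)` simply trusts the API's shape. If you want a runtime check, a minimal type guard sketch looks like this. `isAgoraLocalUserInfo` is a hypothetical name, and the interface is copied from types/conversation.ts so the block runs on its own.

```typescript
// Same shape as AgoraLocalUserInfo in types/conversation.ts
interface AgoraLocalUserInfo {
  token: string;
  uid: string;
  channel: string;
  agentId?: string;
}

// Hypothetical type guard: true only if `value` has the expected fields
function isAgoraLocalUserInfo(value: unknown): value is AgoraLocalUserInfo {
  if (typeof value !== 'object' || value === null) return false;
  const record = value as Record<string, unknown>;
  return (
    typeof record.token === 'string' &&
    typeof record.uid === 'string' &&
    typeof record.channel === 'string' &&
    (record.agentId === undefined || typeof record.agentId === 'string')
  );
}
```

In handleStartConversation you could call it on responseData and throw when it returns false, so the existing catch block shows the error message instead of storing a malformed object.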

Updating the Conversation Component to Use Tokens

Now that we have the token and channel name, let's define props so that LandingPage can pass them to ConversationComponent.

Open types/conversation.ts and add the following interface:

// Props for our conversation component
export interface ConversationComponentProps {
  agoraLocalUserInfo: AgoraLocalUserInfo;
  onTokenWillExpire: (uid: string) => Promise<string>;
  onEndConversation: () => void;
}

Open ConversationComponent.tsx, import the props type we just created, and use the props to join the channel. We'll also add a handler for the token-expiration event to renew the token, and a button to end the conversation.

// Previous imports remain the same as before...
import type { ConversationComponentProps } from '../types/conversation'; // Import the new props

// Update the component to accept the new props
export default function ConversationComponent({
  agoraLocalUserInfo,
  onTokenWillExpire,
  onEndConversation,
}: ConversationComponentProps) {
  // The previous declarations remain the same as before
  const [joinedUID, setJoinedUID] = useState<UID>(0); // New: After joining the channel we'll store the uid for renewing the token

  // Update the useJoin hook to use the token and channel name from the props
  const { isConnected: joinSuccess } = useJoin(
    {
      appid: process.env.NEXT_PUBLIC_AGORA_APP_ID!,
      channel: agoraLocalUserInfo.channel, // Using the channel name received from the token response
      token: agoraLocalUserInfo.token, // Using the token we received
      uid: parseInt(agoraLocalUserInfo.uid, 10), // The token route returns '0' by default, so Agora assigns and returns a uid when we join
    },
    true
  );

  // Store the Agora-assigned uid in joinedUID once the user joins the channel
  useEffect(() => {
    if (joinSuccess && client) {
      const uid = client.uid;
      setJoinedUID(uid as UID);
      console.log('Join successful, using UID:', uid);
    }
  }, [joinSuccess, client]);

  /*
  Existing code remains the same as before:
  // Publish local microphone track
  // Handle remote user events
  // Handle remote user left event
*/

  // New: Add listener for connection state changes
  useClientEvent(client, 'connection-state-change', (curState, prevState) => {
    console.log(`Connection state changed from ${prevState} to ${curState}`);
  });

  // Add token renewal handler to avoid disconnections
  const handleTokenWillExpire = useCallback(async () => {
    if (!onTokenWillExpire || !joinedUID) return;
    try {
      // Request a new token from our API
      const newToken = await onTokenWillExpire(joinedUID.toString());
      await client?.renewToken(newToken);
      console.log('Successfully renewed Agora token');
    } catch (error) {
      console.error('Failed to renew Agora token:', error);
    }
  }, [client, onTokenWillExpire, joinedUID]);

  // New: Add listener for token privilege will expire event
  useClientEvent(client, 'token-privilege-will-expire', handleTokenWillExpire);

  /*
  Existing code remains the same as before:
  // Toggle microphone
  // Cleanup on unmount
*/

  //update the return statement to include new UI elements for leaving the conversation
  return (
    <div className="p-4 bg-gray-800 rounded-lg">
      <div className="flex items-center justify-between mb-4">
        <div className="flex items-center gap-2">
          <div
            className={`w-3 h-3 rounded-full ${
              isConnected ? 'bg-green-500' : 'bg-red-500'
            }`}
          />
          <span className="text-white">
            {isConnected ? 'Connected' : 'Disconnected'}
          </span>
        </div>

        <button
          onClick={onEndConversation}
          className="px-4 py-2 bg-red-500 text-white rounded"
        >
          End Conversation
        </button>
      </div>

      {/* Display remote users */}
      <div className="mb-4">
        <h2 className="text-xl mb-2 text-white">Remote Users:</h2>
        {remoteUsers.length > 0 ? (
          remoteUsers.map((user) => (
            <div
              key={user.uid}
              className="p-2 bg-gray-700 rounded mb-2 text-white"
            >
              <RemoteUser user={user} />
            </div>
          ))
        ) : (
          <p className="text-gray-400">No remote users connected</p>
        )}
      </div>

      {/* Microphone control */}
      <button
        onClick={toggleMicrophone}
        className={`px-4 py-2 rounded ${
          isEnabled ? 'bg-green-500' : 'bg-red-500'
        } text-white`}
      >
        Microphone: {isEnabled ? 'On' : 'Off'}
      </button>
    </div>
  );
}
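
The renewal path above depends on LandingPage's handleTokenWillExpire sending the current channel name back to the token route, and that URL is built by hand. Pulling it into a small helper makes two requirements explicit and testable: both values are URL-encoded, and a missing channel is an error rather than a request that would return a token for some other channel. `buildTokenRenewalUrl` is a hypothetical name; the path matches the route created earlier.

```typescript
// Hypothetical helper: build the token-renewal URL for the existing channel
function buildTokenRenewalUrl(channel: string | undefined, uid: string): string {
  if (!channel) {
    // Without the current channel, the route would mint a token for a different channel
    throw new Error('Cannot renew a token without the current channel name');
  }
  const query =
    `channel=${encodeURIComponent(channel)}` + `&uid=${encodeURIComponent(uid)}`;
  return `/api/generate-agora-token?${query}`;
}
```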

Quick Test

With the basic RTC functionality and token generation in place, let's test the application.

Start the application by running pnpm run dev.
Open the application in your browser at http://localhost:3000
Click the “Start Conversation” button.
The connection status should change to “Connected”.

Adding Agora's Conversational AI Engine

Now that the basic RTC functionality works, let's integrate Agora's Conversational AI service. In the following sections, we will:

Create an API route that invites the AI agent into the channel
Configure the Agora Start Request (including the choice of LLM endpoint and TTS provider)
Create a route that stops the conversation

Type Setup

Let's get the tedious part out of the way first. Add the new types to the types/conversation.ts file:

// Previous types remain the same as before...

// New types for the agent invitation API
export interface ClientStartRequest {
  requester_id: string;
  channel_name: string;
  rtc_codec?: number;
  input_modalities?: string[];
  output_modalities?: string[];
}

interface MicrosoftTTSParams {
  key: string;
  region: string;
  voice_name: string;
  rate?: number;
  volume?: number;
}

interface ElevenLabsTTSParams {
  key: string;
  voice_id: string;
  model_id: string;
}

export enum TTSVendor {
  Microsoft = 'microsoft',
  ElevenLabs = 'elevenlabs',
}

export interface TTSConfig {
  vendor: TTSVendor;
  params: MicrosoftTTSParams | ElevenLabsTTSParams;
}

// Agora API request body
export interface AgoraStartRequest {
  name: string;
  properties: {
    channel: string;
    token: string;
    agent_rtc_uid: string;
    remote_rtc_uids: string[];
    enable_string_uid?: boolean;
    idle_timeout?: number;
    advanced_features?: {
      enable_aivad?: boolean;
      enable_bhvs?: boolean;
    };
    asr: {
      language: string;
      task?: string;
    };
    llm: {
      url?: string;
      api_key?: string;
      system_messages: Array<{
        role: string;
        content: string;
      }>;
      greeting_message: string;
      failure_message: string;
      max_history?: number;
      input_modalities?: string[];
      output_modalities?: string[];
      params: {
        model: string;
        max_tokens: number;
        temperature?: number;
        top_p?: number;
      };
    };
    vad: {
      silence_duration_ms: number;
      speech_duration_ms?: number;
      threshold?: number;
      interrupt_duration_ms?: number;
      prefix_padding_ms?: number;
    };
    tts: TTSConfig;
  };
}

export interface StopConversationRequest {
  agent_id: string;
}

export interface AgentResponse {
  agent_id: string;
  create_ts: number;
  state: string;
}

These new types describe all the pieces we'll assemble in the next steps: we take the client's request, build an AgoraStartRequest from it, and send that to Agora's Conversational AI Engine, which then adds the agent to the conversation.
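The mapping from the client's request to the Agora request body can be seen in isolation with a reduced sketch. It covers only the fields that come from ClientStartRequest or the agent UID (the ASR, VAD, LLM prompt, and TTS settings stay as in the route), and `buildStartProperties` is a hypothetical name; the route itself builds the full object inline. The `isStringUID` check is the same letter-detection heuristic the route uses to set `enable_string_uid`.

```typescript
// Subset of the guide's ClientStartRequest
interface ClientStartRequest {
  requester_id: string;
  channel_name: string;
  input_modalities?: string[];
  output_modalities?: string[];
}

// Subset of AgoraStartRequest['properties'] filled from the client request
interface StartPropertiesSubset {
  channel: string;
  agent_rtc_uid: string;
  remote_rtc_uids: string[];
  enable_string_uid: boolean;
  llm: { input_modalities: string[]; output_modalities: string[] };
}

// Same heuristic as the invite route: any ASCII letter means a string UID
const isStringUID = (uid: string): boolean => /[a-zA-Z]/.test(uid);

function buildStartProperties(
  req: ClientStartRequest,
  agentUid: string,
  defaults: { input: string[]; output: string[] }
): StartPropertiesSubset {
  return {
    channel: req.channel_name,
    agent_rtc_uid: agentUid,
    remote_rtc_uids: [req.requester_id], // the user whose audio the agent subscribes to
    enable_string_uid: isStringUID(agentUid),
    llm: {
      // Client-supplied modalities win; otherwise fall back to the env defaults
      input_modalities: req.input_modalities || defaults.input,
      output_modalities: req.output_modalities || defaults.output,
    },
  };
}
```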

Invite Agent Route

Create the route file at app/api/invite-agent/route.ts:

mkdir app/api/invite-agent
touch app/api/invite-agent/route.ts

Add the following code:

import { NextResponse } from 'next/server';
import { RtcTokenBuilder, RtcRole } from 'agora-token';
import {
  ClientStartRequest,
  AgentResponse,
  TTSVendor,
} from '@/types/conversation';

// Helper function to validate and get all configuration
function getValidatedConfig() {
  // Validate Agora Configuration
  const agoraConfig = {
    baseUrl: process.env.NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL || '',
    appId: process.env.NEXT_PUBLIC_AGORA_APP_ID || '',
    appCertificate: process.env.NEXT_PUBLIC_AGORA_APP_CERTIFICATE || '',
    customerId: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_ID || '',
    customerSecret: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_SECRET || '',
    agentUid: process.env.NEXT_PUBLIC_AGENT_UID || 'Agent',
  };

  if (Object.values(agoraConfig).some((v) => v === '')) {
    throw new Error('Missing Agora configuration. Check your .env.local file');
  }

  // Validate LLM Configuration
  const llmConfig = {
    url: process.env.NEXT_PUBLIC_LLM_URL,
    api_key: process.env.NEXT_PUBLIC_LLM_API_KEY,
    model: process.env.NEXT_PUBLIC_LLM_MODEL,
  };

  // Get TTS Vendor
  const ttsVendor =
    (process.env.NEXT_PUBLIC_TTS_VENDOR as TTSVendor) || TTSVendor.Microsoft;

  // Get Modalities Configuration
  const modalitiesConfig = {
    input: process.env.NEXT_PUBLIC_INPUT_MODALITIES?.split(',') || ['text'],
    output: process.env.NEXT_PUBLIC_OUTPUT_MODALITIES?.split(',') || [
      'text',
      'audio',
    ],
  };

  return {
    agora: agoraConfig,
    llm: llmConfig,
    ttsVendor,
    modalities: modalitiesConfig,
  };
}

// Helper function to get TTS configuration based on vendor
function getTTSConfig(vendor: TTSVendor) {
  if (vendor === TTSVendor.Microsoft) {
    return {
      vendor: TTSVendor.Microsoft,
      params: {
        key: process.env.NEXT_PUBLIC_MICROSOFT_TTS_KEY,
        region: process.env.NEXT_PUBLIC_MICROSOFT_TTS_REGION,
        voice_name:
          process.env.NEXT_PUBLIC_MICROSOFT_TTS_VOICE_NAME ||
          'en-US-AriaNeural',
        rate: parseFloat(process.env.NEXT_PUBLIC_MICROSOFT_TTS_RATE || '1.0'),
        volume: parseFloat(
          process.env.NEXT_PUBLIC_MICROSOFT_TTS_VOLUME || '100.0'
        ),
      },
    };
  } else if (vendor === TTSVendor.ElevenLabs) {
    return {
      vendor: TTSVendor.ElevenLabs,
      params: {
        key: process.env.NEXT_PUBLIC_ELEVENLABS_API_KEY,
        model_id: process.env.NEXT_PUBLIC_ELEVENLABS_MODEL_ID,
        voice_id: process.env.NEXT_PUBLIC_ELEVENLABS_VOICE_ID,
      },
    };
  }

  throw new Error(`Unsupported TTS vendor: ${vendor}`);
}

export async function POST(request: Request) {
  try {
    // Get our configuration
    const config = getValidatedConfig();
    const body: ClientStartRequest = await request.json();
    const { requester_id, channel_name, input_modalities, output_modalities } =
      body;

    // Generate a unique token for the AI agent
    const timestamp = Date.now(); // also used below to build a unique conversation name
    // agora-token expects expirations in seconds from now, not a Unix timestamp
    const expirationInSeconds = 3600;

    const token = RtcTokenBuilder.buildTokenWithUid(
      config.agora.appId,
      config.agora.appCertificate,
      channel_name,
      config.agora.agentUid,
      RtcRole.PUBLISHER,
      expirationInSeconds,
      expirationInSeconds
    );

    // Check if we're using string UIDs
    const isStringUID = (str: string) => /[a-zA-Z]/.test(str);

    // Create a descriptive name for this conversation
    const uniqueName = `conversation-${timestamp}-${Math.random()
      .toString(36)
      .substring(2, 8)}`;

    // Get the appropriate TTS configuration
    const ttsConfig = getTTSConfig(config.ttsVendor);

    // Prepare the request to the Agora Conversational AI API
    const requestBody = {
      name: uniqueName,
      properties: {
        channel: channel_name,
        token: token,
        agent_rtc_uid: config.agora.agentUid,
        remote_rtc_uids: [requester_id],
        enable_string_uid: isStringUID(config.agora.agentUid),
        idle_timeout: 30,
        // ASR (Automatic Speech Recognition) settings
        asr: {
          language: 'en-US',
          task: 'conversation',
        },
        // LLM (Large Language Model) settings
        llm: {
          url: config.llm.url,
          api_key: config.llm.api_key,
          system_messages: [
            {
              role: 'system',
              content:
                'You are a helpful assistant. Respond concisely and naturally as if in a spoken conversation.',
            },
          ],
          greeting_message: 'Hello! How can I assist you today?',
          failure_message: 'Please wait a moment while I process that.',
          max_history: 10,
          params: {
            model: config.llm.model || 'gpt-3.5-turbo',
            max_tokens: 1024,
            temperature: 0.7,
            top_p: 0.95,
          },
          input_modalities: input_modalities || config.modalities.input,
          output_modalities: output_modalities || config.modalities.output,
        },
        // VAD (Voice Activity Detection) settings
        vad: {
          silence_duration_ms: 480,
          speech_duration_ms: 15000,
          threshold: 0.5,
          interrupt_duration_ms: 160,
          prefix_padding_ms: 300,
        },
        // TTS (Text-to-Speech) settings
        tts: ttsConfig,
      },
    };

    // Send the request to the Agora API
    const response = await fetch(
      `${config.agora.baseUrl}/${config.agora.appId}/join`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: `Basic ${Buffer.from(
            `${config.agora.customerId}:${config.agora.customerSecret}`
          ).toString('base64')}`,
        },
        body: JSON.stringify(requestBody),
      }
    );

    if (!response.ok) {
      const errorText = await response.text();
      console.error('Agent start response:', {
        status: response.status,
        body: errorText,
      });
      throw new Error(
        `Failed to start conversation: ${response.status} ${errorText}`
      );
    }

    // Parse and return the response, which includes the agentID.
    // We'll need the agentID later, when its time to remove the agent.
    const data: AgentResponse = await response.json();
    return NextResponse.json(data);
  } catch (error) {
    console.error('Error starting conversation:', error);
    return NextResponse.json(
      {
        error:
          error instanceof Error
            ? error.message
            : 'Failed to start conversation',
      },
      { status: 500 }
    );
  }
}

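Agora supports multiple TTS providers, so the TTS section includes configuration settings for both Microsoft Azure TTS and ElevenLabs. The NEXT_PUBLIC_TTS_VENDOR environment variable determines which configuration is used, defaulting to Microsoft when it is unset.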
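In the route, `process.env.NEXT_PUBLIC_TTS_VENDOR as TTSVendor` is only a type assertion: nothing is validated until getTTSConfig throws for an unknown value. The sketch below validates up front and tolerates surrounding whitespace and capitalization. `resolveTtsVendor` is a hypothetical helper, and case-insensitive matching is a design choice, not something Agora requires.

```typescript
// Mirrors the guide's TTSVendor enum
enum TTSVendor {
  Microsoft = 'microsoft',
  ElevenLabs = 'elevenlabs',
}

// Hypothetical helper: turn the raw env value into a TTSVendor or fail early
function resolveTtsVendor(raw: string | undefined): TTSVendor {
  const normalized = (raw ?? '').trim().toLowerCase();
  if (normalized === '') return TTSVendor.Microsoft; // same default as the route
  switch (normalized) {
    case TTSVendor.Microsoft:
      return TTSVendor.Microsoft;
    case TTSVendor.ElevenLabs:
      return TTSVendor.ElevenLabs;
    default:
      throw new Error(`Unsupported TTS vendor: ${raw}`);
  }
}
```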

Choose the TTS provider that fits your needs. Once you've picked a provider, you'll also need to choose a voice. To help you get started, here are each provider's voice galleries:

Microsoft Azure TTS Voice Gallery: a wide range of natural-sounding voices.
ElevenLabs Voice Library: known for realistic, highly expressive voices.

Note: This route loads several environment variables, all of which must be set in your .env.local file. A complete list of the required environment variables is included at the end of this guide.
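One detail worth knowing about those variables: the route parses NEXT_PUBLIC_INPUT_MODALITIES and NEXT_PUBLIC_OUTPUT_MODALITIES with a bare `split(',')`, so a value written as `text, audio` yields `' audio'` with a leading space. A sketch of a more forgiving parser follows; `parseListEnv` is a hypothetical name, and the fallback argument stands in for the route's defaults.

```typescript
// Hypothetical helper: parse a comma-separated env value into a clean list
function parseListEnv(raw: string | undefined, fallback: string[]): string[] {
  if (!raw) return fallback;
  const items = raw
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
  // An all-blank value (e.g. " , ") behaves like an unset variable
  return items.length > 0 ? items : fallback;
}
```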

Stop Conversation Route

Once the agent has joined the conversation, we need a way to end it. That's what the stop-conversation route is for: it takes the agent ID and sends a request to Agora's Conversational AI Engine to remove the agent from the channel.

Create the file at app/api/stop-conversation/route.ts:

mkdir app/api/stop-conversation
touch app/api/stop-conversation/route.ts

Add the following code:

import { NextResponse } from 'next/server';
import { StopConversationRequest } from '@/types/conversation';

// Helper function to validate and get Agora configuration
function getValidatedConfig() {
  const agoraConfig = {
    baseUrl: process.env.NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL,
    appId: process.env.NEXT_PUBLIC_AGORA_APP_ID || '',
    customerId: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_ID || '',
    customerSecret: process.env.NEXT_PUBLIC_AGORA_CUSTOMER_SECRET || '',
  };

  if (Object.values(agoraConfig).some((v) => !v || v.trim() === '')) {
    throw new Error('Missing Agora configuration. Check your .env.local file');
  }

  return agoraConfig;
}

export async function POST(request: Request) {
  try {
    const config = getValidatedConfig();
    const body: StopConversationRequest = await request.json();
    const { agent_id } = body;

    if (!agent_id) {
      throw new Error('agent_id is required');
    }

    // Create authentication header
    const plainCredential = `${config.customerId}:${config.customerSecret}`;
    const encodedCredential = Buffer.from(plainCredential).toString('base64');
    const authorizationHeader = `Basic ${encodedCredential}`;

    // Send request to Agora API to stop the conversation
    const response = await fetch(
      `${config.baseUrl}/${config.appId}/agents/${agent_id}/leave`,
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          Authorization: authorizationHeader,
        },
      }
    );

    if (!response.ok) {
      const errorText = await response.text();
      console.error('Agent stop response:', {
        status: response.status,
        body: errorText,
      });
      throw new Error(
        `Failed to stop conversation: ${response.status} ${errorText}`
      );
    }

    return NextResponse.json({ success: true });
  } catch (error) {
    console.error('Error stopping conversation:', error);
    return NextResponse.json(
      {
        error:
          error instanceof Error
            ? error.message
            : 'Failed to stop conversation',
      },
      { status: 500 }
    );
  }
}
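The route authenticates with HTTP Basic auth. The header value is `Basic ` followed by the base64 encoding of `customerId:customerSecret`. The base URL in the environment reference at the end of this guide ends in `/projects/`, so `${config.baseUrl}/${config.appId}` produces a double slash (`projects//<appId>`). This guide does not say whether the API tolerates that, so normalizing the URL is a cheap safeguard. The following is a minimal sketch of both steps as a pure function. `buildLeaveRequest` and `AgoraRestConfig` are hypothetical names and are not part of any Agora SDK:

```typescript
// Hypothetical config shape: the same four values getValidatedConfig() reads.
interface AgoraRestConfig {
  baseUrl: string;
  appId: string;
  customerId: string;
  customerSecret: string;
}

// Hypothetical helper: builds the URL and fetch options for the /leave call.
export function buildLeaveRequest(config: AgoraRestConfig, agentId: string) {
  // Remove trailing slashes so ".../projects/" + "/" + appId does not yield "//"
  const base = config.baseUrl.replace(/\/+$/, '');

  // HTTP Basic auth: base64("customerId:customerSecret")
  const encoded = Buffer.from(
    `${config.customerId}:${config.customerSecret}`
  ).toString('base64');

  return {
    // encodeURIComponent guards against unexpected characters in the agent ID
    url: `${base}/${config.appId}/agents/${encodeURIComponent(agentId)}/leave`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Basic ${encoded}`,
      },
    },
  };
}
```

In the route, the returned `url` and `init` could replace the inline header and URL construction before calling `fetch(url, init)`.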

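Update the Client to Start and Stop the AI Agent

Update LandingPage and ConversationComponent so they can start and stop the AI agent.

Invite the AI Agent by Updating the Landing Page

First, update the landing page so it invites the AI agent as soon as the token is generated. The invite request is sent right after the token request. The conversation UI is shown once the invite completes, whether or not the agent joined.

The updated landing page invites the AI agent to join the conversation. It shows appropriate loading and error states if the agent cannot join.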

// Previous imports remain the same as before...
// Add new imports for ClientStartRequest and AgentResponse
import type {
  AgoraLocalUserInfo,
  ClientStartRequest,
  AgentResponse,
} from '../types/conversation';

// Dynamic imports for ConversationComponent and AgoraProvider remain the same as before...

export default function LandingPage() {
  // previous state management code remains the same as before...
  const [agentJoinError, setAgentJoinError] = useState(false); // add agent join error state

  const handleStartConversation = async () => {
    setIsLoading(true);
    setError(null);
    setAgentJoinError(false);

    try {
      // Step 1: Get the Agora token (updated)
      console.log('Fetching Agora token...');
      const agoraResponse = await fetch('/api/generate-agora-token');
      const responseData = await agoraResponse.json();
      console.log('Agora API response:', responseData);

      if (!agoraResponse.ok) {
        throw new Error(
          `Failed to generate Agora token: ${JSON.stringify(responseData)}`
        );
      }

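      // Store the token data right away so the conversation UI can render
      // even if the agent fails to join; agentId is added below on success
      setAgoraLocalUserInfo(responseData);

      // Step 2: Invite the AI agent to join the channel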
      const startRequest: ClientStartRequest = {
        requester_id: responseData.uid,
        channel_name: responseData.channel,
        input_modalities: ['text'],
        output_modalities: ['text', 'audio'],
      };

      try {
        const response = await fetch('/api/invite-agent', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
          },
          body: JSON.stringify(startRequest),
        });

        if (!response.ok) {
          setAgentJoinError(true);
        } else {
          const agentData: AgentResponse = await response.json();
          // Store agent ID along with token data
          setAgoraLocalUserInfo({
            ...responseData,
            agentId: agentData.agent_id,
          });
        }
      } catch (err) {
        console.error('Failed to start conversation with agent:', err);
        setAgentJoinError(true);
      }

      // Show the conversation UI even if agent join fails
      // The user can retry connecting the agent from within the conversation
      setShowConversation(true);
    } catch (err) {
      setError('Failed to start conversation. Please try again.');
      console.error('Error starting conversation:', err);
    } finally {
      setIsLoading(false);
    }
  };

  // Token renewal code remains the same as before...

  // Updated return statement to show error if the agent join fails
  return (
    <div className="min-h-screen bg-gray-900 text-white p-4">
      <div className="max-w-4xl mx-auto py-12">
        <h1 className="text-4xl font-bold mb-6 text-center">
          Agora AI Conversation
        </h1>

        <p className="text-lg mb-6 text-center">
          When was the last time you had an intelligent conversation?
        </p>

        {!showConversation ? (
          <>
            <div className="flex justify-center mb-8">
              <button
                onClick={handleStartConversation}
                disabled={isLoading}
                className="px-8 py-3 bg-blue-600 hover:bg-blue-700 text-white rounded-full shadow-lg disabled:opacity-50 transition-all"
              >
                {isLoading ? 'Starting...' : 'Start Conversation'}
              </button>
            </div>
            {error && <p className="text-center text-red-400 mt-4">{error}</p>}
          </>
        ) : agoraLocalUserInfo ? (
          <>
            {agentJoinError && (
              <div className="mb-4 p-3 bg-red-600/20 rounded-lg text-red-400 text-center">
                Failed to connect with AI agent. The conversation may not work
                as expected.
              </div>
            )}
            <Suspense
              fallback={
                <div className="text-center">Loading conversation...</div>
              }
            >
              <AgoraProvider>
                <ConversationComponent
                  agoraLocalUserInfo={agoraLocalUserInfo}
                  onTokenWillExpire={handleTokenWillExpire}
                  onEndConversation={() => setShowConversation(false)}
                />
              </AgoraProvider>
            </Suspense>
          </>
        ) : (
          <p className="text-center">Failed to load conversation data.</p>
        )}
      </div>
    </div>
  );
}
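LandingPage and ConversationComponent build the same ClientStartRequest and POST it to /api/invite-agent. To keep that logic in one place, you could use a shared helper like the minimal sketch below. `inviteAgent` and `FetchLike` are hypothetical names. The interface only mirrors the fields used in this guide; your `types/conversation.ts` definitions remain the source of truth. The fetch function is passed in so the helper can be exercised with a stub:

```typescript
// Mirrors the fields this guide puts in ClientStartRequest
interface ClientStartRequest {
  requester_id: string;
  channel_name: string;
  input_modalities: string[];
  output_modalities: string[];
}

// Minimal fetch signature; the browser's global fetch satisfies it
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical helper: invites the agent and resolves with its agent_id
export async function inviteAgent(
  fetchImpl: FetchLike,
  requesterId: string | number,
  channelName: string
): Promise<string> {
  const startRequest: ClientStartRequest = {
    // Converted to a string, as ConversationComponent does with joinedUID
    requester_id: requesterId.toString(),
    channel_name: channelName,
    input_modalities: ['text'],
    output_modalities: ['text', 'audio'],
  };

  const response = await fetchImpl('/api/invite-agent', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(startRequest),
  });

  if (!response.ok) {
    throw new Error(`Failed to invite agent: ${response.status}`);
  }

  const data = (await response.json()) as { agent_id?: string };
  if (!data.agent_id) {
    throw new Error('Invite response did not include agent_id');
  }
  return data.agent_id;
}
```

In the browser, you could call `inviteAgent(fetch, responseData.uid, responseData.channel)` and store the returned ID as `agentId`.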

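Create the Microphone Button Component

A microphone button is essential in an audio-first UI, so let's build a simple button component that lets the user control their microphone.

Create the file components/MicrophoneButton.tsx:

touch components/MicrophoneButton.tsx

Add the following code: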

'use client';

import { IMicrophoneAudioTrack } from 'agora-rtc-react';
import { Mic, MicOff } from 'lucide-react'; // Import from lucide-react or another icon library

interface MicrophoneButtonProps {
  isEnabled: boolean;
  setIsEnabled: (enabled: boolean) => void;
  localMicrophoneTrack: IMicrophoneAudioTrack | null;
}

export function MicrophoneButton({
  isEnabled,
  setIsEnabled,
  localMicrophoneTrack,
}: MicrophoneButtonProps) {
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      const newState = !isEnabled;
      try {
        await localMicrophoneTrack.setEnabled(newState);
        setIsEnabled(newState);
        console.log('Microphone state updated successfully');
      } catch (error) {
        console.error('Failed to toggle microphone:', error);
      }
    }
  };

  return (
    <button
      onClick={toggleMicrophone}
      className={`relative w-16 h-16 rounded-full shadow-lg flex items-center justify-center transition-colors ${
        isEnabled ? 'bg-white hover:bg-gray-50' : 'bg-red-500 hover:bg-red-600'
      }`}
      aria-label={isEnabled ? 'Mute microphone' : 'Unmute microphone'}
    >
      <div className={`relative z-10`}>
        {isEnabled ? (
          <Mic size={24} className="text-gray-800" />
        ) : (
          <MicOff size={24} className="text-white" />
        )}
      </div>
    </button>
  );
}

Update the Conversation Component

Next, update the conversation component to handle stopping and restarting the AI agent, and add the microphone button component:

// Previous imports remain the same as before...
import { MicrophoneButton } from './MicrophoneButton'; // microphone button component
// import new ClientStartRequest and StopConversationRequest types
import type {
  ConversationComponentProps,
  ClientStartRequest,
  StopConversationRequest,
} from '../types/conversation';

export default function ConversationComponent({
  agoraLocalUserInfo,
  onTokenWillExpire,
  onEndConversation,
}: ConversationComponentProps) {
  // Previous state management code remains the same as before...
  // Add new agent related state variables
  const [isAgentConnected, setIsAgentConnected] = useState(false);
  const [isConnecting, setIsConnecting] = useState(false);
  const agentUID = process.env.NEXT_PUBLIC_AGENT_UID;

  // Join the channel hook remains the same as before...
  // Set UID on join success, remains the same as before...
  // Publish local microphone track remains the same as before...

  // Update remote user events - specifically looking for the AI agent
  useClientEvent(client, 'user-joined', (user) => {
    console.log('Remote user joined:', user.uid);
    if (user.uid.toString() === agentUID) {
      setIsAgentConnected(true);
      setIsConnecting(false);
    }
  });

  useClientEvent(client, 'user-left', (user) => {
    console.log('Remote user left:', user.uid);
    if (user.uid.toString() === agentUID) {
      setIsAgentConnected(false);
      setIsConnecting(false);
    }
  });

  // Sync isAgentConnected with remoteUsers
  useEffect(() => {
    const isAgentInRemoteUsers = remoteUsers.some(
      (user) => user.uid.toString() === agentUID
    );
    setIsAgentConnected(isAgentInRemoteUsers);
  }, [remoteUsers, agentUID]);

  // Connection state listener remains the same as before...
  // Cleanup on unmount remains the same as before...

  // Function to stop conversation with the AI agent
  const handleStopConversation = async () => {
    if (!isAgentConnected || !agoraLocalUserInfo.agentId) return;
    setIsConnecting(true);

    try {
      const stopRequest: StopConversationRequest = {
        agent_id: agoraLocalUserInfo.agentId,
      };

      const response = await fetch('/api/stop-conversation', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(stopRequest),
      });

      if (!response.ok) {
        throw new Error(`Failed to stop conversation: ${response.statusText}`);
      }

      // Wait for the agent to actually leave before resetting state
      // The user-left event handler will handle setting isAgentConnected to false
    } catch (error) {
      if (error instanceof Error) {
        console.warn('Error stopping conversation:', error.message);
      }
      setIsConnecting(false);
    }
  };

  // Function to start conversation with the AI agent
  const handleStartConversation = async () => {
    if (!joinedUID) return;
    setIsConnecting(true);

    try {
      const startRequest: ClientStartRequest = {
        requester_id: joinedUID.toString(),
        channel_name: agoraLocalUserInfo.channel,
        input_modalities: ['text'],
        output_modalities: ['text', 'audio'],
      };

      const response = await fetch('/api/invite-agent', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify(startRequest),
      });

      if (!response.ok) {
        throw new Error(`Failed to start conversation: ${response.statusText}`);
      }

      // Update agent ID when new agent is connected
      const data = await response.json();
      if (data.agent_id) {
        agoraLocalUserInfo.agentId = data.agent_id;
      }
    } catch (error) {
      if (error instanceof Error) {
        console.warn('Error starting conversation:', error.message);
      }
      // Reset connecting state if there's an error
      setIsConnecting(false);
    }
  };

  // Token renewal handler remains the same as before...
  // Add token observer remains the same as before...

  // Updated return to include stop, restart, and microphone controls
  return (
    <div className="flex flex-col gap-6 p-4 h-full relative">
      {/* Connection Status */}
      <div className="absolute top-4 right-4 flex items-center gap-2">
        {isAgentConnected ? (
          <button
            onClick={handleStopConversation}
            disabled={isConnecting}
            className="px-4 py-2 bg-red-500/80 text-white rounded-full border border-red-400/30 backdrop-blur-sm 
            hover:bg-red-600/90 transition-all shadow-lg 
            disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium"
          >
            {isConnecting ? 'Disconnecting...' : 'Stop Agent'}
          </button>
        ) : (
          <button
            onClick={handleStartConversation}
            disabled={isConnecting}
            className="px-4 py-2 bg-blue-500/80 text-white rounded-full border border-blue-400/30 backdrop-blur-sm 
            hover:bg-blue-600/90 transition-all shadow-lg 
            disabled:opacity-50 disabled:cursor-not-allowed text-sm font-medium"
          >
            {isConnecting ? 'Connecting...' : 'Connect Agent'}
          </button>
        )}
        <div
          className={`w-3 h-3 rounded-full ${
            isConnected ? 'bg-green-500' : 'bg-red-500'
          }`}
          onClick={onEndConversation}
          role="button"
          title="End conversation"
          style={{ cursor: 'pointer' }}
        />
      </div>

      {/* Remote Users Section */}
      <div className="flex-1">
        {remoteUsers.map((user) => (
          <div key={user.uid} className="mb-4">
            <p className="text-center text-sm text-gray-400 mb-2">
              {user.uid.toString() === agentUID
                ? 'AI Agent'
                : `User: ${user.uid}`}
            </p>
            <RemoteUser user={user} />
          </div>
        ))}

        {remoteUsers.length === 0 && (
          <div className="text-center text-gray-500 py-8">
            {isConnected
              ? 'Waiting for AI agent to join...'
              : 'Connecting to channel...'}
          </div>
        )}
      </div>

      {/* Microphone Control */}
      <div className="fixed bottom-8 left-1/2 -translate-x-1/2">
        <MicrophoneButton
          isEnabled={isEnabled}
          setIsEnabled={setIsEnabled}
          localMicrophoneTrack={localMicrophoneTrack}
        />
      </div>
    </div>
  );
}
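The component identifies the agent by comparing each remote UID, converted to a string, with NEXT_PUBLIC_AGENT_UID. The conversion is needed because the SDK's UID type can be a number or a string. If the variable is missing, `agentUID` is undefined and no user ever matches. In that case the agent's audio still plays, but the UI never reports it as connected. Below is a small sketch of the check as a reusable function. `isAgentUser` is a hypothetical name and is not part of agora-rtc-react:

```typescript
// Hypothetical helper: true when a remote UID belongs to the AI agent.
// Agora UIDs can be numbers or strings, so both sides are compared as strings.
export function isAgentUser(
  uid: string | number,
  agentUID: string | undefined
): boolean {
  // An unset NEXT_PUBLIC_AGENT_UID means no remote user can match
  if (!agentUID) return false;
  return uid.toString() === agentUID.trim();
}
```

Inside ConversationComponent, you could call `isAgentUser(user.uid, agentUID)` in the user-joined and user-left handlers and in the remoteUsers sync effect. That way the comparison logic lives in one place.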

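Audio Visualization (Optional)

Let's add an audio visualization so the user gets visual feedback while the AI agent is speaking. Below is an example visualizer component that uses an Agora audio track as the input for its animation.

Create the file components/AudioVisualizer.tsx:

touch components/AudioVisualizer.tsx

Add the following code: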

'use client';

import React, { useEffect, useRef, useState } from 'react';
import { ILocalAudioTrack, IRemoteAudioTrack } from 'agora-rtc-react';

interface AudioVisualizerProps {
  track: ILocalAudioTrack | IRemoteAudioTrack | undefined;
}

export const AudioVisualizer: React.FC<AudioVisualizerProps> = ({ track }) => {
  const [isVisualizing, setIsVisualizing] = useState(false);
  const audioContextRef = useRef<AudioContext | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number | null>(null); // React 19 types require an initial value
  const barsRef = useRef<(HTMLDivElement | null)[]>([]);

  const animate = () => {
    if (!analyserRef.current) {
      return;
    }

    const bufferLength = analyserRef.current.frequencyBinCount;
    const dataArray = new Uint8Array(bufferLength);
    analyserRef.current.getByteFrequencyData(dataArray);

    // Define frequency ranges for different bars to create a more appealing visualization
    const frequencyRanges = [
      [24, 31], // Highest (bar 0, 8)
      [16, 23], // Mid-high (bar 1, 7)
      [8, 15], // Mid (bar 2, 6)
      [4, 7], // Low-mid (bar 3, 5)
      [0, 3], // Lowest (bar 4 - center)
    ];

    barsRef.current.forEach((bar, index) => {
      if (!bar) {
        return;
      }

      // Use symmetrical ranges for the 9 bars
      const rangeIndex = index < 5 ? index : 8 - index;
      const [start, end] = frequencyRanges[rangeIndex];

      // Calculate average energy in this frequency range
      let sum = 0;
      for (let i = start; i <= end; i++) {
        sum += dataArray[i];
      }
      let average = sum / (end - start + 1);

      // Apply different multipliers to create a more appealing shape
      const multipliers = [0.7, 0.8, 0.85, 0.9, 0.95];
      const multiplierIndex = index < 5 ? index : 8 - index;
      average *= multipliers[multiplierIndex];

      // Scale and limit the height
      const height = Math.min((average / 255) * 100, 100);
      bar.style.height = `${height}px`;
    });

    animationFrameRef.current = requestAnimationFrame(animate);
  };

  useEffect(() => {
    if (!track) {
      return;
    }

    const startVisualizer = async () => {
      try {
        audioContextRef.current = new AudioContext();
        analyserRef.current = audioContextRef.current.createAnalyser();
        analyserRef.current.fftSize = 64; // Keep this small for performance

        // Get the audio track from Agora
        const mediaStreamTrack = track.getMediaStreamTrack();
        const stream = new MediaStream([mediaStreamTrack]);

        // Connect it to our analyzer
        const source = audioContextRef.current.createMediaStreamSource(stream);
        source.connect(analyserRef.current);

        setIsVisualizing(true);
        animate();
      } catch (error) {
        console.error('Error starting visualizer:', error);
      }
    };

    startVisualizer();

    // Clean up when component unmounts or track changes
    return () => {
      if (animationFrameRef.current) {
        cancelAnimationFrame(animationFrameRef.current);
      }
      if (audioContextRef.current) {
        audioContextRef.current.close();
      }
    };
  }, [track]);

  return (
    <div className="w-full h-40 rounded-lg overflow-hidden flex items-center justify-center relative">
      <div className="flex items-center space-x-2 h-[100px] relative z-10">
        {/* Create 9 bars for the visualizer */}
        {[...Array(9)].map((_, index) => (
          <div
            key={index}
            ref={(el) => {
              barsRef.current[index] = el;
            }}
            className="w-3 bg-gradient-to-t from-blue-500 via-purple-500 to-pink-500 rounded-full transition-all duration-75"
            style={{
              height: '2px',
              minHeight: '2px',
              background: 'linear-gradient(to top, #3b82f6, #8b5cf6, #ec4899)',
            }}
          />
        ))}
      </div>
    </div>
  );
};
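The bar heights are a pure function of the analyser's frequency bins, so the mapping can be checked without a browser. With `fftSize = 64`, the analyser exposes 32 bins. Each of the 9 bars averages one of five bin ranges, mirrored around the center bar, and is scaled by a per-range multiplier. This minimal sketch reproduces the logic in `animate()`. `computeBarHeights` is a hypothetical name:

```typescript
// Same ranges and multipliers as AudioVisualizer's animate();
// range index 0 is used by the outermost bars, index 4 by the center bar.
const FREQUENCY_RANGES: Array<[number, number]> = [
  [24, 31],
  [16, 23],
  [8, 15],
  [4, 7],
  [0, 3],
];
const MULTIPLIERS = [0.7, 0.8, 0.85, 0.9, 0.95];

// Converts byte frequency data into 9 symmetric bar heights in px (0-100)
export function computeBarHeights(dataArray: Uint8Array): number[] {
  const heights: number[] = [];
  for (let index = 0; index < 9; index++) {
    // Mirror around the center bar: 0<->8, 1<->7, 2<->6, 3<->5
    const rangeIndex = index < 5 ? index : 8 - index;
    const [start, end] = FREQUENCY_RANGES[rangeIndex];

    let sum = 0;
    for (let i = start; i <= end; i++) {
      sum += dataArray[i] ?? 0; // bins beyond the array count as silence
    }
    const average = (sum / (end - start + 1)) * MULTIPLIERS[rangeIndex];

    // Scale 0-255 to 0-100 and cap at 100
    heights.push(Math.min((average / 255) * 100, 100));
  }
  return heights;
}
```

Because the multipliers shrink toward the outside, a full-scale signal produces a peak shape (tallest in the center) instead of a flat block.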

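The visualizer works as follows:

It receives the audio track from the Agora SDK through the track prop
It uses the Web Audio API to extract frequency data from the audio stream
It renders visual bars that react to different frequency ranges of the audio

To use this visualizer with a remote user's audio track, update how RemoteUser is rendered inside ConversationComponent: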

// Replace the remoteUsers.map block in ConversationComponent.tsx with:
{remoteUsers.map((user) => (
  <div key={user.uid} className="mb-4">
    {/* Add the audio visualizer for the remote user */}
    <AudioVisualizer track={user.audioTrack} />
    <p className="text-center text-sm text-gray-400 mb-2">
      {user.uid.toString() === agentUID ? 'AI Agent' : `User: ${user.uid}`}
    </p>
    <RemoteUser user={user} />
  </div>
))}

Integrating the Audio Visualizer

To integrate the audio visualizer with the conversation component, you need to:

Import the AudioVisualizer component
Pass it the appropriate audio track
Place it in the UI

Update ConversationComponent.tsx to include the audio visualizer:

'use client';

import { useState, useEffect, useCallback } from 'react';
import {
  useRTCClient,
  useLocalMicrophoneTrack,
  useRemoteUsers,
  useClientEvent,
  useIsConnected,
  useJoin,
  usePublish,
  RemoteUser,
  UID,
} from 'agora-rtc-react';
import { MicrophoneButton } from './MicrophoneButton';
import { AudioVisualizer } from './AudioVisualizer';
import type {
  ConversationComponentProps,
  ClientStartRequest,
  StopConversationRequest,
} from '../types/conversation';

// Rest of the component as before...

// Then in the render method:
return (
  <div className="flex flex-col gap-6 p-4 h-full relative">
    {/* Connection Status */}
    {/* ... */}

    {/* Remote Users Section with Audio Visualizer */}
    <div className="flex-1">
      {remoteUsers.map((user) => (
        <div key={user.uid} className="mb-8 p-4 bg-gray-800/30 rounded-lg">
          <p className="text-center text-sm text-gray-400 mb-2">
            {user.uid.toString() === agentUID
              ? 'AI Agent'
              : `User: ${user.uid}`}
          </p>

          {/* The AudioVisualizer receives the remote user's audio track */}
          <AudioVisualizer track={user.audioTrack} />

          {/* The RemoteUser component handles playing the audio */}
          <RemoteUser user={user} />
        </div>
      ))}

      {remoteUsers.length === 0 && (
        <div className="text-center text-gray-500 py-8">
          {isConnected
            ? 'Waiting for AI agent to join...'
            : 'Connecting to channel...'}
        </div>
      )}
    </div>

    {/* Microphone Control */}
    <div className="fixed bottom-8 left-1/2 -translate-x-1/2">
      <MicrophoneButton
        isEnabled={isEnabled}
        setIsEnabled={setIsEnabled}
        localMicrophoneTrack={localMicrophoneTrack}
      />
    </div>
  </div>
);

This creates a responsive visualization that clearly shows when the AI agent is speaking. Pairing the audio with visual feedback improves the user experience.

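Microphone Button with Visualization

Because the channel contains only the user and the AI, the microphone button should have its own audio visualization. It gives the user visual confirmation that the microphone is capturing audio.

Let's update MicrophoneButton.tsx to a more advanced version: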

'use client';

import React, { useState, useEffect, useRef } from 'react';
import { useRTCClient, IMicrophoneAudioTrack } from 'agora-rtc-react';
import { Mic, MicOff } from 'lucide-react';

// Interface for audio bar data
interface AudioBar {
  height: number;
}

interface MicrophoneButtonProps {
  isEnabled: boolean;
  setIsEnabled: (enabled: boolean) => void;
  localMicrophoneTrack: IMicrophoneAudioTrack | null;
}

export function MicrophoneButton({
  isEnabled,
  setIsEnabled,
  localMicrophoneTrack,
}: MicrophoneButtonProps) {
  // State to store audio visualization data
  const [audioData, setAudioData] = useState<AudioBar[]>(
    Array(5).fill({ height: 0 })
  );

  // Get the Agora client from context
  const client = useRTCClient();

  // References for audio processing
  const audioContextRef = useRef<AudioContext | null>(null);
  const analyserRef = useRef<AnalyserNode | null>(null);
  const animationFrameRef = useRef<number | null>(null); // React 19 types require an initial value

  // Set up and clean up audio analyzer based on microphone state
  useEffect(() => {
    if (localMicrophoneTrack && isEnabled) {
      setupAudioAnalyser();
    } else {
      cleanupAudioAnalyser();
    }

    return () => cleanupAudioAnalyser();
  }, [localMicrophoneTrack, isEnabled]);

  // Initialize the audio analyzer
  const setupAudioAnalyser = async () => {
    if (!localMicrophoneTrack) return;

    try {
      // Create audio context and analyzer
      audioContextRef.current = new AudioContext();
      analyserRef.current = audioContextRef.current.createAnalyser();
      analyserRef.current.fftSize = 64; // Small FFT size for better performance
      analyserRef.current.smoothingTimeConstant = 0.5; // Add smoothing

      // Get the microphone stream from Agora
      const mediaStream = localMicrophoneTrack.getMediaStreamTrack();
      const source = audioContextRef.current.createMediaStreamSource(
        new MediaStream([mediaStream])
      );

      // Connect the source to the analyzer
      source.connect(analyserRef.current);

      // Start updating the visualization
      updateAudioData();
    } catch (error) {
      console.error('Error setting up audio analyser:', error);
    }
  };

  // Clean up audio resources
  const cleanupAudioAnalyser = () => {
    if (animationFrameRef.current) {
      cancelAnimationFrame(animationFrameRef.current);
    }
    if (audioContextRef.current) {
      audioContextRef.current.close();
      audioContextRef.current = null;
    }
    setAudioData(Array(5).fill({ height: 0 }));
  };

  // Update the audio visualization data
  const updateAudioData = () => {
    if (!analyserRef.current) return;

    // Get frequency data from analyzer
    const dataArray = new Uint8Array(analyserRef.current.frequencyBinCount);
    analyserRef.current.getByteFrequencyData(dataArray);

    // Split the frequency data into 5 segments
    const segmentSize = Math.floor(dataArray.length / 5);
    const newAudioData = Array(5)
      .fill(0)
      .map((_, index) => {
        // Get average value for this frequency segment
        const start = index * segmentSize;
        const end = start + segmentSize;
        const segment = dataArray.slice(start, end);
        const average = segment.reduce((a, b) => a + b, 0) / segment.length;

        // Scale and shape the response curve for better visualization
        const scaledHeight = Math.min(60, (average / 255) * 100 * 1.2);
        const height = Math.pow(scaledHeight / 60, 0.7) * 60;

        return {
          height: height,
        };
      });

    // Update state with new data
    setAudioData(newAudioData);

    // Schedule the next update
    animationFrameRef.current = requestAnimationFrame(updateAudioData);
  };

  // Toggle microphone state
  const toggleMicrophone = async () => {
    if (localMicrophoneTrack) {
      const newState = !isEnabled;
      try {
        // Enable/disable the microphone track
        await localMicrophoneTrack.setEnabled(newState);

        // Handle publishing/unpublishing
        if (!newState) {
          await client.unpublish(localMicrophoneTrack);
        } else {
          await client.publish(localMicrophoneTrack);
        }

        // Update state
        setIsEnabled(newState);
        console.log('Microphone state updated successfully');
      } catch (error) {
        console.error('Failed to toggle microphone:', error);
        // Revert to previous state on error
        localMicrophoneTrack.setEnabled(isEnabled);
      }
    }
  };

  return (
    <button
      onClick={toggleMicrophone}
      className={`relative w-16 h-16 rounded-full shadow-lg flex items-center justify-center transition-colors ${
        isEnabled ? 'bg-white hover:bg-gray-50' : 'bg-red-500 hover:bg-red-600'
      }`}
    >
      {/* Audio visualization bars */}
      <div className="absolute inset-0 flex items-center justify-center gap-1">
        {audioData.map((bar, index) => (
          <div
            key={index}
            className="w-1 rounded-full transition-all duration-100"
            style={{
              height: `${bar.height}%`,
              backgroundColor: isEnabled ? '#22c55e' : '#94a3b8',
              transform: `scaleY(${Math.max(0.1, bar.height / 100)})`,
              transformOrigin: 'center',
            }}
          />
        ))}
      </div>

      {/* Microphone icon overlaid on top */}
      <div className={`relative z-10`}>
        {isEnabled ? (
          <Mic size={24} className="text-gray-800" />
        ) : (
          <MicOff size={24} className="text-white" />
        )}
      </div>
    </button>
  );
}
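`updateAudioData` splits the analyser output into five equal slices. With `fftSize` set to 64, `frequencyBinCount` is 32, so each slice covers `Math.floor(32 / 5) = 6` bins and the two highest bins are never shown. The height curve then boosts the average by 1.2, caps it at 60, and applies an exponent of 0.7 so quiet input still produces visible bars. The standalone sketch below makes that behavior easy to check. `computeMicBarHeights` is a hypothetical name:

```typescript
// Mirrors updateAudioData() in MicrophoneButton: each bar is the average of an
// equal-width slice of the byte frequency data, shaped by a power curve.
export function computeMicBarHeights(dataArray: Uint8Array, barCount = 5): number[] {
  const segmentSize = Math.floor(dataArray.length / barCount);
  const heights: number[] = [];

  for (let index = 0; index < barCount; index++) {
    const start = index * segmentSize;
    const segment = dataArray.slice(start, start + segmentSize);

    let sum = 0;
    for (let i = 0; i < segment.length; i++) {
      sum += segment[i];
    }
    // Guard against empty slices (fewer bins than bars)
    const average = segment.length > 0 ? sum / segment.length : 0;

    // Boost by 1.2 and cap at 60 (used as a percentage of the button height)
    const scaledHeight = Math.min(60, (average / 255) * 100 * 1.2);
    // Exponent < 1 lifts low levels so soft speech is still visible
    heights.push(Math.pow(scaledHeight / 60, 0.7) * 60);
  }
  return heights;
}
```

If you want the highest frequencies represented too, one option is to give the last bar the remainder of the bins. The sketch keeps the original behavior.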

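The microphone button's audio visualization helps users see:

Whether the microphone is working properly
Whether their voice is loud and clear enough
Whether background noise is affecting audio quality

The goal is to give users a more intuitive and visually engaging experience.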

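Testing

With all the components in place, let's finish by testing the application.

Start the Development Server

To start the development server:

pnpm dev

Note: Make sure your .env.local file contains all the required credentials. The full list of environment variables is at the end of this guide.

If the application starts correctly, Next.js prints the local URL in the terminal (http://localhost:3000 by default).

Open your browser and go to http://localhost:3000 to test the app.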

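Common Issues and Solutions

Agent not connecting:
- Verify your Agora Conversational AI credentials
- Check the browser console and server logs for specific error messages
- Make sure your TTS configuration is valid
Audio not working:
- Check the browser's microphone permissions
- Make sure the microphone is enabled in the app
- Verify that the audio track is published correctly
Token errors:
- Verify that your App ID and App Certificate are correct
- Make sure the token renewal logic works as expected
- Check that the token-related functions handle errors
Channel connection issues:
- Check your network connection
- Check the Agora service status
- Make sure resources are cleaned up properly when leaving the channel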

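Customization

The Agora Conversational AI Engine supports a range of customizations.

Customizing the Agent

In the invite-agent route (/api/invite-agent), the system_messages array defines how the AI agent responds, giving it a specific personality and communication style.

You can customize the agent's prompt by editing system_messages.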

// In app/api/invite-agent/route.ts
system_messages: [
  {
    role: 'system',
    content:
      "You are a friendly and helpful assistant named Alex. Your personality is warm, patient, and slightly humorous. When speaking, use a conversational tone with occasional casual expressions. Your responses should be concise but informative, aimed at making complex topics accessible to everyone. If you don't know something, admit it honestly rather than guessing. When appropriate, offer follow-up questions to help guide the conversation.", // double quotes, because the text contains an apostrophe
  },
],

You can also update greeting_message to control the first message the agent speaks into the channel.

// In app/api/invite-agent/route.ts
llm: {
  // ...other llm settings remain the same
  greeting_message: 'Hello! How can I assist you today?',
  failure_message: 'Please wait a moment.',
},

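Voice Customization

Browse the voice libraries to choose a voice that suits your application:

For Microsoft Azure TTS: visit the Microsoft Azure TTS voice gallery
For ElevenLabs: browse the ElevenLabs voice library

Tuning Voice Activity Detection (VAD)

Adjust the VAD settings to optimize the conversation flow: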

// In app/api/invite-agent/route.ts
vad: {
  silence_duration_ms: 600,      // How long to wait after silence to end turn (Increase for longer pauses before next turns)
  speech_duration_ms: 10000,     // Maximum duration for a single speech segment (force end of turn after this time)
  threshold: 0.6,                // Sensitivity to background noise (Higher values require louder speech to trigger)
  interrupt_duration_ms: 200,    // How quickly interruptions are detected
  prefix_padding_ms: 400,        // How much audio to capture before speech is detected
},
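If different users or environments need different turn-taking behavior, you can keep the guide's values as a base and derive presets from it. The sketch below assumes the vad field names shown above. The preset names and their values (1000 ms of silence, a 0.8 threshold) are illustrative assumptions, not tuned recommendations:

```typescript
// Field names as used in this guide's invite-agent route
interface VadSettings {
  silence_duration_ms: number;
  speech_duration_ms: number;
  threshold: number;
  interrupt_duration_ms: number;
  prefix_padding_ms: number;
}

// The values from the guide
export const DEFAULT_VAD: VadSettings = {
  silence_duration_ms: 600,
  speech_duration_ms: 10000,
  threshold: 0.6,
  interrupt_duration_ms: 200,
  prefix_padding_ms: 400,
};

// Hypothetical preset: wait longer before ending the turn, for users who pause mid-sentence
export const PATIENT_VAD: VadSettings = { ...DEFAULT_VAD, silence_duration_ms: 1000 };

// Hypothetical preset: require louder speech before triggering, for noisy rooms
export const NOISY_ROOM_VAD: VadSettings = { ...DEFAULT_VAD, threshold: 0.8 };
```

Spreading the defaults keeps every preset complete, so changing one parameter never silently drops the others.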

Environment Variable Reference

Here is the complete list of environment variables for your .env.local file:

# Agora Configuration
NEXT_PUBLIC_AGORA_APP_ID=
NEXT_PUBLIC_AGORA_APP_CERTIFICATE=
NEXT_PUBLIC_AGORA_CUSTOMER_ID=
NEXT_PUBLIC_AGORA_CUSTOMER_SECRET=

NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL=https://api.agora.io/api/conversational-ai-agent/v2/projects/
NEXT_PUBLIC_AGENT_UID=333

# LLM Configuration
NEXT_PUBLIC_LLM_URL=https://api.openai.com/v1/chat/completions
NEXT_PUBLIC_LLM_MODEL=gpt-4
NEXT_PUBLIC_LLM_API_KEY=

# TTS Configuration
NEXT_PUBLIC_TTS_VENDOR=microsoft

# Microsoft Azure TTS Configuration
NEXT_PUBLIC_MICROSOFT_TTS_KEY=
NEXT_PUBLIC_MICROSOFT_TTS_REGION=eastus
NEXT_PUBLIC_MICROSOFT_TTS_VOICE_NAME=en-US-AndrewMultilingualNeural
NEXT_PUBLIC_MICROSOFT_TTS_RATE=1.1
NEXT_PUBLIC_MICROSOFT_TTS_VOLUME=70

# ElevenLabs Configuration
NEXT_PUBLIC_ELEVENLABS_API_KEY=
NEXT_PUBLIC_ELEVENLABS_VOICE_ID=XrExE9yKIg1WjnnlVkGX
NEXT_PUBLIC_ELEVENLABS_MODEL_ID=eleven_flash_v2_5

# Modalities Configuration
NEXT_PUBLIC_INPUT_MODALITIES=text
NEXT_PUBLIC_OUTPUT_MODALITIES=text,audio
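`getValidatedConfig()` in the stop-conversation route only reports that *something* is missing. The minimal sketch below returns the names of the missing variables, which makes misconfigured `.env.local` files faster to diagnose. `findMissingEnv` and `REQUIRED_AGORA_VARS` are hypothetical names. Use it in server-side route handlers: in client components, Next.js only inlines NEXT_PUBLIC_ variables referenced literally (`process.env.NEXT_PUBLIC_...`), so a dynamic lookup like `env[name]` would not see them there.

```typescript
// The variables getValidatedConfig() in the stop-conversation route depends on
export const REQUIRED_AGORA_VARS = [
  'NEXT_PUBLIC_AGORA_CONVO_AI_BASE_URL',
  'NEXT_PUBLIC_AGORA_APP_ID',
  'NEXT_PUBLIC_AGORA_CUSTOMER_ID',
  'NEXT_PUBLIC_AGORA_CUSTOMER_SECRET',
] as const;

// Hypothetical helper: returns required variable names that are unset or blank
export function findMissingEnv(
  names: readonly string[],
  env: Record<string, string | undefined>
): string[] {
  return names.filter((name) => (env[name] ?? '').trim() === '');
}
```

In a route handler, you could run `const missing = findMissingEnv(REQUIRED_AGORA_VARS, process.env);` and include `missing.join(', ')` in the error message.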

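Next Steps

Congratulations! You've built a Next.js application integrated with Agora's Conversational AI Engine. From here, you can extend it or integrate its API routes with your existing Agora backend.

For more details on Agora's Conversational AI Engine, see the official documentation.

Happy building!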
