Add Streaming Transcriptions in Your Conversational AI App

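So you've built an amazing conversational AI app with Agora, perhaps by following our main Conversational AI Guide. Your users can talk to an AI, and it feels like talking to a person. But what about seeing the conversation? That's where text streaming comes in.

This guide focuses on adding real-time text transcriptions to your audio-based AI conversations. Think of it as adding subtitles to your AI chat.

Why add text when voice is the primary interaction? Good question! Here's why:

Accessibility: Text opens your app to users who are deaf or hard of hearing. Inclusivity matters!
Memory aid: Let's be honest, we all forget things sometimes. A transcript lets users quickly scan back through what was said.
No problem in noisy places: Ever tried a voice call in a loud cafe? Text makes sure the message gets through even when the audio is hard to hear.
Did it hear me right?: Seeing the transcription lets users confirm whether the AI understood them correctly (or didn't).
Beyond voice: Some information, such as lists, code snippets, and website URLs, is easier to take in visually. Text streaming enables truly multimodal interaction.

Ready to add this superpower to your app? Let's get started.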

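The Blueprint: How Text Streaming Fits In

Adding text streaming involves three main pieces in your codebase:

The Brain (lib/message.ts): This is the MessageEngine provided by Agora. It is the core logic module that receives raw transcription data, parses it, tracks message state (for example, "Is the AI still talking?"), and manages the flow.
The Face (components/ConvoTextStream.tsx): This component handles the UI. It takes the processed messages from the MessageEngine and displays them nicely: think chat bubbles, scrolling, and animations for streaming text. We provide an example, but you are free to customize it to match your app's design and style.
The Conductor (components/ConversationComponent.tsx): This is probably a component you already have. It manages the Agora RTC connection, microphone access, and the overall conversation flow. It acts as the central hub, connecting the MessageEngine to the UI (ConvoTextStream).

Here is how they talk to each other: raw data arrives from Agora, the MessageEngine interprets it and updates state in your main ConversationComponent, and that state is passed down to ConvoTextStream, which displays it to the user.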

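Following the Data: The Message Lifecycle

Understanding how a single transcription message travels from the network to the screen is key:

RTC stream: A small data packet containing transcription information arrives through the Agora RTC client. It may be a fragment of what the user said or of what the AI is saying.
Message processing: The MessageEngine takes this raw data and works out whether it belongs to the user or the AI, and whether the message is complete or still streaming in.
Message queue: To preserve order, processed messages (or message updates) are held briefly in a queue.
State update: The MessageEngine notifies the ConversationComponent, through the callback you provide, that the message list has changed. The ConversationComponent then updates its React state.
UI render: React detects the state change and re-renders the ConvoTextStream component with the new message list, showing the latest text to the user.

This pipeline keeps the text flowing smoothly in real time and correctly handles messages that are still arriving ("in progress"), fully delivered ("completed"), or cut off ("interrupted").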

Decoding the Data: Message Types

The MessageEngine needs to understand the different kinds of transcription messages that flow over the RTC data channel. The main types are:

User Transcription (What the User Said)

// Represents a transcription of the user's speech
export interface IUserTranscription extends ITranscriptionBase {
  object: ETranscriptionObjectType.USER_TRANSCRIPTION; // Identifies as "user.transcription"
  final: boolean; // Is this the final, complete transcription? (true/false)
}

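This represents what the speech-to-text system thinks the user said. The final flag matters: interim results (final: false) may still change slightly before the final one arrives.

Agent Transcription (What the AI Is Saying)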

// Represents a transcription of the AI agent's speech
export interface IAgentTranscription extends ITranscriptionBase {
  object: ETranscriptionObjectType.AGENT_TRANSCRIPTION; // Identifies as "assistant.transcription"
  quiet: boolean; // Was this generated during a quiet period? (Useful for debugging)
  turn_seq_id: number; // Unique ID for this conversational turn
  turn_status: EMessageStatus; // Is this message IN_PROGRESS, END, or INTERRUPTED?
}

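This is the text the AI generates, sent word by word or phrase by phrase both to the text-to-speech engine and to our MessageEngine for display. The turn_status field is essential for knowing when the AI starts and stops speaking.

Message Interruption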

// Signals that a previous message was interrupted
export interface IMessageInterrupt {
  object: ETranscriptionObjectType.MSG_INTERRUPTED; // Identifies as "message.interrupt"
  message_id: string; // Which message got interrupted?
  data_type: 'message';
  turn_id: number; // The turn ID of the interrupted message
  start_ms: number; // Timestamp info
  send_ts: number; // Timestamp info
}

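This happens, for example, when the user starts speaking while the AI is still talking. The MessageEngine uses it to mark the AI's cut-off message as interrupted in the UI.

The MessageEngine handles these types as follows:

User messages usually arrive as complete thoughts.
Agent messages often stream in piece by piece.
Interruptions update the status of a message that is already being processed.

It does all of this through an internal queue and state management system, so your UI components don't have to deal with the raw complexity.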
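
To make the three cases concrete, here is a minimal sketch of how a handler might branch on the object field. It is not the actual MessageEngine implementation: the payloads are trimmed down, applyRawMessage and StoredMessage are hypothetical names, and the turn_id and text fields on the transcription types are assumed to come from ITranscriptionBase.

```typescript
// Mirrors the numeric values of the engine's message status enum.
const TurnStatus = { IN_PROGRESS: 0, END: 1, INTERRUPTED: 2 } as const;
type TurnStatusValue = (typeof TurnStatus)[keyof typeof TurnStatus];

// Hypothetical, trimmed-down payloads: only `object`, `final`, and
// `turn_status` are taken from the interfaces above; `turn_id` and `text`
// on the transcription types are assumptions about ITranscriptionBase.
type RawMessage =
  | { object: 'user.transcription'; turn_id: number; text: string; final: boolean }
  | { object: 'assistant.transcription'; turn_id: number; text: string; turn_status: TurnStatusValue }
  | { object: 'message.interrupt'; turn_id: number };

interface StoredMessage {
  sender: 'user' | 'agent';
  turn_id: number;
  text: string;
  status: TurnStatusValue;
}

// Apply one raw message to a store keyed by sender and turn.
function applyRawMessage(store: Map<string, StoredMessage>, raw: RawMessage): void {
  switch (raw.object) {
    case 'user.transcription':
      store.set(`user-${raw.turn_id}`, {
        sender: 'user',
        turn_id: raw.turn_id,
        text: raw.text,
        status: raw.final ? TurnStatus.END : TurnStatus.IN_PROGRESS,
      });
      break;
    case 'assistant.transcription':
      store.set(`agent-${raw.turn_id}`, {
        sender: 'agent',
        turn_id: raw.turn_id,
        text: raw.text,
        status: raw.turn_status,
      });
      break;
    case 'message.interrupt': {
      // An interrupt carries no text; it only flips the status of an agent
      // message that is already in the store.
      const existing = store.get(`agent-${raw.turn_id}`);
      if (existing) existing.status = TurnStatus.INTERRUPTED;
      break;
    }
  }
}
```

Keying the store by sender and turn means a later chunk for the same turn overwrites the earlier one, which is one simple way to model "updates" to a streaming message.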

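Meet the `MessageEngine`: The Heart of Text Streaming

The MessageEngine (lib/message.ts) is where the magic happens. You don't need to build it yourself; Agora provides it. Its main jobs are:

Listening: It attaches to the Agora RTC client and listens for raw transcription data messages.
Processing: It decodes each message, identifies the sender (user or AI), and determines its status.
State management: It tracks whether each message is still streaming (IN_PROGRESS), complete (END), or cut off (INTERRUPTED).
Ordering and buffering: It makes sure messages are processed in the correct order, even if network packets arrive slightly out of order.
Notifying: It tells your ConversationComponent (through a callback) whenever the list of displayable messages changes.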

Key Concepts Inside the Engine

Message Status: Is It Done Yet?

Every message the engine tracks has a status:

export enum EMessageStatus {
  IN_PROGRESS = 0, // Still being received/streamed (e.g., AI is talking)
  END = 1, // Finished normally.
  INTERRUPTED = 2, // Cut off before completion.
}

This status tells your UI how to display each message (for example, adding a "…" or a blinking animation to IN_PROGRESS messages).
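
As a small illustration of status-driven display, the sketch below maps each status to the text a bubble might show. The displayText helper and the "(interrupted)" suffix are hypothetical choices for this example, not part of the provided component.

```typescript
// Mirrors the numeric values of EMessageStatus.
const MessageStatus = { IN_PROGRESS: 0, END: 1, INTERRUPTED: 2 } as const;
type MessageStatusValue = (typeof MessageStatus)[keyof typeof MessageStatus];

// Hypothetical helper: decorate the bubble text according to its status.
function displayText(text: string, status: MessageStatusValue): string {
  switch (status) {
    case MessageStatus.IN_PROGRESS:
      return `${text}…`; // still streaming, hint that more is coming
    case MessageStatus.INTERRUPTED:
      return `${text} (interrupted)`; // cut off before completion
    default:
      return text; // finished normally
  }
}
```

In a real UI you would more likely toggle a CSS class than change the text, but the branching is the same.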

Engine Mode: How Granular Do You Want It?

The engine can process incoming agent text in different ways:

export enum EMessageEngineMode {
  TEXT = 'text', // Treats each agent message chunk as a complete block. Simpler, less "streaming" feel.
  WORD = 'word', // Processes agent messages word-by-word if timing info is available. Gives that nice streaming effect.
  AUTO = 'auto', // The engine decides! If word timings are present, it uses WORD mode; otherwise, TEXT mode. (Recommended)
}

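AUTO mode is usually the easiest way to start. The engine adapts to the data it receives from the backend conversational AI service: if the service sends detailed word timings, you get smooth word-by-word streaming; if not, it falls back gracefully to displaying blocks of text.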
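
The AUTO decision can be pictured roughly as follows. This is only a sketch of the idea: the words field with per-word start times is an assumed payload shape, and resolveMode is a hypothetical name, so the real engine may detect timing data differently.

```typescript
type EngineMode = 'text' | 'word' | 'auto';

// Assumed chunk shape: `words` holding per-word timings is hypothetical.
interface AgentChunk {
  text: string;
  words?: { word: string; start_ms: number }[];
}

// Pick the effective mode for a chunk: an explicit mode wins, and AUTO
// falls back to TEXT when no word timings are present.
function resolveMode(configured: EngineMode, chunk: AgentChunk): 'text' | 'word' {
  if (configured !== 'auto') return configured;
  return chunk.words !== undefined && chunk.words.length > 0 ? 'word' : 'text';
}
```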

The Output: What Your UI Gets

In the end, the MessageEngine hands your application (through the callback) a list of message objects that are ready to display:

export interface IMessageListItem {
  uid: number | string; // Who sent this? User's numeric UID or Agent's string/numeric UID (often 0 or a specific string like "Agent").
  turn_id: number; // Helps keep track of conversational turns.
  text: string; // The actual words to display.
  status: EMessageStatus; // The current status (IN_PROGRESS, END, INTERRUPTED).
}

Your UI component simply needs to render the list of these objects.

Wiring Up the Engine

You typically initialize the MessageEngine inside your main ConversationComponent, most likely in a useEffect hook that runs once the Agora RTC client is ready.

// Inside ConversationComponent.tsx
const client = useRTCClient(); // Get the Agora client instance
const [messageList, setMessageList] = useState<IMessageListItem[]>([]);
const [currentInProgressMessage, setCurrentInProgressMessage] =
  useState<IMessageListItem | null>(null);
const messageEngineRef = useRef<MessageEngine | null>(null);
const agentUID = process.env.NEXT_PUBLIC_AGENT_UID || 'Agent'; // Get your agent's expected UID
useEffect(() => {
  // Only initialize once the client exists and we haven't already started the engine
  if (client && !messageEngineRef.current) {
    console.log('Initializing MessageEngine...');
    // Create the engine instance
    const engine = new MessageEngine(
      client,
      EMessageEngineMode.AUTO, // Use AUTO mode for adaptive streaming
      // This callback function is the critical link!
      // It receives the updated message list whenever something changes.
      (updatedMessages: IMessageListItem[]) => {
        // 1. Always sort messages by turn_id to ensure chronological order
        const sortedMessages = [...updatedMessages].sort(
          (a, b) => a.turn_id - b.turn_id
        );
        // 2. Find the *latest* message that's still streaming (if any)
        // We handle this separately for smoother UI updates during streaming.
        const inProgressMsg = sortedMessages.findLast(
          (msg) => msg.status === EMessageStatus.IN_PROGRESS
        );
        // 3. Update component state:
        //    - messageList gets all *completed* or *interrupted* messages.
        //    - currentInProgressMessage gets the single *latest* streaming message.
        setMessageList(
          sortedMessages.filter(
            (msg) => msg.status !== EMessageStatus.IN_PROGRESS
          )
        );
        setCurrentInProgressMessage(inProgressMsg || null);
      }
    );
    // Store the engine instance in a ref
    messageEngineRef.current = engine;
    // Start the engine's processing loop
    // legacyMode: false is recommended for newer setups
    messageEngineRef.current.run({ legacyMode: false });
    console.log('MessageEngine started.');
  }
  // Cleanup function: Stop the engine when the component unmounts
  return () => {
    if (messageEngineRef.current) {
      console.log('Cleaning up MessageEngine...');
      messageEngineRef.current.cleanup();
      messageEngineRef.current = null;
    }
  };
}, [client]); // Dependency array ensures this runs when the client is ready

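Let's take a closer look at that all-important callback:

Sorting: Messages can arrive slightly out of sync. Sorting by turn_id fixes that.
Separating: The latest IN_PROGRESS message is handled separately from the rest. This lets the UI render that message with a special streaming effect without constantly re-rendering the whole list.
Updating state: Setting messageList (completed and interrupted messages) and currentInProgressMessage triggers a React re-render, which passes the new data down to the ConvoTextStream component.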
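
If you want to unit test this step without React, the sort-and-split logic can be pulled out into a pure function. The sketch below is one way to do that; splitMessages and ListItem are hypothetical names, and a plain loop stands in for findLast so it also runs on older compilation targets.

```typescript
// Same shape as IMessageListItem, with status as a plain number.
interface ListItem {
  uid: number | string;
  turn_id: number;
  text: string;
  status: number;
}

const STATUS_IN_PROGRESS = 0; // EMessageStatus.IN_PROGRESS

// Sort by turn, then separate the latest streaming message from the rest.
function splitMessages(updated: ListItem[]): {
  completed: ListItem[];
  inProgress: ListItem | null;
} {
  const sorted = [...updated].sort((a, b) => a.turn_id - b.turn_id);
  let inProgress: ListItem | null = null;
  for (const msg of sorted) {
    // Later entries overwrite earlier ones, so the last match wins.
    if (msg.status === STATUS_IN_PROGRESS) inProgress = msg;
  }
  const completed = sorted.filter((msg) => msg.status !== STATUS_IN_PROGRESS);
  return { completed, inProgress };
}
```

The callback then reduces to calling splitMessages and passing completed and inProgress to the two state setters.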

Building the UI: The `ConvoTextStream` Component

Now let's look at the example ConvoTextStream component (components/ConvoTextStream.tsx). Its job is to take message data from the ConversationComponent and display it like a chat interface.

Inputs (Props)

It needs the following data from its parent component (ConversationComponent):

interface ConvoTextStreamProps {
  // All the messages that are done (completed or interrupted)
  messageList: IMessageListItem[];
  // The single message currently being streamed by the AI (if any)
  currentInProgressMessage?: IMessageListItem | null;
  // The UID of the AI agent (so we can style its messages differently)
  agentUID: string | number | undefined;
}

These props come straight from the state variables (messageList, currentInProgressMessage) that the MessageEngine callback updates in ConversationComponent.

Core UX Features

A good chat UI needs more than just displaying text. This example focuses on the following:

Smart Scrolling

Users hate losing their place when a new message arrives, unless they are already at the bottom and want to see the latest.

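// Ref for the scrollable chat area
const scrollRef = useRef<HTMLDivElement>(null);
// State to track if we should automatically scroll down
const [shouldAutoScroll, setShouldAutoScroll] = useState(true);
// Ref holding the message count from the previous render (used to detect new messages)
const prevMessageLengthRef = useRef(messageList.length);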

// Function to force scroll to the bottom
const scrollToBottom = () => {
  scrollRef.current?.scrollTo({
    top: scrollRef.current.scrollHeight,
    behavior: 'smooth', // Optional: make it smooth
  });
};
// Detects when the user scrolls manually
const handleScroll = () => {
  if (!scrollRef.current) return;
  const { scrollHeight, scrollTop, clientHeight } = scrollRef.current;
  // Is the user within ~100px of the bottom?
  const isNearBottom = scrollHeight - scrollTop - clientHeight < 100;
  // Only auto-scroll if the user is near the bottom
  if (isNearBottom !== shouldAutoScroll) {
    setShouldAutoScroll(isNearBottom);
  }
};
// Effect to actually perform the auto-scroll when needed
useEffect(() => {
  // Check if a new message arrived OR if we should be auto-scrolling
  const hasNewMessage = messageList.length > prevMessageLengthRef.current; // Track previous length
  if ((hasNewMessage || shouldAutoScroll) && scrollRef.current) {
    scrollToBottom();
  }
  // Update previous length ref for next render
  prevMessageLengthRef.current = messageList.length;
}, [messageList, currentInProgressMessage?.text, shouldAutoScroll]); // Re-run when messages change or scroll state changes
// Add the onScroll handler to the scrollable div
// <div ref={scrollRef} onScroll={handleScroll} className="overflow-auto...">

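This logic ensures that:

When the user scrolls up to read earlier content, streaming updates do not drag the view back down (a newly completed message still does, because of the hasNewMessage check).
When the user is at the bottom, new messages scroll into view automatically.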
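
The near-bottom check is plain arithmetic, so it can be isolated and tested on its own. Here is a minimal sketch; isNearBottom is a hypothetical helper name, and the 100px default matches the snippet above.

```typescript
// Distance from the bottom = total height - scrolled offset - visible height.
function isNearBottom(
  scrollHeight: number,
  scrollTop: number,
  clientHeight: number,
  threshold = 100
): boolean {
  return scrollHeight - scrollTop - clientHeight < threshold;
}
```

For example, with a 1000px-tall list in a 100px-tall viewport, a scrollTop of 850 leaves 50px below the fold, so the user counts as "near the bottom"; a scrollTop of 400 leaves 500px, so they do not.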

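Throttled Scrolling During Streaming (Optional Enhancement)

The previous scrolling useEffect can fire on every word update when WORD mode is in use, which can make the view feel jittery. To improve this, you can scroll only once a meaningful amount of new text has arrived.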

// --- Add these refs ---
const prevMessageTextRef = useRef(''); // Track the text of the last in-progress message
const significantChangeScrollTimer = useRef<NodeJS.Timeout | null>(null); // Timer ref

// --- New function to check for significant change ---
const hasContentChangedSignificantly = (threshold = 20): boolean => {
  if (!currentInProgressMessage) return false;
  const currentText = currentInProgressMessage.text || '';
  const textLengthDiff = currentText.length - prevMessageTextRef.current.length;
  // Only trigger if a decent chunk of text arrived
  const hasSignificantChange = textLengthDiff >= threshold;
  // Update the ref *only if* it changed significantly
  if (
    hasSignificantChange ||
    currentInProgressMessage.status !== EMessageStatus.IN_PROGRESS
  ) {
    prevMessageTextRef.current = currentText;
  }
  return hasSignificantChange;
};
// --- Modify the scrolling useEffect ---
useEffect(() => {
  const hasNewCompleteMessage =
    messageList.length > prevMessageLengthRef.current;
  const streamingContentChanged = hasContentChangedSignificantly(); // Use the new check
  // Clear any pending scroll timer if conditions change
  if (significantChangeScrollTimer.current) {
    clearTimeout(significantChangeScrollTimer.current);
    significantChangeScrollTimer.current = null;
  }
  if (
    (hasNewCompleteMessage || (streamingContentChanged && shouldAutoScroll)) &&
    scrollRef.current
  ) {
    // Introduce a small delay to batch scrolls during rapid streaming
    significantChangeScrollTimer.current = setTimeout(() => {
      scrollToBottom();
      significantChangeScrollTimer.current = null;
    }, 50); // 50ms delay, adjust as needed
  }
  prevMessageLengthRef.current = messageList.length;
  // Cleanup timer on unmount
  return () => {
    if (significantChangeScrollTimer.current) {
      clearTimeout(significantChangeScrollTimer.current);
    }
  };
}, [messageList, currentInProgressMessage?.text, shouldAutoScroll]);

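This refined approach checks that at least 20 characters have been added to the streaming message before triggering a scroll, which makes for a smoother experience. It also uses a short setTimeout to batch scrolls that occur in rapid succession.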
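
The threshold idea can also be expressed as a small stateful helper outside of React. The sketch below is an illustration rather than the component's code: createScrollGate is a hypothetical name, and resetting the baseline when the text gets shorter (treated here as the start of a new message) is an assumption added for this example.

```typescript
// Returns an object whose shouldScroll() reports whether enough new text
// has arrived since the last scroll to justify scrolling again.
function createScrollGate(threshold = 20) {
  let lastScrolledLength = 0;
  return {
    shouldScroll(currentText: string): boolean {
      // Shorter text than last time is treated as a new message (assumption).
      if (currentText.length < lastScrolledLength) lastScrolledLength = 0;
      const grewEnough = currentText.length - lastScrolledLength >= threshold;
      if (grewEnough) lastScrolledLength = currentText.length;
      return grewEnough;
    },
  };
}
```

Because the baseline only moves when a scroll is actually triggered, small updates accumulate until they cross the threshold together.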

Displaying the Streaming Message

We need to decide when and how to show currentInProgressMessage:

// Helper to decide if the streaming message should be shown
const shouldShowStreamingMessage = (): boolean => {
  return (
    // Is there an in-progress message?
    currentInProgressMessage !== null &&
    // Is it *actually* in progress?
    currentInProgressMessage.status === EMessageStatus.IN_PROGRESS &&
    // Does it have any text content yet?
    currentInProgressMessage.text.trim().length > 0
  );
};

// In the JSX, combine the lists for rendering:
const allMessagesToRender = [...messageList];
if (shouldShowStreamingMessage() && currentInProgressMessage) {
  // Add the streaming message to the end of the list to be rendered
  allMessagesToRender.push(currentInProgressMessage);
}
// Then map over `allMessagesToRender`
// {allMessagesToRender.map((message, index) => ( ... render message bubble ... ))}

This ensures the streaming message bubble appears only while it is actively receiving non-empty text.

Chat Controls (Open/Close Toggle, Expand)

Basic UI controls improve usability:

const [isOpen, setIsOpen] = useState(false); // Is the chat window visible?
const [isChatExpanded, setIsChatExpanded] = useState(false); // Is it in expanded mode?
const hasSeenFirstMessageRef = useRef(false); // Track if the user has interacted or seen the first message

// Toggle chat open/closed
const toggleChat = () => {
  const newState = !isOpen;
  setIsOpen(newState);
  // If opening, mark that the user has now 'seen' the chat
  if (newState) {
    hasSeenFirstMessageRef.current = true;
  }
};
// Toggle between normal and expanded height
const toggleChatExpanded = () => {
  setIsChatExpanded(!isChatExpanded);
};
// --- Auto-Open Logic ---
useEffect(() => {
  const hasAnyMessage = messageList.length > 0 || shouldShowStreamingMessage();
  // If there's a message, we haven't opened it yet automatically, and it's currently closed...
  if (hasAnyMessage && !hasSeenFirstMessageRef.current && !isOpen) {
    setIsOpen(true); // Open it!
    hasSeenFirstMessageRef.current = true; // Mark as seen/auto-opened
  }
}, [messageList, currentInProgressMessage, isOpen]); // Rerun when messages or open state change

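This includes logic to open the chat window automatically when the first message appears, but only once: after the chat has been opened, whether automatically or by the user, it stays closed if the user closes it.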
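
The auto-open rule amounts to a tiny state transition, which the following sketch expresses as a pure function. ChatState and nextChatState are hypothetical names used only for this illustration.

```typescript
interface ChatState {
  isOpen: boolean;
  hasSeenFirstMessage: boolean;
}

// Open the chat on the first message, but never reopen it afterwards.
function nextChatState(state: ChatState, hasAnyMessage: boolean): ChatState {
  if (hasAnyMessage && !state.hasSeenFirstMessage && !state.isOpen) {
    return { isOpen: true, hasSeenFirstMessage: true };
  }
  return state; // nothing to change
}
```

Starting from a closed, unseen chat, the first message opens it; if the user then closes it, later messages leave it closed because hasSeenFirstMessage is already true.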

Rendering the Messages

The core rendering logic maps over the combined message list (allMessagesToRender) and creates a styled div for each message:

// Inside the map function, after working out who sent the message:
// const isAgent =
//   message.uid === 0 || message.uid?.toString() === agentUID?.toString();
<div
  key={`${message.turn_id}-${message.uid}-${index}`} // More robust key
  ref={index === allMessagesToRender.length - 1 ? lastMessageRef : null} // Optional: lastMessageRef must be declared with useRef<HTMLDivElement>(null)
  className={cn(
    'flex items-start gap-2 w-full mb-2', // Basic layout styles
    // Is this message from the AI? Align left. Otherwise, align right.
    isAgent ? 'justify-start' : 'justify-end'
  )}
>
  {/* Conditionally render Avatar based on sender if needed */}
  {/* {isAgent && <Avatar ... />} */}
  {/* Message Bubble */}
  <div
    className={cn(
      'max-w-[80%] rounded-xl px-3 py-2 text-sm md:text-base shadow-sm', // Slightly softer corners, shadow
      isAgent ? 'bg-gray-100 text-gray-800' : 'bg-blue-500 text-white',
      // Optional: dim the user's message slightly while it is still IN_PROGRESS
      message.status === EMessageStatus.IN_PROGRESS && !isAgent && 'opacity-80'
    )}
  >
    {message.text}
  </div>
</div>

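This code uses Tailwind CSS classes and the cn helper from shadcn/ui to apply conditional classes:

User messages are aligned to the right and AI messages to the left.
Different background colors are applied to each.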
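
Since uid and agentUID can each be a number or a string, the sender check is worth isolating. Here is a minimal sketch, with isAgentMessage as a hypothetical helper name; like the snippets in this guide, it treats a uid of 0 as the agent.

```typescript
// True when the message comes from the AI agent.
function isAgentMessage(
  uid: number | string,
  agentUID: number | string | undefined
): boolean {
  if (uid === 0) return true; // uid 0 is treated as the agent
  // Compare as strings so 123 and '123' match.
  return agentUID !== undefined && String(uid) === String(agentUID);
}
```

Comparing string forms avoids a subtle mismatch when the environment variable supplies the agent UID as a string but the engine reports it as a number.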

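Putting It All Together: Integrating into `ConversationComponent`

With the MessageEngine initialized and managing state, integrating ConvoTextStream into your main ConversationComponent is straightforward.

Initialize the MessageEngine: As shown earlier in the engine setup section, set up the MessageEngine in a useEffect hook and provide a callback that updates the messageList and currentInProgressMessage state variables.
Render ConvoTextStream: Include ConvoTextStream in the JSX returned by ConversationComponent and pass it the required props: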

// Inside ConversationComponent's return statement
// ... other UI elements like connection status, microphone button ...
return (
  <div className="relative flex flex-col h-full">
    {/* ... Other UI ... */}
    {/* Pass the state managed by MessageEngine's callback */}
    <ConvoTextStream
      messageList={messageList}
      currentInProgressMessage={currentInProgressMessage}
      agentUID={agentUID} // Pass the agent's UID
    />
    {/* ... Microphone Button etc ... */}
  </div>
);

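The MessageEngine handles the data flow from RTC and updates state in ConversationComponent, which then passes the formatted data down to ConvoTextStream for display.

Make It Yours: Styling and Customization

The provided ConvoTextStream is a starting point. You will probably want to adapt its look to your app.

Styling the Message Bubbles

Edit the Tailwind CSS classes in ConvoTextStream.tsx to match your app's design system. You can change colors, fonts, padding, border radius, and so on.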

// Example: Change AI bubble color
message.uid === 0 || message.uid.toString() === agentUID
  ? 'bg-purple-100 text-purple-900' // Changed from gray
  : 'bg-blue-500 text-white',

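Chat Window Appearance

Adjust the position (fixed, absolute), size (such as w-96), background color (bg-white), shadow (such as shadow-lg), and so on of the chat window container (the div with id agora-text-stream-chatbox in the full component, and its children).

Expand/Collapse Behavior

Modify the toggleChatExpanded function and the conditional classes that depend on isChatExpanded to change how the chat window resizes or behaves when expanded. You might have it take up more of the screen or dock differently.

Under the Hood: Message Processing Flow

For the curious, here is a slightly closer look at how the MessageEngine handles a single stream-message event from the Agora RTC data channel: it receives the raw data, decodes it, updates or creates the message in its internal queue/store, and finally triggers the callback function to update your React component state.

Full Component Code (`ConvoTextStream.tsx`)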

/* eslint-disable react-hooks/exhaustive-deps */
'use client';

import { useState, useEffect, useRef, useCallback } from 'react';
import { Button } from '@/components/ui/button';
import { MessageCircle, X, Expand, Shrink } from 'lucide-react';
import { cn } from '@/lib/utils';
import { IMessageListItem, EMessageStatus } from '@/lib/message'; // Assuming types are here

interface ConvoTextStreamProps {
  messageList: IMessageListItem[];
  currentInProgressMessage?: IMessageListItem | null;
  agentUID: string | number | undefined; // Allow number or string
}

export default function ConvoTextStream({
  messageList,
  currentInProgressMessage = null,
  agentUID,
}: ConvoTextStreamProps) {
  const [isOpen, setIsOpen] = useState(false);
  const [shouldAutoScroll, setShouldAutoScroll] = useState(true);
  const scrollRef = useRef<HTMLDivElement>(null);
  const prevMessageLengthRef = useRef(messageList.length);
  const prevMessageTextRef = useRef('');
  const [isChatExpanded, setIsChatExpanded] = useState(false);
  const hasSeenFirstMessageRef = useRef(false);
  const significantChangeScrollTimer = useRef<NodeJS.Timeout | null>(null);

  // --- Scrolling Logic ---

  const scrollToBottom = useCallback(() => {
    if (scrollRef.current) {
      scrollRef.current.scrollTo({
        top: scrollRef.current.scrollHeight,
        behavior: 'smooth',
      });
    }
  }, []);

  const handleScroll = useCallback(() => {
    if (!scrollRef.current) return;
    const { scrollHeight, scrollTop, clientHeight } = scrollRef.current;
    const isNearBottom = scrollHeight - scrollTop - clientHeight < 150; // Increased threshold slightly
    if (isNearBottom !== shouldAutoScroll) {
      setShouldAutoScroll(isNearBottom);
    }
  }, [shouldAutoScroll]);

  const hasContentChangedSignificantly = useCallback(
    (threshold = 20): boolean => {
      if (!currentInProgressMessage) return false;
      const currentText = currentInProgressMessage.text || '';
      // Only compare if the message is actually in progress
      const baseText =
        currentInProgressMessage.status === EMessageStatus.IN_PROGRESS
          ? prevMessageTextRef.current
          : currentText;
      const textLengthDiff = currentText.length - baseText.length;
      const hasSignificantChange = textLengthDiff >= threshold;

      // Update ref immediately if it's a significant change or message finished/interrupted
      if (
        hasSignificantChange ||
        currentInProgressMessage.status !== EMessageStatus.IN_PROGRESS
      ) {
        prevMessageTextRef.current = currentText;
      }
      return hasSignificantChange;
    },
    [currentInProgressMessage]
  );

  useEffect(() => {
    const hasNewCompleteMessage =
      messageList.length > prevMessageLengthRef.current;
    // Check significance *only* if we should be auto-scrolling
    const streamingContentChanged =
      shouldAutoScroll && hasContentChangedSignificantly();

    if (significantChangeScrollTimer.current) {
      clearTimeout(significantChangeScrollTimer.current);
      significantChangeScrollTimer.current = null;
    }

    if (
      (hasNewCompleteMessage || streamingContentChanged) &&
      scrollRef.current
    ) {
      // Debounce scrolling slightly
      significantChangeScrollTimer.current = setTimeout(() => {
        scrollToBottom();
        significantChangeScrollTimer.current = null;
      }, 50);
    }

    prevMessageLengthRef.current = messageList.length;

    return () => {
      if (significantChangeScrollTimer.current) {
        clearTimeout(significantChangeScrollTimer.current);
      }
    };
  }, [
    messageList,
    currentInProgressMessage?.text,
    shouldAutoScroll,
    scrollToBottom,
    hasContentChangedSignificantly,
  ]);

  // --- Component Logic ---

  const shouldShowStreamingMessage = useCallback((): boolean => {
    return (
      currentInProgressMessage !== null &&
      currentInProgressMessage.status === EMessageStatus.IN_PROGRESS &&
      currentInProgressMessage.text.trim().length > 0
    );
  }, [currentInProgressMessage]);

  const toggleChat = useCallback(() => {
    const newState = !isOpen;
    setIsOpen(newState);
    if (newState) {
      hasSeenFirstMessageRef.current = true; // Mark as seen if manually opened
    }
  }, [isOpen]);

  const toggleChatExpanded = useCallback(() => {
    setIsChatExpanded(!isChatExpanded);
    // Attempt to scroll to bottom after expanding/shrinking
    setTimeout(scrollToBottom, 50);
  }, [isChatExpanded, scrollToBottom]);

  // Auto-open logic
  useEffect(() => {
    const hasAnyMessage =
      messageList.length > 0 || shouldShowStreamingMessage();
    if (hasAnyMessage && !hasSeenFirstMessageRef.current && !isOpen) {
      setIsOpen(true);
      hasSeenFirstMessageRef.current = true;
    }
  }, [
    messageList,
    currentInProgressMessage,
    isOpen,
    shouldShowStreamingMessage,
  ]);

  // Combine messages for rendering
  const allMessagesToRender = [...messageList];
  if (shouldShowStreamingMessage() && currentInProgressMessage) {
    allMessagesToRender.push(currentInProgressMessage);
  }

  // --- JSX ---
  return (
    // Use a more descriptive ID if needed, ensure z-index is appropriate
    <div
      id="agora-text-stream-chatbox"
      className="fixed bottom-24 right-4 md:right-8 z-50"
    >
      {isOpen ? (
        <div
          className={cn(
            'bg-white rounded-lg shadow-xl w-80 md:w-96 flex flex-col text-black transition-all duration-300 ease-in-out', // Adjusted width and added transition
            // Dynamic height based on expanded state
            isChatExpanded ? 'h-[60vh] max-h-[500px]' : 'h-80'
          )}
        >
          {/* Header */}
          <div className="p-2 border-b flex justify-between items-center shrink-0 bg-gray-50 rounded-t-lg">
            <Button
              variant="ghost"
              size="icon"
              onClick={toggleChatExpanded}
              aria-label={isChatExpanded ? 'Shrink chat' : 'Expand chat'}
            >
              {isChatExpanded ? (
                <Shrink className="h-4 w-4" />
              ) : (
                <Expand className="h-4 w-4" />
              )}
            </Button>
            <h3 className="font-semibold text-sm md:text-base">Conversation</h3>
            <Button
              variant="ghost"
              size="icon"
              onClick={toggleChat}
              aria-label="Close chat"
            >
              <X className="h-4 w-4" />
            </Button>
          </div>

          {/* Message Area */}
          <div
            className="flex-1 overflow-y-auto scroll-smooth" // Use overflow-y-auto
            ref={scrollRef}
            onScroll={handleScroll}
          >
            <div className="p-3 md:p-4 space-y-3">
              {allMessagesToRender.map((message, index) => {
                const isAgent =
                  message.uid === 0 ||
                  message.uid?.toString() === agentUID?.toString();
                return (
                  <div
                    key={`${message.turn_id}-${message.uid}-${index}`} // Use index as last resort for key part
                    className={cn(
                      'flex items-start gap-2 w-full',
                      isAgent ? 'justify-start' : 'justify-end'
                    )}
                  >
                    {/* Optional: Render avatar only for AI or based on settings */}
                    {/* {isAgent && <Avatar ... />} */}

                    {/* Message Bubble */}
                    <div
                      className={cn(
                        'max-w-[80%] rounded-xl px-3 py-2 text-sm md:text-base shadow-sm', // Slightly softer corners, shadow
                        isAgent
                          ? 'bg-gray-100 text-gray-800'
                          : 'bg-blue-500 text-white'
                      )}
                    >
                      {message.text}
                    </div>
                  </div>
                );
              })}
              {/* Add a small spacer at the bottom */}
              <div className="h-2"></div>
            </div>
          </div>
          {/* Optional Footer Area (e.g., for input later) */}
          {/* <div className="p-2 border-t shrink-0">...</div> */}
        </div>
      ) : (
        // Floating Action Button (FAB) to open chat
        <Button
          onClick={toggleChat}
          className="rounded-full w-14 h-14 flex items-center justify-center bg-blue-600 hover:bg-blue-700 text-white shadow-lg hover:scale-105 transition-all duration-200"
          aria-label="Open chat"
        >
          <MessageCircle className="h-6 w-6" />
        </Button>
      )}
    </div>
  );
}

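This example component provides:

A collapsed FAB (Floating Action Button) state.
An expandable chat window with smooth transitions.
Scrolling logic improved with debouncing.
Distinct styling for user and AI messages.
Basic accessibility attributes (aria-label).

Styling, icons (lucide-react is used here), and specific UX behaviors should be adapted to your application's needs.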

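The Big Picture: How It Connects in the Main App

Let's recap the data flow in the context of the whole application:

User speaks / AI generates: Audio happens in the Agora channel. Speech-to-text (STT) and AI processing produce transcription data.
RTC data channel: This data (user transcriptions, AI transcriptions, interrupts) is sent over the Agora RTC data channel.
MessageEngine listens: Your initialized MessageEngine picks up these stream-message events.
Engine processes: It decodes the data, manages message status (IN_PROGRESS, END, and so on), and maintains an ordered list.
Callback fires: The engine calls the callback you supplied at initialization, passing the updated IMessageListItem[].
ConversationComponent updates state: The callback processes the list (sorting, separating the in-progress message) and calls setMessageList and setCurrentInProgressMessage.
React re-renders: The state update triggers a re-render of ConversationComponent.
ConvoTextStream receives props: The ConvoTextStream component receives the new messageList and currentInProgressMessage as props.
UI updates: ConvoTextStream renders the new text, applies the appropriate styles (for example, a pulse animation if you add one), and handles scrolling.

This cycle repeats rapidly as the conversation progresses, creating the real-time text streaming effect.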

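Next Steps

You now have the tools and understanding to add smooth, real-time text streaming to your Agora conversational AI application! This is more than a visual extra: it meaningfully improves usability and accessibility.

Where to go from here:

Integrate: Add the MessageEngine initialization to your ConversationComponent and render ConvoTextStream (or your own custom version).
Customize: Style ConvoTextStream to match your app's design language, and tune the animations and scrolling behavior for the best user experience.
Test: Try it in a range of scenarios: long messages, rapid interruptions, and noisy environments (when testing STT).
Refine: Based on your testing, adjust the VAD settings on your backend or fine-tune the UI behavior.

To learn more about the Agora capabilities behind this feature, see the official "Live Subtitles" documentation, which explains the underlying data channel mechanism.

Happy building, and here's to more engaging and accessible conversational experiences!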
