A2E Streaming Avatar Solution Guide

Welcome to the A2E Streaming Avatar Solution guide! This tutorial walks you through the key concepts you need to integrate our real-time interactive avatars into your application. The full API documentation for our streaming avatar solution is here. We’ll cover the following topics:

1. How to Obtain Video and Audio Streams from the Avatar

2. How to Control What the Avatar Says

3. How You Get Charged

🔗 1. How to Obtain Video and Audio Streams from the Streaming Avatar

The A2E streaming avatar delivers video and audio streams through Agora’s global distribution network, ensuring ultra-low latency (typically under 1 second) and high concurrency. Here’s a quick overview of how the integration works:


Avatar Streaming Architecture

Developer: Builds an app (web, mobile, or desktop) for end users.

User Application: Connects to the avatar stream via Agora’s SDK or IO-Extension.

Agora Network: Ensures stable, high-performance streaming.

To get started, you’ll need to integrate Agora’s SDK into your application. Agora offers SDKs for a variety of programming languages, including Python, JavaScript, iOS, Android, Java, React, Flutter, Electron, and C++. You can find Agora’s SDK documentation here:

👉 Agora SDK Documentation

👉 Agora IO-Extension

https://docs.agora.io/en/sdks

Joining an Agora Room

To see and hear the avatar, your app needs to join an Agora room. Here’s how to do that:

1. Generate a Room Token: Use our API to obtain a room token. This token acts as a key to access the room where the avatar stream is hosted.

2. Use Tokens Wisely:

• Each token is user-specific. Do not reuse a token across multiple users; doing so causes those users to join the same room and see the same avatar stream.

• Tokens are time-limited. If a token expires, the user will be disconnected.

💡 Tip: Be mindful of token expiration times. If a user finishes the intended task but you do not “leave” the room, you’ll be charged for the entire duration until the token expires.
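One way to avoid paying for idle time is to guarantee that your app leaves the room even when something goes wrong mid-session. The sketch below is only an illustration and is not part of the A2E or Agora API: `avatar_session`, `seconds_left`, `join`, and `leave` are hypothetical names, and the two callables stand in for whatever join and leave calls your chosen Agora SDK provides.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, Optional


@contextmanager
def avatar_session(join: Callable[[], None], leave: Callable[[], None]) -> Iterator[None]:
    """Join the room, then always leave it, even if the body raises."""
    join()  # hypothetical: wraps your SDK's join call with the user's own token
    try:
        yield
    finally:
        leave()  # hypothetical: wraps your SDK's leave call, which ends the session


def seconds_left(expires_at: float, now: Optional[float] = None) -> float:
    """Seconds remaining before a token expires (never negative).

    `expires_at` is assumed to be a Unix timestamp that your app stores
    alongside the token when it requests one.
    """
    current = time.time() if now is None else now
    return max(0.0, expires_at - current)
```

Wrapping the user's interaction in `with avatar_session(join, leave): ...` means the leave call runs as soon as the block exits, rather than relying on the token to expire.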

🗣️ 2. How to Control What the Avatar Says

You can control the avatar’s speech using two modes:

Mode 1: Direct Speak

This mode is ideal if you already have a large language model (LLM) generating responses in your application, and you simply need the avatar to act as a voice puppet.

How it Works:

Send a plain text message to the A2E server via an HTTP POST request, and the avatar will start speaking with accurate lip-syncing.

Use Cases:

• Chatbots

• Interactive voice assistants

• Guided tutorials or narration
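As a rough illustration of Direct Speak, the sketch below builds the HTTP POST request that carries the text for the avatar to say. The endpoint URL, the JSON field name `text`, and the bearer-token header are all placeholders assumed for this example; take the real values from the A2E API documentation.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration.
SPEAK_URL = "https://api.example.com/streaming-avatar/speak"


def build_speak_request(text: str, api_key: str, url: str = SPEAK_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the avatar to speak `text`."""
    payload = json.dumps({"text": text}).encode("utf-8")  # assumed body shape
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )
```

To send the request, pass the returned object to `urllib.request.urlopen`. Keeping the build step separate from the send step makes the payload easy to inspect or log before anything goes over the network.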

Mode 2: Ask a Question

If you don’t have an LLM and want to add a basic Q&A feature, our solution provides a built-in LLM.

1. Set the Context:

Customize the avatar’s responses by setting the prompt (referred to as “context”) on our server.

2. Send a Question:

After setting the context, send a plain text question via an HTTP POST request. Our server generates a response with the built-in LLM and returns it to your application, and the avatar speaks the answer.
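The two steps above can be sketched as a pair of POST requests issued in order: first the context, then the question. The base URL, the paths `/context` and `/ask`, and the JSON field names are hypothetical and chosen only for this example; the actual endpoints and payloads are defined in the A2E API documentation.

```python
import json
import urllib.request
from typing import Dict, List

# Hypothetical base URL, used only for illustration.
BASE_URL = "https://api.example.com/streaming-avatar"


def _post(path: str, body: Dict[str, str]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to a path under BASE_URL."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def build_qa_requests(context: str, question: str) -> List[urllib.request.Request]:
    """Return the requests in the order they should be sent: context, then question."""
    return [
        _post("/context", {"context": context}),    # step 1: set the prompt
        _post("/ask", {"question": question}),      # step 2: ask the built-in LLM
    ]
```

The context typically needs to be set only once per session, so an application could send the first request at startup and then reuse the second for each user question.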

💰 3. How Do You Get Charged?

We’ve designed the A2E streaming avatar solution to be affordable and scalable.

Billing Structure

Usage-Based Billing:

You are charged for the total time (rounded to the nearest minute) that your application uses the avatar stream. The price is 15 coins per minute. Please refer to this for the price of coins. For example, with the 100k coins package, the streaming avatar costs $0.09 per minute. The more you buy, the more you save.
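To make the arithmetic concrete, here is a small estimator based on the figures above: 15 coins per minute and, for the 100k coins package, $0.09 per minute. The function names are hypothetical, and the rounding rule (half a minute rounds up) is an assumption, since the exact behavior at the half-minute mark is not specified here.

```python
from decimal import Decimal
from typing import Tuple

COINS_PER_MINUTE = 15             # price stated in this guide
USD_PER_MINUTE = Decimal("0.09")  # example rate for the 100k coins package


def billed_minutes(seconds: int) -> int:
    """Round a duration to the nearest whole minute (assumed: 30 s rounds up)."""
    return (seconds + 30) // 60


def session_cost(seconds: int) -> Tuple[int, Decimal]:
    """Return (coins, estimated USD) for a stream lasting `seconds` seconds."""
    minutes = billed_minutes(seconds)
    return minutes * COINS_PER_MINUTE, minutes * USD_PER_MINUTE


# A stream of 10 minutes 20 seconds is billed as 10 minutes.
coins, usd = session_cost(620)
print(coins, usd)  # 150 0.90
```

`Decimal` is used instead of `float` so that the dollar amounts stay exact when multiplied.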

Concurrency Limits:

Each user is allocated a concurrency of 1 (one active stream at a time). If you need more concurrent streams, we offer additional concurrency packages for a monthly fee.