
API Reference

This document describes the core APIs of Volcano Engine Real-Time Conversational AI, including StartVoiceChat, UpdateVoiceChat, StopVoiceChat, and related configuration parameters.

API Overview

| API | Description |
| --- | --- |
| StartVoiceChat | Start an AI voice session, creating an AI agent in the specified room |
| StopVoiceChat | Stop a voice session and release AI agent resources |
| UpdateVoiceChat | Update an ongoing voice session (interrupt, custom announcements, etc.) |

All APIs require V4 signing with AccessKey. See Authentication Proxy Service.

StartVoiceChat

Starts an AI voice session and creates an AI agent in the specified room.

Request endpoint: POST https://rtc.volcengineapi.com?Action=StartVoiceChat&Version=2024-12-01

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| AppId | string | Yes | RTC application ID |
| RoomId | string | Yes | Room ID |
| TaskId | string | Yes | Task ID used to identify the session |
| AgentConfig | object | Yes | Agent configuration, see AgentConfig |
| Config | object | Yes | Session configuration, including ASR, TTS, LLM parameters, see Config |

AgentConfig

Agent configuration:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| TargetUserId | string[] | Yes | Target user ID list (client user IDs) |
| UserId | string | Yes | Agent user ID (AI Bot identifier) |
| WelcomeMessage | string | No | Welcome message, auto-played at session start |
| EnableConversationStateCallback | boolean | No | Enable conversation state callback for listening/thinking/speaking states |
| AnsMode | number | No | AI noise reduction mode (0: off, 1: low, 2: medium, 3: high; 3 recommended) |
| VoicePrint | object | No | Voiceprint recognition: Mode (0: off, 1: on), IdList (voiceprint ID list) |

Config

Session configuration:

| Parameter | Type | Description |
| --- | --- | --- |
| ASRConfig | object | Speech recognition configuration, see ASRConfig |
| TTSConfig | object | Speech synthesis configuration, see TTSConfig |
| LLMConfig | object | Large language model configuration, see LLMConfig |
| InterruptMode | number | Interrupt mode (0: semantic interrupt, 1: manual interrupt) |

Response

```json
{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "StartVoiceChat",
    "Version": "2024-12-01",
    "Service": "rtc",
    "Region": "cn-north-1"
  },
  "Result": {}
}
```

On success, Result is an empty object. On failure, ResponseMetadata.Error contains the error information.

Note

StartVoiceChat is used to start an AI agent in an existing room.

Official documentation: StartVoiceChat
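Putting the parameter tables together, the request body can be assembled as below. This is a sketch, not a complete client: all IDs and values are placeholders, and the actual request must still be V4-signed (or routed through an authentication proxy, as noted at the top of this document).

```typescript
// Sketch: assemble a minimal StartVoiceChat request body.
// Field names follow the tables above; all values are placeholders.
interface StartVoiceChatBody {
  AppId: string
  RoomId: string
  TaskId: string
  AgentConfig: {
    TargetUserId: string[]
    UserId: string
    WelcomeMessage?: string
  }
  Config: {
    ASRConfig: object
    TTSConfig: object
    LLMConfig: object
    InterruptMode?: number
  }
}

function buildStartVoiceChatBody(appId: string, roomId: string, taskId: string): StartVoiceChatBody {
  return {
    AppId: appId,
    RoomId: roomId,
    TaskId: taskId,
    AgentConfig: {
      TargetUserId: ['user-1'], // client user the agent talks to
      UserId: 'ai-bot',         // the agent's own user ID in the room
      WelcomeMessage: 'Hello!',
    },
    Config: {
      ASRConfig: { Provider: 'volcano', ProviderParams: { AppId: 'asr-app-id', Mode: 'smallmodel', Cluster: 'volcengine_streaming_common' } },
      TTSConfig: { Provider: 'volcano', ProviderParams: { app: { appid: 'tts-app-id', token: 'tts-token', cluster: 'volcano_tts' }, audio: { voice_type: 'BV033_streaming' } } },
      LLMConfig: { Mode: 'ArkV3', EndPointId: 'endpoint-id' },
      InterruptMode: 0, // 0: semantic interrupt
    },
  }
}
```

The resulting object is POSTed as JSON to the endpoint shown above (`Action=StartVoiceChat&Version=2024-12-01`).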

StopVoiceChat

Stops a voice session and releases AI agent resources.

Request endpoint: POST https://rtc.volcengineapi.com?Action=StopVoiceChat&Version=2024-12-01

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| AppId | string | Yes | RTC application ID (same as StartVoiceChat) |
| RoomId | string | Yes | Room ID (same as StartVoiceChat) |
| TaskId | string | Yes | Task ID (same as StartVoiceChat) |

Response

```json
{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "StopVoiceChat",
    "Version": "2024-12-01"
  },
  "Result": {}
}
```

On success, Result is an empty object. On failure, ResponseMetadata.Error contains the error information.

Official documentation: StopVoiceChat

UpdateVoiceChat

Updates an ongoing voice session. Supports interruption, function calling, and custom announcements.

Request endpoint: POST https://rtc.volcengineapi.com?Action=UpdateVoiceChat&Version=2024-12-01

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| AppId | string | Yes | RTC application ID |
| RoomId | string | Yes | Room ID |
| TaskId | string | Yes | Task ID |
| Command | string | Yes | Command type |
| Message | string | No | Announcement text (max 200 characters) |
| InterruptMode | number | No | Announcement priority |

Command Types

| Command | Description |
| --- | --- |
| Interrupt | Interrupt current agent output |
| ExternalTextToSpeech | Custom text-to-speech playback |
| FunctionCallResult | Return function calling result |

InterruptMode Priority

Used with ExternalTextToSpeech to specify announcement priority:

| Value | Description |
| --- | --- |
| 1 | High priority: stop current interaction and play immediately |
| 2 | Medium priority: play after current interaction ends |
| 3 | Low priority: drop if interaction is in progress |

Examples

Interrupt the agent:

```json
{
  "AppId": "your-app-id",
  "RoomId": "room-uuid",
  "TaskId": "task-id",
  "Command": "Interrupt"
}
```

Custom announcement:

```json
{
  "AppId": "your-app-id",
  "RoomId": "room-uuid",
  "TaskId": "task-id",
  "Command": "ExternalTextToSpeech",
  "Message": "You have a new message",
  "InterruptMode": 1
}
```

Response

```json
{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "UpdateVoiceChat",
    "Version": "2024-12-01"
  },
  "Result": {}
}
```

On success, Result is an empty object. On failure, ResponseMetadata.Error contains the error information.

Official documentation: UpdateVoiceChat

Configuration Details

The following configurations are used in the Config parameter of StartVoiceChat.

ASRConfig

Speech recognition configuration:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Provider | string | Yes | Service provider, fixed as volcano |
| ProviderParams | object | Yes | Provider-specific parameters |
| VADConfig | object | No | Voice activity detection configuration |
| VolumeGain | number | No | Volume gain (0.0–1.0), default 0.5 |
| TurnDetectionMode | number | No | Turn detection mode |
| InterruptConfig | object | No | Interrupt configuration |

ProviderParams:

| Parameter | Type | Description |
| --- | --- | --- |
| AppId | string | ASR application ID |
| Mode | string | Recognition mode: smallmodel or bigmodel |
| Cluster | string | Service cluster, default volcengine_streaming_common |
| context | string | Hotword context (JSON format) |
| boosting_table_id | string | Hotword table ID |
| correct_table_id | string | Correction table ID |

VADConfig (Voice Activity Detection):

| Parameter | Type | Description |
| --- | --- | --- |
| SilenceTime | number | Silence duration threshold (ms), default 600 |
| SpeechTime | number | Speech duration threshold (ms) |
| PrefixTime | number | Prefix duration (ms) |
| SuffixTime | number | Suffix duration (ms) |
| Sensitivity | number | Sensitivity |
| AIVAD | boolean | Enable AI VAD |

InterruptConfig:

| Parameter | Type | Description |
| --- | --- | --- |
| InterruptSpeechDuration | number | Interrupt speech duration (ms), default 400 |
| InterruptKeywords | string[] | Semantic interrupt keyword list |

Example configuration:

```json
{
  "Provider": "volcano",
  "ProviderParams": {
    "AppId": "your-asr-app-id",
    "Mode": "smallmodel",
    "Cluster": "volcengine_streaming_common"
  },
  "VADConfig": {
    "SilenceTime": 600
  },
  "VolumeGain": 0.5,
  "TurnDetectionMode": 0,
  "InterruptConfig": {
    "InterruptSpeechDuration": 400,
    "InterruptKeywords": ["stop", "wait"]
  }
}
```

TTSConfig

Speech synthesis configuration:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Provider | string | Yes | Service provider, fixed as volcano |
| ProviderParams | object | Yes | Provider-specific parameters |
| IgnoreBracketText | number[] | No | Bracket types to ignore |

ProviderParams:

| Parameter | Type | Description |
| --- | --- | --- |
| app | object | Application config |
| audio | object | Audio config |
| ResourceId | string | TTS resource ID |
| Additions | object | Additional config |

app configuration:

| Parameter | Type | Description |
| --- | --- | --- |
| appid | string | TTS application ID |
| token | string | TTS application token |
| cluster | string | Service cluster, default volcano_tts |

audio configuration:

Parameters vary slightly by TTS mode:

| Parameter | Type | Description | Applicable Mode |
| --- | --- | --- | --- |
| voice_type | string | Voice type | All modes |
| volume_ratio | number | Volume (0.5–2.0) | All modes |
| speed_ratio | number | Speech rate (0.5–2.0) | standard |
| pitch_ratio | number | Pitch (0.5–2.0) | standard |
| speech_ratio | number | Speech rate (0.5–2.0) | bigtts |
| pitch_rate | number | Pitch rate | bigtts |
| speech_rate | number | Speech rate | bidirection |
| emotion | string | Emotion: happy, sad, angry, neutral | Voices with emotion support |
| emotion_strength | number | Emotion strength (0.0–1.0) | Voices with emotion support |

TTS Modes

  • standard: Standard mode, uses speed_ratio, pitch_ratio
  • bigtts: Large model TTS, uses speech_ratio, pitch_rate
  • bidirection: Bidirectional streaming, uses speech_rate, supports Additions config

Common voices:

| Voice ID | Description |
| --- | --- |
| BV033_streaming | Female, gentle |
| BV001_streaming | Male, magnetic |
| BV700_streaming | Female, sweet |
| BV406_streaming | Male, calm |

More voices: Volcano Engine TTS Voice List

Example configuration:

```json
{
  "Provider": "volcano",
  "ProviderParams": {
    "app": {
      "appid": "your-tts-app-id",
      "token": "your-tts-token",
      "cluster": "volcano_tts"
    },
    "audio": {
      "voice_type": "BV033_streaming",
      "speed_ratio": 1.2,
      "pitch_ratio": 1.1,
      "volume_ratio": 1.0,
      "emotion": "happy",
      "emotion_strength": 0.8
    },
    "ResourceId": "your-resource-id"
  }
}
```

LLMConfig

Large language model configuration:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| Mode | string | Yes | Mode: ArkV3 (Ark) or CustomLLM (custom) |
| Url | string | CustomLLM only | CustomLLM callback URL |
| APIKey | string | No | API authentication key |
| EndPointId | string | ArkV3 only | Ark model endpoint ID |
| ModelName | string | No | Model name |
| SystemMessages | string[] | No | System prompts |
| UserPrompts | object[] | No | Preset conversation history |
| Temperature | number | No | Sampling temperature (0.0–1.0), default 0.5 |
| TopP | number | No | Top-p sampling (0.0–1.0), default 0.9 |
| MaxTokens | number | No | Max tokens, default 256 |
| HistoryLength | number | No | Number of history turns to keep, default 15 |
| EnableRoundId | boolean | No | Enable round ID |
| VisionConfig | object | No | Vision understanding: Enable (boolean), SnapshotConfig (object) |
| Custom | string | No | Custom parameters (JSON string), passed through to CustomLLM |

UserPrompts (Preset conversation history):

```json
[
  { "Role": "assistant", "Content": "Hello! How can I help you?" },
  { "Role": "user", "Content": "Hello" }
]
```

CustomLLM mode example:

```json
{
  "Mode": "CustomLLM",
  "Url": "https://your-server.com/chat-stream",
  "APIKey": "your-api-key",
  "ModelName": "qwen-flash",
  "Temperature": 0.5,
  "TopP": 0.9,
  "MaxTokens": 256,
  "HistoryLength": 15,
  "EnableRoundId": true,
  "VisionConfig": {
    "Enable": false
  },
  "UserPrompts": [
    { "Role": "assistant", "Content": "Hi, I'm your assistant. Nice to meet you!" }
  ]
}
```

ArkV3 mode example:

```json
{
  "Mode": "ArkV3",
  "EndPointId": "your-endpoint-id",
  "Temperature": 0.7,
  "MaxTokens": 512
}
```

CustomLLM Callback

When using CustomLLM mode, Volcano Engine sends user speech recognition results to your custom service.

Callback Flow

User speech → Volcano Engine ASR → CustomLLM service → Volcano Engine TTS → User

Request Format

Request from Volcano Engine to your CustomLLM service:

```http
POST /chat-stream HTTP/1.1
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "messages": [
    {"role": "system", "content": "You are an intelligent assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 256,
  "device_id": "custom-device-id"
}
```

Request fields:

| Field | Description |
| --- | --- |
| messages | Conversation history in OpenAI format |
| stream | Fixed true, requires streaming response |
| temperature | Sampling temperature |
| max_tokens | Maximum generation length |
| device_id | Custom parameter, passed through from LLMConfig.Custom |
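On the receiving side, the payload can be modeled with a type and a minimal runtime check. This is a sketch (the type names are our own); the field names and semantics come from the table above.

```typescript
// Sketch: types for the CustomLLM callback request, plus a minimal guard
// a CustomLLM service could apply before processing the body.
interface ChatMessage {
  role: 'system' | 'user' | 'assistant'
  content: string
}

interface CustomLLMRequest {
  messages: ChatMessage[]
  stream: boolean           // always true; a streaming response is required
  temperature?: number
  max_tokens?: number
  [custom: string]: unknown // pass-through fields from LLMConfig.Custom (e.g. device_id)
}

function isCustomLLMRequest(body: unknown): body is CustomLLMRequest {
  const b = body as CustomLLMRequest
  return Array.isArray(b?.messages) && b?.stream === true
}
```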

Response Format

Response must follow OpenAI SSE format:

```
data: {"id":"resp-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}],"model":"qwen-flash","created":1704355200}

data: {"id":"resp-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}],"model":"qwen-flash","created":1704355200}

data: {"id":"resp-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":"stop"}],"model":"qwen-flash","created":1704355200}

data: [DONE]
```

Response requirements:

  • Must return SSE streaming response
  • Content-Type: text/event-stream
  • Each line starts with data:
  • Last line must be data: [DONE]
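A helper that emits chunks in this shape might look like the following sketch; the `id` and `model` values are placeholders, and the rest of the chunk structure follows the SSE example above.

```typescript
// Sketch: format one OpenAI-style SSE chunk for the CustomLLM response stream.
function sseChunk(content: string, done: boolean, id = 'resp-1', model = 'my-model'): string {
  const payload = {
    id,
    object: 'chat.completion.chunk',
    choices: [{ index: 0, delta: { content }, finish_reason: done ? 'stop' : null }],
    model,
    created: Math.floor(Date.now() / 1000),
  }
  // Each SSE event is a "data:" line followed by a blank line.
  return `data: ${JSON.stringify(payload)}\n\n`
}

// After the final chunk, the stream must be terminated explicitly:
const sseDone = 'data: [DONE]\n\n'
```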

Official documentation: CustomLLM Integration

RTC Token

Clients need a token to join an RTC room. Tokens are generated server-side using AppKey.

Token Structure

Token = Version + AppId + Base64(Message + Signature)
  • Version: fixed value 001
  • AppId: 24-character application identifier
  • Message: binary-encoded payload (RoomId, UserId, expiry time, privileges)
  • Signature: HMAC-SHA256 signature using AppKey
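The signature step can be sketched with Node's crypto module. Note this covers only the HMAC-SHA256 step named above; the Message itself is a binary encoding specific to the token library, so in practice an official token library should be used (`appKey` and `message` here are placeholders).

```typescript
import { createHmac } from 'crypto'

// Sketch: the HMAC-SHA256 signing step used in token generation.
// The real Message is a binary-encoded payload; here it is illustrative bytes only.
function signMessage(appKey: string, message: Buffer): Buffer {
  return createHmac('sha256', appKey).update(message).digest()
}
```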

Token Privileges

| Privilege | Description |
| --- | --- |
| PrivPublishStream | Publish audio/video |
| PrivSubscribeStream | Subscribe to streams |

Validity

Default validity is 24 hours (86,400 seconds). Must be regenerated after expiry.

Example

```typescript
import { AccessToken } from './rtctoken'

const token = new AccessToken(appId, appKey, roomId, userId)
const expireAt = Math.floor(Date.now() / 1000) + 24 * 3600
token.addPrivilege('PrivPublishStream', expireAt)
token.addPrivilege('PrivSubscribeStream', expireAt)
token.expireTime(expireAt)
const tokenString = token.serialize()
```

For token generation libraries, see Installation and Testing - Generating an RTC Token.

Error Codes

Response Format

```json
{
  "ResponseMetadata": {
    "RequestId": "xxx",
    "Action": "StartVoiceChat",
    "Error": {
      "Code": "InvalidParameter",
      "Message": "Parameter AppId must not be empty"
    }
  }
}
```

Common Error Codes

| Error Code | HTTP Status | Description |
| --- | --- | --- |
| MissingParameter | 400 | Missing required parameter |
| InvalidParameter | 400 | Invalid parameter format |
| MissingRequestInfo | 400 | Missing request info |
| InvalidTimestamp | 400 | Invalid or expired timestamp |
| InvalidAuthorization | 400 | Invalid Authorization header |
| InvalidCredential | 400 | Invalid credential format |
| InvalidAccessKey | 401 | Invalid AccessKey |
| SignatureDoesNotMatch | 401 | Signature verification failed |
| InvalidSecretToken | 401 | Invalid or expired STS token |
| AccessDenied | 403 | Insufficient IAM permissions |
| ServiceNotFound | 404 | Service not found |
| InvalidActionOrVersion | 404 | Invalid API Action or Version |
| FlowLimitExceeded | 429 | Rate limit exceeded |
| InternalError | 500 | Internal error |
| InternalServiceError | 502 | Gateway error |
| ServiceUnavailableTemp | 503 | Service temporarily unavailable |
| InternalServiceTimeout | 504 | Service timeout |
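When handling these errors, 4xx codes indicate a problem with the request itself (fix and resend), while 429 and 5xx are typically transient. A simple client-side classifier (a sketch based on the status codes in the table above, not part of the API) could be:

```typescript
// Sketch: decide whether a failed request is worth retrying,
// based on the HTTP status codes in the error-code table above.
function isRetryable(httpStatus: number): boolean {
  if (httpStatus === 429) return true  // FlowLimitExceeded: back off, then retry
  if (httpStatus >= 500 && httpStatus <= 504) return true // internal/gateway/timeout errors
  return false // 400/401/403/404: fix the request or credentials instead
}
```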

Business Error Codes

| Error Code | Description |
| --- | --- |
| RoomNotExist | Room does not exist |
| TaskNotExist | Task does not exist |
| InvalidToken | RTC token is invalid or expired |

Official documentation: Common Error Codes