
API Reference

This document describes the core APIs of Volcano Engine Real-Time Conversational AI, including StartVoiceChat, UpdateVoiceChat, StopVoiceChat, and related configuration parameters.

StartVoiceChat

Starts a voice session and returns RTC connection credentials.

Request endpoint: POST https://rtc.volcengineapi.com?Action=StartVoiceChat&Version=2024-12-01

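Request headers: Every request must be signed with your AccessKey (AccessKey ID and Secret Access Key). See Authentication Proxy Service. The same signing requirement applies to StopVoiceChat and UpdateVoiceChat.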

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| AppId | string | Yes | RTC application ID |
| RoomId | string | Yes | Room ID |
| TaskId | string | Yes | Task ID used to identify the session |
| AgentConfig | object | Yes | Agent configuration |
| Config | object | Yes | Session configuration, including ASR, TTS, and LLM parameters |
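To make the request shape concrete, here is a minimal TypeScript sketch that types the StartVoiceChat body from the table above and builds the endpoint URL. The helper name buildActionUrl and the agent user ID agent-bot are illustrative, not part of the API, and the sketch does not sign anything: AccessKey signing still has to be applied before the request is sent.

```typescript
// Sketch: typed StartVoiceChat body plus an endpoint URL helper.
// Field names come from the parameter table; nested configs are loosely typed here.
const API_HOST = "https://rtc.volcengineapi.com";
const API_VERSION = "2024-12-01";

interface StartVoiceChatRequest {
  AppId: string;
  RoomId: string;
  TaskId: string;
  AgentConfig: { TargetUserId: string[]; UserId: string; [key: string]: unknown };
  Config: { ASRConfig?: object; TTSConfig?: object; LLMConfig?: object; InterruptMode?: number };
}

// Builds the POST URL for an action such as StartVoiceChat or StopVoiceChat.
function buildActionUrl(action: string): string {
  const url = new URL(API_HOST);
  url.searchParams.set("Action", action);
  url.searchParams.set("Version", API_VERSION);
  return url.toString();
}

// Placeholder IDs; "agent-bot" is a hypothetical agent user ID.
const startRequest: StartVoiceChatRequest = {
  AppId: "your-app-id",
  RoomId: "room-uuid",
  TaskId: "task-id",
  AgentConfig: { TargetUserId: ["user-uuid"], UserId: "agent-bot" },
  Config: { InterruptMode: 0 },
};

console.log(buildActionUrl("StartVoiceChat"));
// https://rtc.volcengineapi.com/?Action=StartVoiceChat&Version=2024-12-01
console.log(JSON.stringify(startRequest));
```

The same helper produces the StopVoiceChat and UpdateVoiceChat URLs, since only the Action value changes.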

AgentConfig

Agent configuration:

| Parameter | Type | Required | Description |
|---|---|---|---|
| TargetUserId | string[] | Yes | Target user ID list |
| UserId | string | Yes | Agent user ID |
| WelcomeMessage | string | No | Welcome message |
| EnableConversationStateCallback | boolean | No | Enable conversation state callback |
| AnsMode | number | No | Noise reduction mode (0–3) |
| VoicePrint | object | No | Voiceprint recognition settings |

VoicePrint configuration:

| Parameter | Type | Description |
|---|---|---|
| Mode | number | Voiceprint mode (0: disabled, 1: enabled) |
| IdList | string[] | Voiceprint ID list |
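The AgentConfig and VoicePrint tables imply a few checks that can run before a request leaves your server. The sketch below is a hypothetical client-side validator; the rule that VoicePrint.IdList should be non-empty when Mode is 1 is an assumption, not a documented constraint.

```typescript
// Hypothetical pre-flight checks derived from the AgentConfig and VoicePrint tables.
interface VoicePrintConfig {
  Mode: number; // 0: disabled, 1: enabled
  IdList?: string[];
}

interface AgentConfig {
  TargetUserId: string[];
  UserId: string;
  WelcomeMessage?: string;
  EnableConversationStateCallback?: boolean;
  AnsMode?: number; // noise reduction mode, 0–3
  VoicePrint?: VoicePrintConfig;
}

// Returns human-readable problems; an empty array means no problem was detected.
function validateAgentConfig(cfg: AgentConfig): string[] {
  const problems: string[] = [];
  if (cfg.TargetUserId.length === 0) problems.push("TargetUserId must not be empty");
  if (cfg.UserId.trim() === "") problems.push("UserId must not be empty");
  if (cfg.AnsMode !== undefined && (!Number.isInteger(cfg.AnsMode) || cfg.AnsMode < 0 || cfg.AnsMode > 3)) {
    problems.push("AnsMode must be an integer from 0 to 3");
  }
  if (cfg.VoicePrint) {
    const { Mode, IdList } = cfg.VoicePrint;
    if (Mode !== 0 && Mode !== 1) problems.push("VoicePrint.Mode must be 0 or 1");
    // Assumption: enabling voiceprint without any IDs is treated as a configuration error.
    if (Mode === 1 && (!IdList || IdList.length === 0)) {
      problems.push("VoicePrint.IdList should contain at least one ID when Mode is 1");
    }
  }
  return problems;
}
```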

Config

Session configuration with the following sub-sections:

{
  "ASRConfig": { ... },
  "TTSConfig": { ... },
  "LLMConfig": { ... },
  "InterruptMode": 0
}
| Parameter | Type | Description |
|---|---|---|
| ASRConfig | object | Speech recognition configuration |
| TTSConfig | object | Speech synthesis configuration |
| LLMConfig | object | Large language model configuration |
| InterruptMode | number | Interrupt mode (0: semantic interrupt, 1: manual interrupt) |

Response

{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "StartVoiceChat",
    "Version": "2024-12-01",
    "Service": "rtc",
    "Region": "cn-north-1"
  },
  "Result": {
    "AppId": "your-app-id",
    "RoomId": "room-uuid",
    "UserId": "user-uuid",
    "Token": "rtc-token..."
  }
}
| Field | Description |
|---|---|
| Result.AppId | RTC application ID |
| Result.RoomId | RTC room ID |
| Result.UserId | RTC user ID |
| Result.Token | RTC access token (valid for 24 hours) |
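Because Result.Token is valid for 24 hours, a client can record when it should stop relying on it. This sketch assumes validity is counted from when the response arrives. The server issues the token slightly earlier, so the computed time is a little later than the real expiry; the hypothetical 5-minute margin in needsNewToken absorbs that gap.

```typescript
// Sketch: keep the credentials from Result together with an estimated expiry time.
interface StartVoiceChatResult {
  AppId: string;
  RoomId: string;
  UserId: string;
  Token: string;
}

interface Credentials extends StartVoiceChatResult {
  expiresAtSec: number; // Unix time in seconds
}

const TOKEN_TTL_SECONDS = 24 * 3600; // 86,400 s

// Assumption: validity is measured from receivedAtSec (slightly later than the real issue time).
function readCredentials(payload: { Result: StartVoiceChatResult }, receivedAtSec: number): Credentials {
  return { ...payload.Result, expiresAtSec: receivedAtSec + TOKEN_TTL_SECONDS };
}

// True when the token has expired or will expire within marginSec (hypothetical 5-minute default).
function needsNewToken(creds: Credentials, nowSec: number, marginSec = 300): boolean {
  return nowSec >= creds.expiresAtSec - marginSec;
}

const creds = readCredentials(
  { Result: { AppId: "your-app-id", RoomId: "room-uuid", UserId: "user-uuid", Token: "rtc-token..." } },
  1700000000,
);
console.log(creds.expiresAtSec); // 1700086400
```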

Official documentation: StartVoiceChat

StopVoiceChat

Stops a voice session and releases resources.

Request endpoint: POST https://rtc.volcengineapi.com?Action=StopVoiceChat&Version=2024-12-01

Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| AppId | string | Yes | RTC application ID |
| RoomId | string | Yes | Room ID |
| TaskId | string | Yes | Task ID |

Response

{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "StopVoiceChat",
    "Version": "2024-12-01"
  },
  "Result": {}
}

Official documentation: StopVoiceChat

UpdateVoiceChat

Updates an ongoing voice session. Supports interruption, function calling, and custom announcements.

Request endpoint: POST https://rtc.volcengineapi.com?Action=UpdateVoiceChat&Version=2024-12-01

Request Parameters

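| Parameter | Type | Required | Description |
|---|---|---|---|
| AppId | string | Yes | RTC application ID |
| RoomId | string | Yes | Room ID |
| TaskId | string | Yes | Task ID |
| Command | string | Yes | Command type (see Command Types) |
| Message | string | No | Announcement text (max 200 characters); needed when Command is ExternalTextToSpeech |
| InterruptMode | number | No | Announcement priority for ExternalTextToSpeech (see InterruptMode Priority); not the same field as Config.InterruptMode |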

Command Types

| Command | Description |
|---|---|
| Interrupt | Interrupt current agent output |
| ExternalTextToSpeech | Custom text-to-speech playback |
| FunctionCallResult | Return function calling result |

InterruptMode Priority

Used with ExternalTextToSpeech:

| Value | Description |
|---|---|
| 1 | High priority: stop current interaction and play immediately |
| 2 | Medium priority: play after current interaction ends |
| 3 | Low priority: drop if interaction is in progress |
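The command types and the priority table can be captured in a small TypeScript sketch. buildAnnouncement enforces the 200-character Message limit (assuming the limit counts Unicode code points), and announcementOutcome is a local model of the priority table for reasoning and tests only; the server makes the real decision. Treating every priority as "play now" when no interaction is in progress is an assumption.

```typescript
// Sketch: UpdateVoiceChat request builders and a local model of the priority table.
type Command = "Interrupt" | "ExternalTextToSpeech" | "FunctionCallResult";
type Priority = 1 | 2 | 3;

interface SessionIds {
  AppId: string;
  RoomId: string;
  TaskId: string;
}

interface UpdateVoiceChatRequest extends SessionIds {
  Command: Command;
  Message?: string;
  InterruptMode?: Priority;
}

const MAX_MESSAGE_CHARS = 200;

function buildInterrupt(ids: SessionIds): UpdateVoiceChatRequest {
  return { ...ids, Command: "Interrupt" };
}

function buildAnnouncement(ids: SessionIds, message: string, priority: Priority): UpdateVoiceChatRequest {
  // Assumption: the 200-character limit counts Unicode code points, not UTF-16 units.
  const length = Array.from(message).length;
  if (length === 0 || length > MAX_MESSAGE_CHARS) {
    throw new RangeError(`Message must be 1 to ${MAX_MESSAGE_CHARS} characters, got ${length}`);
  }
  return { ...ids, Command: "ExternalTextToSpeech", Message: message, InterruptMode: priority };
}

type Outcome = "play-now" | "play-after-current" | "drop";
const WHEN_BUSY: Record<Priority, Outcome> = { 1: "play-now", 2: "play-after-current", 3: "drop" };

// Local illustration only. Assumption: with nothing in progress, every priority plays immediately.
function announcementOutcome(priority: Priority, interactionInProgress: boolean): Outcome {
  return interactionInProgress ? WHEN_BUSY[priority] : "play-now";
}
```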

Examples

Interrupt the agent:

{
  "AppId": "your-app-id",
  "RoomId": "room-uuid",
  "TaskId": "task-id",
  "Command": "Interrupt"
}

Custom announcement:

{
  "AppId": "your-app-id",
  "RoomId": "room-uuid",
  "TaskId": "task-id",
  "Command": "ExternalTextToSpeech",
  "Message": "You have a new message",
  "InterruptMode": 1
}

Response

{
  "ResponseMetadata": {
    "RequestId": "20250104123456789abcdef01234567",
    "Action": "UpdateVoiceChat",
    "Version": "2024-12-01"
  },
  "Result": {}
}

Official documentation: UpdateVoiceChat

ASRConfig

Speech recognition configuration:

| Parameter | Type | Required | Description |
|---|---|---|---|
| Provider | string | Yes | Service provider, fixed as volcano |
| ProviderParams | object | Yes | Provider-specific parameters |
| VADConfig | object | No | Voice activity detection configuration |
| VolumeGain | number | No | Volume gain (0.0–1.0), default 0.5 |
| TurnDetectionMode | number | No | Turn detection mode |
| InterruptConfig | object | No | Interrupt configuration |

ProviderParams

| Parameter | Type | Description |
|---|---|---|
| AppId | string | ASR application ID |
| Mode | string | Recognition mode: smallmodel or bigmodel |
| Cluster | string | Service cluster, default volcengine_streaming_common |
| context | string | Hotword context (JSON format) |
| boosting_table_id | string | Hotword table ID |
| correct_table_id | string | Correction table ID |

VADConfig

Voice activity detection configuration:

| Parameter | Type | Description |
|---|---|---|
| SilenceTime | number | Silence duration threshold (ms), default 600 |
| SpeechTime | number | Speech duration threshold (ms) |
| PrefixTime | number | Prefix duration (ms) |
| SuffixTime | number | Suffix duration (ms) |
| Sensitivity | number | Sensitivity |
| AIVAD | boolean | Enable AI VAD |

InterruptConfig

Interrupt configuration:

| Parameter | Type | Description |
|---|---|---|
| InterruptSpeechDuration | number | Interrupt speech duration (ms), default 400 |
| InterruptKeywords | string[] | Semantic interrupt keyword list |

Example:

{
  "Provider": "volcano",
  "ProviderParams": {
    "AppId": "your-asr-app-id",
    "Mode": "smallmodel",
    "Cluster": "volcengine_streaming_common"
  },
  "VADConfig": {
    "SilenceTime": 600
  },
  "VolumeGain": 0.5,
  "TurnDetectionMode": 0,
  "InterruptConfig": {
    "InterruptSpeechDuration": 400,
    "InterruptKeywords": ["stop", "wait"]
  }
}
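Filling in the documented ASRConfig defaults explicitly makes logged configurations easier to read. The sketch below is a hypothetical helper that applies only the defaults stated above (Cluster volcengine_streaming_common, SilenceTime 600 ms, VolumeGain 0.5, InterruptSpeechDuration 400 ms) and rejects VolumeGain values outside 0.0–1.0; it does not guess defaults for fields the tables leave unspecified.

```typescript
// Sketch: ASRConfig types plus a helper that fills in only the documented defaults.
interface AsrProviderParams {
  AppId: string;
  Mode: "smallmodel" | "bigmodel";
  Cluster?: string;
  context?: string;
  boosting_table_id?: string;
  correct_table_id?: string;
}

interface VadConfig {
  SilenceTime?: number;
  SpeechTime?: number;
  PrefixTime?: number;
  SuffixTime?: number;
  Sensitivity?: number;
  AIVAD?: boolean;
}

interface AsrConfig {
  Provider: "volcano";
  ProviderParams: AsrProviderParams;
  VADConfig?: VadConfig;
  VolumeGain?: number;
  TurnDetectionMode?: number;
  InterruptConfig?: { InterruptSpeechDuration?: number; InterruptKeywords?: string[] };
}

// Returns a copy with documented defaults applied; values set by the caller take precedence.
function withAsrDefaults(cfg: AsrConfig): AsrConfig {
  const gain = cfg.VolumeGain ?? 0.5;
  if (gain < 0 || gain > 1) throw new RangeError(`VolumeGain must be between 0.0 and 1.0, got ${gain}`);
  return {
    ...cfg,
    ProviderParams: { Cluster: "volcengine_streaming_common", ...cfg.ProviderParams },
    VADConfig: { SilenceTime: 600, ...cfg.VADConfig },
    VolumeGain: gain,
    InterruptConfig: { InterruptSpeechDuration: 400, ...cfg.InterruptConfig },
  };
}
```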

TTSConfig

Speech synthesis configuration:

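| Parameter | Type | Required | Description |
|---|---|---|---|
| Provider | string | Yes | Service provider, fixed as volcano |
| ProviderParams | object | Yes | Provider-specific parameters |
| IgnoreBracketText | number[] | No | Bracket types whose enclosed text is skipped during speech synthesis |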

ProviderParams

| Parameter | Type | Description |
|---|---|---|
| app | object | Application config |
| audio | object | Audio config |
| ResourceId | string | TTS resource ID |
| Additions | object | Additional config |

App configuration:

| Parameter | Type | Description |
|---|---|---|
| appid | string | TTS application ID |
| token | string | TTS application token |
| cluster | string | Service cluster, default volcano_tts |

Audio configuration:

| Parameter | Type | Description | Range |
|---|---|---|---|
| voice_type | string | Voice type | See voice list |
| speed_ratio | number | Speech rate | 0.5–2.0, default 1.0 |
| pitch_ratio | number | Pitch | 0.5–2.0, default 1.0 |
| volume_ratio | number | Volume | 0.5–2.0, default 1.0 |
| emotion | string | Emotion | happy, sad, angry, neutral |
| emotion_strength | number | Emotion strength | 0.0–1.0, default 0.8 |

Common voices:

| Voice ID | Description |
|---|---|
| BV033_streaming | Female, gentle |
| BV001_streaming | Male, magnetic |
| BV700_streaming | Female, sweet |
| BV406_streaming | Male, calm |

More voices: Volcano Engine TTS Voice List

Example:

{
  "Provider": "volcano",
  "ProviderParams": {
    "app": {
      "appid": "your-tts-app-id",
      "token": "your-tts-token",
      "cluster": "volcano_tts"
    },
    "audio": {
      "voice_type": "BV033_streaming",
      "speed_ratio": 1.2,
      "pitch_ratio": 1.1,
      "volume_ratio": 1.0,
      "emotion": "happy",
      "emotion_strength": 0.8
    },
    "ResourceId": "your-resource-id"
  }
}
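The ranges in the audio table can be checked before a session starts, which turns a rejected StartVoiceChat into a clear local message. This is a minimal sketch that validates only the documented ranges and emotion values; it does not know which emotions a particular voice_type supports.

```typescript
// Sketch: validate TTS audio parameters against the ranges in the audio table.
interface TtsAudio {
  voice_type: string;
  speed_ratio?: number;
  pitch_ratio?: number;
  volume_ratio?: number;
  emotion?: string;
  emotion_strength?: number;
}

const EMOTIONS: readonly string[] = ["happy", "sad", "angry", "neutral"];

// Records a problem when an optional numeric field falls outside [min, max].
function checkRange(problems: string[], name: string, value: number | undefined, min: number, max: number): void {
  if (value !== undefined && (value < min || value > max)) {
    problems.push(`${name} must be between ${min} and ${max}, got ${value}`);
  }
}

function validateTtsAudio(audio: TtsAudio): string[] {
  const problems: string[] = [];
  if (audio.voice_type.trim() === "") problems.push("voice_type must not be empty");
  checkRange(problems, "speed_ratio", audio.speed_ratio, 0.5, 2.0);
  checkRange(problems, "pitch_ratio", audio.pitch_ratio, 0.5, 2.0);
  checkRange(problems, "volume_ratio", audio.volume_ratio, 0.5, 2.0);
  checkRange(problems, "emotion_strength", audio.emotion_strength, 0.0, 1.0);
  if (audio.emotion !== undefined && !EMOTIONS.includes(audio.emotion)) {
    problems.push(`emotion must be one of ${EMOTIONS.join(", ")}`);
  }
  return problems;
}
```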

LLMConfig

Large language model configuration:

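| Parameter | Type | Required | Description |
|---|---|---|---|
| Mode | string | Yes | LLM mode: ArkV3 or CustomLLM |
| Url | string | Yes (CustomLLM) | CustomLLM callback URL |
| APIKey | string | No | API authentication key; in CustomLLM mode it is sent to the callback URL as a Bearer token in the Authorization header |
| EndPointId | string | Yes (ArkV3) | Ark model endpoint ID |
| ModelName | string | No | Model name |
| SystemMessages | string[] | No | System prompts |
| UserPrompts | object[] | No | Preset conversation history |
| Temperature | number | No | Sampling temperature (0.0–1.0), default 0.5 |
| TopP | number | No | Top-p sampling (0.0–1.0), default 0.9 |
| MaxTokens | number | No | Maximum number of tokens, default 256 |
| HistoryLength | number | No | Number of history turns to keep, default 15 |
| EnableRoundId | boolean | No | Enable round ID |
| VisionConfig | object | No | Vision understanding configuration |
| Custom | string | No | Custom parameters (JSON string), passed through unchanged |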

VisionConfig

| Parameter | Type | Description |
|---|---|---|
| Enable | boolean | Enable vision understanding |
| SnapshotConfig | object | Snapshot configuration |

UserPrompts

Preset conversation history:

[
  { "Role": "assistant", "Content": "Hello! How can I help you?" },
  { "Role": "user", "Content": "Hello" }
]

CustomLLM example:

{
  "Mode": "CustomLLM",
  "Url": "https://your-server.com/chat-stream",
  "APIKey": "your-api-key",
  "ModelName": "qwen-flash",
  "Temperature": 0.5,
  "TopP": 0.9,
  "MaxTokens": 256,
  "HistoryLength": 15,
  "EnableRoundId": true,
  "VisionConfig": {
    "Enable": false
  },
  "UserPrompts": [
    { "Role": "assistant", "Content": "Hi, I’m your assistant. Nice to meet you!" }
  ]
}

ArkV3 example:

{
  "Mode": "ArkV3",
  "EndPointId": "your-endpoint-id",
  "Temperature": 0.7,
  "MaxTokens": 512
}
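The two examples differ mainly in which fields each Mode requires. The sketch below checks the mode-specific requirements from the LLMConfig table, the documented 0.0–1.0 ranges, and that Custom parses as JSON. The MaxTokens positive-integer check is an added assumption.

```typescript
// Sketch: mode-aware validation of LLMConfig based on the Required column.
interface LlmConfig {
  Mode: "ArkV3" | "CustomLLM";
  Url?: string;
  APIKey?: string;
  EndPointId?: string;
  ModelName?: string;
  SystemMessages?: string[];
  UserPrompts?: { Role: "user" | "assistant"; Content: string }[];
  Temperature?: number;
  TopP?: number;
  MaxTokens?: number;
  HistoryLength?: number;
  EnableRoundId?: boolean;
  VisionConfig?: { Enable: boolean; SnapshotConfig?: Record<string, unknown> };
  Custom?: string;
}

function inUnitRange(value: number | undefined): boolean {
  return value === undefined || (value >= 0 && value <= 1);
}

function validateLlmConfig(cfg: LlmConfig): string[] {
  const problems: string[] = [];
  // Mode-specific required fields.
  if (cfg.Mode === "CustomLLM" && !cfg.Url) problems.push("Url is required when Mode is CustomLLM");
  if (cfg.Mode === "ArkV3" && !cfg.EndPointId) problems.push("EndPointId is required when Mode is ArkV3");
  if (!inUnitRange(cfg.Temperature)) problems.push("Temperature must be between 0.0 and 1.0");
  if (!inUnitRange(cfg.TopP)) problems.push("TopP must be between 0.0 and 1.0");
  // Assumption: MaxTokens must be a positive integer.
  if (cfg.MaxTokens !== undefined && (!Number.isInteger(cfg.MaxTokens) || cfg.MaxTokens <= 0)) {
    problems.push("MaxTokens must be a positive integer");
  }
  // Custom is passed through as a string, so it should contain valid JSON.
  if (cfg.Custom !== undefined) {
    try {
      JSON.parse(cfg.Custom);
    } catch {
      problems.push("Custom must be a valid JSON string");
    }
  }
  return problems;
}
```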

CustomLLM Callback

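In CustomLLM mode, Volcano Engine sends each recognized user utterance (the ASR result), together with the conversation context, to the URL set in LLMConfig.Url as an OpenAI-style chat request, then synthesizes the streamed reply with TTS.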

Callback Flow

User speech → Volcano Engine ASR → CustomLLM service → Volcano Engine TTS → User

Request Format

POST /chat-stream HTTP/1.1
Authorization: Bearer YOUR_API_KEY
Content-Type: application/json

{
  "messages": [
    {"role": "system", "content": "You are an intelligent assistant"},
    {"role": "user", "content": "Hello"}
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 256,
  "device_id": "custom-device-id"
}

Response Format

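The response must be a Server-Sent Events (SSE) stream in the OpenAI chat completions streaming format (Content-Type: text/event-stream). Each event is a data: line carrying a chat.completion.chunk JSON object, followed by a blank line, and the stream must end with data: [DONE].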
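A CustomLLM service therefore has to answer each request with an SSE stream. The sketch below uses only Node's built-in http module: it checks the Bearer key, reads the messages, and writes chat.completion.chunk events followed by data: [DONE]. The echo reply, the chatcmpl-demo ID and the demo-model name are placeholders for your own model call, and the chunk fields follow OpenAI's public streaming format rather than anything specified in this document.

```typescript
import { createServer } from "node:http";

interface ChatRequest {
  messages: { role: string; content: string }[];
  stream?: boolean;
  temperature?: number;
  max_tokens?: number;
}

// Formats one OpenAI-style chat.completion.chunk as an SSE event.
function sseChunk(id: string, model: string, content: string | null, finish: "stop" | null): string {
  const chunk = {
    id,
    object: "chat.completion.chunk",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [{ index: 0, delta: content === null ? {} : { content }, finish_reason: finish }],
  };
  return `data: ${JSON.stringify(chunk)}\n\n`;
}

// Content chunks, then a final chunk with finish_reason "stop", then the [DONE] sentinel.
function sseEvents(id: string, model: string, deltas: string[]): string[] {
  return [...deltas.map((d) => sseChunk(id, model, d, null)), sseChunk(id, model, null, "stop"), "data: [DONE]\n\n"];
}

// Hypothetical CustomLLM endpoint; replace the echo reply with a call to your model.
function createCustomLlmServer(apiKey: string) {
  return createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/chat-stream") {
      res.writeHead(404).end();
      return;
    }
    if (req.headers.authorization !== `Bearer ${apiKey}`) {
      res.writeHead(401).end();
      return;
    }
    const chunks: Buffer[] = [];
    req.on("data", (chunk: Buffer) => chunks.push(chunk));
    req.on("end", () => {
      let lastUser = "";
      try {
        const body = JSON.parse(Buffer.concat(chunks).toString("utf8")) as ChatRequest;
        lastUser = [...body.messages].reverse().find((m) => m.role === "user")?.content ?? "";
      } catch {
        res.writeHead(400).end();
        return;
      }
      res.writeHead(200, { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" });
      // A real model would produce deltas over time; each one is written as soon as it exists.
      for (const event of sseEvents("chatcmpl-demo", "demo-model", ["You said: ", lastUser])) {
        res.write(event);
      }
      res.end();
    });
  });
}
```

Calling createCustomLlmServer("your-api-key").listen(8080) would serve it locally; LLMConfig.Url would then point at that host's /chat-stream path.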

Official documentation: CustomLLM Integration

RTC Token

Clients need a token to join an RTC room. Tokens are generated on the server with the AppKey, which must never be shipped to clients.

Token Structure

Token = Version + AppId + Base64(Message + Signature)
  • Version: fixed value 001
  • AppId: 24-character application identifier
  • Message: binary-encoded payload (RoomId, UserId, expiry time, privileges)
  • Signature: HMAC-SHA256 of Message, keyed with AppKey
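To illustrate how the parts of Token = Version + AppId + Base64(Message + Signature) fit together, the sketch below assembles a token with Node's crypto module. The binary layout of Message here (field order, little-endian integers, uint16 length prefixes) is a hypothetical simplification, so its output will not be accepted by RTC; real tokens should come from the official token generator.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical encoding helpers: little-endian integers and uint16-length-prefixed bytes.
function u16(n: number): Buffer {
  const b = Buffer.alloc(2);
  b.writeUInt16LE(n);
  return b;
}
function u32(n: number): Buffer {
  const b = Buffer.alloc(4);
  b.writeUInt32LE(n);
  return b;
}
function bytes(data: Buffer): Buffer {
  return Buffer.concat([u16(data.length), data]);
}
function text(s: string): Buffer {
  return bytes(Buffer.from(s, "utf8"));
}

// Illustrative only: not the official wire format.
function sketchToken(
  appId: string, appKey: string, roomId: string, userId: string,
  expireAt: number, privileges: Record<string, number>,
): string {
  if (appId.length !== 24) throw new Error("AppId is expected to be 24 characters");
  const entries = Object.entries(privileges);
  // Message: RoomId, UserId, expiry, then each privilege name with its own expiry.
  const message = Buffer.concat([
    text(roomId),
    text(userId),
    u32(expireAt),
    u16(entries.length),
    ...entries.map(([name, exp]) => Buffer.concat([text(name), u32(exp)])),
  ]);
  // Signature: HMAC-SHA256 over the message bytes, keyed with AppKey.
  const signature = createHmac("sha256", appKey).update(message).digest();
  return "001" + appId + Buffer.concat([bytes(message), bytes(signature)]).toString("base64");
}
```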

Token Privileges

| Privilege | Description |
|---|---|
| PrivPublishStream | Publish audio/video |
| PrivSubscribeStream | Subscribe to streams |

Validity

Default validity is 24 hours (86,400 seconds).

Example

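import { AccessToken } from './rtctoken' // token helper module from your project

// Placeholders: load real values from server-side configuration; never expose appKey to clients
const appId = 'your-app-id'
const appKey = 'your-app-key'
const roomId = 'room-uuid'
const userId = 'user-uuid'

const token = new AccessToken(appId, appKey, roomId, userId)
// Expiry as a Unix timestamp in seconds: now + 24 hours
const expireAt = Math.floor(Date.now() / 1000) + 24 * 3600
token.addPrivilege('PrivPublishStream', expireAt)   // allow publishing audio/video
token.addPrivilege('PrivSubscribeStream', expireAt) // allow subscribing to streams
token.expireTime(expireAt)                          // overall token expiry
const tokenString = token.serialize()               // string handed to the client SDK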

Error Codes

Response Format

{
  "ResponseMetadata": {
    "RequestId": "xxx",
    "Action": "StartVoiceChat",
    "Error": {
      "Code": "InvalidParameter",
      "Message": "Parameter AppId must not be empty"
    }
  }
}
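Since a failed call carries ResponseMetadata.Error instead of Result, a small wrapper can turn that envelope into a typed exception. The names ApiEnvelope, VolcApiError and unwrap are hypothetical; the field names follow the response format above.

```typescript
// Sketch: convert the response envelope into either Result or a typed error.
interface ApiError {
  Code: string;
  Message: string;
}

interface ApiEnvelope<T> {
  ResponseMetadata: { RequestId: string; Action: string; Version?: string; Error?: ApiError };
  Result?: T;
}

class VolcApiError extends Error {
  readonly code: string;
  readonly requestId: string;
  readonly httpStatus: number;

  constructor(err: ApiError, requestId: string, httpStatus: number) {
    super(`${err.Code}: ${err.Message} (RequestId ${requestId})`);
    this.name = "VolcApiError";
    this.code = err.Code;
    this.requestId = requestId;
    this.httpStatus = httpStatus;
  }
}

// Returns Result on success; throws VolcApiError when ResponseMetadata.Error is present.
function unwrap<T>(httpStatus: number, envelope: ApiEnvelope<T>): T {
  const meta = envelope.ResponseMetadata;
  if (meta.Error) throw new VolcApiError(meta.Error, meta.RequestId, httpStatus);
  if (envelope.Result === undefined) throw new Error(`Response ${meta.RequestId} has neither Result nor Error`);
  return envelope.Result;
}
```

Keeping RequestId on the error makes it easy to quote when contacting support.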

Common Error Codes

| Error code | HTTP status | Description |
|---|---|---|
| MissingParameter | 400 | Missing required parameter |
| InvalidParameter | 400 | Invalid parameter format |
| MissingRequestInfo | 400 | Missing request info |
| InvalidTimestamp | 400 | Invalid or expired timestamp |
| InvalidAuthorization | 400 | Invalid Authorization header |
| InvalidCredential | 400 | Invalid credential format |
| InvalidAccessKey | 401 | Invalid AccessKey |
| SignatureDoesNotMatch | 401 | Signature verification failed |
| InvalidSecretToken | 401 | Invalid or expired STS token |
| AccessDenied | 403 | Insufficient IAM permissions |
| ServiceNotFound | 404 | Service not found |
| InvalidActionOrVersion | 404 | Invalid API Action or Version |
| FlowLimitExceeded | 429 | Rate limit exceeded |
| InternalError | 500 | Internal error |
| InternalServiceError | 502 | Gateway error |
| ServiceUnavailableTemp | 503 | Service temporarily unavailable |
| InternalServiceTimeout | 504 | Service timeout |
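The table splits naturally into client errors (4xx), which retrying will not fix, and rate-limit or server-side errors (429, 5xx), which often clear on their own. The sketch below assumes that split and retries only the latter, with exponential backoff and full jitter. The base delay, cap and attempt count are hypothetical defaults. Before retrying, consider whether the operation is safe to repeat: a StartVoiceChat call that timed out may already have started the task.

```typescript
// Assumption: 429 and 5xx codes are transient; 4xx client errors are not retried.
const RETRYABLE_CODES = new Set<string>([
  "FlowLimitExceeded",
  "InternalError",
  "InternalServiceError",
  "ServiceUnavailableTemp",
  "InternalServiceTimeout",
]);

function isRetryable(code: string): boolean {
  return RETRYABLE_CODES.has(code);
}

// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffMs(attempt: number, baseMs = 200, capMs = 5000, rand: () => number = Math.random): number {
  return Math.floor(rand() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Retries call() while it fails with a retryable code, up to maxAttempts attempts in total.
async function withRetry<T>(
  call: () => Promise<T>,
  codeOf: (err: unknown) => string | undefined,
  maxAttempts = 4,
): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await call();
    } catch (err) {
      attempt += 1;
      const code = codeOf(err);
      if (attempt >= maxAttempts || code === undefined || !isRetryable(code)) throw err;
      await new Promise<void>((resolve) => setTimeout(resolve, backoffMs(attempt - 1)));
    }
  }
}
```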

Business Error Codes

| Error code | Description |
|---|---|
| RoomNotExist | Room does not exist |
| TaskNotExist | Task does not exist |
| InvalidToken | RTC token is invalid or expired |

Official documentation: Common Error Codes