Voice Configuration

Device Agent uses the voice channel for automatic speech recognition (ASR) and text-to-speech (TTS). Voice requests are routed into the selected device's session; device online status, command execution, and reporting still go through MQTT or the device SDK.

Console Configuration

Open http://127.0.0.1:3000, go to Settings → Voice, and configure:

Field	Notes
Voice enabled	Controls whether the `/ws/voice` channel is available.
Voice WebSocket URL	Read-only in the UI. The default is `ws://127.0.0.1:3001/ws/voice`. Browser and device clients must be able to reach it.
Speech provider	Select `volcengine`, `aliyun`, `aws`, or `elevenlabs`.
Region	One of `cn`, `us`, `eu`, or `global`; determines which providers are eligible.
Speech recognition	ASR model, language, or resource ID. Fields vary by provider.
Text-to-speech	TTS model, voice, and sample rate. Fields vary by provider.
Provider credentials	API key, access key, AWS credentials, or equivalent provider secrets.

Provider, model, voice, and credential changes can be saved directly from this page. Changes to enablement, bind host, port, or TLS settings require a service restart.
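Because browser and device clients must be able to reach the voice WebSocket URL, it can help to probe it from a client machine before debugging anything else. The sketch below is a hypothetical helper, not part of Device Agent: `voice_url_host_port` splits a `ws://` or `wss://` URL into host and port (defaulting to 80 or 443 when no port is given), and `voice_probe` assumes bash's `/dev/tcp` redirection and the coreutils `timeout` command are available. It only confirms TCP reachability, not a successful WebSocket handshake.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Device Agent): check that the voice
# WebSocket endpoint is reachable over TCP from this machine.

# Print "host port" for a ws:// or wss:// URL.
# IPv6 literal hosts such as [::1] are not handled.
voice_url_host_port() {
  local url=$1 rest hostport host port
  case $url in
    ws://*)  rest=${url#ws://};  port=80 ;;
    wss://*) rest=${url#wss://}; port=443 ;;
    *) echo "unsupported scheme: $url" >&2; return 1 ;;
  esac
  hostport=${rest%%/*}          # drop the path, e.g. /ws/voice
  host=${hostport%%:*}
  if [[ $hostport == *:* ]]; then
    port=${hostport##*:}        # explicit port overrides the scheme default
  fi
  printf '%s %s\n' "$host" "$port"
}

# Try to open a TCP connection within 3 seconds using bash's /dev/tcp.
voice_probe() {
  local host port
  read -r host port < <(voice_url_host_port "$1") || return 1
  if timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "reachable: $host:$port"
  else
    echo "unreachable: $host:$port" >&2
    return 1
  fi
}
```

For example, run `voice_probe ws://127.0.0.1:3001/ws/voice` on the device or browser host, substituting the URL shown on the Settings → Voice page.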

Speech Providers

Provider	Regions	Main settings
Volcengine (`volcengine`)	`cn`, `global`	`VOLCENGINE_SPEECH_APP_ID`, `VOLCENGINE_SPEECH_ACCESS_KEY`, plus ASR/TTS resource IDs, language, voice, and sample rate.
Aliyun DashScope (`aliyun`)	`cn`, `global`	`ALIYUN_DASHSCOPE_API_KEY` or `QWEN_API_KEY`, plus ASR model, TTS model, voice, and sample rate.
AWS (`aws`)	`us`, `eu`, `global`	`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, plus Transcribe language and Polly voice.
ElevenLabs (`elevenlabs`)	`us`, `eu`, `global`	`ELEVENLABS_API_KEY`, API endpoint, ASR model, TTS model, voice, and sample rate.

`VOICE_REGION=cn` allows Volcengine or Aliyun; `us` allows AWS or ElevenLabs; `eu` requires AWS with an `eu-*` region or ElevenLabs with its EU residency endpoint; `global` places no restriction on providers.
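The region rules above can be expressed as a small preflight check, for example in a deployment script that runs before the service starts. This is a hypothetical sketch that mirrors the documented rules, not Device Agent's own validation; the function name `voice_region_allows` and its argument order are illustrative. For `eu` with ElevenLabs it cannot verify that the EU residency endpoint is configured, so that part remains a manual check.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of Device Agent) mirroring the
# documented VOICE_REGION rules.
# Usage: voice_region_allows REGION PROVIDER [AWS_REGION]
# Returns 0 if the provider is eligible for the region, 1 otherwise.
voice_region_allows() {
  local region=$1 provider=$2 aws_region=${3:-}
  case $region in
    global) return 0 ;;                                   # no restriction
    cn) [[ $provider == volcengine || $provider == aliyun ]] ;;
    us) [[ $provider == aws || $provider == elevenlabs ]] ;;
    eu)
      case $provider in
        aws) [[ $aws_region == eu-* ]] ;;                 # e.g. eu-central-1
        elevenlabs) return 0 ;;  # EU residency endpoint is not checked here
        *) return 1 ;;
      esac ;;
    *) return 1 ;;                                        # unknown region
  esac
}
```

For example, `voice_region_allows eu aws "$AWS_REGION" || echo "provider not allowed in eu"`.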

`.env` Configuration

Use `.env` for the first startup, container deployments, or environments without console access. You can switch the provider later from Settings → Voice.

Aliyun DashScope:

```bash
VOICE_ENABLED=true
VOICE_REGION=cn
ALIYUN_DASHSCOPE_API_KEY=sk-...
ALIYUN_ASR_MODEL=paraformer-realtime-v2
ALIYUN_TTS_MODEL=cosyvoice-v3-flash
ALIYUN_TTS_VOICE=longanyang
```

Volcengine:

```bash
VOICE_ENABLED=true
VOICE_REGION=cn
VOLCENGINE_SPEECH_APP_ID=...
VOLCENGINE_SPEECH_ACCESS_KEY=...
VOLCENGINE_TTS_VOICE=zh_female_shuangkuaisisi_moon_bigtts
```

AWS:

```bash
VOICE_ENABLED=true
VOICE_REGION=us
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
AWS_TRANSCRIBE_LANGUAGE_CODE=en-US
AWS_POLLY_VOICE=Joanna
```

ElevenLabs:

```bash
VOICE_ENABLED=true
VOICE_REGION=us
ELEVENLABS_API_KEY=...
ELEVENLABS_TTS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_TTS_VOICE=UgBBYS2sOqTuMpoF3BR0
```
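Before the first startup it can be useful to confirm that the credential variables listed under Speech Providers are actually set. The sketch below is a hypothetical helper, not part of Device Agent: `voice_missing_vars` takes the provider name as an argument (the examples above do not show a provider-selection variable, so none is assumed) and prints each missing variable on its own line. It only inspects environment variables, so AWS credentials supplied by other means, such as a shared credentials file, would be reported as missing.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of Device Agent): print the documented
# credential variables that are unset or empty for a provider, one per line.
# Usage: voice_missing_vars PROVIDER
voice_missing_vars() {
  local name
  case $1 in
    volcengine)
      for name in VOLCENGINE_SPEECH_APP_ID VOLCENGINE_SPEECH_ACCESS_KEY; do
        [[ -n ${!name:-} ]] || echo "$name"   # ${!name} = value of variable named $name
      done ;;
    aliyun)
      # Either key is accepted.
      [[ -n ${ALIYUN_DASHSCOPE_API_KEY:-} || -n ${QWEN_API_KEY:-} ]] ||
        echo "ALIYUN_DASHSCOPE_API_KEY or QWEN_API_KEY" ;;
    aws)
      for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
        [[ -n ${!name:-} ]] || echo "$name"
      done ;;
    elevenlabs)
      [[ -n ${ELEVENLABS_API_KEY:-} ]] || echo "ELEVENLABS_API_KEY" ;;
    *) echo "unknown provider: $1" >&2; return 2 ;;
  esac
}
```

Assuming your `.env` contains only simple `KEY=value` lines, you can load it into the current shell with `set -a; . ./.env; set +a` and then run, for example, `voice_missing_vars aliyun`.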

Bind Address and TLS

The voice service listens on `127.0.0.1:3001` by default, which accepts only local connections. For LAN or server access, set:

```bash
VOICE_HOST=0.0.0.0
VOICE_PORT=3001
```

For production, or when the console is served over HTTPS, enable TLS (browsers block unencrypted `ws://` connections from HTTPS pages):

```bash
VOICE_TLS_ENABLED=true
VOICE_TLS_CERT_FILE=/path/to/cert.pem
VOICE_TLS_KEY_FILE=/path/to/key.pem
```
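Since TLS changes only take effect after a restart, a quick check that the certificate and key paths are set and readable can save a failed restart. The following is a hypothetical sketch, not part of Device Agent; it reads the same `VOICE_TLS_*` variables shown above and does nothing when TLS is disabled.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check (not part of Device Agent): before restarting,
# confirm that the TLS certificate and key files named in .env are readable.
voice_tls_check() {
  [[ ${VOICE_TLS_ENABLED:-false} == true ]] || return 0   # TLS off: nothing to check
  local var file status=0
  for var in VOICE_TLS_CERT_FILE VOICE_TLS_KEY_FILE; do
    file=${!var:-}                                         # value of the variable named $var
    if [[ -z $file ]]; then
      echo "$var is not set" >&2; status=1
    elif [[ ! -r $file ]]; then
      echo "$var: cannot read $file" >&2; status=1
    fi
  done
  return "$status"
}
```

To additionally confirm that the key belongs to the certificate, you can compare the output of `openssl x509 -in cert.pem -noout -pubkey` with `openssl pkey -in key.pem -pubout`; the two public keys should be identical.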

After configuration, the console shows the voice WebSocket URL that clients should use.
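If you need the URL in a script before opening the console, the sketch below assembles it from the `.env` values. It is a hypothetical helper, not Device Agent's own logic, and assumes that the scheme becomes `wss://` when `VOICE_TLS_ENABLED=true` and that the path stays `/ws/voice`. Because `0.0.0.0` is only a bind address, clients need the machine's LAN IP or DNS name, which you pass as the first argument.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Device Agent): build the voice WebSocket URL
# a client would use. Assumes wss:// when TLS is enabled and the /ws/voice path.
# Usage: voice_ws_url [PUBLIC_HOST]   (defaults to VOICE_HOST, then 127.0.0.1)
voice_ws_url() {
  local public_host=${1:-${VOICE_HOST:-127.0.0.1}}
  local scheme=ws
  [[ ${VOICE_TLS_ENABLED:-false} == true ]] && scheme=wss
  if [[ $public_host == 0.0.0.0 ]]; then
    echo "0.0.0.0 is a bind address; pass the host clients will use" >&2
    return 1
  fi
  printf '%s://%s:%s/ws/voice\n' "$scheme" "$public_host" "${VOICE_PORT:-3001}"
}
```

With no `VOICE_*` variables set, this prints the documented default, `ws://127.0.0.1:3001/ws/voice`. Treat the console's value as authoritative if the two differ.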

For usage flow, see Voice Interaction. For protocol messages, see API Reference.
