Voice Configuration
Device Agent uses the voice channel for speech recognition (ASR) and text-to-speech (TTS). Voice requests enter the selected device session; device online state, command execution, and reporting still use MQTT or the device SDK.
Console Configuration
Open http://127.0.0.1:3000, go to Settings → Voice, and configure:
| Field | Notes |
|---|---|
| Voice enabled | Controls whether the /ws/voice channel is available. |
| Voice WebSocket URL | Read-only in the UI. The default is ws://127.0.0.1:3001/ws/voice. Browser and device clients must be able to reach it. |
| Speech provider | Select volcengine, aliyun, aws, or elevenlabs. |
| Region | Supports cn, us, eu, and global; this affects eligible providers. |
| Speech recognition | ASR model, language, or resource ID. Fields vary by provider. |
| Text-to-speech | TTS model, voice, and sample rate. Fields vary by provider. |
| Provider credentials | API key, access key, AWS credentials, or equivalent provider secrets. |
Provider, model, voice, and credential changes can be saved from the page. Changes to enablement, bind host, port, or TLS settings require a service restart.
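The split between settings that can be saved from the page and settings that need a restart can be sketched as a small check. This is a minimal illustration, not the service's own logic: it assumes the console fields correspond to the `VOICE_*` variables used elsewhere on this page, and `changed_keys` and `needs_restart` are hypothetical helpers.

```python
# Hypothetical helper: decide whether a config change needs a service restart.
# Assumption: console fields correspond to the VOICE_* variables on this page.
RESTART_KEYS = frozenset({
    "VOICE_ENABLED",
    "VOICE_HOST",
    "VOICE_PORT",
    "VOICE_TLS_ENABLED",
    "VOICE_TLS_CERT_FILE",
    "VOICE_TLS_KEY_FILE",
})


def changed_keys(old: dict, new: dict) -> set:
    """Return every key whose value differs between the two configs."""
    return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


def needs_restart(old: dict, new: dict) -> bool:
    """True if any changed key is an enablement, bind, or TLS setting."""
    return bool(changed_keys(old, new) & RESTART_KEYS)
```

A deployment script could call `needs_restart(previous, updated)` after editing the configuration and restart the service only when it returns `True`.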
Speech Providers
| Provider | Regions | Main settings |
|---|---|---|
| Volcengine (volcengine) | cn, global | VOLCENGINE_SPEECH_APP_ID, VOLCENGINE_SPEECH_ACCESS_KEY, plus ASR/TTS resource IDs, language, voice, and sample rate. |
| Aliyun DashScope (aliyun) | cn, global | ALIYUN_DASHSCOPE_API_KEY or QWEN_API_KEY, plus ASR model, TTS model, voice, and sample rate. |
| AWS (aws) | us, eu, global | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, plus Transcribe language and Polly voice. |
| ElevenLabs (elevenlabs) | us, eu, global | ELEVENLABS_API_KEY, API endpoint, ASR model, TTS model, voice, and sample rate. |
VOICE_REGION determines which providers are eligible: cn allows Volcengine or Aliyun; us allows AWS or ElevenLabs; eu requires either AWS with an eu-* region or ElevenLabs with its EU residency endpoint; global does not restrict providers.
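The region rules above can be written out as a lookup. The sketch below only restates the Speech Providers table in code; `eligible_providers` and `aws_region_allowed` are hypothetical names, and the service's actual validation may differ.

```python
# Regions each provider is listed under in the Speech Providers table.
PROVIDER_REGIONS = {
    "volcengine": {"cn", "global"},
    "aliyun": {"cn", "global"},
    "aws": {"us", "eu", "global"},
    "elevenlabs": {"us", "eu", "global"},
}


def eligible_providers(voice_region: str) -> list:
    """Return the providers allowed for a VOICE_REGION value, sorted by name."""
    if voice_region not in {"cn", "us", "eu", "global"}:
        raise ValueError(f"unsupported VOICE_REGION: {voice_region!r}")
    return sorted(
        name for name, regions in PROVIDER_REGIONS.items() if voice_region in regions
    )


def aws_region_allowed(voice_region: str, aws_region: str) -> bool:
    """Check an AWS_REGION value against the VOICE_REGION rules on this page."""
    if "aws" not in eligible_providers(voice_region):
        return False
    if voice_region == "eu":
        # The eu voice region requires an eu-* AWS region.
        return aws_region.startswith("eu-")
    return True
```

For example, `eligible_providers("cn")` returns `["aliyun", "volcengine"]`, and `aws_region_allowed("eu", "us-east-1")` returns `False`.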
.env Configuration
Use .env for first startup, container deployment, or environments without UI access. You can switch the provider later from Settings → Voice.
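When the UI is not available, a quick pre-flight check of the .env file can catch missing credentials before startup. The following is a minimal sketch, not part of Device Agent: the required-variable lists are taken from the Speech Providers table, the parser handles only plain `KEY=VALUE` lines (no quoting or `export`), and the provider is passed in as an argument because this page does not name a variable for it.

```python
# Hypothetical pre-flight check for a .env file; not part of Device Agent.
# Each inner tuple lists alternatives: at least one name in it must be set.
REQUIRED = {
    "volcengine": [("VOLCENGINE_SPEECH_APP_ID",), ("VOLCENGINE_SPEECH_ACCESS_KEY",)],
    "aliyun": [("ALIYUN_DASHSCOPE_API_KEY", "QWEN_API_KEY")],
    "aws": [("AWS_ACCESS_KEY_ID",), ("AWS_SECRET_ACCESS_KEY",), ("AWS_REGION",)],
    "elevenlabs": [("ELEVENLABS_API_KEY",)],
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def missing_credentials(env: dict, provider: str) -> list:
    """Return the credential groups that have no non-empty value."""
    return [
        " or ".join(group)
        for group in REQUIRED[provider]
        if not any(env.get(name) for name in group)
    ]
```

An empty list from `missing_credentials(parse_env(text), "aliyun")` means either `ALIYUN_DASHSCOPE_API_KEY` or `QWEN_API_KEY` is present.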
Aliyun DashScope:
VOICE_ENABLED=true
VOICE_REGION=cn
ALIYUN_DASHSCOPE_API_KEY=sk-...
ALIYUN_ASR_MODEL=paraformer-realtime-v2
ALIYUN_TTS_MODEL=cosyvoice-v3-flash
ALIYUN_TTS_VOICE=longanyang
Volcengine:
VOICE_ENABLED=true
VOICE_REGION=cn
VOLCENGINE_SPEECH_APP_ID=...
VOLCENGINE_SPEECH_ACCESS_KEY=...
VOLCENGINE_TTS_VOICE=zh_female_shuangkuaisisi_moon_bigtts
AWS:
VOICE_ENABLED=true
VOICE_REGION=us
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
AWS_TRANSCRIBE_LANGUAGE_CODE=en-US
AWS_POLLY_VOICE=Joanna
ElevenLabs:
VOICE_ENABLED=true
VOICE_REGION=us
ELEVENLABS_API_KEY=...
ELEVENLABS_TTS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_TTS_VOICE=UgBBYS2sOqTuMpoF3BR0
Bind Address and TLS
The voice service listens on 127.0.0.1:3001 by default. For LAN or server access, set:
VOICE_HOST=0.0.0.0
VOICE_PORT=3001
For production use or HTTPS console access, enable TLS:
VOICE_TLS_ENABLED=true
VOICE_TLS_CERT_FILE=/path/to/cert.pem
VOICE_TLS_KEY_FILE=/path/to/key.pem
After configuration, the console shows the voice WebSocket URL that clients should use.
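The console is the authoritative source for the client URL, but it can help to see how the bind settings relate to it. The sketch below rests on two assumptions not stated on this page: that enabling TLS changes the scheme from `ws` to `wss`, and that the path stays `/ws/voice`. `voice_ws_url` and its `public_host` parameter are hypothetical.

```python
# Hypothetical helper: derive a client-facing URL from the bind settings.
# Assumptions: TLS switches ws to wss, and the path remains /ws/voice.
def voice_ws_url(env: dict, public_host=None) -> str:
    tls = env.get("VOICE_TLS_ENABLED", "false").lower() == "true"
    scheme = "wss" if tls else "ws"
    # Defaults match the documented listen address 127.0.0.1:3001.
    host = public_host or env.get("VOICE_HOST", "127.0.0.1")
    port = env.get("VOICE_PORT", "3001")
    if host == "0.0.0.0":
        # 0.0.0.0 is a bind address, not an address a client can connect to.
        raise ValueError("pass public_host when VOICE_HOST is 0.0.0.0")
    return f"{scheme}://{host}:{port}/ws/voice"
```

With no overrides, `voice_ws_url({})` gives `ws://127.0.0.1:3001/ws/voice`, matching the default shown in the console. When the service binds to `0.0.0.0`, the caller supplies the hostname or IP address that clients actually use.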
For usage flow, see Voice Interaction. For protocol messages, see API Reference.