Voice Configuration

Device Agent uses the voice channel for speech recognition (ASR) and text-to-speech (TTS). Voice requests enter the selected device session; device online state, command execution, and reporting still use MQTT or the device SDK.

Console Configuration

Open http://127.0.0.1:3000, go to Settings → Voice, and configure:

| Field | Notes |
| --- | --- |
| Voice enabled | Controls whether the /ws/voice channel is available. |
| Voice connection URL | Copy this value for browser, miniapp, or device clients. The default is ws://127.0.0.1:3001/ws/voice. |
| Speech provider | Select volcengine, aliyun, aws, or elevenlabs. |
| Region | Supports cn, us, eu, and global; this affects eligible providers. |
| Speech recognition | ASR model, language, or resource ID. Fields vary by provider. |
| Text-to-speech | TTS model, voice, and sample rate. Fields vary by provider. |
| Provider credentials | API key, access key, AWS credentials, or equivalent provider secrets. |

Changes to the provider, model, voice, or credentials can be saved from the page. Changes to enablement, bind host, port, or TLS require a service restart.

Speech Providers

| Provider | Regions | Main settings |
| --- | --- | --- |
| Volcengine (volcengine) | cn, global | VOLCENGINE_SPEECH_APP_ID, VOLCENGINE_SPEECH_ACCESS_KEY, plus ASR/TTS resource IDs, language, voice, and sample rate. |
| Aliyun DashScope (aliyun) | cn, global | ALIYUN_DASHSCOPE_API_KEY or QWEN_API_KEY, plus ASR model, TTS model, voice, and sample rate. |
| AWS (aws) | us, eu, global | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION, plus Transcribe language and Polly voice. |
| ElevenLabs (elevenlabs) | us, eu, global | ELEVENLABS_API_KEY, API endpoint, ASR model, TTS model, voice, and sample rate. |

VOICE_REGION restricts which providers are eligible: cn allows Volcengine or Aliyun; us allows AWS or ElevenLabs; eu requires an AWS eu-* region or the ElevenLabs EU residency endpoint; global does not restrict providers.
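
The region rules can be checked before a deployment with a small script. The following is a minimal sketch, not part of Device Agent: the names PROVIDER_REGIONS, eligible_providers, and region_problem are hypothetical, and the mapping is copied from the Speech Providers table.

```python
from typing import List, Optional

# Regions each provider supports, copied from the Speech Providers table.
PROVIDER_REGIONS = {
    "volcengine": {"cn", "global"},
    "aliyun": {"cn", "global"},
    "aws": {"us", "eu", "global"},
    "elevenlabs": {"us", "eu", "global"},
}


def eligible_providers(region: str) -> List[str]:
    """Return the providers allowed for a VOICE_REGION value, sorted by name."""
    return sorted(p for p, regions in PROVIDER_REGIONS.items() if region in regions)


def region_problem(region: str, provider: str, aws_region: str = "") -> Optional[str]:
    """Describe a region/provider mismatch, or return None if the pair looks valid."""
    if provider not in PROVIDER_REGIONS:
        return f"unknown provider: {provider}"
    if provider not in eligible_providers(region):
        return f"{provider} is not eligible for VOICE_REGION={region}"
    # eu with AWS additionally requires an eu-* AWS_REGION.
    if region == "eu" and provider == "aws" and not aws_region.startswith("eu-"):
        return "VOICE_REGION=eu with AWS requires an eu-* AWS_REGION"
    return None
```

For example, region_problem("eu", "aws", "us-east-1") reports a mismatch, while region_problem("eu", "aws", "eu-west-1") returns None. The sketch does not check the ElevenLabs EU residency endpoint, because its value is not given here.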

.env Configuration

Use .env for first startup, container deployment, or environments without UI access. You can switch the provider later from Settings → Voice.

Aliyun DashScope:

bash
VOICE_ENABLED=true
VOICE_REGION=cn
ALIYUN_DASHSCOPE_API_KEY=sk-...
ALIYUN_ASR_MODEL=paraformer-realtime-v2
ALIYUN_TTS_MODEL=cosyvoice-v3-flash
ALIYUN_TTS_VOICE=longanyang

Volcengine:

bash
VOICE_ENABLED=true
VOICE_REGION=cn
VOLCENGINE_SPEECH_APP_ID=...
VOLCENGINE_SPEECH_ACCESS_KEY=...
VOLCENGINE_TTS_VOICE=zh_female_shuangkuaisisi_moon_bigtts

AWS:

bash
VOICE_ENABLED=true
VOICE_REGION=us
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
AWS_TRANSCRIBE_LANGUAGE_CODE=en-US
AWS_POLLY_VOICE=Joanna

ElevenLabs:

bash
VOICE_ENABLED=true
VOICE_REGION=us
ELEVENLABS_API_KEY=...
ELEVENLABS_TTS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_TTS_VOICE=UgBBYS2sOqTuMpoF3BR0
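
Before starting the service, it can help to confirm that the credentials for the intended provider are present in the environment. The sketch below is an illustration only: REQUIRED_VARS and missing_vars are hypothetical names, the variable lists come from the Speech Providers table, and treating every listed variable as mandatory is an assumption (the service may accept other credential sources).

```python
import os
from typing import List, Mapping

# Each inner tuple lists alternatives; at least one name in it must be set.
# Assumption: all of these are required by the service.
REQUIRED_VARS = {
    "volcengine": [("VOLCENGINE_SPEECH_APP_ID",), ("VOLCENGINE_SPEECH_ACCESS_KEY",)],
    "aliyun": [("ALIYUN_DASHSCOPE_API_KEY", "QWEN_API_KEY")],
    "aws": [("AWS_ACCESS_KEY_ID",), ("AWS_SECRET_ACCESS_KEY",), ("AWS_REGION",)],
    "elevenlabs": [("ELEVENLABS_API_KEY",)],
}


def missing_vars(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the unmet requirements for a provider, e.g. ['A or B']."""
    if provider not in REQUIRED_VARS:
        raise ValueError(f"unknown provider: {provider}")
    missing = []
    for alternatives in REQUIRED_VARS[provider]:
        # A variable counts as set only if it is non-empty after trimming.
        if not any(env.get(name, "").strip() for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

Calling missing_vars("aws") with only the two access-key variables set returns a list containing AWS_REGION. The provider is passed explicitly because the .env examples above do not show a provider-selection variable.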

Bind Address and TLS

The voice service binds to 127.0.0.1:3001 by default, so only clients on the same machine can connect. To accept connections from phones, miniapps, SDKs, or other LAN or server clients, bind to all interfaces and restart Device Agent:

bash
VOICE_HOST=0.0.0.0
VOICE_PORT=3001

After restarting, go to Settings → Voice and copy the Voice connection URL for clients.

Make sure the network and any firewall allow inbound connections to VOICE_PORT.
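
To confirm that a client machine can actually reach the port, a plain TCP check against the copied URL is often enough. This is a minimal sketch using only the Python standard library; voice_endpoint and can_connect are hypothetical helper names, and the fallback ports 80 and 443 are the standard WebSocket defaults, not values taken from Device Agent.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def voice_endpoint(url: str) -> Tuple[str, int]:
    """Split a ws:// or wss:// URL into (host, port)."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"expected a ws:// or wss:// URL, got {url!r}")
    if not parts.hostname:
        raise ValueError(f"no host in URL {url!r}")
    # Standard WebSocket defaults when the URL has no explicit port.
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    return parts.hostname, port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = voice_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run it from the client side, for example can_connect("ws://127.0.0.1:3001/ws/voice") on the server itself, or with the server's LAN address from another machine. Note that 0.0.0.0 is a bind address, not an address clients can connect to. The check only proves TCP reachability; it does not perform the WebSocket or TLS handshake.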

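For production use, or when the console is served over HTTPS, enable TLS. Browsers generally refuse insecure ws:// connections from pages loaded over HTTPS.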

bash
VOICE_TLS_ENABLED=true
VOICE_TLS_CERT_FILE=/path/to/cert.pem
VOICE_TLS_KEY_FILE=/path/to/key.pem
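
Because TLS changes only take effect after a restart, it can be worth checking the certificate and key files first. The sketch below uses Python's standard ssl module to try loading the pair; tls_pair_error is a hypothetical helper, not part of Device Agent, and it assumes PEM files with an unencrypted private key (an encrypted key would prompt for a passphrase).

```python
import ssl
from typing import Optional


def tls_pair_error(cert_file: str, key_file: str) -> Optional[str]:
    """Return None if the certificate and key load as a pair, else the error text."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    try:
        # Raises if a file is missing, unreadable, malformed, or if the
        # private key does not match the certificate.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    except (ssl.SSLError, OSError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return None
```

A result of None from tls_pair_error("/path/to/cert.pem", "/path/to/key.pem") means the files load together; it does not verify the certificate chain, the hostname, or the expiry date.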

For usage flow, see Voice Interaction. For protocol messages, see API Reference.