# Voice Configuration

Device Agent uses the voice channel for speech recognition (ASR) and text-to-speech (TTS). Voice
requests are routed into the selected device session. Device online state, command execution, and
reporting still go through MQTT or the device SDK, not the voice channel.

## Console Configuration

Open `http://127.0.0.1:3000`, go to **Settings → Voice**, and configure:

| Field | Notes |
| --- | --- |
| Voice enabled | Controls whether the `/ws/voice` channel is available. |
| Voice WebSocket URL | Read-only in the UI. The default is `ws://127.0.0.1:3001/ws/voice`. Browser and device clients must be able to reach it. |
| Speech provider | Select `volcengine`, `aliyun`, `aws`, or `elevenlabs`. |
| Region | One of `cn`, `us`, `eu`, or `global`. Determines which speech providers are eligible. |
| Speech recognition | ASR model, language, or resource ID. Fields vary by provider. |
| Text-to-speech | TTS model, voice, and sample rate. Fields vary by provider. |
| Provider credentials | API key, access key, AWS credentials, or equivalent provider secrets. |
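
Because browser and device clients must be able to reach the voice WebSocket URL, a quick TCP check from the client machine can rule out firewall or bind-address problems before debugging anything else. The sketch below is illustrative only: `voice_endpoint` and `can_reach` are hypothetical helper names, and the check confirms only that the port accepts connections, not that the WebSocket handshake or authentication succeeds.

```python
from __future__ import annotations

import socket
from urllib.parse import urlsplit

# Default URL taken from the table above; replace it with the value shown in the console.
DEFAULT_VOICE_URL = "ws://127.0.0.1:3001/ws/voice"


def voice_endpoint(url: str = DEFAULT_VOICE_URL) -> tuple[str, int]:
    """Extract (host, port) from a ws:// or wss:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {url!r}")
    if parts.hostname is None:
        raise ValueError(f"URL has no host: {url!r}")
    # Fall back to the standard WebSocket ports when none is given.
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    return parts.hostname, port


def can_reach(url: str = DEFAULT_VOICE_URL, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = voice_endpoint(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `can_reach()` on the machine that runs the client returns `False` when the service is stopped, bound to a different interface, or blocked by a firewall.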

Changes to the provider, model, voice, or credentials can be saved from the page. Changes to
enablement, bind host, port, or TLS require a service restart.

## Speech Providers

| Provider | Regions | Main settings |
| --- | --- | --- |
| Volcengine (`volcengine`) | `cn`, `global` | `VOLCENGINE_SPEECH_APP_ID`, `VOLCENGINE_SPEECH_ACCESS_KEY`, plus ASR/TTS resource IDs, language, voice, and sample rate. |
| Aliyun DashScope (`aliyun`) | `cn`, `global` | `ALIYUN_DASHSCOPE_API_KEY` or `QWEN_API_KEY`, plus ASR model, TTS model, voice, and sample rate. |
| AWS (`aws`) | `us`, `eu`, `global` | `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION`, plus Transcribe language and Polly voice. |
| ElevenLabs (`elevenlabs`) | `us`, `eu`, `global` | `ELEVENLABS_API_KEY`, API endpoint, ASR model, TTS model, voice, and sample rate. |

`VOICE_REGION` restricts the provider choice: `cn` allows Volcengine or Aliyun; `us` allows AWS or
ElevenLabs; `eu` requires an AWS `eu-*` region or the ElevenLabs EU residency endpoint; `global`
does not restrict providers.
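
The region rules can be expressed as a small lookup, which is handy for validating a deployment before startup. This is a minimal sketch that mirrors the table above, not the service's own validation code; `PROVIDER_REGIONS`, `eligible_providers`, and `aws_region_allowed` are hypothetical names. The ElevenLabs EU residency endpoint is not checked because its value is not given here.

```python
from __future__ import annotations

# Provider -> regions, copied from the Speech Providers table.
PROVIDER_REGIONS = {
    "volcengine": {"cn", "global"},
    "aliyun": {"cn", "global"},
    "aws": {"us", "eu", "global"},
    "elevenlabs": {"us", "eu", "global"},
}


def eligible_providers(voice_region: str) -> list[str]:
    """Return the provider IDs allowed for a VOICE_REGION value, sorted by name."""
    region = voice_region.strip().lower()
    known = set().union(*PROVIDER_REGIONS.values())
    if region not in known:
        raise ValueError(f"unknown VOICE_REGION: {voice_region!r}")
    return sorted(p for p, regions in PROVIDER_REGIONS.items() if region in regions)


def aws_region_allowed(voice_region: str, aws_region: str) -> bool:
    """Apply the one AWS rule stated above: VOICE_REGION=eu needs an eu-* AWS region."""
    if voice_region.strip().lower() == "eu":
        return aws_region.strip().lower().startswith("eu-")
    # No further AWS_REGION constraint is documented for other regions.
    return True
```

For example, `eligible_providers("cn")` returns `["aliyun", "volcengine"]`, and `aws_region_allowed("eu", "us-east-1")` returns `False`.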

## `.env` Configuration

Use `.env` for first startup, container deployment, or environments without UI access. You can
switch the provider later from **Settings → Voice**.

Aliyun DashScope:

```bash
VOICE_ENABLED=true
VOICE_REGION=cn
ALIYUN_DASHSCOPE_API_KEY=sk-...
ALIYUN_ASR_MODEL=paraformer-realtime-v2
ALIYUN_TTS_MODEL=cosyvoice-v3-flash
ALIYUN_TTS_VOICE=longanyang
```

Volcengine:

```bash
VOICE_ENABLED=true
VOICE_REGION=cn
VOLCENGINE_SPEECH_APP_ID=...
VOLCENGINE_SPEECH_ACCESS_KEY=...
VOLCENGINE_TTS_VOICE=zh_female_shuangkuaisisi_moon_bigtts
```

AWS:

```bash
VOICE_ENABLED=true
VOICE_REGION=us
AWS_ACCESS_KEY_ID=...
AWS_SECRET_ACCESS_KEY=...
AWS_REGION=us-east-1
AWS_TRANSCRIBE_LANGUAGE_CODE=en-US
AWS_POLLY_VOICE=Joanna
```

ElevenLabs:

```bash
VOICE_ENABLED=true
VOICE_REGION=us
ELEVENLABS_API_KEY=...
ELEVENLABS_TTS_MODEL_ID=eleven_multilingual_v2
ELEVENLABS_TTS_VOICE=UgBBYS2sOqTuMpoF3BR0
```
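
When the `.env` file is the only configuration path, a missing credential is easy to overlook. The following sketch parses simple `KEY=value` lines and reports which credential variables are absent for a given provider. It assumes the variables listed under Speech Providers are all mandatory, which may not hold if the service also accepts credentials from other sources; `REQUIRED_KEYS`, `parse_env`, and `missing_keys` are hypothetical names, and the parser handles only plain assignments, not the full dotenv syntax.

```python
from __future__ import annotations

# Credential variables per provider, taken from the Speech Providers table.
# Each tuple lists alternatives: any one non-empty variable satisfies the entry.
REQUIRED_KEYS = {
    "volcengine": [("VOLCENGINE_SPEECH_APP_ID",), ("VOLCENGINE_SPEECH_ACCESS_KEY",)],
    "aliyun": [("ALIYUN_DASHSCOPE_API_KEY", "QWEN_API_KEY")],
    "aws": [("AWS_ACCESS_KEY_ID",), ("AWS_SECRET_ACCESS_KEY",), ("AWS_REGION",)],
    "elevenlabs": [("ELEVENLABS_API_KEY",)],
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(provider: str, env: dict[str, str]) -> list[str]:
    """List the credential entries that have no non-empty value in env."""
    missing = []
    for alternatives in REQUIRED_KEYS[provider]:
        if not any(env.get(name) for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

A typical use is `missing_keys("aliyun", parse_env(open(".env").read()))`, which returns an empty list when either `ALIYUN_DASHSCOPE_API_KEY` or `QWEN_API_KEY` is set.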

## Bind Address and TLS

The voice service listens on `127.0.0.1:3001` by default, which accepts local connections only. For
LAN or server access, set:

```bash
VOICE_HOST=0.0.0.0
VOICE_PORT=3001
```

For production use or HTTPS console access, enable TLS:

```bash
VOICE_TLS_ENABLED=true
VOICE_TLS_CERT_FILE=/path/to/cert.pem
VOICE_TLS_KEY_FILE=/path/to/key.pem
```

After configuration, the console shows the voice WebSocket URL that clients should use.
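
To sanity-check the URL that clients are given, it helps to know how the pieces combine. The sketch below assumes the URL is composed of the scheme (`wss` when TLS is enabled, `ws` otherwise), a host that clients can reach, the configured port, and the `/ws/voice` path; the console's actual derivation may differ, and `voice_ws_url` is a hypothetical helper. Browsers refuse `ws://` connections from pages served over HTTPS, which is why TLS is needed for HTTPS console access.

```python
def voice_ws_url(public_host: str, port: int = 3001, tls_enabled: bool = False) -> str:
    """Compose a voice WebSocket URL from a reachable host, port, and TLS flag."""
    # 0.0.0.0 and :: are wildcard bind addresses, not addresses a client can dial.
    if public_host in ("", "0.0.0.0", "::"):
        raise ValueError("use a host name or IP that clients can reach")
    scheme = "wss" if tls_enabled else "ws"
    # IPv6 literals must be bracketed inside a URL.
    host = f"[{public_host}]" if ":" in public_host else public_host
    return f"{scheme}://{host}:{port}/ws/voice"
```

With the defaults, `voice_ws_url("127.0.0.1")` yields `ws://127.0.0.1:3001/ws/voice`, matching the default shown in the console.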

For usage flow, see [Voice Interaction](../../usage/voice.md). For protocol messages, see
[API Reference](../api.md#websocket).
