# API Reference

Interfaces are grouped by protocol: MQTT for device access; HTTP for chat, queries, and control; and WebSocket for realtime voice. In the examples, `productId` is the device agent ID, `deviceId` is the ID of the real device, and `products` in HTTP paths refers to device agents. The local examples use these default addresses:

```txt
HTTP: http://127.0.0.1:3000
Voice WebSocket: ws://127.0.0.1:3001/ws/voice
Voice HTTP: http://127.0.0.1:3001/api/chat, /api/vision/frames
```

## Choose a Protocol

| Scenario | Recommended Protocol | Notes |
| --- | --- | --- |
| Real devices stay online, report state, and receive commands | MQTT | Fits device-side connections, state synchronization, and command responses. |
| Business systems, console extensions, or automation scripts call Device Agent | HTTP | Fits one-shot requests, queries, and command dispatch. |
| Realtime voice interaction | WebSocket | Fits continuous audio input, realtime ASR results, and TTS output. |
| Browser or device clients connect to an MQTT broker over WebSocket | MQTT over WebSocket | This is an MQTT transport mode, not the Device Agent voice WebSocket. |

## MQTT

MQTT is used for device-side access. Devices use MQTT to come online, report state, receive commands, return command results, and publish events. The broker URL, credentials, and topic templates are taken from the console configuration.

| Direction | Topic | Purpose |
| --- | --- | --- |
| MQTT client -> Device Agent | `device-agent/{productId}/in` | Send a text request to a device agent. |
| Device Agent -> MQTT client | `device-agent/{productId}/out` | Return a device agent reply. |
| MQTT client -> Device Agent | `device-agent/{productId}/device/{deviceId}/in` | Send a text request with device context. |
| Device Agent -> MQTT client | `device-agent/{productId}/device/{deviceId}/out` | Return a reply with device context. |
| Device Agent -> device | `device-agent/{productId}/device/{deviceId}/commands` | Send device commands. |
| Device -> Device Agent | `device-agent/{productId}/device/{deviceId}/responses` | Return command results. |
| Device -> Device Agent | `v1/{productId}/{deviceId}/telemetry` | Report online status and current state. |
| Device -> Device Agent | `v1/{productId}/{deviceId}/event` | Report device events. |
| Device -> Device Agent | `device-agent/{productId}/device/{deviceId}/ntp/request` | Request time synchronization. |
| Device Agent -> device | `device-agent/{productId}/device/{deviceId}/ntp/response` | Return time synchronization data. |
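The topic patterns in the table can be produced by a few small helpers. The sketch below assumes the default templates shown above; since topic templates follow the console configuration, a deployment may use different ones. The function names and constants are hypothetical, chosen for illustration only.

```python
"""Build the default MQTT topic names listed in the table above."""

# Assumption: these prefixes match the default templates; the console
# configuration may override them.
AGENT_PREFIX = "device-agent"
REPORT_PREFIX = "v1"


def agent_topic(product_id: str, direction: str) -> str:
    """Agent-level text topic. direction is "in" (request) or "out" (reply)."""
    if direction not in ("in", "out"):
        raise ValueError(f"unknown direction: {direction!r}")
    return f"{AGENT_PREFIX}/{product_id}/{direction}"


def device_topic(product_id: str, device_id: str, suffix: str) -> str:
    """Device-scoped topic, for example suffix "commands" or "ntp/request"."""
    return f"{AGENT_PREFIX}/{product_id}/device/{device_id}/{suffix}"


def report_topic(product_id: str, device_id: str, kind: str) -> str:
    """Device report topic. kind is "telemetry" or "event"."""
    if kind not in ("telemetry", "event"):
        raise ValueError(f"unknown report kind: {kind!r}")
    return f"{REPORT_PREFIX}/{product_id}/{device_id}/{kind}"
```

For example, `device_topic("thermostat", "thermostat-001", "responses")` yields the topic on which the device returns command results.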

Text request payload:

```json
{
  "prompt": "Check the current temperature",
  "sessionId": "session-default:thermostat:thermostat-001",
  "metadata": {
    "source": "mqtt-client"
  }
}
```

Device agent reply payload:

```json
{
  "sessionId": "session-default:thermostat:thermostat-001",
  "text": "The current temperature is 28 degrees.",
  "metadata": {
    "timestamp": "2026-05-11T10:00:00.000Z"
  },
  "timestamp": "2026-05-11T10:00:00.000Z"
}
```
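As a minimal sketch, a client could serialize the request and read the reply as follows. Only the fields shown in the two payloads above are used; the function names are hypothetical, and publishing or subscribing is left to whichever MQTT client library is in use.

```python
import json
from typing import Tuple


def build_text_request(prompt: str, session_id: str, source: str = "mqtt-client") -> bytes:
    """Serialize a text request in the shape shown above."""
    body = {
        "prompt": prompt,
        "sessionId": session_id,
        "metadata": {"source": source},
    }
    return json.dumps(body).encode("utf-8")


def parse_reply(payload: bytes) -> Tuple[str, str]:
    """Return (sessionId, text) from a device agent reply payload."""
    reply = json.loads(payload)
    return reply["sessionId"], reply["text"]
```

Comparing the returned `sessionId` with the one sent in the request is one way to match a reply to its request when several sessions share a client.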

Device online status:

```json
{
  "type": "status",
  "data": {
    "status": "online",
    "state": {
      "temperature": 28,
      "humidity": 62,
      "mode": "auto"
    }
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}
```

Device command:

```json
{
  "cmd": "set_target_temperature",
  "params": {
    "target_temperature": 24
  },
  "requestId": "req-001",
  "ts": 1710000010000
}
```

Command response:

```json
{
  "code": 0,
  "msg": "ok",
  "requestId": "req-001",
  "data": {
    "target_temperature": 24
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}
```
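On the device side, handling a command comes down to parsing the payload, running the matching handler, and echoing `requestId` in the response. The sketch below illustrates this under two assumptions that this reference does not confirm: a nonzero `code` signals failure, and `metadata` is optional. The `handle_command` function and the handler table are hypothetical.

```python
import json
from typing import Callable, Dict, Optional


def handle_command(
    raw: bytes,
    handlers: Dict[str, Callable[[dict], dict]],
    metadata: Optional[dict] = None,
) -> bytes:
    """Run the handler for one command payload and build the response payload."""
    command = json.loads(raw)
    # Echo requestId so the sender can correlate the response with the command.
    response = {"requestId": command.get("requestId")}
    handler = handlers.get(command.get("cmd"))
    if handler is None:
        # Assumption: a nonzero code means failure. The reference only shows
        # code 0 with msg "ok".
        response.update(code=1, msg="unknown command")
    else:
        response.update(code=0, msg="ok", data=handler(command.get("params", {})))
    if metadata:
        response["metadata"] = metadata
    return json.dumps(response).encode("utf-8")
```

The returned bytes would be published to the `responses` topic for the same `productId` and `deviceId`.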

Device event:

```json
{
  "type": "event",
  "data": {
    "event": "temperature_alert",
    "temperature": 38.5,
    "level": "warning"
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}
```

Time synchronization request and response:

```json
{
  "deviceSendTime": 1710000010000
}
```

```json
{
  "deviceSendTime": 1710000010000,
  "serverRecvTime": 1710000010100,
  "serverSendTime": 1710000010105
}
```
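This reference does not state how a device should apply these timestamps. One common approach is the standard NTP estimate, sketched below. It assumes the device also records a fourth timestamp, the local time at which the response arrived (`device_recv`, a hypothetical name that is not part of the payload), and that all values are milliseconds.

```python
from typing import Tuple


def estimate_clock_offset(
    device_send: int,
    server_recv: int,
    server_send: int,
    device_recv: int,
) -> Tuple[float, int]:
    """Return (offset_ms, round_trip_ms) using the standard NTP formulas.

    offset_ms is the estimated amount to add to the device clock to match
    the server clock. round_trip_ms is the network time, excluding the
    server's processing time.
    """
    round_trip = (device_recv - device_send) - (server_send - server_recv)
    offset = ((server_recv - device_send) + (server_send - device_recv)) / 2
    return offset, round_trip
```

With the payload values above and a device receive time of `1710000010010`, this gives an offset of 97.5 ms and a round trip of 5 ms. The estimate assumes the network delay is roughly symmetric in both directions.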

For more payload rules, validation details, and MQTTX examples, see [MQTT Access](../device-access/mqtt.md).

Commands sent through HTTP are delivered to the device through the MQTT command topic. The device returns the result through the MQTT response topic. Device events are also reported through MQTT and can then be queried through the HTTP events API.

## HTTP

All HTTP API paths start with `/api`. Public integration endpoints generally exchange JSON; the exception is `/api/chat`, which streams its response as Server-Sent Events.

`/api/chat` and `/api/vision/frames` are mounted on both the main HTTP port and the voice service port. Business systems usually use `3000`; voice or camera clients that already use `3001` can call the same endpoints there.

### Chat and Vision

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/health` | Check whether the HTTP API is available. |
| `POST` | `/api/chat` | Start text chat. Requires `stream: true`. |
| `GET` | `/api/sessions/:sessionId/history` | Read session history. |
| `POST` | `/api/sessions/:sessionId/interrupt` | Interrupt a session. |
| `DELETE` | `/api/sessions/:sessionId` | Clear a session. |
| `POST` | `/api/vision/frames` | Upload a vision frame for later chat use. |

Chat example:

```bash
$ curl -N http://127.0.0.1:3000/api/chat \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  -d '{
    "message": "Check the current temperature and set the target temperature to 24",
    "stream": true,
    "sessionId": "demo-session",
    "metadata": {
      "productId": "thermostat",
      "deviceId": "thermostat-001"
    }
  }'
```
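Because `/api/chat` responds with Server-Sent Events, a client has to split the stream into events before it can use the data. The sketch below covers only the generic SSE framing (`event:` and `data:` fields, comment lines, and blank-line dispatch). The event names and the JSON schema of each `data` field are not described in this reference, so the parser returns them as raw strings. `iter_sse_events` is a hypothetical helper name.

```python
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group decoded SSE lines into events with "event" and "data" keys."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event. Events without data are dropped.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
            continue
        if line.startswith(":"):
            continue  # Comment line, often used as a keep-alive.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips one leading space after the colon.
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
```

The function accepts any iterable of decoded text lines, so it can be fed from whichever HTTP client reads the streaming response.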

Upload a vision frame with `/api/vision/frames`, then pass the returned `frameId` and `capturedAt` to `/api/chat` as `visionRefs`. `mimeType` supports `image/jpeg`, `image/png`, and `image/webp`.

```bash
$ curl http://127.0.0.1:3000/api/vision/frames \
  -H 'Content-Type: application/json' \
  -d '{
    "sessionId": "demo-session",
    "deviceId": "thermostat-001",
    "mimeType": "image/png",
    "imageBase64": "<base64>",
    "source": "camera"
  }'
```

A successful upload returns:

```json
{
  "frameId": "frame-001",
  "capturedAt": "2026-05-11T10:00:00.000Z",
  "source": "camera",
  "mimeType": "image/png"
}
```

Pass the vision frame to chat:

```json
{
  "message": "Use this image to check whether the device screen looks abnormal",
  "stream": true,
  "sessionId": "demo-session",
  "visionRefs": [
    {
      "frameId": "frame-001",
      "capturedAt": "2026-05-11T10:00:00.000Z",
      "source": "camera"
    }
  ]
}
```
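The upload-then-chat sequence can be sketched as two request-body builders. The helper names are hypothetical; the field names and the supported MIME types come from the examples above. Sending the bodies over HTTP is left out.

```python
import base64

SUPPORTED_MIME_TYPES = ("image/jpeg", "image/png", "image/webp")


def build_frame_upload(
    image: bytes, mime_type: str, session_id: str, device_id: str, source: str = "camera"
) -> dict:
    """Build the JSON body for POST /api/vision/frames."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported mimeType: {mime_type!r}")
    return {
        "sessionId": session_id,
        "deviceId": device_id,
        "mimeType": mime_type,
        "imageBase64": base64.b64encode(image).decode("ascii"),
        "source": source,
    }


def build_vision_chat_body(message: str, session_id: str, upload_response: dict) -> dict:
    """Build the /api/chat body that references an uploaded frame."""
    # Copy only the reference fields shown in the chat example above.
    ref = {key: upload_response[key] for key in ("frameId", "capturedAt", "source")}
    return {
        "message": message,
        "stream": True,
        "sessionId": session_id,
        "visionRefs": [ref],
    }
```

Checking `mimeType` on the client avoids uploading a frame in a format the endpoint does not accept.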

### Devices, Commands, and Events

| Method | Path | Purpose |
| --- | --- | --- |
| `GET` | `/api/products/:productId/devices` | List devices under a device agent. |
| `GET` | `/api/devices/:deviceId` | Get device details. |
| `POST` | `/api/devices/:deviceId/commands` | Send a command to an online device. |
| `GET` | `/api/devices/:deviceId/events` | Read device events. |

The common call order is:

1. Use `GET /api/products/:productId/devices` to list devices and find the `deviceId`.
2. Use `GET /api/devices/:deviceId` to check whether the device is online and read current state.
3. Use `POST /api/devices/:deviceId/commands` to send a command defined in the device specification.
4. Use `GET /api/devices/:deviceId/events` to read recent device-reported events.

List devices:

```bash
$ curl 'http://127.0.0.1:3000/api/products/thermostat/devices?status=online'
```

Get device details:

```bash
$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001'
```

Command example:

```bash
$ curl http://127.0.0.1:3000/api/devices/thermostat-001/commands \
  -H 'Content-Type: application/json' \
  -d '{
    "command": "set_target_temperature",
    "params": {
      "target_temperature": 24
    },
    "timeoutMs": 30000
  }'
```

Read device events:

```bash
$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001/events?limit=50'
```
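The same four calls can be scripted with the Python standard library. This is a sketch under stated assumptions: the helper names are hypothetical, the base URL is the local default used in this document, and response bodies are returned as parsed JSON without assuming any field names, because response schemas are not listed here.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:3000"  # Local default from this document.


def list_devices_request(product_id: str, status: str = "") -> urllib.request.Request:
    """Step 1: list devices under a device agent, optionally filtered by status."""
    query = "?" + urllib.parse.urlencode({"status": status}) if status else ""
    path = urllib.parse.quote(product_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/products/{path}/devices{query}")


def device_request(device_id: str) -> urllib.request.Request:
    """Step 2: get device details."""
    path = urllib.parse.quote(device_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/devices/{path}")


def command_request(
    device_id: str, command: str, params: dict, timeout_ms: int = 30000
) -> urllib.request.Request:
    """Step 3: send a command to an online device."""
    body = json.dumps({"command": command, "params": params, "timeoutMs": timeout_ms})
    path = urllib.parse.quote(device_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/api/devices/{path}/commands",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def events_request(device_id: str, limit: int = 50) -> urllib.request.Request:
    """Step 4: read recent device events."""
    path = urllib.parse.quote(device_id, safe="")
    return urllib.request.Request(f"{BASE_URL}/api/devices/{path}/events?limit={limit}")


def send(request: urllib.request.Request):
    """Send a prepared request and return the parsed JSON body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A script would call, for example, `send(list_devices_request("thermostat", "online"))` and then pass a chosen `deviceId` to the remaining helpers. Building the request separately from sending it keeps the URL and body construction easy to inspect.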

Console features such as creating, updating, and publishing device agents, generating SDKs, changing configuration, reading logs, and managing skills or tools use internal product UI APIs. They are not covered in this public API reference.

## WebSocket

WebSocket is currently used mainly by the voice channel. Default URL:

```txt
ws://127.0.0.1:3001/ws/voice
```

The voice service port also handles `POST /api/chat` and `POST /api/vision/frames`, so voice and camera clients can submit text requests and vision frames through the same service address.

The connection can include these headers:

| Header | Notes |
| --- | --- |
| `Protocol-Version` | Protocol version. Current value: `3`. |
| `Device-Id` | Current device ID. |
| `Client-Id` | Client ID. Defaults to device ID when omitted. |

After connecting, send `hello`:

```json
{
  "type": "hello",
  "version": 3,
  "audio_params": {
    "format": "pcm",
    "sample_rate": 16000,
    "channels": 1
  },
  "sessionId": "demo-session",
  "productId": "thermostat",
  "deviceId": "thermostat-001",
  "provider": "aliyun"
}
```

A voice turn usually follows this message flow:

| Direction | Message | Notes |
| --- | --- | --- |
| Client -> Device Agent | `hello` | Start a voice session with audio parameters, device context, and speech provider. |
| Device Agent -> client | `hello` | Return session ID and server audio output parameters. |
| Client -> Device Agent | `listen` | Start recording. |
| Client -> Device Agent | Binary audio frames | Send voice data. |
| Device Agent -> client | `asr` | Return realtime or final recognized text. |
| Client -> Device Agent | `stop` | Stop recording. Can include `visionRefs`. |
| Device Agent -> client | `agent_reply` | Return device agent text reply. |
| Device Agent -> client | `tts` | Return text prepared for speech synthesis. |
| Device Agent -> client | TTS binary audio frames | Return synthesized audio for the client to play. |
| Device Agent -> client | `tts_complete` | Signal that the current voice turn is complete. |
| Client -> Device Agent | `abort` | Interrupt the current turn. |
| Client -> Device Agent | `goodbye` | Close the voice session. |
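The client-side messages in this flow can be sketched as small builders. Only the `hello` shape is documented above. The sketch assumes that the other control messages (`listen`, `stop`, `abort`, `goodbye`) are JSON objects carrying a `type` field in the same way, which this reference does not confirm. The helper names are hypothetical.

```python
import json
from typing import Dict, Optional

CONTROL_TYPES = ("listen", "stop", "abort", "goodbye")


def connection_headers(device_id: str, client_id: Optional[str] = None) -> Dict[str, str]:
    """Headers for the WebSocket connection request."""
    return {
        "Protocol-Version": "3",
        "Device-Id": device_id,
        # Mirrors the documented default: Client-Id falls back to the device ID.
        "Client-Id": client_id or device_id,
    }


def hello_message(
    session_id: str, product_id: str, device_id: str, provider: str = "aliyun"
) -> str:
    """Build the hello message shown earlier in this section."""
    return json.dumps({
        "type": "hello",
        "version": 3,
        "audio_params": {"format": "pcm", "sample_rate": 16000, "channels": 1},
        "sessionId": session_id,
        "productId": product_id,
        "deviceId": device_id,
        "provider": provider,
    })


def control_message(kind: str, **fields) -> str:
    """Build a control message. Assumption: a JSON object with a "type" field."""
    if kind not in CONTROL_TYPES:
        raise ValueError(f"unknown control message: {kind!r}")
    return json.dumps({"type": kind, **fields})
```

For example, `control_message("stop", visionRefs=[...])` attaches vision references when recording stops. Audio itself travels as binary frames between `listen` and `stop`, not as JSON.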

See [Voice Interaction](../usage/voice.md) for voice configuration and usage.

Note that `mqtt.wsUrl` is the MQTT broker WebSocket URL for carrying MQTT over WebSocket. `/ws/voice` is the Device Agent voice channel. They are different interfaces.
