API Reference

Interfaces are grouped by protocol: MQTT for device access; HTTP for chat, queries, and control; and WebSocket for realtime voice. In the local examples, productId is the device agent ID and deviceId is the ID of the real device. The products segment in HTTP paths refers to device agents.

txt

HTTP: http://127.0.0.1:3000
Voice WebSocket: ws://127.0.0.1:3001/ws/voice
Voice HTTP: http://127.0.0.1:3001/api/chat, /api/vision/frames

Choose a Protocol

Scenario	Recommended Protocol	Notes
Real devices stay online, report state, and receive commands	MQTT	Fits device-side connections, state synchronization, and command responses.
Business systems, console extensions, or automation scripts call Device Agent	HTTP	Fits one-shot requests, queries, and command dispatch.
Realtime voice interaction	WebSocket	Fits continuous audio input, realtime ASR results, and TTS output.
Browser or device clients connect to an MQTT broker over WebSocket	MQTT over WebSocket	This is an MQTT transport mode, not the Device Agent voice WebSocket.

MQTT

MQTT is used for device-side access. Devices use MQTT to come online, report state, receive commands, return command results, and publish events. Broker URL, credentials, and topic templates follow the console configuration.

Direction	Topic	Purpose
MQTT client -> Device Agent	`device-agent/{productId}/in`	Send a text request to a device agent.
Device Agent -> MQTT client	`device-agent/{productId}/out`	Return a device agent reply.
MQTT client -> Device Agent	`device-agent/{productId}/device/{deviceId}/in`	Send a text request with device context.
Device Agent -> MQTT client	`device-agent/{productId}/device/{deviceId}/out`	Return a reply with device context.
Device Agent -> device	`device-agent/{productId}/device/{deviceId}/commands`	Send device commands.
Device -> Device Agent	`device-agent/{productId}/device/{deviceId}/responses`	Return command results.
Device -> Device Agent	`v1/{productId}/{deviceId}/telemetry`	Report online status and current state.
Device -> Device Agent	`v1/{productId}/{deviceId}/event`	Report device events.
Device -> Device Agent	`device-agent/{productId}/device/{deviceId}/ntp/request`	Request time synchronization.
Device Agent -> device	`device-agent/{productId}/device/{deviceId}/ntp/response`	Return time synchronization data.
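
The topic table above can be expressed as a small helper. This is a minimal sketch that assumes the default templates shown in the table; the function name `device_agent_topics` and the dictionary keys are hypothetical, and the console configuration may define different templates.

```python
from typing import Dict, Optional


def device_agent_topics(product_id: str, device_id: Optional[str] = None) -> Dict[str, str]:
    """Return topic names for a device agent, optionally scoped to one device.

    The templates are copied from the table above; the dictionary keys are
    illustrative names, not part of the protocol.
    """
    base = f"device-agent/{product_id}"
    if device_id is None:
        # Agent-level text request and reply topics.
        return {"in": f"{base}/in", "out": f"{base}/out"}

    device_base = f"{base}/device/{device_id}"
    return {
        "in": f"{device_base}/in",
        "out": f"{device_base}/out",
        "commands": f"{device_base}/commands",
        "responses": f"{device_base}/responses",
        "ntp_request": f"{device_base}/ntp/request",
        "ntp_response": f"{device_base}/ntp/response",
        # Telemetry and events use the v1 prefix instead of device-agent.
        "telemetry": f"v1/{product_id}/{device_id}/telemetry",
        "event": f"v1/{product_id}/{device_id}/event",
    }


if __name__ == "__main__":
    for name, topic in device_agent_topics("thermostat", "thermostat-001").items():
        print(f"{name}: {topic}")
```

A device would typically subscribe to the commands and ntp_response topics and publish to the others.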

Text request payload:

json

{
  "prompt": "Check the current temperature",
  "sessionId": "session-default:thermostat:thermostat-001",
  "metadata": {
    "source": "mqtt-client"
  }
}

Device agent reply payload:

json

{
  "sessionId": "session-default:thermostat:thermostat-001",
  "text": "The current temperature is 28 degrees.",
  "metadata": {
    "timestamp": "2026-05-11T10:00:00.000Z"
  },
  "timestamp": "2026-05-11T10:00:00.000Z"
}

Device online status:

json

{
  "type": "status",
  "data": {
    "status": "online",
    "state": {
      "temperature": 28,
      "humidity": 62,
      "mode": "auto"
    }
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}

Device command:

json

{
  "cmd": "set_target_temperature",
  "params": {
    "target_temperature": 24
  },
  "requestId": "req-001",
  "ts": 1710000010000
}

Command response:

json

{
  "code": 0,
  "msg": "ok",
  "requestId": "req-001",
  "data": {
    "target_temperature": 24
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}
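
On the device side, the command and response payloads above pair up through requestId. The following sketch shows one way a device might handle a command and build the reply. The function names are hypothetical, and the non-zero code used for an unsupported command is an assumption; only code 0 with msg "ok" appears in the example above.

```python
import json
from typing import Any, Dict


def build_command_response(command: Dict[str, Any], result: Dict[str, Any],
                           product_id: str, code: int = 0, msg: str = "ok") -> str:
    """Serialize a command response that echoes the command's requestId."""
    response = {
        "code": code,
        "msg": msg,
        "requestId": command["requestId"],
        "data": result,
        # Metadata values mirror the example payload above.
        "metadata": {"productId": product_id, "source": "existing-device"},
    }
    return json.dumps(response)


def handle_command(raw: str, state: Dict[str, Any], product_id: str) -> str:
    """Apply one command to local device state and return the response JSON."""
    command = json.loads(raw)
    if command["cmd"] == "set_target_temperature":
        target = command["params"]["target_temperature"]
        state["target_temperature"] = target
        return build_command_response(command, {"target_temperature": target}, product_id)
    # Assumption: a non-zero code signals failure. The source only shows code 0.
    return build_command_response(command, {}, product_id, code=1, msg="unsupported command")
```

The returned string would be published to the responses topic for the same device.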

Device event:

json

{
  "type": "event",
  "data": {
    "event": "temperature_alert",
    "temperature": 38.5,
    "level": "warning"
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}

Time synchronization request and response:

json

{
  "deviceSendTime": 1710000010000
}

json

{
  "deviceSendTime": 1710000010000,
  "serverRecvTime": 1710000010100,
  "serverSendTime": 1710000010105
}
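
The three timestamps in the response are enough for an NTP-style offset estimate once the device records when the response arrived. The sketch below uses the standard NTP offset and round-trip formulas; it is an assumption that devices are expected to apply them, and `device_recv_time` is a hypothetical local reading that is not part of either payload. The values are treated as epoch milliseconds, which is what the examples appear to use.

```python
from typing import Dict, Tuple


def estimate_clock_offset(response: Dict[str, int], device_recv_time: int) -> Tuple[float, int]:
    """Return (offset, round_trip) in the same unit as the timestamps.

    offset is the amount to add to the device clock to approximate server time.
    device_recv_time is read from the device clock when the response arrives.
    """
    t1 = response["deviceSendTime"]   # device clock, request sent
    t2 = response["serverRecvTime"]   # server clock, request received
    t3 = response["serverSendTime"]   # server clock, response sent
    t4 = device_recv_time             # device clock, response received

    # Network time only: total elapsed minus server processing time.
    round_trip = (t4 - t1) - (t3 - t2)
    # Average of the two one-way clock differences (standard NTP formula).
    offset = ((t2 - t1) + (t3 - t4)) / 2
    return offset, round_trip


if __name__ == "__main__":
    reply = {
        "deviceSendTime": 1710000010000,
        "serverRecvTime": 1710000010100,
        "serverSendTime": 1710000010105,
    }
    offset, round_trip = estimate_clock_offset(reply, device_recv_time=1710000010025)
    print(offset, round_trip)
```

The estimate assumes the outbound and return network delays are roughly equal.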

For more payload rules, validation details, and MQTTX examples, see MQTT Access.

Commands sent through HTTP are delivered to the device through the MQTT command topic. The device returns the result through the MQTT response topic. Device events are also reported through MQTT and can then be queried through the HTTP events API.

HTTP

HTTP API paths start with /api. Public integration endpoints usually exchange JSON. The exception is /api/chat, which returns Server-Sent Events.

/api/chat and /api/vision/frames are mounted on both the main HTTP port and the voice service port. Business systems usually use 3000; voice or camera clients that already use 3001 can call the same endpoints there.

Chat and Vision

Method	Path	Purpose
`GET`	`/api/health`	Check whether the HTTP API is available.
`POST`	`/api/chat`	Start text chat. Requires `stream: true`.
`GET`	`/api/sessions/:sessionId/history`	Read session history.
`POST`	`/api/sessions/:sessionId/interrupt`	Interrupt a session.
`DELETE`	`/api/sessions/:sessionId`	Clear a session.
`POST`	`/api/vision/frames`	Upload a vision frame for later chat use.

Chat example:

bash

$ curl -N http://127.0.0.1:3000/api/chat \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  -d '{
    "message": "Check the current temperature and set the target temperature to 24",
    "stream": true,
    "sessionId": "demo-session",
    "metadata": {
      "productId": "thermostat",
      "deviceId": "thermostat-001"
    }
  }'
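
Because /api/chat responds with Server-Sent Events, a client has to read the response line by line rather than wait for a single JSON body. The sketch below parses the generic SSE wire format with the standard library only. The function names are hypothetical, and the event names and data payload shape emitted by /api/chat are not documented here, so the parser yields them unchanged.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield {"event": ..., "data": ...} dicts from Server-Sent Events lines."""
    event_name = "message"  # Default event type in the SSE format.
    data_parts = []
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data_parts:
                yield {"event": event_name, "data": "\n".join(data_parts)}
            event_name, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # Comment or keep-alive line.
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # One leading space after the colon is dropped.
        if field == "event":
            event_name = value
        elif field == "data":
            data_parts.append(value)


def stream_chat(base_url: str, body: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """POST a chat request and yield SSE events as they arrive."""
    request = urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_sse_events(line.decode("utf-8") for line in response)
```

For example, `stream_chat("http://127.0.0.1:3000", {"message": "Check the current temperature", "stream": True, "sessionId": "demo-session"})` would yield one dictionary per event while the local service is running.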

Upload a vision frame with /api/vision/frames, then pass the returned frameId and capturedAt to /api/chat as visionRefs. mimeType supports image/jpeg, image/png, and image/webp.

bash

$ curl http://127.0.0.1:3000/api/vision/frames \
  -H 'Content-Type: application/json' \
  -d '{
    "sessionId": "demo-session",
    "deviceId": "thermostat-001",
    "mimeType": "image/png",
    "imageBase64": "<base64>",
    "source": "camera"
  }'

A successful upload returns:

json

{
  "frameId": "frame-001",
  "capturedAt": "2026-05-11T10:00:00.000Z",
  "source": "camera",
  "mimeType": "image/png"
}

Pass the vision frame to chat:

json

{
  "message": "Use this image to check whether the device screen looks abnormal",
  "stream": true,
  "sessionId": "demo-session",
  "visionRefs": [
    {
      "frameId": "frame-001",
      "capturedAt": "2026-05-11T10:00:00.000Z",
      "source": "camera"
    }
  ]
}
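
The hand-off from the upload response to the chat request can be sketched as a pair of helpers. This is a minimal sketch: the function names are hypothetical, and the field selection simply follows the two example payloads above, which carry frameId, capturedAt, and source into visionRefs and leave out mimeType.

```python
from typing import Any, Dict, List

# Fields that the chat example above includes in each visionRefs entry.
VISION_REF_FIELDS = ("frameId", "capturedAt", "source")


def vision_ref_from_upload(upload: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce an upload response to the fields used in visionRefs."""
    return {key: upload[key] for key in VISION_REF_FIELDS}


def build_vision_chat_body(message: str, session_id: str,
                           uploads: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Build an /api/chat request body that references uploaded frames."""
    return {
        "message": message,
        "stream": True,  # /api/chat requires stream: true.
        "sessionId": session_id,
        "visionRefs": [vision_ref_from_upload(upload) for upload in uploads],
    }
```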

Devices, Commands, and Events

Method	Path	Purpose
`GET`	`/api/products/:productId/devices`	List devices under a device agent.
`GET`	`/api/devices/:deviceId`	Get device details.
`POST`	`/api/devices/:deviceId/commands`	Send a command to an online device.
`GET`	`/api/devices/:deviceId/events`	Read device events.

The common call order is:

1. Use GET /api/products/:productId/devices to list devices and find the deviceId.
2. Use GET /api/devices/:deviceId to check whether the device is online and read current state.
3. Use POST /api/devices/:deviceId/commands to send a command defined in the device specification.
4. Use GET /api/devices/:deviceId/events to read recent device-reported events.
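
The call order above can be sketched with the standard library. The helper names are hypothetical, and the base URL, query parameters, and timeoutMs value are taken from the curl examples in this section. The sketch only builds the requests, because the response schemas are not described here; `send` assumes a JSON response body.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List, Optional

BASE_URL = "http://127.0.0.1:3000"  # Local default from this reference.


def build_request(method: str, path: str, query: Optional[Dict[str, Any]] = None,
                  body: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    """Create a request for one API call without sending it."""
    url = BASE_URL + path
    if query:
        url += "?" + urllib.parse.urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if body is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def call_order(product_id: str, device_id: str, command: str,
               params: Dict[str, Any]) -> List[urllib.request.Request]:
    """Return the four requests in the usual order: list, inspect, command, events."""
    product = urllib.parse.quote(product_id, safe="")
    device = urllib.parse.quote(device_id, safe="")
    return [
        build_request("GET", f"/api/products/{product}/devices", query={"status": "online"}),
        build_request("GET", f"/api/devices/{device}"),
        build_request("POST", f"/api/devices/{device}/commands",
                      body={"command": command, "params": params, "timeoutMs": 30000}),
        build_request("GET", f"/api/devices/{device}/events", query={"limit": 50}),
    ]


def send(request: urllib.request.Request) -> Any:
    """Send one request and decode the body, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

In practice a script would inspect the result of each call before moving on, for example confirming that the device is online before sending the command.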

List devices:

bash

$ curl 'http://127.0.0.1:3000/api/products/thermostat/devices?status=online'

Get device details:

bash

$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001'

Command example:

bash

$ curl http://127.0.0.1:3000/api/devices/thermostat-001/commands \
  -H 'Content-Type: application/json' \
  -d '{
    "command": "set_target_temperature",
    "params": {
      "target_temperature": 24
    },
    "timeoutMs": 30000
  }'

Read device events:

bash

$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001/events?limit=50'

Console features such as creating, updating, and publishing device agents, generating SDKs, changing configuration, reading logs, and managing skills or tools use internal product UI APIs. They are not expanded in this public API reference.

WebSocket

WebSocket is currently used mainly by the voice channel. Default URL:

txt

ws://127.0.0.1:3001/ws/voice

The voice service port also handles POST /api/chat and POST /api/vision/frames, so voice and camera clients can submit text requests and vision frames through the same service address.

The connection can include these headers:

Header	Notes
`Protocol-Version`	Protocol version. Current value: `3`.
`Device-Id`	Current device ID.
`Client-Id`	Client ID. Defaults to device ID when omitted.

After connecting, send hello:

json

{
  "type": "hello",
  "version": 3,
  "audio_params": {
    "format": "pcm",
    "sample_rate": 16000,
    "channels": 1
  },
  "sessionId": "demo-session",
  "productId": "thermostat",
  "deviceId": "thermostat-001",
  "provider": "aliyun"
}
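
The connection headers and the hello message can be assembled as follows. This is a minimal sketch with hypothetical function names; it does not open a socket, since the choice of WebSocket client library is left to the integrator. Client-Id is omitted unless provided, relying on the documented default of the device ID.

```python
import json
from typing import Dict, Optional

PROTOCOL_VERSION = 3  # Current value according to the header table above.


def voice_headers(device_id: str, client_id: Optional[str] = None) -> Dict[str, str]:
    """Build the optional connection headers for /ws/voice."""
    headers = {"Protocol-Version": str(PROTOCOL_VERSION), "Device-Id": device_id}
    if client_id is not None:
        # When omitted, the service defaults Client-Id to the device ID.
        headers["Client-Id"] = client_id
    return headers


def build_hello(session_id: str, product_id: str, device_id: str, provider: str,
                sample_rate: int = 16000, channels: int = 1) -> str:
    """Serialize the hello message sent right after connecting."""
    return json.dumps({
        "type": "hello",
        "version": PROTOCOL_VERSION,
        # Audio defaults mirror the example above: 16 kHz mono PCM.
        "audio_params": {"format": "pcm", "sample_rate": sample_rate, "channels": channels},
        "sessionId": session_id,
        "productId": product_id,
        "deviceId": device_id,
        "provider": provider,
    })
```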

A voice turn usually follows this message flow:

Direction	Message	Notes
Client -> Device Agent	`hello`	Start a voice session with audio parameters, device context, and speech provider.
Device Agent -> client	`hello`	Return session ID and server audio output parameters.
Client -> Device Agent	`listen`	Start recording.
Client -> Device Agent	Binary audio frames	Send voice data.
Device Agent -> client	`asr`	Return realtime or final recognized text.
Client -> Device Agent	`stop`	Stop recording. Can include `visionRefs`.
Device Agent -> client	`agent_reply`	Return device agent text reply.
Device Agent -> client	`tts`	Return text prepared for speech synthesis.
Device Agent -> client	TTS binary audio frames	Play synthesized voice reply.
Device Agent -> client	`tts_complete`	Current voice turn is complete.
Client -> Device Agent	`abort`	Interrupt the current turn.
Client -> Device Agent	`goodbye`	Close the voice session.
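
On the client side, the flow above means every incoming frame is either a control message or audio. The sketch below assumes that control messages arrive as JSON text frames carrying a type field, as the hello example does, and that binary frames carry TTS audio. The function names are hypothetical, and no fields other than type are read because they are not documented here.

```python
import json
from typing import Any, Iterable, List, Tuple, Union

# Message types that the flow table lists as Device Agent -> client.
SERVER_MESSAGE_TYPES = {"hello", "asr", "agent_reply", "tts", "tts_complete"}


def classify_frame(frame: Union[str, bytes, bytearray]) -> Tuple[str, Any]:
    """Return ("audio", bytes) for binary frames, else (message type, parsed message)."""
    if isinstance(frame, (bytes, bytearray)):
        return "audio", bytes(frame)
    message = json.loads(frame)
    kind = message.get("type")
    if kind in SERVER_MESSAGE_TYPES:
        return kind, message
    return "unknown", message


def collect_turn(frames: Iterable[Union[str, bytes]]) -> Tuple[List[str], bytes]:
    """Consume frames until tts_complete; return seen message types and the audio."""
    seen: List[str] = []
    audio = bytearray()
    for frame in frames:
        kind, payload = classify_frame(frame)
        if kind == "audio":
            audio.extend(payload)  # Buffer synthesized speech for playback.
            continue
        seen.append(kind)
        if kind == "tts_complete":
            break  # The current voice turn is finished.
    return seen, bytes(audio)
```

A real client would usually play audio frames as they arrive instead of buffering the whole reply.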

See Voice Interaction for voice configuration and usage.

Note that mqtt.wsUrl is the MQTT broker WebSocket URL for carrying MQTT over WebSocket. /ws/voice is the Device Agent voice channel. They are different interfaces.
