API Reference

Interfaces are grouped by protocol: MQTT for device access; HTTP for chat, queries, and control; and WebSocket for realtime voice. In the local examples, productId is the device agent ID, deviceId is the ID of the real device, and the products segment in HTTP paths refers to device agents.

txt
HTTP: http://127.0.0.1:3000
Voice WebSocket: ws://127.0.0.1:3001/ws/voice
Voice HTTP: http://127.0.0.1:3001/api/chat, /api/vision/frames

Choose a Protocol

| Scenario | Recommended Protocol | Notes |
| --- | --- | --- |
| Real devices stay online, report state, and receive commands | MQTT | Fits device-side connections, state synchronization, and command responses. |
| Business systems, console extensions, or automation scripts call Device Agent | HTTP | Fits one-shot requests, queries, and command dispatch. |
| Realtime voice interaction | WebSocket | Fits continuous audio input, realtime ASR results, and TTS output. |
| Browser or device clients connect to an MQTT broker over WebSocket | MQTT over WebSocket | This is an MQTT transport mode, not the Device Agent voice WebSocket. |

MQTT

MQTT is used for device-side access. Devices use MQTT to come online, report state, receive commands, return command results, and publish events. Broker URL, credentials, and topic templates follow the console configuration.

| Direction | Topic | Purpose |
| --- | --- | --- |
| MQTT client -> Device Agent | device-agent/{productId}/in | Send a text request to a device agent. |
| Device Agent -> MQTT client | device-agent/{productId}/out | Return a device agent reply. |
| MQTT client -> Device Agent | device-agent/{productId}/device/{deviceId}/in | Send a text request with device context. |
| Device Agent -> MQTT client | device-agent/{productId}/device/{deviceId}/out | Return a reply with device context. |
| Device Agent -> device | device-agent/{productId}/device/{deviceId}/commands | Send device commands. |
| Device -> Device Agent | device-agent/{productId}/device/{deviceId}/responses | Return command results. |
| Device -> Device Agent | v1/{productId}/{deviceId}/telemetry | Report online status and current state. |
| Device -> Device Agent | v1/{productId}/{deviceId}/event | Report device events. |
| Device -> Device Agent | device-agent/{productId}/device/{deviceId}/ntp/request | Request time synchronization. |
| Device Agent -> device | device-agent/{productId}/device/{deviceId}/ntp/response | Return time synchronization data. |
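
The topic templates in the table can be filled in with a small helper. The following is a minimal sketch that assumes the default templates shown above. The function name `build_topics` and the dictionary keys are illustrative, and a deployment whose console configuration overrides the templates would need different strings.

```python
from typing import Dict, Optional


def build_topics(product_id: str, device_id: Optional[str] = None) -> Dict[str, str]:
    """Fill in the topic templates from the table above.

    Without a device_id, only the agent-level in/out topics are returned.
    The dictionary keys are hypothetical labels chosen for this sketch.
    """
    agent = f"device-agent/{product_id}"
    if device_id is None:
        return {"in": f"{agent}/in", "out": f"{agent}/out"}

    device = f"{agent}/device/{device_id}"
    return {
        "in": f"{device}/in",
        "out": f"{device}/out",
        "commands": f"{device}/commands",
        "responses": f"{device}/responses",
        # Telemetry and events use the v1/ prefix rather than device-agent/.
        "telemetry": f"v1/{product_id}/{device_id}/telemetry",
        "event": f"v1/{product_id}/{device_id}/event",
        "ntp_request": f"{device}/ntp/request",
        "ntp_response": f"{device}/ntp/response",
    }
```

For example, `build_topics("thermostat", "thermostat-001")["commands"]` yields the topic a device would subscribe to for incoming commands.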

Text request payload:

json
{
  "prompt": "Check the current temperature",
  "sessionId": "session-default:thermostat:thermostat-001",
  "metadata": {
    "source": "mqtt-client"
  }
}

Device agent reply payload:

json
{
  "sessionId": "session-default:thermostat:thermostat-001",
  "text": "The current temperature is 28 degrees.",
  "metadata": {
    "timestamp": "2026-05-11T10:00:00.000Z"
  },
  "timestamp": "2026-05-11T10:00:00.000Z"
}

Device online status:

json
{
  "type": "status",
  "data": {
    "status": "online",
    "state": {
      "temperature": 28,
      "humidity": 62,
      "mode": "auto"
    }
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}

Device command:

json
{
  "cmd": "set_target_temperature",
  "params": {
    "target_temperature": 24
  },
  "requestId": "req-001",
  "ts": 1710000010000
}

Command response:

json
{
  "code": 0,
  "msg": "ok",
  "requestId": "req-001",
  "data": {
    "target_temperature": 24
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}
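
On the device side, the command and response payloads pair up through requestId. The sketch below shows one way a device might turn a command message into a response message. The function name `handle_command`, the in-memory `state` dictionary, and the non-zero error code for unknown commands are assumptions made for illustration; only the success shape (`code: 0`, `msg: "ok"`) comes from the example above, and the optional metadata block is omitted.

```python
import json
from typing import Any, Dict


def handle_command(raw: str, state: Dict[str, Any]) -> str:
    """Apply one command payload to local state and build the response payload."""
    command = json.loads(raw)
    request_id = command.get("requestId")  # echoed back so the caller can match the result

    if command.get("cmd") == "set_target_temperature":
        target = command.get("params", {}).get("target_temperature")
        state["target_temperature"] = target
        reply = {
            "code": 0,
            "msg": "ok",
            "requestId": request_id,
            "data": {"target_temperature": target},
        }
    else:
        # Assumption: the source does not define error codes; 1 is a placeholder.
        reply = {
            "code": 1,
            "msg": "unsupported command",
            "requestId": request_id,
            "data": {},
        }
    return json.dumps(reply)
```

The returned string would be published to the device's responses topic.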

Device event:

json
{
  "type": "event",
  "data": {
    "event": "temperature_alert",
    "temperature": 38.5,
    "level": "warning"
  },
  "metadata": {
    "productId": "thermostat",
    "source": "existing-device"
  }
}

Time synchronization request:

json
{
  "deviceSendTime": 1710000010000
}

Time synchronization response:

json
{
  "deviceSendTime": 1710000010000,
  "serverRecvTime": 1710000010100,
  "serverSendTime": 1710000010105
}
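
The three timestamps in the response are enough for an NTP-style clock estimate once the device records its own receive time. The source does not say how a device should use these values, so the following is a sketch under that assumption: it applies the standard NTP offset and round-trip formulas, and `device_recv` is a local reading that is not part of either payload.

```python
from typing import Tuple


def estimate_clock(device_send: int, server_recv: int,
                   server_send: int, device_recv: int) -> Tuple[float, int]:
    """Return (offset_ms, round_trip_ms) using the standard NTP formulas.

    offset_ms is how far the server clock is ahead of the device clock.
    device_recv is the device's local time when the response arrived
    (an assumption of this sketch; it is not in the payload).
    """
    offset_ms = ((server_recv - device_send) + (server_send - device_recv)) / 2
    round_trip_ms = (device_recv - device_send) - (server_send - server_recv)
    return offset_ms, round_trip_ms


# Values from the payloads above, plus a hypothetical local receive time.
offset, delay = estimate_clock(1710000010000, 1710000010100,
                               1710000010105, 1710000010015)
print(offset, delay)  # 95.0 10
```

With these numbers, the device clock would be about 95 ms behind the server, with a 10 ms network round trip.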

For more payload rules, validation details, and MQTTX examples, see MQTT Access.

Commands sent through HTTP are delivered to the device through the MQTT command topic. The device returns the result through the MQTT response topic. Device events are also reported through MQTT and can then be queried through the HTTP events API.

HTTP

HTTP API paths start with /api. Public integration endpoints generally send and receive JSON. The exception is /api/chat, which takes a JSON request body but streams its response as Server-Sent Events.

/api/chat and /api/vision/frames are mounted on both the main HTTP port and the voice service port. Business systems usually use 3000; voice or camera clients that already use 3001 can call the same endpoints there.

Chat and Vision

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/health | Check whether the HTTP API is available. |
| POST | /api/chat | Start text chat. Requires stream: true. |
| GET | /api/sessions/:sessionId/history | Read session history. |
| POST | /api/sessions/:sessionId/interrupt | Interrupt a session. |
| DELETE | /api/sessions/:sessionId | Clear a session. |
| POST | /api/vision/frames | Upload a vision frame for later chat use. |

Chat example:

bash
$ curl -N http://127.0.0.1:3000/api/chat \
  -H 'Content-Type: application/json' \
  -H 'Accept: text/event-stream' \
  -d '{
    "message": "Check the current temperature and set the target temperature to 24",
    "stream": true,
    "sessionId": "demo-session",
    "metadata": {
      "productId": "thermostat",
      "deviceId": "thermostat-001"
    }
  }'
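
Because /api/chat streams Server-Sent Events, a script has to split the response into events before it can read them. The sketch below is a generic SSE line parser that follows the standard framing (field lines, `:` comments, blank line as event terminator). It makes no assumption about the event names or the JSON inside each `data` field, which this page does not specify; the function name `iter_sse_events` is illustrative.

```python
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Group decoded text lines of an SSE stream into events.

    Each yielded dict holds the fields of one event; multiple data lines
    are joined with newlines. Events without data are dropped.
    """
    fields: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                fields["data"] = "\n".join(data)
                yield fields
            fields, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data.append(value)
        else:
            fields[name] = value
```

When reading from an HTTP response object, each line would first be decoded from bytes as UTF-8 before being passed in.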

Upload a vision frame with /api/vision/frames, then pass the returned frameId and capturedAt to /api/chat as visionRefs. mimeType supports image/jpeg, image/png, and image/webp.

bash
$ curl http://127.0.0.1:3000/api/vision/frames \
  -H 'Content-Type: application/json' \
  -d '{
    "sessionId": "demo-session",
    "deviceId": "thermostat-001",
    "mimeType": "image/png",
    "imageBase64": "<base64>",
    "source": "camera"
  }'

A successful upload returns:

json
{
  "frameId": "frame-001",
  "capturedAt": "2026-05-11T10:00:00.000Z",
  "source": "camera",
  "mimeType": "image/png"
}

Pass the vision frame to chat:

json
{
  "message": "Use this image to check whether the device screen looks abnormal",
  "stream": true,
  "sessionId": "demo-session",
  "visionRefs": [
    {
      "frameId": "frame-001",
      "capturedAt": "2026-05-11T10:00:00.000Z",
      "source": "camera"
    }
  ]
}
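
The hand-off from the upload response to the chat request can be sketched as a small function. The name `chat_body_with_frame` is illustrative. It copies only frameId, capturedAt, and source into visionRefs, mirroring the example above, and leaves out mimeType.

```python
from typing import Any, Dict


def chat_body_with_frame(message: str, session_id: str,
                         frame: Dict[str, Any]) -> Dict[str, Any]:
    """Build a /api/chat request body that references one uploaded frame.

    `frame` is the parsed JSON returned by POST /api/vision/frames.
    """
    return {
        "message": message,
        "stream": True,  # /api/chat requires stream: true
        "sessionId": session_id,
        "visionRefs": [
            {
                "frameId": frame["frameId"],
                "capturedAt": frame["capturedAt"],
                "source": frame["source"],
            }
        ],
    }
```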

Devices, Commands, and Events

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/products/:productId/devices | List devices under a device agent. |
| GET | /api/devices/:deviceId | Get device details. |
| POST | /api/devices/:deviceId/commands | Send a command to an online device. |
| GET | /api/devices/:deviceId/events | Read device events. |

The common call order is:

  1. Use GET /api/products/:productId/devices to list devices and find the deviceId.
  2. Use GET /api/devices/:deviceId to check whether the device is online and read current state.
  3. Use POST /api/devices/:deviceId/commands to send a command defined in the device specification.
  4. Use GET /api/devices/:deviceId/events to read recent device-reported events.

List devices:

bash
$ curl 'http://127.0.0.1:3000/api/products/thermostat/devices?status=online'

Get device details:

bash
$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001'

Command example:

bash
$ curl http://127.0.0.1:3000/api/devices/thermostat-001/commands \
  -H 'Content-Type: application/json' \
  -d '{
    "command": "set_target_temperature",
    "params": {
      "target_temperature": 24
    },
    "timeoutMs": 30000
  }'

Read device events:

bash
$ curl 'http://127.0.0.1:3000/api/devices/thermostat-001/events?limit=50'
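
For automation scripts, the same four calls can be prepared with the Python standard library. This is a minimal sketch that assumes the local base URL used in the examples. The helper names are illustrative, and the response bodies are treated as opaque JSON because their fields are not listed on this page.

```python
import json
import urllib.request
from typing import Any, Dict, Optional
from urllib.parse import quote, urlencode

BASE_URL = "http://127.0.0.1:3000"  # local default from the examples above


def _get(path: str, query: Optional[Dict[str, Any]] = None) -> urllib.request.Request:
    suffix = ("?" + urlencode(query)) if query else ""
    return urllib.request.Request(f"{BASE_URL}{path}{suffix}", method="GET")


def list_devices_request(product_id: str, status: Optional[str] = None) -> urllib.request.Request:
    query = {"status": status} if status else None
    return _get(f"/api/products/{quote(product_id, safe='')}/devices", query)


def device_request(device_id: str) -> urllib.request.Request:
    return _get(f"/api/devices/{quote(device_id, safe='')}")


def command_request(device_id: str, command: str, params: Dict[str, Any],
                    timeout_ms: int = 30000) -> urllib.request.Request:
    body = json.dumps({"command": command, "params": params, "timeoutMs": timeout_ms})
    return urllib.request.Request(
        f"{BASE_URL}/api/devices/{quote(device_id, safe='')}/commands",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def events_request(device_id: str, limit: int = 50) -> urllib.request.Request:
    return _get(f"/api/devices/{quote(device_id, safe='')}/events", {"limit": limit})


def send(request: urllib.request.Request, timeout: float = 35.0) -> Any:
    """Send a prepared request and parse the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A script would then call, for example, `send(command_request("thermostat-001", "set_target_temperature", {"target_temperature": 24}))`. The client timeout is set slightly above timeoutMs so the HTTP call does not give up before the device has had its full window to respond.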

Console features such as creating, updating, and publishing device agents, generating SDKs, changing configuration, reading logs, and managing skills or tools use internal product UI APIs. They are not expanded in this public API reference.

WebSocket

WebSocket is currently used mainly by the voice channel. Default URL:

txt
ws://127.0.0.1:3001/ws/voice

The voice service port also handles POST /api/chat and POST /api/vision/frames, so voice and camera clients can submit text requests and vision frames through the same service address.

The connection can include these headers:

| Header | Notes |
| --- | --- |
| Protocol-Version | Protocol version. Current value: 3. |
| Device-Id | Current device ID. |
| Client-Id | Client ID. Defaults to device ID when omitted. |

After connecting, send hello:

json
{
  "type": "hello",
  "version": 3,
  "audio_params": {
    "format": "pcm",
    "sample_rate": 16000,
    "channels": 1
  },
  "sessionId": "demo-session",
  "productId": "thermostat",
  "deviceId": "thermostat-001",
  "provider": "aliyun"
}
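
The connection headers and the hello message share several values, so it can help to build them together. The sketch below only prepares the data and leaves the actual connection to whichever WebSocket client library is in use. The function name `voice_handshake` is illustrative, and the audio parameters are the ones from the example above rather than a statement of everything the service accepts.

```python
import json
from typing import Dict, Optional, Tuple


def voice_handshake(session_id: str, product_id: str, device_id: str,
                    provider: str, client_id: Optional[str] = None) -> Tuple[Dict[str, str], str]:
    """Return (connection headers, hello message as a JSON string)."""
    headers = {
        "Protocol-Version": "3",
        "Device-Id": device_id,
        # Sending the device ID here mirrors the documented server-side default.
        "Client-Id": client_id or device_id,
    }
    hello = {
        "type": "hello",
        "version": 3,
        "audio_params": {"format": "pcm", "sample_rate": 16000, "channels": 1},
        "sessionId": session_id,
        "productId": product_id,
        "deviceId": device_id,
        "provider": provider,
    }
    return headers, json.dumps(hello)
```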

A voice turn usually follows this message flow:

| Direction | Message | Notes |
| --- | --- | --- |
| Client -> Device Agent | hello | Start a voice session with audio parameters, device context, and speech provider. |
| Device Agent -> client | hello | Return session ID and server audio output parameters. |
| Client -> Device Agent | listen | Start recording. |
| Client -> Device Agent | Binary audio frames | Send voice data. |
| Device Agent -> client | asr | Return realtime or final recognized text. |
| Client -> Device Agent | stop | Stop recording. Can include visionRefs. |
| Device Agent -> client | agent_reply | Return device agent text reply. |
| Device Agent -> client | tts | Return text prepared for speech synthesis. |
| Device Agent -> client | TTS binary audio frames | Play synthesized voice reply. |
| Device Agent -> client | tts_complete | Current voice turn is complete. |
| Client -> Device Agent | abort | Interrupt the current turn. |
| Client -> Device Agent | goodbye | Close the voice session. |
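
On the receiving side, a client has to separate JSON control messages from binary TTS audio and notice when the turn ends. The following sketch works on frames that have already been read from the socket. It assumes that server messages carry their name in a `type` field, as the hello example does; this page does not list the other fields of each message, so they are stored without interpretation. The function name `collect_turn` is illustrative.

```python
import json
from typing import Any, Dict, Iterable, Union


def collect_turn(frames: Iterable[Union[str, bytes]]) -> Dict[str, Any]:
    """Consume frames for one voice turn until tts_complete arrives.

    Text frames are parsed as JSON messages; binary frames are treated
    as synthesized audio and concatenated in arrival order.
    """
    turn: Dict[str, Any] = {"asr": [], "reply": None, "audio": bytearray(), "complete": False}
    for frame in frames:
        if isinstance(frame, (bytes, bytearray)):
            turn["audio"].extend(frame)
            continue
        message = json.loads(frame)
        kind = message.get("type")  # assumption: type matches the message name in the table
        if kind == "asr":
            turn["asr"].append(message)  # may include partial and final results
        elif kind == "agent_reply":
            turn["reply"] = message
        elif kind == "tts_complete":
            turn["complete"] = True
            break  # later frames belong to the next turn
    return turn
```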

See Voice Interaction for voice configuration and usage.

Note that mqtt.wsUrl is the MQTT broker WebSocket URL for carrying MQTT over WebSocket. /ws/voice is the Device Agent voice channel. They are different interfaces.