# Python SDK

The Python SDK is for gateway programs, validation scripts, or existing Python services. The package already handles MQTT connection, command subscription, parameter validation, command responses, state reports, and event reports. To connect a real device, replace the default state update logic with calls to real sensors, actuators, or existing services.

## Use Cases

- A gateway program connects a group of devices or an external system to a Device Agent.
- Scripts validate a DeviceSpec, commands, telemetry, and events quickly.
- An existing Python service already reads device data or calls business systems.

## Package Contents

| File | Purpose |
| --- | --- |
| `src/main.py` | Device entry point for connection, subscription, command responses, state reports, and events |
| `src/voice_client.py` | Device-side voice WebSocket client |
| `device-spec.json` | Current DeviceSpec, used for command validation and field mapping |
| `.env.example` | MQTT, `namespace`, `productId`, `deviceId`, and connection settings |
| `README.md` | Setup, run, and development guide for this package |
| `_references/` | Shared SDK source for checking message shapes |

For a real device, you mainly work in `src/main.py`: state generation, command handling, and event triggers.

## Access Steps

1. Download the Python SDK package.
2. Copy `.env.example` to `.env` and update the MQTT broker address or credentials if needed.
3. Start the program and confirm the device comes online.
4. Replace the default state update logic in `apply_command_to_state()`.
5. Return to the Device Agent workspace and test commands, state, and events.

```bash
cp .env.example .env
uv run device-agent-toolkit
```

You can also run the entry file directly:

```bash
uv run python src/main.py
```

## Implement Device Logic

The default code validates each command and its parameters against `device-spec.json`, then merges the command parameters into the current state. To connect a real device, update `apply_command_to_state()`:

```python
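import time
from copy import deepcopy


def apply_command_to_state(device_spec, state, command, params):
    # Work on a copy so the caller's state is not mutated.
    next_state = deepcopy(state)

    if command == "set_temperature":
        target = params["target_temperature"]
        # call_thermostat_service() is a placeholder for your own device or service call.
        call_thermostat_service(target)
        next_state["target_temperature"] = target

    # Refresh the timestamp only if the state tracks one.
    if "updated_at" in next_state:
        next_state["updated_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())

    return next_state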
```

Use this function to perform the real device action, such as reading sensors, calling a gateway interface, controlling a relay, or translating the command into an existing system API. The returned state is used for command responses and state reports.
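As the number of commands grows, one possible way to keep `apply_command_to_state()` readable is a handler table. The following is a minimal sketch under stated assumptions, not the SDK's own structure: the `set_power` command, the `HANDLERS` table, and the stub functions `call_thermostat_service()` and `switch_relay()` are hypothetical and stand in for your real device calls.

```python
import time
from copy import deepcopy


# Hypothetical actuator stubs; replace with real hardware or service calls.
def call_thermostat_service(target):
    print(f"thermostat target set to {target}")


def switch_relay(on):
    print(f"relay {'on' if on else 'off'}")


def handle_set_temperature(state, params):
    call_thermostat_service(params["target_temperature"])
    state["target_temperature"] = params["target_temperature"]


def handle_set_power(state, params):  # "set_power" is an assumed command name
    switch_relay(params["power"])
    state["power"] = params["power"]


# Maps command names to the function that performs the device action.
HANDLERS = {
    "set_temperature": handle_set_temperature,
    "set_power": handle_set_power,
}


def apply_command_to_state(device_spec, state, command, params):
    next_state = deepcopy(state)  # leave the caller's state untouched
    handler = HANDLERS.get(command)
    if handler is not None:
        handler(next_state, params)
    if "updated_at" in next_state:
        next_state["updated_at"] = time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime())
    return next_state


current = {"target_temperature": 20, "power": False, "updated_at": ""}
updated = apply_command_to_state(
    {}, current, "set_temperature", {"target_temperature": 23.5}
)
```

Each handler receives the copied state, so a failure inside a handler never leaves the original state half-updated.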

If the DeviceSpec defines events, report them from business logic with `publish_event()`:

```python
publish_event("temperature_alarm", {
    "current_temperature": 32.5,
    "level": "warning",
})
```
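In practice an event is usually triggered by a condition in business logic. The sketch below shows one possible shape, assuming `publish_event(name, payload)` takes the arguments shown above; the `check_temperature()` helper, the 30.0 threshold, and the `fake_publish_event()` recorder are hypothetical and exist only to make the example runnable without a broker.

```python
def check_temperature(current_temperature, publish_event, warning_threshold=30.0):
    """Publish a temperature_alarm event when the reading exceeds the threshold.

    Returns True if an event was published, otherwise False.
    """
    if current_temperature <= warning_threshold:
        return False
    publish_event("temperature_alarm", {
        "current_temperature": current_temperature,
        "level": "warning",
    })
    return True


# Stand-in for the SDK's publish_event(); it records calls instead of sending them.
sent = []


def fake_publish_event(name, payload):
    sent.append((name, payload))


check_temperature(25.0, fake_publish_event)   # below threshold, nothing published
check_temperature(32.5, fake_publish_event)   # above threshold, one event recorded
```

Passing the publish function as an argument keeps the threshold logic testable; in the device program you would pass the real `publish_event`.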

## Voice Access Code

The Python SDK includes `src/voice_client.py` for device-side voice conversations. It connects to `/ws/voice`, sends 16 kHz mono Int16LE PCM audio, and receives ASR text, agent replies, and TTS audio through event callbacks.

```python
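import asyncio

from voice_client import VoiceClient


async def main():
    voice = VoiceClient(
        ws_url="ws://127.0.0.1:3001/ws/voice",
        device_id="device-001",
        product_id="agent-001",
    )

    await voice.connect()
    await voice.start_listening("manual")
    # pcm_chunk: bytes of 16 kHz mono Int16LE PCM; 20 ms of silence as a placeholder.
    pcm_chunk = b"\x00\x00" * 320
    await voice.send_audio(pcm_chunk)
    await voice.stop_listening()


asyncio.run(main())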
```

For a real device, connect microphone capture and speaker playback to these calls. Voice service, voice type, and credential settings are covered in [Voice Interaction](../../usage/voice.md).
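Microphone libraries often deliver float samples or large buffers, so the audio may need converting and slicing before it is passed to `send_audio()`. The sketch below shows one way to produce 16 kHz mono Int16LE chunks using only the standard library. The helper names and the 20 ms frame size are assumptions for illustration; the chunk size the voice service actually expects is not stated here.

```python
import struct

SAMPLE_RATE = 16_000      # 16 kHz mono, as required by the voice endpoint
BYTES_PER_SAMPLE = 2      # Int16 little-endian


def floats_to_int16le(samples):
    """Convert float samples in [-1.0, 1.0] to Int16LE bytes, clipping overflow."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def iter_pcm_chunks(pcm, frame_ms=20):
    """Yield consecutive slices of PCM bytes, each frame_ms long (last may be shorter)."""
    frame_bytes = SAMPLE_RATE * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


pcm = floats_to_int16le([0.0] * SAMPLE_RATE)   # one second of silence
chunks = list(iter_pcm_chunks(pcm))            # each chunk could go to send_audio()
```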

## Vision Recognition Code

`src/main.py` includes the command flow for photo recognition. When the DeviceSpec contains one of these commands, the program runs vision recognition before normal state updates:

- `capture_and_recognize`
- `take_photo_vision`
- `vision_recognize`
- `photo_identify`

The device first checks `imageDataUrl` and `imageBase64` in the command parameters, then `VISION_FALLBACK_IMAGE_DATA_URL` in `.env`. If none of these is available, it calls `capture_local_vision_image()`. To connect a real device, implement that function with a camera, screenshot, or image file source.

```python
def capture_local_vision_image():
    # read_camera_frame_as_base64() is a placeholder for your own capture code.
    return {
        "mimeType": "image/jpeg",
        "imageBase64": read_camera_frame_as_base64(),
        "source": "sdk-camera",
    }
```
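The image lookup order described above can be sketched as a single function. This is an illustration under stated assumptions rather than the code in `src/main.py`: the `resolve_vision_image()` and `parse_data_url()` helpers, the `source` labels, the `image/jpeg` default, and checking `imageDataUrl` before `imageBase64` are all hypothetical choices.

```python
import base64


def parse_data_url(data_url):
    """Split 'data:<mime>;base64,<payload>' into (mime_type, base64_payload)."""
    header, _, payload = data_url.partition(",")
    mime_type = header[len("data:"):].split(";")[0] or "application/octet-stream"
    return mime_type, payload


def resolve_vision_image(params, env, capture):
    """Pick an image source: command parameters, then .env fallback, then capture."""
    if params.get("imageDataUrl"):
        mime_type, payload = parse_data_url(params["imageDataUrl"])
        return {"mimeType": mime_type, "imageBase64": payload, "source": "command"}
    if params.get("imageBase64"):
        # Assumed default MIME type when the command carries raw base64 only.
        return {"mimeType": "image/jpeg", "imageBase64": params["imageBase64"],
                "source": "command"}
    fallback = env.get("VISION_FALLBACK_IMAGE_DATA_URL")
    if fallback:
        mime_type, payload = parse_data_url(fallback)
        return {"mimeType": mime_type, "imageBase64": payload, "source": "env-fallback"}
    return capture()


def capture_local_vision_image():
    # Stand-in capture: encodes fixed bytes instead of reading a camera frame.
    return {
        "mimeType": "image/jpeg",
        "imageBase64": base64.b64encode(b"fake-jpeg-bytes").decode("ascii"),
        "source": "sdk-camera",
    }


image = resolve_vision_image({}, {}, capture_local_vision_image)
```

With empty parameters and no fallback configured, the call falls through to the capture function, which is the path a real camera implementation would serve.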

The program uploads the image to `/api/vision/frames`, then calls `/api/chat` with `visionRefs`. This is one photo recognition round per command, not continuous video streaming.

## Verify Access

After the program starts, return to the Device Agent workspace and confirm:

1. The device appears in the device list and is online.
2. Current state shows fields reported by the Python program.
3. A conversation command executes the logic in `apply_command_to_state()`.
4. If `publish_event()` is called, the event appears in recent events.