Schema Validation

The Schema Validation feature supports validating MQTT messages in various ways to ensure that only messages matching the predefined data formats are published to the subscribers, preventing non-compliant data from flowing into downstream systems. Schema validation can use schemas in formats such as JSON Schema, Protobuf, and Avro, or utilize built-in SQL statements to validate the message format from a specified topic. This page introduces the Schema Validation feature and how to use and configure it.

Why Validate Data

Clients may publish non-standard messages to the Broker, which could lead to exceptions in subscribers and data systems or pose security risks. EMQX can identify and block these non-compliant messages by validating data formats early, ensuring system stability and reliability. Schema validation brings benefits in the following aspects:

  • Data Integrity: Validates the structure and format of MQTT messages to ensure data consistency and correctness.
  • Data Quality: Enforces data quality by checking for missing or invalid fields, data types, and formats.
  • Unified Data Model: Ensures that the entire team and project use a unified data model, reducing inconsistencies and errors.
  • Reuse and Sharing: Allows team members to reuse and share schemas, improving collaboration efficiency and reducing repetitive work and errors.
  • Security: Prevents malicious or incorrectly formatted messages from being processed, reducing the risk of security vulnerabilities.
  • Interoperability: Ensures messages conform to standardized formats, facilitating communication between different devices and systems.
  • Debugging: Makes it easier to identify and debug invalid or incorrectly formatted messages.

Workflow

When a message is published, the system validates it according to predefined rules. If the validation is successful, the message continues through the process; if it fails, a user-configured action is executed, such as discarding the message or disconnecting the client.

  1. When a message is published, the EMQX Platform’s authorization feature first checks the client’s publishing permissions. Once the permissions are verified, the system checks the validation rules in the user-configured validation list based on the published topic. A validation rule can include multiple topics or topic filters.

  2. Once a validation rule is matched, the message is validated against the preset Schema or SQL.

    • Supports multiple types of Schema: JSON Schema, Protobuf, and Avro.
    • Supports SQL statements that comply with EMQX rule engine syntax.
    • A single validation rule can include multiple Schemas or SQL statements and specify how their results are combined:
      • All Pass: Validation is considered successful only if all validations pass.
      • Any Pass: Validation stops and is considered successful if any validation passes.
  3. Once validated successfully, the message continues to the next process, such as triggering the rule engine or dispatching to subscribers.

  4. If validation fails, the following user-configured actions can be executed:

    • Discard Message: Terminate the publish and discard the message, returning a specific reason code (131 - Implementation Specific Error) for QoS 1 and QoS 2 messages via PUBACK.
    • Disconnect and Discard Message: Discard the message and disconnect the publishing client.
    • Ignore: No additional actions are taken.

    Regardless of the action taken after a validation failure, a validation failure message is logged in the deployment logs. The schema validation rule allows you to set the log output level, which defaults to warning.

    Additionally, validation failures can trigger a "validation failed" event in the data integration rules' SQL, where the event topic is $events/schema_validation_failed. Users can capture this event for custom handling, such as publishing erroneous messages to another topic or sending them to Kafka for analysis.
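
The workflow above can be sketched in a few lines of Python. This is only an illustrative model of the documented behavior, not EMQX code or an EMQX API: the names `ValidationRule`, `topic_matches`, `evaluate`, and `is_json_object` are hypothetical, the topic matching is simplified (it ignores the special treatment of topics starting with `$`), and each check is assumed to be a function that returns True when the payload passes.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Match a topic against an MQTT topic filter with + and # wildcards."""
    f_levels = topic_filter.split("/")
    t_levels = topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True  # multi-level wildcard matches everything that remains
        if i >= len(t_levels):
            return False  # filter is longer than the topic
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


@dataclass
class ValidationRule:
    name: str
    topics: List[str]                       # topics or topic filters
    checks: List[Callable[[bytes], bool]]   # stand-ins for Schemas or SQL
    strategy: str = "all_pass"              # "all_pass" or "any_pass"
    failure_action: str = "drop"            # "drop", "disconnect", or "ignore"


def evaluate(rule: ValidationRule, topic: str, payload: bytes) -> str:
    """Return "not_matched", "pass", or the rule's configured failure action."""
    if not any(topic_matches(f, topic) for f in rule.topics):
        return "not_matched"
    results = (check(payload) for check in rule.checks)
    # any() stops at the first passing check; all() stops at the first failure.
    passed = any(results) if rule.strategy == "any_pass" else all(results)
    return "pass" if passed else rule.failure_action


def is_json_object(payload: bytes) -> bool:
    """Example check: the payload must decode to a JSON object."""
    try:
        return isinstance(json.loads(payload), dict)
    except ValueError:
        return False


rule = ValidationRule(name="demo", topics=["t/#"], checks=[is_json_object])
print(evaluate(rule, "t/1", b'{"temp": 25}'))  # pass
print(evaluate(rule, "t/1", b"not json"))      # drop
print(evaluate(rule, "x/1", b"not json"))      # not_matched
```

The sketch evaluates a single rule; how several matching rules interact, and what is logged for each outcome, is left to the platform and not modeled here.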

Usage Example

If you have subscribed to the Smart Data Hub, you can go to the Schema Validation page by clicking Smart Data Hub -> Schema Validation from the left menu of the deployment page. On this page, you can create Schema validation rules. Afterward, you can test whether the validation rules are effective by simulating message publication.

Use JSON Schema for Validation

  1. Create a new JSON Schema. Follow the Schema Registry document to create a new JSON Schema. Name it JSON_schema and define the schema as:

    json
    {
      "$schema": "http://json-schema.org/draft-06/schema#",
      "type": "object",
      "properties": {
        "temp": {
          "type": "integer"
        },
        "id": {
          "type": "string"
        }
      },
      "required": [
        "temp",
        "id"
      ]
    }
  2. Create a new JSON Schema validation rule. On the Schema Validation page, click the + New button in the top right corner to enter the New Schema Validation page, and configure the validation rule as follows:

    • Name: Enter the name of the Schema validation rule to identify it.
    • Remarks: Optional, add remarks for the validation rule.
    • Message Source Topic: Set which topics’ messages need to be validated. You can set multiple topics or topic filters. In this example, set it to t/#.
    • Validation Method:
      • Validation Strategy: Specify whether all validators must pass or if any one must pass.
        • All Pass: The message is considered valid only if all validation methods pass.
        • Any Pass: The validation stops as soon as any validation method passes, and the message is considered valid.
      • Validation List: Specify the type of Schema or SQL statement used for validation. You can use the Schema created in Schema Registry or choose the SQL type and input SQL statements. In this example, select JSON_schema created in Step 1.
    • Validation Failure Operation
      • Action After Failure: Specify the behavior of the MQTT message or publishing client after validation failure.
        • Drop Message: Terminate publishing and discard the message. For QoS 1 and QoS 2 messages, return the corresponding reason code (131 - Implementation Specific Error) to the client via PUBACK.
        • Disconnect and Drop Message: Discard the message and disconnect the publishing client.
        • Ignore: Do nothing.
      • Logs Level: Define the log level for the validation failure output in the deployment logs. You can choose either warning or error.

  3. Test the validation rule using the client tool MQTTX. Connect MQTTX as a simulation client to the deployment and subscribe to the topic t/#.

    • First, publish an MQTT message that conforms to JSON_schema to the topic t/1, and the message will be successfully received.

    • Next, publish an MQTT message that does not conform to JSON_schema to the topic t/2. Because the message fails validation, it will not be delivered to the subscriber.

  4. Check the execution status of the Schema validation rule.

    • You can check the validation failure logs in the deployment logs. Example:

      bash
      2025-01-20 17:09:44 (UTC+08:00)[emqx-node-2] Warning
      clientid: mqttx_31338552, peername: 101.224.70.14:39452, username: user, pid: <0.92295.0>, line: 288, action: drop, tag: SCHEMA_VALIDATION, validation: use_json_schema_validator, msg: validation_failed
    • Click the corresponding name in the Schema Validation list to enter the detail page, where you can view the related statistics.
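
To prepare test payloads for Step 3, it can help to check them locally first. The sketch below is a minimal hand-written check that covers only the `type`, `properties`, and `required` keywords used by JSON_schema; it is not a full JSON Schema implementation and is not how EMQX validates messages. The function name `conforms` and the sample payloads are assumptions made for illustration.

```python
import json

# The schema from Step 1, copied as-is.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-06/schema#",
    "type": "object",
    "properties": {
        "temp": {"type": "integer"},
        "id": {"type": "string"},
    },
    "required": ["temp", "id"],
}


def _is_integer(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, int):
        return True
    # Numbers with a zero fractional part, such as 25.0, also count.
    return isinstance(value, float) and value.is_integer()


TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": _is_integer,
}


def conforms(payload: bytes, schema: dict = SCHEMA) -> bool:
    """Return True if the payload satisfies the keywords handled here."""
    try:
        doc = json.loads(payload)
    except ValueError:
        return False  # not valid JSON at all
    if not TYPE_CHECKS[schema["type"]](doc):
        return False
    if any(key not in doc for key in schema.get("required", [])):
        return False
    for key, sub in schema.get("properties", {}).items():
        if key in doc and not TYPE_CHECKS[sub["type"]](doc[key]):
            return False
    return True


good = b'{"temp": 25, "id": "dev-1"}'    # candidate payload for t/1
bad = b'{"temp": "hot", "id": "dev-1"}'  # candidate payload for t/2
print(conforms(good), conforms(bad))     # True False
```

With these samples, the first payload would be expected to pass the validation rule and the second to fail it because `temp` is a string.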

Use Avro Schema for Validation

Avro Schema validation is similar to JSON Schema validation. When creating a validation rule, select Avro in the Type field of the validator list, and choose the name of the previously created Avro Schema in the Schema/SQL field.

Use Protobuf Schema for Validation

Protobuf Schema validation is similar to JSON Schema validation. When creating a validation rule, select Protobuf in the Type field of the validator list, and choose the name of the previously created Protobuf Schema in the Schema/SQL field.

Additionally, unlike JSON Schema and Avro Schema, when selecting the Protobuf type, you must also provide the Message Type defined in the Protobuf Schema. For example, in the following Protobuf Schema, the message type is Device:

proto
message Device {
  required string id = 1;
  required uint32 temp = 2;
}
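
When testing a Protobuf validation rule, the published payload must be Protobuf-encoded bytes rather than JSON text. In practice you would normally generate the encoder from the schema with the Protobuf compiler; the hand-written sketch below only illustrates what a `Device` message looks like on the wire, following the standard Protobuf encoding (varints and length-delimited fields). The function names and sample values are assumptions made for illustration.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    if value < 0:
        raise ValueError("varint value must be non-negative")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_device(device_id: str, temp: int) -> bytes:
    """Encode the Device message: id is field 1, temp is field 2."""
    if not 0 <= temp < 2**32:
        raise ValueError("temp must fit in uint32")
    id_bytes = device_id.encode("utf-8")
    # Tag byte = (field_number << 3) | wire_type
    field_id = bytes([(1 << 3) | 2]) + encode_varint(len(id_bytes)) + id_bytes
    field_temp = bytes([(2 << 3) | 0]) + encode_varint(temp)
    return field_id + field_temp


payload = encode_device("dev-1", 25)
print(payload.hex())  # 0a056465762d311019
```

Publishing these bytes as the message payload gives a message that matches the `Device` type, while a plain-text or JSON payload on the same topic would be expected to fail validation.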

View Schema Validation Statistics

Click on the name in the Schema Validation list to enter the details page, where you can view the relevant statistical metrics for the Schema execution:

  • Statistics:
    • Matched: The total number of times the validation rule has been triggered since it was enabled.
    • Success: The number of successful data validations.
    • Failure: The number of failed data validations.
  • Rate Indicators:
    • Current Rate: Trend of validation counts in the past minute.
    • Rate in Last 5 Min: The validation rate over the last 5 minutes.
    • Maximum Rate: The highest validation rate recorded.

Manage Schema Validation

On the Schema Validation list page, you can perform the following management operations:

  • Enable/Disable Schema Validation: Click the toggle button in the Enabled column.
  • Edit Schema Validation: Click the "Edit" icon in the Actions column to enter the editing page and make changes, then save.
  • Delete Schema Validation: Click the "More" icon in the Actions column, select "Delete" from the dropdown menu, and confirm the deletion.
  • Adjust Schema Validation Order: Use the mouse to drag rows in the list to reorder them, or use the quick move button in the "More options" menu in the Actions column.