End-to-End Tracing Span Details

EMQX provides end-to-end tracing capabilities based on the OpenTelemetry standard. This allows you to monitor the entire lifecycle of MQTT messages and client activities within the EMQX cluster. This page describes the spans generated by EMQX and explains what they reveal about the broker’s internal operations.

Client Lifecycle Spans

These spans trace the primary lifecycle events of an MQTT client.

  • client.connect: A root span that traces the client connection process. It starts when a client initiates a connection to the broker and ends when the connection is established or rejected.

  • client.disconnect: Traces the client disconnection process. It starts when a client sends a DISCONNECT packet or when the connection is closed for other reasons (e.g., network error, keep-alive timeout).

  • client.subscribe: Traces a client's subscription request. It covers the entire process of the broker receiving the SUBSCRIBE packet, processing the subscriptions, and sending a SUBACK packet.

  • client.unsubscribe: Traces a client's unsubscription request. It covers the process from receiving the UNSUBSCRIBE packet to sending the UNSUBACK packet.

Authentication and Authorization Spans

These spans reveal how EMQX performs authentication and authorization checks.

  • client.authn: Traces the authentication process for a client. This span is a child of the client.connect span.

  • client.authn_backend: Traces a specific backend call during authentication (e.g., querying a database, calling an HTTP service). This is a child span of client.authn and is useful for identifying performance bottlenecks in authentication backends.

  • client.authz: Traces the authorization process for a client, which occurs on publish or subscribe operations.

  • client.authz_backend: Traces a specific backend call during authorization. This is a child span of client.authz.

Message Lifecycle Spans

These spans trace the journey of an MQTT message through the broker.

Ingress (Client to Broker)

  • client.publish: A root span that traces a message published by a client to the broker. It starts when the broker receives the PUBLISH packet.

  • message.route: A child span of client.publish that traces the routing of the message within the broker to find matching subscribers.

  • message.forward: If the message needs to be delivered to a subscriber on another node in the cluster, this span traces the forwarding of the message to that node. It is a child of message.route.

  • message.handle_forward: On the receiving node, this span traces the handling of a forwarded message.

Egress (Broker to Client)

  • broker.publish: Traces how the broker prepares and publishes a message to a subscriber. It is a child of either message.route or message.handle_forward.
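
The parent-child relationships described in this section can be summarized as a small lookup table. The Python sketch below is illustrative only: `PARENTS` and `path_to_root` are hypothetical names, not part of EMQX, and the link from message.handle_forward back to message.forward is an assumption, since the text above does not name that span's parent.

```python
# Parent of each message-lifecycle span, as described in the text above.
PARENTS = {
    "client.publish": None,  # root span
    "message.route": "client.publish",
    "message.forward": "message.route",
    # Assumption: the text does not state this span's parent.
    "message.handle_forward": "message.forward",
}

# broker.publish may hang off either of these spans.
BROKER_PUBLISH_PARENTS = ("message.route", "message.handle_forward")


def path_to_root(span, broker_publish_parent="message.route"):
    """Return the span names from `span` up to the root span."""
    if broker_publish_parent not in BROKER_PUBLISH_PARENTS:
        raise ValueError("unexpected parent for broker.publish")
    parents = {**PARENTS, "broker.publish": broker_publish_parent}
    path = [span]
    while parents[path[-1]] is not None:
        path.append(parents[path[-1]])
    return path


# Subscriber on the same node as the publisher.
local = path_to_root("broker.publish")
# Subscriber on another node in the cluster.
remote = path_to_root("broker.publish", "message.handle_forward")

print(" <- ".join(local))
print(" <- ".join(remote))
```

Under these assumptions, a delivery to a subscriber on another node adds two extra levels (message.forward and message.handle_forward) between message.route and broker.publish.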

QoS Acknowledgement Spans

These spans trace the QoS 1 and QoS 2 acknowledgement flows.

Broker to Publisher

  • broker.puback: Traces the broker sending a PUBACK to a publisher (QoS 1).

  • broker.pubrec: Traces the broker sending a PUBREC to a publisher (QoS 2).

  • broker.pubcomp: Traces the broker sending a PUBCOMP to a publisher (QoS 2), completing the QoS 2 flow from the publisher's side.

Publisher to Broker

  • client.pubrel: Traces the broker receiving a PUBREL from a publisher (QoS 2).

Broker to Subscriber

  • broker.pubrel: Traces the broker sending a PUBREL to a subscriber (QoS 2).

Subscriber to Broker

  • client.puback: Traces the broker receiving a PUBACK from a subscriber (QoS 1).

  • client.pubrec: Traces the broker receiving a PUBREC from a subscriber (QoS 2).

  • client.pubcomp: Traces the broker receiving a PUBCOMP from a subscriber (QoS 2), completing the QoS 2 flow from the subscriber's side.
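
As a quick reference, the acknowledgement spans above can be grouped by QoS level and by which side of the broker they belong to. The Python sketch below orders the span names by the standard MQTT handshake sequence; the table and function names are hypothetical, and whether every span is actually emitted depends on the sampling settings described later on this page.

```python
# Acknowledgement spans in MQTT handshake order, keyed by QoS level.
# QoS 0 messages are not acknowledged, so they have no such spans.
PUBLISHER_SIDE = {
    0: [],
    1: ["broker.puback"],
    2: ["broker.pubrec", "client.pubrel", "broker.pubcomp"],
}

SUBSCRIBER_SIDE = {
    0: [],
    1: ["client.puback"],
    2: ["client.pubrec", "broker.pubrel", "client.pubcomp"],
}


def ack_spans(qos, side):
    """Return the ordered acknowledgement span names for one message."""
    table = {"publisher": PUBLISHER_SIDE, "subscriber": SUBSCRIBER_SIDE}[side]
    return list(table[qos])


for side in ("publisher", "subscriber"):
    for qos in (1, 2):
        print(side, "QoS", qos, "->", ", ".join(ack_spans(qos, side)))
```

Note the naming convention visible in the tables: a broker.* span records the broker sending a packet, while a client.* span records the broker receiving one.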

Rule Engine Spans

These spans trace the execution within the EMQX Rule Engine.

  • broker.rule_engine.apply: Traces how a message is evaluated against rules. This is a child of message.route.

  • broker.rule_engine.action: Traces the execution of a specific action triggered by a matched rule. This is a child of broker.rule_engine.apply.

Broker Internal Spans

These spans trace internal broker operations not initiated directly by clients.

  • broker.disconnect: Traces when the broker actively disconnects a client (e.g., due to an administrative action).

  • broker.subscribe: Traces an internal subscription process initiated by the broker itself (e.g., due to an administrative action).

  • broker.unsubscribe: Traces an internal unsubscription process.

Trace Sampling and Filtering

EMQX's OpenTelemetry integration includes a flexible sampler that allows you to control which traces are generated. This helps to manage the volume of trace data and focus on specific clients, topics, or event types. The decision to sample a trace is made based on the following hierarchy:

  1. Trace Context Source: Determines whether EMQX extracts trace context from incoming MQTT packets. This is controlled by the follow_traceparent boolean switch.

    • If true (the default), EMQX attempts to extract trace context from the incoming request (e.g., from the traceparent user property in an MQTT packet). This allows you to link traces that originate from upstream instrumented applications.
    • If false, EMQX ignores any incoming trace context and always starts a new trace.

  2. Remote Sampling Decision: If follow_traceparent is true and an incoming request contains a trace context that is already marked as "sampled", EMQX will respect this upstream decision and sample the trace, unless overridden by other rules.

  3. Whitelist Rules: If the trace is not already sampled by a remote parent, you can define specific rules to force sampling for certain clients or topics. This is the most direct way to ensure you are tracing the activity you care about.

    • ClientID whitelist: Forces sampling for all root-level activities (connect, subscribe, publish, etc.) of clients whose client ID is on the list.

    • Topic whitelist: Forces sampling for messages published to matching topics.

      Note: This rule applies to the start of a trace (e.g., the client.publish span). It does not apply to the broker.publish span, which is responsible for message delivery to subscribers.

  4. Ratio-Based Sampling: If no whitelist rule matches, the decision falls back to ratio-based sampling, which is controlled by the sample_ratio configuration.

    You can configure this ratio (from 0.0 to 1.0) to control the percentage of traces that are captured. A value of 1.0 means 100% of traces will be captured, while 0.0 means none will be (unless they match a whitelist rule).

  5. Event Type Switches: Even if a trace is selected by the ratio-based sampler, it is only generated if the relevant event type switch is enabled. These switches act as a global on/off for categories of spans. The available switches are:

    • client_connect_disconnect: A boolean switch to enable or disable tracing for client connect and disconnect events.
    • client_subscribe_unsubscribe: A boolean switch to enable or disable tracing for client subscribe and unsubscribe events.
    • client_messaging: A boolean switch to enable or disable tracing for client message publishing.
    • trace_rule_engine: A boolean switch to enable or disable tracing for the rule engine.

  6. Message Trace Level: Spans related to QoS acknowledgements (e.g., PUBACK, PUBREC) are additionally controlled by the msg_trace_level setting.

    • msg_trace_level: Accepts a level of 0, 1, or 2 and determines which acknowledgement spans are created, depending on the QoS of the original message.

      For example, if msg_trace_level is set to 1, PUBACK spans will be created for QoS 1 messages. For QoS 2 messages, this setting will generate PUBREC spans, but not PUBREL or PUBCOMP spans. This helps to reduce the verbosity of traces for high-QoS message flows.
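
Steps 1 through 5 of the hierarchy above can be sketched as a single decision function. The Python code below is a minimal illustration, not EMQX's implementation: `SamplerConfig`, `should_sample`, `topic_matches`, and `is_remote_sampled` are hypothetical names. It assumes that the incoming trace context uses the W3C `traceparent` format (where the last field carries the sampled flag), that topic whitelist entries are MQTT topic filters with `+` and `#` wildcards, that the event type switches gate every path rather than only the ratio-based one, and that the ratio is applied with a plain random draw.

```python
import random
from dataclasses import dataclass, field


def topic_matches(topic_filter, topic):
    """Match a topic against an MQTT topic filter ('+' and '#' wildcards)."""
    f_levels, t_levels = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f_levels):
        if level == "#":
            return True
        if i >= len(t_levels):
            return False
        if level != "+" and level != t_levels[i]:
            return False
    return len(f_levels) == len(t_levels)


def is_remote_sampled(traceparent):
    """True if a W3C traceparent value has its 'sampled' flag bit set."""
    parts = (traceparent or "").split("-")
    if len(parts) < 4:
        return False
    try:
        return bool(int(parts[3], 16) & 0x01)
    except ValueError:
        return False


@dataclass
class SamplerConfig:
    follow_traceparent: bool = True
    sample_ratio: float = 0.1  # illustrative value, not EMQX's default
    clientid_whitelist: set = field(default_factory=set)
    topic_whitelist: list = field(default_factory=list)
    enabled_switches: set = field(default_factory=lambda: {
        "client_connect_disconnect",
        "client_subscribe_unsubscribe",
        "client_messaging",
        "trace_rule_engine",
    })


def should_sample(cfg, switch, client_id, topic=None, traceparent=None, draw=None):
    """Decide whether a root-level event should start a sampled trace."""
    if cfg.follow_traceparent and is_remote_sampled(traceparent):
        selected = True  # step 2: respect the upstream decision
    elif client_id in cfg.clientid_whitelist:
        selected = True  # step 3: client ID whitelist
    elif topic is not None and any(
        topic_matches(f, topic) for f in cfg.topic_whitelist
    ):
        selected = True  # step 3: topic whitelist
    else:
        # Step 4: ratio-based fallback; `draw` can be injected for testing.
        draw = random.random() if draw is None else draw
        selected = draw < cfg.sample_ratio
    # Step 5: the event type switch acts as a final on/off gate.
    return selected and switch in cfg.enabled_switches


cfg = SamplerConfig(
    sample_ratio=0.0,
    clientid_whitelist={"sensor-1"},
    topic_whitelist=["alerts/#"],
)
tp = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
print(should_sample(cfg, "client_messaging", "sensor-1"))
print(should_sample(cfg, "client_messaging", "other", topic="alerts/fire"))
print(should_sample(cfg, "client_messaging", "other", traceparent=tp))
print(should_sample(cfg, "client_messaging", "other", topic="metrics/cpu"))
```

With sample_ratio set to 0.0, as in the usage lines, only whitelisted clients, whitelisted topics, and traces already sampled upstream are selected; everything else is dropped.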