Known Issues in EMQX 5.8
e5.8.6
TLS Listener Started with Default Configuration Cannot Be Hot-Updated to Use tlsv1.3 Only (since 5.4.0, will be fixed in 5.9.0)
It may fail with an error like:
```
incompatible,[client_renegotiation,{versions,['tlsv1.3']}]
```
Workaround: Disable the listener, then re-enable it after the configuration change.
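One way to do this from the shell is sketched below, assuming the default SSL listener id `ssl:default`; adjust the id to match your listener.
```bash
# Stop the TLS listener before changing its configuration.
emqx ctl listeners stop ssl:default

# ... apply the TLS version change (e.g., via the Dashboard or the config file) ...

# Start the listener again so it picks up the new configuration.
emqx ctl listeners start ssl:default
```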
Node Crash if Linux Monotonic Clock Steps Backward (since 5.0)
In certain virtual Linux environments, the operating system is unable to keep the clock monotonic, which may cause the Erlang VM to exit with the message:
```
OS monotonic time stepped backwards!
```
For such environments, one may set the `+c` flag to `false` in `etc/vm.args`.
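For example, the relevant line in `etc/vm.args` would look like the following; leave the rest of the file unchanged.
```
## Disable Erlang VM time correction so the emulator does not abort
## when the OS monotonic clock steps backwards.
+c false
```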
IoTDB May Not Work Properly in Batch Mode When `batch_size > 1` (since 5.0)
This issue arises because EMQX uses the IoTDB v1 API, which lacks native support for batch operations. To simulate batch functionality, an iterative approach is used; however, this method is not atomic and may lead to bugs.
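Until then, one way to avoid the non-atomic iterative path is to keep the batch size at 1. A hypothetical action snippet; the action name and exact config path are illustrative:
```
actions.iotdb.my_iotdb_action {
  resource_opts {
    # A batch size of 1 sends one request per message and bypasses
    # the iterative batch emulation described above.
    batch_size = 1
  }
}
```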
The Thrift Driver for IoTDB Does Not Support `async` Mode (since 5.8.1)

Limitation in SAML-Based SSO (since 5.3)
EMQX Dashboard supports Single Sign-On based on the Security Assertion Markup Language (SAML) 2.0 standard and integrates with Okta and OneLogin as identity providers. However, SAML-based SSO currently does not support certificate signature verification and is incompatible with Azure Entra ID due to the complexity of its implementation.
e5.8.4
Node Cannot Start if a New Node Joined Cluster While It was Stopped (since 5.0, fixed in 5.8.5)
In a cluster of 2 or more nodes, if a new node joins the cluster while some nodes are down, the nodes that were down will fail to restart and will emit logs like the following:
```
2024-10-03T17:13:45.063985+00:00 [error] Mnesia('emqx@172.17.0.5'): ** ERROR ** (core dumped to file: "/opt/emqx/MnesiaCore.emqx@172.17.0.5_1727_975625_63176"), ** FATAL ** Failed to merge schema: {aborted,function_clause}
```
Workaround: Delete the `data/mnesia` directory and restart the node.
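A sketch assuming the default installation layout under `/opt/emqx` (as in the log above); back up the directory first and adjust paths for your deployment.
```bash
# Stop the node, remove the stale Mnesia schema and tables, then start again.
emqx stop
rm -rf /opt/emqx/data/mnesia
emqx start
```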
Shard Replica Set Changes Become Stuck Once Replication Sites Are Lost (since 5.8.0, fixed in 5.8.5)
This issue may occur only when Durable Sessions are enabled and backed by the DS Raft backend.
When nodes acting as replication sites for Durable Storage data permanently leave the cluster without handing off the data first, it may lead to a situation where any requested replica set transitions will never finish.
As a simplified example, this is how it could look in the `emqx ctl ds info` output. Here, node `emqx@emqxc1-core0.local` left the cluster while it was still the only replication site responsible for all shards, and then `emqx@emqxc2-core0.local` was asked to take over with `emqx ds join messages ABCDEF2222222222`:
```
Site
ABCDEF1111111111  'emqx@emqxc1-core0.local'  (!) UNIDENTIFIED
ABCDEF2222222222  'emqx@emqxc2-core0.local'  up
<...>

Shard       Replicas
messages/0  (!) ABCDEF1111111111
messages/1  (!) ABCDEF1111111111
<...>
messages/9  (!) ABCDEF1111111111

Shard       Transitions
messages/0  +ABCDEF2222222222 -ABCDEF1111111111
messages/1  +ABCDEF2222222222 -ABCDEF1111111111
<...>
messages/9  +ABCDEF2222222222 -ABCDEF1111111111
```
In this example, the transition `+ABCDEF2222222222` would never finish.
e5.8.1
Kafka Disk Buffer Directory Name (since 5.8.0, fixed in 5.8.2)
The introduction of a dynamic topic template for the Kafka (Azure EventHubs, Confluent Platform) producer integration brought an incompatible change to the on-disk buffer directory name. If the `disk` buffer mode is used, please wait for the 5.8.2 release before upgrading from an older version, to avoid losing buffered messages. If the `hybrid` buffer mode is used, you will need to manually clean up the old directories after upgrading from an older version.
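The buffer mode is set per producer action; a hypothetical snippet showing where to check which mode is in use (names and the exact config path are illustrative):
```
actions.kafka_producer.my_kafka_action {
  parameters {
    buffer {
      # One of: memory | disk | hybrid
      mode = hybrid
    }
  }
}
```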
Kafka Disk Buffer Resume (since 5.8.0, fixed in 5.8.2)
If the `disk` buffer mode is used, Kafka (Azure EventHubs, Confluent Platform) producers will not automatically resume sending buffered data from disk to Kafka after a node restart. Sending resumes only once a new message triggers the dynamic addition of a topic producer.
Performance Degradation When Viewing Audit Events (since 5.4.0, fixed in 5.8.2)
Enabling the audit log and viewing specific events in the Dashboard can, in rare cases, cause significant performance degradation or even crash the EMQX node, particularly on memory-constrained nodes. Events known to cause this issue include Backup and Restore API requests and commands executed in the EMQX remote console that manipulate large data structures. Nodes may also take longer to start and become responsive in these situations.
Workaround: Adjust the Max Dashboard Record Size through the Dashboard, or lower the `log.audit.max_filter_size` setting. Over time, problematic events will be cleared from the audit log as new events are recorded.
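For example, in `emqx.conf`; the value shown is illustrative, so pick a limit that fits your workload:
```
# Cap the number of audit log entries retained for filtering and display.
log.audit.max_filter_size = 100
```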
Distorted Gauge Values in GET /monitor HTTP API and Dashboard (since 5.8.1, fixed in 5.8.2)
When using the `GET /monitor` HTTP API, which also provides data for the Dashboard, changing the time window from 1 hour to a larger time frame may cause fresh data points (collected within the past hour) to appear distorted. For instance, three connections may incorrectly display as nine or more. This issue is purely visual for data points within the past hour; however, for data older than 1 hour, the distortion is irreversible. An example API query is sketched after the list of impacted gauges below.
Impacted gauges:
- `disconnected_durable_sessions`
- `subscriptions_durable`
- `subscriptions`
- `topics`
- `connections`
- `live_connections`
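For reference, a sketch of querying the API directly; host, port, and credentials are illustrative, and the `latest` parameter is assumed to be the window size in seconds.
```bash
# Fetch monitor data for the past hour via the Dashboard HTTP API.
curl -s -u "$API_KEY:$API_SECRET" \
  "http://localhost:18083/api/v5/monitor?latest=3600"
```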
e5.8.0
Node Crash Race Condition (since 5.0, fixed in 5.8.1)
If a node shuts down while RPC channels are being established, it may cause the peer node to crash.