EMQX Enterprise Version 5
5.8.3
Release Date: 2024-12-05
Make sure to check the breaking changes and known issues before upgrading to EMQX 5.8.3.
Enhancements
Core MQTT Functionalities
#14219 Enhanced Connection Rate Limiter for Improved System Resilience.
- Improved system stability and responsiveness under high connection rates: Previously, when the connection rate limit was exceeded, listener acceptors would ignore new connection attempts, potentially resulting in an unrecoverable state if a large number of clients connected or reconnected frequently within a short period. Listeners now accept pending connections but immediately close them if the rate limit is reached. This reduces resource strain and improves system resilience during peak loads.
- New listener option nolinger introduced: When set to true, a TCP-RST is sent immediately upon socket closure, helping to mitigate SYN flood attacks and further enhancing connection-handling efficiency.
- max_connection configuration for MQTT listeners now capped by system limits: The max_connection value for MQTT listeners is now constrained by the system's limits (e.g., ulimit from the OS and node.process_limit). If configured to infinity or a value greater than the system limit, it will automatically be adjusted to match the system's maximum limit.
- SSL listeners' ssl_options now validated before changes: Previously, invalid SSL options (such as unsupported TLS versions) could be accepted, causing client connection failures after a listener reconfiguration. With this update:
  - The node will fail to boot if a listener is configured with invalid SSL options.
  - Requests to apply invalid SSL options via the Dashboard or config API will now fail with a 400 status code.
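The capping rule for max_connection described under #14219 can be pictured with a small sketch. It assumes the effective value is simply the smallest of the configured value and the two system limits; the function and parameter names are hypothetical and are not EMQX identifiers.

```python
def effective_max_connections(configured, os_fd_limit, process_limit):
    """Clamp a listener's connection cap to the tightest system limit.

    `configured` is either the string "infinity" or an integer.
    `os_fd_limit` stands in for the OS ulimit and `process_limit`
    for node.process_limit (both names are illustrative).
    """
    system_limit = min(os_fd_limit, process_limit)
    if configured == "infinity":
        # An unbounded setting falls back to the system maximum.
        return system_limit
    # A finite setting is kept unless it exceeds the system maximum.
    return min(int(configured), system_limit)


print(effective_max_connections("infinity", 65536, 2_097_152))
print(effective_max_connections(1000, 65536, 2_097_152))
```

A value below the system limit passes through unchanged; only infinity or an oversized value is adjusted.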
Authentication and Authorization
- #14147 Added support for using memberOf syntax in LDAP extensible match filter, for example: (&(objectClass=class)(memberOf:1.2.840.113556.1.4.1941:=CN=GroupName,OU=emqx,DC=WL,DC=com)).
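To show how the pieces of such a filter fit together, here is a minimal sketch that assembles the example string from #14147. The helper name is hypothetical. The OID is the matching rule from the example above (known in Active Directory as LDAP_MATCHING_RULE_IN_CHAIN). The sketch does not escape special characters in its inputs.

```python
# Matching-rule OID taken from the changelog example.
MATCHING_RULE_IN_CHAIN = "1.2.840.113556.1.4.1941"


def member_of_filter(object_class, group_dn, matching_rule=MATCHING_RULE_IN_CHAIN):
    """Build an LDAP extensible match filter on memberOf.

    The shape is: (&(objectClass=<class>)(memberOf:<rule>:=<group DN>))
    Inputs are inserted verbatim; no RFC 4515 escaping is performed.
    """
    return f"(&(objectClass={object_class})(memberOf:{matching_rule}:={group_dn}))"


print(member_of_filter("class", "CN=GroupName,OU=emqx,DC=WL,DC=com"))
```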
Data Integration
#14166 Added support for configuring exchange and routing_key as template values in the RabbitMQ producer. This enables dynamic routing based on message payloads. For example, to dynamically set the routing_key based on a field in the payload, configure it as ${payload.akey}.
Note: In batch mode, the exchange and routing_key template values must remain constant for all messages in the batch. This ensures consistent routing and avoids conflicts during batch processing.
#14176 Exposed additional metadata for RabbitMQ source actions in the rule engine, including queue, exchange, and routing_key. This allows users to access these fields directly in their rules for enhanced processing and routing logic. Example:
select *, queue as payload.queue, exchange as payload.exchange, routing_key as payload.routing_key from "$bridges/rabbitmq:test"
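The template behaviour described for #14166 can be sketched as follows. This is an illustration only: the render and batch_route helpers are hypothetical, and the batch check is just one way to express the rule that exchange and routing_key must be constant within a batch.

```python
import re

# Matches placeholders of the form ${a.b.c}.
_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render(template, message):
    """Replace each ${dotted.path} with the value found in `message`."""
    def lookup(match):
        value = message
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)
    return _PLACEHOLDER.sub(lookup, template)


def batch_route(exchange_tpl, routing_key_tpl, batch):
    """Return the single (exchange, routing_key) pair shared by a batch.

    Raises ValueError when the rendered values differ between messages.
    """
    routes = {(render(exchange_tpl, m), render(routing_key_tpl, m)) for m in batch}
    if len(routes) != 1:
        raise ValueError("exchange and routing_key must be constant within a batch")
    return routes.pop()


msg = {"payload": {"akey": "sensors.temp"}}
print(render("${payload.akey}", msg))
print(batch_route("amq.topic", "${payload.akey}", [msg, msg]))
```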
#14218 Introduced vhost-style bucket access and improved redirect handling for S3-compatible storage providers. These improvements are now available in S3 Bridges and File Transfer backend configurations.
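For readers unfamiliar with the two S3 addressing styles mentioned in #14218, the difference is only where the bucket name appears in the URL. The sketch below uses a hypothetical endpoint and bucket, and does not reflect how EMQX builds its requests.

```python
def object_url(endpoint_host, bucket, key, vhost_style):
    """Build an object URL in virtual-hosted style or path style."""
    if vhost_style:
        # Bucket is part of the host name.
        return f"https://{bucket}.{endpoint_host}/{key}"
    # Bucket is the first path segment.
    return f"https://{endpoint_host}/{bucket}/{key}"


print(object_url("s3.example.com", "iot-data", "2024/12/file.csv", vhost_style=True))
print(object_url("s3.example.com", "iot-data", "2024/12/file.csv", vhost_style=False))
```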
Configuration
#14195 Added support for client ID override.
EMQX now provides greater flexibility by allowing custom client ID overrides using the mqtt.clientid_override={Expression} configuration. This introduces a more dynamic approach to client ID management. As part of this update, the use_userid_as_clientid and peer_cert_as_clientid options are deprecated, though they will remain available for compatibility until version 6.0.
MQTT over QUIC
- #14283 Improved QUIC transport by upgrading quicer to 0.1.9.
  - Early release of remote stream resources in abnormal scenarios.
  - Added more troubleshooting APIs. For more details, see: https://github.com/emqx/quic/compare/0.1.6...0.1.9.
Bug Fixes
Core MQTT Functionalities
#14201 Prevented the check_gc warning from appearing when a WebSocket connection encounters a rate limit.
#14215 Fixed an issue where calls to the retainer (via REST or CLI) would throw an exception if it was disabled.
#14223 Ensured the WebSocket close reason is returned as an atom to avoid crashes.
#14260 Resolved a rare race condition that could cause the connection process to crash if the CONNECT packet was not fully received before the idle timeout (default 15 seconds) expired.
#14268 Fixed another rare race condition that could cause the WebSocket connection process to crash when the CONNECT packet was not fully received before the idle timeout expired.
#14266 Updated emqtt from version 1.13.0 to 1.13.5. For more details, please refer to the emqtt changelog.
Durable Sessions
#14160 Ensured that topic matching rules for durable session subscriptions are properly applied to topics starting with the $ symbol, in accordance with the MQTT specification.
#14229 Fixed several issues in the Raft/RocksDB backend implementation for Durable Storage, which could have affected the correctness and replica convergence of internal databases used by Durable Shared Subscriptions in certain cases.
#14298 Improved fault tolerance for transient remote shard failures in the DS Raft/RocksDB backend, preventing durable session crashes that occurred when polling shards for updates.
REST API
- #14117 Fixed an issue in the REST API documentation where the Users endpoint was incorrectly listed as supporting Basic Authentication.
Authentication
- #14314 Fixed the scram:http authentication, which was previously non-functional.
- #14305 Removed support for the hashing algorithms MD4, MD5, and RIPEMD-160 from authentication, as they are not compliant with the NIST Secure Hash Standard.
Rule Engine
- #14217 Fixed errors in the example configurations for the schema registry endpoints.
Data Integration
#14172 Resolved a potential race condition where testing a connector using the HTTP API could leave lingering resources if the HTTP request timed out.
#14178 Fixed an issue where configuration synchronization could become stuck on a particular node due to simultaneous deletion of rules across different nodes in the cluster.
#14226 Mitigated a scenario where, under high load, a node could lose track of resource metrics (e.g., action/source) and fail to recover without a restart. Now, when restarting a resource or resetting its metrics, the system attempts to recreate the lost metrics.
Additionally, warning logs related to metric failures, such as those for "hot-path" metrics like matched, are now throttled to prevent excessive log flooding.
#14265 Fixed an issue where a badkey error would occur when stopping a connector if the MQTT Source action failed to subscribe successfully.
#14296 Prevented ecpool_sup from being blocked by a slow-starting ecpool_worker.
#14126 Fixed an issue with prepared statements for Oracle integration. Prior to this fix, an invalid prepared statement (e.g., referencing a non-existent table column) could cause the action to apply the oldest prepared statement version, leading to inconsistencies.
#14181 Made Kafka and Pulsar producers more resilient to corrupted COMMIT files. If the COMMIT file is corrupted in disk mode buffers, it will now be ignored. While this may result in some previously sent messages being replayed, the producer will no longer crash.
Configuration
#14180 Fixed an issue with variform expressions returning 'undefined' when a variable is bound to the value undefined or null. Now, an empty string is returned instead.
#14289 Resolved a log file path issue when importing configurations from a different environment. The EMQX_LOG_DIR environment variable is set to /opt/emqx/log in Docker but /var/log/emqx/ when installed via RPM/DEB packages. Prior to this fix, log file paths (default file handler and audit handler) were environment-variable interpolated when being exported. This could cause crashes when importing configs into a different environment where the directory didn’t exist.
With this fix, log file paths are no longer environment-variable interpolated during export. Additionally, absolute log directory paths from older versions are now converted back to environment variables if the path doesn’t exist in the new environment.
#14313 Fixed an issue where EMQX could become stuck during startup due to reading the REST API bootstrap API keys file on a replicant node. Now, the bootstrap API keys file is only loaded on core nodes.
Gateway
- #14276 Enhanced error logging for failed JT/T808 message parsing, providing more detailed information for troubleshooting.
Extension
- #14243 Fixed an issue where the client.connect hook was not being triggered for some gateways.
MQTT over QUIC
#14258 Reduced the QUIC connection shutdown timeout. Previously, QUIC connections had a 5-second timeout for graceful shutdown. If the client was unresponsive, EMQX would log warnings like:
[warning] msg: session_stepdown_request_timeout, action: discard,
or potentially cause a timeout on the Dashboard when attempting to disconnect the client. The timeout has now been reduced to 1 second for "kick" actions and 3 seconds for other scenarios.
5.8.2
Release Date: 2024-11-12
Make sure to check the breaking changes and known issues before upgrading to EMQX 5.8.2.
Enhancements
Core MQTT Functionalities
#14059 Added a new configuration option for the retainer to cap message expiry intervals for retained messages. This enables garbage collection to remove messages sooner if storage is running low.
#14072 Updated the virtual machine to use Unicode for its printable range. This improvement enhances the readability of certain binary data in messages. For instance, a binary previously displayed as <<116,101,115,116,228,184,173,230,150,135>> will now be formatted as <<"test中文"/utf8>>, providing clearer representation.
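The byte list in the #14072 example is the UTF-8 encoding of the text shown next to it, which a short sketch can confirm. The format_binary helper is hypothetical and only imitates the two display forms from the example; it is not the virtual machine's actual printing logic.

```python
def format_binary(data: bytes) -> str:
    """Render bytes as readable UTF-8 text when possible, else as a byte list."""
    try:
        return '<<"%s"/utf8>>' % data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to the numeric form.
        return "<<" + ",".join(str(b) for b in data) + ">>"


raw = bytes([116, 101, 115, 116, 228, 184, 173, 230, 150, 135])
print(raw.decode("utf-8"))
print(format_binary(raw))
```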
MQTT Durable Sessions
#14130 Reduced CPU usage for idle durable sessions.
Previously, idle durable sessions periodically woke up to refresh the list of DS streams. With this change, stream discovery is now event-based, significantly lowering CPU consumption during idle periods. Additionally, the update reduces the delay in notifying sessions of new streams, effectively eliminating the long-tail latency in end-to-end processing.
REST API
#13889 Enhanced the performance of the /api/v5/monitor_current and /api/v5/metrics APIs.
Previously, these APIs queried clustered nodes sequentially in a loop. Now, the queries are sent in parallel, reducing response time. The latency is now primarily dependent on the slowest node in the cluster.
Additionally, a node parameter was added to the /api/v5/monitor_current API, allowing targeted queries to a single node instead of the entire cluster. For instance, using ?aggregate=false&node=emqx@node1.domain.name will return data exclusively for the specified node.
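A minimal sketch of building the single-node query from #13889 is shown below. The base URL is an assumption for illustration, and the helper name is hypothetical. Note that a standard URL encoder percent-encodes the @ in the node name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def monitor_current_url(base_url, node=None):
    """Return the monitor_current URL, optionally scoped to one node."""
    url = f"{base_url}/api/v5/monitor_current"
    if node is None:
        # No node given: query the whole cluster.
        return url
    return url + "?" + urlencode({"aggregate": "false", "node": node})


# Base URL is an assumed local deployment address.
url = monitor_current_url("http://localhost:18083", "emqx@node1.domain.name")
print(url)
print(parse_qs(urlsplit(url).query))
```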
EMQX Clustering
- #13903 Added logs to inform the user when a replicant node cannot find a core node with the same release version as its own.
Security
#13923 Added zone support in authentication, authorization, and mountpoint templates.
Previously, to reference a client's zone in authentication or authorization rules, users needed to access it through client_attrs. Now, the ${zone} placeholder can be used directly in these templates, simplifying rule creation and enabling zone-specific configurations.
For example, the following ACL rule uses ${zone} to dynamically apply permissions based on a client’s assigned zone: {allow, all, all, ["${zone}/${username}/#"]}.
#14102 Added support for SSL private key passphrase from a secret file.
EMQX can now read the passphrase from a secret file if password is configured as ...ssl.password = "file://{path-to-secret-file}".
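To make the ${zone} placeholder from #13923 concrete, the sketch below expands the topic pattern of the example ACL rule for one client. It relies on Python's string.Template, which happens to use the same ${name} syntax; the client values are made up, and this is not how EMQX evaluates its templates.

```python
from string import Template


def render_topic(topic_template, client):
    """Expand ${zone} and ${username} in an ACL topic pattern."""
    return Template(topic_template).substitute(client)


# Hypothetical client attributes.
client = {"zone": "factory_a", "username": "alice"}
print(render_topic("${zone}/${username}/#", client))
```

With this rule, a client in another zone gets a different topic prefix without any change to the rule itself.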
Data Integration
#14065 Added a new queuing_bytes metric for data integration. This metric shows the RAM and/or disk resources consumed by buffering for a specific action. Currently, the Pulsar Producer action is the only action that does not support this metric.
#14044 Enhanced the IoTDB Thrift driver to support multiple addresses in the server field. If the current connection fails, the driver will attempt to connect to the next address.
#14048 Improved resilience for Kafka/Confluent/Azure Event Hub Producer actions. Once these actions are successfully created and in a healthy state, they will no longer be marked unhealthy upon detecting an unknown topic. Instead, they will continue queuing messages under such conditions.
#14079 Added the max_wait_time setting option for Kafka Consumer sources, allowing users to configure the maximum duration to wait for a fetch response from the Kafka broker.
Observability
- #14096 Exposed emqx_conf_sync_txid as a Prometheus metric, allowing for monitoring the configuration file synchronization status of each node in the cluster.
MQTT over QUIC
#13814 Connection Scope Keepalive for MQTT over QUIC Multi-Stream:
Introduced a new feature to keep MQTT connections alive when data streams remain active, even if the control stream is idle.
Previously, clients were required to send MQTT.PINGREQ on idle control streams to keep the connection alive. Now, a shared state tracks activity across all streams for each connection. This shared state is used to determine if the connection is still alive, reducing the risk of keepalive timeouts due to Head-of-Line (HOL) blocking.
#13984 The Quicer NIF library now links to the system libcrypto, enhancing security, performance, and compatibility with system OpenSSL updates.
Note: This change does not apply to RHEL 7/CentOS 7, as they continue to use OpenSSL 1.0.x.
#14112 Added support for ssl_options.hibernate_after in the QUIC listener to reduce the memory footprint of the QUIC transport.
Bug Fixes
Core MQTT Functionality
- #13931 Updated the gen_rpc library to version 3.4.1, which includes a fix to prevent client socket initialization errors from escalating to the node level on the server side.
- #13969 Optimized the periodic cleanup of expired retained messages to ensure efficient resource usage, particularly in cases with a large volume of expired messages.
- #14068 Added the handle_frame_error/2 callback to all gateway implementation modules to handle message parsing errors.
- #14037 Improved the internal database bootstrap process to better tolerate temporary unavailability of peer nodes, particularly when a new node joins an existing cluster.
- #14116 Fixed an issue where the default configuration for the retainer was generated incorrectly after joining a cluster.
MQTT Durable Sessions
- #14042 Fixed a crash in the durable session after updates to subscription parameters (such as QoS, no_local, upgrade_qos, ...).
- #14052 Corrected memory usage reporting from cgroups when in use.
- #14055 Updated the /clients_v2 API to properly respect all filtering arguments when querying offline clients with durable sessions. Previously, only the username filter was applied, while other filtering arguments were ignored.
- #14151 Fixed handling of the conn_state filter in the /clients_v2 API for offline clients with durable sessions. Previously, these clients could be incorrectly selected with conn_state=connected.
- #14057 Resolved a compatibility issue that prevented the Messages DS database from starting due to a slightly different database configuration schema. This issue occurred when upgrading EMQX from version 5.7.x with session durability enabled.
REST API
#14023 Fixed an issue with the GET /monitor HTTP API where returned values could appear higher than actual values, depending on the requested time window. For data points within a 1-hour window, this distortion is only visual on the Dashboard. However, for data points older than 1 hour, the data distortion is permanent.
The affected metrics include:
disconnected_durable_sessions
subscriptions_durable
subscriptions
topics
connections
live_connections
EMQX Clustering
- #13996 Fixed an intermittent crash occurring when using emqx conf fix to resolve configuration discrepancies, particularly if a configuration key was missing on one of the nodes.
Security
- #13922 Updated the CRL (Certificate Revocation List) cache to use the full Distribution Point (DP) URL as the cache key. Previously, only the path part of the URL was used, causing conflicts when multiple DPs shared the same path.
- #13924 Fixed an issue where JWK keys could leak into debug logs upon JWT authentication failure.
- #13998 Fixed an issue that caused the SSO feature to crash if OIDC was configured with invalid settings.
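The CRL cache-key conflict fixed in #13922 is easy to reproduce in miniature. The sketch below uses two made-up distribution point URLs that share a path and compares a path-only key with a full-URL key; the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def path_key(dp_url):
    """Old behaviour: key on the path part of the URL only."""
    return urlsplit(dp_url).path


def full_key(dp_url):
    """New behaviour: key on the full distribution point URL."""
    return dp_url


# Two hypothetical CAs publishing a CRL under the same path.
urls = ["http://ca1.example.com/crl.pem", "http://ca2.example.com/crl.pem"]

old_cache = {path_key(u): f"CRL from {u}" for u in urls}
new_cache = {full_key(u): f"CRL from {u}" for u in urls}

print(len(old_cache))  # second entry overwrote the first
print(len(new_cache))  # both entries kept
```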
Data Integration
#13916 Fixed an issue where the parent metric failed was not incremented when a rule’s failed.no_result or failed.exception metrics were updated.
#14001 Resolved a race condition where a resource (such as a connector, action, source, authentication, or authorization) could falsely report a connected, healthy channel after a brief disconnection. This issue could result in excessive action_not_found log entries when the race condition occurred.
#13913 Fixed an issue with the actions and source HTTP APIs where a 500 status code would be returned if a timeout occurred while attempting to update or delete a resource.
#14101 Resolved an issue where deleting a resource would fail if a source and an action were both created with the same name.
#13901 Fixed prepared statements for Postgres integration. Previously, if an invalid prepared statement was used while updating a Postgres integration action (e.g., referencing an unknown table column), it could cause the action to apply an outdated prepared statement from a previous version.
#14005 Fixed an issue where the IoTDB Thrift driver failed to function with SSL enabled.
#14125 For IoTDB, as the Thrift driver has never supported async mode, an error log will now be generated if async mode is specified in the configuration.
#14008 Resolved a potential race condition in actions with aggregation mode (e.g., S3, Azure Blob Storage, Snowflake) that could result in an aggregated batch being skipped during upload.
#14015 Fixed an issue where a Kafka/Confluent/Azure Event Hub Producer action with disk buffering would not send queued messages after a restart until a new message arrived. This fix applies to actions with a fixed topic (i.e., without placeholders). Additionally, prior to EMQX 5.7.2, disk-buffered messages for Kafka/Confluent/Azure Event Hub Producer actions were stored in a different directory structure. Now, upon detecting an old disk buffer directory, EMQX will automatically rename it to the current structure to prevent data loss.
#14069 Fixed prepared statements for Cassandra integration. Previously, when SQL templates in EMQX actions were modified, the updated statements could not be prepared for Cassandra, causing write failures.
#14079 Resolved a latency issue with Kafka consumers when multiple partitions shared the same partition leader in Kafka. Previously, fetch requests were blocked because Kafka only allows one in-flight fetch request per connection, leading to head-of-line blocking. This fix ensures that each partition consumer establishes its own TCP connection to the partition leader, preventing delays when partitions share the same leader broker.
#14106 Added a validation that forbids a single Kafka Consumer connector from containing sources with repeated Kafka topics.
#14120 Improved handling of timeouts during Pulsar Connector health checks to reduce unnecessary log noise. Previously, timeouts could generate repetitive error logs.
Observability
#13909 Fixed log formatting for cases where the payload cannot be displayed as readable UTF-8 Unicode characters.
#14061 Improved log information when emqx_cm:request_stepdown/3 fails.
In scenarios where a client channel needs to terminate another channel with the same ClientID, a race condition may occur if the target channel has already been closed or terminated. In such cases, error logs and stack traces that provide no useful information will no longer be generated.
#14070 Removed the connector's state from error and warning logs due to its potential length. For issue analysis, the connector's state can now be accessed through emqx_resource:list_instances_verbose/0. Below is an example of a log entry before this change:
pid: <0.43914.0>, connector: connector:sqlserver:connector-05a2e105, reason: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Argument data type varchar is invalid for argument 2 of dateadd function. SQLSTATE IS: 42000, state: {"resource_opts":{"start_timeout":5000,"start_after_created":true,"health_check_interval":15000},"pool_name":"connector:sqlserver:connector-05a2e105","installed_channels":{"action:sqlserver:action-4b033621:connector:sqlserver:connector-05a2e105":{"sql_templates":{"batch_insert_temp":{"send_message":{"batch_insert_tks":["{str,<<\" ( \">>}","{var,[<<\"messageId\">>]}","{str,<<\", \">>}","{var,[<<\"measurement\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_Fault_1\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_Fault_2\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_Fault_3\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_Fault_4\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_PV_1\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_PV_2\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_PV_3\">>]}","{str,<<\", \">>}","{var,[<<\"Analog_IN_PV_4\">>]}","{str,<<\", DATEADD(MS, \">>}","{var,[<<\"ms_shift\">>]}","{str,<<\", DATEADD(S, \">>}","{var,[<<\"s_shift\">>]}","{str,<<\", '19700101 00:00:00:000') ))\">>}"],"batch_insert_part":"insert into TransactionLog(MessageId, Measurement, Fault1, Fault2, Fault3, Fault4, Value1, Value2, Value3, Value4, DateStamp) \r\n"}}}}}},msg: invalid_request
#14099 Removed an error-level log entry that was triggered when validation of UTF-8 strings in MQTT messages failed.
#14091 Implemented a fix to remove function_clause from log messages when users provide unsupported write syntax.
Example of unsupported syntax:
weather,location=us-midwest,season=summer temperature=82 ${timestamp}u
Before this fix, the error log would contain the function_clause error, as shown:
pid: <0.558392.0>, info: {"stacktrace":["{emqx_bridge_influxdb_connector,parse_timestamp,[[1719350482910000000,<<\"u\">>]],[{file,\"emqx_bridge_influxdb_connector.erl\"},{line,692}]}", ...], ..., "error":"{error,function_clause}"}, tag: ERROR, msg: resource_exception
This change improves log clarity by omitting function_clause in cases of syntax errors.
Audit Log
- #14152 Implemented log truncation to prevent storage of excessively lengthy audit log content.
Cluster Linking
- #14004 Fixed an issue in Cluster Linking where overlapping topic filters in the topics configuration caused inconsistent and incomplete cross-cluster message routing. Each topic filter is now handled individually; however, overlapping filters may introduce additional complexity in cross-cluster routing.
- #13929 Fixed an issue where a cluster link could occasionally become stuck and stop working until a manual restart of the upstream cluster was performed.
5.8.1
Release Date: 2024-10-14
Make sure to check the breaking changes and known issues before upgrading to EMQX 5.8.1.
Important Changes
- #13956 Updated the gen_rpc library to version 3.4.1, which includes a fix for a node crash issue. Previously, if a node was forcibly shut down while RPC channels were being established, it could cause a cluster peer node to crash.
Enhancements
Core MQTT Functionalities
#13525 Added a new configuration item shared_subscription_initial_sticky_pick to specify the strategy for making the initial pick when shared_subscription_strategy is set to sticky.
#13942 The HTTP client now automatically reconnects if no activity is detected for 10 seconds after the latest request has expired. Previously, it would wait indefinitely for a server response, causing timeouts if the server dropped requests.
This change affects the following components:
- HTTP authentication
- HTTP authorization
- Webhook (HTTP connector)
- GCP PubSub connector
- S3 connector
- InfluxDB connector
- Couchbase connector
- IoTDB connector
- Snowflake connector
Authentication and Authorization
#13863 EMQX now supports the ${cert_common_name} placeholder in topic name templates for raw ACL rules.
#13864 Added support for using memberOf syntax in LDAP query filters.
#13810 Added client-info authentication.
Client-info (of type cinfo) authentication is a lightweight authentication mechanism that checks client properties and attributes against user-defined rules. The rules use Variform expressions to define match conditions and the authentication result returned when a match is found. For example, to quickly fence off clients without a username, the match condition can be str_eq(username, '') associated with a check result deny.
#13792 The banned-clients API GET /banned supports querying the rules using filters in the query string.
The available filters are:
- clientid
- username
- peerhost
- like_clientid
- like_username
- like_peerhost
- like_peerhost_net
When adding a new banned client entry, the default expiration time for entries without the until parameter specified has been changed from 1 year to infinite.
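The client-info (cinfo) rule described under #13810 above, a match condition paired with a result, can be approximated as follows. This is a loose sketch: the rule representation, the authenticate helper, and the default result "ignore" are assumptions, and str_eq is a plain Python stand-in for the Variform function of the same name.

```python
def str_eq(a, b):
    """Stand-in for the Variform str_eq function: string equality."""
    return str(a) == str(b)


# Each rule pairs a match condition with a check result.
# This one fences off clients that connect without a username.
RULES = [
    (lambda client: str_eq(client.get("username", ""), ""), "deny"),
]


def authenticate(client, rules, default="ignore"):
    """Return the result of the first matching rule, else `default`."""
    for is_match, result in rules:
        if is_match(client):
            return result
    return default


print(authenticate({"clientid": "c1", "username": ""}, RULES))
print(authenticate({"clientid": "c2", "username": "alice"}, RULES))
```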
Rule Engine
#13773 Disabled rule actions now do not trigger out_of_service warnings.
Previously, if an action was disabled, there would be a warning log with msg: out_of_service, and the actions.failed counter was incremented for the rule.
After this enhancement, a disabled action will result in a debug level log with msg: discarded, and the newly introduced counter actions.discarded will be incremented.
#13804 Added support for using Confluent Schema Registry as an external provider in our Schema Registry.
Data Integrations
#13716 Introduced a Thrift driver for the IoTDB connector.
#13745 EMQX now supports data integration with Snowflake.
#13783 Lowered Kafka producer buffer RAM usage when running in async mode.
#13861 Added a new configuration item undefined_vars_as_null to some of the data integration actions, to ensure that undefined variables in the SQL templates are treated as NULL when writing data into databases.
The following Sink actions are added with this configuration item:
- MySQL
- ClickHouse
- SQLServer
- TDengine
- DynamoDB
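The effect of undefined_vars_as_null (#13861) can be sketched with a toy SQL template renderer. The helper names and the table are hypothetical, and the behaviour with the option disabled (rendering the text 'undefined') is an assumption made for contrast, not something stated above.

```python
import re

# Matches simple placeholders of the form ${name}.
_PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def sql_literal(value):
    """Render numbers as-is and everything else as a quoted string."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def render_sql(template, data, undefined_vars_as_null=True):
    """Fill ${name} placeholders; missing names become NULL when enabled."""
    def replace(match):
        name = match.group(1)
        if name in data:
            return sql_literal(data[name])
        # Assumed fallback when the option is off: the text 'undefined'.
        return "NULL" if undefined_vars_as_null else "'undefined'"
    return _PLACEHOLDER.sub(replace, template)


tpl = "INSERT INTO t(id, temp) VALUES (${id}, ${temp})"
print(render_sql(tpl, {"id": 7}))
print(render_sql(tpl, {"id": 7}, undefined_vars_as_null=False))
```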
MQTT over QUIC
#13814 Connection Scope Keepalive for MQTT over QUIC Multi-Stream:
This update introduces a new feature to maintain MQTT connections over QUIC multi-streams, even when the control stream is idle but other data streams are active.
Previously, clients had to send MQTT.PINGREQ on idle control streams to keep the connection alive. Now, a shared state is maintained for each connection, monitoring activity across all streams. This shared state helps determine if the connection is still active, reducing the risk of keepalive timeouts caused by Head-of-Line (HOL) blocking and improving overall connection stability.
Durable Storage
- #13788 Prevented the DS Shared Subscriptions application from running its full startup sequence when the respective features are disabled. This also avoids initializing the internal database, which would otherwise be created and could occupy a lot of disk space.
Cluster Linking
- #13835 Added a PUT /cluster/links/link/:name/metrics/reset HTTP API endpoint that resets metrics for a given cluster link.
Dashboard
#13873 Improved /api/v5/monitor endpoint performance.
This update enhances the performance of the Dashboard's monitor page, particularly in clusters with a large number of nodes, where timeouts were previously common.
Key improvements include:
- Implemented concurrent RPC calls to retrieve metrics from nodes across the cluster.
- Introduced data downsampling to reduce the density of data points based on the requested query time span:
  - 10s intervals for the last 1h
  - 1m intervals for the last 1d
  - 5m intervals for the last 3d
  - 10m intervals for the last 7d
- Added dummy data points for periods when EMQX was stopped, ensuring visibility of gaps in the timeline on the dashboard.
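The downsampling tiers listed for #13873 amount to a lookup from the requested time span to a sampling interval. The sketch below encodes that table; the function name is hypothetical, and treating each boundary as inclusive is an assumption.

```python
HOUR = 3600
DAY = 24 * HOUR


def downsample_interval(span_seconds):
    """Return the sampling interval in seconds for a query time span."""
    if span_seconds <= HOUR:
        return 10        # 10s intervals for the last 1h
    if span_seconds <= DAY:
        return 60        # 1m intervals for the last 1d
    if span_seconds <= 3 * DAY:
        return 5 * 60    # 5m intervals for the last 3d
    return 10 * 60       # 10m intervals for the last 7d


for span in (HOUR, DAY, 3 * DAY, 7 * DAY):
    print(span, downsample_interval(span), span // downsample_interval(span))
```

The last printed column is the number of data points per series, which stays in the hundreds to low thousands across all four tiers.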
Enterprise License
- #13910 Performance enhancement for Enterprise edition license check.
Bug Fixes
Core MQTT Functions
#13702 Clean up the corresponding exclusive subscriptions when a node goes down.
#13708 Fixed an issue which may cause shared subscription 'sticky' strategy to degrade to 'random'.
#13733 Made cacertfile optional when configuring an https listener from the emqx ctl conf load command.
#13742 Fixed an issue where a client would receive retained messages for topics starting with $ when it subscribed to topic # or +.
#13754 Fixed an issue where WebSocket connections would consistently break on their own.
#13756 Introduced more randomness to broker assigned client IDs.
#13790 The default heartbeat interval for the MQTT connector has been reduced from 300 seconds to 160 seconds.
This change helps maintain the underlying TCP connection by preventing timeouts due to the idle limits imposed by load balancers or firewalls, which typically range from 3 to 5 minutes depending on the cloud provider.
#13832 Fixed a 500 error when using the /publish REST API endpoint with persistent sessions enabled.
#13842 Fixed a UTF-8 string validation exception.
#13956 Updated the gen_rpc library to version 3.4.1, which includes a fix to prevent client socket initialization errors from escalating to the node level on the server side.
Upgrade and Migration
- #13731 Resolved an issue that prevented clusters running on EMQX 5.4.0 from upgrading to EMQX 5.8.0. This fix introduces a migration procedure to update specific internal database tables created in version 5.4.0 to align with the new schema.
Authentication
- #13726 Upgraded Kerberos authentication library to use MEMORY type cache instead of FILE type which sometimes fails when authentication requests are initialized concurrently.
Rule Engine
- #13735 Improved message transformation error messages when payload to be decoded is invalid.
- #13769 Fixed an issue where using JSON schema validation (draft 3) with the extends property would lead to validation always failing.
Data Integrations
#13851 Fixed an exception of Test connectivity with the IoTDB Thrift driver.
#13724 Azure Blob Storage and S3 actions in aggregated mode now trigger sending aggregated data sooner after the maximum number of records is reached.
#13734 Made Azure Blob Storage connector configuration error messages more friendly.
#13736 Upgraded Kafka producer to support client re-authentication. See kafka_protocol#122.
Also fixed the following minor issues:
- unexpected_info error log in PR#13727 and wolff#74.
- einval crash report of Kafka connection due to a race condition kafka_protocol#124.
#13896 Upgraded the Pulsar client from 0.8.3 to 0.8.4 (see pulsar#61).
Prior to this fix, if the producer client experienced a 'socket error' (but not a normal 'socket close'), it could continue sending data to a closed socket without any error handling. From the EMQX Dashboard, users could observe that the 'total' counter kept increasing while the other counters 'success', 'failed' and 'dropped' did not.
#13897 Microsoft SQL Server connector is now compatible with Microsoft ODBC 18.
#13902 Fixed an issue with prepared statements in MySQL integration.
Before this fix, if an invalid prepared statement was used (e.g., referencing an unknown table column) when updating a MySQL integration action, it could cause the action to revert to using the oldest version of a previously prepared statement.
#13906 Fixed an issue with prepared statements in PostgreSQL integration.
Before this fix, if an invalid prepared statement was used (e.g., referencing an unknown table column) when updating a Postgres integration action, it could cause the action to apply the oldest version of a previously prepared statement.
#13921 Fixed an issue where changing the sync_timeout parameter for the Pulsar producer action would not have the expected effect on request timeout.
Additionally, deprecated the resource_opts.request_ttl configuration for the Pulsar producer action, as it did not influence the request TTL as expected (this is handled by the retention_period setting). This change helps prevent potential confusion for users.
#13959 Upgraded the Pulsar client from 0.8.4 to 0.8.5 (see pulsar#62). This update fixed an issue where, under certain race conditions, the producer could fail to communicate with the client process. As a result, the client process could stop unexpectedly and not restart automatically. The only solution before this fix was to manually restart the process.
#13965 Fixed a function clause error that occurred when using the batch mode as the data writing method in IoTDB Sink.
#13971 Fixed a Kafka producer bug introduced in EMQX Enterprise 5.8.0, where the producer could crash if it failed to fetch metadata during the initialization stage.
#13973 Fixed an issue in the Microsoft SQL Server integration where EMQX would log multiple errors and warnings each time the connection to the server was stopped.
Management and Operation
- #13963 Fixed the following issues with the Audit Log feature:
- The Audit Log feature was incompatible with the Single Sign-On (SSO) feature, causing exceptions for each SSO event.
- Illegal access attempts (e.g., GET requests to POST-only endpoints) were not being logged.
Cluster Linking
- #13888 Fixed an issue that prevented updating a cluster link without clientid via its HTTP API.
- #13927 Fixed an issue where the Cluster Link bootstrap process could crash if the local cluster had one or more very crowded topics.
5.8.0
Release Date: 2024-08-28
Please read the Known Issues of 5.8 before upgrading.
Enhancements
Cluster Linking
- #13126 Introduced the Cluster Linking feature, which allows multiple, separate EMQX clusters to connect and communicate with each other. This feature enables efficient message exchange between clients across different clusters, even if they are geographically distributed, improving the flexibility and reach of your MQTT deployments.
Core MQTT Functionality
- #13009 Updated the log level for message receiving pause due to rate limiting from debug to warning. The log message socket_receive_paused_by_rate_limit is throttled to avoid excessive logging.
Authentication and Authorization
#12418 Enhanced JWT authentication to support claims verification using a list of objects:
[ { name = "claim_name", value = "${username}" }, ... ]
Expected values are now treated as templates, consistent with other authenticators, allowing for arbitrary expressions such as ${username} and ${clientid}. Previously, only the fixed "${username}" and "${clientid}" values were supported for interpolation.
Improved the documentation for the verify_claims parameter.
#13229 Added support for the ${cert_pem} placeholder in authentication templates.
#13324 The EMQX Dashboard can now integrate with identity services that support the OIDC protocol, such as Okta, to enable OIDC-based Single Sign-On (SSO).
#13534 Added trace logging to indicate when the superuser bypasses the authorization check.
#13601 Added support for Kerberos authentication in EMQX using the GSSAPI mechanism (SASL-GSSAPI with Kerberos V5). This enhancement allows MQTT clients and servers to authenticate securely over a non-secure network using the GSSAPI-KERBEROS method.
Data Integrations
#13144 Changed the log level to warning and added throttling for the log message data_bridge_buffer_overflow when bridge buffers overflow and messages are dropped. Previously, these events were logged at the info level and were not visible with the default log settings.
#13492 Enhanced the GET /connectors and GET /connectors/:id APIs to include lists of actions and sources that depend on a specific connector. Additionally, the GET /actions, GET /sources, GET /actions/:id, and GET /sources/:id APIs now return the list of rules associated with a specific action or source.
#13505 Added the ability to filter rules in the HTTP API based on the IDs of data integration actions or sources used.
#13506 Introduced the peername field to all rule engine events that already include the peerhost field. The peername field is a string formatted as IP:PORT.
#13516 Added a direct_dispatch argument to the republish action.
When direct_dispatch is set to true (or rendered as true from a template), the message is dispatched directly to subscribers. This feature helps prevent the triggering of additional rules or the recursive activation of the same rule.
#13573 Introduced client_attrs to the SQL context for client connectivity events and the message publish event. Users can now access client attributes within rule SQL statements, such as SELECT client_attrs.attr1 AS attribute1, and utilize ${attribute1} in data integration actions.
#13640 Added two new SQL functions for rules: coalesce/2 and coalesce_ne/2.
These functions simplify handling null values in rule SQL expressions. For instance, instead of using:
SELECT CASE WHEN is_null(payload.path.to.value) THEN 0 ELSE payload.path.to.value END AS my_value
you can now write a more concise expression:
SELECT coalesce(payload.path.to.value, 0) AS my_value
#12959 Introduced a new option to configure a dedicated topic for health check purposes in Kafka Producer connectors. This feature enables more precise detection of connection issues with partition leaders, such as incorrect or missing credentials that could prevent establishing the connections.
#12961 Added a configuration option to customize group IDs in advance for Kafka Consumer sources.
#13069 EMQX now supports data integration with Azure Blob Storage.
#13199 Implemented the Message Transformation feature. This feature allows users to transform and enrich incoming messages just by using simple variform syntax, without the need to define SQL rules in the Rule Engine.
Example Use Case: Suppose you receive a message encoded in Avro format, and you want to decode it into JSON. After decoding, you want to prepend a tenant attribute (retrieved from the client attributes of the publishing client) to the topic before processing the message in the Rule Engine. With this new feature, you can achieve this transformation with the following configuration:
message_transformation {
  transformations = [
    {
      name = mytransformation
      failure_action = drop
      payload_decoder = {type = avro, schema = myschema}
      payload_encoder = {type = json}
      operations = [
        {key = "topic", value = "concat([client_attrs.tenant, '/', topic])"}
      ]
    }
  ]
}
This configuration specifies a transformation named mytransformation that:
- Decodes the message payload from Avro format using a specified schema.
- Encodes the payload into JSON format.
- Concatenates the tenant attribute from client attributes with the original topic, thereby modifying the topic before further processing.
#13415 EMQX now supports data integration with Couchbase.
#13463 Enhanced the GCP PubSub Producer action to automatically retry requests when receiving HTTP status codes 502 (Bad Gateway) or 503 (Service Unavailable) from PubSub. The retries will continue until the request is successful or the message's Time-To-Live (TTL) is reached.
#13546 Added a configurable option for the query mode in the Pulsar Producer action, allowing users to customize how data is queried before it is sent to the Pulsar service.
#13650 EMQX now supports data integration with DataLayers.
Operations
- #13202 Introduced the emqx ctl conf cluster_sync fix command to address cluster configuration inconsistencies. This command synchronizes the configuration of all nodes with the configuration of the node that has the highest tnx_id, ensuring consistency across the cluster.
- #13250 Added a new value for cluster.discovery_strategy: singleton. By choosing this option, there will be effectively no clustering, and the node will reject connection attempts to and from other nodes.
- #13370 Added a new version of wildcard_optimized storage layout for durable storage, offering the following improvements:
  - The new layout does not have an inherent latency.
  - MQTT messages are serialized into a more space-efficient format.
- #13524 Added the emqx ctl exclusive CLI interface to manage exclusive topics more effectively. It allows administrators to better manage and troubleshoot exclusive topic subscriptions, ensuring that subscription states are accurately reflected and preventing unexpected failures.
- #13597 Added thin wrapper functions for plugins to store and manage the certificate files used by the plugins themselves. This fix prevents plugin certificates from being inadvertently deleted by the certificate garbage collection (GC) function.
- #13626 Added a new command emqx ctl listeners enable <Identifier> <Bool> to enable/disable a listener.
- #13493 Upgraded the RPC library gen_rpc to version 3.4.0. This update changes the default RPC server socket option from true to active-100, which introduces back-pressure to peer nodes when the RPC server experiences heavy load.
- #13665 Added a new metric emqx_actions_count to the prometheus endpoint. It contains the number of all actions added by all rules, including Republish actions and Console Output actions.
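As a rough illustration of the selection rule described for emqx ctl conf cluster_sync fix in #13202, the sketch below picks the node with the highest tnx_id and copies its configuration to every other node. The data layout (a dict of node name to a record with tnx_id and conf) and the function names are assumptions made for this example only; they do not reflect EMQX internals.

```python
from typing import Any, Dict


def pick_sync_source(nodes: Dict[str, Dict[str, Any]]) -> str:
    """Return the name of the node with the highest tnx_id (hypothetical layout)."""
    return max(nodes, key=lambda name: nodes[name]["tnx_id"])


def cluster_sync_fix(nodes: Dict[str, Dict[str, Any]]) -> str:
    """Copy the configuration of the most advanced node to all nodes.

    Returns the name of the node that was used as the reference.
    """
    source = pick_sync_source(nodes)
    reference_tnx_id = nodes[source]["tnx_id"]
    reference_conf = dict(nodes[source]["conf"])
    for state in nodes.values():
        state["tnx_id"] = reference_tnx_id
        state["conf"] = dict(reference_conf)
    return source


cluster = {
    "emqx@n1": {"tnx_id": 41, "conf": {"mqtt.max_packet_size": "1MB"}},
    "emqx@n2": {"tnx_id": 43, "conf": {"mqtt.max_packet_size": "2MB"}},
    "emqx@n3": {"tnx_id": 42, "conf": {"mqtt.max_packet_size": "1MB"}},
}
chosen = cluster_sync_fix(cluster)
```

After the call, every entry in cluster carries the configuration and tnx_id of the chosen node.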
Bug Fixes
Core MQTT Functionality
#12944 Fixed an issue that caused a crash when clients with non-UTF8 client IDs attempted to connect with strict_mode=false.
#13006 Improved the validation of retained, delayed, and taken-over session messages to ensure they comply with banned client ID rules implemented through regular expression matching. Previously, certain messages, such as those delayed due to network issues or taken over by another session, could bypass the client ID bans set by regular expressions.
Authentication and Authorization
#13024 Added a default ACL deny rule to reject subscriptions to the +/# topic pattern. Since EMQX by default rejects subscriptions to the # topic, for completeness, it should reject +/# as well.
#13040 Improved HTTP authentication:
- Improved error logging for cases where the HTTP Content-Type header is missing or unrecognized, providing more detailed information.
- Fixed an issue causing double encoding of query parameters in authentication HTTP requests.
- Enhanced error messages when a POST method with a JSON content type is configured for authentication requests but the JSON template fails to render into valid JSON. This can occur, for example, when a template contains a placeholder like ${password} but receives a non-UTF8 password input, leading to better transparency and easier debugging for such scenarios.
#13196 Added a limit to the built-in authorization database, restricting the number of Access Control List (ACL) rules per client or user to a default of 100.
#13584 Fixed an issue with creating HTTP authorization that resulted in errors when the HTTP header list was empty.
#13618 Improved the type specifications for the authorization/sources endpoint to provide clearer and more concise error messages.
#13624 Fixed an issue in the built-in authorizer where updating rules for a client or user could result in the total number of rules exceeding the max_rules limit.
#13678 Made the deletion of an authenticator in the chain an idempotent operation, ensuring that deleting a non-existing authenticator always succeeds.
Data Integrations
#13207 Improved the republish rule engine action to accurately reflect the success and failure of message publishing. Previously, the success metrics were incremented even when the republish action failed to deliver the message to any subscribers. Now, if the action detects that a message fails to reach any subscriber, the failure metrics are correctly incremented.
#13425 Improved the MQTT connector error log messages to provide clearer and more detailed information.
#13589 Fixed an issue where creating a rule with a string "null" for ID via the HTTP API was allowed, which could lead to an inconsistent configuration.
#13414 Improved the RabbitMQ connector error log messages to provide clearer and more detailed information.
File Transfer
- #12514 Fixed the issue with the file transfer command result reporting to the $file-response/${clientid} channel. Previously, if a channel issued an assemble command and then disconnected before the assembly process finished, the status message would be lost and not sent to the response topic. Now, the assembly status is monitored by a dedicated process, ensuring the status message is reliably delivered even if the original channel disconnects.
Operations
#13078 Improved validation and error handling in the EMQX Management API to ensure that requests with a JSON body include the Content-Type: application/json header. If the header is missing for APIs that expect JSON input, the server now correctly responds with a 415 Unsupported Media Type status code instead of 400 Bad Request.
#13225 Enhanced security in authentication and authorization APIs by redacting sensitive data such as passwords. Previously, the APIs could return the original password values in responses. With this update, sensitive information is replaced with ****** to prevent accidental exposure and protect user credentials.
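The redaction behavior described in #13225 can be pictured with a small sketch. The set of sensitive field names and the function name below are hypothetical, chosen only for illustration; the changelog states only that sensitive values such as passwords are replaced with ******.

```python
from typing import Any

# Hypothetical list of field names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"password", "secret", "token"}
REDACTED = "******"


def redact(value: Any) -> Any:
    """Return a copy of an API response body with sensitive fields masked."""
    if isinstance(value, dict):
        return {
            key: REDACTED if key in SENSITIVE_KEYS else redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value


response = {
    "mechanism": "password_based",
    "backend": "mysql",
    "password": "public",
    "servers": [{"host": "db1", "password": "s3cr3t"}],
}
masked = redact(response)
```

The original response object is left untouched; only the returned copy is masked.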
Gateways
- #13607 Fixed an issue where the QoS level for CoAP subscriptions displayed through the API did not match the actual QoS level being used. This discrepancy could cause confusion as successful subscriptions were not accurately reflected on the Dashboard.
5.7.2
Release Date: 2024-08-07
Enhancements
#13317 Added a new per-authorization source metric type: ignore. This metric increments when an authorization source attempts to authorize a request but encounters scenarios where the authorizer is not applicable or encounters an error, resulting in an undecidable outcome.
#13336 Added functionality to initialize authentication data in the built-in database of an empty EMQX node or cluster using a bootstrap file in CSV or JSON format. This feature introduces new configuration entries, bootstrap_file and bootstrap_type.
#13348 Added a new field payload_encode in the log configuration to determine the format of the payload in the log data.
#13436 Added the option to add custom request headers to JWKS requests.
#13507 Introduced a new built-in function getenv in the rule engine and variform expression to facilitate access to environment variables. This function adheres to the following constraints:
- Prefix EMQXVAR_ is added before reading from OS environment variables. For example, getenv('FOO_BAR') is to read EMQXVAR_FOO_BAR.
- These values are immutable once loaded from the OS environment.
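The two getenv constraints from #13507 (the EMQXVAR_ prefix and immutability after the first load) can be sketched as follows. This is an illustrative Python model, not the EMQX implementation; the cache dictionary and the choice to return None for a missing variable are assumptions.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = "EMQXVAR_"
# Values are remembered after the first lookup, so later changes to the
# OS environment are not observed (assumed model of "immutable once loaded").
_loaded: Dict[str, Optional[str]] = {}


def getenv(name: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Read EMQXVAR_<name> from the environment, caching the first result."""
    if name not in _loaded:
        _loaded[name] = environ.get(PREFIX + name)
    return _loaded[name]


os.environ["EMQXVAR_FOO_BAR"] = "first"
first_read = getenv("FOO_BAR")          # reads EMQXVAR_FOO_BAR
os.environ["EMQXVAR_FOO_BAR"] = "second"
second_read = getenv("FOO_BAR")         # still the value loaded earlier

os.environ["UNPREFIXED"] = "x"
unprefixed_read = getenv("UNPREFIXED")  # EMQXVAR_UNPREFIXED is not set
```

The prefix keeps rule expressions from reading arbitrary process environment variables; only names explicitly exported with EMQXVAR_ are reachable.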
#13521 Resolved an issue where LDAP query timeouts could cause the underlying connection to become unusable, potentially causing subsequent queries to return outdated results. The fix ensures the system reconnects automatically in case of a timeout.
#13528 Applied log throttling for the event of unrecoverable errors in data integrations.
#13548 EMQX now can optionally invoke the on_config_changed/2 callback function when the plugin configuration is updated via the REST API. This callback function is assumed to be exported by the <PluginName>_app module. For example, if the plugin name and version are my_plugin-1.0.0, then the callback function is assumed to be my_plugin_app:on_config_changed/2.
#13386 Added support for initializing a list of banned clients on an empty EMQX node or cluster with a bootstrap file in CSV format. The corresponding config entry to specify the file path is banned.bootstrap_file. This file is a CSV file with , as its delimiter. The first line of this file must be a header line. All valid headers are listed here:
- as :: required
- who :: required
- by :: optional
- reason :: optional
- at :: optional
- until :: optional
See the Configuration Manual for details on each field.
Each row in the rest of this file must contain the same number of columns as the header line. A column value can be left empty, in which case its value is undefined.
#13452 Kafka producer action's topic configuration now supports templates.
Ensure that topics are pre-existing in Kafka. If a message is directed to a non-existent topic (given Kafka's disabled topic auto-creation), the message will fail with an unrecoverable error. Additionally, if a message lacks sufficient information to match the configured template, it will also fail with an unrecoverable error. For example, the template is t-${t} but the message context lacks a definition for t. For detailed information, see Configure Kafka Dynamic Topics.
This feature is also supported for Azure Event Hubs and Confluent Platform producer integrations.
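To make the dynamic topic failure mode from #13452 concrete, here is a minimal sketch of rendering a topic template such as t-${t} against a message context. The UnrecoverableError class and render_topic function are hypothetical names for this example; the real rendering is done inside EMQX and may differ in detail.

```python
import re
from typing import Any, Mapping


class UnrecoverableError(Exception):
    """Raised when a message cannot be mapped to a topic (illustrative name)."""


_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def render_topic(template: str, context: Mapping[str, Any]) -> str:
    """Replace each ${name} in the template with the value from the context."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in context:
            raise UnrecoverableError(f"message context lacks '{name}'")
        return str(context[name])

    return _PLACEHOLDER.sub(substitute, template)


rendered = render_topic("t-${t}", {"t": "alerts"})

try:
    render_topic("t-${t}", {"clientid": "c1"})
    failed = False
except UnrecoverableError:
    failed = True
```

In this sketch the second call fails because the context has no t field, mirroring the case where the message is rejected instead of being sent to a partially rendered topic.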
#13504 Introduced an HTTP backend for the scram authentication mechanism.
This backend implementation utilizes an external web resource to provide SCRAM authentication data, including the client's stored key, server key, and salt. It also supports additional authentication and authorization extension fields such as is_superuser, client_attrs, expire_at, and acl.
Note: This is not an implementation of the RFC 7804: Salted Challenge Response HTTP Authentication Mechanism.
#13441 Enhanced CoAP gateway connection mode. The UDP connection will always be bound to the corresponding gateway connection through the clientid.
Bug Fixes
#13222 Resolved issues with flags checking and error handling associated with the Will message in the CONNECT packet. For detailed specifications, refer to:
- MQTT-v3.1.1-[MQTT-3.1.2-13], MQTT-v5.0-[MQTT-3.1.2-11]
- MQTT-v3.1.1-[MQTT-3.1.2-14], MQTT-v5.0-[MQTT-3.1.2-12]
- MQTT-v3.1.1-[MQTT-3.1.2-15], MQTT-v5.0-[MQTT-3.1.2-13]
#13307 Updated the ekka library to version 0.19.5. This version of ekka utilizes mria 0.8.8, enhancing auto-heal functionality. Previously, the auto-heal worked only when all core nodes were reachable. This update allows auto-heal to be applied once the majority of core nodes are alive. For details, refer to the Mria PR.
#13334 Implemented strict mode checking for the PasswordFlag in the MQTT v3.1.1 CONNECT packet to align with protocol specifications.
Note: To ensure bug-to-bug compatibility, this check is performed only in strict mode.
#13344 Resolved an issue where the POST /clients/:clientid/subscribe/bulk API would not function correctly if the node receiving the API request did not maintain the connection to the specified clientid.
#13358 Fixed an issue where the reason in the authn_complete_event event was incorrectly displayed.
#13375 The value infinity has been added as the default value to the listener configuration fields max_conn_rate, messages_rate, and bytes_rate.
#13382 Updated the emqtt library to version 0.4.14, which resolves an issue preventing emqtt_pools from reusing pools that are in an inconsistent state.
#13389 Fixed an issue where the Derived Key Length for pbkdf2 could be set to a negative integer.
#13389 Fixed an issue where topics in the authorization rules might be parsed incorrectly.
#13393 Fixed an issue where plugin applications failed to restart after a node joined a cluster, resulting in hooks not being properly installed and causing inconsistent states.
#13398 Fixed an issue where ACL rules were incorrectly cleared when reloading the built-in database for authorization using the command line.
#13403 Addressed a security issue where environment variable configuration overrides were inadvertently logging passwords. This fix ensures that passwords present in environment variables are not logged.
#13408 Resolved a function_clause crash triggered by authentication attempts with invalid salt or password types. This fix enhances error handling to better manage authentication failures involving incorrect salt or password types.
#13419 Resolved an issue where crash log messages from the /configs API were displaying garbled hints. This fix ensures that log messages related to API calls are clear and understandable.
#13422 Fixed an issue where the option force_shutdown.max_heap_size could not be set to 0 to disable this tuning.
#13442 Fixed an issue where the health check interval configuration for actions/sources was not being respected. Previously, EMQX ignored the specified health check interval for actions and used the connector's interval instead. The fix ensures that EMQX now correctly uses the health check interval configured for actions/sources, allowing for independent and accurate health monitoring frequencies.
#13503 Fixed an issue where connectors did not adhere to the configured health check interval upon initial startup, requiring an update or restart to apply the correct interval.
#13515 Fixed an issue where the same client could not subscribe to the same exclusive topic when the node was down for some reason.
#13527 Fixed an issue in the Rule Engine where executing a SQL test for the Message Publish event would consistently return no results when a $bridges/... source was included in the FROM clause.
#13541 Fixed an issue where disabling CRL checks for a listener required a listener restart to take effect.
#13305 Improved error handling for Redis connectors. Previously, Redis connectors with Redis Mode set as single or sentinel would always encounter a timeout error during the connector test in the Dashboard if no username or password was provided. This update ensures that users now receive an informative error message in such scenarios. Additionally, more detailed error information has been added for all Redis connector types to enhance diagnostics and troubleshooting.
#13327 Fixed an issue in Kafka, Confluent, and Azure Event Hubs integrations where multiple actions reusing the same connector and configured with the same topic could interfere with each other when one of the actions was deleted or disabled. For example, data writing of other actions might be affected.
#13345 Improved error message clarity for Schema Registry to provide clearer feedback when creating a schema with a name that exceeds length limits or contains invalid formatting.
#13420 Added a validation rule to the Schema Validation configuration to avoid empty topic filter lists in the configuration. Previously, allowing empty lists could create message transformations that lack meaningful functionality, as they wouldn't apply to specific topics.
#13543 Fixed an issue where the internal cache for Protobuf schemas in Schema Registry was not properly cleaned up after deleting or updating a schema.
#13332 Improved error messages to provide more informative and easy-to-read details when an Amazon S3 connector is improperly configured.
#13552 Added a startup timeout limit for EMQX plugins with a default timeout of 10 seconds. Before this update, problematic plugins could cause runtime errors during startup, leading to potential issues where the main startup process might hang when EMQX is stopped and restarted.
5.7.1
Release Date: 2024-06-26
Enhancements
#12983 Added a new rule engine event $events/client_check_authn_complete for the authentication completion event.
#13175 Added the disable_prepared_statements option for Postgres-based connectors.
This option is to be used with endpoints that do not support the prepared statements session feature, such as PGBouncer and Supabase in Transaction mode.
#13180 Improved client message handling performance when EMQX is running on Erlang/OTP 26 and increased message throughput by 10% in fan-in mode.
#13191 Upgraded EMQX Docker images to run on Erlang/OTP 26.
EMQX had been running on Erlang/OTP 26 since v5.5 except for docker images which were on Erlang/OTP 25. Now all releases are on Erlang/OTP 26. This upgrade fixed the following known issue:
When an older version of EMQX joins a cluster with newer version nodes, the Schema Registry of the older version node may encounter an issue, emitting logs like the following:
Error loading module '$schema_parser___CiYAWBja87PleCyKZ58h__SparkPlug_B_BUILT-IN':, This BEAM file was compiled for a later version of the runtime system than the current (Erlang/OTP 25).
This issue is fixed in the newer version. However, for older versions, a manual step is required. Execute the following command on one of the clustered nodes before the older version EMQX joins the cluster:
emqx eval 'lists:foreach(fun(Key) -> mnesia:dirty_delete(emqx_ee_schema_registry_protobuf_cache_tab, Key) end, mnesia:dirty_all_keys(emqx_ee_schema_registry_protobuf_cache_tab)).'
If the older version of EMQX is already in the cluster, execute the above command and restart the affected node.
#13242 Significantly increased the startup speed of EMQX Dashboard listener.
#13172 Added a rule function map_to_redis_hset_args to help prepare Redis HSET (or HMSET) multi-field values.
For example, if payload.value is a map of multiple data fields, the rule SELECT map_to_redis_hset_args(payload.value) as hset_fields FROM "t/#" can prepare hset_fields for a Redis action to render a command template like HMSET name1 ${hset_fields}.
#13210 EMQX now validates that referenced schemas and message types exist in the Schema Registry when inserting or updating a Schema Validation.
#13211 Enhanced TLS listener to support more flexible TLS verifications.
- partial_chain support: If the option partial_chain is set to true, connections with incomplete certificate chains are allowed. Check the Configuration Manual for more details.
- Certificate Key Usage validation: Added support for required Extended Key Usage as defined in RFC 5280. A new option (verify_peer_ext_key_usage) has been introduced to enforce specific key usages (such as "serverAuth") in peer certificates during the TLS handshake. This enhances security by ensuring certificates are used for their intended purposes, for example, "serverAuth,OID:1.3.6.1.5.5.7.3.2". Check the Configuration Manual for more details.
#13274 The RocketMQ connector now supports configuring SSL settings.
Bug Fixes
#13156 Resolved an issue where the Dashboard Monitoring pages would crash following the update to EMQX v5.7.0.
#13164 Fixed HTTP authorization request body encoding.
Before this fix, the HTTP authorization request body encoding format was taken from the accept header. The fix is to respect the content-type header instead. Also added the access templating variable for v4 compatibility. The access code of the SUBSCRIBE action is 1 and that of the PUBLISH action is 2.
#13238 Improved the logged error messages when an HTTP authorization request with an unsupported content-type header is returned.
#13258 Fixed an issue where the MQTT-SN gateway would not restart correctly due to incorrect startup order of gateway dependencies.
#13273 Fixed and improved handling of URIs in several configurations. The fix includes the following improvement details:
- Authentication and authorization configurations: Corrected a previous error where valid pathless URIs such as https://example.com?q=x were mistakenly rejected. These URIs are now properly recognized as valid.
- Connector configurations: Enhanced checks to ensure that URIs with potentially problematic components, such as user info or fragment parts, are no longer erroneously accepted.
#13276 Fixed an issue in the durable message storage mechanism where parts of the internal storage state were not correctly persisted during the setup of new storage generations. The concept of "generation" is used internally and is crucial for managing message expiration and cleanup. This could have manifested as messages being lost after a restart of EMQX.
#13291 Fixed an issue where durable storage sites that were down were being reported as up.
#13290 Fixed an issue where the command $ bin/emqx ctl rules show rule_0hyd would produce no output when used to display rules with a data integration action attached.
#13293 Improved the restoration process from data backups by automating the re-indexing of imported retained messages. Previously, re-indexing required manual intervention using the emqx ctl retainer reindex start CLI command after importing a data backup file.
This fix also extended the functionality to allow exporting retained messages to a backup file when the retainer.backend.storage_type is configured as ram. Previously, only setups with disc as the storage type supported exporting retained messages.
#13147 Improved the error messages for decoding failures in the rule engine protobuf decode functions by adding clear descriptions to indicate what went wrong when message decoding failed.
#13140 Fixed an issue that caused text traces for the republish action to crash and not display correctly.
#13148 Fixed an issue where a 500 HTTP status code could be returned by /connectors/:connector-id/start when there is a timeout waiting for the resource to be connected.
#13181 EMQX now forcefully shuts down the connector process when attempting to stop a connector, if such operation times out. This fix also improved the clarity of error messages when disabling an action or source fails due to an unresponsive underlying connector.
#13216 Respect clientid_prefix config for MQTT bridges. Since EMQX v5.4.1, the MQTT client IDs are restricted to a maximum of 23 bytes. Previously, the system factored the clientid_prefix into the hash of the original, longer client ID, affecting the final shortened ID. The fix includes the following change details:
- Without Prefix: The behavior remains unchanged. EMQX hashes the long client IDs (exceeding 23 bytes) to fit within the 23-byte limit.
- With Prefix:
  - Prefix ≤ 19 bytes: The prefix is retained, and the remaining portion of the client ID is hashed into a 4-byte space, ensuring the total length does not exceed 23 bytes.
  - Prefix ≥ 20 bytes: EMQX will not attempt to shorten the client ID, fully preserving the configured prefix regardless of length.
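The three cases listed for #13216 can be modelled with a short sketch. Several details are assumptions made for illustration and are not stated in the changelog: the hash function (a truncated SHA-1 hex digest here), the absence of a separator between prefix and suffix, and the choice to hash only when the combined ID exceeds 23 bytes. The function name bridge_clientid is hypothetical.

```python
import hashlib

MAX_CLIENTID_BYTES = 23
SHORT_HASH_BYTES = 4
MAX_RETAINED_PREFIX = MAX_CLIENTID_BYTES - SHORT_HASH_BYTES  # 19 bytes


def _digest(data: bytes, length: int) -> bytes:
    """Illustrative hash: first `length` hex characters of SHA-1 (an assumption)."""
    return hashlib.sha1(data).hexdigest()[:length].encode("ascii")


def bridge_clientid(prefix: bytes, clientid: bytes) -> bytes:
    """Model of the shortening rules for MQTT bridge client IDs."""
    if not prefix:
        # Without prefix: hash only IDs longer than the 23-byte limit.
        if len(clientid) <= MAX_CLIENTID_BYTES:
            return clientid
        return _digest(clientid, MAX_CLIENTID_BYTES)
    if len(prefix) > MAX_RETAINED_PREFIX:
        # Prefix of 20 bytes or more: no shortening is attempted.
        return prefix + clientid
    combined = prefix + clientid
    if len(combined) <= MAX_CLIENTID_BYTES:
        return combined
    # Prefix of 19 bytes or fewer: keep it and hash the rest into 4 bytes.
    return prefix + _digest(clientid, SHORT_HASH_BYTES)


short_id = bridge_clientid(b"", b"sensor-1")
hashed_id = bridge_clientid(b"", b"x" * 40)
prefixed_id = bridge_clientid(b"edge-", b"y" * 40)
long_prefix_id = bridge_clientid(b"p" * 20, b"z" * 10)
```

The point of the fix is visible in the third case: the configured prefix stays readable in the final ID because it no longer participates in the hash.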
#13189 Fixed an issue where the data integration with Microsoft SQL Server or MySQL could not use SQL templates with substring values in table name or column name.
#13070 Improved Kafka connector error logs to provide more diagnostic information by capturing specific error details, such as unreachable advertised listeners. To manage log verbosity, only the first occurrence of an error is logged, accompanied by the total count of similar errors.
#13093 Improved Kafka consumer group stability. Before this change, the Kafka consumer group sometimes needed to rebalance twice after the Kafka group coordinator restarted.
#13277 Refined the error handling for Kafka producers when encountering the message_too_large error. Previously, Kafka producers would repeatedly attempt to resend oversized message batches, hoping for a server-side adjustment in max.message.bytes.
Now, oversized messages are automatically split into single-message batches for retry. If a message still exceeds size limits, it will be dropped to maintain data flow.
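The retry strategy described for #13277 can be sketched roughly as below. The MessageTooLarge exception, the send callable, and send_with_split are hypothetical stand-ins for this example; they are not part of any Kafka client API, and the real producer logic in EMQX may differ.

```python
from typing import Callable, List


class MessageTooLarge(Exception):
    """Stand-in for the broker's message_too_large error (illustrative)."""


def send_with_split(
    batch: List[bytes], send: Callable[[List[bytes]], None]
) -> List[bytes]:
    """Send a batch; on message_too_large, retry each message on its own.

    Returns the messages that were dropped because they were still too large.
    """
    try:
        send(batch)
        return []
    except MessageTooLarge:
        if len(batch) == 1:
            # A single message that is too large cannot be split further.
            return list(batch)
    dropped: List[bytes] = []
    for message in batch:
        try:
            send([message])
        except MessageTooLarge:
            dropped.append(message)
    return dropped


# A fake broker that rejects any batch larger than 10 bytes in total.
delivered: List[List[bytes]] = []


def fake_send(batch: List[bytes]) -> None:
    if sum(len(m) for m in batch) > 10:
        raise MessageTooLarge()
    delivered.append(batch)


dropped_messages = send_with_split([b"aaaa", b"b" * 15, b"cccc"], fake_send)
```

Dropping the message that still does not fit keeps the remaining messages flowing instead of blocking the producer on an endless retry.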
#13130 Improved the trace message formatting for Redis action batch requests. Spaces are now added between components of commands and semicolons are added between commands to make the trace message easier to read.
#13136 Improved the template-rendered traces for Oracle actions for better readability.
#13197 Fixed an issue in AWS S3 data integration that prevented automatic saving of TLS certificates and key files to the file system when they are supplied through the Dashboard UI or Connector API.
#13227 Fixed an issue in AWS S3 Sink running in aggregated mode. Before the fix, an invalid key template in the configuration was not reported as an error during the Sink setup, but instead caused a storm of hard-to-recover crashes later.
5.7.0
Release Date: 2024-05-27
Enhancements
MQTT
Implemented the Durable Sessions feature, which persists MQTT Persistent Sessions and their messages to disk, and continuously replicates session metadata and MQTT messages among multiple nodes in the EMQX cluster. This achieves effective failover and recovery mechanisms, ensuring service continuity and high availability, thereby enhancing system reliability.
Added metrics related to EMQX durable storage to Prometheus:
emqx_ds_egress_batches
emqx_ds_egress_batches_retry
emqx_ds_egress_batches_failed
emqx_ds_egress_messages
emqx_ds_egress_bytes
emqx_ds_egress_flush_time
emqx_ds_store_batch_time
emqx_ds_builtin_next_time
emqx_ds_storage_bitfield_lts_counter_seek
emqx_ds_storage_bitfield_lts_counter_next
emqx_ds_storage_bitfield_lts_counter_collision
Note: these metrics are only visible when session persistence is enabled. The number of persisted messages has also been added to the Dashboard.
For more information about the Durable Sessions feature, see MQTT Durable Sessions.
Security
#12947 For JWT authentication, added support for a new disconnect_after_expire option. When enabled, the client will be disconnected after the JWT token expires.
Note: This is a breaking change. This option is enabled by default, so the default behavior has changed. Previously, clients with valid JWTs could connect to the broker and stay connected even after the JWT token expired. Now, the client will be disconnected after the JWT token expires. To preserve the previous behavior, set disconnect_after_expire to false.
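The effect of disconnect_after_expire can be pictured with a short Python sketch. It is only an illustration, not EMQX code: the helper names jwt_exp and seconds_until_disconnect are hypothetical, the signature is deliberately not verified, and the token is assumed to carry the standard exp claim in seconds.

```python
import base64
import json
import time
from typing import Optional


def jwt_exp(token: str) -> Optional[int]:
    """Return the `exp` claim of a JWT, without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url, so restore the padding first.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(padded))
    return claims.get("exp")


def seconds_until_disconnect(token: str, now: Optional[float] = None) -> Optional[float]:
    """Seconds a client could stay connected before its token expires.

    Returns None when the token has no `exp` claim (hypothetical convention).
    """
    exp = jwt_exp(token)
    if exp is None:
        return None
    now = time.time() if now is None else now
    return max(0.0, exp - now)
```

A client library could use such a value to refresh its token and reconnect before the broker closes the connection.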
Data Processing and Integration
#12711 Added the Schema Validation feature. With this feature, once validations are configured for certain topic filters, the configured checks are run against published messages. If a message fails validation, it is dropped and the client may be disconnected, depending on the configuration. For more information about the Schema Validation feature, see Schema Validation.
#12899 For RocketMQ data integration, added support for namespace and key dispatch strategy.
#12671 An unescape function has been added to the rule engine SQL language to handle the expansion of escape sequences in strings. It was added because string literals in the SQL language don't support any escape codes (e.g., \n and \t). This enhancement allows for more flexible string manipulation within SQL expressions.
#12898 Added IoTDB bridge support for IoTDB 1.3.0 and batch insert (batch_size/batch_time) options.
#12934 Added CSV format file aggregation for AWS S3 action.
Observability
- #12827 It is now possible to trace rules with a new Rule ID trace filter as well as with the Client ID filter. For testing purposes, it is now also possible to use a new HTTP API endpoint (rules/:id/test) to artificially apply a rule and optionally stop its actions after they have been rendered.
- #12863 You can now format trace log entries as JSON objects by setting the formatter parameter to "json" when creating the trace pattern.
Extensibility
#12872 Implemented Client Attributes feature. It allows setting additional properties for each client using key-value pairs. Property values can be generated from MQTT client connection information (such as username, client ID, TLS certificate) or set from data accompanying successful authentication returns. Properties can be used in EMQX for authentication, authorization, data integration, and MQTT extension functions. Compared to using static properties like client ID directly, client properties offer greater flexibility in various business scenarios, simplifying the development process and enhancing adaptability and efficiency in development work.
Initialization of client_attrs
The client_attrs fields can be initially populated from one of the following clientinfo fields:
- cn: The common name from the TLS client's certificate.
- dn: The distinguished name from the TLS client's certificate, that is, the certificate "Subject".
- clientid: The MQTT client ID provided by the client.
- username: The username provided by the client.
- user_property: Extract a property value from 'User-Property' of the MQTT CONNECT packet.
Extension through Authentication Responses
Additional attributes may be merged into client_attrs from authentication responses. Supported authentication backends include:
- HTTP: Attributes can be included in the JSON object of the HTTP response body through a client_attrs field.
- JWT: Attributes can be included via a client_attrs claim within the JWT.
Usage in Authentication and Authorization
If client_attrs is initialized before authentication, it can be used in external authentication requests. For instance, ${client_attrs.property1} can be used within request templates directed at an HTTP server for authenticity validation.
The client_attrs can be utilized in authorization configurations or request templates, enhancing flexibility and control. Examples include:
- In acl.conf, use {allow, all, all, ["${client_attrs.namespace}/#"]} to apply permissions based on the namespace attribute.
- In other authorization backends, ${client_attrs.namespace} can be used within request templates to dynamically include client attributes.
For more information about the Client Attributes feature, see Client Attributes.
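The ${client_attrs.NAME} placeholder substitution described above can be sketched in Python. This is an illustration only: the render function is hypothetical, and rendering a missing attribute as an empty string is an assumption of this sketch rather than documented EMQX behavior.

```python
import re

# Matches placeholders of the form ${client_attrs.NAME}.
_PLACEHOLDER = re.compile(r"\$\{client_attrs\.([A-Za-z0-9_]+)\}")


def render(template: str, client_attrs: dict) -> str:
    """Replace each ${client_attrs.NAME} with the matching attribute value."""
    def substitute(match):
        # Assumption: an unknown attribute renders as an empty string.
        return str(client_attrs.get(match.group(1), ""))
    return _PLACEHOLDER.sub(substitute, template)


# A topic filter scoped to the client's namespace attribute.
print(render("${client_attrs.namespace}/#", {"namespace": "tenant1"}))
```

With the namespace attribute set to tenant1, the ACL topic filter from the example above becomes tenant1/#.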
#12910 Added plugin configuration management and schema validation. It is also possible to annotate the schema with metadata to facilitate UI rendering in the Dashboard. See more details in the plugin template and plugin documentation.
Operations and Management
#12923 Provided a more specific error when importing data in the wrong format into the built-in authentication database.
#12940 Added the ignore_readonly argument to the PUT /configs API.
Before this change, EMQX would return 400 (BAD_REQUEST) if the raw config included read-only root keys (cluster, rpc, and node).
With this enhancement, the API can be called as PUT /configs?ignore_readonly=true. In this case, EMQX ignores read-only root config keys and applies the rest. For observability purposes, an info level message is logged if any read-only keys are dropped.
Also fixed an exception raised when the config has bad HOCON syntax (the API previously returned 500). Bad syntax now causes the API to return 400 (BAD_REQUEST).
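The filtering that ignore_readonly=true implies can be sketched as follows. This is a minimal Python illustration on an already-parsed config map, not the EMQX implementation; the function name split_readonly is hypothetical, and only the three read-only root keys named above are assumed.

```python
# Read-only root keys named in the release note.
READONLY_ROOT_KEYS = {"cluster", "rpc", "node"}


def split_readonly(config: dict):
    """Return (config to apply, sorted list of dropped read-only root keys)."""
    applied = {k: v for k, v in config.items() if k not in READONLY_ROOT_KEYS}
    dropped = sorted(k for k in config if k in READONLY_ROOT_KEYS)
    return applied, dropped


applied, dropped = split_readonly({"node": {"name": "emqx@127.0.0.1"}, "mqtt": {"max_inflight": 32}})
if dropped:
    # Mirrors the info-level log message mentioned above.
    print("info: ignored read-only keys:", ", ".join(dropped))
```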
#12957 Started building packages for macOS 14 (Apple Silicon) and Ubuntu 24.04 Noble Numbat (LTS).
Bug Fixes
Security
#12887 Fixed MQTT enhanced authentication with SASL SCRAM.
#12962 TLS clients can now verify the server hostname against a wildcard certificate. For example, if a certificate is issued for host *.example.com, TLS clients are able to verify server hostnames like srv1.example.com.
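The wildcard check can be sketched in Python using the common rule that a leftmost * matches exactly one DNS label. This is an assumption about the matching rule, shown for illustration; it is not the code used by EMQX or Erlang/OTP, and hostname_matches is a hypothetical name.

```python
def hostname_matches(pattern: str, hostname: str) -> bool:
    """Match a hostname against a certificate name with an optional leftmost '*'."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    # The wildcard stands for exactly one label, so label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard must match a non-empty label.
        if host_labels[0] == "":
            return False
    elif pattern_labels[0] != host_labels[0]:
        return False
    return pattern_labels[1:] == host_labels[1:]
```

Under this rule, srv1.example.com matches *.example.com, while a.b.example.com and example.com do not.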
MQTT
- #12996 Fixed a process leak in the emqx_retainer application. Previously, client disconnection while receiving retained messages could cause a process leak.
Data Processing and Integration
#12653 The rule engine function bin2hexstr now supports bitstring inputs with a bit size that is not divisible by 8. Such bitstrings can be returned by the rule engine function subbits.
#12657 The rule engine SQL-based language previously did not allow putting any expressions as array elements in array literals (only constants and variable references were allowed). This has now been fixed so that one can use any expressions as array elements.
The following is now permitted, for example:
select [21 + 21, abs(-abs(-2)), [1 + 1], 4] as my_array from "t/#"
#12932 Previously, if an HTTP action request received a 503 (Service Unavailable) status, it was marked as a failure and the request was not retried. This has now been fixed so that the request is retried a configurable number of times.
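The retry-on-503 behavior can be sketched in Python. This is a simplified illustration, not the EMQX HTTP action: send stands for any hypothetical callable that performs the request and returns a status code, and max_retries and delay_seconds are illustrative parameter names.

```python
import time


def send_with_retry(send, max_retries: int, delay_seconds: float = 0.0):
    """Call send() and retry while it returns 503, up to max_retries times.

    Returns (final status code, total number of attempts).
    """
    status = send()
    attempts = 1
    while status == 503 and attempts <= max_retries:
        time.sleep(delay_seconds)
        status = send()
        attempts += 1
    return status, attempts
```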
#12948 Fixed an issue where sensitive HTTP header values like Authorization would be substituted by ****** after updating a connector.
#12895 Added some missing config keys for the DynamoDB connector and the action.
#13018 Reduced log spamming when connection goes down in a Postgres/Timescale/Matrix connector.
#13118 Fixed a performance issue in the rule engine template rendering.
Observability
- #12765 Made sure the stats subscribers.count and subscribers.max include shared subscribers. Previously, they included only non-shared subscribers.
Operations and Management
#12812 Made resource health checks non-blocking operations. This means that operations such as updating or removing a resource won't be blocked by a lengthy running health check.
#12830 Made channel (action/source) health checks non-blocking operations. This means that operations such as updating or removing an action/source data integration won't be blocked by a lengthy running health check.
#12993 Fixed listener config update API when handling an unknown zone.
Before this fix, when a listener config was updated with an unknown zone, for example {"zone": "unknown"}, the change would be accepted, causing all clients to crash when connected. After this fix, updating the listener with an unknown zone name will get a "Bad request" response.
#13012 The MQTT listeners config option access_rules has been improved in the following ways:
- The listener no longer crashes with an incomprehensible error message if an invalid access rule is configured. Instead, a configuration error is generated.
- One can now add several rules in a single string by separating them by comma (for example, "allow 10.0.1.0/24, deny all").
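The comma-separated access_rules format, and the idea of rejecting an invalid rule up front, can be sketched in Python. The parser below is a hypothetical illustration that only handles the "allow|deny CIDR|all" shape shown in the example; the real option may accept more forms.

```python
import ipaddress


def parse_access_rules(rules: str):
    """Parse 'allow 10.0.1.0/24, deny all' into [(action, target), ...].

    Raises ValueError for a malformed rule instead of failing later.
    """
    parsed = []
    for raw in rules.split(","):
        parts = raw.split()
        if len(parts) != 2 or parts[0] not in ("allow", "deny"):
            raise ValueError(f"invalid access rule: {raw.strip()!r}")
        action, target = parts
        if target != "all":
            # ip_network raises ValueError for a malformed address or prefix.
            target = ipaddress.ip_network(target, strict=False)
        parsed.append((action, target))
    return parsed
```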
#13041 Improved HTTP authentication error log message. If HTTP content-type header is missing for POST method, it now emits a meaningful error message instead of a less readable exception with stack trace.
#13077 This fix makes EMQX only read action configurations from the global configuration when the connector starts or restarts, and instead stores the latest configurations for the actions in the connector. Previously, updates to action configurations would sometimes not take effect without disabling and enabling the action. This means that an action could sometimes run with the old (previous) configuration even though it would look like the action configuration has been updated successfully.
#13090 Attempting to start an action or source whose connector is disabled will no longer attempt to start the connector itself.
#12871 Fixed the startup process of an evacuated node. Previously, if a node was evacuated and stopped without stopping evacuation, it would not start back.
#12888 Fixed License related configuration loss after importing backup data.
Gateways
#12909 Fixed UDP listener process handling on errors or closure. The fix ensures the UDP listener is cleanly stopped and restarted as needed if these error conditions occur.
#13001 Fixed an issue where the syskeeper forwarder would never reconnect when the connection was lost.
#13010 Fixed the issue where the JT/T 808 gateway could not correctly reply to the REGISTER_ACK message when requesting authentication from the registration service failed.
5.6.1
Release Date: 2024-04-18
Bug Fixes
#12759 EMQX now automatically removes invalid backup files that fail during upload due to schema validation errors. This fix ensures that only valid configuration files are displayed and stored, enhancing system reliability.
#12766 Renamed the message_queue_too_long error reason to mailbox_overflow. The latter is consistent with the corresponding config parameter: force_shutdown.max_mailbox_size.
#12773 Upgraded HTTP client libraries.
The HTTP client library (gun-1.3) incorrectly appended a :portnumber suffix to the Host header for standard ports (http on port 80, https on port 443). This could cause compatibility issues with servers or gateways performing strict Host header checks (e.g., AWS Lambda, Alibaba Cloud HTTP gateways), leading to errors such as InvalidCustomDomain.NotFound or "The specified CustomDomain does not exist."
#12802 Improved how EMQX handles node removal from clusters via the emqx ctl cluster leave command. Previously, nodes could unintentionally rejoin the same cluster (unless it was stopped) if the configured cluster discovery_strategy was not manual. With the latest update, executing the cluster leave command now automatically disables cluster discovery for the node, preventing it from rejoining. To re-enable cluster discovery, use the emqx ctl discovery enable command or simply restart the node.
#12814 Improved error handling for the /clients/{clientid}/mqueue_messages and /clients/{clientid}/inflight_messages APIs in EMQX. These updates address:
- Internal Timeout: If EMQX fails to retrieve the list of Inflight or Mqueue messages within the default 5-second timeout, likely under heavy system load, the API will return a 500 error with the response {"code":"INTERNAL_ERROR","message":"timeout"}, and log additional details for troubleshooting.
- Client Shutdown: Should the client connection be terminated during an API call, the API now returns a 404 error, with the response {"code": "CLIENT_SHUTDOWN", "message": "Client connection has been shutdown"}. This ensures clearer feedback when client connections are interrupted.
#12824 Updated the statistics metrics subscribers.count and subscribers.max to include shared subscribers. Previously, these metrics accounted only for non-shared subscribers.
#12826 Fixed issues related to the import functionality of source data integrations and retained messages in EMQX. Before this update:
- The data integration sources specified in backup files were not being imported. This included configurations under the sources.mqtt category with specific connectors and parameters such as QoS and topics.
- Importing the mnesia table for retained messages was not supported.
#12843 Fixed the cluster_rpc_commit transaction ID cleanup procedure on replicator nodes after executing the emqx ctl cluster leave command. Previously, failing to properly clear these transaction IDs impeded configuration updates on the core node.
#12882 Fixed an issue with the RocketMQ action in EMQX data integration, ensuring that messages are correctly routed to their configured topics. Previously, when multiple actions shared a single RocketMQ connector, all messages were mistakenly sent to the topic configured for the first action. This fix involves starting a distinct set of RocketMQ workers for each topic, preventing cross-topic message delivery errors.
#12885 Fixed an issue in EMQX where users were unable to view "Retained Messages" under the "Monitoring" menu in the Dashboard.
The "Retained messages" backend API uses the qlc library. This problem was due to a permission issue where the qlc library's file_sorter function tried to use a non-writable directory, /opt/emqx, to store temporary files, resulting from recent changes in directory ownership permissions in Docker deployments.
This update modifies the ownership settings of the /opt/emqx directory to emqx:emqx, ensuring that all necessary operations, including retained messages retrieval, can proceed without access errors.
5.6.0
Release Date: 2024-03-28
Enhancements
#12326 Enhanced session tracking with registration history. EMQX now has the capability to monitor the history of session registrations, including those that have expired. By configuring broker.session_history_retain, EMQX retains records of expired sessions for a specified duration.
- Session count API: Use the API GET /api/v5/sessions_count?since=1705682238 to obtain a count of sessions across the cluster that remained active since the given UNIX epoch timestamp (with seconds precision). This enhancement aids in analyzing session activity over time.
- Metrics expansion with cluster sessions gauge: A new gauge metric, cluster_sessions, is added to better track the number of sessions within the cluster. This metric is also integrated into Prometheus for easy monitoring:
# TYPE emqx_cluster_sessions_count gauge
emqx_cluster_sessions_count 1234
NOTE: Please consider this metric as an approximate estimation. Due to the asynchronous nature of data collection and calculation, exact precision may vary.
#12398 Exposed the swagger_support option in the Dashboard configuration, allowing for the enabling or disabling of the Swagger API documentation.
#12467 Started supporting cluster discovery using AAAA DNS record type.
#12483 Renamed emqx ctl conf cluster_sync tnxid ID to emqx ctl conf cluster_sync inspect ID.
For backward compatibility, tnxid is kept, but considered deprecated and will be removed in 5.7.
#12495 Introduced new AWS S3 connector and action.
#12499 Enhanced client banning capabilities with extended rules, including:
- Matching clientid against a specified regular expression.
- Matching client's username against a specified regular expression.
- Matching client's peer address against a CIDR range.
Important Notice: Implementing a large number of broad matching rules (not specific to an individual clientid, username, or host) may affect system performance. It's advised to use these extended ban rules judiciously to maintain optimal system efficiency.
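The three extended ban rule types can be sketched in Python. This is an illustration only: the rule kind names (clientid_re, username_re, peerhost_net) and the client dictionary layout are hypothetical and do not reflect the EMQX API. Because every rule is evaluated per client, the sketch also shows why many broad rules add cost to each check.

```python
import ipaddress
import re


def is_banned(client: dict, rules) -> bool:
    """Return True if any (kind, value) rule matches the connecting client."""
    for kind, value in rules:
        if kind == "clientid_re" and re.search(value, client["clientid"]):
            return True
        if kind == "username_re" and re.search(value, client["username"]):
            return True
        if kind == "peerhost_net":
            # CIDR match on the client's peer address.
            if ipaddress.ip_address(client["peerhost"]) in ipaddress.ip_network(value):
                return True
    return False
```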
#12509 Implemented API to re-order all authenticators / authorization sources.
#12517 Configuration files have been upgraded to accommodate multi-line string values, preserving indentation for enhanced readability and maintainability. This improvement utilizes """~ and ~""" markers to quote indented lines, offering a structured and clear way to define complex configurations. For example:
rule_xlu4 {
  sql = """~
    SELECT
      *
    FROM
      "t/#"
  ~"""
}
See HOCON 0.42.0 release notes for details.
#12520 Implemented log throttling. The feature reduces the volume of logged events that could potentially flood the system by dropping all but the first occurrence of an event within a configured time window. Log throttling is applied to the following log events that are critical yet prone to repetition:
authentication_failure
authorization_permission_denied
cannot_publish_to_topic_due_to_not_authorized
cannot_publish_to_topic_due_to_quota_exceeded
connection_rejected_due_to_license_limit_reached
dropped_msg_due_to_mqueue_is_full
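The "first occurrence per time window" rule can be sketched in Python. This is a minimal illustration of the idea, not the EMQX implementation; the LogThrottler class and its method names are hypothetical, and time is passed in explicitly to keep the example deterministic.

```python
class LogThrottler:
    """Let through only the first occurrence of each event per time window."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._window_start = {}  # event name -> start time of its current window

    def should_log(self, event: str, now: float) -> bool:
        start = self._window_start.get(event)
        if start is None or now - start >= self.window_seconds:
            # First occurrence in a new window: log it and open the window.
            self._window_start[event] = now
            return True
        # Repeated occurrence inside the window: drop it.
        return False
```

Each event name is throttled independently, so a burst of authentication_failure events does not suppress an unrelated event type.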
#12561 Implemented HTTP APIs to get the list of client's in-flight and message queue (mqueue) messages. These APIs facilitate detailed insights and effective control over message queues and in-flight messaging, ensuring efficient message handling and monitoring.
To get the first chunk of data:
GET /clients/{clientid}/mqueue_messages?limit=100
GET /clients/{clientid}/inflight_messages?limit=100
Alternatively, for the first chunks without specifying a start position:
GET /clients/{clientid}/mqueue_messages?limit=100&position=none
GET /clients/{clientid}/inflight_messages?limit=100&position=none
To get the next chunk of data:
GET /clients/{clientid}/mqueue_messages?limit=100&position={position}
GET /clients/{clientid}/inflight_messages?limit=100&position={position}
Where {position} is the value (an opaque string token) of the meta.position field from the previous response.
Ordering and Prioritization:
- Mqueue Messages: These are prioritized and sequenced based on their queue order (FIFO), from higher to lower priority. By default, mqueue messages carry a uniform priority level of 0.
- In-Flight Messages: Sequenced by the timestamp of their insertion into the in-flight storage, from oldest to newest.
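The position-based paging described above can be sketched as a Python generator. The sketch makes several assumptions that are not stated in the release note: fetch_page is a hypothetical callable wrapping the HTTP request, the messages are assumed to arrive in a data field, and the end of the list is assumed to be signalled by an empty page or a missing meta.position.

```python
def iter_messages(fetch_page, limit: int = 100):
    """Yield all messages by following meta.position from page to page."""
    position = "none"  # 'none' requests the first chunk
    while True:
        page = fetch_page(limit=limit, position=position)
        yield from page["data"]
        # The token is opaque: pass it back unchanged to get the next chunk.
        position = page["meta"].get("position")
        if not page["data"] or position is None:
            break
```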
#12590 Removed mfa metadata from log messages to improve clarity.
#12641 Improved text log formatter fields order. The new fields order is as follows:
tag > clientid > msg > peername > username > topic > [other fields]
#12670 Added field shared_subscriptions to endpoints /monitor_current and /monitor_current/nodes/:node.
#12679 Upgraded docker image base from Debian 11 to Debian 12.
#12700 Started supporting "b" and "B" unit in bytesize hocon fields.
For example, all three fields below will have the value of 1024 bytes:
bytesize_field = "1024b"
bytesize_field2 = "1024B"
bytesize_field3 = 1024
#12719 The /clients API has been upgraded to accommodate queries for multiple clientids and usernames simultaneously, offering a more flexible and powerful tool for monitoring client connections. Additionally, this update introduces the capability to customize which client information fields are included in the API response, optimizing for specific monitoring needs.
Examples of Multi-Client/Username Queries:
- To query multiple clients by ID: /clients?clientid=client1&clientid=client2
- To query multiple users: /clients?username=user11&username=user2
- To combine multiple client IDs and usernames in one query: /clients?clientid=client1&clientid=client2&username=user1&username=user2
Examples of Selecting Fields for the Response:
- To include all fields in the response: /clients?fields=all (Note: Omitting the fields parameter defaults to returning all fields.)
- To specify only certain fields: /clients?fields=clientid,username
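Query strings with repeated clientid and username parameters, as in the examples above, can be built with the Python standard library. The helper name clients_query is hypothetical; only the parameter names clientid, username, and fields come from the release note.

```python
from urllib.parse import urlencode


def clients_query(clientids=(), usernames=(), fields=()) -> str:
    """Build a /clients path with repeated clientid/username parameters."""
    params = [("clientid", c) for c in clientids]
    params += [("username", u) for u in usernames]
    if fields:
        # Fields are sent as one comma-separated value.
        params.append(("fields", ",".join(fields)))
    # safe="," keeps the comma in the fields list unescaped.
    query = urlencode(params, safe=",")
    return "/clients" + ("?" + query if query else "")


print(clients_query(["client1", "client2"], ["user1", "user2"]))
```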
#12330 The Cassandra bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12353 The OpenTSDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12376 The Kinesis bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12386 The GreptimeDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12423 The RabbitMQ bridge has been split into connector, action and source components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12425 The ClickHouse bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12439 The Oracle bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12449 The TDEngine bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12488 The RocketMQ bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12512 The HStreamDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically; however, it is recommended to do the upgrade manually as new fields have been added to the configuration.
#12543 The DynamoDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12595 The Kafka Consumer bridge has been split into connector and source components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12619 The Microsoft SQL Server bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.
#12381 Added new SQL functions: map_keys(), map_values(), map_to_entries(), join_to_string(), join_to_sql_values_string(), is_null_var(), is_not_null_var().
For more information on the functions and their usage, refer to the Built-in SQL Functions documentation.
#12427 Introduced the capability to specify a limit on the number of Kafka partitions that can be used for Kafka data integration.
#12577 Updated the service_account_json field for both the GCP PubSub Producer and Consumer connectors to accept JSON-encoded strings. Using the previous format (a HOCON map) is still supported but not encouraged.
#12581 Added JSON schema to schema registry.
#12602 Enhanced health checking for IoTDB connector, using its ping API instead of just checking for an existing socket connection.
#12336 Refined the approach to managing asynchronous tasks by segregating the cleanup of channels into its own dedicated pool. This separation addresses performance issues encountered during channels cleanup under conditions of high network latency, ensuring that such tasks do not impede the efficiency of other asynchronous operations, such as route cleanup.
#12725 Implemented REST API to list the available source types.
#12746 Added username log field. If an MQTT client is connected with a non-empty username, the logs and traces will include the username field.
#12785 Added timestamp_format configuration option to log handlers. This new option allows for the following settings:
- auto: Automatically determines the timestamp format based on the log formatter being used. Utilizes rfc3339 format for text formatters, and epoch format for JSON formatters.
- epoch: Represents timestamps in microseconds precision Unix epoch format.
- rfc3339: Uses RFC3339 compliant format for date-time strings. For example, 2024-03-26T11:52:19.777087+00:00.
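The three timestamp_format settings can be illustrated in Python. The function below is a hypothetical sketch of the selection logic described above, not EMQX code; the formatter argument is assumed to be either "text" or "json".

```python
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def format_timestamp(dt: datetime, timestamp_format: str, formatter: str):
    """Render a timezone-aware datetime per the timestamp_format setting."""
    if timestamp_format == "auto":
        # auto: epoch for JSON formatters, rfc3339 for text formatters.
        timestamp_format = "epoch" if formatter == "json" else "rfc3339"
    if timestamp_format == "epoch":
        # Integer microseconds since the Unix epoch.
        return (dt - _EPOCH) // timedelta(microseconds=1)
    if timestamp_format == "rfc3339":
        return dt.isoformat()
    raise ValueError(f"unknown timestamp_format: {timestamp_format!r}")


ts = datetime(2024, 3, 26, 11, 52, 19, 777087, tzinfo=timezone.utc)
print(format_timestamp(ts, "auto", "text"))  # 2024-03-26T11:52:19.777087+00:00
```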
Bug Fixes
#11868 Fixed a bug where will messages were not published after session takeover.
#12347 Implemented an update to ensure that messages processed by the Rule SQL for the MQTT egress data bridge are always rendered as valid, even in scenarios where the data is incomplete or lacks certain placeholders defined in the bridge configuration. This adjustment prevents messages from being incorrectly deemed invalid and subsequently discarded by the MQTT egress data bridge, as was the case previously.
When variables in payload and topic templates are undefined, they are now rendered as empty strings instead of the literal undefined string.
#12472 Fixed an issue where certain read operations on /api/v5/actions/ and /api/v5/sources/ endpoints might result in a 500 error code during the process of rolling upgrades.
#12492 EMQX now returns the Receive-Maximum property in the CONNACK message for MQTT v5 clients, aligning with protocol expectations. This implementation considers the minimum value of the client's Receive-Maximum setting and the server's max_inflight configuration as the limit for the number of inflight (unacknowledged) messages permitted. Previously, the determined value was not sent back to the client in the CONNACK message.
NOTE: A current known issue with these enhanced API responses is that the total client count provided may exceed the actual number of clients due to the inclusion of disconnected sessions.
#12505 Upgraded the Kafka producer client wolff from version 1.10.1 to 1.10.2. This latest version maintains a long-lived metadata connection for each connector, optimizing EMQX's performance by reducing the frequency of establishing new connections for action and connector health checks.
#12513 Changed the level of several flooding log events from warning to info.
#12530 Improved the error reporting for frame_too_large events and malformed CONNECT packet parsing failures. These updates now provide additional information, aiding in the troubleshooting process.
#12541 Introduced a new configuration validation step for autocluster by DNS records to ensure compatibility between node.name and cluster.discovery_strategy. Specifically, when utilizing the dns strategy with either a or aaaa record types, it is mandatory for all nodes to use a (static) IP address as the host name.
#12566 Enhanced the bootstrap file for REST API keys:
- Empty lines within the file are now skipped, eliminating the previous behavior of generating an error.
- API keys specified in the bootstrap file are assigned the highest precedence. In cases where a new key from the bootstrap file conflicts with an existing key, the older key will be automatically removed to ensure that the bootstrap keys take effect without issue.
#12646 Fixed an issue with the rule engine's date-time string parser. Previously, time zone adjustments were only effective for date-time strings specified with second-level precision.
#12652 Fixed a discrepancy where the subbits functions with 4 and 5 parameters, despite being documented, were missing from the actual implementation. These functions have now been added.
#12663 Fixed an issue where the emqx_vm_cpu_use and emqx_vm_cpu_idle metrics, accessible via the Prometheus endpoint /prometheus/stats, were inaccurately reflecting the average CPU usage since the operating system boot. This fix ensures that these metrics now accurately represent the current CPU usage and idle time, providing more relevant and timely data for monitoring purposes.
#12668 Refactored the SQL function date_to_unix_ts() by using calendar:datetime_to_gregorian_seconds/1. This change also added validation for the input date format.
#12672 Changed the process for generating the node boot configuration by incorporating the loading of {data_dir}/configs/cluster.hocon. Previously, changes to logging configurations made via the Dashboard and saved in {data_dir}/configs/cluster.hocon were only applied after the initial boot configuration was generated using etc/emqx.conf, leading to potential loss of some log segment files due to late reconfiguration.
Now, both {data_dir}/configs/cluster.hocon and etc/emqx.conf are loaded concurrently, with settings from emqx.conf taking precedence, to create the boot configuration.
#12696 Fixed an issue where attempting to reconnect an action or source could lead to wrong error messages being returned in the HTTP API.
#12714 Fixed inaccuracies in several metrics reported by the /prometheus/stats endpoint of the Prometheus API. The correction applies to the following metrics:
emqx_cluster_sessions_count
emqx_cluster_sessions_max
emqx_cluster_nodes_running
emqx_cluster_nodes_stopped
emqx_subscriptions_shared_count
emqx_subscriptions_shared_max
Additionally, this fix rectified an issue within the /stats endpoint concerning the subscriptions.shared.count and subscriptions.shared.max fields. Previously, these values failed to update promptly following a client's disconnection or unsubscription from a Shared-Subscription.
#12390 Fixed an issue where the /license API request may crash during cluster joining processes.
#12411 Fixed a bug where null values would be inserted as 1853189228 in int columns in Cassandra data integration.
#12522 Refined the parsing process for Kafka bootstrap hosts to exclude spaces following commas, addressing connection timeouts and DNS resolution failures due to malformed host entries.
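The bootstrap host parsing issue in #12522 comes down to trimming whitespace around each comma-separated entry. A minimal Python sketch of that idea follows; it is not the EMQX or wolff code, and parse_bootstrap_hosts is a hypothetical name.

```python
def parse_bootstrap_hosts(hosts: str):
    """Split 'host1:9092, host2:9092' into clean host:port entries."""
    # Strip the spaces that may follow commas and skip empty entries.
    return [entry.strip() for entry in hosts.split(",") if entry.strip()]


print(parse_bootstrap_hosts("kafka-1:9092, kafka-2:9092"))
```

Without the strip step, the second entry would keep its leading space, which is the kind of malformed host name that fails DNS resolution.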
#12656 Implemented a topic verification step for creating GCP PubSub Producer actions, ensuring failure notifications when the topic doesn't exist or provided credentials lack sufficient permissions.
#12678 Enhanced the DynamoDB connector to clearly report the reason for connection failures, improving upon the previous lack of error insights.
#12681 Fixed a security issue where secrets could be logged at debug level when sending messages to a RocketMQ bridge/action.
#12715 Fixed a crash that could occur during configuration updates if the connector for the ingress data integration source had active channels.
#12767 Fixed issues encountered during upgrades from 5.0.1 to 5.5.1, specifically related to Kafka Producer configurations that led to upgrade failures. The correction ensures that Kafka Producer configurations are accurately transformed into the new action and connector configuration format required by EMQX version 5.5.1 and beyond.
#12768 Addressed a startup failure issue in EMQX version 5.4.0 and later, particularly noted during rolling upgrades from versions before 5.4.0. The issue was related to the initialization of the routing schema when both v1 and v2 routing tables were empty.
The node now attempts to retrieve the routing schema version in use across the cluster instead of using the v2 routing table by default when local routing tables are found empty at startup. This approach mitigates potential conflicts and reduces the chances of diverging routing storage schemas among cluster nodes, especially in a mixed-version cluster scenario.
If a conflict is detected in a running cluster, EMQX writes instructions on how to manually resolve it in the log as part of the error message with critical severity. The same error message and instructions will also be written on standard error to make sure this message will not get lost even if no log handler is configured.
#12786 Added a strict check that prevents replicant nodes from connecting to core nodes running with a different version of EMQX application. This check ensures that during the rolling upgrades, the replicant nodes can only work when at least one core node is running the same EMQX release version.
5.5.1
Release Date: 2024-03-06
Enhancements
- #12497 Improved MongoDB connector performance, resulting in more efficient database interactions. This enhancement is supported by improvements in the MongoDB Erlang driver as well (mongodb-erlang PR).
Bug Fixes
#12471 Fixed an issue where data integration configurations failed to load correctly during upgrades from EMQX version 5.0.2 to newer releases.
#12598 Fixed an issue where users were unable to subscribe to or unsubscribe from shared topic filters via HTTP API.
The affected APIs include:
/clients/:clientid/subscribe
/clients/:clientid/subscribe/bulk
/clients/:clientid/unsubscribe
/clients/:clientid/unsubscribe/bulk
#12601 Fixed an issue where logs of the LDAP driver were not being captured. Now, all logs are recorded at the `info` level.
#12606 The Prometheus API experienced crashes when the specified SSL certificate file did not exist in the given path. Now, when an SSL certificate file is missing, the `emqx_cert_expiry_at` metric will report a value of 0, indicating the non-existence of the certificate.
#12608 Fixed a `function_clause` error in the IoTDB action caused by the absence of a `payload` field in query data.
#12610 Fixed an issue where connections to the LDAP connector could unexpectedly disconnect after a certain period of time.
#12620 Redacted sensitive information in HTTP headers to exclude authentication and authorization credentials from `debug` level logs in the HTTP Server connector, mitigating potential security risks.
#12632 Fixed an issue where the rule engine's SQL built-in function `date_to_unix_ts` produced incorrect results for dates starting from March 1st on leap years.
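To make the leap-year case in #12632 concrete, here is a minimal Python sketch of a leap-year-safe conversion from a UTC date string to Unix seconds. It is not EMQX's implementation; the helper name `utc_date_to_unix_seconds` and the date format are assumptions chosen for illustration.

```python
import calendar
from datetime import datetime


def utc_date_to_unix_seconds(date_string: str, fmt: str = "%Y-%m-%d %H:%M:%S") -> int:
    """Convert a UTC date string to Unix seconds (hypothetical helper, not EMQX code)."""
    parsed = datetime.strptime(date_string, fmt)
    # calendar.timegm interprets the time tuple as UTC and accounts for leap years.
    return calendar.timegm(parsed.timetuple())


# In a leap year, February 29 and March 1 must be exactly one day apart.
feb_29 = utc_date_to_unix_seconds("2024-02-29 00:00:00")
mar_01 = utc_date_to_unix_seconds("2024-03-01 00:00:00")
print(mar_01 - feb_29)
```

Checking that consecutive days around the end of February differ by 86400 seconds is a quick way to catch this class of off-by-one-day error.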
5.5.0
Release Date: 2024-02-01
Enhancements
#12085 EMQX has been upgraded to leverage the capabilities of OTP version 26.1.2-2. NOTE: Docker images are still built with OTP 25.3.2.
#12189 Enhanced the ACL claim format in EMQX JWT authentication for greater versatility. The updated format now supports an array structure, aligning more closely with the file-based ACL rules.
For example:
```json
[
  {
    "permission": "allow",
    "action": "pub",
    "topic": "${username}/#",
    "qos": [0, 1],
    "retain": true
  },
  {
    "permission": "allow",
    "action": "sub",
    "topic": "eq ${username}/#",
    "qos": [0, 1]
  },
  {
    "permission": "deny",
    "action": "all",
    "topics": ["#"]
  }
]
```
In this new format, the absence of a matching rule does not result in an automatic denial of the action. The authorization chain can advance to other configured authorizers if a match is not found in the JWT ACL. If no match is found throughout the chain, the final decision defers to the default permission set in `authorization.no_match`.
#12267 Added a new `timeout` parameter to the `cluster/:node/invite` interface, addressing the issue of default timeouts. The previously set 5-second default timeout often led to HTTP API call timeouts because joining an EMQX cluster usually requires more time.
In addition, EMQX added a new API `/cluster/:node/invite_async` to support an asynchronous way to invite nodes to join the cluster and introduced a new `cluster/invitation` API to inspect the join status.
#12272 Introduced updates to the `retain` API in EMQX:
- Added a new API `DELETE /retainer/messages` to clean all retained messages.
- Added an optional topic filter parameter `topic` in the query string for the API `GET /retainer/messages`. For example, using a query string `topic=t/1` filters the retained messages for a specific topic, improving the efficiency of message retrieval.
#12277 Added `mqtt/delayed/messages/:topic` API to remove delayed messages by topic name.
#12278 Adjusted the maximum pagination size for paginated APIs in the REST API from `3000` to `10000`.
#12289 Authorization caching now supports the exclusion of specific topics. For the specified list of topics and topic filters, EMQX will not generate an authorization cache. The list can be set through the `authorization.cache.excludes` configuration item or via the Dashboard. For these specific topics, permission checks will always be conducted in real-time rather than relying on previous cache results, thus ensuring the timeliness of authorization outcomes.
#12329 Added `broker.routing.batch_sync` configuration item to enable a dedicated process pool that synchronizes subscriptions with the global routing table in batches, thus reducing the frequency of cross-node communication that can be slowed down by network latency. Processing multiple subscription updates collectively not only accelerates synchronization between replica nodes and core nodes in a cluster but also reduces the load on the broker pool, minimizing the risk of overloading.
#12333 Added a `tags` field for actions and connectors. Similar to the `description` field (which is a free text annotation), `tags` can be used to annotate actions and connectors for filtering and grouping.
#12072 GreptimeDB data integration now supports asynchronous data write operations to provide better performance.
#12194 Improved Kafka producer performance.
#12247 The bridges for InfluxDB have been split so they are available via the connectors and actions APIs. They are still backward compatible with the old bridge API.
#12299 Exposed more metrics to improve observability:
Monitor API:
- Added `retained_msg_count` field to `/api/v5/monitor_current`.
- Added `license_quota` field to `/api/v5/monitor_current`.
- Added `retained_msg_count` and `node_uptime` fields to `/api/v5/monitor_current/nodes/{node}`.
- Added `retained_msg_count`, `license_quota` and `node_uptime` fields to `/api/v5/monitor_current/nodes/{node}`.
Prometheus API:
- Added `emqx_cert_expiry_at` and `emqx_license_expiry_at` to `/api/v5/prometheus/stats` to display TLS listener certificate expiration time and license expiration time.
- Added `/api/v5/prometheus/auth` endpoint to provide metrics such as execution count and running status for all authenticators and authorizers.
- Added `/api/v5/prometheus/data_integration` endpoint to provide metrics such as execution count and status for all rules, actions, and connectors.
Limitations: Prometheus push gateway only supports the content in `/api/v5/prometheus/stats?mode=node`.
For more API details and metric type information, please see swagger api docs.
#12196 Improved network efficiency during routes cleanup.
Previously, when a node was down, a delete operation for each route to that node had to be exchanged between all the other live nodes. After this change, only one `match and delete` operation is exchanged between all live nodes, significantly reducing the number of necessary network packets and decreasing the load on the inter-cluster network. This optimization should be especially helpful for geo-distributed EMQX deployments where network latency can be significantly high.
#12354 The concurrent creation and updates of data integrations are now supported, significantly increasing operation speeds, such as when importing backup files.
#12396 Enhanced the user import feature in the `authentication/:id/import_users` interface:
- Added a new parameter `?type=plain` for easier importing of users with plaintext passwords, complementing the existing functionality that supports hashed passwords.
- Enhanced support for `content-type: application/json`, allowing HTTP Body submissions in JSON format. This extends the current capability that exclusively supports `multipart/form-data` for CSV files.
#11902 Enhanced EMQX's capability to facilitate MQTT message bridging through the one-way Nari SysKeeper 2000 network isolation gateway.
#12348 Supported data integration with Elasticsearch.
Bug Fixes
- #12232 Fixed an issue when cluster commit log table was not deleted after a node was forced to leave a cluster.
- #12243 Fixed a family of subtle race conditions that could lead to inconsistencies in the global routing state.
- #12269 Improved error handling in the `/clients` interface; now returns a 400 status with more detailed error messages, instead of a generic 500, for query string validation failures.
- #12285 Updated the CoAP gateway to support short parameter names for slight savings in datagram size. For example, `clientid=bar` can be written as `c=bar`.
- #12303 Fixed the message indexing in retainer. Previously, clients with wildcard subscriptions might receive irrelevant retained messages not matching their subscription topics.
- #12305 Corrected an issue with incomplete client/connection information being passed into `emqx_cm`, which could lead to internal inconsistencies and affect memory usage and operations like node evacuation.
- #12306 Fixed an issue preventing the connectivity test for the Connector from functioning correctly after updating the password parameter via the HTTP API.
- #12359 Fixed an issue causing error messages when restarting a node configured with some types of data bridges. Additionally, these bridges were at risk of entering a failed state upon node restart, requiring a manual restart to restore functionality.
- #12404 Fixed an issue where restarting a data integration with heavy message flow could lead to a stop in the collection of data integration metrics.
- #12282 Improved the HTTP API error response for MySQL bridge creation failures. It also resolved a problem with removing MySQL Sinks containing undefined columns in their SQL.
- #12291 Fixed inconsistencies in EMQX’s handling of configuration updates involving sensitive parameters, which previously led to stray `"******"` strings in cluster configuration files.
- #12301 Fixed an issue with the line protocol in InfluxDB, where numeric literals were being stored as string types.
- #12317 Removed the `resource_opts.batch_size` field from the MongoDB Action schema, as it is not yet supported.
5.4.1
Release Date: 2024-01-09
Bug Fixes
#12234 Resolved compatibility issues with Open Telemetry configurations defined in `emqx.conf` from versions before EMQX 5.4.0, ensuring smooth integration of legacy configurations with the latest EMQX release.
#12236 Fixed client ID generation in MQTT broker data integration to comply with MQTT 3.1 specification of 23-byte limit. Client ID is now prefixed with user-assigned Connector name, followed by the first 8 bytes of node name's SHA hash and pool member ID. If the resulting ID exceeds 23 bytes, additional SHA hash and truncation for the first 23 characters are applied to ensure compliance.
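The client ID scheme described in #12236 can be sketched roughly as follows. This is an illustrative Python approximation, not the EMQX code: the choice of SHA-1, the hex encoding, taking the first 8 hex characters of the node hash, the `:` separator, and the function name `bridge_client_id` are all assumptions.

```python
import hashlib

MAX_CLIENT_ID_BYTES = 23  # MQTT 3.1 client ID limit mentioned in the entry above


def bridge_client_id(connector_name: str, node_name: str, pool_member_id: int) -> str:
    """Build a client ID that never exceeds 23 bytes (hypothetical helper)."""
    # Assumption: SHA-1 in hex, shortened to 8 characters, identifies the node.
    node_hash = hashlib.sha1(node_name.encode()).hexdigest()[:8]
    candidate = f"{connector_name}:{node_hash}:{pool_member_id}"
    if len(candidate.encode()) <= MAX_CLIENT_ID_BYTES:
        return candidate
    # Too long: hash the whole candidate and keep the first 23 characters.
    return hashlib.sha1(candidate.encode()).hexdigest()[:MAX_CLIENT_ID_BYTES]


print(bridge_client_id("c1", "emqx@127.0.0.1", 1))
print(bridge_client_id("a-very-long-connector-name", "emqx@127.0.0.1", 1))
```

Short connector names stay readable in the resulting ID, while long ones fall back to a fixed-length digest that is still deterministic for the same inputs.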
#12238 Resolved compatibility issue with the error format configurations introduced in the HTTP Action feature of EMQX version 5.3.2.
#12240 Modified the `/file_transfer` API to return file transfer configurations in their original raw format. This change prevents the conversion of time units, such as "1h", to seconds, ensuring that callers receive the initially configured values. This modification aligns with other getter APIs, maintaining consistency in data representation.
#12241 Fixed a bug where configuring additional HTTP headers for S3 API interactions disrupted file transfers using the S3 storage backend, ensuring stable and uninterrupted file transfer operations.
#12246 Stopped exposing port 11883 by default in Docker and removed it from Helm charts, as this port is no longer in use.
#12249 Fixed an issue in the `/configs` API where attempting to modify a read-only configuration value resulted in a garbled response message.
#12250 Resolved an issue where the `file_transfer` configuration's `secret_access_key` value was erroneously being updated to masked stars (`*****`), ensuring that the original key value remains unaltered and secure.
#12256 Fixed an issue that prevented establishing connections to MySQL resources without a password.
#12264 Fixed an issue where version 5.4 replica nodes could not join clusters with core nodes running versions earlier than 5.4 during the rolling upgrade process.
5.4.0
Release Date: 2023-12-23
Enhancements
#11884 Modified the Prometheus API and configuration to implement the following improvements:
- Restructured configuration sections to group related settings, improving readability and maintainability.
- Introduced `enable_basic_auth` configuration for basic authentication on the scrape API endpoint, enhancing security.
- Maintained backward compatibility while refactoring code, avoiding breaking changes.
#11896 Introduced an enhancement for configuring sensitive authentication fields in bridges, such as passwords, tokens, and secret keys. This improvement allows the use of secrets stored as files in the file system. These secrets can be securely referenced in configuration files using the special `file://` prefix, enhancing the security of sensitive data handling in bridge configurations.
#11921 Introduced Open Telemetry Logs Handler, which formats log events in alignment with the Open Telemetry log data model. This handler facilitates the exportation of formatted log events to a configured Open Telemetry collector or back-end, thereby enhancing log management and integration capabilities.
#11935 Switched to the new `v2` routing store schema by default. The new schema improves both subscription and routing performance, especially in scenarios with concurrent subscriptions to topic filters sharing common wildcard prefixes. However, it does come with a minor increase in memory usage. This schema also eliminates the need for a separate index, so the routing state inconsistencies rarely encountered in previous versions should no longer be possible.
If a cluster is rolling upgraded from an older version, the cluster will continue to use the `v1` store until a full cluster (non-rolling) restart happens.
Users can still opt for the previous schema by configuring the `broker.routing.storage_schema` option to `v1`. However, this also requires a complete, non-rolling restart of the cluster to take effect.
#11984 Implemented Open Telemetry distributed tracing feature.
#12017 Implemented a dedicated REST API for the import and export of configuration and user data.
#12040 Upgraded QUIC protocol stack.
#12201 Added support for hot updates to TCP/SSL/WS/WSS MQTT listener configurations. This feature allows you to modify most configuration parameters without restarting the listener and disconnecting the clients. However, there are some limitations:
For TCP/SSL listeners, changes to the following parameters will still require a listener restart and client reconnection:
- `bind`
- `tcp_options.backlog`
For WS/WSS (WebSocket) listeners, modifying transport-related parameters (listed below) will result in the listening socket being reopened, but established connections will remain uninterrupted.
- `bind`
- `tcp_options.*`
- `ssl_options.*`
#11608 Integrated LDAP bind operation as a new authenticator, providing a more flexible and secure method for user authentication.
#11766 Implemented a preliminary Role-Based Access Control for the REST API. In this version, there are three predefined roles:
Administrator: This role can access all resources.
Viewer: This role can only view resources and data, corresponding to all GET requests in the REST API.
Publisher: Specifically tailored for MQTT message publishing, this role is confined to accessing endpoints related to message publication.
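As a rough illustration of how the three roles above partition REST API access, the following Python sketch encodes them as a single check. It is not EMQX's authorization code: the lowercase role identifiers `viewer` and `publisher`, the set of publish paths, and the function name `is_request_allowed` are assumptions made for the example.

```python
# Assumption: these paths stand in for "endpoints related to message publication".
PUBLISH_PATHS = {"/publish", "/publish/bulk"}


def is_request_allowed(role: str, method: str, path: str) -> bool:
    """Decide whether a role may perform a REST request (hypothetical helper)."""
    method = method.upper()
    if role == "administrator":
        return True  # can access all resources
    if role == "viewer":
        return method == "GET"  # read-only access
    if role == "publisher":
        return method == "POST" and path in PUBLISH_PATHS
    return False  # unknown roles get nothing


print(is_request_allowed("viewer", "GET", "/clients"))
print(is_request_allowed("viewer", "DELETE", "/clients/c1"))
print(is_request_allowed("publisher", "POST", "/publish"))
```

Denying by default for unrecognized roles keeps the check safe if new role names are introduced later.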
#11773 Implemented Dashboard support for audit log management. Users can utilize this page to view all change operations performed on EMQX devices and data, such as kicking out devices, creating/deleting rules, etc.
#11778 Integrated Microsoft Entra Identity (formerly known as Azure Active Directory) support into the SAML single sign-on (SSO) process.
#11811 Improved the format of the REST API key bootstrap file to support initializing a key with a role.
The new form is: `api_key:api_secret:role`. `role` is optional and its default value is `administrator`.
#11852 Introduced a new GB/T 32960 gateway, enabling vehicles to connect with EMQX via the GBT32960 vehicular networking protocol.
#11883 Introduced a new JT/T808 gateway, enabling vehicles to connect with EMQX via the JT/T 808 vehicular networking protocol.
#11885 Introduced a new OCPP gateway for Electric vehicle (EV) charging stations to access EMQX through the OCPP (Open Charge Point Protocol).
#11971 Made `/api/v5/load_rebalance/availability_check` public, meaning it no longer requires authentication. This change simplifies the setup of load balancers.
This change also improved the gracefulness of the rebalance/evacuation process during the wait health check phase. Connections to nodes marked for eviction are no longer prohibited during this phase. At that point it is unknown whether these nodes are all marked unhealthy by the load balancer, so prohibiting connections to them may cause multiple unsuccessful reconnection attempts.
#12013 The data bridging design has been adjusted to split it into connectors and actions (Sinks). Connectors are used to manage the integration of data with external systems and can be reused across multiple actions, while actions are used to configure how data is processed. This design provides greater flexibility and scalability, resulting in clearer data integration configuration and management.
The adjusted data bridges include PostgreSQL, Timescale, and Matrix, which have now been split into connectors and actions APIs, but they remain backward compatible with the old data bridge API.
#12016 Enhanced license key management.
EMQX can now load the license key from a specified file. This is enabled by setting the `license.key` configuration to a file path, which should be prefixed with `"file://"`. Also added the ability to revert to the default trial license by setting `license.key = default`. This option simplifies the process of returning to the trial license if needed.
#12129 Renewed the default license, replacing the old license issued in January 2023. At the same time, the license capacity has been adjusted from 100 concurrent connections to 25 concurrent connections.
Bug Fixes
#10976 Fixed topic-filter overlapping handling in shared subscription. In the previous implementation, the storage method for subscription options did not provide adequate support for shared subscriptions. This resulted in message routing failures and leakage of routing tables between nodes during the "subscribe-unsubscribe" process with specific order and topics.
#12048 Fixed CoAP gateway bug that caused it to ignore subscription options.
#12078 Upgraded grpc-erl to 0.6.12. This update addresses a potential deadlock issue where the grpc client started dependent apps lazily.
#12081 Updated `gen_rpc` library to version 3.3.1. The new version includes several performance improvements:
- Avoiding allocating extra memory for the packets before they are sent to the wire in some cases.
- Bypassing network for the local calls.
- Avoiding sensitive data leaking in debug logs #12202
#12111 Fixed an issue when API tokens were sometimes unavailable immediately after login due to race condition.
#12121 Fixed an issue where nodes in the cluster would occasionally return a stale view when updating configurations on different nodes concurrently.
#12158 Fixed an issue when the rule engine cannot connect to Redis hosted by Upstash.
Before the fix, after establishing a TCP connection with the Redis service, the Redis driver of EMQX used Inline Commands to send AUTH and SELECT commands. However, the `upstash` Redis service does not support Inline Commands, which causes the rule engine to fail to connect to the `upstash` Redis service. After the fix, the Redis driver of EMQX uses RESP (REdis Serialization Protocol) to send AUTH and SELECT commands.
#12176 Always acknowledge `DISCONNECT` packet to MQTT-SN client regardless of whether the connection has been successfully established before.
#12180 Fixed an issue where DTLS-enabled MQTT-SN gateways could not be started, caused by incompatibility of default listener configuration with the DTLS implementation.
#12219 Fixed file transfer S3 config secret deobfuscation issue while performing config updates from dashboard.
5.3.2
Release Date: 2023-12-01
Enhancements
#11752 Changed default RPC driver from `gen_rpc` to `rpc` for core-replica database synchronization.
This improves core-replica data replication latency.
#11785 Allowed users with the "Viewer" role to change their own passwords. However, those with the "Viewer" role do not have permission to change the passwords of other users.
#11787 Improved the performance of the `emqx` command.
#11790 Added validation to Redis commands in Redis authorization source. Additionally, this improvement refines the parsing of Redis commands during authentication and authorization processes. The parsing now aligns with `redis-cli` compatibility standards and supports quoted arguments.
#11541 Enhanced file transfer capabilities. Now, clients can use an asynchronous method for file transfer by sending commands to the `$file-async/...` topic and subscribing to command execution results on the `$file-response/{clientId}` topic. This improvement simplifies the use of the file transfer feature, particularly suitable for clients using MQTT v3.1/v3.1.1 or those employing MQTT bridging. For more details, please refer to EIP-0021.
Bug Fixes
#11757 Fixed the error response code when downloading non-existent trace files. Now the response returns `404` instead of `500`.
#11762 Fixed an issue in EMQX's `built_in_database` authorization source. With this update, all Access Control List (ACL) records are completely removed when an authorization source is deleted. This resolves the issue of residual records remaining in the database when re-creating authorization sources.
#11771 Fixed validation of Bcrypt salt rounds in authentication management through the API/Dashboard.
#11780 Fixed validation of the `iterations` field of the `pbkdf2` password hashing algorithm. Now, `iterations` must be strictly positive. Previously, it could be set to 0, which led to a nonfunctional authenticator.
#11791 Fixed an issue in the EMQX CoAP Gateway where heartbeats were not effectively maintaining the connection's active status. This fix ensures that the heartbeat mechanism properly sustains the liveliness of CoAP Gateway connections.
#11797 Modified HTTP API behavior for APIs managing the `built_in_database` authorization source. They will now return a `404` status code if `built_in_database` is not set as the authorization source, replacing the former `20X` response.
#11965 Improved the termination of EMQX services to ensure a graceful stop even in the presence of an unavailable MongoDB resource.
#11975 This fix addresses an issue where redundant error logs were generated due to a race condition during simultaneous socket closure by a peer and the server. Previously, concurrent socket close events triggered by the operating system and EMQX resulted in unnecessary error logging. The implemented fix improves event handling to eliminate unnecessary error messages.
#11987 Fixed a bug where attempting to set the `active_n` option on a TCP/SSL socket could lead to a connection crash.
The problem occurred if the socket had already been closed by the time the connection process attempted to apply the `active_n` setting, resulting in a `case_clause` crash.
#11731 Added hot configuration support for the file transfer feature.
#11754 Improved the log formatting specifically for the Postgres bridge in EMQX. It addresses issues related to Unicode characters in error messages returned by the driver.
5.3.1
Release Date: 2023-11-14
Enhancements
- #11637 Added extra diagnostic checks to help debug issues when mnesia is stuck waiting for tables. Library Updates: `ekka` has been upgraded to version 0.15.15, and `mria` to version 0.6.4.
- #11581 Feature Preview: Planned for EMQX v5.4.0, introducing the concepts of Connector and Action based on data bridge. The existing data bridge will be gradually migrated to Connector and Action. Connectors are designed to manage the integration with external systems, while Actions are solely used to configure the data processing methods. A Connector can be reused across multiple Actions, providing greater flexibility and scalability. Currently, the migration has been completed for Kafka producer and Azure Event Hub producer.
- The Dashboard now supports MQTT 5.0 publish attribute settings for the rule engine's message republish action, allowing users more flexibility in publishing messages.
Bug Fixes
#11565 Upgraded jq library from v0.3.10 to v0.3.11. In this version, jq_port programs are initiated on-demand and will not appear in users' processes unless the jq function in EMQX is used. Additionally, idle jq_port programs will auto-terminate after a set period. Note: Most EMQX users are running jq in NIF mode and will not be affected by this update.
#11676 Hid a few pieces of sensitive information from debug-level logs.
#11697 Disabled outdated TLS versions and cipher suites in the EMQX backplane network (`gen_rpc`). Added support for tlsv1.3 on the backplane and introduced new configuration parameters: `EMQX_RPC__TLS_VERSIONS` and `EMQX_RPC__CIPHERS`.
The corresponding `gen_rpc` PR: https://github.com/emqx/gen_rpc/pull/36
#11734 Fixed clustering in IPv6 network. Added new configurations `rpc.listen_address` and `rpc.ipv6_only` to allow EMQX cluster RPC server and client to use IPv6.
#11747 Updated QUIC stack to msquic 2.2.3.
#11796 Fixed rpc schema to ensure that client/server uses same transport driver.
#11798 Fixed the issue where the node could not start after executing `./bin/emqx data import [FILE]`.
The connection between `apikey_key` and `apikey_name` is also enhanced for better consistency and unique identification.
- `apikey_key`: When generating an API key via the dashboard, `apikey_key` will now create a unique value derived from the provided human-readable `apikey_name`.
- `apikey_name`: Conversely, when using a bootstrap file to generate an API key, `apikey_name` will be generated as a unique value based on the associated `apikey_key`.
#11813 Fixed the schema to ensure that RPC client SSL port aligns with the configured server port. This fix also guarantees that the RPC ports are correctly opened in the Helm chart.
#11819 Upgraded opentelemetry library to v1.3.1-emqx. This opentelemetry release fixes invalid metrics timestamps in the exported metrics.
#11861 Fixed excessive warning message printed in remote console shell.
#11722 Fixed an issue where a Kafka Producer bridge with `sync` query mode would not buffer messages when in the `connecting` state.
#11724 Fixed a metrics-related issue where messages sent to Kafka would be counted as failed even when they were successfully transmitted afterward due to internal buffering.
#11728 Enhanced the LDAP filter string parser with the following improvements:
- Automatic escaping of special characters within filter strings.
- Fixed a bug that previously prevented the use of `dn` as a filter value.
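For readers unfamiliar with what escaping special characters in a filter string involves, here is a minimal Python sketch of the escaping rules for LDAP search filter values defined in RFC 4515. It illustrates the general technique only and is not the EMQX parser; the function name `escape_filter_value` is hypothetical.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value,
# mapped to a backslash followed by their two-digit hex code.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a value for safe use in an LDAP filter (hypothetical helper)."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


username = "smith*(admin)"
print(f"(&(objectClass=person)(uid={escape_filter_value(username)}))")
```

Escaping character by character in a single pass avoids double-escaping the backslashes introduced by earlier replacements.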
#11733 Resolved an incompatibility issue that caused crashes during session takeover or channel eviction when the session was located on a remote node running EMQX v5.2.x or an earlier version.
#11750 Eliminated logging and tracing of HTTP request bodies in HTTP authentication and HTTP bridges.
#11760 Simplified the CQL query used for the Cassandra bridge health check, which was previously generating warnings in the Cassandra server logs.
#11886 Fixed backward plugin compatibility.
Currently, EMQX validates hook point names, and invalid hook points cannot be used for hook registration. However, some older versions of plugin templates used misspelled hook points, and actual plugins in use may also have this issue. To maintain compatibility with these older plugins, we allow the use of the old hook points for hook registration, but we issue deprecated warnings for them. As before, these hooks will not be called.
#11897 Fixed the issue of waiting for a loop race condition during node configuration synchronization when cluster nodes are started approximately at the same time.
5.3.0
Release Date: 2023-09-29
Enhancements
#11597 Upgraded ekka to 0.15.13, which incorporates the following changes:
- Upgraded Mria to 0.6.2.
- Introduced the ability to configure the bootstrap data sync batch size, as detailed in Mria PR.
- Enhanced the reliability of mria_membership processes, as described in Mria PR.
- Fixed log message formatting error.
- Added `node.default_bootstrap_batch_size` option to EMQX configuration. Increasing the value of this option can greatly reduce a replicant node startup time, especially when the EMQX cluster interconnect network latency is high and the EMQX built-in database holds a large amount of data, e.g. when the number of subscriptions is high.
#11620 Added a new rule-engine SQL function `bytesize` to get the size of a byte-string, e.g. `SELECT * FROM "t/#" WHERE bytesize(payload) > 10`.
#11642 Updated to quicer version 0.0.200 in preparation for enabling openssl3 support for QUIC transport.
#11610 Implemented a preliminary Role-Based Access Control for the Dashboard.
In this version, there are two predefined roles:
Administrator: This role can access all resources.
Viewer: This role can only view resources and data, corresponding to all GET requests in the REST API.
#11631 Added Single Sign-On (SSO) feature and integrated with LDAP.
#11656 Integrated the SAML 2.0 Support for SSO.
#11599 Supported audit logs to record operations from CLI, REST API, and Dashboard in separate log files.
Bug Fixes
- #11682 Fixed an issue where logging would stop if "Rotation Size" was set to `infinity` on file log handlers.
- #11567 Improved EMQX graceful shutdown (`emqx stop` command):
  - Increased timeout from 1 to 2 minutes.
  - Printed an error message if EMQX can't stop gracefully within the configured timeout.
  - Printed periodic status messages while EMQX is shutting down.
- #11584 Fixed telemetry reporting error on Windows when os_mon module is unavailable.
- #11605 Lowered CMD_overridden log severity from warning to info.
- #11622 Upgraded rpc library gen_rpc from 2.8.1 to 3.1.0.
- #11623 Upgraded library `esockd` from 5.9.6 to 5.9.7. This upgrade included:
  - Enhancements regarding proxy protocol error and timeout. esockd pr#178
  - Lowered `ssl_error` exceptions to info-level logging. esockd pr#180
  - Malformed MQTT packet parsing exception log level is lowered from `error` to `info`.
  - In command `emqx ctl listeners` output, the `shutdown_count` counter is incremented when TLS handshake failure (`ssl_error`) or Malformed packet (`frame_error`) happens.
- #11661 Fixed log formatter when log.HANDLER.formatter is set to 'json'. The bug was introduced in v5.0.4 where the log line was no longer a valid JSON, but prefixed with timestamp string and level name.
- #11627 Fixed resources cleanup in HStreamDB bridge. Prior to this fix, HStreamDB bridge might report errors during bridge configuration updates, since hstreamdb client/producer were not stopped properly.
5.2.1
Release Date: 2023-09-20
Enhancements
#11487 The bcrypt work factor is limited to the range 5-10, because higher values consume too much CPU resources. Bcrypt library is updated to allow parallel hash evaluation.
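To show why the upper bound on the work factor matters, the sketch below uses the standard property of bcrypt that the number of key-expansion iterations is 2 raised to the work factor. It is an illustration only, not EMQX code; the helper name `bcrypt_iterations` is hypothetical, and the 5-10 range is taken from the entry above.

```python
MIN_WORK_FACTOR = 5
MAX_WORK_FACTOR = 10


def bcrypt_iterations(work_factor: int) -> int:
    """Return the bcrypt iteration count for an allowed work factor (hypothetical helper)."""
    if not MIN_WORK_FACTOR <= work_factor <= MAX_WORK_FACTOR:
        raise ValueError(f"work factor must be in {MIN_WORK_FACTOR}-{MAX_WORK_FACTOR}")
    # bcrypt cost is logarithmic: each increment doubles the work.
    return 2 ** work_factor


for factor in range(MIN_WORK_FACTOR, MAX_WORK_FACTOR + 1):
    print(factor, bcrypt_iterations(factor))
```

Because each step doubles the work, moving from the lowest to the highest allowed factor multiplies the CPU cost per hash by 32.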
#11568 Added support for defining templates for MQTT 5.0 publish properties and user properties in Republish rule action.
#11612 During node evacuation, evacuate all disconnected sessions, not only those started with `clean_start` set to `false`.
#11532 Improved error messaging for better clarity when parsing invalid packets.
Bug Fixes
#11493 Fixed response examples for `/api/v5/publish` bad request in REST API documentation. Previously the documentation example said that the bad request response could return a list in the body, which was not actually the case.
#11499 Upgraded Erlang/OTP to version 25.3.2-2, which now excludes sensitive data from mnesia_hook log messages.
#11506 Previously, attempting to download a non-existent trace log file would result in downloading an empty file. After implementing this fix, when attempting to download an empty trace log file using the GET request `/api/v5/trace/clientempty/download`, the server will now respond with a 404 status code and the following JSON message: `{"code":"NOT_FOUND","message":"Trace is empty"}`. This response will be triggered if no events matching the trace condition are found in the log file.
#11522 Improved rule engine schema registry error message when schema name exceeds the permissible length.
#11531 Fixed an issue where authorization cache cleaning CLI was not working properly for specific client ID.
#11564 Fixed cluster partition autoheal functionality. Implemented autohealing for the clusters that split into multiple partitions.
#11568 Fixed an issue where an ill-defined built-in rule action config could be interpreted as a custom user function.
#11394 Upgraded Kafka producer client `wolff` from 1.7.6 to 1.7.7. This fixed a potential race condition that might cause all Kafka producers to crash if some failed to initialize.
#11401 Fixed the behavior of the rule SQL `mongo_date` function in SQL statement testing in the EMQX Dashboard. The rule SQL `mongo_date` function now returns a string with the format `ISODate(*)`, where * is an ISO date string when running rules in test mode. This format aligns with how MongoDB stores dates.
#11547 Fixed several emqx_bridge issues:
- Fixed Cassandra bridge connect error occurring when the bridge is configured without username/password (Cassandra doesn't require user credentials when it is configured with `authenticator: AllowAllAuthenticator`.)
- Fixed SQL Server bridge connect error caused by an empty password.
- Made `username` a required field in Oracle bridge.
- Fixed IoTDB bridge error caused by setting base URL without a scheme (e.g. `<host>:<port>`).
#11630 Fixed an issue where the core node could get stuck in the `mria_schema:bootstrap/0` state, preventing new nodes from joining the cluster.
5.2.0
Release Date: 2023-09-07
Enhancements
#10697 This enhancement enables the configuration of `minReadySeconds` for the StatefulSet. This allows a time gap to be introduced between the restarts of individual pods triggered by upgrade or restart commands.
#11124 Released packages for Amazon Linux 2023.
#11289 Released packages for Debian 12.
#11290 Updated the `jq` dependency to version 0.3.10, which includes an update to the `oniguruma` library to version 6.9.8 with a few minor security fixes.
#11291 Updated RocksDB version to 1.8.0-emqx-1 via ekka update to 0.15.6.
#11390 Added `node.broker_pool_size`, `node.generic_pool_size`, and `node.channel_cleanup_batch_size` options to EMQX configuration. Tuning these options can significantly improve performance if cluster interconnect network latency is high.
#11429 Added an option to configure detection of the legacy protocol in MongoDB connectors and bridges.
#11436 Added a new API endpoint `DELETE /banned` for clearing all `banned` data.
#11438 Changed the type of `mqtt.max_packet_size` from string to byteSize for a better representation of the valid numeric range. Strings will still be accepted for backward compatibility.
#11469 Added support for specifying username in Redis authentication.
#11496 Disabled the Erlang VM Prometheus exporter by default to improve performance and security.
#11497 Enhanced broker metrics collection and export by adding new metrics for messages, overload protection, authorization, authentication, and improving naming consistency for OpenTelemetry.
#10647 Implemented GreptimeDB data integration.
#11261 Implemented Amazon Kinesis Data Streams producer data integration.
#11329 Implemented Azure Event Hub Producer data integration.
#11363 Added TLS connection support to the RabbitMQ bridge.
#11367 Ported GCP IoT Hub authentication support from EMQX 4.4.
#11386 Integrated LDAP as a new authenticator.
#11392 Integrated LDAP as an authorization source.
#11402 Added support for using placeholders to define MQTT Topic in Kafka Consumer bridge topic mappings. This allows dynamically setting the MQTT Topic.
#11403 Added support for defining message attributes and ordering key templates for GCP PubSub Producer bridge.
Also updated our HOCON library to fix an issue where objects in an array were concatenated even if they were laid on different lines.
#11459 Added the option to configure health check interval for Kafka bridges.
#11478 Added HStreamDB bridge support (both TCP and TLS connections allowed), adapted to HStreamDB `v0.16.1`.
Updated driver to `0.4.5+v0.16.1` in PR #11530.
#11389 Improved retained message publishing latency by consolidating multiple index update operations into a single Mnesia activity, leveraging the new APIs introduced in Mria 0.6.0.
#11396 Introduced topic index for the rule engine runtime to speed up matching messages' topics to topic filters configured in rule definitions by avoiding full scan of the rule set, significantly improving EMQX's performance when handling a substantial number of rules.
#11399 Improved the placeholder syntax in the rule engine. The republishing actions support placeholder syntax to dynamically fill in the content of strings in the payload variable. The format of the placeholder syntax is `${key}`. Before this improvement, the `key` in `${key}` could only contain letters, numbers, and underscores. Now the `key` supports any UTF8 characters.
#11405 Made the error message for the `date_to_unix_ts` function more understandable.
#11490 Added fast error handling for undefined passwords in various authentication backends. This improves the consistency and user-friendliness of the authentication process.
Bug Fixes
#11065 Silenced irrelevant error messages during EMQX shutdown.
#11279 Fixed an issue where clients could not send messages with large payloads when debug/trace logging was enabled in EMQX.
#11296 Added support for importing additional configurations from an EMQX backup file (using the `emqx ctl import` command):
- rule_engine (previously not imported due to the bug)
- topic_metrics (previously not implemented)
- slow_subs (previously not implemented)
#11327 Updated ekka to version 0.15.8, mria to version 0.15.8, and optvar to 1.0.5. This fixes occasional assertion failures.
#11346 Updated ekka to version 0.15.9. This fixes dangling etcd locks that occurred when acquiring the lock failed with a timeout.
#11347 Ensured that OCSP request path is properly URL encoded.
#11352 Fixed a crash issue that occurred when starting on Windows or any other platform without RocksDB support.
#11388 Increased `emqx_router_sup` restart intensity to improve tolerance for occasional crashes that can occur under normal conditions, without necessitating the shutdown of the entire EMQX application. For example, a mria write/delete call delegated from a replicant to a core node by `emqx_router_helper` may fail if the core node is stopping, restarting, or in an unready state. The modified restart intensity ensures that the system remains stable and operational.
This fixes issues found when trying to upgrade from 5.1.3 where that option was set in the configuration files or persisted in EMQX Operator settings.
#11424 Added a check for the maximum value of the timestamp in the API to ensure it is a valid Unix timestamp.
#11445 Removed os_mon application monitor support on Windows platforms to prevent VM crashes. Functionality remains on non-Windows platforms.
#11454 Fixed crashing when debugging/tracing with large payloads (introduced in #11279).
#11456 Removed validation that enforced non-empty PEM for the CA cert file, allowing the CA certificate file PEM to be empty.
#11466 Fixed a crash that occurred when setting the `ssl_options.ciphers` configuration option to an empty string ("").
#11480 Improved the error handling and testing of SQL functions in the rule engine when rule functions receive bad arguments.
#11520 Fixed an issue where the `packets_connack_sent` metric was not incremented on CONNACK packets sent with non-zero `ack_flag`.
#11523 Corrected a misleading prompt when specifying invalid certificates/keys for the `/configs` API.
#11534 Fixed the increment on data bridge statistics when the bridge is unhealthy. Now, messages sent to unhealthy bridges are counted as dropped messages.
#11540 Improved HTTP response when attempting to create a bridge with an invalid name.
#11548 Fixed an issue that prevented the plugin order from being updated across the entire cluster.
#11366 Fixed an issue that could prevent a pod from starting if some bridge configurations were specified in `bootstrapConfig` using EMQX Operator.
#11453 Fixed an issue that would yield false negatives when testing the connectivity of InfluxDB bridges.
#11461 Aligned the timeout for testing bridge connectivity more closely with the configured health check timeout.
#11492 Fixed an issue that would yield false negatives when testing the connectivity of GreptimeDB bridges.
#11508 Fixed error handling in Kafka bridge when headers are translated to an invalid value.
#11513 Fixed a bug that prevented the Kafka Producer bridge from using the correct template for the `timestamp` field.
#11527 Fixed an issue related to Kafka header template handling. The issue occurs when placeholders are resolved into an array of key-value pairs (e.g.: `[{"key": "foo", "value": "bar"}]`).
5.1.1
Release Date: 2023-07-27
Enhancements
#10667 The MongoDB connector and bridge have been refactored into a separate app to improve the code structure.
#11115 Added info logs to indicate when buffered messages are dropped due to time-to-live (TTL) expiration.
#11133 Renamed `deliver_rate` to `delivery_rate` in the configuration of `retainer`, while being compatible with the previous `deliver_rate`.
#11137 Refactored the Dashboard listener configuration to use a nested `ssl_options` field for SSL settings.
#11138 Changed the default value of k8s `api_server` from `http://127.0.0.1:9091` to `https://kubernetes.default.svc:443`.
- `emqx_ctl conf show cluster` no longer displays irrelevant configuration items when `discovery_strategy=static`. Configuration information related to `etcd/k8s/dns` will not be shown.
- Removed `zones` (deprecated config key) from `emqx_ctl conf show_keys`.
#11165 Removed the `/configs/limiter` API from `swagger.json`. Only the API documentation was removed, and the `/configs/limiter` API functionalities remain unchanged.
#11166 Added 3 random SQL functions to the rule engine:
- `random()`: Generates a random number between 0 and 1 (0.0 =< X < 1.0).
- `uuid_v4()`: Generates a random UUID (version 4) string.
- `uuid_v4_no_hyphen()`: Generates a random UUID (version 4) string without hyphens.
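To illustrate the value shapes the three random functions from #11166 are described as producing, here is a minimal Python sketch. The helper names are hypothetical stand-ins, not EMQX code; only the documented ranges and formats come from the entry above.

```python
import random
import uuid


def rule_random() -> float:
    """Stand-in for random(): a float X with 0.0 <= X < 1.0."""
    return random.random()


def uuid_v4() -> str:
    """Stand-in for uuid_v4(): a version 4 UUID in hyphenated form (36 characters)."""
    return str(uuid.uuid4())


def uuid_v4_no_hyphen() -> str:
    """Stand-in for uuid_v4_no_hyphen(): the same kind of UUID as 32 hex characters."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(rule_random())
    print(uuid_v4())
    print(uuid_v4_no_hyphen())
```

The hyphen-free form is convenient when the value is used as part of a topic name or a database key that restricts punctuation.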
#11180 Added a new configuration API `/configs` (GET/PUT) that supports reloading the HOCON format configuration file.
#11226 Unified the listener switch to `enable`, while being compatible with the previous `enabled`.
#11249 Added `/license/setting` REST API endpoint to read and update licensed connections usage alarm watermark.
#11251 Added the `/cluster/topology` REST API endpoint: A `GET` request to this endpoint returns the cluster topology, showing connections between RLOG core and replicant nodes.
#11253 The Webhook/HTTP bridge has been refactored into its own Erlang application. This allows for more flexibility in the future and allows the bridge to be run as a standalone application.
#11079 Added support for custom headers in messages for Kafka bridge producer mode.
#11132 Added support for MQTT action authorization based on QoS level and Retain flag values. Now, EMQX can verify whether clients have the permission to publish/subscribe using specific QoS levels, and whether they have the permission to publish retained messages.
#11207 Updated the driver versions of multiple data bridges to enhance security and ensure that sensitive data will not be leaked. This includes:
- TDengine
- MongoDB
- MySQL
- Clickhouse
#11241 Schema Registry has been refactored into its own Erlang application. This allows for more flexibility in the future.
#11020 Upgraded emqtt dependency to prevent sensitive data leakage in the debug log.
#11135 Improved time offset parser in rule engine and return uniform error codes.
#11236 Improved the speed of clients querying in REST API `/clients` endpoint with default parameters.
Bug Fixes
#11004 Wildcards are no longer allowed for the destination topic in topic rewrite.
#11026 Addressed an inconsistency in the usage of `div` and `mod` operations within the rule engine. Previously, the `div` operation could only be used as an infix operation, and `mod` could only be applied through a function call. Now, both `div` and `mod` can be used via function call syntax and infix syntax.
#11037 When starting an HTTP connector, EMQX now returns a descriptive error in case the system is unable to connect to the remote target system.
#11039 Fixed database number validation for Redis connector. Previously, negative numbers were accepted as valid database numbers.
#11074 Fixed a bug to adhere to Protocol spec MQTT-5.0 [MQTT-3.8.3-4].
#11077 Fixed a crash when updating listener binding with a non-integer port.
#11094 Fixed an issue where connection errors in Kafka Producer would not be reported when reconnecting the bridge.
#11103 Updated `erlcloud` dependency.
#11106 Added validation for the maximum number of `worker_pool_size` of a bridge resource. Now the maximum amount is 1024 to avoid large memory consumption from an unreasonable number of workers.
#11118 Ensured that validation errors in REST API responses are slightly less confusing. Now, if there are out-of-range errors, they will be presented as `{"value": 42, "reason": {"expected": "1..10"}, ...}`, replacing the previous usage of `expected_type` with `expected`.
#11126 Rule metrics for async mode bridges will set failure counters correctly now.
#11134 Fixed the value of the uppercase `authorization` header not being obfuscated in the log.
#11139 The Redis connector has been refactored into its own Erlang application to improve the code structure.
#11145 Added several fixes and improvements in Ekka and Mria.
Ekka:
- Improved cluster discovery log messages to consistently describe actual events. Ekka PR
- Removed deprecated cluster auto-clean configuration parameter (it has been moved to Mria). Ekka PR
Mria:
- Ping now only targets running replicant nodes. Previously, `mria_lb` was trying to ping both stopped and running replicant nodes, which could result in timeout errors. Mria PR
- Used `null_copies` storage when copying the `$mria_rlog_sync` table. This fix has no effect on EMQX for now, as `$mria_rlog_sync` is only used in `mria:sync_transaction/2,3,4`, which is not utilized by EMQX. Mria PR
#11148 Fixed an issue when nodes tried to synchronize configuration update operations to a node which has already left the cluster.
#11150 Wait for Mria table when emqx_psk app is being started to ensure that PSK data is synced to replicant nodes even if they don't have init PSK file.
#11151 The MySQL connector has been refactored into its own Erlang application to improve the code structure.
#11158 Wait for Mria table when the mnesia backend of retainer starts to avoid a possible error of the retainer when joining a cluster.
#11162 Fixed an issue in webhook bridge where, in async query mode, HTTP status codes like 4XX and 5XX would be treated as successes in the bridge metrics.
#11164 Reintroduced support for nested (i.e.: `${payload.a.b.c}`) placeholders for extracting data from rule action messages without the need for calling `json_decode(payload)` first.
#11172 Fixed the `payload` field in rule engine SQL being duplicated in the below situations:
- When using a `foreach` sentence without the `as` sub-expression and selecting all fields (using the `*` or omitting the `do` sub-expression).
  For example:
  `FOREACH payload.sensors FROM "t/#"`
- When selecting the `payload` field and all fields.
  For example:
  `SELECT payload.sensors, * FROM "t/#"`
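The nested placeholder behaviour from #11164 can be pictured with a small Python sketch. It is only an illustration of the idea, not EMQX's implementation: the `render` and `lookup` helpers, the regular expression, and the choice to print `undefined` for missing paths are all assumptions made for this example.

```python
import json
import re

# Hypothetical placeholder pattern: ${dotted.path}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def lookup(data, path):
    """Walk a dotted path, decoding JSON strings on the way (e.g. the payload)."""
    current = data
    for part in path.split("."):
        if isinstance(current, (str, bytes)):
            try:
                current = json.loads(current)
            except ValueError:
                return None
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def render(template, message):
    """Replace every ${path} in the template with the value found in the message."""
    def substitute(match):
        value = lookup(message, match.group(1))
        if value is None:
            return "undefined"  # assumption: how a missing path is rendered
        return value if isinstance(value, str) else json.dumps(value)
    return PLACEHOLDER.sub(substitute, template)


message = {"topic": "t/1", "payload": '{"a": {"b": {"c": 42}}}'}
print(render("c=${payload.a.b.c} on ${topic}", message))  # prints "c=42 on t/1"
```

The point of the sketch is that the payload is decoded lazily during path traversal, so the template author does not need an explicit `json_decode(payload)` step.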
#11174 Fixed the encoding of the `server` key coming from an ingress MQTT bridge. Before the fix, it was encoded as a list of integers corresponding to the ASCII characters of the server string.
#11184 Config value for `mqtt.max_packet_size` now has a max value of 256MB as defined by the protocol.
#11192 Fixed an issue with producing invalid HOCON file when an atom type was used. Also removed unnecessary `"` around keys and latin1 strings from HOCON file.
#11195 Fixed an issue where the REST API could create duplicate subscriptions for specified clients of the Stomp gateway.
#11206 Made the `username` and `password` params of CoAP client optional in connection mode.
#11208 Fixed the issue of abnormal data statistics for LwM2M clients.
#11211 HTTP API `DELETE` operations on non-existent resources now consistently return `404`.
#11214 Fixed a bug where node configuration may fail to synchronize correctly when the node joins the cluster.
#11229 Fixed an issue that prevented plugins from starting/stopping after changing configuration via `emqx ctl conf load`.
#11237 Fixed the `headers` default value in the `/prometheus` API, which should be a map instead of a list.
#11250 Fixed a bug where the order of MQTT packets within a WebSocket packet would be reversed.
#11271 Ensured that the range of all percentage type configurations is from 0% to 100% in the REST API and configuration. For example, `sysmon.os.sysmem_high_watermark=101%` is invalid now.
#11272 Fixed a typo in the log, where an abnormal `PUBREL` packet was mistakenly referred to as `pubrec`.
#11281 Restored support for the special `$queue/` shared subscription topic prefix.
#11294 Fixed `emqx ctl cluster join`, `leave`, and `status` commands.
#11306 Fixed rule action metrics inconsistency where dropped requests were not accounted for.
#11309 Improved startup order of EMQX applications. Simplified build scripts and improved code reuse.
#11322 Added support for importing additional configurations from EMQX backup file (`emqx ctl import` command):
- rule_engine (previously not imported due to the bug)
- topic_metrics (previously not implemented)
- slow_subs (previously not implemented).
#10645 Changed health check for Oracle Database, PostgreSQL, MySQL and Kafka Producer data bridges to ensure target table/topic exists.
#11107 MongoDB bridge health check now returns the failure reason.
#11139 The Redis bridge has been refactored into its own Erlang application to improve the code structure and to make it easier to maintain.
#11151 The MySQL bridge has been refactored into its own Erlang application to improve the code structure and to make it easier to maintain.
#11163 Hid `topology.pool_size` in MongoDB bridges and fixed it to 1 to avoid confusion.
#11175 Now when using a nonexistent hostname for connecting to MySQL, a 400 error is returned rather than 503 in the REST API.
#11198 Fixed global rebalance status evaluation on replicant nodes. Previously, the `/api/v5/load_rebalance/global_status` API method could return incomplete results if handled by a replicant node.
#11223 Fixed an issue in InfluxDB bridging where mixing decimals and integers in a field could lead to serialization failure in the Influx Line Protocol, making it impossible to write to InfluxDB (when the decimal part is 0, InfluxDB mistakenly interprets the value as an integer).
See also: InfluxDB v2.7 Line-Protocol.
#11225 The `username` field in PostgreSQL/Timescale/MatrixDB bridges configuration is now a required one.
#11242 Restarted emqx_ee_schema_registry when a node joins a cluster. As emqx_ee_schema_registry uses Mria tables, a node joining a cluster needs to restart this application in order to start relevant Mria shard processes, ensuring a correct behaviour in Core/Replicant mode.
#11266 Fixed and improved support for TDengine `insert` syntax:
Added support for inserting into multi-table in the template.
For example:
insert into table_1 values (${ts}, '${id}', '${topic}') table_2 values (${ts}, '${id}', '${topic}')
Added support for mixing prefixes/suffixes and placeholders in the template.
For example:
insert into table_${topic} values (${ts}, '${id}', '${topic}')
Note: This is a breaking change. Previously, the values of string type were quoted automatically, but now they must be quoted explicitly.
For example:
insert into table values (${ts}, '${a_string}')
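The explicit-quoting rule in #11266 can be demonstrated with Python's standard `string.Template`, which happens to use the same `${name}` placeholder form. This is only a sketch of the substitution behaviour described above, not the bridge's code; `render_insert` and the sample field values are hypothetical, and no SQL escaping is performed.

```python
from string import Template


def render_insert(template: str, fields: dict) -> str:
    """Fill ${name} placeholders verbatim.

    Values are inserted without added quotes, so string values must be
    quoted in the template itself, as the breaking-change note requires.
    """
    return Template(template).substitute(fields)


fields = {"ts": 1690000000000, "id": "sensor-1", "topic": "t1"}

# Placeholder mixed with a table-name prefix, string values quoted explicitly.
print(render_insert(
    "insert into table_${topic} values (${ts}, '${id}', '${topic}')", fields
))
# prints: insert into table_t1 values (1690000000000, 'sensor-1', 't1')
```

Leaving the quotes out of the template would render `sensor-1` bare, which is why templates written for earlier versions need to be reviewed after upgrading.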
#11307 Fixed check for table existence to return a more friendly message in the Oracle bridge.
#11316 Fixed Pool Size value not being considered in Oracle Bridge.
#11326 Fixed return error checking on table validation in the Oracle bridge.
5.1.0
Release Date: 2023-06-21
Enhancements
- #11035 Upgraded Cassandra driver to avoid username and password leakage in data bridge logs.
- #10584 Added log level configuration to SSL communication
- #10678 Optimized counter increment calls to avoid work if increment is zero.
- #10690 Added a retry mechanism to webhook bridge that attempts to improve throughput. This optimization retries request failures without blocking the buffering layer, which can improve throughput in situations of high messaging rate.
- #10702 Introduced a more straightforward configuration option `keepalive_multiplier` and deprecated the old `keepalive_backoff` configuration. After this enhancement, EMQX determines the client's keepalive timeout period by multiplying the "Client Requested Keepalive Interval" with `keepalive_multiplier`.
- #10698 Optimized memory usage when accessing the configuration during runtime.
- #10778 Refactored Pulsar Producer bridge to avoid leaking resources in case bridge crashed during initialization phase.
- #10813 Refactored Kafka Producer and Consumer bridges to avoid leaking resources in case bridge crashed during initialization phase.
- #10858 A new utility function timezone_to_offset_seconds/1 has been added to the rule engine SQL language. This function converts a timezone string (for example, "+02:00", "Z" and "local") to the corresponding offset in seconds.
- #10841 Added a schema validation to ensure message key is not empty when "key_dispatch" strategy is selected in Kafka and Pulsar Producer bridges.
- #10754 The MQTT bridge has been enhanced to utilize connection pooling and leverage available parallelism, substantially improving throughput. As a consequence, a single MQTT bridge now uses a pool of `clientid`s to connect to the remote broker.
- #10782 Added a new `deliver_rate` option to the retainer configuration, which can limit the maximum delivery rate per session in the retainer.
- #10877 Upgraded RocketMQ driver to enhance security for sensitive data.
- #10598 Provided a callback method of Unary type in ExProto to avoid possible message disorder issues.
- #10895 Refactored most of the bridges to avoid resource leaks in case bridge crashed during initialization phase.
- #10790 Optimized access to configuration in runtime by reducing overhead of reading configuration per zone.
- #10892 Added the requirement for setting SID or Service Name in Oracle Database bridge creation.
- #10910 The data bridge resource option `auto_restart_interval` was deprecated in favor of `health_check_interval`, and `request_timeout` was renamed to `request_ttl`. Also, the default `request_ttl` value went from 15 seconds to 45 seconds. The previous existence of both `auto_restart_interval` and `health_check_interval` was a source of confusion, as both parameters influenced the recovery of data bridges under failures. An inconsistent configuration of those two parameters could lead to messages being expired without a chance to retry. Now, `health_check_interval` is used both to control the interval of health checks that may transition the data bridge into `disconnected` or `connecting` states, as well as recovering from `disconnected`.
- #10929 Upgraded Erlang/OTP to 25.3.2-1.
- #10909 Removed the deprecated HTTP APIs for gateways.
- #10908 Refactored the RocketMQ bridge to avoid resource leaks in case the bridge crashed during the initialization phase.
- #10924 Refactored the InfluxDB bridge connector to avoid resource leaks in case the bridge crashed during the initialization phase.
- #10944 Improved the GCP PubSub bridge to avoid a potential issue where the bridge could fail to send messages after node restart.
- #10933 Added support for configuring TCP keep-alive in MQTT/TCP and MQTT/SSL listeners.
- #10948 Added `live_connections` field for some HTTP APIs, i.e.: `/monitor_current`, `/monitor_current/nodes/{node}`, `/monitor/nodes/{node}`, `/monitor`, `/node/{node}`, `/nodes`.
- #10941 Improved the collection speed of Prometheus metrics when setting `prometheus.vm_dist_collector=disabled`. The metric `erlang_vm_statistics_run_queues_length_total` is renamed to `erlang_vm_statistics_run_queues_length`.
- #10985 Renamed `emqx ctl` command `cluster_call` to `conf cluster_sync`. The old command `cluster_call` is still a valid command, but not included in usage info.
- #10988 Improved log security when data bridge creation fails to ensure sensitive data is always obfuscated.
- #10926 Allowed `enable` as well as `enabled` as the state flag for listeners. Prior to this change, a listener could be enabled/disabled by setting `true` or `false` on the `enabled` config. This naming differs slightly from other state flags in the system. Now the `enable` flag is added as an alias in listener config.
- #10970 A query_mode parameter has been added to the Kafka producer bridge. This parameter allows you to specify if the bridge should use the asynchronous or synchronous mode when sending data to Kafka. The default is asynchronous mode.
- #10676 Added CLI commands `emqx ctl export` and `emqx ctl import` for importing/exporting configuration and user data. This allows exporting configurations and built-in database data from a running EMQX cluster and importing them into the same or another running EMQX cluster.
- #11003 Added an option to configure TCP keepalive in Kafka bridge.
- #10961 Added support for unlimited max connections for gateway listeners by allowing infinity as a valid value for the `max_connections` field in the configuration and HTTP API.
- #11019 Improved log security for JWT: it is now obfuscated before being printed.
- #11024 Added a small improvement to reduce the chance of seeing the `connecting` state when creating/updating a Pulsar Producer bridge.
- #11034 Hid the broker config and changed `broker.shared_subscription_strategy` to `mqtt.shared_subscription_strategy` as it belongs to `mqtt`.
- #11045 The listener's authentication and zone related APIs have been officially removed in version `5.1.0`.
- #11062 Renamed config `log.file.to` to `log.file.path`.
Bug Fixes
#11018 Fixed multiple issues with the Stomp gateway, including:
- Fixed an issue where `is_superuser` was not working correctly.
- Fixed an issue where the mountpoint was not being removed in message delivery.
- After a message or subscription request fails, the Stomp client should be disconnected immediately after replying with an ERROR message.
#11051 Added validation to ensure that certificate `depth` (listener SSL option) is a non-negative integer.
#10563 Corrected an issue where the no_local flag was not functioning correctly in subscription.
#10653 Stored gateway authentication TLS certificates and keys in the data directory to fix the problem of memory leakage.
#10682 Fixed an issue where the timestamp of the will message was incorrectly assigned at session creation time. This timestamp is now the time at which the session disconnected.
#10701 RPM package for Amazon Linux 2 did not support TLS v1.3 as it was assembled with Erlang/OTP built with openssl 1.0.
#10677 The Rule API now responds with a 404 HTTP error code when attempting to delete a non-existent rule.
#10715 Support for getting the client certificate in the client.connected hook. Previously, this data was removed after the connection was established to reduce memory usage.
#10737 Fixed the issue where the HTTP API interface of Gateway cannot handle ClientIDs with special characters, such as: `!@#$%^&*()_+{}:"<>?/`.
#10809 Addressed `** ERROR ** Mnesia post_commit hook failed: error:badarg` error messages happening during node shutdown or restart. Mria pull request: https://github.com/emqx/mria/pull/142
#10807 The debug-level logs related to license checks will no longer be printed. These logs were generated too frequently and could interfere with log recording.
#10818 Fixed `emqx_ctl traces` command error where the `traces start` command in the `emqx_mgmt_cli` module was not working properly with some filters.
#10600 Deleted emqx_statsd application.
#10820 Fixed the issue where newly added nodes in the cluster would not apply the new license after a cluster license update and would continue to use the old license. Sometimes a new node must start with an outdated license, for example when a cluster deployed with emqx-operator needs to scale up after the license has expired. At that point the cluster's license key has already been updated via API/CLI, but the new node would not use it.
#10851 Obfuscated sensitive data in the bad API logging.
#10884 Fixed an issue where trying to get rule info or metrics could result in a crash when a node is joining a cluster.
#10887 Fixed a potential issue where requests to bridges might take a long time to be retried. This only affected low throughput scenarios, where the buffering layer could take a long time to detect connectivity and driver problems.
#10878 Fixed a vulnerability in the RabbitMQ bridge, which could potentially expose passwords to log files.
#10871 Fixed an issue where the Dashboard shows that the connection still exists after a CoAP connection is disconnected, but deletion and message posting requests do not take effect.
#10880 Added a new REST API `POST /clients/kickout/bulk` for kicking out multiple clients in bulk.
#10913 Fixed an issue where the plugin status REST API of a node would still include the cluster node status after the node left the cluster.
#10923 Fixed a race-condition in channel info registration. Prior to this fix, when system is under heavy load, it might happen that a client is disconnected (or has its session expired) but still can be found in the clients page in dashboard. One of the possible reasons is a race condition fixed in this PR: the connection is killed in the middle of channel data registration.
#10930 Added a schema validation for duration data type to avoid invalid values. Before this fix, it was possible to use absurd values in the schema that would exceed the system limit, causing a crash.
#10952 Disallowed enabling `fail_if_no_peer_cert` in listener SSL options if `verify = verify_none` is set. Setting `fail_if_no_peer_cert = true` and `verify = verify_none` caused connection errors due to incompatible options. This fix validates the options when creating or updating a listener to avoid these errors.
Note: any old listener configuration with `fail_if_no_peer_cert = true` and `verify = verify_none` that was previously allowed will fail to load after applying this fix and must be manually fixed.
#10951 Fixed the issue in MQTT-SN gateway when the `mountpoint` did not take effect on message publishing.
#10943 Deprecated UDP mcast mechanism for cluster discovery. This feature has been planned for deprecation since 5.0 mainly due to the lack of actual production use. This feature code is not yet removed in 5.1, but the document interface is demoted.
#10902 Avoided syncing the `cluster.hocon` file from nodes running a newer version than the self-node. During a cluster rolling upgrade, if an older version node has to restart for any reason and copies the `cluster.hocon` file from a newer version node, it may fail to start. After this fix, the older version node will not copy the `cluster.hocon` file from a newer one, so it will use its own `cluster.hocon` file to start.
#10967 Fixed error message formatting in rebalance API: previously they could be displayed as unclear dumps of internal Erlang structures. Added `wait_health_check` option to node evacuation CLI and API. This is a time interval during which the node reports "unhealthy status" without beginning actual evacuation. This allows a Load Balancer (if any) to remove the evacuated node from balancing and stop forwarding (re)connecting clients to it.
#10911 The error message and log entry that appear when one tries to create a bridge with a name that exceeds 255 bytes is now easier to understand.
#10983 Fixed the issue when mqtt clients could not connect over TLS if the listener was configured to use TLS v1.3 only. The problem was that TLS connection was trying to use options incompatible with TLS v1.3.
#10977 Fixed the delay in updating subscription count metric and corrected configuration issues in Stomp gateway.
#10950 Fixed the issue where the `enable_qos` option does not take effect in the MQTT-SN gateway.
#10999 Changed schema validation for Kafka fields 'Partition Count Refresh Interval' and 'Offset Commit Interval' to avoid accepting values larger than the maximum allowed.
#10997 The ClickHouse bridge had a problem that could cause messages to be dropped when the ClickHouse server is closed while sending messages even when the request_ttl is set to infinity. This has been fixed by treating errors due to a closed connection as recoverable errors.
#10994 Redacted proxy-authorization headers as used by HTTP connector to avoid leaking secrets into log files.
#10996 For any unknown HTTP/API request, the default response is a 404 error rather than the dashboard's index.html.
#11005 Fixed the issue where the method field cannot be correctly printed in the trace logs of AuthN HTTP.
#11006 Fixed QUIC listeners' default cert file paths. Prior to this change, the default cert file paths were prefixed with the environment variable ${EMQX_ETC_DIR}, which was not interpolated before being used in QUIC listeners.
#10998 Do not allow the batch_size option for MongoDB bridge resource. The MongoDB connector currently does not support batching, so the batch_size config value is forced to be 1 if provided.
#10955 Fixed the issue in MQTT-SN gateway where deleting Predefined Topics configuration does not work.
#11025 Fixed a case_clause error that could arise in race conditions in Pulsar Producer bridge.
#11030 Improved error messages when a validation error occurs while using the Listeners HTTP API.
#11033 Deprecated the mountpoint field in AuthenticateRequest in ExProto gateway. This field was introduced in e4.x, but in e5.0 we have provided gateway.exproto.mountpoint for configuration, so there is no need to override it through the Authenticate request.
Additionally, updated the default value of subscriptions_max, inflight_max, and mqueue_max to infinity.
#11040 Fixed a health check issue for Kafka Producer that could lead to loss of messages when the connection to Kafka's brokers was down.
#11038 Fixed a health check issue for Pulsar Producer that could lead to loss of messages when the connection to Pulsar's brokers was down.
#11042 Fixed crash on REST API GET /listeners when listener's max_connections is set to a string.
#11028 Disallowed using multiple TLS versions in the listener config that include tlsv1.3 but exclude tlsv1.2. Using a TLS configuration with such a version gap caused connection errors. Additionally, TLS options that are incompatible with the selected TLS version(s) are now dropped and logged.
Note: any old listener configuration with the version gap described above will fail to load after applying this fix and must be manually fixed.
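The version-gap rule from #11028 can be checked before upgrading. The following is a minimal Python sketch, not EMQX's own validation code: the function name is hypothetical, and it assumes the listener's TLS versions are available as a list of strings such as "tlsv1.3".

```python
def has_tls_version_gap(versions):
    """Return True for the combination rejected by #11028.

    A gap exists when several TLS versions are configured, tlsv1.3 is
    among them, and tlsv1.2 is missing. The helper name and the input
    format (a list of version strings) are assumptions for illustration.
    """
    selected = set(versions)
    return (
        len(selected) > 1
        and "tlsv1.3" in selected
        and "tlsv1.2" not in selected
    )


if __name__ == "__main__":
    for candidate in (["tlsv1.3", "tlsv1.1"], ["tlsv1.3"], ["tlsv1.3", "tlsv1.2"]):
        print(candidate, "gap" if has_tls_version_gap(candidate) else "ok")
```

A listener configured with only tlsv1.3 is not affected, since the rule targets configurations with multiple versions.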
#11031 Fixed credential validation when creating bridge and checking status for InfluxDB Bridges.
#11056 Fixed the issue where newly created listeners sometimes do not start properly. When you delete a system default listener and add a new one named 'default', it will not start correctly.
- Fixed the bug where configuration failure on certain nodes can cause Dashboard unavailability.
#11070 Fixed the problem that the cluster.autoclean configuration item does not take effect.
#11092 and #11100 Fixed the problem where replicant nodes were unable to connect to the core node due to a timeout in the mria_lb:core_nodes() call. Relevant mria pull request: https://github.com/emqx/mria/pull/143
5.0.4
Release Date: 2023-05-26
Enhancements
#10389 Unified the configuration formats for cluster.core_nodes and cluster.statics.seeds. Now they both support the array format ["emqx1@127.0.0.1", "emqx2@127.0.0.1"] and the comma-separated string format "emqx1@127.0.0.1,emqx2@127.0.0.1".
#10392 Introduced a new function to convert a formatted date to an integer timestamp: date_to_unix_ts/3.
date_to_unix_ts(TimeUnit, FormatString, InputDateTimeString)
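To illustrate what the three arguments of date_to_unix_ts/3 mean, here is a rough Python analogue. It is only a sketch: Python's strptime directives stand in for the rule engine's format string and are not guaranteed to match it, the set of unit names is an assumption, and treating inputs without a UTC offset as UTC is also an assumption.

```python
from datetime import datetime, timezone

# Assumed unit names; the units accepted by the rule engine may differ.
_UNITS_PER_SECOND = {"second": 1, "millisecond": 10**3, "microsecond": 10**6}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def date_to_unix_ts(time_unit, format_string, input_date_time):
    """Parse a formatted date string and return an integer Unix timestamp."""
    parsed = datetime.strptime(input_date_time, format_string)
    if parsed.tzinfo is None:
        # Assumption: inputs without an offset are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    delta = parsed - _EPOCH
    micros = (delta.days * 86400 + delta.seconds) * 10**6 + delta.microseconds
    return micros * _UNITS_PER_SECOND[time_unit] // 10**6


if __name__ == "__main__":
    print(date_to_unix_ts("second", "%Y-%m-%d %H:%M:%S", "2023-05-26 00:00:00"))
```

The arithmetic is done on integers so that microsecond precision is not lost to floating-point rounding.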
#10426 Optimized the configuration priority mechanism to fix the issue where the configuration changes made to etc/emqx.conf do not take effect after restarting EMQX.
More information about the new mechanism: Configure Override Rules
#10457 Deprecated the integration with StatsD.
#10458 Set the level of plugin configuration options to low. Users usually manage plugins through the dashboard and rarely modify them manually, so we lowered the level.
#10491 Renamed etcd.ssl to etcd.ssl_options to keep all SSL options consistent in the configuration file.
#10512 Improved the storage format of Unicode characters in data files. Now we can store Unicode characters. For example: SELECT * FROM "t/1" WHERE clientid = "-测试专用-".
#10568 Added shutdown_count printout to emqx ctl listeners command.
#10588 Increased the time precision of trace logs from second to microsecond. For example, change from 2023-05-02T08:43:50+00:00 to 2023-05-02T08:43:50.237945+00:00.
#10623 Renamed max_message_queue_len to max_mailbox_size in the force_shutdown configuration. The old name is kept as an alias, so this change is backward compatible.
#10713 Hide the resource_option.request_timeout of the webhook and it will use the value of http request_timeout.
#10075 Added node rebalance/node evacuation functionality. See also: EIP doc
#10378 Implemented Pulsar Producer Bridge and only producer role is supported now.
#10408 Introduced 3 built-in functions in the rule engine SQL-like language for creating values of the MongoDB date type.
#10409 #10337 Supported Protocol Buffers and Apache Avro schemas in Schema Registry.
#10425 Implemented OpenTSDB data bridge.
#10498 Implemented Oracle Database Bridge.
#10560 Added enterprise data bridge for Apache IoTDB.
#10417 Improved get config items performance by eliminating temporary references.
#10430 Simplified the configuration of the retainer feature. Marked flow_control as a non-importance field.
#10511 Improved the security and privacy of some resource logs by masking sensitive information in the log.
#10525 Reduced resource usage per MQTT packet handling.
#10528 Reduced memory footprint in hot code path. The hot path includes the code that is frequently executed in core functionalities such as message handling, connection management, authentication, and authorization.
#10591 #10625 Improved the configuration of the limiter.
Reduced the complexity of the limiter's configuration.
Updated the configs/limiter API to suit this refactor.
Reduced the memory usage of the limiter configuration.
#10487 Optimized limiter instances whose rate is infinity to reduce memory and CPU usage.
#10490 Removed the default limit of connect rate which used to be 1000/s.
#10077 Added support for QUIC TLS password-protected certificate file.
Bug Fixes
#10340 Fixed the issue that could lead to crash logs being printed when stopping EMQX via systemd.
#10369 Fixed error in /api/v5/monitor_current API endpoint that happens when some EMQX nodes are down.
Prior to this fix, sometimes the request returned HTTP code 500 and the following message:
{"code":"INTERNAL_ERROR","message":"error, badarg, [{erlang,'++',[{error,nodedown},[{node,'emqx@10.42.0.150'}]], ...
#10407 Fixed the crash issue of the alarm system.
Leverage Mnesia dirty operations and circumvent extraneous calls to enhance 'emqx_alarm' performance.
Use 'emqx_resource_manager' for reactivating alarms that have already been triggered.
Implement the newly developed, fail-safe 'emqx_alarm' API to control the activation and deactivation of alarms, thus preventing 'emqx_resource_manager' from crashing due to alarm timeouts.
The alarm system is susceptible to crashing under these concurrent conditions:
A significant number of resources fail, such as when bridges continuously attempt to trigger alarms due to recurring errors.
The system is under an extremely high load.
#10420 Fixed HTTP path handling when composing the URL for the HTTP requests in authentication and authorization modules.
Avoid unnecessary URL normalization since we cannot assume that external servers treat original and normalized URLs equally. This led to bugs like #10411.
Fixed the issue that path segments could be HTTP encoded twice.
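The double-encoding problem mentioned in #10420 is easy to reproduce outside EMQX. The sketch below uses Python's standard urllib.parse.quote purely as an illustration; the build_url helper and the example host are hypothetical and do not reflect how EMQX composes URLs internally.

```python
from urllib.parse import quote


def build_url(base, raw_segments):
    """Join raw (unencoded) path segments onto a base URL, encoding each once."""
    encoded = [quote(segment, safe="") for segment in raw_segments]
    return base.rstrip("/") + "/" + "/".join(encoded)


if __name__ == "__main__":
    once = quote("a b", safe="")    # the space becomes %20
    twice = quote(once, safe="")    # the % of %20 is encoded again as %25
    print(once, twice)
    print(build_url("http://127.0.0.1:8080", ["auth", "a b"]))
```

Encoding an already-encoded segment turns %20 into %2520, which an external server will decode to the literal text %20 rather than a space.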
#10422 Fixed a bug where external plugins could not be configured via environment variables in a lone-node cluster.
#10448 Fixed a compatibility issue of limiter configuration introduced by e5.0.3 which broke the upgrade from previous versions if the capacity is infinity.
In e5.0.3 we have replaced capacity with burst. After this fix, a capacity = infinity config will be automatically converted to the equivalent burst = 0.
#10462 Deprecated config broker.shared_dispatch_ack_enabled. This was designed to avoid dispatching messages to a shared-subscription session that has the client disconnected. However, since e5.0.0, this feature is no longer helpful because the shared-subscription messages in an expired session will be redispatched to other sessions in the group. See also: https://github.com/emqx/emqx/pull/9104 .
#10463 Improved bridges API error handling. If Webhook bridge URL is not valid, the bridges API will return '400' error instead of '500'.
#10484 Fixed the issue that the priority of the configuration cannot be set during the rolling upgrade. For example, when authorization is modified in e5.0.2 and then upgraded to e5.0.3 through the rolling upgrade, the authorization will be restored to the default.
#10495 Restored the limiter API /configs/limiter, which had been deleted by mistake.
#10500 Added several fixes, enhancements, and features in Mria:
Protect mria:join/1,2 with a global lock to prevent conflicts between two nodes trying to join each other simultaneously Mria PR
Implement new function mria:sync_transaction/4,3,2, which blocks the caller until a transaction is imported to the local node (if the local node is a replicant, otherwise, it behaves exactly the same as mria:transaction/3,2) Mria PR
Optimize mria:running_nodes/0 Mria PR
Optimize mria:ro_transaction/2 when called on a replicant node Mria PR.
#10518 Added the following fixes and features in Mria:
#10556 Wrapped potentially sensitive data in emqx_connector_http if Authorization headers are being passed at initialization.
#10571 Stopped emitting useless crash report when EMQX stops.
#10659 Fixed the issue where EMQX cannot start when sysmon.os.mem_check_interval is disabled.
#10717 Fixed an issue where the buffering layer processes could use a lot of CPU when inflight window is full.
#10724 A summary has been added for all endpoints in the HTTP API documentation (accessible at "http://<emqx_host_name>:18083/api-docs").
#10726 Health Check Interval and Auto Restart Interval now support the range from 1ms to 1 hour.
#10728 Fixed an issue where the rule engine was unable to access variables exported by the FOREACH-DO clause.
Given a payload: {"date": "2023-05-06", "array": ["a"]}, as well as the following SQL statement:
FOREACH payload.date as date, payload.array as elem DO date, elem FROM "t/#" -- {"date": "2023-05-06", "array": ["a"]}
Prior to the fix, the date variable exported by FOREACH could not be accessed in the DO clause of the above SQL, resulting in the following output for the SQL statement: [{"elem": "a","date": "undefined"}].
#10742 Correctness check of the rules is enforced before saving the authorization file source. Previously, saving wrong rules could lead to EMQX restart failure.
#10743 Fixed an issue where trying to get bridge info or metrics could result in a crash when a node is joining a cluster.
#10755 Fixed data bridge resource update race condition.
In the 'delete + create' process for EMQX resource updates, long bridge creation times could cause dashboard request timeouts. If a bridge resource update was initiated before completion of its creation, it led to an erroneous deletion from the runtime, despite being present in the config file.
This fix addresses the race condition in bridge resource updates, ensuring the accurate identification and addition of new resources, and maintaining consistency between runtime and configuration file statuses.
#10761 Fixed the issue where the default value of SSL certificate for Dashboard Listener was not correctly interpolated, which caused HTTPS to be inaccessible when verify_peer and cacertfile were using the default configuration.
#10672 Fixed the issue where the lack of a default value for ssl_options in listeners results in startup failure. For example, a command such as EMQX_LISTENERS__WSS__DEFAULT__BIND='0.0.0.0:8089' ./bin/emqx console would have caused a crash before.
#10738 TDEngine data bridge now supports "Supertable" and "Create Tables Automatically". Before this fix, an insert with a supertable in the template would fail, like this: insert into ${clientid} using msg TAGS (${clientid}) values (${ts},${msg}).
#10746 Add missing support of the event $events/delivery_dropped into the rule engine test API rule_test.
#10747 Ported some time formatting fixes in Rule-Engine functions from version 4.4.
#10760 Fix "internal error 500" when getting bridge statistics page while a node is joining the cluster.
#10801 Avoid double percent-decode for topic name in API /topics/{topic} and /topics.
#10817 Fix config value handling for bridge resource option auto_restart_interval; it can now be set to infinity.
5.0.3
Release Date: 2023-05-08
Enhancements
#10128 Add support for OCSP stapling for SSL MQTT listeners.
#10156 Change the configuration overlay order:
If it is a new installation of EMQX, emqx.conf + Environment variables overlays on top of API Updated Configs (cluster.hocon)
If EMQX is upgraded from an older version (i.e., the cluster-override.conf file still exists in EMQX's data directory), then it's the same as before, that is, cluster-override.conf overlays on top of emqx.conf + Environment variables.
Please note that data/configs/cluster-override.conf is considered deprecated. After upgrade, you are encouraged to update emqx.conf to delete configs which are overridden by cluster-override.conf and move the configs in cluster-override.conf to cluster.hocon. After upgrade, EMQX will continue to read local-override.conf (if it exists) as before, but you are encouraged to merge the configs to emqx.conf.
#10164 Add CRL check support for TLS MQTT listeners.
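The overlay order introduced by #10156 can be pictured as a layered merge in which later layers win. The Python sketch below is a simplified model under stated assumptions, not EMQX's loader: configs are flat dictionaries (real HOCON merging is nested), environment variables are assumed to win over emqx.conf within the combined layer, and the function name is hypothetical.

```python
def effective_config(emqx_conf, env_vars, cluster_hocon, cluster_override=None):
    """Model the two overlay orders described in #10156 with flat dicts."""
    # Assumption: within "emqx.conf + Environment variables", env vars win.
    local_layer = {**emqx_conf, **env_vars}
    if cluster_override is not None:
        # Upgraded node: cluster-override.conf still exists and overlays
        # on top of emqx.conf + environment variables, as before.
        return {**local_layer, **cluster_override}
    # New installation: emqx.conf + environment variables overlay on top
    # of the API-updated configs stored in cluster.hocon.
    return {**cluster_hocon, **local_layer}


if __name__ == "__main__":
    api_updates = {"log.level": "debug", "mqtt.max_packet_size": "2MB"}
    print(effective_config({"log.level": "warning"}, {}, api_updates))
    print(effective_config({"log.level": "warning"}, {}, {}, {"log.level": "debug"}))
```

In the first call the value from emqx.conf wins over the API-updated one, while in the second the legacy override file wins over emqx.conf.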
#10207 Improve OpenAPI (swagger) document readability. Prior to this change, there were a few summary docs which were lengthy and lacked translation; now the more concise label field from the schema i18n database is used instead.
#10210 Eliminated a few harmless error level logs. Prior to this change, there might be some Mnesia callback (hook) failures occasionally occurring when stopping/restarting Mria. Now the callbacks (hooks) are unregistered prior to stop. See also Mria PR.
#10224 Add the option to customize clusterIP in Helm chart, so that a user may set it to a fixed IP.
#10263 Add command eval-ex for Elixir expression evaluation.
#10278 Refactor the directory structure of all gateways.
#10206 Support async query mode for all data bridges.
Prior to this change, setting the query mode of a resource such as a bridge to sync would force the buffer to call the underlying connector in a synchronous way, even if it supports async calls.
#10306 Add support for async query mode for most bridges.
This is a follow-up change after #10206. Before this change, some bridges (Cassandra, MongoDB, MySQL, Postgres, Redis, RocketMQ, TDengine) were only allowed to be created with a sync query mode. Now async mode is also supported.
#10318 Prior to this enhancement, only double quotes (") were allowed in rule engine SQL language's FROM clause. Now it also supports single quotes (').
#10336 Add /rule_engine API endpoint to manage configuration of rule engine.
#10354 More specific error messages when configured with a bad max_heap_size value. Log current value and the max value when the message_queue_too_long error is thrown.
#10358 Hide flapping_detect/conn_congestion/stats configuration. Deprecate flapping_detect.enable.
#10359 Metrics now are not implicitly collected in places where API handlers don't make any use of them. Instead, a separate backplane RPC gathers cluster-wide metrics.
#10373 Deprecate the trace.payload_encode configuration. Add payload_encode=[text,hidden,hex] option when creating a trace via HTTP API.
#10381 Hide the auto_subscribe configuration items so that they can be modified later only through the HTTP API.
#10391 Hide a large number of advanced options to simplify the configuration file.
That includes rewrite, topic_metric, persistent_session_store, overload_protection, flapping_detect, conn_congestion, stats, auto_subscribe, broker_perf, shared_subscription_group, slow_subs, ssl_options.user_lookup_fun and some advanced items in node and dashboard section, #10358, #10381, #10385.
#10404 Change the default queue mode for buffer workers to memory_only. Before this change, the default queue mode was volatile_offload. When under high message rate pressure and when the resource is not keeping up with such rate, the buffer performance degraded a lot due to the constant disk operations.
#10140 Integrate Cassandra into bridges as a new backend. At the current stage, only Cassandra version 3.x is supported, not yet 4.x.
#10143 Add RocketMQ data integration bridge.
#10165 Support escaped special characters in InfluxDB data bridge write_syntax. This update allows the use of escaped special characters in string elements in accordance with InfluxDB line protocol.
#10211 Hide broker.broker_perf config and API documents. The two configs route_lock_type and trie_compaction are rarely used and require a full cluster restart to take effect. They are not suitable for being exposed to users. Detailed changes can be found here: https://gist.github.com/zmstone/01ad5754b9beaeaf3f5b86d14d49a0b7/revisions.
#10294 When configuring a MongoDB bridge, you can now use the ${field} syntax to reference fields in the message. This enables you to select the collection to insert data into dynamically.
#10363 Implement Microsoft SQL Server bridge.
#10573 Improved performance of Webhook bridge when using synchronous query mode. This also should improve the performance of other bridges when they are configured with no batching.
Bug Fixes
#10145 Add field status_reason to GET /bridges/:id response in case this bridge is in status disconnected if internal health-check reports an error condition. Include this same error condition in message when creating an alarm for a failing bridge.
#10172 Fix the incorrect regular expression in default ACL rule to allow the specified username (dashboard) to subscribe to $SYS/#.
#10174 Upgrade library esockd from 5.9.4 to 5.9.6. Fix an unnecessary error level logging when a connection is closed before proxy protocol header is sent by the proxy.
#10195 Add labels to API schemas where description contains raw HTML, which would break formatting of generated documentation otherwise.
#10196 Use lower-case for schema summaries and descriptions to be used in menu of generated online documentation.
#10209 Fix bug where a last will testament (LWT) message could be published when kicking out a banned client.
#10225 Allow installing a plugin if its name matches the beginning of another (already installed) plugin name. For example: if plugin emqx_plugin_template_a is installed, it must not block installing plugin emqx_plugin_template.
#10226 Handle validation error in /bridges API and return 400 instead of 500.
#10242 Fixed a log data field name clash. Prior to this fix, some debug logs may report a wrong Erlang PID which may affect troubleshooting session takeover issues.
#10257 Fixed the issue where auto_observe was not working in LwM2M Gateway.
Before the fix, OBSERVE requests were sent without a token, causing failures that LwM2M clients could not handle.
After the fix, LwM2M Gateway can correctly observe the resource list carried by the client. Furthermore, unknown resources will be ignored and the following warning log will be printed:
2023-03-28T18:50:27.771123+08:00 [warning] msg: ignore_observer_resource, mfa: emqx_lwm2m_session:observe_object_list/3, line: 522, peername: 127.0.0.1:56830, clientid: testlwm2mclient, object_id: 31024, reason: no_xml_definition
#10286 Enhance logging behaviour during boot failure. When EMQX fails to start due to corrupted configuration files, excessive logging is eliminated and no crash dump file is generated.
#10297 Keeps eval command backward compatible with v4 by evaluating only Erlang expressions, even on Elixir node. For Elixir expressions, use eval-ex command.
#10300 Fixed issue with Elixir builds that prevented plugins from being configured via environment variables.
#10315 Fix crash checking limit and page parameters in /mqtt/delayed/messages API call.
#10317 Do not expose listener level authentications before extensive verification.
#10323 For security reasons, the value of the password field in the API examples is replaced with ******.
#10410 Fix config check failed when gateways are configured in emqx.conf. This issue was first introduced in v5.0.22 via #10278, the boot-time config check was missing.
#10533 Fixed an issue that could cause (otherwise harmless) noise in the logs.
During some particularly slow synchronous calls to bridges, some late replies could be sent to connections processes that were no longer expecting a reply, and then emit an error log like:
2023-04-19T18:24:35.350233+00:00 [error] msg: unexpected_info, mfa: emqx_channel:handle_info/2, line: 1278, peername: 172.22.0.1:36384, clientid: caribdis_bench_sub_1137967633_4788, info: {#Ref<0.408802983.1941504010.189402>,{ok,200,[{<<"cache-control">>,<<"max-age=0, ...">>}}
Those logs are harmless, but they could flood and worry the users without need.
#10449 Validate the ssl_options and header configurations when creating authentication http (authn_http). Prior to this, incorrect ssl configuration could result in successful creation but the entire authn being unusable.
#10548 Fixed a race condition in the HTTP driver that would result in an error rather than a retry of the request. Related fix in the driver: emqx/ehttpc#45
#10201 In TDengine data bridge, removed the redundant database name from the SQL template.
#10270 ClickHouse data bridge has got a fix that makes the error message better when users click the test button in the settings dialog.
#10324 Previously, when attempting to reconnect to a misconfigured ClickHouse bridge through the dashboard, users would not receive an error message. This issue is now resolved, and error messages will now be displayed.
#10438 Fix some configuration item terminology errors in the DynamoDB data bridge:
- Changed database to table
- Changed username to aws_access_key_id
- Changed password to aws_secret_access_key
- Changed
5.0.2
Release Date: 2023-04-12
Enhancements
#10022 Release installation packages for Rocky Linux 9 (compatible with Red Hat Enterprise Linux 9) and macOS 12 for Intel platform.
#10139 Add extraVolumeMounts to EMQX Helm Chart, so you can mount your own files to an EMQX instance, such as the ACL rule files mentioned in #9052.
#9893 When connecting with the flag clean_start=false, EMQX will filter out messages in the session that were published by clients banned by the blacklist feature. Previously, messages sent by clients banned by the blacklist feature could still be delivered to subscribers in this case.
#9986 Add MQTT ingress to helm charts and remove obsolete mgmt references.
#9564 Implement Kafka Consumer Bridge, which supports consuming messages from Kafka and publishing them to MQTT topics.
#9881 Improve error logging related to health checks for InfluxDB connections.
#9985 Implement ClickHouse Data Bridge
#10123 Improve the performance of /bridges API. Earlier, when the number of nodes in the cluster was large or the node was busy, API requests could time out.
#9998 Obfuscate request body in error log when using HTTP service for client authentication for security reasons.
#10026 Metrics are now only exposed via the /bridges/:id/metrics endpoint, and no longer returned in other API operations.
#10052 Improve startup failure logs in daemon mode.
Bug Fixes
#10013 Fix return type structure for error case in API schema for /gateways/:name/clients.
#10014 Ensure Monitor API /monitor(_current)/nodes/:node returns 404 instead of 400 if node does not exist.
#10027 Allow setting node name via environment variable EMQX_NODE__NAME in Docker.
#10050 Ensure Bridge API returns 404 status code consistently for resources that don't exist.
#10055 The configuration parameter mqtt.max_awaiting_rel was not functional and has now been corrected.
#10056 Fix /bridges API status code. Return 400 instead of 403 in case of removing a data bridge that is dependent on an active rule. Return 400 instead of 403 in case of calling operations (start|stop|restart) when Data-Bridging is not enabled.
#10066 Improve error messages for /bridges_probe and [/node/:node]/bridges/:id/:operation API calls to make them more readable. And set HTTP status code to 400 instead of 500.
#10074 Check if type in PUT /authorization/sources/:type matches type given in the request body.
#10079 Fix wrong description about shared_subscription_strategy.
#10085 Consistently return 404 for all requests on non-existent source in /authorization/sources/:source[/*].
#10098 Fix an issue where the MongoDB connector crashed when MongoDB authorization was configured.
#10100 Fix channel crash for slow clients with enhanced authentication. Previously, when the client was using enhanced authentication, but the Auth message was sent slowly or the Auth message was lost, the client process would crash.
#10107 For operations on Bridges API if bridge-id is unknown we now return 404 instead of 400.
#10117 Fix an error occurring when a joining node doesn't have plugins that are installed on other nodes in the cluster. After this fix, the joining node will copy all the necessary plugins from other nodes.
#10118 Fix problems related to manual joining of EMQX replicant nodes to the cluster.
#10119 Fix crash when statsd.server is set to an empty string.
#10124 The default heartbeat period for MongoDB has been increased to reduce the risk of excessive logging to the MongoDB log file.
#10130 Fix garbled config display in dashboard when the value is originally from environment variables.
#10132 Fix some error logs generated by systemctl stop emqx command. Prior to the fix, the command was not stopping jq and os_mon applications properly.
#10144 Fix an issue where emqx cli failed to set the Erlang cookie when the emqx directory was read-only.
#10154 Change the default resume_interval for bridges and connectors to be the minimum of health_check_interval and request_timeout / 3 to resolve issue of request timeout.
#10157 Fix default rate limit configuration not being applied correctly when creating a new listener.
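The default described in #10154 is a one-line formula. The Python sketch below spells it out; the function name, the use of milliseconds, and the integer division are assumptions made for illustration and are not taken from the EMQX source.

```python
def default_resume_interval(health_check_interval_ms, request_timeout_ms):
    """Default resume_interval per #10154.

    Takes the minimum of health_check_interval and request_timeout / 3.
    Units (milliseconds) and rounding (integer division) are assumed.
    """
    return min(health_check_interval_ms, request_timeout_ms // 3)


if __name__ == "__main__":
    # With a 15 s health check and a 15 s request timeout, the
    # request_timeout / 3 term is the smaller one.
    print(default_resume_interval(15_000, 15_000))
```

Tying the interval to a third of the request timeout gives a buffered request a few resume attempts before it would time out.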
#10237 Ensure we return 404 status code for unknown node names in /nodes/:node[/metrics|/stats] API.
#10251 Fix an issue where rule dependencies were not prompted when deleting an ingress-type bridge in use.
#10313 Ensure that when the core or replicant node is starting, the cluster-override.conf file is only copied from the core node.
#10327 Don't increase “actions.failed.unknown” rule metrics counter upon receiving unrecoverable data bridge errors.
#10095 Fix an issue where when the MySQL connector was in batch mode, clients would keep querying the server with unnecessary PREPARE statements on each batch, possibly causing server resource exhaustion.
5.0.1
Release Date: 2023-03-10
Enhancements
- #10019 Add low-level tuning settings for QUIC listeners.
- #10059 Errors returned by rule engine API are formatted in a more human-readable way rather than dumping the raw error including the stack trace.
- #9213 Add pod disruption budget to helm chart
- #9949 QUIC transport Multistreams support and QUIC TLS ca-cert support.
- #9932 Integrate TDengine into bridges as a new backend.
- #9967 New common TLS option 'hibernate_after' to reduce memory footprint per idle connection, default: 5s.
Bug Fixes
#10009 Validate bytes param to GET /trace/:name/log to not exceed signed 32bit integer.
#10015 To prevent errors caused by an incorrect EMQX node cookie provided from an environment variable, we have implemented a fail-fast mechanism. Previously, when an incorrect cookie was provided, the command would still attempt to ping the node, leading to the error message 'Node xxx not responding to pings'. With the new implementation, if a mismatched cookie is detected, a message will be logged to indicate that the cookie is incorrect, and the command will terminate with an error code of 1 without trying to ping the node.
#10020 Fix bridge metrics when running in async mode with batching enabled (batch_size > 1).
#10021 Fix error message when the target node of emqx_ctl cluster join command is not running.
#10032 When resources on some nodes in the cluster are still in the 'initializing/connecting' state, the bridges/ API will crash due to missing Metrics information for those resources. This fix will ignore resources that do not have Metrics information.
#10041 For InfluxDB bridge, added integer value placeholder annotation hint to write_syntax documentation. Also supported setting a constant value for the timestamp field.
#10042 Improve behavior of the replicant nodes when the core cluster becomes partitioned (for example when a core node leaves the cluster). Previously, the replicant nodes were unable to rebalance connections to the core nodes, until the core cluster became whole again. This was indicated by the error messages: [error] line: 182, mfa: mria_lb:list_core_nodes/1, msg: mria_lb_core_discovery divergent cluster.
#10054 Fix the problem that the obfuscated password is used when using the /bridges_probe API to test the connection in Data-Bridge.
#10058 Deprecate unused QUIC TLS options. Only following TLS options are kept for the QUIC listeners:
- cacertfile
- certfile
- keyfile
- verify
#10076 Fix webhook bridge error handling: connection timeout should be a retriable error. Prior to this fix, connection timeout was classified as unrecoverable error and led to the request being dropped.
#10078 Fix an issue that invalid QUIC listener setting could cause a segfault.
#10084 Fix the problem when joining core nodes running different EMQX versions into a cluster.
#10086 Upgrade HTTP client ehttpc to 0.4.7. Prior to this upgrade, HTTP clients for authentication, authorization, and webhook may crash if body is empty but content-type HTTP header is set. For more details see ehttpc PR#44.
#9939 Allow 'emqx ctl cluster' command to be issued before Mnesia starts. Prior to this change, EMQX replicant could not use manual discovery strategy. Now it's possible to join cluster using 'manual' strategy.
#9958 Fix the error code and error message returned by the clients API when the Client ID does not exist.
#9961 Avoid parsing config files for node name and cookie when executing non-boot commands in bin/emqx.
#9974 Report memory usage to statsd and Prometheus using the same data source as dashboard. Prior to this fix, the memory usage data source was collected from an outdated source which did not work well in containers.
#9997 Fix Swagger API schema generation. deprecated metadata field is now always boolean, as Swagger specification suggests.
#10007 Change Kafka bridge's config memory_overload_protection default value from true to false. EMQX logs cases when messages get dropped due to overload protection, and this is also reflected in counters. However, since there is by default no alerting based on the logs and counters, setting it to true may cause messages being dropped without notice. At the time being, the better option is to let sysadmin set it explicitly so they are fully aware of the benefits and risks.
#10087 Use default template ${timestamp} if the timestamp config is empty (undefined) when inserting data in InfluxDB. Prior to this change, InfluxDB bridge inserted a wrong timestamp when template is not provided.
5.0.0
Release Date: 2023-02-03
EMQX Enterprise 5.0 is a completely new release. In this version, we have implemented a new cluster architecture and refined the main features, as well as introduced many new features.
New Core + Replica clustering architecture
EMQX Enterprise 5.0 adopts a brand-new Core + Replica clustering architecture, offering better scalability and reliability.
- A single cluster can now support up to 23 nodes and 100+ million MQTT connections, a 10x increase compared with EMQX Enterprise 4.x.
- The stateless nature of Replicant nodes ensures a stable cluster performance during dynamic scaling.
- Reduce the risk of split-brain under large-scale deployment and minimize the impact of split-brain on business.
The EMQX Kubernetes Operator has been adapted for this new clustering architecture so that you can deploy a scalable and stateless MQTT service with EMQX effortlessly.
Support MQTT over QUIC
By adopting QUIC as a transport layer, EMQX Enterprise 5.0 now supports the MQTT over QUIC transmission protocol.
MQTT over QUIC (Quick UDP Internet Connections) is a new transport protocol that aims to provide low-latency, reliable, and secure data communication in IoT (Internet of Things) networks. Some performance tests indicate that MQTT over QUIC can overcome several challenges of traditional networking protocols, such as slow and unreliable data transmission, high latency, and security vulnerabilities. It therefore has the potential to revolutionize how data is collected in Internet of Vehicles (IoV) and mobile data acquisition scenarios.
With EMQX Enterprise 5.0, users can now create an MQTT over QUIC listener and use the EMQ SDK to connect to IoT devices. EMQ is submitting a draft to the MQTT protocol as a member of OASIS.
For more information, please refer to MQTT over QUIC: Next-Generation IoT Standard Protocol.
Visualized rules orchestration and bidirectional data integration
EMQX Enterprise 5.0 offers real-time processing capability for IoT data and supports integration with third-party data systems in a more flexible and low-code way.
Visualized orchestration rules with Flows editor
EMQX Enterprise 5.0 adds a visualized Flows page to EMQX Dashboard to facilitate rule management. In the Flows editor, users can easily view and monitor the data filtering, processing, and bridging flow. Because the Flows editor visualizes the connection between IoT hardware and data flows, developers can focus on work with business significance.
EMQX plans to support rules orchestration and data bridge creation by drag and drop in future releases.
More flexible bidirectional data integration
Besides bridging device data to external systems, EMQX Enterprise 5.0 also supports bridging data from external data systems, for example, other MQTT services and Kafka, to specific clients after rule processing.
Bidirectional data integration is suitable for sending messages from the cloud to the device. Combined with support for real-time processing and delivery of messages at large scale, EMQX 5.0 offers more possibilities for IoT service application scenarios.
Disk-based buffer queue
EMQX Enterprise 5.0 also provides a buffer queue feature to better support data bridging. With the buffer queue, messages produced while a connection is abnormal are cached temporarily and sent once the connection is restored.
This buffer queue feature helps to ensure excellent reliability for data integration and greatly improves business availability.
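The buffering behavior described above can be sketched in a few lines of Python. This is an illustrative model only, not how EMQX implements its buffer queue: the class DiskBufferedSender, the JSON-lines file format, and the send callable that raises ConnectionError when the external system is unreachable are all assumptions.

```python
import json
import os


class DiskBufferedSender:
    """Hypothetical sketch: keep messages on disk while the sink is down."""

    def __init__(self, send, path):
        self.send = send  # callable; assumed to raise ConnectionError when the sink is down
        self.path = path  # file holding one JSON-encoded message per line

    def _pending(self):
        # Load every buffered message, oldest first.
        if not os.path.exists(self.path):
            return []
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def _store(self, messages):
        # Rewrite the buffer file with the messages still waiting.
        with open(self.path, "w", encoding="utf-8") as f:
            for message in messages:
                f.write(json.dumps(message) + "\n")

    def publish(self, message):
        # Enqueue first so ordering is preserved, then try to drain.
        self._store(self._pending() + [message])
        return self.flush()

    def flush(self):
        # Send buffered messages in order; stop at the first failure.
        pending = self._pending()
        while pending:
            try:
                self.send(pending[0])
            except ConnectionError:
                break  # sink still unreachable; keep the rest on disk
            pending.pop(0)
        self._store(pending)
        return len(pending)  # number of messages still buffered
```

Because a message is removed from the file only after send returns, a crash between sending and rewriting the file can lead to a duplicate delivery; this sketch therefore models at-least-once behavior.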
Supported data integrations
EMQX Enterprise 5.0 currently supports integration with the following data systems:
- Webhook
- MQTT
- Kafka
- InfluxDB
- MySQL
- Redis
- GCP Pub/Sub
- MongoDB
We plan to add support for more data systems. Please stay tuned.
Improved security management
More flexible access control
EMQX Enterprise 5.0 provides authentication options such as password-based authentication, LDAP, JWT, PSK, and X.509 certificates. It also provides authorization checking for message publishing and subscriptions.
Besides the configuration files, users can also configure their access control with EMQX Dashboard, a more flexible and user-friendly method. With this Dashboard configuration option, you can enable access control for EMQX clusters without rebooting.
EMQX also offers statistical metrics at both cluster and node levels to help our users better monitor access control running status, including:
- Allow: Number of authentication/authorization requests that passed
- Deny: Number of authentication/authorization requests that failed
- No match: Number of requests for which no client authentication/authorization data was found
- Rate: Request rate
Overload protection with Limiter
EMQX introduces the overload protection mechanism and a new Limiter feature. This Limiter delivers a more accurate and layered rate control and ensures that the system operates under the expected workload, because it supports limiting client behavior at the client, listener, or node levels.
Together, these two features keep overly busy clients and excessive request traffic from overwhelming the system and ensure stable system operation.
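One common way to realize layered rate control is to chain token buckets, one per level. The Python sketch below assumes that approach purely for illustration; the release notes do not state which algorithm the Limiter uses, and the names TokenBucket and allow are hypothetical.

```python
import time


class TokenBucket:
    """A single rate-limit layer (client, listener, or node)."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.updated = time.monotonic() if now is None else now

    def refill(self, now):
        # Add tokens for the elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now


def allow(buckets, now=None, cost=1):
    """Admit a request only if every layer has enough tokens.

    Tokens are taken from all layers or from none, so a request rejected
    at the listener or node level does not consume the client's budget.
    """
    now = time.monotonic() if now is None else now
    for bucket in buckets:
        bucket.refill(now)
    if all(bucket.tokens >= cost for bucket in buckets):
        for bucket in buckets:
            bucket.tokens -= cost
        return True
    return False
```

A request from one client would be checked as allow([client_bucket, listener_bucket, node_bucket]), where the listener and node buckets are shared by all clients attached to them.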
User-friendly EMQX Dashboard with better observability
In EMQX Enterprise 5.0, we have redesigned the EMQX Dashboard with a new UI design style, enhancing the visual experience and supporting more powerful and user-friendly features. Users can manage client connections, authenticate/authorize various subscribe/publish requests, and integrate with different data systems via data bridges and rule engine with our brand-new EMQX Dashboard.
The main improvements are as follows:
- Access control management
- Introduce the Flows editor to visualize the data integration
- More powerful hot configuration
- More diagnostic tools, such as slow subscriptions and log trace
- Powerful data management capability: users can manage retained messages or postpone publishing schedules with the Dashboard
More flexible extensions
With EMQX Enterprise 5.0, users can now compile, distribute, and install extension plugins with standalone plugin packages. You can upload these packages via the Dashboard to finish the configuration with no need to reboot the EMQX cluster.
A standard plugin will come with complete documentation and a website URL, so users can easily follow the instructions to use the plugins and communicate with the developers.
Easy-to-use ExHook/gRPC
Users can create multiple ExHooks at the same time. With the relevant metrics, users can view detailed usage statistics and the hooks (and their arguments) registered under each ExHook, so they can better understand the load of the ExHook extensions.
More "native" multi-protocol connectivity
The gateway is re-implemented in a unified design layer. EMQX Enterprise 5.0 provides unique client management pages and security authentication configurations for each protocol feature, so users can manage the access in a more protocol-native manner.
As each gateway can be configured with its own authentication methods, the authentication credentials of different gateway devices can now be isolated from each other to meet advanced security requirements.