Node Evacuation and Cluster Load Rebalancing

MQTT is a stateful protocol built on long-lived connections: once a connection is established, it is rarely dropped. This makes upgrading, maintaining, and scaling cluster nodes more challenging. EMQX provides node evacuation and cluster load rebalancing to simplify these cluster operations.

Node Evacuation

When you need to maintain or upgrade a node in the cluster, shutting it down directly drops its connections and sessions, which can cause data loss. It also forces a large number of devices to go offline and reconnect within a short period, which increases server load and can affect the overall business.

Therefore, EMQX provides node evacuation functionality to help migrate all connection and session data from the node to other nodes in the cluster before shutting down, reducing the impact on the overall business.

How It Works

The node evacuation works in the following sequence:

1. The node to be evacuated stops accepting new connections.
2. The node to be evacuated gradually disconnects its current clients at a preset rate (specified by conn-evict-rate). The disconnected clients use their reconnection mechanism to connect to other nodes (the target nodes) in the cluster. How a client finds its target node depends on the protocol version:
   - MQTT v3.1/v3.1.1 clients: determined by the load balancing strategy; the client must have its reconnection mechanism enabled;
   - MQTT v5.0 clients: specified by the redirect-to parameter.
3. The node waits for the clients to reconnect to the target nodes and take over their sessions (waiting time specified by wait-takeover).
4. After the waiting time has elapsed, the remaining unclaimed sessions on the node being evacuated are migrated to the target nodes:
   - The nodes to which the sessions are migrated are specified by migrate-to;
   - The rate of session migration is specified by sess-evict-rate.

You can stop the evacuation at any time. If the node being evacuated shuts down during the evacuation, the evacuation resumes after the node restarts.

Start and Stop Node Evacuation via CLI

You can use CLI commands to start the node evacuation, get the evacuation status, and stop the node evacuation.

Start Node Evacuation

You can use the following CLI command to start the node evacuation:

emqx_ctl rebalance start --evacuation \
    [--redirect-to "Host1:Port1 Host2:Port2 ..."] \
    [--conn-evict-rate CountPerSec] \
    [--migrate-to "node1@host1 node2@host2 ..."] \
    [--wait-takeover Secs] \
    [--sess-evict-rate CountPerSec]

Parameter	Type	Description
`--redirect-to`	String	The redirected server address during reconnection, for MQTT 5.0 clients; Refer to MQTT 5.0 Specification - Server redirection for more details.
`--conn-evict-rate`	Positive integer	Client disconnection rate, count/second; 500 connections per second by default
`--migrate-to`	String	Space or comma-separated list of nodes to which sessions will be evacuated
`--wait-takeover`	Positive integer	Time to wait before starting session evacuation, in seconds; 60 seconds by default
`--sess-evict-rate`	Positive integer	Session evacuation rate, count/second; 500 sessions per second by default

Code Example

If you want to migrate the clients on the node emqx@127.0.0.1 to the nodes emqx2@127.0.0.1 and emqx3@127.0.0.1, you can execute the following command on the node emqx@127.0.0.1:

bash

./bin/emqx_ctl rebalance start --evacuation \
	--wait-takeover 200 \
	--conn-evict-rate 30 \
	--sess-evict-rate 30 \
	--migrate-to "emqx2@127.0.0.1 emqx3@127.0.0.1"
Rebalance(evacuation) started

This command disconnects existing clients at a rate of 30 connections per second. After all connections are disconnected, it waits 200 seconds, during which clients that reconnect to other nodes take over their sessions there. Afterward, the remaining sessions are migrated to the emqx2@127.0.0.1 and emqx3@127.0.0.1 nodes at a rate of 30 sessions per second.
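To get a feel for how long such an evacuation can take, the three phases can simply be added up. The sketch below is a rough worst-case estimate, not something EMQX reports: it assumes evictions proceed at exactly the configured rates and that no client reclaims its session during the takeover wait. The function name and the figure of 1000 connections and sessions are hypothetical.

```shell
# Rough worst-case duration of an evacuation, in seconds.
# Arguments: connections, conn-evict-rate, wait-takeover, sessions, sess-evict-rate
estimate_evacuation_secs() {
  local conns=$1 conn_rate=$2 wait_takeover=$3 sessions=$4 sess_rate=$5
  # Ceiling division: (a + b - 1) / b
  local conn_phase=$(( (conns + conn_rate - 1) / conn_rate ))
  local sess_phase=$(( (sessions + sess_rate - 1) / sess_rate ))
  echo $(( conn_phase + wait_takeover + sess_phase ))
}

# Hypothetical node holding 1000 connections and 1000 sessions,
# evacuated with the rates and waiting time from the example command.
estimate_evacuation_secs 1000 30 200 1000 30   # prints 268
```

In practice the session phase is usually shorter, because every client that reconnects during the waiting time takes its session with it.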

Get Evacuation Status

You can use the following CLI command to get the evacuation status:

emqx_ctl rebalance node-status

Below is an example of the returned results:

bash

./bin/emqx_ctl rebalance node-status
Rebalance type: rebalance
Rebalance state: evicting_conns
Coordinator node: 'emqx2@127.0.0.1'
Connection eviction rate: 5 connections/second
Session eviction rate: 5 sessions/second
Connection goal: 504.0
Recipient nodes: ['emqx2@127.0.0.1']
Channel statistics:
  current_connected: 960
  current_disconnected_sessions: 35
  current_sessions: 995
  initial_connected: 1000
  initial_sessions: 1000

Stop Node Evacuation

You can use the following CLI command to stop evacuation:

emqx_ctl rebalance stop

Below is an example of the returned results:

./bin/emqx_ctl rebalance stop
Rebalance(evacuation) stopped

Start and Stop Node Evacuation via HTTP API

You can also start and stop the node evacuation through the HTTP API. The node to be evacuated must be specified in the request parameters.

Start Node Evacuation

For example, to evacuate the connections and sessions of the emqx1@127.0.0.1 node to the emqx2@127.0.0.1 and emqx3@127.0.0.1 nodes, disconnecting clients at a rate of 5 connections per second and migrating sessions at a rate of 5 sessions per second, you can use the following request.

Code Example

bash

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/evacuation/start' -d '{"conn_evict_rate": 5, "sess_evict_rate": 5, "migrate_to": ["emqx3@127.0.0.1", "emqx2@127.0.0.1"]}'

{"data":[],"code":0}

The request uses the following fields:

Fields	Type	Required	Description
`nodes`	String	Yes	Node name
`redirect_to`	String	Yes	The redirected server address during reconnection, for MQTT 5.0 clients only; refer to MQTT 5.0 Specification - Server redirection for more details.
`conn_evict_rate`	Positive integer	Yes	Client disconnection rate, count/second; 500 connections per second by default
`migrate_to`	Array of strings	Yes	List of nodes to which sessions will be evacuated
`wait_takeover`	Positive integer	No	Time to wait before starting session evacuation, in seconds; 60 seconds by default
`sess_evict_rate`	Positive integer	No	Session evacuation rate, count/second; 500 sessions per second by default
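When the call is scripted, the request body can be assembled from variables instead of being typed by hand. The helper below is a hypothetical sketch that only builds the JSON text; it uses the field names from the table and writes `migrate_to` as an array, as in the curl example.

```shell
# Build a JSON request body for the evacuation start call.
# Usage: evacuation_body CONN_RATE SESS_RATE WAIT_TAKEOVER NODE [NODE...]
evacuation_body() {
  local conn_rate=$1 sess_rate=$2 wait_takeover=$3
  shift 3
  local nodes="" node
  for node in "$@"; do
    # Append each node as a quoted string, comma-separated after the first.
    nodes="${nodes:+$nodes, }\"$node\""
  done
  printf '{"conn_evict_rate": %d, "sess_evict_rate": %d, "wait_takeover": %d, "migrate_to": [%s]}\n' \
    "$conn_rate" "$sess_rate" "$wait_takeover" "$nodes"
}

evacuation_body 5 5 60 emqx2@127.0.0.1 emqx3@127.0.0.1
```

The result can then be handed to curl with `-d "$(evacuation_body 5 5 60 emqx2@127.0.0.1 emqx3@127.0.0.1)"`. The helper does no escaping, so it is only suitable for plain node names like the ones used here.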

Stop Node Evacuation

You can stop the node evacuation using the following code example:

bash

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/evacuation/stop'

{"data":[],"code":0}

Rebalancing

Because MQTT is a stateful protocol with long-lived connections, established connections are rarely dropped. Even after the cluster is scaled out, existing connections do not automatically shift to the newly added nodes. Consequently, unless a significant number of new clients connect, the additional nodes may remain underutilized for a long period. In such cases, you need to manually migrate connections from high-load nodes to low-load nodes to balance the cluster load.

How It Works

Rebalancing is a more complicated process since it involves several nodes.

You can initiate a cluster load rebalancing task on any node. EMQX will automatically calculate the necessary connection migration plan based on the current connection load of each node. It will then migrate the corresponding number of connections and sessions from high-load nodes to low-load nodes to achieve load balancing between nodes. The workflow is as follows:

1. Calculate the migration plan and divide the nodes involved in rebalancing (specified by --nodes) into source nodes and target nodes:
   - Source nodes: High-load nodes
   - Target nodes: Low-load nodes
2. Stop accepting new connections on the source nodes.
3. Wait for a period of time (specified by wait-health-check) until the load balancer (LB) removes the source nodes from the active backend node list.
4. Gradually disconnect connected clients on the source nodes until their average number of connections matches that of the target nodes.
5. Wait for the clients to reconnect to the target nodes and take over their sessions (waiting time specified by wait-takeover).
6. After the waiting time has elapsed, the source nodes migrate the remaining unclaimed sessions to the target nodes at the rate specified by sess-evict-rate.

At this point, the load rebalancing task is completed, and the source nodes return to their normal state.

TIP

Rebalancing is ephemeral. If one of the participating nodes crashes, the whole process is aborted on all nodes.

Start and Stop Rebalancing via CLI

You can use CLI commands to start the rebalancing, get the rebalancing status, and stop the rebalancing.

Start Rebalancing

The command for starting the rebalancing accepts the following options:

bash

emqx_ctl rebalance start \
    [--nodes "node1@host1 node2@host2"] \
    [--wait-health-check Secs] \
    [--conn-evict-rate ConnPerSec] \
    [--abs-conn-threshold Count] \
    [--rel-conn-threshold Fraction] \
    [--wait-takeover Secs] \
    [--sess-evict-rate CountPerSec] \
    [--abs-sess-threshold Count] \
    [--rel-sess-threshold Fraction]

Fields	Type	Description
`--nodes`	String	Space or comma-separated list of nodes participating in the rebalance. It may or may not include the coordinator (node on which the command is run)
`--wait-health-check`	Positive integer	Specified waiting time (in seconds, default 60 seconds) for the LB to remove the source node from the active backend node list. Once the specified waiting time is exceeded, the load rebalancing task will start.
`--conn-evict-rate`	Positive integer	Client disconnection rate on source nodes; 500 connections per second by default
`--abs-conn-threshold`	Positive integer	Absolute threshold for checking connection balance; 1000 by default
`--rel-conn-threshold`	Number > 1.0	Relative threshold for checking connection balance; 1.1 by default
`--wait-takeover`	Positive integer	Specified waiting time (in seconds, default 60 seconds) for clients to reconnect and take over the sessions after all connections are disconnected.
`--sess-evict-rate`	Positive integer	Session evacuation rate on source nodes; 500 sessions per second by default
`--abs-sess-threshold`	Positive integer	Absolute threshold for checking session balance; 1000 by default
`--rel-sess-threshold`	Number > 1.0	Relative threshold for checking session balance; 1.1 by default

Check Session Balance

Connections are considered to be balanced when the following condition holds:

bash

avg(DonorConns) < avg(RecipientConns) + abs_conn_threshold 
OR 
avg(DonorConns) < avg(RecipientConns) * rel_conn_threshold

A similar rule is applied to disconnected sessions.
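The condition can be evaluated by hand for a given set of averages. The sketch below only restates the rule quoted above with the default thresholds; it does not reproduce how EMQX computes the averages internally, and the function name and connection counts are hypothetical.

```shell
# Report whether connections count as balanced under the stated rule.
# Arguments: average donor connections, average recipient connections,
#            absolute threshold, relative threshold
conns_balanced() {
  awk -v d="$1" -v r="$2" -v abs_t="$3" -v rel_t="$4" 'BEGIN {
    if (d < r + abs_t || d < r * rel_t) print "balanced"
    else print "unbalanced"
  }'
}

# 1500 < 400 + 1000 is false and 1500 < 400 * 1.1 is false
conns_balanced 1500 400 1000 1.1   # prints unbalanced
# 1300 < 400 + 1000 is true
conns_balanced 1300 400 1000 1.1   # prints balanced
```

Because the two parts are joined by OR, a large absolute threshold can mark the cluster as balanced even when donors hold several times as many connections as recipients; lowering `--abs-conn-threshold`, as the example command in this section does, makes the check stricter.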

Example

To achieve load rebalancing among the three nodes emqx1@127.0.0.1, emqx2@127.0.0.1, and emqx3@127.0.0.1, you can use the following command:

bash

./bin/emqx_ctl rebalance start \
	--wait-health-check 10 \
	--wait-takeover 60  \
	--conn-evict-rate 5 \
	--sess-evict-rate 5 \
	--abs-conn-threshold 30 \
	--abs-sess-threshold 30 \
	--nodes "emqx1@127.0.0.1 emqx2@127.0.0.1 emqx3@127.0.0.1"
Rebalance started

Get Rebalance Status

The CLI command for getting rebalance status is:

emqx_ctl rebalance node-status

Example:

bash

./bin/emqx_ctl rebalance node-status
Node 'emqx1@127.0.0.1': rebalance coordinator
Rebalance state: evicting_conns
Coordinator node: 'emqx1@127.0.0.1'
Donor nodes: ['emqx2@127.0.0.1','emqx3@127.0.0.1']
Recipient nodes: ['emqx1@127.0.0.1']
Connection eviction rate: 5 connections/second
Session eviction rate: 5 sessions/second
Connection goal: 0.0
Current average donor node connection count: 300.0

Stop Rebalancing

The CLI command to stop rebalancing is:

bash

emqx_ctl rebalance stop

Example:

bash

./bin/emqx_ctl rebalance stop
Rebalance stopped

Start and Stop Rebalancing via HTTP API

All the operations available from the CLI are also available from the API. Start/stop commands require a node as a parameter.

Start Rebalancing

The request body for rebalancing can include the following fields:

nodes
conn_evict_rate
sess_evict_rate
wait_takeover
wait_health_check
abs_conn_threshold
rel_conn_threshold
abs_sess_threshold
rel_sess_threshold

These fields have the same meaning as the corresponding options of the CLI command.

Example:

bash

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/start' -d '{"conn_evict_rate": 5, "sess_evict_rate": 5, "nodes": ["emqx1@127.0.0.1", "emqx2@127.0.0.1"]}'

{"data":[],"code":0}

Get Node-Local Status

The code example for obtaining the rebalancing status for the requested node is as follows:

bash

curl -s -u admin:public -H "Content-Type: application/json" -X GET 'http://127.0.0.1:8081/api/v4/load_rebalance/status'

{
  "status": "enabled",
  "stats": {
    "initial_sessions": 0,
    "initial_connected": 0,
    "current_sessions": 0,
    "current_connected": 0
  },
  "state": "waiting_takeover",
  "session_recipients": [
    "emqx3@127.0.0.1",
    "emqx2@127.0.0.1"
  ],
  "session_goal": 0,
  "session_eviction_rate": 5,
  "process": "evacuation",
  "connection_goal": 0,
  "connection_eviction_rate": 5
}
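A script that polls this endpoint usually only needs one or two fields from the response. The following is a minimal sketch that uses plain text matching rather than a real JSON parser, so it is only suitable for flat string fields like `state` and `process`; the helper names are hypothetical and the sample is a trimmed copy of the response shown above.

```shell
# Print the value of the first string field NAME found in JSON on stdin.
# This is plain text matching, not a JSON parser.
json_string_field() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Trimmed copy of the example response.
sample_response() {
  cat <<'EOF'
{
  "status": "enabled",
  "state": "waiting_takeover",
  "process": "evacuation"
}
EOF
}

sample_response | json_string_field state     # prints waiting_takeover
sample_response | json_string_field process   # prints evacuation
```

In real use the curl output would be piped in place of the sample. Where `jq` is installed, `jq -r '.state'` is the more robust choice.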

Get Cluster-Wide Status

The code example for obtaining the rebalancing status for a cluster is as follows:

bash

curl -s -u admin:public -H "Content-Type: application/json" -X GET 'http://127.0.0.1:8081/api/v4/load_rebalance/global_status'
{
  "rebalances": [],
  "evacuations": [
    {
      "node": "emqx1@127.0.0.1",
      "stats": {
        "initial_sessions": 0,
        "initial_connected": 0,
        "current_sessions": 0,
        "current_connected": 0
      },
      "state": "waiting_takeover",
      "session_recipients": [
        "emqx3@127.0.0.1",
        "emqx2@127.0.0.1"
      ],
      "session_goal": 0,
      "session_eviction_rate": 5,
      "connection_goal": 0,
      "connection_eviction_rate": 5
    }
  ]
}

Stop Rebalancing

The code example for stopping rebalancing is as follows:

bash

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/stop'

{"data":[],"code":0}

Load Balancer Integration

During evacuation or rebalancing, it is up to the user to provide the necessary configuration for the load balancer (if any). This configuration should direct disconnected clients to the recipient nodes when they reconnect. Without such a configuration, there may be an excessive number of disconnections.

To help create that configuration, EMQX provides a health check endpoint:

GET /api/v4/load_rebalance/availability_check

It responds with HTTP status code 503 on donor or evacuated nodes and with 200 on nodes that are operating normally and accepting connections.
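The endpoint can also be probed by hand to see which nodes a load balancer would currently treat as available. This is a minimal sketch: the function names and labels are hypothetical, and the `admin:public` credentials and the address passed to `check_node` are assumptions taken from the other examples on this page.

```shell
# Map the HTTP status code of the availability check to a short label.
availability_label() {
  case "$1" in
    200) echo "accepting" ;;   # node operates normally and accepts connections
    503) echo "draining" ;;    # node is a donor or is being evacuated
    *)   echo "unknown" ;;
  esac
}

# Query one node and print its label. HOST_PORT is the node's HTTP API
# address, for example 127.0.0.1:8081 (assumed, as in the curl examples).
check_node() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' -u admin:public \
    "http://$1/api/v4/load_rebalance/availability_check")
  availability_label "$code"
}

availability_label 200   # prints accepting
availability_label 503   # prints draining
```

Running `check_node 127.0.0.1:8081` against each node before and after starting an evacuation shows when the load balancer's health check will start failing for that node.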

For example, such a configuration for HAProxy and a 3-node cluster could look like this:

bash

defaults
  timeout connect 5s
  timeout client 60m
  timeout server 60m

listen mqtt
  bind *:1883
  mode tcp
  maxconn 50000
  timeout client 6000s
  default_backend emqx_cluster

backend emqx_cluster
  mode tcp
  balance leastconn
  option httpchk
  http-check send meth GET uri /api/v4/load_rebalance/availability_check hdr Authorization "Basic YWRtaW46cHVibGlj"
  server emqx1 127.0.0.1:3001 check port 5001 inter 1000 fall 2 rise 5 weight 1 maxconn 1000
  server emqx2 127.0.0.1:3002 check port 5002 inter 1000 fall 2 rise 5 weight 1 maxconn 1000
  server emqx3 127.0.0.1:3003 check port 5003 inter 1000 fall 2 rise 5 weight 1 maxconn 1000

Here we have 3 nodes with MQTT listeners on ports 3001, 3002, and 3003 and HTTP listeners on ports 5001, 5002, and 5003, respectively.
