# Node Evacuation and Cluster Load Rebalancing

MQTT is a stateful long-lived connection access protocol, which means connections will not be easily disconnected once the connection is established. Therefore, upgrading, maintaining, and scaling cluster nodes will become more challenging. EMQX provides node evacuation and cluster load rebalancing functions to facilitate users' cluster operation and maintenance.

# Node Evacuation

When you need to maintain or upgrade a node in the cluster, directly shutting down the node can result in lost connections and sessions, causing data loss. In addition, this type of operation can cause a large number of devices to go offline and reconnect for a period of time, increasing server load and potentially affecting overall business.

Therefore, EMQX provides node evacuation functionality to help migrate all connection and session data from the node to other nodes in the cluster before shutting down, reducing the impact on the overall business.

# How It Works

The node evacuation works in the following sequence:

  1. The node to be evacuated stops receiving connections.
  2. The node to be evacuated gradually disconnects its current clients at a preset rate (specified by conn-evict-rate). The disconnected clients use the reconnection mechanism to connect to other nodes (the target nodes) in the cluster. The reconnection mechanisms differ depending on the protocol versions:
    • MQTT v3.1/v3.1.1 clients: specified by load balancing strategy and the client needs to enable the reconnection mechanism;
    • MQTT v5.0 clients: specified by the redirect-to parameter.
  3. Wait for the target node to complete reconnection with the clients and take over the sessions (specified by wait-takeover).
  4. After the reconnection waiting time has elapsed, the remaining unclaimed sessions on the nodes to be evacuated will be migrated to the target node:
  • The node to which the session will be migrated is specified by migrate-to;
  • The speed of session migration is specified by sess-evict-rate.

You can stop the evacuation at any time. If the node to be evacuated closes during the evacuation, the evacuation process will be resumed after the node is restarted.

# Start and Stop Node Evacuation via CLI

You can use CLI command to start the node evacuation, get the evacuation status, and stop the node evacuation.

# Start Node Evacuation

You can use the following CLI command to start the node evacuation:

emqx_ctl rebalance start --evacuation \
    [--redirect-to "Host1:Port1 Host2:Port2 ..."] \
    [--conn-evict-rate CountPerSec] \
    [--migrate-to "node1@host1 node2@host2 ..."] \
    [--wait-takeover Secs] \
    [--sess-evict-rate CountPerSec]
Copied!
1
2
3
4
5
6
Parameter Type Description
--redirect-to String The redirected server address during reconnection, for MQTT 5.0 clients; Refer to MQTT 5.0 Specification - Server redirection (opens new window) for more details.
--conn-evict-rate Positive integer Client disconnection rate, count/second; 500 connections per second by default
--migrate-to String Space or comma-separated list of nodes to which sessions will be evacuated
--wait-takeover Positive integer Amount of time in seconds to wait before starting session evacuation; Unit: second, 60 seconds by default
--sess-evict-rate Positive integer Client evacuation rate, count/second; 500 sessions per second by default

Code Example

If you want to migrate the clients on the node emqx@127.0.0.1 to the nodes emqx2@127.0.0.1 and emqx3@127.0.0.1, you can execute the following command on the node emqx@127.0.0.1:

./bin/emqx_ctl rebalance start --evacuation \
	--wait-takeover 200 \
	--conn-evict-rate 30 \
	--sess-evict-rate 30 \
	--migrate-to "emqx2@127.0.0.1 emqx3@127.0.0.1"
Rebalance(evacuation) started
Copied!
1
2
3
4
5
6

This command will disconnect existing clients at a rate of 30 connections per second. After all connections are disconnected, it will wait for 200 seconds during which client sessions will be migrated to the reconnected nodes. Afterward, the remaining sessions will be migrated at a rate of 30 sessions per second to the emqx2@127.0.0.1 and emqx3@127.0.0.1 nodes.

# Get Evacuation Status

You can use the following CLI command to get the evacuation status:

emqx_ctl rebalance node-status
Copied!
1

Below is an example of the returned results:

./bin/emqx_ctl rebalance node-status
Rebalance type: rebalance
Rebalance state: evicting_conns
Coordinator node: 'emqx2@127.0.0.1'
Connection eviction rate: 5 connections/second
Session eviction rate: 5 sessions/second
Connection goal: 504.0
Recipient nodes: ['emqx2@127.0.0.1']
Channel statistics:
  current_connected: 960
  current_disconnected_sessions: 35
  current_sessions: 995
  initial_connected: 1000
  initial_sessions: 1000
Copied!
1
2
3
4
5
6
7
8
9
10
11
12
13
14

# Stop Node Evacuation

You can use the following CLI command to stop evacuation:

emqx_ctl rebalance stop
Copied!
1

Below is an example of the returned results:

./bin/emqx_ctl rebalance stop
Rebalance(evacuation) stopped
Copied!
1
2

# Start and Stop Node Evacuation via HTTP API

You can also use HTTP API to start the node evacuation and you need to specify the node to be evacuated in the parameters.

# Start Node Evacuation

For example, you want to evacuate the connections and sessions of the emqx1@127.0.0.1 node to the emqx2@127.0.0.1 and emqx3@127.0.0.1 nodes, disconnect existing clients and sessions at a rate of 5 connections per second, and transfer them to emqx2@127.0.0.1 and emqx3@127.0.0.1 nodes.

Code Example

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/evacuation/start' -d '{"conn_evict_rate": 5, "sess_evict_rate": 5, "migrate_to": ["emqx3@127.0.0.1", "emqx2@127.0.0.1"]}'

{"data":[],"code":0}
Copied!
1
2
3

The above statement contains the following fields:

Fields Type Required Fields Description
nodes String Yes Node name
redirect_to Positive integer Yes The redirected server address during reconnection, for MQTT 5.0 clients only;
Refer to MQTT 5.0 Specification - Server redirection (opens new window) for more details.
conn_evict_rate Positive integer Yes Client disconnection rate, count/second; 500 connections per second by default
migrate_to String Yes Space or comma-separated list of nodes to which sessions will be evacuated
wait_takeover Positive integer No Amount of time in seconds to wait before starting session evacuation; Unit: second, 60 seconds by default
sess_evict_rate Positive integer No Client evacuation rate, count/second; 500 sessions per second by default

# Stop Node Evacuation

You can stop the node evacuation using the following code example:

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/evacuation/stop'

{"data":[],"code":0}
Copied!
1
2
3

# Rebalancing

Due to the same reason as MQTT being a stateful long-lived connection protocol, connections will not be easily disconnected once the connection is established. Even after scaling up the nodes, existing connections do not automatically shift to the newly added nodes. Consequently, if there are not a significant number of new client connections, the additional nodes may remain underutilized for a long period. In such cases, you need to manually migrate connections from high-load nodes to low-load nodes to achieve cluster load balancing.

rebalancing

# How It Works

Rebalancing is a more complicated process since it involves several nodes.

You can initiate a cluster load rebalancing task on any node. EMQX will automatically calculate the necessary connection migration plan based on the current connection load of each node. It will then migrate the corresponding number of connections and sessions from high-load nodes to low-load nodes to achieve load balancing between nodes. The workflow is as follows:

  1. Calculate the migration plan and divide the nodes involved in rebalancing (specified by --nodes) into source nodes and target nodes:
    • Source nodes: High-load nodes
    • Target nodes: Low-load nodes
  2. Stop accepting new connections on the source nodes.
  3. Wait for a period of time (specified by wait-health-check) until the load balancer (LB) removes the source nodes from the active backend node list.
  4. Gradually disconnect connected clients on the source nodes until the average number of connections matches that of the target nodes.
  5. Wait for the target nodes to reconnect with clients and take over the sessions (specified by wait-takeover).
  6. After the reconnection waiting time exceeds, the source nodes will migrate the remaining unclaimed sessions to the target nodes at the rate specified by sess-evict-rate.

At this point, the load rebalancing task is completed, and the source nodes return to their normal state.

TIP

Rebalancing is ephemeral. If one of the participating nodes crashes, the whole process is aborted on all nodes.

# Start and Stop Rebalancing via CLI

You can use CLI command to start the rebalancing, get the rebalancing status, and stop the rebalancing.

# Start Rebalancing

The command for starting the rebalancing includes the following fields:

rebalance start \
    [--nodes "node1@host1 node2@host2"] \
    [--wait-health-check Secs] \
    [--conn-evict-rate ConnPerSec] \
    [--abs-conn-threshold Count] \
    [--rel-conn-threshold Fraction] \
    [--conn-evict-rate ConnPerSec] \
    [--wait-takeover Secs] \
    [--sess-evict-rate CountPerSec] \
    [--abs-sess-threshold Count] \
    [--rel-sess-threshold Fraction]
Copied!
1
2
3
4
5
6
7
8
9
10
11
Fields Type Description
--nodes String Space or comma-separated list of nodes participating in the rebalance. It may or may not include the coordinator (node on which the command is run)
--wait-health-check Positive integer Specified waiting time (in seconds, default 60 seconds) for the LB to remove the source node from the active backend node list. Once the specified waiting time is exceeded, the load rebalancing task will start.
--conn-evict-rate Positive integer Client disconnection rate on source nodes; 500 connections per second by default
--abs-conn-threshold Positive integer Absolute threshold for checking connection balance; 1000 by default
--rel-conn-threshold Number
> 1.0
Relative threshold for checking connection balance; 1.1 by default
--wait-takeover Positive integer Specified waiting time (in seconds, default 60 seconds) for clients to reconnect and take over the sessions after all connections are disconnected.
--sess-evict-rate Positive integer Session evacuation rate on source nodes; 500 sessions per second by default
--abs-sess-threshold Positive integer Absolute threshold for checking session balance; 1000 by default
--rel-sess-threshold Number
> 1.0
Relative threshold for checking session balance; 1.1 by default

Check Session Balance

Connections are considered to be balanced when the following condition holds:

avg(DonorConns) < avg(RecipientConns) + abs_conn_threshold 
OR 
avg(DonorConns) < avg(RecipientConns) * rel_conn_threshold
Copied!
1
2
3

A similar rule is applied to disconnected sessions.

Example

To achieve load rebalancing among the three nodes emqx@127.0.0.1, emqx2@127.0.0.1, and emqx3@127.0.0.1, you can use the following command:

./bin/emqx_ctl rebalance start \
	--wait-health-check 10 \
	--wait-takeover 60  \
	--conn-evict-rate 5 \
	--sess-evict-rate 5 \
	--abs-conn-threshold 30 \
	--abs-sess-threshold 30 \
	--nodes "emqx1@127.0.0.1 emqx2@127.0.0.1 emqx3@127.0.0.1"
Rebalance started
Copied!
1
2
3
4
5
6
7
8
9

# Get Rebalance Status

The CLI command for getting rebalance status is:

emqx_ctl rebalance node-status
Copied!
1

Example:

./bin/emqx_ctl rebalance node-status
Node 'emqx1@127.0.0.1': rebalance coordinator
Rebalance state: evicting_conns
Coordinator node: 'emqx1@127.0.0.1'
Donor nodes: ['emqx2@127.0.0.1','emqx3@127.0.0.1']
Recipient nodes: ['emqx1@127.0.0.1']
Connection eviction rate: 5 connections/second
Session eviction rate: 5 sessions/second
Connection goal: 0.0
Current average donor node connection count: 300.0
Copied!
1
2
3
4
5
6
7
8
9
10

# Stop Rebalancing

The CLI command to stop rebalancing is:

emqx_ctl rebalance stop
Copied!
1

Example:

./bin/emqx_ctl rebalance stop
Rebalance stopped
Copied!
1
2

# Start and Stop Rebalancing via HTTP API

All the operations available from the CLI are also available from the API. Start/stop commands require a node as a parameter.

# Start Rebalancing

The request body for rebalancing should includes the following fields:

  • nodes
  • conn_evict_rate
  • sess_evict_rate
  • wait_takeover
  • wait_health_check
  • abs_conn_threshold
  • rel_conn_threshold
  • abs_sess_threshold
  • rel_sess_threshold

The meaning of these fields are the same as those in the corresponding CLI command.

Example:

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/start' -d '{"conn_evict_rate": 5, "sess_evict_rate": 5, "nodes": ["emqx1@127.0.0.1", "emqx2@127.0.0.1"]}'

{"data":[],"code":0}
Copied!
1
2
3

# Get Node-Local Status

The code example for obtaining the rebalancing status for the requested node is as follows:

curl -s -u admin:public -H "Content-Type: application/json" -X GET 'http://127.0.0.1:8081/api/v4/load_rebalance/status'

{
  "status": "enabled",
  "stats": {
    "initial_sessions": 0,
    "initial_connected": 0,
    "current_sessions": 0,
    "current_connected": 0
  },
  "state": "waiting_takeover",
  "session_recipients": [
    "emqx3@127.0.0.1",
    "emqx2@127.0.0.1"
  ],
  "session_goal": 0,
  "session_eviction_rate": 5,
  "process": "evacuation",
  "connection_goal": 0,
  "connection_eviction_rate": 5
}
Copied!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21

# Get Cluster-Wide Status

The code example for obtaining the rebalancing status for a cluster is as follows:

curl -s -u admin:public -H "Content-Type: application/json" -X GET 'http://127.0.0.1:8081/api/v4/load_rebalance/global_status'
{
  "rebalances": [],
  "evacuations": [
    {
      "node": "emqx1@127.0.0.1",
      "stats": {
        "initial_sessions": 0,
        "initial_connected": 0,
        "current_sessions": 0,
        "current_connected": 0
      },
      "state": "waiting_takeover",
      "session_recipients": [
        "emqx3@127.0.0.1",
        "emqx2@127.0.0.1"
      ],
      "session_goal": 0,
      "session_eviction_rate": 5,
      "connection_goal": 0,
      "connection_eviction_rate": 5
    }
  ]
}
Copied!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

# Stop Rebalancing

The code example for stopping rebalancing is as follows:

curl -v -u admin:public -H "Content-Type: application/json" -X POST 'http://127.0.0.1:8081/api/v4/load_rebalance/emqx1@127.0.0.1/stop'

{"data":[],"code":0}
Copied!
1
2
3

# Load Balancer Integration

During evacuation/rebalance, it is up to the user to provide the necessary configuration for the load balancer (if any). This configuration should help disconnected clients to be directed to the recipient nodes when they reconnect. Without such a configuration, there may be an excess number of disconnections.

To help create that configuration, EMQX provides health check endpoints:

GET /api/v4/load_rebalance/availability_check
Copied!
1

They respond with 503 HTTP code for the donor or evacuated nodes and 200 HTTP code for nodes operating normally and receiving connections.

For example, the described configuration for Haproxy and a 3-node cluster could look like this:

defaults
  timeout connect 5s
  timeout client 60m
  timeout server 60m

listen mqtt
  bind *:1883
  mode tcp
  maxconn 50000
  timeout client 6000s
  default_backend emqx_cluster

backend emqx_cluster
  mode tcp
  balance leastconn
  option httpchk
  http-check send meth GET uri /api/v4/load_rebalance/availability_check hdr Authorization "Basic YWRtaW46cHVibGlj"
  server emqx1 127.0.0.1:3001 check port 5001 inter 1000 fall 2 rise 5 weight 1 maxconn 1000
  server emqx2 127.0.0.1:3002 check port 5002 inter 1000 fall 2 rise 5 weight 1 maxconn 1000
  server emqx3 127.0.0.1:3003 check port 5003 inter 1000 fall 2 rise 5 weight 1 maxconn 1000
Copied!
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20

Here we have 3 nodes with MQTT listeners on ports 3001, 3002, and 3003 and HTTP listeners on ports 5001, 5002, and 5003, respectively.