Perform Blue-Green Upgrade of EMQX Cluster

Objective

Perform a graceful upgrade of the EMQX cluster through blue-green deployment.

Background

In a traditional EMQX cluster deployment, EMQX Pods are usually updated with the StatefulSet's default rolling update strategy. However, this approach has two problems:

  • During the rolling update, both new and old Pods are selected by the corresponding Service. This may cause MQTT clients to connect to old Pods that are being terminated, resulting in frequent disconnections and reconnections.
  • During the rolling update process, only N - 1 Pods can provide services at any given time because it takes some time for new Pods to start up and become ready. This may lead to a decrease in service availability.

Solution

EMQX Operator performs blue-green deployment by default. When an EMQX cluster is updated through the corresponding EMQX CR, EMQX Operator initiates an upgrade.

The entire upgrade process is roughly divided into the following steps:

  1. Create a set of new EMQX nodes with updated specifications.
  2. Redirect the Service resources to the new set of nodes once they are ready, ensuring that no new connections are routed to the old set.
  3. Safely migrate existing MQTT connections from the old set of nodes to the new set of nodes at a controlled rate to avoid reconnect storms.
  4. Gradually scale down the old set of EMQX nodes.
  5. Complete the upgrade.
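
Because step 3 evicts connections at a fixed rate, the time a single old node spends evacuating can be roughly estimated in advance. The following is only a back-of-the-envelope sketch, not something EMQX Operator provides: the function name is hypothetical, and it assumes a steady eviction rate, ignores session eviction, and adds a fixed wait before the Pod is deleted. The rate and wait arguments correspond to the connEvictRate and waitTakeover settings used in the procedure.

```shell
# Hypothetical helper: rough estimate, in seconds, of one node's evacuation.
# Assumptions: connections are evicted at a steady rate (connections per
# second), session eviction is ignored, and a fixed wait follows eviction.
estimate_node_evacuation_seconds() {
  conns=$1
  rate=$2
  wait_takeover=$3
  # Integer ceiling of conns / rate.
  evict=$(( (conns + rate - 1) / rate ))
  echo $(( evict + wait_takeover ))
}

# Example: 1000 connections on a node, 1000 evictions per second, 10 s wait.
estimate_node_evacuation_seconds 1000 1000 10   # prints 11
```

Real upgrades will usually take longer, since the new nodes must also become ready and clients need time to reconnect.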

Procedure

Configure the Update Strategy

  1. Create an apps.emqx.io/v2beta1 EMQX CR and configure the update strategy.
yaml
apiVersion: apps.emqx.io/v2beta1
kind: EMQX
metadata:
  name: emqx-ee
spec:
  image: emqx/emqx:6.0.1
  config:
    data: |
      license {
        key = "..."
      }
  updateStrategy:
    evacuationStrategy:
      # MQTT client evacuation rate, connections per second:
      connEvictRate: 1000
      # MQTT Session evacuation rate, sessions per second:
      sessEvictRate: 1000
      # Time to wait before deleting a Pod:
      waitTakeover: 10
    # Time to wait before starting the upgrade once all nodes are ready:
    initialDelaySeconds: 10
    type: Recreate
  2. Save the above content as emqx-update.yaml and deploy it using kubectl apply:
bash
$ kubectl apply -f emqx-update.yaml
emqx.apps.emqx.io/emqx-ee created
  3. Check the status of the EMQX cluster.

Make sure that STATUS is Ready. This may take a while.

bash
$ kubectl get emqx
NAME      STATUS   AGE
emqx-ee   Ready    8m33s
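
Instead of re-running kubectl get emqx by hand, the check can be turned into a blocking wait. This is a minimal sketch, assuming the EMQX CR publishes a status condition named Ready (which is what the STATUS column appears to reflect); the helper name and the default timeout are illustrative, not part of EMQX Operator.

```shell
# Hypothetical helper: block until the EMQX CR reports Ready, or time out.
# Assumption: the CR exposes a status condition of type "Ready".
wait_for_emqx_ready() {
  name=$1
  timeout_seconds=${2:-300}
  kubectl wait --for=condition=Ready "emqx/$name" \
    --timeout="${timeout_seconds}s"
}

# Usage (requires access to the cluster):
#   wait_for_emqx_ready emqx-ee 600
```

kubectl wait exits with a non-zero status if the timeout elapses first, so the helper can be used directly in scripts.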

Connect to EMQX Cluster

MQTTX is an open-source, MQTT 5.0-compatible command-line client with automatic reconnection, designed to help develop and debug MQTT services and applications.

Use MQTTX to connect to the EMQX cluster:

bash
mqttx bench conn -h ${IP} -p ${PORT} -c 3000
[10:05:21 AM] › ℹ  Start the connect benchmarking, connections: 3000, req interval: 10ms
  success   [3000/3000] - Connected
[10:06:13 AM] › ℹ  Done, total time: 31.113s
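
The command above expects ${IP} and ${PORT} to be set already. One way to fill in the address is sketched below, assuming the MQTT listeners are exposed through a LoadBalancer Service named after the CR with a -listeners suffix; both the Service name and the helper name are assumptions to adapt to your cluster.

```shell
# Hypothetical helper: print the external IP of the listeners Service.
# Assumptions: the Service is called "<cr-name>-listeners" and is of type
# LoadBalancer with an IP already assigned; otherwise the output is empty.
emqx_listener_ip() {
  kubectl get svc "$1-listeners" \
    -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
}

# Usage (requires access to the cluster):
#   IP=$(emqx_listener_ip emqx-ee)
#   PORT=1883   # default MQTT TCP listener port
#   mqttx bench conn -h "$IP" -p "$PORT" -c 3000
```

If the Service is of another type, such as NodePort or ClusterIP, the address and port have to be determined differently.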

Trigger the Upgrade

  1. Any modification to the Pod template triggers the upgrade strategy of EMQX Operator.

In this example, we trigger the upgrade by changing the Pod's imagePullPolicy.

bash
$ kubectl patch emqx emqx-ee --type=merge -p '{"spec": {"imagePullPolicy": "Never"}}'
emqx.apps.emqx.io/emqx-ee patched
  2. Check the status of the upgrade process.
bash
$ kubectl get emqx emqx-ee -o json | jq ".status.nodeEvacuationsStatus"
[
  {
    "connection_eviction_rate": 200,
    "node": "emqx-ee@emqx-ee-54fc496fb4-2.emqx-ee-headless.default.svc.cluster.local",
    "session_eviction_rate": 200,
    "session_goal": 0,
    "connection_goal": 22,
    "session_recipients": [
      "emqx-ee@emqx-ee-5d87d4c6bd-2.emqx-ee-headless.default.svc.cluster.local",
      "emqx-ee@emqx-ee-5d87d4c6bd-1.emqx-ee-headless.default.svc.cluster.local",
      "emqx-ee@emqx-ee-5d87d4c6bd-0.emqx-ee-headless.default.svc.cluster.local"
    ],
    "state": "waiting_takeover",
    "stats": {
      "current_connected": 0,
      "current_sessions": 0,
      "initial_connected": 33,
      "initial_sessions": 0
    }
  }
]

| Field | Description |
| --- | --- |
| node | The node currently being evacuated. |
| state | Node evacuation phase. |
| session_recipients | MQTT session recipients. |
| session_eviction_rate | MQTT session eviction rate on this node (sessions per second). |
| connection_eviction_rate | MQTT connection eviction rate on this node (connections per second). |
| initial_sessions | Initial number of sessions on this node. |
| initial_connected | Initial number of connections on this node. |
| current_sessions | Current number of sessions on this node. |
| current_connected | Current number of connections on this node. |

  3. Wait for the upgrade to complete.
bash
$ kubectl get emqx
NAME      STATUS   AGE
emqx-ee   Ready    8m33s

Make sure that the STATUS is Ready. Depending on the number of MQTT clients and sessions, the upgrade process may take a while.
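
While waiting, the initial_connected and current_connected values from .status.nodeEvacuationsStatus can be turned into a per-node progress figure. The sketch below uses hypothetical helper names; the percentage is simply derived from those two fields and is not reported by EMQX Operator itself.

```shell
# Hypothetical helper: percentage of connections already moved off a node.
evacuation_progress() {
  initial=$1
  current=$2
  # A node that started with no connections has nothing left to evacuate.
  if [ "$initial" -eq 0 ]; then
    echo 100
    return
  fi
  echo $(( (initial - current) * 100 / initial ))
}

# Print one line per node that is still being evacuated.
# Requires kubectl and jq; missing counters are treated as 0.
show_evacuation_progress() {
  kubectl get emqx "$1" -o json |
    jq -r '.status.nodeEvacuationsStatus[]? |
      "\(.node) \(.state) \(.stats.initial_connected // 0) \(.stats.current_connected // 0)"' |
    while read -r node state initial current; do
      echo "$node $state $(evacuation_progress "$initial" "$current")%"
    done
}

# Usage (requires access to the cluster):
#   show_evacuation_progress emqx-ee
```

No output means that no node is currently being evacuated, which is expected both before the upgrade starts and after it finishes.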

After the upgrade is completed, you can verify that the old EMQX nodes have been deleted using kubectl get pods.
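
Since each set of nodes shares a name prefix and each Pod only adds an ordinal suffix, the check can be reduced to counting distinct prefixes. This is a small sketch with a hypothetical helper name, assuming Pod names of the form <prefix>-<ordinal> as seen in the node names above.

```shell
# Hypothetical helper: read Pod names on stdin and print the distinct
# node-set prefixes, i.e. each name with its trailing "-<ordinal>" removed.
pod_sets() {
  sed -E 's/-[0-9]+$//' | sort -u
}

# Usage (requires access to the cluster):
#   kubectl get pods -o name | grep '^pod/emqx-ee-' | pod_sets
```

Once the upgrade has finished, only the prefix of the new set should be printed; two prefixes indicate that old Pods are still present.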

Grafana Monitoring

The following monitoring graph shows the number of connections during the upgrade process, using 10,000 connections as an example.

| Label/Prefix | Description |
| --- | --- |
| Total | Total number of connections; shown as the top line in the graph. |
| emqx-ee-86f864f975 | Name prefix for the set of 3 old EMQX nodes. |
| emqx-ee-648c45c747 | Name prefix for the set of 3 upgraded EMQX nodes. |

This timeline illustrates how EMQX Operator performs a smooth blue-green upgrade. Throughout the process, the total number of connections remained stable (subject to factors such as migration rate, server capacity, and client reconnection strategy). This approach ensures minimal disruption, prevents server overload, and enhances overall service stability.