Upgrade the EMQX cluster elegantly through blue-green deployment.
Task objective
How to gracefully upgrade the EMQX cluster through blue-green deployment
TIP
This feature only supports apps.emqx.io/v1beta4 EmqxEnterprise and apps.emqx.io/v2beta1 EMQX.
Background
In traditional EMQX cluster deployment, the default rolling upgrade strategy of StatefulSet is usually used to update EMQX Pods. However, this approach has the following two problems:
During the rolling update, both new and old Pods are selected by the corresponding Service. This may cause MQTT clients to connect to the wrong Pod, resulting in frequent disconnections and reconnections.
During the rolling update process, only N - 1 Pods can provide services because it takes some time for new Pods to start up and become ready. This may lead to a decrease in service availability.
Solution
To address the rolling update problems described above, EMQX Operator provides a blue-green deployment upgrade solution. When an EMQX cluster is upgraded through the EMQX custom resource, EMQX Operator creates a new EMQX cluster and, once it is ready, redirects the Kubernetes Service to it. It then gradually deletes Pods from the old EMQX cluster to complete the update.
When deleting Pods from the old EMQX cluster, EMQX Operator can also use the EMQX node evacuation feature to transfer MQTT connections to the new cluster at a configured rate, which avoids a burst of reconnections arriving at the new cluster within a short period.
The entire upgrade process can be roughly divided into the following steps:
Create a cluster with the same specifications.
After the new cluster is ready, redirect the service to the new cluster and remove the old cluster from the service. At this time, the new cluster starts to receive traffic, and existing connections in the old cluster are not affected.
(Only supported by EMQX Enterprise Edition) Use the EMQX node evacuation feature to evacuate connections from the old nodes, one node at a time.
Gradually scale down the old cluster to 0 nodes.
Complete the upgrade.
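To get a rough feel for the node evacuation step, the time needed to drain the old cluster can be estimated from the per-node connection counts and the eviction rate. The sketch below is a back-of-the-envelope calculation, not EMQX Operator logic: the function name and the `wait_takeover_s` parameter are hypothetical, nodes are assumed to be evacuated strictly one after another, and session evacuation and client reconnect behavior are ignored.

```python
import math


def estimate_evacuation_seconds(connections_per_node, conn_evict_rate, wait_takeover_s=0):
    """Rough lower bound on the time to evacuate every old node in turn.

    connections_per_node: connection count on each old node.
    conn_evict_rate: connections evicted per second (count/second).
    wait_takeover_s: hypothetical fixed pause added per node (assumption).
    """
    if conn_evict_rate <= 0:
        raise ValueError("conn_evict_rate must be positive")
    total = 0
    for connections in connections_per_node:
        # Nodes are drained one by one, so the per-node times add up.
        total += math.ceil(connections / conn_evict_rate) + wait_takeover_s
    return total


if __name__ == "__main__":
    # 10,000 connections spread over 3 nodes, evicted at 200 connections/second.
    print(estimate_evacuation_seconds([3334, 3333, 3333], 200))
```

A lower eviction rate lengthens the upgrade but spreads reconnections over more time, so the rate is a trade-off between upgrade duration and load on the new cluster.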
How to update the EMQX cluster through blue-green deployment.
Configuration update strategy
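The update strategy is set on the EMQX custom resource. As a minimal sketch, the snippet below builds a merge-patch body in Python and prints it as JSON that could be passed to `kubectl patch emqx emqx-ee --type=merge -p`. The field names (`updateStrategy`, `type`, `initialDelaySeconds`, `evacuationStrategy`, `connEvictRate`, `sessEvictRate`, `waitTakeover`) and the values are assumptions modeled on the apps.emqx.io/v2beta1 CRD; check them against the CRD reference for the installed EMQX Operator version before applying anything.

```python
import json


def build_update_strategy_patch(conn_evict_rate=200, sess_evict_rate=200,
                                wait_takeover=10, initial_delay_seconds=10):
    """Return a merge-patch dict for the EMQX custom resource.

    All field names are assumptions based on the apps.emqx.io/v2beta1 CRD.
    """
    return {
        "spec": {
            "updateStrategy": {
                "type": "Recreate",
                "initialDelaySeconds": initial_delay_seconds,
                "evacuationStrategy": {
                    # Connections and sessions evicted per second.
                    "connEvictRate": conn_evict_rate,
                    "sessEvictRate": sess_evict_rate,
                    # Seconds to wait for clients to reconnect and take over sessions.
                    "waitTakeover": wait_takeover,
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_update_strategy_patch()))
```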
Connect to EMQX cluster using MQTT X CLI.
MQTT X CLI is an open-source MQTT 5.0 command-line client that supports automatic reconnection. It is the pure command-line version of MQTT X and is intended to help develop and debug MQTT services and applications faster without a graphical interface. For documentation about MQTT X CLI, please refer to: MQTTX CLI.
Execute the following command to connect to the EMQX cluster:
mqttx bench conn -h ${IP} -p ${PORT} -c 3000
Output is similar to:
[10:05:21 AM] › ℹ Start the connect benchmarking, connections: 3000, req interval: 10ms
✔ success [3000/3000] - Connected
[10:06:13 AM] › ℹ Done, total time: 31.113s
Upgrade EMQX cluster.
Any modifications made to the Pod Template will trigger the upgrade strategy of EMQX Operator.
In this article, we trigger the upgrade by modifying the Container ImagePullPolicy. Users can modify it according to their actual needs.
$ kubectl patch emqx emqx-ee --type=merge -p '{"spec": {"imagePullPolicy": "Never"}}'
emqx.apps.emqx.io/emqx-ee patched
Check status.
$ kubectl get emqx emqx-ee -o json | jq ".status.nodeEvacuationsStatus"
[
  {
    "connection_eviction_rate": 200,
    "node": "emqx-ee@emqx-ee-54fc496fb4-2.emqx-ee-headless.default.svc.cluster.local",
    "session_eviction_rate": 200,
    "session_goal": 0,
    "connection_goal": 22,
    "session_recipients": [
      "emqx-ee@emqx-ee-5d87d4c6bd-2.emqx-ee-headless.default.svc.cluster.local",
      "emqx-ee@emqx-ee-5d87d4c6bd-1.emqx-ee-headless.default.svc.cluster.local",
      "emqx-ee@emqx-ee-5d87d4c6bd-0.emqx-ee-headless.default.svc.cluster.local"
    ],
    "state": "waiting_takeover",
    "stats": {
      "current_connected": 0,
      "current_sessions": 0,
      "initial_connected": 33,
      "initial_sessions": 0
    }
  }
]
connection_eviction_rate: Connection evacuation rate of the node (unit: count/second).
node: The node currently being evacuated.
session_eviction_rate: Session evacuation rate of the node (unit: count/second).
session_recipients: List of nodes that receive the evacuated sessions.
state: Node evacuation phase.
stats: Statistics for the node being evacuated, including the current number of connections (current_connected), the current number of sessions (current_sessions), the initial number of connections (initial_connected), and the initial number of sessions (initial_sessions).
Wait for the upgrade to complete.
$ kubectl get emqx
NAME      STATUS   AGE
emqx-ee   Ready    8m33s
Make sure that the STATUS is Ready. The EMQX cluster needs some time to complete the upgrade, so you may have to wait.
After the upgrade is completed, you can run $ kubectl get pods to confirm that the old EMQX nodes have been deleted.
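The evacuation status returned in the "Check status" step can also be summarized from a script while the upgrade is running. The following Python sketch takes the JSON array printed by the `jq` command and reports, per node, how many of its initial connections have already left. The function name and the shape of the summary are hypothetical; only the field names (`node`, `state`, `stats`, `initial_connected`, `current_connected`) come from the sample output in this article.

```python
import json


def summarize_evacuations(status_json):
    """Return one summary dict per node in a node evacuation status JSON array."""
    # jq prints "null" when no evacuation is in progress; treat that as empty.
    entries = json.loads(status_json) or []
    summaries = []
    for entry in entries:
        stats = entry.get("stats", {})
        initial = stats.get("initial_connected", 0)
        current = stats.get("current_connected", 0)
        # Fraction of the node's initial connections that have already left it.
        progress = 1.0 if initial == 0 else (initial - current) / initial
        summaries.append({
            "node": entry.get("node"),
            "state": entry.get("state"),
            "evacuated_connections": initial - current,
            "progress": progress,
        })
    return summaries


if __name__ == "__main__":
    sample = """[{"node": "emqx-ee@emqx-ee-54fc496fb4-2.emqx-ee-headless.default.svc.cluster.local",
                  "state": "waiting_takeover",
                  "stats": {"current_connected": 0, "current_sessions": 0,
                            "initial_connected": 33, "initial_sessions": 0}}]"""
    for summary in summarize_evacuations(sample):
        print(summary)
```

The input could be produced by piping the `kubectl ... | jq` output from the status check into the script; how it is wired up is left to the reader.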
Grafana Monitoring
The monitoring graph of the number of connections during the upgrade process is shown below (using 10,000 connections as an example).
Total: Total number of connections, represented by the top line in the graph.
emqx-ee-86f864f975: This prefix represents the 3 EMQX nodes before the upgrade.
emqx-ee-648c45c747: This prefix represents the 3 EMQX nodes after the upgrade.
As shown in the figure above, the blue-green deployment of EMQX Kubernetes Operator provides a graceful upgrade in Kubernetes. With this solution, the total number of connections did not fluctuate significantly during the upgrade (the exact behavior depends on the migration rate, the rate at which the server accepts connections, the client reconnection policy, and so on). This keeps the upgrade smooth, helps prevent server overload, reduces the impact noticed by the business, and improves the stability of the service.