Optimize leader transfer in rolling updates
NebulaGraph clusters use a distributed architecture that divides data into multiple logical partitions, which are typically distributed evenly across nodes. Each partition has multiple replicas, and NebulaGraph uses the Raft protocol to keep those replicas consistent. Under Raft, each partition elects a leader replica, which handles client requests for that partition, while follower replicas replicate the leader's data.
When a NebulaGraph cluster created by NebulaGraph Operator performs a rolling update, each storage node temporarily stops serving requests while it is updated. For an overview of rolling updates, see Performing a Rolling Update. If a node hosting a leader replica stops serving requests, read and write operations for that partition become unavailable. To avoid this, Operator by default migrates the leader replicas to other, unaffected nodes during the rolling update. While a storage node is being updated, the leader replicas on the other nodes continue to process client requests, which preserves the read and write availability of the cluster.
Migrating all leader replicas from one storage node to the other nodes can take a long time. To give you more control over the duration of a rolling update, Operator provides the enableForceUpdate field. Once you have confirmed that there is no external access traffic, you can set this field to true. The leader replicas are then not migrated to other nodes, which speeds up the rolling update.
Rolling update trigger conditions
Operator triggers a rolling update of the NebulaGraph cluster under the following circumstances:
- The version of the NebulaGraph cluster changes.
- The configuration of the NebulaGraph cluster changes.
- NebulaGraph cluster services are restarted.
Specify a rolling update strategy
In the YAML file for creating a cluster instance, add the spec.storaged.enableForceUpdate field and set it to true or false to control the rolling update speed.
When enableForceUpdate is set to true, the partition leader replicas are not migrated, which speeds up the rolling update. Conversely, when it is set to false, the leader replicas are migrated to other nodes to ensure the read and write availability of the cluster. The default value is false.
When setting enableForceUpdate to true, make sure there is no traffic entering the cluster for read and write operations. This setting forces the cluster pods to be rebuilt, and data loss or client request failures may occur during that process.
...
spec:
...
  storaged:
    # When set to true,
    # it means that the partition leader replicas will not be migrated,
    # but the cluster pods will be rebuilt directly.
    enableForceUpdate: true
    ...
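For a cluster that keeps serving traffic during the update, the counterpart configuration leaves leader migration enabled. The fragment below is a minimal sketch of that case, using the same elided layout as the example above. Writing out false is optional here, on the assumption that it only restates Operator's default behavior of migrating leader replicas.

```yaml
...
spec:
...
  storaged:
    # Assumed to match the default behavior: leader replicas are migrated
    # to other storage nodes before each pod is updated, so the rolling
    # update is slower but reads and writes stay available.
    enableForceUpdate: false
    ...
```

This variant trades a longer rolling update for availability, so it suits clusters where client traffic cannot be paused.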