Restart service Pods in a NebulaGraph cluster on K8s
Note
Restarting NebulaGraph cluster service Pods is a feature in the Alpha version.
During routine maintenance, you might need to restart a specific service Pod in the NebulaGraph cluster, for example when the Pod is in an abnormal state or when a restart must be forced. Restarting a Pod essentially means restarting the service process. To ensure high availability, NebulaGraph Operator supports gracefully restarting all Pods of the Graph, Meta, or Storage service, and gracefully restarting an individual Pod of the Storage service.
Prerequisites
A NebulaGraph cluster is created in a K8s environment. For details, see Create a NebulaGraph cluster.
Restart all Pods of a certain service type
To perform a graceful rolling restart of all Pods of a certain service type in the cluster, add the annotation nebula-graph.io/restart-timestamp, set to the current time, to the configuration of the StatefulSet controller of the corresponding service.
When NebulaGraph Operator detects that the StatefulSet controller of the corresponding service has the annotation nebula-graph.io/restart-timestamp and that its value has changed, it triggers a graceful rolling restart of all Pods of that service type in the cluster.
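The mechanism described above can be wrapped in a small shell helper. This is a minimal sketch, not part of NebulaGraph Operator: the function name restart_service and the ANNOTATION_KEY variable are hypothetical, and the helper assumes kubectl is already configured for the target cluster. It only combines the timestamp and annotate commands used in this topic.

```shell
#!/bin/sh
# Sketch: trigger a graceful rolling restart of one NebulaGraph service.
# restart_service is a hypothetical helper name, not an Operator command.

ANNOTATION_KEY="nebula-graph.io/restart-timestamp"

# Usage: restart_service <statefulset-name> [namespace]
restart_service() {
    sts_name="$1"
    namespace="${2:-default}"

    if [ -z "$sts_name" ]; then
        echo "usage: restart_service <statefulset-name> [namespace]" >&2
        return 1
    fi

    # Current UTC time as seconds since the epoch.
    ts="$(date -u +%s)"

    # A changed annotation value is what the Operator reacts to,
    # so --overwrite is needed when the annotation already exists.
    kubectl annotate statefulset "$sts_name" \
        "${ANNOTATION_KEY}=${ts}" --overwrite -n "$namespace"
}
```

For example, `restart_service nebula-graphd` would annotate the Graph service StatefulSet in the default namespace. Because the value is a fresh timestamp on each call, every invocation should count as a change.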
In the following example, the annotation is added to the StatefulSet controller of the Graph service so that all Graph service Pods are restarted one by one.
Assume that the cluster name is nebula and the cluster resources are in the default namespace. Run the following commands:
-
Check the name of the StatefulSet controller.
kubectl get statefulset
Example output:
NAME              READY   AGE
nebula-graphd     2/2     33s
nebula-metad      3/3     69s
nebula-storaged   3/3     69s
-
Get the current timestamp.
date -u +%s
Example output:
1700547115
-
Overwrite the timestamp annotation of the StatefulSet controller to trigger the graceful rolling restart operation.
kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite
Example output:
statefulset.apps/nebula-graphd annotated
-
Observe the restart process.
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w
Example output:
NAME              READY   STATUS              RESTARTS   AGE
nebula-graphd-0   1/1     Running             0          9m37s
nebula-graphd-1   0/1     Running             0          17s
nebula-graphd-1   1/1     Running             0          20s
nebula-graphd-0   1/1     Terminating         0          9m40s
nebula-graphd-0   0/1     Terminating         0          9m41s
nebula-graphd-0   0/1     Terminating         0          9m42s
nebula-graphd-0   0/1     Terminating         0          9m42s
nebula-graphd-0   0/1     Terminating         0          9m42s
nebula-graphd-0   0/1     Pending             0          0s
nebula-graphd-0   0/1     Pending             0          0s
nebula-graphd-0   0/1     ContainerCreating   0          0s
nebula-graphd-0   0/1     Running             0          2s
The above output shows the status of the Graph service Pods during the restart process.
-
Verify that the StatefulSet controller annotation is updated.
kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"
Example output:
nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
nebula-graph.io/restart-timestamp: "1700547115"
nebula-graph.io/restart-timestamp: "1700547815"
The above output indicates that the annotation of the StatefulSet controller is updated, and all Graph service Pods are restarted.
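Watching the Pod list by eye works for a manual restart. For scripted use, the same information can be checked mechanically. The following is a minimal sketch under stated assumptions: all_pods_ready is a hypothetical helper that reads rows in the `kubectl get pods` column layout shown above (NAME, READY, STATUS, ...) from standard input and succeeds only when every row is Running with all containers ready.

```shell
#!/bin/sh
# Sketch: decide from `kubectl get pods` rows whether every Pod is ready.
# all_pods_ready is a hypothetical helper name.
# Input rows are expected as: NAME READY STATUS RESTARTS AGE
all_pods_ready() {
    awk '
        BEGIN { rows = 0; bad = 0 }
        NF == 0      { next }   # ignore blank lines
        $1 == "NAME" { next }   # ignore a header row if present
        {
            rows++
            # READY looks like "1/1": ready containers / total containers.
            split($2, ready, "/")
            if ($3 != "Running" || ready[1] != ready[2]) bad++
        }
        # Succeed only if at least one Pod was listed and none was unready.
        END { exit (rows > 0 && bad == 0) ? 0 : 1 }
    '
}
```

A possible invocation is `kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd --no-headers | all_pods_ready && echo "all Graph Pods are ready"`. Note that this check only reports the current state. Run immediately after adding the annotation, it may succeed before the Operator has started restarting any Pod, so it is best combined with a look at the Pod AGE column.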
Restart a single Storage service Pod
To gracefully restart a single Storage service Pod, add the annotation nebula-graph.io/restart-ordinal to the StatefulSet controller of the Storage service, with the value set to the ordinal number of the Storage service Pod you want to restart. This triggers a graceful restart or state transition for that specific Storage service Pod. The annotation is automatically removed after the Storage service Pod is restarted.
In the following example, the annotation is added with ordinal number 1, which triggers a graceful restart of the nebula-storaged-1 Storage service Pod.
Assume that the cluster name is nebula and the cluster resources are in the default namespace. Run the following commands:
-
Check the name of the StatefulSet controller.
kubectl get statefulset
Example output:
NAME              READY   AGE
nebula-graphd     2/2     33s
nebula-metad      3/3     69s
nebula-storaged   3/3     69s
-
Get the ordinal number of the Storage service Pod.
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged
Example output:
NAME                READY   STATUS    RESTARTS   AGE
nebula-storaged-0   1/1     Running   0          13h
nebula-storaged-1   1/1     Running   0          13h
nebula-storaged-2   1/1     Running   0          13h
nebula-storaged-3   1/1     Running   0          13h
nebula-storaged-4   1/1     Running   0          13h
nebula-storaged-5   1/1     Running   0          13h
nebula-storaged-6   1/1     Running   0          13h
nebula-storaged-7   1/1     Running   0          13h
nebula-storaged-8   1/1     Running   0          13h
-
Add the annotation for the nebula-storaged-1 Pod to trigger a graceful restart for that specific Pod.
kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"
Example output:
statefulset.apps/nebula-storaged annotated
-
Observe the restart process.
kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w
Example output:
NAME                READY   STATUS              RESTARTS   AGE
nebula-storaged-0   1/1     Running             0          13h
nebula-storaged-1   1/1     Running             0          13h
nebula-storaged-2   1/1     Running             0          13h
nebula-storaged-3   1/1     Running             0          13h
nebula-storaged-4   1/1     Running             0          13h
nebula-storaged-5   1/1     Running             0          12h
nebula-storaged-6   1/1     Running             0          12h
nebula-storaged-7   1/1     Running             0          12h
nebula-storaged-8   1/1     Running             0          12h
nebula-storaged-1   1/1     Running             0          13h
nebula-storaged-1   1/1     Terminating         0          13h
nebula-storaged-1   0/1     Terminating         0          13h
nebula-storaged-1   0/1     Terminating         0          13h
nebula-storaged-1   0/1     Terminating         0          13h
nebula-storaged-1   0/1     Terminating         0          13h
nebula-storaged-1   0/1     Pending             0          0s
nebula-storaged-1   0/1     Pending             0          0s
nebula-storaged-1   0/1     ContainerCreating   0          0s
nebula-storaged-1   0/1     Running             0          1s
nebula-storaged-1   1/1     Running             0          10s
The above output indicates that the nebula-storaged-1 Storage service Pod has been successfully restarted.
After restarting a single Storage service Pod, the distribution of storage leader replicas may become unbalanced. You can execute the BALANCE LEADER command to rebalance the distribution of leader replicas. For information about how to view the leader distribution, see SHOW HOSTS.
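The single-Pod procedure above can also be scripted. The following is a minimal sketch, assuming the StatefulSet of the Storage service is named <cluster-name>-storaged, as in the nebula-storaged example. The function name restart_storage_pod is hypothetical, and the ordinal check is an added safeguard rather than Operator behavior.

```shell
#!/bin/sh
# Sketch: request a graceful restart of one Storage service Pod.
# restart_storage_pod is a hypothetical helper name.

# Usage: restart_storage_pod <cluster-name> <ordinal> [namespace]
restart_storage_pod() {
    cluster="$1"
    ordinal="$2"
    namespace="${3:-default}"

    if [ -z "$cluster" ]; then
        echo "usage: restart_storage_pod <cluster-name> <ordinal> [namespace]" >&2
        return 1
    fi

    # Reject anything that is not a non-negative integer.
    case "$ordinal" in
        ''|*[!0-9]*)
            echo "ordinal must be a non-negative integer" >&2
            return 1
            ;;
    esac

    # Assumption: the Storage StatefulSet is named <cluster-name>-storaged.
    kubectl annotate statefulset "${cluster}-storaged" \
        "nebula-graph.io/restart-ordinal=${ordinal}" -n "$namespace"
}
```

For example, `restart_storage_pod nebula 1` issues the same annotate command as the manual step for nebula-storaged-1. The helper does not check that the ordinal is below the replica count, and it leaves leader rebalancing to a separate BALANCE LEADER run.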