Restart service Pods in a NebulaGraph cluster on K8s¶

Note

Restarting NebulaGraph cluster service Pods is a feature in the Alpha version.

During routine maintenance, it might be necessary to restart a specific service Pod in the NebulaGraph cluster, for instance, when the Pod's status is abnormal or to enforce a restart. Restarting a Pod essentially means restarting the service process. To ensure high availability, NebulaGraph Operator supports gracefully restarting all Pods of the Graph, Meta, or Storage service respectively and gracefully restarting an individual Pod of the Storage service.

Prerequisites¶

A NebulaGraph cluster is created in a K8s environment. For details, see Create a NebulaGraph cluster.

Restart all Pods of a certain service type¶

To gracefully roll restart all Pods of a certain service type in the cluster, you can add an annotation (nebula-graph.io/restart-timestamp) with the current time to the configuration of the StatefulSet controller of the corresponding service.

When NebulaGraph Operator detects that the StatefulSet controller of the corresponding service has the annotation nebula-graph.io/restart-timestamp and its value is changed, it triggers the graceful rolling restart operation for all Pods of that service type in the cluster.

In the following example, the annotation is added for all Graph services so that all Pods of these Graph services are restarted one by one.

Assume that the cluster name is nebula and the cluster resources are in the default namespace. Run the following command:

Check the name of the StatefulSet controller.

kubectl get statefulset

Sample output:

NAME              READY   AGE
nebula-graphd     2/2     33s
nebula-metad      3/3     69s
nebula-storaged   3/3     69s

Get the current timestamp.
```
date -u +%s
```
Example output:
```
1700547115
```

Overwrite the timestamp annotation of the StatefulSet controller to trigger the graceful rolling restart operation.

kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite

Example output:

statefulset.apps/nebula-graphd annotate

Observe the restart process.

kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w

Example output:

NAME              READY   STATUS    RESTARTS   AGE
nebula-graphd-0   1/1     Running   0          9m37s
nebula-graphd-1   0/1     Running   0          17s
nebula-graphd-1   1/1     Running   0          20s
nebula-graphd-0   1/1     Terminating   0          9m40s
nebula-graphd-0   0/1     Terminating   0          9m41s
nebula-graphd-0   0/1     Terminating   0          9m42s
nebula-graphd-0   0/1     Terminating   0          9m42s
nebula-graphd-0   0/1     Terminating   0          9m42s
nebula-graphd-0   0/1     Pending       0          0s
nebula-graphd-0   0/1     Pending       0          0s
nebula-graphd-0   0/1     ContainerCreating   0          0s
nebula-graphd-0   0/1     Running             0          2s

This above output shows the status of Graph service Pods during the restart process.

Verify that the StatefulSet controller annotation is updated.

kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"

Example output:

nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
nebula-graph.io/restart-timestamp: "1700547115"
    nebula-graph.io/restart-timestamp: "1700547815"

The above output indicates that the annotation of the StatefulSet controller is updated, and all graph service Pods are restarted.

Restart a single Storage service Pod¶

To gracefully roll restart a single Storage service Pod, you can add an annotation (nebula-graph.io/restart-ordinal) with the value set to the ordinal number of the Storage service Pod you want to restart. This triggers a graceful restart or state transition for that specific Storage service Pod. The added annotation will be automatically removed after the Storage service Pod is restarted.

In the following example, the annotation is added for the Pod with ordinal number 1, indicating a graceful restart for the nebula-storaged-1 Storage service Pod.

Assume that the cluster name is nebula, and the cluster resources are in the default namespace. Run the following commands:

Check the name of the StatefulSet controller.

kubectl get statefulset

Example output:

NAME              READY   AGE
nebula-graphd     2/2     33s
nebula-metad      3/3     69s
nebula-storaged   3/3     69s

Get the ordinal number of the Storage service Pod.

kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged

Example output:

NAME                READY   STATUS    RESTARTS   AGE
nebula-storaged-0   1/1     Running   0          13h
nebula-storaged-1   1/1     Running   0          13h
nebula-storaged-2   1/1     Running   0          13h
nebula-storaged-3   1/1     Running   0          13h
nebula-storaged-4   1/1     Running   0          13h
nebula-storaged-5   1/1     Running   0          13h
nebula-storaged-6   1/1     Running   0          13h
nebula-storaged-7   1/1     Running   0          13h
nebula-storaged-8   1/1     Running   0          13h

Add the annotation for the nebula-storaged-1 Pod to trigger a graceful restart for that specific Pod.

kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1"

Example output:

statefulset.apps/nebula-storaged annotate

Observe the restart process.

kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w

Example output:

NAME                READY   STATUS    RESTARTS   AGE
nebula-storaged-0   1/1     Running   0          13h
nebula-storaged-1   1/1     Running   0          13h
nebula-storaged-2   1/1     Running   0          13h
nebula-storaged-3   1/1     Running   0          13h
nebula-storaged-4   1/1     Running   0          13h
nebula-storaged-5   1/1     Running   0          12h
nebula-storaged-6   1/1     Running   0          12h
nebula-storaged-7   1/1     Running   0          12h
nebula-storaged-8   1/1     Running   0          12h


nebula-storaged-1   1/1     Running   0          13h
nebula-storaged-1   1/1     Terminating   0          13h
nebula-storaged-1   0/1     Terminating   0          13h
nebula-storaged-1   0/1     Terminating   0          13h
nebula-storaged-1   0/1     Terminating   0          13h
nebula-storaged-1   0/1     Terminating   0          13h
nebula-storaged-1   0/1     Pending       0          0s
nebula-storaged-1   0/1     Pending       0          0s
nebula-storaged-1   0/1     ContainerCreating   0          0s
nebula-storaged-1   0/1     Running             0          1s
nebula-storaged-1   1/1     Running             0          10s

The above output indicates that the nebula-storaged-1 Storage service Pod has been successfully restarted.

After restarting a single Storage service Pod, the distribution of storage leader replicas may become unbalanced. You can execute the BALANCE LEADER command to rebalance the distribution of leader replicas. For information about how to view the leader distribution, see SHOW HOSTS.

Last update: March 6, 2024