
Restart service Pods in a NebulaGraph cluster on K8s

Note

Restarting service Pods in a NebulaGraph cluster is an Alpha feature.

During routine maintenance, you may need to restart a specific service Pod in the NebulaGraph cluster, for example, when a Pod is in an abnormal state or a forced restart is required. Restarting a Pod essentially restarts the service process. To ensure high availability, NebulaGraph Operator supports gracefully restarting all Pods of the Graph, Meta, or Storage service, as well as gracefully restarting a single Storage service Pod.

Prerequisites

A NebulaGraph cluster is created in a K8s environment. For details, see Create a NebulaGraph cluster.

Restart all Pods of a certain service type

To gracefully roll restart all Pods of a certain service type in the cluster, add an annotation (nebula-graph.io/restart-timestamp) whose value is the current timestamp to the StatefulSet controller of the corresponding service.

When NebulaGraph Operator detects that the StatefulSet controller of the corresponding service carries the nebula-graph.io/restart-timestamp annotation and its value has changed, it triggers a graceful rolling restart of all Pods of that service type in the cluster.
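For convenience, the two steps below (reading the current time and writing the annotation) can be combined into a single command. This one-liner is a minimal sketch that assumes the Graph service's StatefulSet is named nebula-graphd, as in the example that follows:

    kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="$(date -u +%s)" --overwrite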

In the following example, the annotation is added to the Graph service so that all Graph service Pods are restarted one by one.

Assume that the cluster name is nebula and the cluster resources are in the default namespace. Run the following commands:

  1. Check the name of the StatefulSet controller.

    kubectl get statefulset 
    

    Example output:

    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    
  2. Get the current timestamp.

    date -u +%s
    

    Example output:

    1700547115
    
  3. Overwrite the timestamp annotation of the StatefulSet controller to trigger the graceful rolling restart operation.

    kubectl annotate statefulset nebula-graphd nebula-graph.io/restart-timestamp="1700547115" --overwrite
    

    Example output:

    statefulset.apps/nebula-graphd annotated
    
  4. Observe the restart process.

    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=graphd -w
    

    Example output:

    NAME              READY   STATUS    RESTARTS   AGE
    nebula-graphd-0   1/1     Running   0          9m37s
    nebula-graphd-1   0/1     Running   0          17s
    nebula-graphd-1   1/1     Running   0          20s
    nebula-graphd-0   1/1     Terminating   0          9m40s
    nebula-graphd-0   0/1     Terminating   0          9m41s
    nebula-graphd-0   0/1     Terminating   0          9m42s
    nebula-graphd-0   0/1     Terminating   0          9m42s
    nebula-graphd-0   0/1     Terminating   0          9m42s
    nebula-graphd-0   0/1     Pending       0          0s
    nebula-graphd-0   0/1     Pending       0          0s
    nebula-graphd-0   0/1     ContainerCreating   0          0s
    nebula-graphd-0   0/1     Running             0          2s
    

    The above output shows the status of the Graph service Pods during the restart process.

  5. Verify that the StatefulSet controller annotation is updated.

    kubectl get statefulset nebula-graphd -o yaml | grep "nebula-graph.io/restart-timestamp"
    

    Example output:

    nebula-graph.io/last-applied-configuration: '{"persistentVolumeClaimRetentionPolicy":{"whenDeleted":"Retain","whenScaled":"Retain"},"podManagementPolicy":"Parallel","replicas":2,"revisionHistoryLimit":10,"selector":{"matchLabels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"serviceName":"nebula-graphd-headless","template":{"metadata":{"annotations":{"nebula-graph.io/cm-hash":"7c55c0e5ac74e85f","nebula-graph.io/restart-timestamp":"1700547815"},"creationTimestamp":null,"labels":{"app.kubernetes.io/cluster":"nebula","app.kubernetes.io/component":"graphd","app.kubernetes.io/managed-by":"nebula-operator","app.kubernetes.io/name":"nebula-graph"}},"spec":{"containers":[{"command":["/bin/sh","-ecx","exec
    nebula-graph.io/restart-timestamp: "1700547115"
        nebula-graph.io/restart-timestamp: "1700547115"
    

The above output indicates that the annotation of the StatefulSet controller has been updated and all Graph service Pods have been restarted.
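If you only need the annotation value, kubectl's jsonpath output is a tidier alternative to grepping the full YAML; and because the operator propagates the annotation into the Pod template, kubectl rollout status should also report when the rolling restart has finished. Both commands below are sketches that assume the nebula-graphd StatefulSet from this example:

    # Print only the restart-timestamp annotation (dots in the key are escaped).
    kubectl get statefulset nebula-graphd -o jsonpath='{.metadata.annotations.nebula-graph\.io/restart-timestamp}'

    # Block until the rolling restart completes for all replicas.
    kubectl rollout status statefulset nebula-graphd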

Restart a single Storage service Pod

To gracefully restart a single Storage service Pod, add an annotation (nebula-graph.io/restart-ordinal) to the Storage service's StatefulSet, with the value set to the ordinal number of the Pod you want to restart. This triggers a graceful restart or state transition for that specific Storage service Pod. The annotation is automatically removed after the Pod is restarted.
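Because the annotation is removed automatically once the Pod restart completes, its absence can serve as a completion check. As a minimal sketch, assuming the Storage service's StatefulSet is named nebula-storaged as in the example below, the following command prints the ordinal while the restart is still pending and prints nothing once it has finished:

    kubectl get statefulset nebula-storaged -o jsonpath='{.metadata.annotations.nebula-graph\.io/restart-ordinal}'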

In the following example, the annotation is added for the Pod with ordinal number 1, indicating a graceful restart for the nebula-storaged-1 Storage service Pod.

Assume that the cluster name is nebula, and the cluster resources are in the default namespace. Run the following commands:

  1. Check the name of the StatefulSet controller.

    kubectl get statefulset 
    

    Example output:

    NAME              READY   AGE
    nebula-graphd     2/2     33s
    nebula-metad      3/3     69s
    nebula-storaged   3/3     69s
    
  2. Get the ordinal number of the Storage service Pod.

    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged
    

    Example output:

    NAME                READY   STATUS    RESTARTS   AGE
    nebula-storaged-0   1/1     Running   0          13h
    nebula-storaged-1   1/1     Running   0          13h
    nebula-storaged-2   1/1     Running   0          13h
    nebula-storaged-3   1/1     Running   0          13h
    nebula-storaged-4   1/1     Running   0          13h
    nebula-storaged-5   1/1     Running   0          13h
    nebula-storaged-6   1/1     Running   0          13h
    nebula-storaged-7   1/1     Running   0          13h
    nebula-storaged-8   1/1     Running   0          13h
    
  3. Add the annotation to the nebula-storaged StatefulSet to trigger a graceful restart of the nebula-storaged-1 Pod.

    kubectl annotate statefulset nebula-storaged nebula-graph.io/restart-ordinal="1" 
    

    Example output:

    statefulset.apps/nebula-storaged annotated
    
  4. Observe the restart process.

    kubectl get pods -l app.kubernetes.io/cluster=nebula,app.kubernetes.io/component=storaged -w
    

    Example output:

    NAME                READY   STATUS    RESTARTS   AGE
    nebula-storaged-0   1/1     Running   0          13h
    nebula-storaged-1   1/1     Running   0          13h
    nebula-storaged-2   1/1     Running   0          13h
    nebula-storaged-3   1/1     Running   0          13h
    nebula-storaged-4   1/1     Running   0          13h
    nebula-storaged-5   1/1     Running   0          12h
    nebula-storaged-6   1/1     Running   0          12h
    nebula-storaged-7   1/1     Running   0          12h
    nebula-storaged-8   1/1     Running   0          12h
    
    
    nebula-storaged-1   1/1     Running   0          13h
    nebula-storaged-1   1/1     Terminating   0          13h
    nebula-storaged-1   0/1     Terminating   0          13h
    nebula-storaged-1   0/1     Terminating   0          13h
    nebula-storaged-1   0/1     Terminating   0          13h
    nebula-storaged-1   0/1     Terminating   0          13h
    nebula-storaged-1   0/1     Pending       0          0s
    nebula-storaged-1   0/1     Pending       0          0s
    nebula-storaged-1   0/1     ContainerCreating   0          0s
    nebula-storaged-1   0/1     Running             0          1s
    nebula-storaged-1   1/1     Running             0          10s  
    

    The above output indicates that the nebula-storaged-1 Storage service Pod is successfully restarted.

    After restarting a single Storage service Pod, the distribution of storage leader replicas may become unbalanced. You can execute the BALANCE LEADER command to rebalance the distribution of leader replicas. For information about how to view the leader distribution, see SHOW HOSTS.
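
    BALANCE LEADER is an nGQL statement executed from a NebulaGraph console session. The following is a minimal sketch, assuming a graph space named basketballplayer (a hypothetical space name used here only for illustration):

    nebula> USE basketballplayer;
    nebula> BALANCE LEADER;
    nebula> SHOW HOSTS;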

