Self-healing

Nebula Operator calls the interface provided by NebulaGraph clusters to monitor the status of cluster services. When it detects an exception (for example, a component in a NebulaGraph cluster stops running), Nebula Operator automatically handles the fault. This topic shows how Nebula Operator performs self-healing by simulating a cluster failure: deleting one Storage service Pod in a NebulaGraph cluster.

Prerequisites

Install Nebula Operator

Steps

  1. Create a NebulaGraph cluster. For more information, see Deploy NebulaGraph clusters with Kubectl or Deploy NebulaGraph clusters with Helm.

  2. After all Pods are in the Running status, delete the Pod named <cluster-name>-storaged-2.

    kubectl delete pod <cluster-name>-storaged-2 --now
    
    <cluster-name> is the name of your NebulaGraph cluster.

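  3. Nebula Operator automatically recreates the Pod named <cluster-name>-storaged-2 to perform self-healing.

    Run the kubectl get pods command to check the status of the Pod <cluster-name>-storaged-2. The following sample outputs, taken shortly after the deletion and a few minutes later, are from a cluster named nebula-cluster.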

    ...
    nebula-cluster-storaged-1        1/1     Running             0          5d23h
    nebula-cluster-storaged-2        0/1     ContainerCreating   0          1s
    ...
    

    ...
    nebula-cluster-storaged-1        1/1     Running     0          5d23h
    nebula-cluster-storaged-2        1/1     Running     0          4m2s
    ...
    
    When the status of <cluster-name>-storaged-2 changes from ContainerCreating to Running, self-healing has completed successfully.
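
The steps above can also be scripted instead of polling kubectl get pods by hand. The following is a minimal sketch, not part of Nebula Operator: the function names and the CLUSTER_NAME and KUBECTL variables are illustrative assumptions, and the cluster name defaults to the nebula-cluster used in the sample output. It waits on the Pod's Ready condition, which is slightly stricter than the Running status shown above.

```shell
#!/bin/sh
# Sketch only: CLUSTER_NAME, KUBECTL, and the function names below are
# illustrative assumptions, not something Nebula Operator provides.
KUBECTL="${KUBECTL:-kubectl}"
CLUSTER_NAME="${CLUSTER_NAME:-nebula-cluster}"

# Print the name of the Storage Pod that step 2 deletes.
storaged_pod() {
  printf '%s-storaged-2\n' "$CLUSTER_NAME"
}

# Step 2: simulate the failure by deleting the Pod.
simulate_failure() {
  "$KUBECTL" delete pod "$(storaged_pod)" --now
}

# Step 3: block until the recreated Pod reports the Ready condition,
# or fail once the timeout (default 300s) expires.
# If the Pod has not been recreated yet, kubectl wait may report that
# the Pod is not found; in that case, run this function again.
wait_for_recovery() {
  "$KUBECTL" wait --for=condition=Ready "pod/$(storaged_pod)" --timeout="${1:-300s}"
}
```

After sourcing the script with CLUSTER_NAME set to your cluster name, running simulate_failure followed by wait_for_recovery reproduces steps 2 and 3. A zero exit status from wait_for_recovery indicates that the Pod became Ready within the timeout.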


Last update: February 19, 2024