Self-healing

Nebula Operator calls the interface provided by NebulaGraph clusters to monitor the status of cluster services. When it detects an exception (for example, a component in a NebulaGraph cluster stops running), Nebula Operator automatically handles the fault. This topic shows how Nebula Operator performs self-healing by simulating a cluster failure: deleting one Storage service Pod in a NebulaGraph cluster.

Prerequisites

Install Nebula Operator

Steps

Create a NebulaGraph cluster. For more information, see Deploy NebulaGraph clusters with Kubectl or Deploy NebulaGraph clusters with Helm.
After all Pods are in the Running status, delete the Pod named <cluster-name>-storaged-2.
```
kubectl delete pod <cluster-name>-storaged-2 --now
```
<cluster-name> is the name of your NebulaGraph cluster.
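
Before running the delete command above, it can help to confirm that every Pod really is in the Running status. The following is a minimal sketch, not part of Nebula Operator: all_pods_running is a hypothetical helper name, and it assumes the default column layout of kubectl get pods, in which STATUS is the third column.

```shell
# Hypothetical helper (not provided by Nebula Operator or kubectl).
# Reads `kubectl get pods --no-headers` output from stdin and succeeds
# only if at least one Pod is listed and every Pod's STATUS is Running.
all_pods_running() {
  awk '$3 != "Running" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage, assuming kubectl points at the namespace of your cluster:
#   kubectl get pods --no-headers | all_pods_running && echo "all Pods are Running"
```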

Nebula Operator automatically recreates the Pod named <cluster-name>-storaged-2 to perform self-healing.

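Run the kubectl get pods command to check the status of the Pod <cluster-name>-storaged-2. In the sample output below, the cluster is named nebula-cluster; the Pod first shows ContainerCreating and later Running.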

...
nebula-cluster-storaged-1        1/1     Running             0          5d23h
nebula-cluster-storaged-2        0/1     ContainerCreating   0          1s
...

...
nebula-cluster-storaged-1        1/1     Running     0          5d23h
nebula-cluster-storaged-2        1/1     Running     0          4m2s
...

When the status of <cluster-name>-storaged-2 changes from ContainerCreating to Running, self-healing has completed successfully.
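
The status check above can also be scripted instead of rerunning kubectl get pods by hand. This is a rough sketch under stated assumptions: pod_status and wait_for_running are hypothetical helper names, the STATUS value is taken from the third column of the default kubectl get pods output, and the polling loop has no timeout.

```shell
# Hypothetical helper: print the STATUS column for the Pod named $1,
# reading `kubectl get pods` output from stdin.
pod_status() {
  awk -v pod="$1" '$1 == pod { print $3 }'
}

# Hypothetical helper: poll until the Pod named $1 reports Running,
# checking every $2 seconds (default 5). Requires kubectl access to
# the cluster's namespace and loops forever if the Pod never recovers.
wait_for_running() {
  until [ "$(kubectl get pods --no-headers | pod_status "$1")" = "Running" ]; do
    sleep "${2:-5}"
  done
}

# Usage, with nebula-cluster as the cluster name from the sample output:
#   wait_for_running nebula-cluster-storaged-2 && echo "self-healing completed"
```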

Last update: February 19, 2024