Full backup
A full backup backs up all data in a NebulaGraph cluster. This topic describes how to use NebulaGraph Operator to fully back up NebulaGraph cluster data to Google Cloud Storage (GCS) or to S3-compatible storage services (such as AWS S3 and MinIO).
Cautions
- During the backup operation, DDL and DML statements in the specified graph space are blocked. We recommend that you run backups during off-peak business hours, such as between 2 a.m. and 5 a.m.
- The backup of specified graph space data is not supported.
Prerequisites
To fully back up data using NebulaGraph Operator, the following conditions must be met:
- NebulaGraph Operator version is 1.8.0 or higher.
- A NebulaGraph cluster is running on Kubernetes.
- Access credentials for Google Cloud Storage (GCS) or cloud storage services compatible with the S3 protocol are prepared for data backup.
Steps
The following example walks through a full backup, with all resource objects created in the default namespace, default.
- Run kubectl edit nc <cluster_name> to edit the cluster's YAML file, where <cluster_name> is the name of your cluster.
- Set spec.enableBR to true to enable the backup feature. You can also customize the Agent configuration. For more information, see Create a cluster.

  Partial configuration of the cluster:

      apiVersion: apps.nebula-graph.io/v1alpha1
      kind: NebulaCluster
      metadata:
        name: nebula
      spec:
        enableBR: true # Set to true to enable the backup and restore feature. The default value is false.
        agent: # A component used for backup and restore. Default settings are used if not modified.
          image: vesoft/nebula-agent # Agent image address. The default value is vesoft/nebula-agent.
          version: latest # Agent image version. The default value is latest.
          resources:
            requests:
              cpu: "100m" # Minimum CPU usage.
              memory: "128Mi" # Minimum memory usage.
            limits:
              cpu: "1" # Maximum CPU usage.
              memory: "1Gi" # Maximum memory usage.
          # Limit the speed of file upload and download, in Mbps. The default value is 0, indicating no limit.
          # rateLimit: 0
          # The connection timeout between the Agent and metad, in seconds. The default value is 60.
          # heartbeatInterval: 60
        ...
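For scripted setups, the same setting can be applied without an interactive editor. The following is a minimal sketch, assuming a cluster named nebula in the current namespace. The function names enable_br and show_br are hypothetical helpers, not part of NebulaGraph Operator. The sketch relies only on kubectl patch with a JSON merge patch (the patch type that kubectl supports for custom resources) and on the nc short name used above.

```shell
# Hypothetical helper: enable backup and restore on a NebulaCluster
# by merge-patching spec.enableBR instead of editing the YAML by hand.
enable_br() {
  cluster_name="$1"
  kubectl patch nc "$cluster_name" --type merge \
    -p '{"spec":{"enableBR":true}}'
}

# Hypothetical helper: print the current value of spec.enableBR.
show_br() {
  kubectl get nc "$1" -o jsonpath='{.spec.enableBR}'
}

# Usage (assumes a cluster named "nebula"):
#   enable_br nebula
#   show_br nebula
```

A patch like this changes only spec.enableBR; any Agent customization still needs to be made in the cluster's YAML file.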
- Create a Secret for pulling the NebulaGraph Backup&Restore image from the private repository.

      kubectl -n <nebula> create secret docker-registry <br_secret_name> \
        --docker-server=<registry_server> \
        --docker-username=<registry_username> \
        --docker-password=<registry_password>

  - <nebula>: The namespace where the Secret is stored. It must be the same as the namespace of the NebulaBackup object. This example does not set the namespace, so the default namespace default is used.
  - <br_secret_name>: The name of the Secret.
  - <registry_server>: The private repository server address for pulling the image, such as reg.example-inc.com.
  - <registry_username>: The image repository username.
  - <registry_password>: The image repository password.
- Create a YAML file for the full backup, such as backup_file_name.yaml.

  For GCS:

      apiVersion: v1
      kind: Secret
      metadata:
        name: gcs-secret # Name of the Secret used to access the GCS storage service.
      type: Opaque
      data:
        credentials: <GOOGLE_APPLICATION_CREDENTIALS_JSON> # JSON string that represents the Google service account key, base64-encoded as required for values under data.
      ---
      apiVersion: apps.nebula-graph.io/v1alpha1
      kind: NebulaBackup
      metadata:
        name: nb2024-full # Name of the backup job.
      spec:
        image: reg.vesoft-inc.com/cloud-dev/br-ent # Image address of the NebulaGraph Backup&Restore tool.
        version: v3.7.1 # Version of the NebulaGraph Backup&Restore tool.
        resources: # Resource requests and limits required for the backup job.
          limits:
            cpu: "1"
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 200Mi
        imagePullSecrets: # Name of the Secret needed to pull the NebulaGraph Backup&Restore image from a private repository.
        - name: br_secret_name
        autoRemoveFinished: false # Determines whether to automatically delete the job after the backup job is completed or fails. The default value is false.
        cleanBackupData: false # Determines whether to delete the backup data in the cloud storage service when the backup object is deleted. The default value is false.
        config: # Configuration information for the backup job.
          concurrency: 5 # The concurrency of the backup jobs. The default value is 5.
          clusterName: nebula # Name of the cluster to be backed up.
          gs: # Configuration information for the GCS storage service.
            location: "us-central1" # Geographic region of the GCS storage bucket.
            bucket: "ng-2025" # Name of the GCS storage bucket used to store backup data.
            secretName: "gcs-secret" # Name of the Secret used to access the GCS storage bucket.

  For S3-compatible storage services:

      apiVersion: v1
      kind: Secret
      metadata:
        name: aws-s3-secret # Name of the Secret used to access the S3-compatible storage service.
      type: Opaque
      data:
        access_key: QVNJQVE0WFlxxx # AccessKey for the S3-compatible storage service.
        secret_key: ZFJ6OEdNcDdxenMwVGxxx # SecretKey for the S3-compatible storage service.
      ---
      apiVersion: apps.nebula-graph.io/v1alpha1
      kind: NebulaBackup
      metadata:
        name: nb2024-full # Name of the backup job.
      spec:
        image: reg.vesoft-inc.com/cloud-dev/br-ent # Image address of the NebulaGraph Backup&Restore tool.
        version: v3.7.1 # Version of the NebulaGraph Backup&Restore tool.
        resources: # Resource requests and limits required for the backup job.
          limits:
            cpu: "1"
            memory: 300Mi
          requests:
            cpu: 100m
            memory: 200Mi
        imagePullSecrets: # Name of the Secret needed to pull the NebulaGraph Backup&Restore image from a private repository.
        - name: br_secret_name
        autoRemoveFinished: false # Determines whether to automatically delete the job after the backup job is completed or fails. The default value is false.
        cleanBackupData: false # Determines whether to delete the backup data in the cloud storage service when the backup object is deleted. The default value is false.
        config: # Configuration information for the backup job.
          concurrency: 5 # The concurrency of the backup jobs. The default value is 5.
          clusterName: nebula # Name of the cluster to be backed up.
          s3: # Configuration information for the S3-compatible storage service.
            region: "us-east-2" # Geographic region of the S3 storage bucket.
            bucket: "nebula-test" # Name of the S3 storage bucket used to store backup data.
            endpoint: "https://s3.us-east-2.amazonaws.com" # Access address of the S3 storage bucket.
            secretName: "aws-s3-secret" # Name of the Secret used to access the S3 storage bucket.
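Values under data in a Kubernetes Secret must be base64-encoded. As a minimal sketch, the hypothetical helpers below (encode_value and encode_file are illustrative names, and key.json is an assumed file name for a service account key) produce single-line base64 output that can be pasted into the Secret definitions.

```shell
# Hypothetical helpers for preparing values for the `data` field of a Secret.

# Encode a literal string (for example, an access key) as single-line base64.
encode_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode the contents of a file (for example, a GCS service account key)
# as single-line base64.
encode_file() {
  base64 < "$1" | tr -d '\n'
}

# Usage (key.json is an assumed file name):
#   encode_value "my-access-key"
#   encode_file key.json
```

Piping through tr removes the line wrapping that some base64 implementations add, which would otherwise break the YAML value. Kubernetes also accepts unencoded values under stringData instead of data, if plain text in the manifest is acceptable.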
  In addition to the parameters in the above YAML files, the following parameters can also be set:

  | Parameter | Default Value | Description |
  | --- | --- | --- |
  | spec.env | [] | Configures environment variables for the backup job. |
  | spec.imagePullPolicy | Always | The image pull policy for the backup job. |
  | spec.nodeSelector | {} | The node selector for the backup job. |
  | spec.affinity | {} | The affinity for the backup job. |
  | spec.tolerations | [] | The tolerations for the backup job. |
  | spec.initContainers | [] | The initialization containers for the backup job. |
  | spec.sidecarContainers | [] | The sidecar containers for the backup job. |
  | spec.volumes | [] | The volumes for the backup job. |
  | spec.volumeMounts | [] | The volume mounts for the backup job. |
  | spec.config.clusterNamespace | default | Namespace where the cluster to be backed up is located. Cross-namespace cluster backup is supported, which means the namespace of the cluster may be different from that of the NebulaBackup object. |

- Start the backup job.

      kubectl create -f backup_file_name.yaml
  When a backup job starts, the system automatically creates a backup instance (nb), which in turn creates a Job object. The Job object creates and manages a Pod object, and the Pod runs the NebulaGraph Backup&Restore tool to execute the backup job. If spec.autoRemoveFinished of the NebulaBackup resource object is set to true, the system automatically deletes the Job object once the backup job finishes.

- Check the status of the backup object.
  View the status of the NebulaBackup object.

      kubectl get nb

  Output:

      NAME          TYPE   BACKUP                       STATUS     STARTED   COMPLETED   AGE
      nb2024-full   full   BACKUP_2024_02_05_08_05_13   Complete   71s       1s          71s
  When spec.autoRemoveFinished is set to false, you can run the following commands to check the status of the Job and Pod objects.

  View the status of the Job object.

      kubectl get job

  Output:

      NAME                     COMPLETIONS   DURATION   AGE
      backup-full-nb202402v5   1/1           96s        96s

  View the status of the Pod object.

      kubectl get pod

  Output:

      NAME                           READY   STATUS    RESTARTS   AGE
      backup-full-nb202402v5-6994k   1/1     Running   0          109s
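
In automation, it can be useful to block until the backup finishes. The sketch below polls kubectl get nb and looks for the Complete status shown in the output above. The function name wait_for_backup and the timeout and interval defaults are hypothetical. Only the Complete status value is taken from this topic, so other states (such as a failure) are not detected and simply run into the timeout.

```shell
# Hypothetical helper: poll a NebulaBackup object until one of its status
# columns reads "Complete", or give up after a timeout.
wait_for_backup() {
  backup_name="$1"
  timeout_s="${2:-600}"   # Total time to wait, in seconds (assumed default).
  interval_s="${3:-10}"   # Delay between polls, in seconds; must be at least 1.
  elapsed=0
  while [ "$elapsed" -lt "$timeout_s" ]; do
    # --no-headers prints only the data row; awk skips the NAME column and
    # succeeds if any remaining column is exactly "Complete".
    if kubectl get nb "$backup_name" --no-headers 2>/dev/null |
      awk '{ for (i = 2; i <= NF; i++) if ($i == "Complete") found = 1 }
           END { exit !found }'; then
      echo "backup $backup_name is complete"
      return 0
    fi
    sleep "$interval_s"
    elapsed=$((elapsed + interval_s))
  done
  echo "timed out waiting for backup $backup_name" >&2
  return 1
}

# Usage (waits up to 15 minutes, polling every 15 seconds):
#   wait_for_backup nb2024-full 900 15
```

Matching on the printed columns avoids assuming the field names inside the NebulaBackup status, at the cost of depending on the table layout of kubectl get nb.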