Scheduled backup¶

NebulaGraph Operator supports scheduled backup, which allows for regular full or incremental backups of data to GCS or S3-compatible storage services. This topic describes how to use NebulaGraph Operator to schedule backups for a NebulaGraph cluster.

NebulaGraph Operator provides the NebulaCronBackup resource object to enable the execution of backup jobs at scheduled intervals. The NebulaCronBackup resource object can be configured with a backup template spec.backupTemplate and a schedule spec.schedule for backup jobs.

Cautions¶

During the backup operation, DDL and DML statements in the specified graph space are blocked. We recommend that you run backups during off-peak business hours, such as between 2 a.m. and 5 a.m.
An incremental backup must be performed on the same cluster as the previous backup, and it must use the same storage bucket as the previous backup.
Ensure that the time interval between each incremental backup and the previous backup is less than one wal_ttl duration.
Backing up only the data of specified graph spaces is not supported.
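
As a rough illustration of the wal_ttl caution, the fragment below pairs an hourly schedule with a wal_ttl of four hours, so that each incremental backup runs well within one wal_ttl of the previous backup. The spec.storaged.config field and the value 14400 are assumptions made for illustration and are not taken from this topic; check how your cluster actually sets wal_ttl before relying on this sketch.

```yaml
# NebulaCluster fragment (assumed field layout): Storage flag that controls WAL retention.
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:
  storaged:
    config:
      wal_ttl: "14400"        # Assumed example value, in seconds (4 hours).
---
# NebulaCronBackup fragment (other fields omitted): one backup per hour.
# 3600 seconds between backups is less than the assumed wal_ttl of 14400 seconds.
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  schedule: "0 * * * *"       # At minute 0 of every hour.
```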

Prerequisites¶

To perform scheduled backups of data, the following conditions must be met:

NebulaGraph Operator version is 1.8.0 or higher.
A NebulaGraph cluster is running on Kubernetes.
Access credentials for Google Cloud Storage (GCS) or cloud storage services compatible with the S3 protocol are prepared for data backup.

Steps¶

The following example walks through a scheduled full backup. All resource objects are created in the default namespace, default.

Execute kubectl edit nc <cluster_name> to edit the cluster's YAML file.
- <cluster_name> is the name of your cluster.

Set spec.enableBR to true to enable the backup feature. You can also customize the Agent configuration. For details, see Create a cluster.

Partial configuration of the cluster

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:  
  enableBR: true      # Set to true to enable the backup and restore feature. The default value is false.
  agent:              # A component used for backup and restore. Default settings are used if not modified.
    image: vesoft/nebula-agent  # Agent image address. The default value is vesoft/nebula-agent.
    version: latest             # Agent image version. The default value is latest.
    resources:                  
      requests:
        cpu: "100m"             # Minimum CPU usage.
        memory: "128Mi"         # Minimum memory usage.
      limits:
        cpu: "1"                # Maximum CPU usage.
        memory: "1Gi"           # Maximum memory usage.
  # Limit the speed of file upload and download, in Mbps. The default value is 0, indicating no limit.
  # rateLimit: 0
  # The connection timeout between the Agent and metad, in seconds. The default value is 60.
  # heartbeatInterval: 60
...

Create a Secret for pulling the NebulaGraph Backup&Restore image from the private repository.
```
kubectl -n <nebula> create secret docker-registry <br_secret_name> \
--docker-server=<registry_server> \
--docker-username=<registry_username> \
--docker-password=<registry_password>
```
- <nebula>: The namespace where the Secret is stored. Ensure that it is the same as the namespace of the NebulaCronBackup object. The other resource objects in this example do not set a namespace, so they use the default namespace default.
- <br_secret_name>: Name of the Secret.
- <registry_server>: The private repository server address for pulling the image, such as reg.example-inc.com.
- <registry_username>: The image repository username.
- <registry_password>: The image repository password.

Create a YAML file for the scheduled backup, such as backup_file_name.yaml.

The YAML file example for scheduled incremental backup is similar to the scheduled full backup, except that you need to add the baseBackupName parameter in the spec.backupTemplate.config of the scheduled incremental backup YAML file.
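
A minimal sketch of the incremental variant, showing only where baseBackupName goes. The value is a hypothetical placeholder, presumably the name of the backup that the incremental backups build on; this topic does not specify its format. All other fields are assumed to match the full backup example.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  schedule: "*/5 * * * *"
  backupTemplate:
    # image, version, resources, and imagePullSecrets as in the full backup example.
    config:
      clusterName: nebula
      baseBackupName: <base_backup_name>   # Hypothetical placeholder; replace with your own value.
      # gs or s3 storage settings as in the full backup example.
```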

Two example manifests follow. The first schedules backups to GCS, and the second schedules backups to an S3-compatible storage service.

apiVersion: v1
kind: Secret                                                       
metadata:
  name: gcs-secret                                # Name of the Secret for accessing GCS storage service.
type: Opaque
data:
  credentials: <GOOGLE_APPLICATION_CREDENTIALS_JSON>  # Base64-encoded JSON string of the Google service account key.
---      
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  schedule: "*/5 * * * *"                        # Execution time for the scheduled jobs.
                                                 # For setting the execution time of scheduled jobs,
                                                 # see https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#schedule
  maxReservedTime: 15m                           # Specifies the maximum retention time for the backup object. The backup object will be deleted after this time. Supported time units: s, m, h.
  backupTemplate:
    image: reg.vesoft-inc.com/cloud-dev/br-ent   # Image URL for the NebulaGraph Backup&Restore tool.
    version: v3.7.1                      # Version of the NebulaGraph Backup&Restore tool.
    resources:                                   # Defines the resource requests and limits required for the backup jobs.
      limits:
        cpu: "200m"
        memory: 300Mi
      requests:
        cpu: 100m
        memory: 200Mi
    imagePullSecrets:                            # Name of the Secret required to pull the NebulaGraph Backup&Restore image from the private repository.
    - name: br_secret_name
    autoRemoveFinished: false                     # Whether to automatically delete the Job after the backup job completes or fails. The default value is false.
    cleanBackupData: false                        # Whether to delete the backup data in the cloud storage service when deleting the backup object. The default value is false.
    config:                                      # Configuration information for the backup jobs.
      concurrency: 5                             # The concurrency of the backup jobs. The default value is 5.
      clusterName: nebula                        # Name of the backup cluster.
      gs:                                        # Configuration information for the GCS storage service.
        location: "us-central1"                  # Geographic region of the GCS storage bucket.
        bucket: "ng-2025"                        # Name of the GCS storage bucket for storing backup data.
        secretName: "gcs-secret"                 # Name of the Secret for accessing the GCS storage bucket.

apiVersion: v1
kind: Secret                                   
metadata:
  name: aws-s3-secret                            # Name of the Secret for accessing S3-compatible storage service.
type: Opaque
data:                                          
  access_key: QVNJQVE0WFlxxx                     # Base64-encoded AccessKey for the S3-compatible storage service.
  secret_key: ZFJ6OEdNcDdxenMwVGxxx              # Base64-encoded SecretKey for the S3-compatible storage service.
---
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  schedule: "*/5 * * * *"                        # Execution time for the scheduled jobs.
                                                 # For setting the execution time of scheduled jobs,
                                                 # see https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#schedule
  maxReservedTime: 15m                           # Specifies the maximum retention time for the backup object. The backup object will be deleted after this time. Supported time units: s, m, h.
  backupTemplate:
    image: reg.vesoft-inc.com/cloud-dev/br-ent   # Image URL for the NebulaGraph Backup&Restore tool.
    version: v3.7.1                      # Version of the NebulaGraph Backup&Restore tool.
    resources:                                   # Defines the resource requests and limits required for the backup jobs.
      limits:
        cpu: "200m"
        memory: 300Mi
      requests:
        cpu: 100m
        memory: 200Mi
    imagePullSecrets:                            # Name of the Secret required to pull the NebulaGraph Backup&Restore image from the private repository.
    - name: br_secret_name
    autoRemoveFinished: false                     # Whether to automatically delete the Job after the backup job completes or fails. The default value is false.
    cleanBackupData: false                        # Whether to delete the backup data in the cloud storage service when deleting the backup object. The default value is false.
    config:                                      # Configuration information for the backup jobs.
      concurrency: 5                             # The concurrency of the backup jobs. The default value is 5.
      clusterName: nebula                        # Name of the backup cluster.
      s3:                                        # Configuration information for the S3-compatible storage service.
        region: "us-east-2"                      # Geographic region of the S3 storage bucket.
        bucket: "nebula-test"                    # Name of the S3 storage bucket for storing backup data.
        endpoint: "https://s3.us-east-2.amazonaws.com"  # Access URL for the S3 storage bucket.
        secretName: "aws-s3-secret"              # Name of the Secret for accessing the S3 storage bucket.
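
The five-minute schedule in the manifests above is convenient for a demonstration. For the off-peak window suggested in the cautions, a schedule along the following lines may be more suitable. This is only a sketch: the retention value is an arbitrary example, and this topic does not state which time zone the operator uses to evaluate the schedule, so verify that before relying on the hour.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  schedule: "0 2 * * *"      # Every day at 02:00, inside the suggested 2 a.m. to 5 a.m. window.
  maxReservedTime: 72h       # Example value: keep each backup object for three days (units: s, m, h).
  # backupTemplate as in the manifests above.
```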

In addition to the parameters in the above YAML files, the following parameters can also be set:

| Parameter | Default Value | Description |
| :--- | :--- | :--- |
| `spec.backupTemplate.env` | `[]` | Environment variables for the scheduled backup jobs. |
| `spec.backupTemplate.imagePullPolicy` | `Always` | Image pull policy for the scheduled backup jobs. |
| `spec.backupTemplate.nodeSelector` | `{}` | Node selector for the scheduled backup jobs. |
| `spec.backupTemplate.affinity` | `{}` | Affinity for the scheduled backup jobs. |
| `spec.backupTemplate.tolerations` | `[]` | Tolerations for node taints in the scheduled backup jobs. |
| `spec.backupTemplate.initContainers` | `[]` | Initialization containers for the scheduled backup jobs. |
| `spec.backupTemplate.sidecarContainers` | `[]` | Sidecar containers for the scheduled backup jobs. |
| `spec.backupTemplate.volumes` | `[]` | Volumes for the scheduled backup jobs. |
| `spec.backupTemplate.volumeMounts` | `[]` | Volume mounts for the scheduled backup jobs. |
| `spec.backupTemplate.config.clusterNamespace` | `default` | Namespace where the cluster to be backed up is located. Cross-namespace cluster backup is supported, which means that the namespace of the cluster may differ from that of the `NebulaBackup` object. |
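
The fragment below sketches how a few of these optional parameters might be combined. It assumes that the fields follow the usual Kubernetes pod-spec shapes, which this topic does not confirm, and every value shown (variable name, node label, taint, namespace) is hypothetical.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  backupTemplate:
    imagePullPolicy: IfNotPresent     # Overrides the default, Always.
    env:
      - name: EXAMPLE_VAR             # Hypothetical environment variable.
        value: "example"
    nodeSelector:
      disktype: ssd                   # Hypothetical node label.
    tolerations:
      - key: "dedicated"              # Hypothetical taint on nodes reserved for backups.
        operator: "Equal"
        value: "backup"
        effect: "NoSchedule"
    config:
      clusterNamespace: nebula-prod   # Hypothetical namespace of the cluster to back up.
```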

Start the backup job.

kubectl create -f backup_file_name.yaml

View the status of the scheduled backup job.

kubectl get ncb

Output:

NAME      SCHEDULE      LASTBACKUP                LASTSCHEDULETIME   LASTSUCCESSFULTIME   BACKUPCLEANTIME   AGE
cron123   */5 * * * *   cron123-20240205t102500   2m40s              64s                  54s               45m

View the backup objects created by the scheduled backup.

kubectl get nb -l "apps.nebula-graph.io/cron-backup=cron123"

Output:

NAME                      TYPE   BACKUP                       STATUS     STARTED   COMPLETED   AGE
cron123-20240205t094500   full   BACKUP_2024_02_05_09_45_01   Complete   42m       41m         42m
cron123-20240205t102500   full   BACKUP_2024_02_05_10_26_08   Complete   85s       55s         85s

Pause scheduled backup jobs¶

To pause a scheduled backup job, execute the following command:

kubectl patch ncb cron123 --type='merge' --patch '{"spec": {"pause": true}}'

Output:

nebulacronbackup.apps.nebula-graph.io/cron123 patched

Resume scheduled backup jobs¶

To resume a paused scheduled backup job, execute the following command:

kubectl patch ncb cron123 --type='merge' --patch '{"spec": {"pause": null}}'

Output:

nebulacronbackup.apps.nebula-graph.io/cron123 patched
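
The patch commands above change a single field, spec.pause. As a sketch, the same state could presumably be declared in the manifest itself; this assumes that the operator treats a declared value the same way as a patched one, which this topic does not confirm.

```yaml
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCronBackup
metadata:
  name: cron123
spec:
  pause: true    # Pauses the schedule. The resume command sets this field to null, which removes it.
  # schedule and backupTemplate unchanged.
```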

Delete scheduled backup jobs¶

The spec.maxReservedTime parameter can be set in the YAML file for the scheduled backup to specify the maximum retention time for the backup object. The backup object is deleted after this time. By default, after the backup resource object is deleted, the backup data is retained in the cloud storage service. To delete the backup data in the cloud storage service, set spec.backupTemplate.cleanBackupData to true when creating the resource object for the scheduled backup.

kind: NebulaCronBackup
spec:
  maxReservedTime: 15m     # Supported time units: s, m, h
  backupTemplate:
    cleanBackupData: false
  ...
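
For comparison, a sketch of the variant that also removes the backup data from cloud storage. The retention value is an arbitrary example.

```yaml
kind: NebulaCronBackup
spec:
  maxReservedTime: 24h       # Example value: each backup object is deleted after one day.
  backupTemplate:
    cleanBackupData: true    # Also delete the backup data in the cloud storage service when the backup object is deleted.
```

Because cleanBackupData deletes data in the bucket, keep it set to false if the backups must outlive the backup objects.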

Last update: March 6, 2024