Incremental Backup

An incremental backup captures only the data that has changed since the previous backup. Compared to a full backup, an incremental backup produces smaller files and takes less time. This topic describes how to use NebulaGraph Operator to incrementally back up NebulaGraph cluster data to Google Cloud Storage (GCS) and S3-compatible cloud storage services (such as AWS S3 and MinIO).

Cautions

- During the backup, DDL and DML statements in the specified graph space are blocked. We recommend performing the backup during off-peak hours, such as between 2 a.m. and 5 a.m.
- The incremental backup must run on the same cluster as the previous backup, and it must use the same storage bucket as the previous backup.
- Ensure that the interval between each incremental backup and the previous backup is shorter than the wal_ttl duration.
- Backing up only specified graph spaces is not supported.
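The wal_ttl caution can be checked before starting a backup. The following is a minimal sketch, not part of NebulaGraph Operator: the function names are hypothetical, the backup name is assumed to encode its creation time as BACKUP_YYYY_MM_DD_HH_MM_SS, and 14400 seconds is only an assumed wal_ttl value, so use the value configured for your Storage service.

```shell
# Hypothetical helpers (not part of NebulaGraph or NebulaGraph Operator).

# Convert a backup name such as BACKUP_2024_02_05_08_05_13
# into the timestamp string "2024-02-05 08:05:13".
backup_name_to_timestamp() (
  IFS=_
  # Unquoted on purpose: split the name on underscores.
  set -- ${1#BACKUP_}
  printf '%s-%s-%s %s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
)

# Succeed only if the elapsed time (seconds) is shorter than wal_ttl (seconds).
within_wal_ttl() {
  elapsed="$1"
  wal_ttl="$2"
  [ "$elapsed" -ge 0 ] && [ "$elapsed" -lt "$wal_ttl" ]
}

# Example wiring (assumes GNU date for -d, and that the name is in UTC):
#   prev=$(date -u -d "$(backup_name_to_timestamp BACKUP_2024_02_05_08_05_13)" +%s)
#   now=$(date -u +%s)
#   within_wal_ttl "$((now - prev))" 14400 || echo "interval is not shorter than wal_ttl"
```

Keeping the comparison separate from the date parsing means the check still works when the elapsed time comes from another source, such as a scheduler.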

Prerequisites

To perform incremental backup of data using NebulaGraph Operator, the following conditions must be met:

- NebulaGraph Operator version is 1.8.0 or higher.
- A NebulaGraph cluster is running on Kubernetes.
- Access credentials for Google Cloud Storage (GCS) or cloud storage services compatible with the S3 protocol are prepared for data backup.
- A full backup is completed.

Steps

The following example walks through an incremental backup. All resource objects in this example are created in the default namespace (default).

Execute kubectl edit nc <cluster_name> to edit the cluster's YAML file.
- <cluster_name> is the name of your cluster.

Set spec.enableBR to true to enable the backup feature. You can also customize the Agent configuration. For details, see Create a cluster.

Partial configuration of the cluster

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
spec:  
  enableBR: true      # Set to true to enable the backup and restore feature. The default value is false.
  agent:              # A component used for backup and restore. Default settings are used if not modified.
    image: vesoft/nebula-agent  # Agent image address. The default value is vesoft/nebula-agent.
    version: latest             # Agent image version. The default value is latest.
    resources:                  
      requests:
        cpu: "100m"             # Minimum CPU usage.
        memory: "128Mi"         # Minimum memory usage.
      limits:
        cpu: "1"                # Maximum CPU usage.
        memory: "1Gi"           # Maximum memory usage.
  # Limit the speed of file upload and download, in Mbps. The default value is 0, indicating no limit.
  # rateLimit: 0
  # The connection timeout between the Agent and metad, in seconds. The default value is 60.
  # heartbeatInterval: 60
...
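As a non-interactive alternative to kubectl edit, the same field can be set with a JSON merge patch. This is a sketch only: the function name enable_br is hypothetical, and nc is the resource short name already used in this topic.

```shell
# Hypothetical wrapper: set spec.enableBR to true on a NebulaCluster.
# $1 = cluster name, $2 = namespace (defaults to "default").
enable_br() {
  cluster="$1"
  namespace="${2:-default}"
  # A merge patch changes only the listed field and leaves the rest of the spec untouched.
  kubectl -n "$namespace" patch nc "$cluster" --type merge \
    -p '{"spec":{"enableBR":true}}'
}

# Usage: enable_br nebula default
```

A patch is easier to script and review than an interactive edit, but it does not cover the optional Agent settings shown above; add them to the patch body if you need them.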

Create a Secret for pulling the NebulaGraph Backup&Restore image from the private repository.
```
kubectl -n <nebula> create secret docker-registry <br_secret_name> \
--docker-server=<registry_server> \
--docker-username=<registry_username> \
--docker-password=<registry_password>
```
- <nebula>: The namespace where the Secret is stored. It must be the same as the namespace of the NebulaBackup object. This example uses the default namespace, default.
- <br_secret_name>: Name of the Secret.
- <registry_server>: The private repository server address for pulling the image, such as reg.example-inc.com.
- <registry_username>: The image repository username.
- <registry_password>: The image repository password.
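Kubernetes requires Secret names to be valid DNS subdomain names: lowercase alphanumeric characters, hyphens, and dots, at most 253 characters. A placeholder such as br_secret_name therefore cannot be used literally, because it contains underscores. The sketch below checks a candidate name before you run kubectl; the function name is hypothetical.

```shell
# Hypothetical helper: succeed if $1 is a valid Kubernetes Secret name
# (an RFC 1123 DNS subdomain of at most 253 characters).
is_valid_secret_name() {
  name="$1"
  [ -n "$name" ] && [ "${#name}" -le 253 ] &&
    printf '%s\n' "$name" |
      grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$'
}

# Usage:
#   is_valid_secret_name br-secret && echo "name is valid"
```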

Create an incremental backup YAML file, such as backup_file_name.yaml.

The YAML for an incremental backup to GCS or to an S3-compatible storage service is similar to the YAML for a full backup. The difference is that the incremental backup YAML file must set the baseBackupName parameter in the spec.config field to the name of the previous backup.
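One way to find the value for baseBackupName is to read it from the BACKUP column of kubectl get nb. The sketch below assumes the column layout NAME, TYPE, BACKUP, STATUS, STARTED, COMPLETED, AGE and assumes that backup names sort chronologically because they embed a zero-padded timestamp. The function name is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nb` table output on stdin and print
# the newest BACKUP name among rows whose STATUS is Complete.
latest_complete_backup() {
  awk 'NR > 1 && $4 == "Complete" && $3 > newest { newest = $3 }
       END { if (newest != "") print newest }'
}

# Usage: kubectl get nb | latest_complete_backup
```

The function prints nothing when no completed backup exists, which a calling script can treat as a signal that a full backup is needed first.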

The first example below shows the YAML content for an incremental backup to GCS. The second shows the YAML content for an incremental backup to an S3-compatible storage service.

apiVersion: v1
kind: Secret                                                       
metadata:
  name: gcs-secret                             # Name of the Secret for accessing the GCS storage service.
type: Opaque
data:
  credentials: <GOOGLE_APPLICATION_CREDENTIALS_JSON>  # The Base64-encoded JSON string of the Google service account key.
---      
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaBackup
metadata:
  name: nb2024-incr                            # Name of the backup job.
spec:
  image: reg.vesoft-inc.com/cloud-dev/br-ent   # Image address of the NebulaGraph Backup&Restore tool.
  version: v3.7.1                      # Version of the NebulaGraph Backup&Restore tool.
  resources:                                   # Resource requests and limits required for the backup job.
    limits:
      cpu: "1"
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 200Mi
  imagePullSecrets:                            # Name of the Secret for pulling the NebulaGraph Backup&Restore image from the private repository.
  - name: br_secret_name
  autoRemoveFinished: false                     # Whether to automatically delete the job after the backup job is completed or fails. The default value is false.
  cleanBackupData: false                        # Whether to delete the backup data in the cloud storage service when deleting the backup object. The default value is false.
  config:                                      # Configuration information of the backup job.
    concurrency: 5                             # The concurrency of the backup jobs. The default value is 5.
    clusterName: nebula                        # Name of the backup cluster.
    baseBackupName: BACKUP_2024_02_05_08_05_13 # Name of the previous backup.       
    gs:                                        # Configuration information of the GCS storage service.
      location: "us-central1"                  # Geographic region where the GCS bucket is located.
      bucket: "ng-2025"                        # Name of the GCS storage bucket for storing backup data.
      secretName: "gcs-secret"                 # Name of the Secret for accessing the GCS storage bucket.

apiVersion: v1
kind: Secret                                   
metadata:
  name: aws-s3-secret                          # Name of the Secret for accessing the S3-compatible storage service.
type: Opaque
data:                                          
  access_key: QVNJQVE0WFlxxx                   # AccessKey for accessing the S3-compatible storage service.
  secret_key: ZFJ6OEdNcDdxenMwVGxxx            # SecretKey for accessing the S3-compatible storage service.
---
apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaBackup
metadata:
  name: nb2024-incr                            # Name of the backup job.
spec:
  image: reg.vesoft-inc.com/cloud-dev/br-ent   # Image address of the NebulaGraph Backup&Restore tool.
  version: v3.7.1                      # Version of the NebulaGraph Backup&Restore tool.
  resources:                                   # Resource requests and limits required for the backup job.
    limits:
      cpu: "1"
      memory: 300Mi
    requests:
      cpu: 100m
      memory: 200Mi
  imagePullSecrets:                            # Name of the Secret needed to pull the NebulaGraph Backup&Restore image from the private repository.
  - name: br_secret_name
  autoRemoveFinished: false                     # Whether to automatically delete the job after the backup job is completed or failed. The default value is false.
  cleanBackupData: false                        # Whether to delete the backup data in the cloud storage service when deleting the backup object. The default value is false.
  config:                                      # Configuration information for the backup job.
    concurrency: 5                             # The concurrency of the backup jobs. The default value is 5.
    clusterName: nebula                        # Name of the backup cluster.
    baseBackupName: BACKUP_2024_02_05_08_05_13 # Name of the previous backup.
    s3:                                        # Configuration information for the S3-compatible storage service.
      region: "us-east-2"                      # Geographical region where the S3 bucket is located.
      bucket: "nebula-test"                    # Name of the S3 bucket where the backup data is stored.
      endpoint: "https://s3.us-east-2.amazonaws.com"  # Access address of the S3 bucket.
      secretName: "aws-s3-secret"              # Name of the Secret used to access the S3 bucket.
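Values under the data field of a Kubernetes Secret must be Base64-encoded, which applies to credentials, access_key, and secret_key in the examples above. The sketch below encodes a literal value or a file without line wrapping. The function names and the file name in the usage comment are hypothetical.

```shell
# Hypothetical helpers for preparing Secret data values.

# Encode a literal string (for example, an access key).
encode_secret_value() {
  printf '%s' "$1" | base64 | tr -d '\n'
}

# Encode a file (for example, a service account key JSON file).
encode_secret_file() {
  base64 < "$1" | tr -d '\n'
}

# Usage:
#   encode_secret_value "<your_access_key>"
#   encode_secret_file service-account-key.json
```

Using printf instead of echo avoids encoding a trailing newline into the key, and tr removes the line breaks that some base64 implementations insert in long output.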

In addition to the parameters in the above YAML files, the following parameters can also be set:

| Parameter | Default Value | Description |
| --- | --- | --- |
| `spec.env` | `[]` | Configures environment variables for the backup job. |
| `spec.imagePullPolicy` | `Always` | The image pull policy for the backup job. |
| `spec.nodeSelector` | `{}` | The node selector for the backup job. |
| `spec.affinity` | `{}` | The affinity for the backup job. |
| `spec.tolerations` | `[]` | The tolerations for the backup job. |
| `spec.initContainers` | `[]` | The initialization containers for the backup job. |
| `spec.sidecarContainers` | `[]` | The sidecar containers for the backup job. |
| `spec.volumes` | `[]` | The volumes for the backup job. |
| `spec.volumeMounts` | `[]` | The volume mounts for the backup job. |
| `spec.config.clusterNamespace` | `default` | Namespace where the cluster to be backed up is located. Cross-namespace cluster backup is supported, which means the namespace of the cluster may be different from that of the `NebulaBackup` object. |

Start the backup job.
```
kubectl apply -f backup_file_name.yaml
```
After you start a backup job, the system creates a backup instance (nb), which in turn creates a Job object. The Job creates and manages a Pod, and the Pod runs the NebulaGraph Backup&Restore tool to perform the backup. If spec.autoRemoveFinished of the NebulaBackup object is set to true, the system automatically deletes the Job object after the backup job completes or fails.

Check the status of the backup object.

View the status of the NebulaBackup object.

kubectl get nb

Output:

NAME          TYPE   BACKUP                       STATUS     STARTED   COMPLETED   AGE
nb2024-full   full   BACKUP_2024_02_05_08_05_13   Complete   71s       1s          71s
nb2024-incr   incr   BACKUP_2024_02_05_08_43_52   Complete   58s       8s          58s
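In a script, the same status check can be repeated until the backup reports Complete. This is a minimal polling sketch under stated assumptions: the function name is hypothetical, the STATUS value is read from the fourth column of the table shown above, and any other value (including a shifted column while the BACKUP field is still empty) is treated as "not finished yet".

```shell
# Hypothetical helper: poll a NebulaBackup object until its STATUS is Complete.
# $1 = backup object name, $2 = max attempts (default 60), $3 = seconds between attempts (default 5).
wait_for_backup() {
  name="$1"
  max_tries="${2:-60}"
  interval="${3:-5}"
  status=""
  i=0
  while [ "$i" -lt "$max_tries" ]; do
    # Assumed columns: NAME TYPE BACKUP STATUS STARTED COMPLETED AGE.
    status="$(kubectl get nb "$name" --no-headers | awk '{print $4}')"
    if [ "$status" = "Complete" ]; then
      echo "backup $name is complete"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "timed out waiting for backup $name (last value read: ${status:-unknown})" >&2
  return 1
}

# Usage: wait_for_backup nb2024-incr 60 5
```

The loop only recognizes success; it does not distinguish a failed backup from one still in progress, so inspect the object with kubectl describe nb if the function times out.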

When spec.autoRemoveFinished is set to false, you can execute the following commands to check the status of the Job and Pod objects.

View the status of the Job object.

kubectl get job

Output:

NAME                      COMPLETIONS   DURATION   AGE
backup-incr-nb202402v62   1/1           5s         5s

View the status of the Pod object.

kubectl get pod

Output:

NAME                                READY   STATUS    RESTARTS   AGE
backup-incr-nb202402v62-j558p       1/1     Running   0          8s

Last update: March 6, 2024