Enable zones

NebulaGraph Operator supports the use of zones. A zone is a logical rack used to group Storage Pods within a cluster. Zones help improve the resilience of the cluster by ensuring an even distribution of data replicas across each zone. This topic explains how to create a cluster with zones.

Prerequisites

A cluster is created using NebulaGraph Operator. For details on creating a cluster, see Create a NebulaGraph cluster.

Background

NebulaGraph efficiently manages its distributed architecture using the functionality provided by zones. The data in NebulaGraph is divided into different partitions, and replicas of these partitions are evenly distributed across all available zones. Queries can be directed preferentially to the Storage Pods within the same zone. Using zones significantly reduces the network traffic costs between zones and improves data transfer speed. For more detailed information about zones, see Managing zones.

Configure zones

To make full use of the Zone feature, you first need to determine the actual Zones in which the nodes of the cluster reside. Typically, nodes deployed on cloud platforms come with labels indicating their respective Zones.
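
On most cloud platforms the zone of each node is exposed through the well-known topology.kubernetes.io/zone label, which is also the label the Operator relies on later in this topic. A quick way to check it with plain kubectl (the node names and zone values below are only illustrative):

kubectl get nodes -L topology.kubernetes.io/zone
# Example output (values depend on your cluster):
# NAME     STATUS   ...   ZONE
# node-1   Ready    ...   az1
# node-2   Ready    ...   az2
# node-3   Ready    ...   az3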

Once you have this information, you can configure it by setting the spec.metad.config.zone_list parameter in the cluster's configuration file. This parameter is a comma-separated list of zone names and must match the actual zone names of the nodes. For example, if your nodes are located in the zones az1, az2, and az3, the configuration looks like this:

spec:
  metad:
    config:
      zone_list: az1,az2,az3

NebulaGraph Operator uses the Kubernetes TopologySpread feature to manage the scheduling of Storage Pods and Graph Pods. Once zone_list is configured, Storage Pods are automatically assigned to their respective zones based on the topology.kubernetes.io/zone label on the nodes.
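
If you want to verify the placement after the Pods are scheduled, you can list the Storage Pods together with the nodes they landed on and cross-reference the zone labels of those nodes. The label selector below assumes the app.kubernetes.io/component label commonly applied by NebulaGraph Operator; adjust it if your Pods are labeled differently.

# Show each Storage Pod and the node it was scheduled to;
# the node's topology.kubernetes.io/zone label gives the zone.
kubectl get pods -l app.kubernetes.io/component=storaged -o wide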

Prioritize intra-zone data access

For intra-zone data access, the Graph service dynamically assigns itself to a zone using the --assigned_zone=$NODE_ZONE parameter. The Alpine Linux image specified in spec.alpineImage (default: reg.vesoft-inc.com/nebula-alpine:latest) is used in an init-container to obtain the zone name of the node where the Graph service resides.

By setting spec.graphd.config.prioritize_intra_zone_reading to true in the cluster configuration file, you enable the Graph service to prioritize sending queries to Storage services within the same zone. In the event of a read failure within that zone, the behavior depends on the value of spec.graphd.config.stick_to_intra_zone_on_failure. If set to true, the Graph service avoids reading data from other zones and returns an error. Otherwise, it reads data from leader partition replicas in other zones.

spec:
  alpineImage: reg.vesoft-inc.com/xxx/xxx:latest
  graphd:
    config:
      prioritize_intra_zone_reading: "true"
      stick_to_intra_zone_on_failure: "false"

Required parameters

If you need to create a cluster with zones, you must add the following parameters to the cluster configuration file. For other fields and descriptions, see Create a NebulaGraph cluster.

spec:
  # Used to obtain the zone information of the node.
  alpineImage: "reg.example-inc.com/xxx/xxx:latest"
  graphd:
    image: reg.example-inc.com/xxx/xxx
    config:
      # Prioritize sending queries to storage nodes within the same zone.
      prioritize_intra_zone_reading: "true"
      stick_to_intra_zone_on_failure: "false" 
  metad:
    image: reg.example-inc.com/xxx/xxx
    config:
      # List of zone names, separated by commas. It is recommended to configure an odd number of zones.
      zone_list: az1,az2,az3 
    licenseManagerURL: "192.168.8.xxx:9119"
  storaged:
    image: reg.example-inc.com/xxx/xxx
  imagePullSecrets:
  - name: nebula-image
  # Used to schedule the restart of Graph/Storage Pods to the original zone.
  schedulerName: nebula-scheduler
  # Field used to control the distribution of Storage Pods.
  topologySpreadConstraints:
  - topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: "DoNotSchedule"

Parameters in the above configuration are described as follows:

spec.metad.licenseManagerURL (no default): The URL that points to the License Manager (LM), consisting of the LM access address and port number (default port 9119), for example 192.168.8.xxx:9119. You must configure this parameter to obtain the license information; otherwise, the enterprise edition cluster cannot be used.

spec.<graphd|metad|storaged>.image (no default): The container image of the Graph, Meta, or Storage service of the enterprise edition.

spec.imagePullSecrets (no default): Specifies the Secret used to pull the NebulaGraph Enterprise service images from a private repository.

spec.alpineImage (no default): The Alpine Linux image, used to obtain the zone information of the nodes.

spec.metad.config.zone_list (no default): A comma-separated list of zone names, for example zone1,zone2,zone3. Zone names cannot be modified once set.

spec.graphd.config.prioritize_intra_zone_reading (default: false): Specifies whether to prioritize sending queries to the Storage Pods in the same zone. When set to true, queries are sent to the Storage Pods in the same zone; if reading fails in that zone, stick_to_intra_zone_on_failure determines whether to read leader partition replicas from other zones.

spec.graphd.config.stick_to_intra_zone_on_failure (default: false): Specifies whether to stick to intra-zone routing when the requested partition replica cannot be found in the same zone. When set to true, the Graph service does not read data from other zones and returns an error.

spec.schedulerName (default: kube-scheduler): Schedules restarted Graph and Storage Pods back to their original zones. The value must be set to nebula-scheduler.

spec.topologySpreadConstraints (no default): A Kubernetes field that controls the distribution of Storage Pods, ensuring that they are evenly spread across zones. To use the zone feature, you must set topologySpreadConstraints[0].topologyKey to topology.kubernetes.io/zone and topologySpreadConstraints[0].whenUnsatisfiable to DoNotSchedule. Run kubectl get node --show-labels to check the key. For more information, see TopologySpread.

Warning

DO NOT manually modify the configmaps created by NebulaGraph Operator. Doing so may cause unexpected behavior.

Once Storage and Graph services are assigned to zones, the mapping between the pod and its corresponding zone is stored in a configmap named <cluster_name>-graphd|storaged-zone. This mapping facilitates pod scheduling during rolling updates and pod restarts, ensuring that services return to their original zones as needed.
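
For example, to inspect the stored mapping for the Storage Pods of a cluster named nebula in the default namespace (both names are placeholders for your own cluster), you can read the configmap directly:

# View the pod-to-zone mapping maintained by the Operator (read it, but do not edit it).
kubectl get configmap nebula-storaged-zone -n default -o yaml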

Caution

Before ingesting data, run SHOW ZONES in nebula-console to make sure that Storage Pods are evenly distributed across zones. For zone-related commands, see Zones.
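
A minimal sketch of such a check from nebula-console, assuming the Graph service address, port, and root password of your own deployment (the angle-bracket values are placeholders):

# Check how storage hosts are distributed across zones before ingesting data.
nebula-console -addr <graphd_service_address> -port 9669 -u root -p <password> -e "SHOW ZONES"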

Cluster configuration example with zones

Example of creating a cluster configuration with zones using kubectl

The following is an example YAML configuration for creating a cluster with zones using kubectl:

apiVersion: apps.nebula-graph.io/v1alpha1
kind: NebulaCluster
metadata:
  name: nebula
  namespace: default
spec:
  # Alpine Linux image used to obtain zone information for node location.
  alpineImage: "reg.example-inc.com/xxx/xxx:latest"
  # Agent configuration for backup and restore.
  # If you don't customize this configuration, the default configuration is used.
  agent:
    image: reg.example-inc.com/xxx/xxx
    version: v3.5.0
  exporter:
    image: vesoft/nebula-stats-exporter
    replicas: 1
    maxRequests: 20
  # Console container for connecting to the cluster.
  console:
    version: "nightly"
  graphd:
    config:
      # The following parameters are required to create a cluster with Zone.
      accept_partial_success: "true"
      prioritize_intra_zone_reading: "true"
      sync_meta_when_use_space: "true"
      stick_to_intra_zone_on_failure: "false" 
      session_reclaim_interval_secs: "300"
      # The following parameters are required for log collection.
      logtostderr: "1"
      redirect_stdout: "false"
      stderrthreshold: "0" 
    resources:
      requests:
        cpu: "2"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "2Gi"
    replicas: 1
    image: reg.example-inc.com/xxx/xxx
    version: v3.5.0-sc
  metad:
    config:
      redirect_stdout: "false"
      stderrthreshold: "0"
      logtostderr: "true"
      # Once set, the zone name cannot be changed.
      # It is recommended to set an odd number of zones.
      zone_list: az1,az2,az3 
      validate_session_timestamp: "false"
    # LM access address and port number.
    licenseManagerURL: "192.168.8.xxx:9119"
    resources:
      requests:
        cpu: "300m"
        memory: "500Mi"
      limits:
        cpu: "1"
        memory: "1Gi"
    replicas: 3
    image: reg.example-inc.com/xxx/xxx
    version: v3.5.0-sc
    dataVolumeClaim:
      resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
  storaged:
    config:
      redirect_stdout: "false"
      stderrthreshold: "0"
      logtostderr: "true"
    resources:
      requests:
        cpu: "1"
        memory: "1Gi"
      limits:
        cpu: "2"
        memory: "2Gi"
    replicas: 3
    image: reg.example-inc.com/xxx/xxx
    version: v3.5.0-sc
    dataVolumeClaims:
    - resources:
        requests:
          storage: 2Gi
      storageClassName: local-path
    # Automatically balance storage data after scaling.
    enableAutoBalance: true
  reference:
    name: statefulsets.apps
    version: v1
  schedulerName: nebula-scheduler
  nodeSelector:
    nebula: cloud
  imagePullPolicy: Always
  imagePullSecrets:
  - name: nebula-image
  # Distribute storage Pods evenly among zones.
  # Must be set when using zones.
  topologySpreadConstraints:
  - topologyKey: "topology.kubernetes.io/zone"
    whenUnsatisfiable: "DoNotSchedule"
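
Assuming the manifest above is saved as nebula-cluster.yaml (the file name is arbitrary), you can create the cluster and then check the NebulaCluster resource and its Pods. The cluster name and namespace below match the metadata in the example manifest.

# Create the cluster from the manifest.
kubectl create -f nebula-cluster.yaml
# Check the custom resource and the Pods it creates.
kubectl get nebulaclusters nebula -n default
kubectl get pods -n default -o wide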

Example of creating a cluster with zones using helm

Here is an example of creating a cluster with zones using Helm. The inline comments are for explanation only; remove the comment lines before running the command, because a comment line interrupts the backslash line continuation.

helm install "<NEBULA_CLUSTER_NAME>" nebula-operator/nebula-cluster \
    # Specify the version of the cluster chart; if not specified, the latest version is installed by default.
    # Execute the helm search repo nebula-operator/nebula-cluster command to view all chart versions.
    --version=1.8.0 \
    # Specify the namespace of the cluster.
    --namespace="<NEBULA_CLUSTER_NAMESPACE>" \
    # Configure the Secret for pulling images from the private repository.
    --set imagePullSecrets[0].name="{<image-pull-secret>}" \
    --set nameOverride="<NEBULA_CLUSTER_NAME>" \
    # Configure the LM access address and port, default port is `9119`.
    --set nebula.metad.licenseManagerURL="192.168.8.XXX:9119" \
    # Configure the image addresses for various services in the cluster.
    --set nebula.graphd.image="<reg.example-inc.com/test/graphd-ent>" \
    --set nebula.metad.image="<reg.example-inc.com/test/metad-ent>" \
    --set nebula.storaged.image="<reg.example-inc.com/test/storaged-ent>" \
    --set nebula.storageClassName="<STORAGE_CLASS_NAME>" \
    # Specify the version of the Nebula cluster.
    --set nebula.version=v3.5.0 \
    # Configure Zone.
    # Once configured, the information of Zone cannot be modified. It is recommended to configure an odd number of Zones.
    --set nebula.metad.config.zone_list="<zone1,zone2,zone3>" \
    --set nebula.graphd.config.prioritize_intra_zone_reading="true" \
    --set nebula.graphd.config.stick_to_intra_zone_on_failure="false" \
    # Configure the Alpine Linux image for obtaining node Zone information.
    --set nebula.alpineImage="<reg.example-inc.com/xxx/xxx:latest>" \
    # Set the distribution of Storage Pods to different zones.
    --set nebula.topologySpreadConstraints[0].topologyKey="topology.kubernetes.io/zone" \
    --set nebula.topologySpreadConstraints[0].whenUnsatisfiable="DoNotSchedule" \
    # Schedule the restarting Graph/Storage Pods to the original zones.
    --set nebula.schedulerName="nebula-scheduler"
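
After the command completes, you can confirm that the release was installed and that the cluster Pods are running; the release name and namespace are the same placeholders used in the command above.

# Confirm the Helm release was installed.
helm status "<NEBULA_CLUSTER_NAME>" --namespace="<NEBULA_CLUSTER_NAMESPACE>"
# Check that the cluster Pods are running and note the nodes they landed on.
kubectl get pods --namespace="<NEBULA_CLUSTER_NAMESPACE>" -o wide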
