Storage load balance

You can use the SUBMIT JOB BALANCE statement to balance the distribution of partitions and Raft leaders, or to clear data off some Storage servers so that they are easier to maintain. For details, see SUBMIT JOB BALANCE.

Danger

The BALANCE commands migrate data and balance the distribution of partitions by creating and executing a set of subtasks. DO NOT stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the subsequent subtasks will fail.

Balance leader distribution

To balance the Raft leaders, run SUBMIT JOB BALANCE LEADER. The statement starts a job that balances the distribution of all the storage leaders in all graph spaces.

Example

nebula> SUBMIT JOB BALANCE LEADER;

Run SHOW HOSTS to check the balance result.

nebula> SHOW HOSTS;
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
| Host             | Port | Status   | Leader count | Leader distribution               | Partition distribution | Version              |
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
| "192.168.10.101" | 9779 | "ONLINE" | 8            | "basketballplayer:3"              | "basketballplayer:8"   | "master"             |
| "192.168.10.102" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:8"   | "master"             |
| "192.168.10.103" | 9779 | "ONLINE" | 0            | "basketballplayer:2"              | "basketballplayer:7"   | "master"             |
| "192.168.10.104" | 9779 | "ONLINE" | 0            | "basketballplayer:2"              | "basketballplayer:7"   | "master"             |
| "192.168.10.105" | 9779 | "ONLINE" | 0            | "basketballplayer:2"              | "basketballplayer:7"   | "master"             |
+------------------+------+----------+--------------+-----------------------------------+------------------------+----------------------+
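
If you check the result from a script rather than by eye, the Leader distribution column can be parsed and compared across hosts. The following Python snippet is a minimal sketch, not part of NebulaGraph: the helper names `parse_distribution` and `leader_spread` are hypothetical, the rows are copied by hand from the output above, and it assumes that multiple graph spaces appear in a cell as a comma-separated list of `space:count` items. Treating a difference of at most one leader between hosts as "balanced" is also an assumption.

```python
from collections import defaultdict


def parse_distribution(cell):
    """Turn a cell such as 'spaceA:3, spaceB:2' into {'spaceA': 3, 'spaceB': 2}."""
    result = {}
    for item in cell.split(","):
        item = item.strip()
        if ":" not in item:
            continue  # skip cells that carry no space:count pair
        space, count = item.rsplit(":", 1)
        result[space.strip()] = int(count)
    return result


def leader_spread(rows):
    """Return {space: max leaders on a host - min leaders on a host}."""
    parsed = [parse_distribution(cell) for _host, cell in rows]
    spaces = {space for dist in parsed for space in dist}
    per_space = defaultdict(list)
    for dist in parsed:
        for space in spaces:
            # A host that holds no leader for a space counts as 0.
            per_space[space].append(dist.get(space, 0))
    return {space: max(counts) - min(counts) for space, counts in per_space.items()}


# (host, "Leader distribution") pairs taken from the SHOW HOSTS output above.
rows = [
    ("192.168.10.101", "basketballplayer:3"),
    ("192.168.10.102", "basketballplayer:3"),
    ("192.168.10.103", "basketballplayer:2"),
    ("192.168.10.104", "basketballplayer:2"),
    ("192.168.10.105", "basketballplayer:2"),
]

spread = leader_spread(rows)
print(spread)  # {'basketballplayer': 1}

# Assumed criterion: no host has more than one leader above any other host.
balanced = all(diff <= 1 for diff in spread.values())
print(balanced)  # True
```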

Caution

While NebulaGraph switches the leader replica of a partition, writes to that leader replica are temporarily rejected until the switch completes. If a large number of write requests arrive during the switch, some of them fail with a request error (Storage Error E_RPC_FAILURE). See FAQ for error handling methods.

You can set the value of raft_heartbeat_interval_secs in the Storage configuration file to control the timeout period for leader replica switching. For more information on the configuration file, see Storage configuration file.
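
Because the write errors described in this caution last only until the leader switch completes, one possible client-side approach is to retry the failed write after a short delay. The following Python snippet is a sketch under that assumption and is not the handling method documented in the FAQ. The `execute` callable is hypothetical: it stands for whatever your client uses to run a statement and is assumed to return a `(succeeded, error_message)` pair. The retry count and delays are illustrative values.

```python
import time


def execute_with_retry(execute, statement, retries=3, base_delay=0.5, sleep=time.sleep):
    """Run a statement, retrying only when the error mentions E_RPC_FAILURE.

    `execute` is a hypothetical callable returning (succeeded, error_message).
    Returns the number of retries that were needed.
    """
    for attempt in range(retries + 1):
        succeeded, message = execute(statement)
        if succeeded:
            return attempt
        # Give up on other errors, or once the retry budget is used up.
        if "E_RPC_FAILURE" not in message or attempt == retries:
            raise RuntimeError(message)
        # Exponential backoff: 0.5 s, 1 s, 2 s with the default base_delay.
        sleep(base_delay * 2 ** attempt)
```

Keeping the retry budget small avoids adding load while leaders are still switching. Errors other than E_RPC_FAILURE are raised immediately so that they are not hidden by the retry loop.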

Last update: October 24, 2023