Skip to content

Storage load balance

You can use the BALANCE statements to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers.

Prerequisites

The graph spaces stored in Nebula Graph must have more than one replicas for the system to balance the distribution of partitions and Raft leaders.

Balance partition distribution

BALANCE DATA starts a task to equally distribute the storage partitions in a Nebula Graph cluster. A group of subtasks will be created and implemented to migrate data and balance the partition distribution.

Danger

DON'T stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail.

Take scaling out Nebula Graph for an example.

After you add new storage hosts into the cluster, no partition is deployed on the new hosts. You can run SHOW HOSTS to check the partition distribution.

nebual> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host        | Port | Status   | Leader count | Leader distribution               | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 4            | "basketballplayer:4"              | "basketballplayer:15"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 8            | "basketballplayer:8"              | "basketballplayer:15"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:15"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 0            | "No valid partition"              | "No valid partition"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 0            | "No valid partition"              | "No valid partition"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total"     |      |          | 15           | "basketballplayer:15"             | "basketballplayer:45"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
Got 6 rows (time spent 1002/1780 us)

Run BALANCE DATA to start balancing the storage partitions. If the partitions are already balanced, BALANCE DATA fails.

nebula> BALANCE DATA;
+------------+
| ID         |
+------------+
| 1614237867 |
+------------+
Got 1 rows (time spent 3783/4533 us)

A BALANCE task ID is returned after running BALANCE DATA. Run BALANCE DATA <balance_id> to check the status of the BALANCE task.

nebula> BALANCE DATA 1614237867;
+--------------------------------------------------------------+-------------------+
| balanceId, spaceId:partId, src->dst                          | status            |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:1, storaged1:9779->storaged3:9779]"         | "SUCCEEDED"       |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:1, storaged2:9779->storaged4:9779]"         | "SUCCEEDED"       |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:2, storaged1:9779->storaged3:9779]"         | "SUCCEEDED"       |
+--------------------------------------------------------------+-------------------+
...
+--------------------------------------------------------------+-------------------+
| "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100               |
+--------------------------------------------------------------+-------------------+
Got 23 rows (time spent 916/1528 us)

When all the subtasks succeed, the load balancing process finishes. Run SHOW HOSTS again to make sure the partition distribution is balanced.

Note

BALANCE DATA does not balance the leader distribution.

nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host        | Port | Status   | Leader count | Leader distribution               | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 4            | "basketballplayer:4"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 8            | "basketballplayer:8"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 0            | "No valid partition"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 0            | "No valid partition"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total"     |      |          | 15           | "basketballplayer:15"             | "basketballplayer:45"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
Got 6 rows (time spent 849/1420 us)

If any subtask fails, run BALANCE DATA again to restart the balancing. If redoing load balancing does not solve the problem, ask for help in the Nebula Graph community.

Stop data balancing

To stop a balance task, run BALANCE DATA STOP.

  • If no balance task is running, an error is returned.
  • If a balance task is running, the task ID is returned.

BALANCE DATA STOP does not stop the running subtasks but cancels all follow-up subtasks. The running subtasks continue.

To check the status of the stopped balance task, run BALANCE DATA <balance_id>.

Once all the subtasks are finished or stopped, you can run BALANCE DATA again to balance the partitions again.

  • If any subtask of the preceding balance task failed, Nebula Graph restarts the preceding balance task.
  • If no subtask of the preceding balance task failed, Nebula Graph starts a new balance task.

Remove storage servers

To remove specific storage servers and scale in the Storage Service, use the BALANCE DATA REMOVE <host_list> syntax.

For example, to remove the following storage servers:

Server name IP Port
storage3 192.168.0.8 19779
storage4 192.168.0.9 19779

Run the following statement:

BALANCE DATA REMOVE 192.168.0.8:19779,192.168.0.9:19779;

Nebula Graph will start a balance task, migrate the storage partitions in storage3 and storage4, and then remove them from the cluster.

Note

The removed server's state will change to OFFLINE.

Balance leader distribution

BALANCE DATA only balances the partition distribution. If the raft leader distribution is not balanced, some of the leaders may overload. To load balance the raft leaders, run BALANCE LEADER.

nebula> BALANCE LEADER;
Execution succeeded (time spent 7576/8657 us)

Run SHOW HOSTS to check the balance result.

nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host        | Port | Status   | Leader count | Leader distribution               | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 3            | "basketballplayer:3"              | "basketballplayer:9"   |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total"     |      |          | 15           | "basketballplayer:15"             | "basketballplayer:45"  |
+-------------+------+----------+--------------+-----------------------------------+------------------------+

Last update: May 10, 2021
Back to top