Storage load balance¶
You can use the BALANCE
statements to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers.
Prerequisites¶
The graph spaces stored in Nebula Graph must have more than one replicas for the system to balance the distribution of partitions and Raft leaders.
Balance partition distribution¶
BALANCE DATA
starts a task to equally distribute the storage partitions in a Nebula Graph cluster. A group of subtasks will be created and implemented to migrate data and balance the partition distribution.
Danger
DON'T stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail.
Take scaling out Nebula Graph for an example.
After you add new storage hosts into the cluster, no partition is deployed on the new hosts. You can run SHOW HOSTS
to check the partition distribution.
nebual> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:15" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:15" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:15" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
Run BALANCE DATA
to start balancing the storage partitions. If the partitions are already balanced, BALANCE DATA
fails.
nebula> BALANCE DATA;
+------------+
| ID |
+------------+
| 1614237867 |
+------------+
A BALANCE task ID is returned after running BALANCE DATA
. Run BALANCE DATA <balance_id>
to check the status of the BALANCE
task.
nebula> BALANCE DATA 1614237867;
+--------------------------------------------------------------+-------------------+
| balanceId, spaceId:partId, src->dst | status |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:1, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:1, storaged2:9779->storaged4:9779]" | "SUCCEEDED" |
+--------------------------------------------------------------+-------------------+
| "[1614237867, 11:2, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
+--------------------------------------------------------------+-------------------+
...
+--------------------------------------------------------------+-------------------+
| "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100 |
+--------------------------------------------------------------+-------------------+
When all the subtasks succeed, the load balancing process finishes. Run SHOW HOSTS
again to make sure the partition distribution is balanced.
Note
BALANCE DATA
does not balance the leader distribution.
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
If any subtask fails, run BALANCE DATA
again to restart the balancing. If redoing load balancing does not solve the problem, ask for help in the Nebula Graph community.
Stop data balancing¶
To stop a balance task, run BALANCE DATA STOP
.
- If no balance task is running, an error is returned.
- If a balance task is running, the task ID is returned.
BALANCE DATA STOP
does not stop the running subtasks but cancels all follow-up subtasks. The running subtasks continue.
To check the status of the stopped balance task, run BALANCE DATA <balance_id>
.
Once all the subtasks are finished or stopped, you can run BALANCE DATA
again to balance the partitions again.
- If any subtask of the preceding balance task failed, Nebula Graph restarts the preceding balance task.
- If no subtask of the preceding balance task failed, Nebula Graph starts a new balance task.
RESET a balance task¶
If a balance task fails to be restarted after being stopped, run BALANCE DATA RESET PLAN
to reset the task. After that, run BALANCE DATA
again to start a new balance task.
Remove storage servers¶
To remove specific storage servers and scale in the Storage Service, use the BALANCE DATA REMOVE <host_list>
syntax.
For example, to remove the following storage servers:
Server name | IP | Port |
---|---|---|
storage3 | 192.168.0.8 | 19779 |
storage4 | 192.168.0.9 | 19779 |
Run the following statement:
BALANCE DATA REMOVE 192.168.0.8:19779,192.168.0.9:19779;
Nebula Graph will start a balance task, migrate the storage partitions in storage3 and storage4, and then remove them from the cluster.
Note
The removed server's state will be changed to OFFLINE
by default after 24 hours, which is configurable via removed_threshold_sec
in the nebula-metad.conf
file. Unit: second.
Balance leader distribution¶
BALANCE DATA
only balances the partition distribution. If the raft leader distribution is not balanced, some of the leaders may overload. To load balance the raft leaders, run BALANCE LEADER
.
nebula> BALANCE LEADER;
Run SHOW HOSTS
to check the balance result.
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged1" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged3" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged4" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+