Storage load balance¶
You can use the BALANCE statement to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers.
Balance partition distribution¶
BALANCE DATA starts a task that distributes the storage partitions evenly across a NebulaGraph cluster. The task creates and runs a group of subtasks that migrate data and balance the partition distribution.
Danger
DO NOT stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail.
Examples¶
After you add new storage hosts into the cluster, no partition is deployed on the new hosts.
- Run SHOW HOSTS to check the partition distribution.
  nebula> SHOW HOSTS;
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
  | Host | Port | Status | Leader count | Leader distribution | Partition distribution |
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
  | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:15" |
  | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:15" |
  | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:15" |
  | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
  | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" |
  | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
- Run BALANCE DATA to start balancing the storage partitions. If the partitions are already balanced, BALANCE DATA fails.
  nebula> BALANCE DATA;
  +------------+
  | ID |
  +------------+
  | 1614237867 |
  +------------+
- A BALANCE task ID is returned after running BALANCE DATA. Run BALANCE DATA <balance_id> to check the status of the BALANCE task.
  nebula> BALANCE DATA 1614237867;
  +--------------------------------------------------------------+-------------------+
  | balanceId, spaceId:partId, src->dst | status |
  +--------------------------------------------------------------+-------------------+
  | "[1614237867, 11:1, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
  | "[1614237867, 11:1, storaged2:9779->storaged4:9779]" | "SUCCEEDED" |
  | "[1614237867, 11:2, storaged1:9779->storaged3:9779]" | "SUCCEEDED" |
  ...
  | "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100 |
  +--------------------------------------------------------------+-------------------+
- When all the subtasks succeed, the load balancing process finishes. Run SHOW HOSTS again to make sure the partition distribution is balanced.
  Note
  BALANCE DATA does not balance the leader distribution. For more information, see Balance leader distribution.
  nebula> SHOW HOSTS;
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
  | Host | Port | Status | Leader count | Leader distribution | Partition distribution |
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
  | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:9" |
  | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:9" |
  | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
  | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
  | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" |
  | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
  +-------------+------+----------+--------------+-----------------------------------+------------------------+
If any subtask fails, run BALANCE DATA again to restart the balancing. If redoing load balancing does not solve the problem, ask for help in the NebulaGraph community.
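The partition counts in the example follow from simple arithmetic: the 45 partition replicas of basketballplayer sit on 3 hosts before balancing (15 each) and on 5 hosts afterwards (9 each). The Python sketch below reproduces those figures. It assumes replicas are spread as evenly as possible, which may not match the balancer's behavior in every cluster, and the function name expected_replicas_per_host is hypothetical rather than part of NebulaGraph.

```python
from __future__ import annotations


def expected_replicas_per_host(total_replicas: int, host_count: int) -> list[int]:
    """Spread replicas as evenly as possible (an assumption, not NebulaGraph's algorithm).

    When the division is not exact, the first `remainder` hosts hold one extra replica.
    """
    if host_count <= 0:
        raise ValueError("host_count must be positive")
    base, remainder = divmod(total_replicas, host_count)
    return [base + 1 if i < remainder else base for i in range(host_count)]


# Figures from the SHOW HOSTS output in the example: 45 replicas in total.
before = expected_replicas_per_host(45, 3)  # only the three original hosts hold data
after = expected_replicas_per_host(45, 5)   # two new hosts share the load
print(before, after)
```

Comparing these expected values with the Partition distribution column is a quick sanity check after a balance task finishes.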
Stop data balancing¶
To stop a balance task, run BALANCE DATA STOP.
- If no balance task is running, an error is returned.
- If a balance task is running, the task ID (balance_id) is returned.
BALANCE DATA STOP does not stop the running subtasks but cancels all follow-up subtasks. To check the status of the stopped balance task, run BALANCE DATA <balance_id>.
Once all the subtasks are finished or stopped, you can run BALANCE DATA to balance the partitions again.
- If any subtask of the preceding balance task fails, NebulaGraph restarts the preceding balance task.
- If no subtask of the preceding balance task fails, NebulaGraph starts a new balance task.
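The two rules above depend on the counters in the last row of the BALANCE DATA <balance_id> output. As a rough illustration, the Python sketch below parses that summary string and reports what a further BALANCE DATA would do. The function names and the return labels ("wait", "restart", "new") are hypothetical, and the Invalid counter is ignored because this page does not describe how it affects the outcome.

```python
from __future__ import annotations

import re


def parse_balance_summary(summary: str) -> dict[str, int]:
    """Turn 'Total:22, Succeeded:22, ...' into {'Total': 22, 'Succeeded': 22, ...}."""
    pairs = re.findall(r"([A-Za-z ]+):(\d+)", summary)
    return {name.strip(): int(count) for name, count in pairs}


def next_balance_action(summary: str) -> str:
    """Apply the rules described above to a summary row.

    'wait'    : subtasks are still in progress
    'restart' : a subtask failed, so the preceding task is restarted
    'new'     : no subtask failed, so a new task is started
    """
    counts = parse_balance_summary(summary)
    if counts.get("In Progress", 0) > 0:
        return "wait"
    if counts.get("Failed", 0) > 0:
        return "restart"
    return "new"


# Summary row copied from the example output on this page.
row = "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0"
print(parse_balance_summary(row), next_balance_action(row))
```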
Reset a balance task¶
If a balance task fails to be restarted after being stopped, run BALANCE DATA RESET PLAN to reset the task. After that, run BALANCE DATA again to start a new balance task.
Remove storage servers¶
To remove specified storage servers and scale in the Storage Service, run BALANCE DATA REMOVE <host_list>.
Example¶
To remove the following storage servers:
| Server name | IP address | Port |
|---|---|---|
| storage3 | 192.168.0.8 | 9779 |
| storage4 | 192.168.0.9 | 9779 |
Run the following command:
BALANCE DATA REMOVE 192.168.0.8:9779,192.168.0.9:9779;
NebulaGraph will start a balance task, migrate the storage partitions in storage3 and storage4, and then remove them from the cluster.
Note
The state of the removed server will change to OFFLINE. This record will be deleted after one day. To retain it, you can change the meta configuration removed_threshold_sec.
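When the host list comes from a script or an inventory file, the statement can be assembled programmatically. The Python sketch below only builds the statement text in the form shown in the example (comma-separated ip:port pairs); it does not connect to NebulaGraph, and the helper name build_remove_statement is hypothetical.

```python
from __future__ import annotations


def build_remove_statement(hosts: list[tuple[str, int]]) -> str:
    """Build a BALANCE DATA REMOVE statement from (ip, port) pairs."""
    if not hosts:
        raise ValueError("at least one host is required")
    host_list = ",".join(f"{ip}:{port}" for ip, port in hosts)
    return f"BALANCE DATA REMOVE {host_list};"


# The two servers from the table in the example.
statement = build_remove_statement([("192.168.0.8", 9779), ("192.168.0.9", 9779)])
print(statement)
```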
Balance leader distribution¶
BALANCE DATA only balances the partition distribution. If the Raft leader distribution is not balanced, some of the leaders may be overloaded. To balance the Raft leaders, run BALANCE LEADER.
Example¶
nebula> BALANCE LEADER;
Run SHOW HOSTS to check the balance result.
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged1" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged3" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged4" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
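To check the result in a script rather than by eye, the Leader count column can be read from the SHOW HOSTS text. The Python sketch below does so for output pasted as a string. It treats the leaders as balanced when the per-host counts differ by at most one, which is an assumption made for illustration and not a definition taken from NebulaGraph; the function names are hypothetical.

```python
from __future__ import annotations


def leader_counts(show_hosts_output: str) -> dict[str, int]:
    """Extract {host: leader count} from SHOW HOSTS text, skipping borders, header, and Total."""
    counts: dict[str, int] = {}
    for line in show_hosts_output.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 6 or not cells[3].isdigit():
            continue  # border line or header row
        host = cells[0].strip('"')
        if host != "Total":
            counts[host] = int(cells[3])
    return counts


def leaders_balanced(counts: dict[str, int]) -> bool:
    """Assumed criterion: per-host leader counts differ by at most one."""
    return bool(counts) and max(counts.values()) - min(counts.values()) <= 1


# A shortened copy of the output above.
sample = """
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
| "storaged0" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged1" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
"""
print(leader_counts(sample), leaders_balanced(leader_counts(sample)))
```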
Caution
In NebulaGraph 2.6.2, switching leaders will cause a large number of short-term request errors (Storage Error E_RPC_FAILURE). For solutions, see FAQ.