Storage load balance¶
You can use the BALANCE
statement to balance the distribution of partitions and Raft leaders, or remove redundant Storage servers.
Balance partition distribution¶
BALANCE DATA
starts a task to equally distribute the storage partitions in a NebulaGraph cluster. A group of subtasks will be created and implemented to migrate data and balance the partition distribution.
Danger
DO NOT stop any machine in the cluster or change its IP address until all the subtasks finish. Otherwise, the follow-up subtasks fail.
Examples¶
After you add new storage hosts into the cluster, no partition is deployed on the new hosts.
-
Run
SHOW HOSTS
to check the partition distribution.nebual> SHOW HOSTS; +-------------+------+----------+--------------+-----------------------------------+------------------------+ | Host | Port | Status | Leader count | Leader distribution | Partition distribution | +-------------+------+----------+--------------+-----------------------------------+------------------------+ | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:15" | | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:15" | | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:15" | | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "No valid partition" | | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" | +-------------+------+----------+--------------+-----------------------------------+------------------------+
-
Run
BALANCE DATA
to start balancing the storage partitions. If the partitions are already balanced,BALANCE DATA
fails.nebula> BALANCE DATA; +------------+ | ID | +------------+ | 1614237867 | +------------+
-
A BALANCE task ID is returned after running
BALANCE DATA
. RunBALANCE DATA <balance_id>
to check the status of theBALANCE
task.nebula> BALANCE DATA 1614237867; +--------------------------------------------------------------+-------------------+ | balanceId, spaceId:partId, src->dst | status | +--------------------------------------------------------------+-------------------+ | "[1614237867, 11:1, storaged1:9779->storaged3:9779]" | "SUCCEEDED" | | "[1614237867, 11:1, storaged2:9779->storaged4:9779]" | "SUCCEEDED" | | "[1614237867, 11:2, storaged1:9779->storaged3:9779]" | "SUCCEEDED" | ... | "Total:22, Succeeded:22, Failed:0, In Progress:0, Invalid:0" | 100 | +--------------------------------------------------------------+-------------------+
-
When all the subtasks succeed, the load balancing process finishes. Run
SHOW HOSTS
again to make sure the partition distribution is balanced.Note
BALANCE DATA
does not balance the leader distribution. For more information, see Balance leader distribution.nebula> SHOW HOSTS; +-------------+------+----------+--------------+-----------------------------------+------------------------+ | Host | Port | Status | Leader count | Leader distribution | Partition distribution | +-------------+------+----------+--------------+-----------------------------------+------------------------+ | "storaged0" | 9779 | "ONLINE" | 4 | "basketballplayer:4" | "basketballplayer:9" | | "storaged1" | 9779 | "ONLINE" | 8 | "basketballplayer:8" | "basketballplayer:9" | | "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" | | "storaged3" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" | | "storaged4" | 9779 | "ONLINE" | 0 | "No valid partition" | "basketballplayer:9" | | "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" | +-------------+------+----------+--------------+-----------------------------------+------------------------+
If any subtask fails, run BALANCE DATA
again to restart the balancing. If redoing load balancing does not solve the problem, ask for help in the NebulaGraph community.
Stop data balancing¶
To stop a balance task, run BALANCE DATA STOP
.
- If no balance task is running, an error is returned.
- If a balance task is running, the task ID (
balance_id
) is returned.
BALANCE DATA STOP
does not stop the running subtasks but cancels all follow-up subtasks. To check the status of the stopped balance task, run BALANCE DATA <balance_id>
.
Once all the subtasks are finished or stopped, you can run BALANCE DATA
again to balance the partitions again.
- If any subtask of the preceding balance task fails, NebulaGraph restarts the preceding balance task.
- If no subtask of the preceding balance task fails, NebulaGraph starts a new balance task.
RESET a balance task¶
If a balance task fails to be restarted after being stopped, run BALANCE DATA RESET PLAN
to reset the task. After that, run BALANCE DATA
again to start a new balance task.
Remove storage servers¶
To remove specified storage servers and scale in the Storage Service, run BALANCE DATA REMOVE <host_list>
.
Example¶
To remove the following storage server,
Server name | IP address | Port |
---|---|---|
storage3 | 192.168.0.8 | 9779 |
storage4 | 192.168.0.9 | 9779 |
Run the following command:
BALANCE DATA REMOVE 192.168.0.8:9779,192.168.0.9:9779;
NebulaGraph will start a balance task, migrate the storage partitions in storage3 and storage4, and then remove them from the cluster.
Note
The state of the removed server will change to OFFLINE
. This record will be deleted after one day. To retain it, you can change the meta configuration removed_threshold_sec
.
Balance leader distribution¶
BALANCE DATA
only balances the partition distribution. If the raft leader distribution is not balanced, some of the leaders may overload. To balance the raft leaders, run BALANCE LEADER
.
Example¶
nebula> BALANCE LEADER;
Run SHOW HOSTS
to check the balance result.
nebula> SHOW HOSTS;
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| Host | Port | Status | Leader count | Leader distribution | Partition distribution |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
| "storaged0" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged1" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged2" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged3" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "storaged4" | 9779 | "ONLINE" | 3 | "basketballplayer:3" | "basketballplayer:9" |
| "Total" | | | 15 | "basketballplayer:15" | "basketballplayer:45" |
+-------------+------+----------+--------------+-----------------------------------+------------------------+
Caution
In NebulaGraph 2.6.2, switching leaders will cause a large number of short-term request errors (Storage Error E_RPC_FAILURE
). For solutions, see FAQ.