Job manager and the JOB statements¶

The long-term tasks run by the Storage Service are called jobs, such as COMPACT, FLUSH, and STATS. These jobs can be time-consuming if the data amount in the graph space is large. The job manager helps you run, show, stop, and recover jobs.

Note

All job management commands can be executed only after selecting a graph space.

SUBMIT JOB BALANCE DATA¶

Enterpriseonly

Only available for the NebulaGraph Enterprise Edition.

Caution

Before performing the job, it is recommended to create a snapshot.
During job execution, do not execute other jobs, such as SUBMIT JOB STATS, REBUILD INDEX, etc.
During job execution, it is recommended not to write or read data in large batches.

The SUBMIT JOB BALANCE DATA statement starts a job to balance the distribution of storage partitions in the current graph space. It returns the job ID.

For example:

nebula> SUBMIT JOB BALANCE DATA;
+------------+
| New Job Id |
+------------+
| 28         |
+------------+

SUBMIT JOB BALANCE DATA REMOVE¶

Enterpriseonly

Only available for the NebulaGraph Enterprise Edition.

Starts a job to balance the distribution of storage partitions in the current graph space. The default port is 9779. It returns the job ID.

For example:

nebula> SUBMIT JOB BALANCE DATA REMOVE 192.168.8.100:9779;
+------------+
| New Job Id |
+------------+
| 29         |
+------------+

SUBMIT JOB BALANCE LEADER¶

Starts a job to balance the distribution of all the storage leaders in all graph spaces. It returns the job ID.

For example:

nebula> SUBMIT JOB BALANCE LEADER;
+------------+
| New Job Id |
+------------+
| 33         |
+------------+

SUBMIT JOB COMPACT¶

The SUBMIT JOB COMPACT statement triggers the long-term RocksDB compact operation in the current graph space.

For more information about compact configuration, see Storage Service configuration.

For example:

nebula> SUBMIT JOB COMPACT;
+------------+
| New Job Id |
+------------+
| 40         |
+------------+

SUBMIT JOB FLUSH¶

The SUBMIT JOB FLUSH statement writes the RocksDB memfile in the memory to the hard disk in the current graph space.

For example:

nebula> SUBMIT JOB FLUSH;
+------------+
| New Job Id |
+------------+
| 96         |
+------------+

SUBMIT JOB STATS¶

The SUBMIT JOB STATS statement starts a job that makes the statistics of the current graph space. Once this job succeeds, you can use the SHOW STATS statement to list the statistics. For more information, see SHOW STATS.

Note

If the data stored in the graph space changes, in order to get the latest statistics, you have to run SUBMIT JOB STATS again.

For example:

nebula> SUBMIT JOB STATS;
+------------+
| New Job Id |
+------------+
| 9          |
+------------+

SUBMIT JOB DOWNLOAD/INGEST¶

The SUBMIT JOB DOWNLOAD HDFS and SUBMIT JOB INGEST commands are used to import the SST file into NebulaGraph. For detail, see Import data from SST files.

The SUBMIT JOB DOWNLOAD HDFS command will download the SST file on the specified HDFS.

The SUBMIT JOB INGEST command will import the downloaded SST file into NebulaGraph.

For example:

nebula> SUBMIT JOB DOWNLOAD HDFS "hdfs://192.168.10.100:9000/sst";
+------------+
| New Job Id |
+------------+
| 10         |
+------------+
nebula> SUBMIT JOB INGEST;
+------------+
| New Job Id |
+------------+
| 11         |
+------------+

SHOW JOB¶

The Meta Service parses a SUBMIT JOB request into multiple tasks and assigns them to the nebula-storaged processes. The SHOW JOB <job_id> statement shows the information about a specific job and all its tasks in the current graph space.

job_id is returned when you run the SUBMIT JOB statement.

For example:

nebula> SHOW JOB 9;
+----------------+-----------------+------------+----------------------------+----------------------------+-------------+
| Job Id(TaskId) | Command(Dest)   | Status     | Start Time                 | Stop Time                  | Error Code  |
+----------------+-----------------+------------+----------------------------+----------------------------+-------------+
| 8              | "STATS"         | "FINISHED" | 2022-10-18T08:14:45.000000 | 2022-10-18T08:14:45.000000 | "SUCCEEDED" |
| 0              | "192.168.8.129" | "FINISHED" | 2022-10-18T08:14:45.000000 | 2022-10-18T08:15:13.000000 | "SUCCEEDED" |
| "Total:1"      | "Succeeded:1"   | "Failed:0" | "In Progress:0"            | ""                         | ""          |
+----------------+-----------------+------------+----------------------------+----------------------------+-------------+

The descriptions are as follows.

Parameter	Description
`Job Id(TaskId)`	The first row shows the job ID and the other rows show the task IDs and the last row shows the total number of job-related tasks.
`Command(Dest)`	The first row shows the command executed and the other rows show on which storaged processes the task is running. The last row shows the number of successful tasks related to the job.
`Status`	Shows the status of the job or task. The last row shows the number of failed tasks related to the job. For more information, see Job status.
`Start Time`	Shows a timestamp indicating the time when the job or task enters the `RUNNING` phase. The last row shows the number of ongoing tasks related to the job.
`Stop Time`	Shows a timestamp indicating the time when the job or task gets `FINISHED`, `FAILED`, or `STOPPED`.
`Error Code`	The error code of job.

Job status¶

The descriptions are as follows.

Status	Description
QUEUE	The job or task is waiting in a queue. The `Start Time` is empty in this phase.
RUNNING	The job or task is running. The `Start Time` shows the beginning time of this phase.
FINISHED	The job or task is successfully finished. The `Stop Time` shows the time when the job or task enters this phase.
FAILED	The job or task has failed. The `Stop Time` shows the time when the job or task enters this phase.
STOPPED	The job or task is stopped without running. The `Stop Time` shows the time when the job or task enters this phase.
REMOVED	The job or task is removed.

The description of switching the status is described as follows.

Queue -- running -- finished -- removed
     \          \                /
      \          \ -- failed -- /
       \          \            /
        \ ---------- stopped -/

SHOW JOBS¶

The SHOW JOBS statement lists all the unexpired jobs in the current graph space.

The default job expiration interval is one week. You can change it by modifying the job_expired_secs parameter of the Meta Service. For how to modify job_expired_secs, see Meta Service configuration.

For example:

nebula> SHOW JOBS;
+--------+---------------------+------------+----------------------------+----------------------------+
| Job Id | Command             | Status     | Start Time                 | Stop Time                  |
+--------+---------------------+------------+----------------------------+----------------------------+
| 34     | "STATS"             | "FINISHED" | 2021-11-01T03:32:27.000000 | 2021-11-01T03:32:27.000000 |
| 33     | "FLUSH"             | "FINISHED" | 2021-11-01T03:32:15.000000 | 2021-11-01T03:32:15.000000 |
| 32     | "COMPACT"           | "FINISHED" | 2021-11-01T03:32:06.000000 | 2021-11-01T03:32:06.000000 |
| 31     | "REBUILD_TAG_INDEX" | "FINISHED" | 2021-10-29T05:39:16.000000 | 2021-10-29T05:39:17.000000 |
| 10     | "COMPACT"           | "FINISHED" | 2021-10-26T02:27:05.000000 | 2021-10-26T02:27:05.000000 |
+--------+---------------------+------------+----------------------------+----------------------------+

STOP JOB¶

The STOP JOB <job_id> statement stops jobs that are not finished in the current graph space.

For example:

nebula> STOP JOB 22;
+---------------+
| Result        |
+---------------+
| "Job stopped" |
+---------------+

RECOVER JOB¶

The RECOVER JOB [<job_id>] statement re-executes the jobs that status is FAILED or STOPPED in the current graph space and returns the number of recovered jobs. If <job_id> is not specified, re-execution is performed from the earliest job and the number of jobs that have been recovered is returned.

For example:

nebula> RECOVER JOB;
+-------------------+
| Recovered job num |
+-------------------+
| 5 job recovered   |
+-------------------+

FAQ¶

How to troubleshoot job problems?¶

The SUBMIT JOB operations use the HTTP port. Please check if the HTTP ports on the machines where the Storage Service is running are working well. You can use the following command to debug.

curl "http://{storaged-ip}:19779/admin?space={space_name}&op=compact"

Last update: June 1, 2023