Query NebulaGraph metrics

NebulaGraph exposes its monitoring metrics through the HTTP port of each service.

Metrics structure

Each NebulaGraph metric consists of three fields: name, type, and time range. The fields are separated by periods, for example, `num_queries.sum.600`. Different NebulaGraph services (Graph, Storage, or Meta) support different metrics. The fields are described as follows.

| Field | Example | Description |
| --- | --- | --- |
| Metric name | `num_queries` | Indicates the function of the metric. |
| Metric type | `sum` | Indicates how the metric is aggregated over the time range. Supported types are SUM, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9. |
| Time range | `600` | The time range in seconds for the metric collection. Supported values are 5, 60, 600, and 3600, representing the last 5 seconds, 1 minute, 10 minutes, and 1 hour. |
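
The three-field structure can be taken apart with plain shell parameter expansion. The following is a minimal sketch, not part of NebulaGraph: the function names `split_metric` and `is_supported_range` are hypothetical, and the split assumes that the metric name itself contains no period.

```shell
# Split a metric such as num_queries.sum.600 into name, type, and time range.
# Assumption: the metric name contains no period, so the first period ends
# the name and the last period starts the time range.
split_metric() {
  metric=$1
  name=${metric%%.*}          # text before the first period
  range=${metric##*.}         # text after the last period
  mtype=${metric#"$name".}    # strip the leading "<name>."
  mtype=${mtype%."$range"}    # strip the trailing ".<range>"
  printf '%s %s %s\n' "$name" "$mtype" "$range"
}

# Succeed only for the time ranges listed in the table above.
is_supported_range() {
  case $1 in
    5|60|600|3600) return 0 ;;
    *) return 1 ;;
  esac
}

split_metric num_queries.sum.600    # prints: num_queries sum 600
```

A script could call `is_supported_range` before sending a request, so that a mistyped time range is caught locally.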

Space-level metrics

The Graph service supports a set of space-level metrics that record the information of different graph spaces separately.

To enable space-level metrics, set the value of `enable_space_level_metrics` to `true` in the Graph service configuration file before starting NebulaGraph. For details about how to modify the configuration, see Configuration Management.

Note

Space-level metrics can be queried only by querying all metrics of the Graph service. For example, run `curl -G "http://192.168.8.40:19669/stats"` to show all metrics. In the returned result, each space-level metric carries the graph space name in the form of `{space=<space_name>}`, such as `num_active_queries{space=basketballplayer}.sum.5=0`.
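
Because space-level metrics only appear in the full listing, a filter is usually needed to isolate one graph space. The following is a minimal sketch; the function name `filter_space_metrics` is hypothetical, and the sample input values are made up for illustration.

```shell
# Keep only the lines that belong to one graph space from /stats output
# read on stdin. grep -F matches the "{space=<name>}" label literally.
filter_space_metrics() {
  grep -F "{space=$1}"
}

# Sample input in the format described above (illustrative values).
printf '%s\n' \
  'num_active_queries{space=basketballplayer}.sum.5=0' \
  'num_queries{space=other_space}.sum.5=3' \
  'num_queries.sum.5=3' |
  filter_space_metrics basketballplayer
```

Against a running Graph service, the same filter would be applied to the live output, for example `curl -G "http://<ip>:<port>/stats" | filter_space_metrics basketballplayer`.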

Query metrics over HTTP

Syntax

```
curl -G "http://<ip>:<port>/stats?stats=<metric_name_list>[&format=json]"
```

| Parameter | Description |
| --- | --- |
| `ip` | The IP address of the server. You can find it in the configuration file in the installation directory. |
| `port` | The HTTP port of the server. You can find it in the configuration file in the installation directory. The default ports are 19559 (Meta), 19669 (Graph), and 19779 (Storage). |
| `metric_name_list` | The metric names. Multiple metrics are separated by commas (,). |
| `&format=json` | Optional. Returns the result in the JSON format. |
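
The syntax above can be assembled by a small helper so that scripts do not concatenate the URL by hand. The following is a minimal sketch; the function name `build_stats_url` and its argument order are hypothetical. It also assumes that `format=json` is accepted when no metric list is given, which the syntax above does not confirm.

```shell
# Build the /stats URL from an IP address, a port, an output format, and
# zero or more metric names.
# Usage: build_stats_url <ip> <port> <text|json> [metric ...]
build_stats_url() {
  ip=$1
  port=$2
  format=$3
  shift 3
  query=""
  if [ "$#" -gt 0 ]; then
    # "$*" joins the metric names with the first character of IFS (a comma).
    query="stats=$(IFS=,; printf '%s' "$*")"
  fi
  if [ "$format" = json ]; then
    # Assumption: format=json also works without a stats list.
    query="${query:+$query&}format=json"
  fi
  printf '%s\n' "http://$ip:$port/stats${query:+?$query}"
}

build_stats_url 192.168.8.40 19669 text num_queries.sum.600
```

The result can be passed straight to curl, for example `curl -G "$(build_stats_url 192.168.8.40 19669 text num_queries.sum.600)"`.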

Note

If NebulaGraph is deployed with Docker Compose, run `docker-compose ps` to check which host ports are mapped to the service ports inside the containers, and then query the metrics through the mapped ports.

Examples

Query a single metric

Query the number of queries that the Graph service processed in the last 10 minutes.

```
$ curl -G "http://192.168.8.40:19669/stats?stats=num_queries.sum.600"
num_queries.sum.600=400
```

Query multiple metrics

Query the following metrics together:
- The average heartbeat latency in the last 1 minute.
- The P99 heartbeat latency, that is, the latency that 99% of heartbeats do not exceed, in the last 10 minutes.
```
$ curl -G "http://192.168.8.40:19559/stats?stats=heartbeat_latency_us.avg.60,heartbeat_latency_us.p99.600"
heartbeat_latency_us.avg.60=281
heartbeat_latency_us.p99.600=985
```
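
The plain-text response is a list of `name=value` lines, so a script can extract a single value with awk. The following is a minimal sketch; the function name `metric_value` is hypothetical. It splits each line at the last `=` so that space-level metrics, whose names contain `{space=...}`, are handled as well.

```shell
# Print the value of one metric from "name=value" lines read on stdin.
# The line is split at its last "=" because a space-level metric name
# contains an "=" of its own.
metric_value() {
  awk -v key="$1" '{
    n = split($0, parts, "=")
    if (n < 2) next
    value = parts[n]
    name = substr($0, 1, length($0) - length(value) - 1)
    if (name == key) print value
  }'
}

# Sample input copied from the response above.
printf '%s\n' \
  'heartbeat_latency_us.avg.60=281' \
  'heartbeat_latency_us.p99.600=985' |
  metric_value heartbeat_latency_us.p99.600    # prints: 985
```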

Return a JSON result

Query the number of new vertices in the Storage Service in the last 10 minutes and return the result in the JSON format.
```
$ curl -G "http://192.168.8.40:19779/stats?stats=num_add_vertices.sum.600&format=json"
[{"value":1,"name":"num_add_vertices.sum.600"}]
```

Query all metrics in a service

If no metric is specified in the query, NebulaGraph returns all metrics in the service.

```
$ curl -G "http://192.168.8.40:19559/stats"
heartbeat_latency_us.avg.5=304
heartbeat_latency_us.avg.60=308
heartbeat_latency_us.avg.600=299
heartbeat_latency_us.avg.3600=285
heartbeat_latency_us.p75.5=652
heartbeat_latency_us.p75.60=669
heartbeat_latency_us.p75.600=651
heartbeat_latency_us.p75.3600=642
heartbeat_latency_us.p95.5=930
heartbeat_latency_us.p95.60=963
heartbeat_latency_us.p95.600=933
heartbeat_latency_us.p95.3600=929
heartbeat_latency_us.p99.5=986
heartbeat_latency_us.p99.60=1409
heartbeat_latency_us.p99.600=989
heartbeat_latency_us.p99.3600=986
num_heartbeats.rate.5=0
num_heartbeats.rate.60=0
num_heartbeats.rate.600=0
num_heartbeats.rate.3600=0
num_heartbeats.sum.5=2
num_heartbeats.sum.60=40
num_heartbeats.sum.600=394
num_heartbeats.sum.3600=2364
...
```
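
The full listing repeats every metric once per type and time range, so it can help to reduce it to the distinct metric names that a service exposes. The following is a minimal sketch; the function name `list_metric_names` is hypothetical, and it assumes that the name is the text before the first period on each line.

```shell
# List the distinct metric names found in /stats output read on stdin.
# Assumption: the metric name is the text before the first period.
list_metric_names() {
  cut -d. -f1 | sort -u
}

# Sample input taken from the listing above.
printf '%s\n' \
  'num_heartbeats.sum.5=2' \
  'heartbeat_latency_us.avg.5=304' \
  'heartbeat_latency_us.p99.60=1409' \
  'num_heartbeats.sum.60=40' |
  list_metric_names
```

For the sample input, this prints `heartbeat_latency_us` and `num_heartbeats`, one per line.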

Metric description

Graph

| Metric | Description |
| --- | --- |
| `num_active_queries` | The number of queries currently being executed. |
| `num_active_sessions` | The number of currently active sessions. |
| `num_aggregate_executors` | The number of executions for the Aggregation operator. |
| `num_auth_failed_sessions_bad_username_password` | The number of sessions where authentication failed due to incorrect username and password. |
| `num_auth_failed_sessions_out_of_max_allowed` | The number of sessions that failed to authenticate logins because the value of the parameter `FLAG_OUT_OF_MAX_ALLOWED_CONNECTIONS` was exceeded. |
| `num_auth_failed_sessions` | The number of sessions in which login authentication failed. |
| `num_indexscan_executors` | The number of executions for index scan operators. |
| `num_killed_queries` | The number of killed queries. |
| `num_opened_sessions` | The number of sessions connected to the server. |
| `num_queries` | The number of queries. |
| `num_query_errors_leader_changes` | The number of query errors caused by Raft leader changes. |
| `num_query_errors` | The number of query errors. |
| `num_reclaimed_expired_sessions` | The number of expired sessions actively reclaimed by the server. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_storaged_failed` | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
| `num_rpc_sent_to_storaged` | The number of RPC requests that the Graphd service sent to the Storaged service. |
| `num_sentences` | The number of statements received by the Graphd service. |
| `num_slow_queries` | The number of slow queries. |
| `num_sort_executors` | The number of executions for the Sort operator. |
| `optimizer_latency_us` | The latency of executing optimizer statements. |
| `query_latency_us` | The average latency of queries. |
| `slow_query_latency_us` | The average latency of slow queries. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |

Meta

| Metric | Description |
| --- | --- |
| `commit_log_latency_us` | The latency of committing logs in Raft. |
| `commit_snapshot_latency_us` | The latency of committing snapshots in Raft. |
| `heartbeat_latency_us` | The latency of heartbeats. |
| `num_heartbeats` | The number of heartbeats. |
| `num_raft_votes` | The number of votes in Raft. |
| `transfer_leader_latency_us` | The latency of transferring the Raft leader. |
| `num_agent_heartbeats` | The number of heartbeats for the AgentHBProcessor. |
| `agent_heartbeat_latency_us` | The average latency of the AgentHBProcessor. |
| `replicate_log_latency_us` | The latency of replicating the log record to most nodes by Raft. |
| `num_send_snapshot` | The number of times that Raft sends snapshots to other nodes. |
| `append_log_latency_us` | The latency of replicating the log record to a single node by Raft. |
| `append_wal_latency_us` | The Raft write latency for a single WAL. |
| `num_grant_votes` | The number of times that Raft votes for other nodes. |
| `num_start_elect` | The number of times that Raft starts an election. |

Storage

| Metric | Description |
| --- | --- |
| `add_edges_atomic_latency_us` | The average latency of atomic edge additions. |
| `add_edges_latency_us` | The average latency of adding edges. |
| `add_vertices_latency_us` | The average latency of adding vertices. |
| `commit_log_latency_us` | The latency of committing logs in Raft. |
| `commit_snapshot_latency_us` | The latency of committing snapshots in Raft. |
| `delete_edges_latency_us` | The average latency of deleting edges. |
| `delete_vertices_latency_us` | The average latency of deleting vertices. |
| `get_neighbors_latency_us` | The average latency of querying neighbor vertices. |
| `num_get_prop` | The number of executions for the GetPropProcessor. |
| `num_get_neighbors_errors` | The number of execution errors for the GetNeighborsProcessor. |
| `get_prop_latency_us` | The average latency of executions for the GetPropProcessor. |
| `num_edges_deleted` | The number of deleted edges. |
| `num_edges_inserted` | The number of inserted edges. |
| `num_raft_votes` | The number of votes in Raft. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Storaged service sent to the Metad service. |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Storaged service sent to the Metad service. |
| `num_tags_deleted` | The number of deleted tags. |
| `num_vertices_deleted` | The number of deleted vertices. |
| `num_vertices_inserted` | The number of inserted vertices. |
| `transfer_leader_latency_us` | The latency of transferring the Raft leader. |
| `lookup_latency_us` | The average latency of executions for the LookupProcessor. |
| `num_lookup_errors` | The number of execution errors for the LookupProcessor. |
| `num_scan_vertex` | The number of executions for the ScanVertexProcessor. |
| `num_scan_vertex_errors` | The number of execution errors for the ScanVertexProcessor. |
| `update_edge_latency_us` | The average latency of executions for the UpdateEdgeProcessor. |
| `num_update_vertex` | The number of executions for the UpdateVertexProcessor. |
| `num_update_vertex_errors` | The number of execution errors for the UpdateVertexProcessor. |
| `kv_get_latency_us` | The average latency of executions for the GetProcessor. |
| `kv_put_latency_us` | The average latency of executions for the PutProcessor. |
| `kv_remove_latency_us` | The average latency of executions for the RemoveProcessor. |
| `num_kv_get_errors` | The number of execution errors for the GetProcessor. |
| `num_kv_get` | The number of executions for the GetProcessor. |
| `num_kv_put_errors` | The number of execution errors for the PutProcessor. |
| `num_kv_put` | The number of executions for the PutProcessor. |
| `num_kv_remove_errors` | The number of execution errors for the RemoveProcessor. |
| `num_kv_remove` | The number of executions for the RemoveProcessor. |
| `forward_tranx_latency_us` | The average latency of transmission. |
| `scan_edge_latency_us` | The average latency of executions for the ScanEdgeProcessor. |
| `num_scan_edge_errors` | The number of execution errors for the ScanEdgeProcessor. |
| `num_scan_edge` | The number of executions for the ScanEdgeProcessor. |
| `scan_vertex_latency_us` | The latency of executions for the ScanVertexProcessor. |
| `num_add_edges` | The number of times that edges are added. |
| `num_add_edges_errors` | The number of errors when adding edges. |
| `num_add_vertices` | The number of times that vertices are added. |
| `num_start_elect` | The number of times that Raft starts an election. |
| `num_add_vertices_errors` | The number of errors when adding vertices. |
| `num_delete_vertices_errors` | The number of errors when deleting vertices. |
| `append_log_latency_us` | The latency of replicating the log record to a single node by Raft. |
| `num_grant_votes` | The number of times that Raft votes for other nodes. |
| `replicate_log_latency_us` | The latency of replicating the log record to most nodes by Raft. |
| `num_delete_tags` | The number of times that tags are deleted. |
| `num_delete_tags_errors` | The number of errors when deleting tags. |
| `num_delete_edges` | The number of times that edges are deleted. |
| `num_delete_edges_errors` | The number of errors when deleting edges. |
| `num_send_snapshot` | The number of times that snapshots are sent. |
| `update_vertex_latency_us` | The latency of executions for the UpdateVertexProcessor. |
| `append_wal_latency_us` | The Raft write latency for a single WAL. |
| `num_update_edge` | The number of executions for the UpdateEdgeProcessor. |
| `delete_tags_latency_us` | The average latency of deleting tags. |
| `num_update_edge_errors` | The number of execution errors for the UpdateEdgeProcessor. |
| `num_get_neighbors` | The number of executions for the GetNeighborsProcessor. |
| `num_get_prop_errors` | The number of execution errors for the GetPropProcessor. |
| `num_delete_vertices` | The number of times that vertices are deleted. |
| `num_lookup` | The number of executions for the LookupProcessor. |
| `num_sync_data` | The number of times that the Storage service synchronizes data from the drainer. |
| `num_sync_data_errors` | The number of errors that occur when the Storage service synchronizes data from the drainer. |

Graph space

| Metric | Description |
| --- | --- |
| `num_active_queries` | The number of queries currently being executed. |
| `num_queries` | The number of queries. |
| `num_sentences` | The number of statements received by the Graphd service. |
| `optimizer_latency_us` | The latency of executing optimizer statements. |
| `query_latency_us` | The average latency of queries. |
| `num_slow_queries` | The number of slow queries. |
| `num_query_errors` | The number of query errors. |
| `num_query_errors_leader_changes` | The number of query errors caused by Raft leader changes. |
| `num_killed_queries` | The number of killed queries. |
| `num_aggregate_executors` | The number of executions for the Aggregation operator. |
| `num_sort_executors` | The number of executions for the Sort operator. |
| `num_indexscan_executors` | The number of executions for index scan operators. |
| `num_oom_queries` | The number of queries that caused memory to run out. |
| `num_auth_failed_sessions_bad_username_password` | The number of sessions where authentication failed due to incorrect username and password. |
| `num_auth_failed_sessions` | The number of sessions in which login authentication failed. |
| `num_opened_sessions` | The number of sessions connected to the server. |
| `num_queries_hit_memory_watermark` | The number of queries that reached the memory watermark. |
| `num_reclaimed_expired_sessions` | The number of expired sessions actively reclaimed by the server. |
| `num_rpc_sent_to_metad_failed` | The number of failed RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_metad` | The number of RPC requests that the Graphd service sent to the Metad service. |
| `num_rpc_sent_to_storaged_failed` | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
| `num_rpc_sent_to_storaged` | The number of RPC requests that the Graphd service sent to the Storaged service. |
| `slow_query_latency_us` | The average latency of slow queries. |

Last update: March 13, 2023