Query NebulaGraph metrics¶
NebulaGraph supports querying the monitoring metrics through HTTP ports.
Metrics structure¶
Each metric of NebulaGraph consists of three fields: name, type, and time range. The fields are separated by periods, for example, num_queries.sum.600. Different NebulaGraph services (Graph, Storage, or Meta) support different metrics. The detailed description is as follows.
Field | Example | Description |
---|---|---|
Metric name | num_queries | Indicates the function of the metric. |
Metric type | sum | Indicates how the metrics are collected. Supported types are SUM, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9. |
Time range | 600 | The time range in seconds for the metric collection. Supported values are 5, 60, 600, and 3600, representing the last 5 seconds, 1 minute, 10 minutes, and 1 hour. |
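The three-field structure can also be split programmatically. The following is a minimal Python sketch, not part of NebulaGraph; the function name `parse_metric` is hypothetical. It assumes that the metric name itself contains no period, so the first period ends the name and the last period starts the time range, and whatever sits in between is treated as the type.

```python
from typing import Tuple

# Time ranges listed in the table above, in seconds.
SUPPORTED_RANGES = {5, 60, 600, 3600}


def parse_metric(metric: str) -> Tuple[str, str, int]:
    """Split '<name>.<type>.<time_range>' into its three fields.

    Assumption: the metric name contains no period, so the first period
    ends the name and the last period starts the time range.
    """
    name, sep, rest = metric.partition(".")
    metric_type, sep2, time_range = rest.rpartition(".")
    if not (name and sep and metric_type and sep2 and time_range.isdigit()):
        raise ValueError(f"not a <name>.<type>.<range> metric: {metric!r}")
    seconds = int(time_range)
    if seconds not in SUPPORTED_RANGES:
        raise ValueError(f"unsupported time range: {seconds}")
    return name, metric_type, seconds


print(parse_metric("num_queries.sum.600"))  # ('num_queries', 'sum', 600)
```

Rejecting unknown time ranges early makes typos such as `num_queries.sum.6000` visible before a request is sent.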
Query metrics over HTTP¶
Syntax¶
curl -G "http://<ip>:<port>/stats?stats=<metric_name_list> [&format=json]"
Parameter | Description |
---|---|
ip | The IP address of the server. You can find it in the configuration file in the installation directory. |
port | The HTTP port of the server. You can find it in the configuration file in the installation directory. The default ports are 19559 (Meta), 19669 (Graph), and 19779 (Storage). |
metric_name_list | The metric names. Multiple metrics are separated by commas (,). |
&format=json | Optional. Returns the result in the JSON format. |
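For scripted monitoring, the URL can be assembled from the same parameters. The following Python sketch is only an illustration; `build_stats_url` and `DEFAULT_HTTP_PORTS` are hypothetical names. It joins the pieces literally, as in the syntax above, and does not URL-encode them.

```python
from typing import Iterable

# Default HTTP ports listed in the parameter table above.
DEFAULT_HTTP_PORTS = {"meta": 19559, "graph": 19669, "storage": 19779}


def build_stats_url(ip: str, port: int, metrics: Iterable[str] = (),
                    as_json: bool = False) -> str:
    """Build a /stats URL in the form shown in the Syntax section."""
    params = []
    metric_list = ",".join(metrics)  # multiple metrics are comma-separated
    if metric_list:
        params.append("stats=" + metric_list)
    if as_json:
        # Combining format=json with an empty metric list is not shown in
        # this document; treat that combination as an untested assumption.
        params.append("format=json")
    url = f"http://{ip}:{port}/stats"
    return url + ("?" + "&".join(params) if params else "")


print(build_stats_url("192.168.8.40", DEFAULT_HTTP_PORTS["graph"],
                      ["num_queries.sum.600"]))
# http://192.168.8.40:19669/stats?stats=num_queries.sum.600
```

The resulting string can be passed to curl or to any HTTP client.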
Note
If NebulaGraph is deployed with Docker Compose, run docker-compose ps to check the ports that are mapped from the service ports inside the container, and then query through them.
Query a single metric¶
Query the number of queries in the last 10 minutes in the Graph service.
$ curl -G "http://192.168.8.40:19669/stats?stats=num_queries.sum.600"
num_queries.sum.600=400
Query multiple metrics¶
Query the following metrics together:
- The average heartbeat latency in the last 1 minute.
- The P99 heartbeat latency, i.e., the latency that 99% of heartbeats do not exceed, in the last 10 minutes.
$ curl -G "http://192.168.8.40:19559/stats?stats=heartbeat_latency_us.avg.60,heartbeat_latency_us.p99.600"
heartbeat_latency_us.avg.60=281
heartbeat_latency_us.p99.600=985
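The plain-text response is one `metric=value` pair per line, which is easy to load into a dictionary. A minimal Python sketch follows; `parse_stats_text` is a hypothetical helper, and it assumes that every value is numeric.

```python
from typing import Dict, Union

Number = Union[int, float]


def parse_stats_text(body: str) -> Dict[str, Number]:
    """Parse 'metric=value' lines into a dict keyed by the full metric string."""
    stats: Dict[str, Number] = {}
    for line in body.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank or unrecognized lines
        # Split on the last '=' because space-level metrics carry
        # '{space=...}' inside the metric string.
        key, _, raw = line.rpartition("=")
        try:
            stats[key] = int(raw)
        except ValueError:
            stats[key] = float(raw)  # assumption: values are always numeric
    return stats


response = "heartbeat_latency_us.avg.60=281\nheartbeat_latency_us.p99.600=985\n"
print(parse_stats_text(response))
# {'heartbeat_latency_us.avg.60': 281, 'heartbeat_latency_us.p99.600': 985}
```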
Return a JSON result¶
Query the number of new vertices in the Storage Service in the last 10 minutes and return the result in the JSON format.
$ curl -G "http://192.168.8.40:19779/stats?stats=num_add_vertices.sum.600&format=json"
[{"value":1,"name":"num_add_vertices.sum.600"}]
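The JSON form is a list of objects, each with a `name` and a `value` key, as in the output above. A short Python sketch that turns it into a name-to-value mapping might look like this (`parse_stats_json` is a hypothetical helper, and the sketch assumes that every entry has both keys):

```python
import json
from typing import Any, Dict


def parse_stats_json(body: str) -> Dict[str, Any]:
    """Map each entry's "name" to its "value"."""
    return {entry["name"]: entry["value"] for entry in json.loads(body)}


json_response = '[{"value":1,"name":"num_add_vertices.sum.600"}]'
print(parse_stats_json(json_response))
# {'num_add_vertices.sum.600': 1}
```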
Query all metrics in a service¶
If no metric is specified in the query, NebulaGraph returns all metrics in the service.
$ curl -G "http://192.168.8.40:19559/stats"
heartbeat_latency_us.avg.5=304
heartbeat_latency_us.avg.60=308
heartbeat_latency_us.avg.600=299
heartbeat_latency_us.avg.3600=285
heartbeat_latency_us.p75.5=652
heartbeat_latency_us.p75.60=669
heartbeat_latency_us.p75.600=651
heartbeat_latency_us.p75.3600=642
heartbeat_latency_us.p95.5=930
heartbeat_latency_us.p95.60=963
heartbeat_latency_us.p95.600=933
heartbeat_latency_us.p95.3600=929
heartbeat_latency_us.p99.5=986
heartbeat_latency_us.p99.60=1409
heartbeat_latency_us.p99.600=989
heartbeat_latency_us.p99.3600=986
num_heartbeats.rate.5=0
num_heartbeats.rate.60=0
num_heartbeats.rate.600=0
num_heartbeats.rate.3600=0
num_heartbeats.sum.5=2
num_heartbeats.sum.60=40
num_heartbeats.sum.600=394
num_heartbeats.sum.3600=2364
...
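When all metrics are returned at once, it can help to group them by metric name. The following Python sketch is one possible approach; `group_metrics` is a hypothetical helper. It assumes that the metric name contains no period, and it skips any line that does not follow the `name.type.range=value` pattern.

```python
from collections import defaultdict
from typing import Dict, Tuple, Union

Number = Union[int, float]


def _number(raw: str) -> Number:
    """Convert a value to int when possible, otherwise to float."""
    try:
        return int(raw)
    except ValueError:
        return float(raw)


def group_metrics(body: str) -> Dict[str, Dict[Tuple[str, int], Number]]:
    """Group lines as {name: {(type, time_range): value}}."""
    grouped: Dict[str, Dict[Tuple[str, int], Number]] = defaultdict(dict)
    for line in body.splitlines():
        key, sep, raw = line.strip().rpartition("=")
        if not sep:
            continue  # no '=' on this line, for example a blank line
        name, _, rest = key.partition(".")
        metric_type, _, time_range = rest.rpartition(".")
        if not (metric_type and time_range.isdigit()):
            continue  # not in name.type.range form
        grouped[name][(metric_type, int(time_range))] = _number(raw)
    return dict(grouped)


all_stats = """num_heartbeats.sum.5=2
num_heartbeats.sum.60=40
num_heartbeats.sum.600=394
num_heartbeats.sum.3600=2364
heartbeat_latency_us.p99.60=1409
"""
grouped = group_metrics(all_stats)
print(grouped["num_heartbeats"][("sum", 600)])  # 394
```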
Space-level metrics¶
The Graph service supports a set of space-level metrics that record the information of different graph spaces separately.
Space-level metrics can be queried only by querying all metrics. For example, run curl -G "http://192.168.8.40:19669/stats" against the Graph service to show all metrics. The returned result contains the graph space name in the form of '{space=space_name}', such as num_active_queries{space=basketballplayer}.sum.5=0.
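Because space-level metrics only appear in the full listing, a client has to pick them out of the response itself. The Python sketch below matches the `{space=space_name}` form shown above with a regular expression. The pattern and the helper name `parse_space_metric` are assumptions based on that single example, not a documented grammar.

```python
import re
from typing import Optional, Tuple

# <name>{space=<space_name>}.<type>.<range>=<value>
SPACE_METRIC = re.compile(
    r"^(?P<name>\w+)\{space=(?P<space>[^}]+)\}"
    r"\.(?P<type>[^=]+)\.(?P<range>\d+)=(?P<value>\S+)$"
)


def parse_space_metric(line: str) -> Optional[Tuple[str, str, str, int, float]]:
    """Return (space, name, type, time_range, value), or None for other lines."""
    m = SPACE_METRIC.match(line.strip())
    if m is None:
        return None
    return (m.group("space"), m.group("name"), m.group("type"),
            int(m.group("range")), float(m.group("value")))


print(parse_space_metric("num_active_queries{space=basketballplayer}.sum.5=0"))
print(parse_space_metric("num_queries.sum.600=400"))  # not space-level: None
```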
Caution
To enable space-level metrics, set the value of enable_space_level_metrics to true in the Graph service configuration file before starting NebulaGraph. For details about how to modify the configuration, see Configuration Management.
Metric description¶
Graph¶
Parameter | Description |
---|---|
num_active_queries | The net change in the number of active queries. Formula: The number of started queries minus the number of finished queries within a specified time. |
num_active_sessions | The net change in the number of active sessions. Formula: The number of logged in sessions minus the number of logged out sessions within a specified time. For example, when querying num_active_sessions.sum.5, if there were 10 sessions logged in and 30 sessions logged out in the last 5 seconds, the value of this metric is -20 (10-30). |
num_aggregate_executors | The number of executions for the Aggregation operator. |
num_auth_failed_sessions_bad_username_password | The number of sessions where authentication failed due to incorrect username and password. |
num_auth_failed_sessions_out_of_max_allowed | The number of sessions that failed to authenticate logins because the value of the parameter FLAG_OUT_OF_MAX_ALLOWED_CONNECTIONS was exceeded. |
num_auth_failed_sessions | The number of sessions in which login authentication failed. |
num_indexscan_executors | The number of executions for index scan operators. |
num_killed_queries | The number of killed queries. |
num_opened_sessions | The number of sessions connected to the server. |
num_queries | The number of queries. |
num_query_errors_leader_changes | The number of the Raft leader changes due to query errors. |
num_query_errors | The number of query errors. |
num_reclaimed_expired_sessions | The number of expired sessions actively reclaimed by the server. |
num_rpc_sent_to_metad_failed | The number of failed RPC requests that the Graphd service sent to the Metad service. |
num_rpc_sent_to_metad | The number of RPC requests that the Graphd service sent to the Metad service. |
num_rpc_sent_to_storaged_failed | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
num_rpc_sent_to_storaged | The number of RPC requests that the Graphd service sent to the Storaged service. |
num_sentences | The number of statements received by the Graphd service. |
num_slow_queries | The number of slow queries. |
num_sort_executors | The number of executions for the Sort operator. |
optimizer_latency_us | The latency of executing optimizer statements. |
query_latency_us | The latency of queries. |
slow_query_latency_us | The latency of slow queries. |
num_queries_hit_memory_watermark | The number of queries that reached the memory watermark. |
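The formula for num_active_queries and num_active_sessions is a simple difference, so the value can be negative. The following Python lines only restate the arithmetic of the example in the table and say nothing about how NebulaGraph computes the metric internally; `net_change` is a hypothetical name.

```python
def net_change(started: int, finished: int) -> int:
    """Events that started minus events that finished within the time range."""
    return started - finished


# Example from the num_active_sessions row:
# 10 sessions logged in and 30 logged out in the last 5 seconds.
print(net_change(10, 30))  # -20
```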
Meta¶
Parameter | Description |
---|---|
commit_log_latency_us | The latency of committing logs in Raft. |
commit_snapshot_latency_us | The latency of committing snapshots in Raft. |
heartbeat_latency_us | The latency of heartbeats. |
num_heartbeats | The number of heartbeats. |
num_raft_votes | The number of votes in Raft. |
transfer_leader_latency_us | The latency of transferring the Raft leader. |
num_agent_heartbeats | The number of heartbeats for the AgentHBProcessor. |
agent_heartbeat_latency_us | The latency of the AgentHBProcessor. |
replicate_log_latency_us | The latency of replicating the log record to most nodes by Raft. |
num_send_snapshot | The number of times that Raft sends snapshots to other nodes. |
append_log_latency_us | The latency of replicating the log record to a single node by Raft. |
append_wal_latency_us | The Raft write latency for a single WAL. |
num_grant_votes | The number of times that Raft votes for other nodes. |
num_start_elect | The number of times that Raft starts an election. |
Storage¶
Parameter | Description |
---|---|
add_edges_latency_us | The latency of adding edges. |
add_vertices_latency_us | The latency of adding vertices. |
commit_log_latency_us | The latency of committing logs in Raft. |
commit_snapshot_latency_us | The latency of committing snapshots in Raft. |
delete_edges_latency_us | The latency of deleting edges. |
delete_vertices_latency_us | The latency of deleting vertices. |
get_neighbors_latency_us | The latency of querying neighbor vertices. |
get_dst_by_src_latency_us | The latency of querying the destination vertex by the source vertex. |
num_get_prop | The number of executions for the GetPropProcessor. |
num_get_neighbors_errors | The number of execution errors for the GetNeighborsProcessor. |
num_get_dst_by_src_errors | The number of execution errors for the GetDstBySrcProcessor. |
get_prop_latency_us | The latency of executions for the GetPropProcessor. |
num_edges_deleted | The number of deleted edges. |
num_edges_inserted | The number of inserted edges. |
num_raft_votes | The number of votes in Raft. |
num_rpc_sent_to_metad_failed | The number of failed RPC requests that the Storaged service sent to the Metad service. |
num_rpc_sent_to_metad | The number of RPC requests that the Storaged service sent to the Metad service. |
num_tags_deleted | The number of deleted tags. |
num_vertices_deleted | The number of deleted vertices. |
num_vertices_inserted | The number of inserted vertices. |
transfer_leader_latency_us | The latency of transferring the Raft leader. |
lookup_latency_us | The latency of executions for the LookupProcessor. |
num_lookup_errors | The number of execution errors for the LookupProcessor. |
num_scan_vertex | The number of executions for the ScanVertexProcessor. |
num_scan_vertex_errors | The number of execution errors for the ScanVertexProcessor. |
update_edge_latency_us | The latency of executions for the UpdateEdgeProcessor. |
num_update_vertex | The number of executions for the UpdateVertexProcessor. |
num_update_vertex_errors | The number of execution errors for the UpdateVertexProcessor. |
kv_get_latency_us | The latency of executions for the GetProcessor. |
kv_put_latency_us | The latency of executions for the PutProcessor. |
kv_remove_latency_us | The latency of executions for the RemoveProcessor. |
num_kv_get_errors | The number of execution errors for the GetProcessor. |
num_kv_get | The number of executions for the GetProcessor. |
num_kv_put_errors | The number of execution errors for the PutProcessor. |
num_kv_put | The number of executions for the PutProcessor. |
num_kv_remove_errors | The number of execution errors for the RemoveProcessor. |
num_kv_remove | The number of executions for the RemoveProcessor. |
forward_tranx_latency_us | The latency of transmission. |
scan_edge_latency_us | The latency of executions for the ScanEdgeProcessor. |
num_scan_edge_errors | The number of execution errors for the ScanEdgeProcessor. |
num_scan_edge | The number of executions for the ScanEdgeProcessor. |
scan_vertex_latency_us | The latency of executions for the ScanVertexProcessor. |
num_add_edges | The number of times that edges are added. |
num_add_edges_errors | The number of errors when adding edges. |
num_add_vertices | The number of times that vertices are added. |
num_start_elect | The number of times that Raft starts an election. |
num_add_vertices_errors | The number of errors when adding vertices. |
num_delete_vertices_errors | The number of errors when deleting vertices. |
append_log_latency_us | The latency of replicating the log record to a single node by Raft. |
num_grant_votes | The number of times that Raft votes for other nodes. |
replicate_log_latency_us | The latency of replicating the log record to most nodes by Raft. |
num_delete_tags | The number of times that tags are deleted. |
num_delete_tags_errors | The number of errors when deleting tags. |
num_delete_edges | The number of edge deletions. |
num_delete_edges_errors | The number of errors when deleting edges. |
num_send_snapshot | The number of times that snapshots are sent. |
update_vertex_latency_us | The latency of executions for the UpdateVertexProcessor. |
append_wal_latency_us | The Raft write latency for a single WAL. |
num_update_edge | The number of executions for the UpdateEdgeProcessor. |
delete_tags_latency_us | The latency of deleting tags. |
num_update_edge_errors | The number of execution errors for the UpdateEdgeProcessor. |
num_get_neighbors | The number of executions for the GetNeighborsProcessor. |
num_get_dst_by_src | The number of executions for the GetDstBySrcProcessor. |
num_get_prop_errors | The number of execution errors for the GetPropProcessor. |
num_delete_vertices | The number of times that vertices are deleted. |
num_lookup | The number of executions for the LookupProcessor. |
num_sync_data | The number of times the Storage service synchronizes data from the Drainer. |
num_sync_data_errors | The number of errors that occur when the Storage service synchronizes data from the Drainer. |
sync_data_latency_us | The latency of the Storage service synchronizing data from the Drainer. |
Graph space¶
Note
Space-level metrics are created dynamically. A metric is created, and can be queried, only after the corresponding behavior has been triggered in the graph space.
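Since a space-level metric may not exist yet, client code should treat a missing key as "not created so far" instead of as an error. A minimal Python sketch, assuming the full `/stats` response has already been loaded into a dictionary keyed by the metric string (`space_metric` is a hypothetical helper):

```python
from typing import Mapping, Optional, Union

Number = Union[int, float]


def space_metric(stats: Mapping[str, Number], name: str, space: str,
                 metric_type: str, time_range: int) -> Optional[Number]:
    """Look up a space-level metric; None means it has not been created yet."""
    key = f"{name}{{space={space}}}.{metric_type}.{time_range}"
    return stats.get(key)


loaded = {"num_active_queries{space=basketballplayer}.sum.5": 0}
print(space_metric(loaded, "num_active_queries", "basketballplayer", "sum", 5))  # 0
print(space_metric(loaded, "num_slow_queries", "basketballplayer", "sum", 5))    # None
```

Compare the result with `is None` and not with a truth test, because a created metric can legitimately have the value 0.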
Parameter | Description |
---|---|
num_active_queries | The number of queries currently being executed. |
num_queries | The number of queries. |
num_sentences | The number of statements received by the Graphd service. |
optimizer_latency_us | The latency of executing optimizer statements. |
query_latency_us | The latency of queries. |
num_slow_queries | The number of slow queries. |
num_query_errors | The number of query errors. |
num_query_errors_leader_changes | The number of Raft leader changes due to query errors. |
num_killed_queries | The number of killed queries. |
num_aggregate_executors | The number of executions for the Aggregation operator. |
num_sort_executors | The number of executions for the Sort operator. |
num_indexscan_executors | The number of executions for index scan operators. |
num_auth_failed_sessions_bad_username_password | The number of sessions where authentication failed due to incorrect username and password. |
num_auth_failed_sessions | The number of sessions in which login authentication failed. |
num_opened_sessions | The number of sessions connected to the server. |
num_queries_hit_memory_watermark | The number of queries that reached the memory watermark. |
num_reclaimed_expired_sessions | The number of expired sessions actively reclaimed by the server. |
num_rpc_sent_to_metad_failed | The number of failed RPC requests that the Graphd service sent to the Metad service. |
num_rpc_sent_to_metad | The number of RPC requests that the Graphd service sent to the Metad service. |
num_rpc_sent_to_storaged_failed | The number of failed RPC requests that the Graphd service sent to the Storaged service. |
num_rpc_sent_to_storaged | The number of RPC requests that the Graphd service sent to the Storaged service. |
slow_query_latency_us | The latency of slow queries. |
Single process metrics¶
Graph, Meta, and Storage services all have their own single process metrics.
Parameter | Description |
---|---|
context_switches_total | The number of context switches. |
cpu_seconds_total | The CPU usage based on user and system time. |
memory_bytes_gauge | The number of bytes of memory used. |
open_filedesc_gauge | The number of file descriptors. |
read_bytes_total | The number of bytes read. |
write_bytes_total | The number of bytes written. |