Query NebulaGraph metrics

NebulaGraph supports querying the monitoring metrics through HTTP ports.

Metrics structure

Each metric of NebulaGraph consists of three fields: name, type, and time range. The fields are separated by periods, for example, num_queries.sum.600. Different NebulaGraph services (Graph, Storage, or Meta) support different metrics. The detailed description is as follows.

Field Example Description
Metric name num_queries Indicates the function of the metric.
Metric type sum Indicates how the metrics are collected. Supported types are SUM, AVG, RATE, and the P-th sample quantiles such as P75, P95, P99, and P99.9.
Time range 600 The time range in seconds for the metric collection. Supported values are 5, 60, 600, and 3600, representing the last 5 seconds, 1 minute, 10 minutes, and 1 hour.
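The three-field structure can be split programmatically. The following Python sketch is illustrative only: the helper name parse_metric_key and the MetricKey type are hypothetical, and the code assumes that the metric name contains no period and that the time range is always the last field.

```python
from typing import NamedTuple


class MetricKey(NamedTuple):
    name: str          # e.g. "num_queries"
    metric_type: str   # e.g. "sum", "avg", "rate", "p99"
    time_range: int    # seconds: 5, 60, 600, or 3600


# Time ranges listed in the table above.
SUPPORTED_RANGES = {5, 60, 600, 3600}


def parse_metric_key(key: str) -> MetricKey:
    """Split "name.type.range" into its three fields (hypothetical helper)."""
    # Assumption: the first period ends the name and the last period
    # starts the time range; everything in between is the type.
    name, _, rest = key.partition(".")
    metric_type, _, time_range = rest.rpartition(".")
    if not name or not metric_type or not time_range.isdigit():
        raise ValueError(f"not a name.type.range metric: {key!r}")
    seconds = int(time_range)
    if seconds not in SUPPORTED_RANGES:
        raise ValueError(f"unsupported time range: {seconds}")
    return MetricKey(name, metric_type, seconds)


print(parse_metric_key("num_queries.sum.600"))
```

Rejecting unknown time ranges early is a design choice for this sketch; a looser parser could accept any integer.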

Space-level metrics

The Graph service supports a set of space-level metrics that record the information of different graph spaces separately.

To enable space-level metrics, set the value of enable_space_level_metrics to true in the Graph service configuration file before starting NebulaGraph. For details about how to modify the configuration, see Configuration Management.

Note

Space-level metrics can be queried only by querying all metrics. For example, run curl -G "http://192.168.8.40:19559/stats" to show all metrics. The returned result contains the graph space name in the form of '{space=space_name}', such as num_active_queries{space=basketballplayer}.sum.5=0.
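Because space-level metrics only appear in the full listing, a client usually has to pick them out of the returned lines. The sketch below shows one possible way to do this in Python. The function name split_space_metric and the regular expression are assumptions based solely on the example line in the note above, not on a documented grammar.

```python
import re
from typing import Optional, Tuple

# Matches lines such as "num_active_queries{space=basketballplayer}.sum.5=0"
# and plain lines such as "num_queries.sum.600=400".
_LINE = re.compile(
    r"^(?P<name>[^{.=]+)"               # metric name
    r"(?:\{space=(?P<space>[^}]*)\})?"  # optional {space=...} label
    r"\.(?P<rest>[^=]+)"                # type and time range
    r"=(?P<value>.*)$"                  # reported value
)


def split_space_metric(line: str) -> Tuple[Optional[str], str, str]:
    """Return (space, metric, value); space is None when no label is present."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized metric line: {line!r}")
    metric = f"{match['name']}.{match['rest']}"
    return match["space"], metric, match["value"]


print(split_space_metric("num_active_queries{space=basketballplayer}.sum.5=0"))
```

Returning the label separately from the metric key makes it straightforward to group values by graph space.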

Query metrics over HTTP

Syntax

curl -G "http://<ip>:<port>/stats?stats=<metric_name_list>[&format=json]"
Parameter Description
ip The IP address of the server. You can find it in the configuration file in the installation directory.
port The HTTP port of the server. You can find it in the configuration file in the installation directory. The default ports are 19559 (Meta), 19669 (Graph), and 19779 (Storage).
metric_name_list The metric names. Multiple metrics are separated by commas (,).
&format=json Optional. Returns the result in the JSON format.
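As an illustration of how the parameters combine, the following Python sketch assembles the request URL. The function name build_stats_url and the DEFAULT_PORTS mapping are hypothetical conveniences; only the URL layout and the default port numbers come from the syntax and table above.

```python
from typing import Iterable
from urllib.parse import urlencode

# Default HTTP ports from the parameter table above.
DEFAULT_PORTS = {"meta": 19559, "graph": 19669, "storage": 19779}


def build_stats_url(ip: str, port: int, metrics: Iterable[str] = (),
                    as_json: bool = False) -> str:
    """Build the /stats URL for one service (hypothetical helper)."""
    params = {}
    names = list(metrics)
    if names:
        # Multiple metrics are separated by commas.
        params["stats"] = ",".join(names)
    if as_json:
        params["format"] = "json"
    url = f"http://{ip}:{port}/stats"
    if params:
        # Keep commas literal so the URL matches the curl syntax.
        url += "?" + urlencode(params, safe=",")
    return url


print(build_stats_url("192.168.8.40", DEFAULT_PORTS["graph"],
                      ["num_queries.sum.600"]))
```

Passing no metric names yields the bare /stats URL, which returns all metrics of the service.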

Note

If NebulaGraph is deployed with Docker Compose, run docker-compose ps to check the ports that are mapped from the service ports inside of the container and then query through them.

Examples

  • Query a single metric

    Query the number of queries in the last 10 minutes in the Graph service.

    $ curl -G "http://192.168.8.40:19669/stats?stats=num_queries.sum.600"
    num_queries.sum.600=400
    
  • Query multiple metrics

    Query the following metrics together:

    • The average heartbeat latency in the last 1 minute.
    • The P99 heartbeat latency in the last 10 minutes, i.e., the latency that 99% of heartbeats do not exceed.

      $ curl -G "http://192.168.8.40:19559/stats?stats=heartbeat_latency_us.avg.60,heartbeat_latency_us.p99.600"
      heartbeat_latency_us.avg.60=281
      heartbeat_latency_us.p99.600=985
      
  • Return a JSON result

    Query the number of new vertices in the Storage Service in the last 10 minutes and return the result in the JSON format.

    $ curl -G "http://192.168.8.40:19779/stats?stats=num_add_vertices.sum.600&format=json"
    [{"value":1,"name":"num_add_vertices.sum.600"}]
    
  • Query all metrics in a service

    If no metric is specified in the query, NebulaGraph returns all metrics in the service.

    $ curl -G "http://192.168.8.40:19559/stats"
    heartbeat_latency_us.avg.5=304
    heartbeat_latency_us.avg.60=308
    heartbeat_latency_us.avg.600=299
    heartbeat_latency_us.avg.3600=285
    heartbeat_latency_us.p75.5=652
    heartbeat_latency_us.p75.60=669
    heartbeat_latency_us.p75.600=651
    heartbeat_latency_us.p75.3600=642
    heartbeat_latency_us.p95.5=930
    heartbeat_latency_us.p95.60=963
    heartbeat_latency_us.p95.600=933
    heartbeat_latency_us.p95.3600=929
    heartbeat_latency_us.p99.5=986
    heartbeat_latency_us.p99.60=1409
    heartbeat_latency_us.p99.600=989
    heartbeat_latency_us.p99.3600=986
    num_heartbeats.rate.5=0
    num_heartbeats.rate.60=0
    num_heartbeats.rate.600=0
    num_heartbeats.rate.3600=0
    num_heartbeats.sum.5=2
    num_heartbeats.sum.60=40
    num_heartbeats.sum.600=394
    num_heartbeats.sum.3600=2364
    ...
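The two response formats shown in the examples above can be turned into dictionaries with a few lines of Python. This is a minimal sketch, not an official client: the function names parse_stats_text and parse_stats_json are hypothetical, and the code assumes that every value is numeric and that the JSON form is a list of objects with "name" and "value" keys, as in the sample output.

```python
import json
from typing import Dict


def parse_stats_text(body: str) -> Dict[str, float]:
    """Parse the default "metric=value" lines into a dictionary."""
    result = {}
    for line in body.splitlines():
        line = line.strip()
        if "=" not in line:
            continue  # skip blank lines and anything that is not a pair
        # Split on the last "=" so that "{space=...}" labels stay in the key.
        key, _, value = line.rpartition("=")
        result[key] = float(value)
    return result


def parse_stats_json(body: str) -> Dict[str, float]:
    """Parse the format=json response into a dictionary."""
    return {item["name"]: item["value"] for item in json.loads(body)}


sample = "heartbeat_latency_us.avg.60=281\nheartbeat_latency_us.p99.600=985\n"
print(parse_stats_text(sample))
print(parse_stats_json('[{"value":1,"name":"num_add_vertices.sum.600"}]'))
```

Both helpers return the same shape, so downstream code does not need to know which format was requested.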
    

Metric description

Graph

Parameter Description
num_active_queries The net change in the number of active queries.
Formula: The number of started queries minus the number of finished queries within a specified time.
num_active_sessions The net change in the number of active sessions.
Formula: The number of logged-in sessions minus the number of logged-out sessions within a specified time.
For example, when querying num_active_sessions.sum.5, if 10 sessions logged in and 30 sessions logged out in the last 5 seconds, the value of this metric is -20 (10 - 30).
num_aggregate_executors The number of executions for the Aggregation operator.
num_auth_failed_sessions_bad_username_password The number of sessions where authentication failed due to an incorrect username or password.
num_auth_failed_sessions_out_of_max_allowed The number of sessions that failed to authenticate logins because the value of the parameter FLAG_OUT_OF_MAX_ALLOWED_CONNECTIONS was exceeded.
num_auth_failed_sessions The number of sessions in which login authentication failed.
num_indexscan_executors The number of executions for index scan operators.
num_killed_queries The number of killed queries.
num_opened_sessions The number of sessions connected to the server.
num_queries The number of queries.
num_query_errors_leader_changes The number of query errors caused by Raft leader changes.
num_query_errors The number of query errors.
num_reclaimed_expired_sessions The number of expired sessions actively reclaimed by the server.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_metad The number of RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_storaged_failed The number of failed RPC requests that the Graphd service sent to the Storaged service.
num_rpc_sent_to_storaged The number of RPC requests that the Graphd service sent to the Storaged service.
num_sentences The number of statements received by the Graphd service.
num_slow_queries The number of slow queries.
num_sort_executors The number of executions for the Sort operator.
optimizer_latency_us The latency of executing optimizer statements.
query_latency_us The latency of queries.
slow_query_latency_us The latency of slow queries.
num_queries_hit_memory_watermark The number of queries that reached the memory watermark.

Meta

Parameter Description
commit_log_latency_us The latency of committing logs in Raft.
commit_snapshot_latency_us The latency of committing snapshots in Raft.
heartbeat_latency_us The latency of heartbeats.
num_heartbeats The number of heartbeats.
num_raft_votes The number of votes in Raft.
transfer_leader_latency_us The latency of transferring the Raft leader.
num_agent_heartbeats The number of heartbeats for the AgentHBProcessor.
agent_heartbeat_latency_us The latency of the AgentHBProcessor.
replicate_log_latency_us The latency of replicating the log record to most nodes by Raft.
num_send_snapshot The number of times that Raft sends snapshots to other nodes.
append_log_latency_us The latency of replicating the log record to a single node by Raft.
append_wal_latency_us The Raft write latency for a single WAL.
num_grant_votes The number of times that Raft votes for other nodes.
num_start_elect The number of times that Raft starts an election.

Storage

Parameter Description
add_edges_atomic_latency_us The latency of adding edges atomically after TOSS is enabled.
add_edges_latency_us The latency of adding edges.
add_vertices_latency_us The latency of adding vertices.
commit_log_latency_us The latency of committing logs in Raft.
commit_snapshot_latency_us The latency of committing snapshots in Raft.
delete_edges_latency_us The latency of deleting edges.
delete_vertices_latency_us The latency of deleting vertices.
get_neighbors_latency_us The latency of querying neighbor vertices.
get_dst_by_src_latency_us The latency of querying the destination vertex by the source vertex.
num_get_prop The number of executions for the GetPropProcessor.
num_get_neighbors_errors The number of execution errors for the GetNeighborsProcessor.
num_get_dst_by_src_errors The number of execution errors for the GetDstBySrcProcessor.
get_prop_latency_us The latency of executions for the GetPropProcessor.
num_edges_deleted The number of deleted edges.
num_edges_inserted The number of inserted edges.
num_raft_votes The number of votes in Raft.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Storage service sent to the Meta service.
num_rpc_sent_to_metad The number of RPC requests that the Storaged service sent to the Metad service.
num_tags_deleted The number of deleted tags.
num_vertices_deleted The number of deleted vertices.
num_vertices_inserted The number of inserted vertices.
transfer_leader_latency_us The latency of transferring the Raft leader.
lookup_latency_us The latency of executions for the LookupProcessor.
num_lookup_errors The number of execution errors for the LookupProcessor.
num_scan_vertex The number of executions for the ScanVertexProcessor.
num_scan_vertex_errors The number of execution errors for the ScanVertexProcessor.
update_edge_latency_us The latency of executions for the UpdateEdgeProcessor.
num_update_vertex The number of executions for the UpdateVertexProcessor.
num_update_vertex_errors The number of execution errors for the UpdateVertexProcessor.
kv_get_latency_us The latency of executions for the GetProcessor.
kv_put_latency_us The latency of executions for the PutProcessor.
kv_remove_latency_us The latency of executions for the RemoveProcessor.
num_kv_get_errors The number of execution errors for the GetProcessor.
num_kv_get The number of executions for the GetProcessor.
num_kv_put_errors The number of execution errors for the PutProcessor.
num_kv_put The number of executions for the PutProcessor.
num_kv_remove_errors The number of execution errors for the RemoveProcessor.
num_kv_remove The number of executions for the RemoveProcessor.
forward_tranx_latency_us The latency of transmission.
scan_edge_latency_us The latency of executions for the ScanEdgeProcessor.
num_scan_edge_errors The number of execution errors for the ScanEdgeProcessor.
num_scan_edge The number of executions for the ScanEdgeProcessor.
scan_vertex_latency_us The latency of executions for the ScanVertexProcessor.
num_add_edges The number of times that edges are added.
num_add_edges_errors The number of errors when adding edges.
num_add_vertices The number of times that vertices are added.
num_start_elect The number of times that Raft starts an election.
num_add_vertices_errors The number of errors when adding vertices.
num_delete_vertices_errors The number of errors when deleting vertices.
append_log_latency_us The latency of replicating the log record to a single node by Raft.
num_grant_votes The number of times that Raft votes for other nodes.
replicate_log_latency_us The latency of replicating the log record to most nodes by Raft.
num_delete_tags The number of times that tags are deleted.
num_delete_tags_errors The number of errors when deleting tags.
num_delete_edges The number of edge deletions.
num_delete_edges_errors The number of errors when deleting edges.
num_send_snapshot The number of times that snapshots are sent.
update_vertex_latency_us The latency of executions for the UpdateVertexProcessor.
append_wal_latency_us The Raft write latency for a single WAL.
num_update_edge The number of executions for the UpdateEdgeProcessor.
delete_tags_latency_us The latency of deleting tags.
num_update_edge_errors The number of execution errors for the UpdateEdgeProcessor.
num_get_neighbors The number of executions for the GetNeighborsProcessor.
num_get_dst_by_src The number of executions for the GetDstBySrcProcessor.
num_get_prop_errors The number of execution errors for the GetPropProcessor.
num_delete_vertices The number of times that vertices are deleted.
num_lookup The number of executions for the LookupProcessor.
num_sync_data The number of times the Storage service synchronizes data from the Drainer.
num_sync_data_errors The number of errors that occur when the Storage service synchronizes data from the Drainer.
sync_data_latency_us The latency of the Storage service synchronizing data from the Drainer.

Graph space

Parameter Description
num_active_queries The number of queries currently being executed.
num_queries The number of queries.
num_sentences The number of statements received by the Graphd service.
optimizer_latency_us The latency of executing optimizer statements.
query_latency_us The latency of queries.
num_slow_queries The number of slow queries.
num_query_errors The number of query errors.
num_query_errors_leader_changes The number of query errors caused by Raft leader changes.
num_killed_queries The number of killed queries.
num_aggregate_executors The number of executions for the Aggregation operator.
num_sort_executors The number of executions for the Sort operator.
num_indexscan_executors The number of executions for index scan operators.
num_oom_queries The number of queries that caused memory to run out.
num_auth_failed_sessions_bad_username_password The number of sessions where authentication failed due to an incorrect username or password.
num_auth_failed_sessions The number of sessions in which login authentication failed.
num_opened_sessions The number of sessions connected to the server.
num_queries_hit_memory_watermark The number of queries that reached the memory watermark.
num_reclaimed_expired_sessions The number of expired sessions actively reclaimed by the server.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_metad The number of RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_storaged_failed The number of failed RPC requests that the Graphd service sent to the Storaged service.
num_rpc_sent_to_storaged The number of RPC requests that the Graphd service sent to the Storaged service.
slow_query_latency_us The latency of slow queries.

Single process metrics

The Graph, Meta, and Storage services each have their own single process metrics.

Parameter Description
context_switches_total The number of context switches.
cpu_seconds_total The CPU usage based on user and system time.
memory_bytes_gauge The number of bytes of memory used.
open_filedesc_gauge The number of file descriptors.
read_bytes_total The number of bytes read.
write_bytes_total The number of bytes written.
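Metrics whose names end in _total are often consumed as rates. The Python sketch below derives a per-second rate from two samples. It rests on an assumption that this page does not state: that the _total metrics are cumulative counters that only grow while the process runs. The function name counter_rate is hypothetical.

```python
def counter_rate(previous: float, current: float, interval_seconds: float) -> float:
    """Per-second rate between two samples of a cumulative counter.

    Assumption (not confirmed by the documentation): metrics such as
    cpu_seconds_total and read_bytes_total only grow while the process runs.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = current - previous
    if delta < 0:
        # A drop suggests that the process restarted and the counter was
        # reset, so count only what accumulated since the restart.
        delta = current
    return delta / interval_seconds


# Two samples of a CPU counter taken 60 seconds apart.
print(counter_rate(100.0, 130.0, 60.0))
```

Under the same assumption, gauge metrics such as memory_bytes_gauge would be read directly rather than differenced.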

Last update: December 6, 2022