Skip to content

Metrics

This topic will describe the monitoring metrics in NebulaGraph Dashboard.

Machine

Note

  • All the machine metrics listed below are for the Linux operating system.
  • The default unit for Disk and Network is byte. The unit changes with the data magnitude as the page displays. For example, when the flow is less than 1 KB/s, the unit is Bytes/s.
  • For all versions of Dashboard Enterprise Edition, the memory occupied by Buff and Cache will not be counted in the memory usage.

CPU

Parameter Description
cpu_utilization The percentage of used CPU.
cpu_idle The percentage of idled CPU.
cpu_wait The percentage of CPU waiting for IO operations.
cpu_user The percentage of CPU used by users.
cpu_system The percentage of CPU used by the system.

Memory

Parameter Description
memory_utilization The percentage of used memory.
memory_used The memory space used (not including caches).
memory_free The memory space available.

Load

Parameter Description
load_1m The average load of the system in the last 1 minute.
load_5m The average load of the system in the last 5 minutes.
load_15m The average load of the system in the last 15 minutes.

Disk

Parameter Description
disk_used_percentage The disk utilization percentage.
disk_used The disk space used.
disk_free The disk space available.
disk_readbytes The number of bytes that the system reads in the disk per second.
disk_writebytes The number of bytes that the system writes in the disk per second.
disk_readiops The number of read queries that the disk receives per second.
disk_writeiops The number of write queries that the disk receives per second.
inode_utilization The percentage of used inode.

Network

Parameter Description
network_in_rate The number of bytes that the network card receives per second.
network_out_rate The number of bytes that the network card sends out per second.
network_in_errs The number of wrong bytes that the network card receives per second.
network_out_errs The number of wrong bytes that the network card sends out per second.
network_in_packets The number of data packages that the network card receives per second.
network_out_packets The number of data packages that the network card sends out per second.

Service

Period

The period is the time range of counting metrics. It currently supports 5 seconds, 60 seconds, 600 seconds, and 3600 seconds, which respectively represent the last 5 seconds, the last 1 minute, the last 10 minutes, and the last 1 hour.

Metric methods

Parameter Description
rate The average rate of operations per second in a period.
sum The sum of operations in the period.
avg The average latency in the cycle.
P75 The 75th percentile latency.
P95 The 95th percentile latency.
P99 The 99th percentile latency.
P999 The 99.9th percentile latency.

Note

Dashboard collects the following metrics from the NebulaGraph core, but only shows the metrics that are important to it.

Graph

Parameter Description
num_active_queries The number of changes in the number of active queries.
Formula: The number of started queries minus the number of finished queries within a specified time.
num_active_sessions The number of changes in the number of active sessions.
Formula: The number of logged in sessions minus the number of logged out sessions within a specified time.
For example, when querying num_active_sessions.sum.5, if there were 10 sessions logged in and 30 sessions logged out in the last 5 seconds, the value of this metric is -20 (10-30).
num_aggregate_executors The number of executions for the Aggregation operator.
num_auth_failed_sessions_bad_username_password The number of sessions where authentication failed due to incorrect username and password.
num_auth_failed_sessions_out_of_max_allowed The number of sessions that failed to authenticate logins because the value of the parameter FLAG_OUT_OF_MAX_ALLOWED_CONNECTIONS was exceeded.
num_auth_failed_sessions The number of sessions in which login authentication failed.
num_indexscan_executors The number of executions for index scan operators.
num_killed_queries The number of killed queries.
num_opened_sessions The number of sessions connected to the server.
num_queries The number of queries.
num_query_errors_leader_changes The number of the raft leader changes due to query errors.
num_query_errors The number of query errors.
num_reclaimed_expired_sessions The number of expired sessions actively reclaimed by the server.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_metad The number of RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_storaged_failed The number of failed RPC requests that the Graphd service sent to the Storaged service.
num_rpc_sent_to_storaged The number of RPC requests that the Graphd service sent to the Storaged service.
num_sentences The number of statements received by the Graphd service.
num_slow_queries The number of slow queries.
num_sort_executors The number of executions for the Sort operator.
optimizer_latency_us The latency of executing optimizer statements.
query_latency_us The latency of queries.
slow_query_latency_us The latency of slow queries.
num_queries_hit_memory_watermark The number of queries reached the memory watermark.

Meta

Parameter Description
commit_log_latency_us The latency of committing logs in Raft.
commit_snapshot_latency_us The latency of committing snapshots in Raft.
heartbeat_latency_us The latency of heartbeats.
num_heartbeats The number of heartbeats.
num_raft_votes The number of votes in Raft.
transfer_leader_latency_us The latency of transferring the raft leader.
num_agent_heartbeats The number of heartbeats for the AgentHBProcessor.
agent_heartbeat_latency_us The latency of the AgentHBProcessor.
replicate_log_latency_us The latency of replicating the log record to most nodes by Raft.
num_send_snapshot The number of times that Raft sends snapshots to other nodes.
append_log_latency_us The latency of replicating the log record to a single node by Raft.
append_wal_latency_us The Raft write latency for a single WAL.
num_grant_votes The number of times that Raft votes for other nodes.
num_start_elect The number of times that Raft starts an election.

Storage

Parameter Description
add_edges_atomic_latency_us The latency of adding atomic edge after TOSS is enabled.
add_edges_latency_us The latency of adding edges.
add_vertices_latency_us The latency of adding vertices.
commit_log_latency_us The latency of committing logs in Raft.
commit_snapshot_latency_us The latency of committing snapshots in Raft.
delete_edges_latency_us The latency of deleting edges.
delete_vertices_latency_us The latency of deleting vertices.
get_neighbors_latency_us The latency of querying neighbor vertices.
num_get_prop The number of executions for the GetPropProcessor.
num_get_neighbors_errors The number of execution errors for the GetNeighborsProcessor.
get_prop_latency_us The latency of executions for the GetPropProcessor.
num_edges_deleted The number of deleted edges.
num_edges_inserted The number of inserted edges.
num_raft_votes The number of votes in Raft.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Storage service sent to the Meta service.
num_rpc_sent_to_metad The number of RPC requests that the Storaged service sent to the Metad service.
num_tags_deleted The number of deleted tags.
num_vertices_deleted The number of deleted vertices.
num_vertices_inserted The number of inserted vertices.
transfer_leader_latency_us The latency of transferring the raft leader.
lookup_latency_us The latency of executions for the LookupProcessor.
num_lookup_errors The number of execution errors for the LookupProcessor.
num_scan_vertex The number of executions for the ScanVertexProcessor.
num_scan_vertex_errors The number of execution errors for the ScanVertexProcessor.
update_edge_latency_us The latency of executions for the UpdateEdgeProcessor.
num_update_vertex The number of executions for the UpdateVertexProcessor.
num_update_vertex_errors The number of execution errors for the UpdateVertexProcessor.
kv_get_latency_us The latency of executions for the Getprocessor.
kv_put_latency_us The latency of executions for the PutProcessor.
kv_remove_latency_us The latency of executions for the RemoveProcessor.
num_kv_get_errors The number of execution errors for the GetProcessor.
num_kv_get The number of executions for the GetProcessor.
num_kv_put_errors The number of execution errors for the PutProcessor.
num_kv_put The number of executions for the PutProcessor.
num_kv_remove_errors The number of execution errors for the RemoveProcessor.
num_kv_remove The number of executions for the RemoveProcessor.
forward_tranx_latency_us The latency of transmission.
scan_edge_latency_us The latency of executions for the ScanEdgeProcessor.
num_scan_edge_errors The number of execution errors for the ScanEdgeProcessor.
num_scan_edge The number of executions for the ScanEdgeProcessor.
scan_vertex_latency_us The latency of executions for the ScanVertexProcessor.
num_add_edges The number of times that edges are added.
num_add_edges_errors The number of errors when adding edges.
num_add_vertices The number of times that vertices are added.
num_start_elect The number of times that Raft starts an election.
num_add_vertices_errors The number of errors when adding vertices.
num_delete_vertices_errors The number of errors when deleting vertices.
append_log_latency_us The latency of replicating the log record to a single node by Raft.
num_grant_votes The number of times that Raft votes for other nodes.
replicate_log_latency_us The latency of replicating the log record to most nodes by Raft.
num_delete_tags The number of times that tags are deleted.
num_delete_tags_errors The number of errors when deleting tags.
num_delete_edges The number of edge deletions.
num_delete_edges_errors The number of errors when deleting edges
num_send_snapshot The number of times that snapshots are sent.
update_vertex_latency_us The latency of executions for the UpdateVertexProcessor.
append_wal_latency_us The Raft write latency for a single WAL.
num_update_edge The number of executions for the UpdateEdgeProcessor.
delete_tags_latency_us The latency of deleting tags.
num_update_edge_errors The number of execution errors for the UpdateEdgeProcessor.
num_get_neighbors The number of executions for the GetNeighborsProcessor.
num_get_prop_errors The number of execution errors for the GetPropProcessor.
num_delete_vertices The number of times that vertices are deleted.
num_lookup The number of executions for the LookupProcessor.
num_sync_data The number of times the Storage service synchronizes data from the Drainer.
num_sync_data_errors The number of errors that occur when the Storage service synchronizes data from the Drainer.
sync_data_latency_us The latency of the Storage service synchronizing data from the Drainer.

Graph space

Note

Space-level metrics are created dynamically, so that only when the behavior is triggered in the graph space, the corresponding metric is created and can be queried by the user.

Parameter Description
num_active_queries The number of queries currently being executed.
num_queries The number of queries.
num_sentences The number of statements received by the Graphd service.
optimizer_latency_us The latency of executing optimizer statements.
query_latency_us The latency of queries.
num_slow_queries The number of slow queries.
num_query_errors The number of query errors.
num_query_errors_leader_changes The number of raft leader changes due to query errors.
num_killed_queries The number of killed queries.
num_aggregate_executors The number of executions for the Aggregation operator.
num_sort_executors The number of executions for the Sort operator.
num_indexscan_executors The number of executions for index scan operators.
num_auth_failed_sessions_bad_username_password The number of sessions where authentication failed due to incorrect username and password.
num_auth_failed_sessions The number of sessions in which login authentication failed.
num_opened_sessions The number of sessions connected to the server.
num_queries_hit_memory_watermark The number of queries reached the memory watermark.
num_reclaimed_expired_sessions The number of expired sessions actively reclaimed by the server.
num_rpc_sent_to_metad_failed The number of failed RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_metad The number of RPC requests that the Graphd service sent to the Metad service.
num_rpc_sent_to_storaged_failed The number of failed RPC requests that the Graphd service sent to the Storaged service.
num_rpc_sent_to_storaged The number of RPC requests that the Graphd service sent to the Storaged service.
slow_query_latency_us The average latency of slow queries.

Last update: February 1, 2023