Skip to content

Cluster monitoring

This topic introduces node monitoring, service monitoring, graph space monitoring, and the Big Screen of Dashboard.

Node

At the top navigation bar of the Dashboard Enterprise Edition page, click Cluster Management, and click Monitoring->Node to enter the node monitoring page.

On this page, you can view the variation of CPU, Memory, Load, Disk, and Network In/Out quickly.

  • To set a base line, click the setup button.
  • To view the detailed monitoring information, click the watch button. In this example, select Load for details. The figure is as follows. load
    • By default, you can view the monitoring data of the latest 1 hour, 6 hours, 12 hours, 1 day, 3 days, 7days, or 14 days.
    • You can select the machine and monitoring metrics that you want to view. For details of monitoring metrics, see Monitor parameter.
    • You can set a base line as a reference standard.

Service

At the top navigation bar of the Dashboard Enterprise Edition page, click Cluster Management, and click Monitoring->Service to enter the service monitoring page.

On this page, you can view the information of Graph, Meta, and Storage services quickly. In the upper right corner, the number of normal services and abnormal services will be displayed.

Note

In the current Service page of the Enterprise Edition, only two monitoring metrics can be set for each service, which can be adjusted by clicking the Set up button.

  • To view the detailed monitoring information, click the watch button. In this example, select Graph for details. The figure is as follows. service
    • By default, you can view the monitoring data of the latest 1 hour, 6 hours, 12 hours, 1 day, 3 days, 7days, or 14 days.
    • You can select the machine and monitoring metrics that you want to view. For details of monitoring metrics, see Monitor parameter.
    • The Graph service supports a set of space-level metrics. For more information, see the following section Graph space.
    • You can set a base line as a reference standard.
    • You can view the status of the current service.

Graph space

Note

Before using graph space metrics, you need to set enable_space_level_metrics to true in the Graph service. For details, see Update config.

Space graph metrics record the information of different graph spaces separately. Currently, only the Graph service supports a set of space-level metrics.

Only when the behavior of a graph space metric is triggered, you can specify the graph space to view information about the corresponding graph space metric. For information about the space graph metrics, see Space graph.

graph-metrics

Big Screen

The Big Screen feature helps users understand the health status of the cluster and the information of services and nodes at a glance.

At the top navigation bar of the Dashboard Enterprise Edition page, click Cluster Management, and click Monitoring->Big Screen to enter the Big Screen page.

tv-dashboard

Screen area Information displayed
Upper middle area 1. The health degree of your cluster. The system scores the health of your cluster. For more information, see the following note.
2. The information and number of running nodes, the number of running services and abnormal services in the cluster.
3. CPU and memory usage of the node at the current time.
4. Alert notifications. The system displays the 5 most recently triggered alert messages based on their severity level (emergency>critical>warning). For more information, Monitoring alerts.
Lower middle area Monitoring information of 4 Graph service metrics at different periods. The 4 metrics are:
1. num_active_sessions
2. num_slow_queries
3. num_active_queries
4. num_query_errors
Left side of the area 1. QPS (Query Per Second) of your cluster.
2. The monitoring information of 2 Storage service metrics at different periods. The two metrics are: add_edges_latency_us,add_vertices_latency_us.
Right side of the area The node-related metrics information at different periods. Metrics include:
1. cpu_utilization
2. memory_utilization
3. load_1m
4. disk_readbytes
5. disk_writebytes

For more information about the monitoring metrics, see Metrics.

Note

Cluster scoring rules are as follows:

  • The maximum score is 100; The minimum score is 13.
  • When 100≥Health Degree≥80, the score is blue; When 80>Health Degree≥60, the score is yellow; When Health Degree<60, the score is yellow.
  • Algorithm: (1-number of abnormal services/total number of services)*100%。
  • Except for the appearance of the first emergency level alert that deducts 40 points, 10 points are deducted for each of the other emergency level alerts and other levels of alerts.

Last update: April 26, 2022