Prepare resources for compiling, installing, and running NebulaGraph¶
This topic describes the requirements and suggestions for compiling and installing NebulaGraph, as well as how to estimate the resources you need to reserve for running a NebulaGraph cluster.
About storage devices¶
NebulaGraph is designed and implemented for NVMe SSDs. All default parameters are optimized for SSD devices and assume extremely high IOPS and low latency.
- Due to their poor IOPS capability and long random-seek latency, HDDs are not recommended. Users may encounter many problems when using HDDs.
- Do not use remote storage devices, such as NAS or SAN. Do not connect an external virtual hard disk based on HDFS or Ceph.
- RAID is not recommended because NebulaGraph provides a multi-replica mechanism. Configuring RAID would result in a waste of resources.
- Use local SSD devices, or equivalents such as AWS Provisioned IOPS SSD volumes.
About CPU architecture¶
Starting with 3.0.2, you can run containerized NebulaGraph databases on Docker Desktop for ARM macOS or on ARM Linux servers.
Caution
We do not recommend you deploy NebulaGraph on Docker Desktop for Windows due to its subpar performance. For details, see #12401.
Requirements for compiling the source code¶
Hardware requirements for compiling NebulaGraph¶
Item | Requirement |
---|---|
CPU architecture | x86_64 |
Memory | 4 GB |
Disk | 10 GB, SSD |
Supported operating systems for compiling NebulaGraph¶
For now, NebulaGraph can only be compiled on Linux. We recommend that you use any Linux system with kernel version 4.15 or above.
Note
To install NebulaGraph on Linux systems with kernel version lower than required, use RPM/DEB packages or TAR files.
Software requirements for compiling NebulaGraph¶
You must have the correct version of the software listed below to compile NebulaGraph. If they are not as required or you are not sure, follow the steps in Prepare software for compiling NebulaGraph to get them ready.
Software | Version | Note |
---|---|---|
glibc | 2.17 or above | You can run ldd --version to check the glibc version. |
make | Any stable version | - |
m4 | Any stable version | - |
git | Any stable version | - |
wget | Any stable version | - |
unzip | Any stable version | - |
xz | Any stable version | - |
readline-devel | Any stable version | - |
ncurses-devel | Any stable version | - |
zlib-devel | Any stable version | - |
g++ | 8.5.0 or above | You can run gcc -v to check the gcc version. |
cmake | 3.14.0 or above | You can run cmake --version to check the cmake version. |
curl | Any stable version | - |
redhat-lsb-core | Any stable version | - |
libstdc++-static | Any stable version | Only needed in CentOS 8+, RedHat 8+, and Fedora systems. |
libasan | Any stable version | Only needed in CentOS 8+, RedHat 8+, and Fedora systems. |
bzip2 | Any stable version | - |
Other third-party software will be automatically downloaded and installed to the build directory at the configure (cmake) stage.
Prepare software for compiling NebulaGraph¶
If some of the dependencies are missing or their versions do not meet the requirements, manually install them with the following steps. You can skip unnecessary dependencies or steps according to your needs.
- Install dependencies.

  - For CentOS, RedHat, and Fedora users, run the following commands.

    ```bash
    $ yum update
    $ yum install -y make \
                     m4 \
                     git \
                     wget \
                     unzip \
                     xz \
                     readline-devel \
                     ncurses-devel \
                     zlib-devel \
                     gcc \
                     gcc-c++ \
                     cmake \
                     curl \
                     redhat-lsb-core \
                     bzip2
    # For CentOS 8+, RedHat 8+, and Fedora, install libstdc++-static and libasan as well
    $ yum install -y libstdc++-static libasan
    ```

  - For Debian and Ubuntu users, run the following commands.

    ```bash
    $ apt-get update
    $ apt-get install -y make \
                         m4 \
                         git \
                         wget \
                         unzip \
                         xz-utils \
                         curl \
                         lsb-core \
                         build-essential \
                         libreadline-dev \
                         ncurses-dev \
                         cmake \
                         bzip2
    ```
- Check whether the GCC and CMake versions on your host meet the requirements. See Software requirements for compiling NebulaGraph for the required versions.

  ```bash
  $ g++ --version
  $ cmake --version
  ```

  If your GCC and CMake versions meet the requirements, you are all set and can skip the subsequent steps. If not, perform the needed steps as follows.
- If the CMake version is incorrect, visit the CMake official website to install the required version.
- If the G++ version is incorrect, visit the G++ official website or follow the instructions below to install the required version.

  - For CentOS users, run:

    ```bash
    yum install centos-release-scl
    yum install devtoolset-11
    scl enable devtoolset-11 'bash'
    ```

  - For Ubuntu users, run:

    ```bash
    add-apt-repository ppa:ubuntu-toolchain-r/test
    apt install gcc-11 g++-11
    ```
Requirements and suggestions for installing NebulaGraph in test environments¶
Hardware requirements for test environments¶
Item | Requirement |
---|---|
CPU architecture | x86_64 |
Number of CPU core | 4 |
Memory | 8 GB |
Disk | 100 GB, SSD |
Supported operating systems for test environments¶
For now, NebulaGraph can only be installed on Linux. To install NebulaGraph in a test environment, we recommend that you use any Linux system with kernel version 3.9 or above.
Suggested service architecture for test environments¶
Process | Suggested number |
---|---|
metad (the metadata service process) | 1 |
storaged (the storage service process) | 1 or more |
graphd (the query engine service process) | 1 or more |
For example, for a single-machine test environment, you can deploy 1 metad, 1 storaged, and 1 graphd process on the machine.
For a more common test environment, such as a cluster of 3 machines (named A, B, and C), you can deploy NebulaGraph as follows:
Machine name | Number of metad | Number of storaged | Number of graphd |
---|---|---|---|
A | 1 | 1 | 1 |
B | None | 1 | 1 |
C | None | 1 | 1 |
Requirements and suggestions for installing NebulaGraph in production environments¶
Hardware requirements for production environments¶
Item | Requirement |
---|---|
CPU architecture | x86_64 |
Number of CPU core | 48 |
Memory | 256 GB |
Disk | 2 * 1.6 TB, NVMe SSD |
Supported operating systems for production environments¶
For now, NebulaGraph can only be installed on Linux. To install NebulaGraph in a production environment, we recommend that you use any Linux system with kernel version 3.9 or above.
Users can adjust some of the kernel parameters to better accommodate the need for running NebulaGraph. For more information, see kernel configuration.
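For illustration, a few commonly tuned parameters might look like the following sketch. The parameter names are standard Linux sysctls, but the values shown are assumptions rather than official recommendations; refer to the kernel configuration topic for the actual guidance.

```shell
# Illustrative kernel settings (parameter names are standard Linux
# sysctls; the values are assumptions, not official recommendations).
# Append to /etc/sysctl.conf and apply with `sysctl -p`:
#
#   vm.swappiness = 0          # avoid swapping NebulaGraph memory to disk
#   net.core.somaxconn = 1024  # larger TCP accept backlog for the services
#
# Also consider raising the open-file limit for the service processes,
# for example:
#   ulimit -n 130000
```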
Suggested service architecture for production environments¶
Danger
DO NOT deploy a single cluster across IDCs (the Enterprise Edition supports data synchronization between clusters across IDCs).
Process | Suggested number |
---|---|
metad (the metadata service process) | 3 |
storaged (the storage service process) | 3 or more |
graphd (the query engine service process) | 3 or more |
Each metad process automatically creates and maintains a replica of the metadata. Usually, you need to deploy exactly three metad processes.
The number of storaged processes does not affect the number of graph space replicas.
Users can deploy multiple processes on a single machine. For example, on a cluster of 5 machines (named A, B, C, D, and E), you can deploy NebulaGraph as follows:
Machine name | Number of metad | Number of storaged | Number of graphd |
---|---|---|---|
A | 1 | 1 | 1 |
B | 1 | 1 | 1 |
C | 1 | 1 | 1 |
D | None | 1 | 1 |
E | None | 1 | 1 |
Capacity requirements for running a NebulaGraph cluster¶
Users can estimate the memory, disk space, and partition number needed for a NebulaGraph cluster of 3 replicas as follows.
Resource | Unit | How to estimate | Description |
---|---|---|---|
Disk space for a cluster | Bytes | `the_sum_of_edge_number_and_vertex_number * average_bytes_of_properties * 7.5 * 120%` | For more information, see Edge partitioning and storage amplification. |
Memory for a cluster | Bytes | `[the_sum_of_edge_number_and_vertex_number * 16 + the_number_of_RocksDB_instances * (write_buffer_size * max_write_buffer_number) + rocksdb_block_cache] * 120%` | `write_buffer_size` and `max_write_buffer_number` are RocksDB parameters. For more information, see MemTable. For details about `rocksdb_block_cache`, see Memory usage in RocksDB. |
Number of partitions for a graph space | - | `the_number_of_disks_in_the_cluster * multiplier` | `multiplier` is an integer between 2 and 20 (both inclusive). Its value depends on the disk performance. Use 20 for SSD and 2 for HDD. |
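The disk and memory formulas in the table above can be sketched as a small calculator. The example input values at the bottom are illustrative assumptions, not measurements or recommendations.

```python
# Capacity calculator for a 3-replica NebulaGraph cluster, following
# the formulas in the table above.

def disk_bytes(total_vertices_and_edges, avg_property_bytes):
    # elements * average property size * 7.5 (replication, storage
    # amplification, and indexes) * 120% (20% buffer)
    return total_vertices_and_edges * avg_property_bytes * 7.5 * 1.2

def memory_bytes(total_vertices_and_edges, rocksdb_instances,
                 write_buffer_size, max_write_buffer_number,
                 rocksdb_block_cache):
    # [elements * 16 + instances * (write_buffer_size *
    #  max_write_buffer_number) + block cache] * 120%
    base = (total_vertices_and_edges * 16
            + rocksdb_instances * write_buffer_size * max_write_buffer_number
            + rocksdb_block_cache)
    return base * 1.2

MiB, GiB, TiB = 1 << 20, 1 << 30, 1 << 40

# Example: 1 billion vertices and edges with 100-byte properties,
# 10 RocksDB instances with a 64 MiB write buffer and 2 write buffers
# per instance, and a 4 GiB block cache.
print(f"disk:   {disk_bytes(1_000_000_000, 100) / TiB:.2f} TiB")
print(f"memory: {memory_bytes(1_000_000_000, 10, 64 * MiB, 2, 4 * GiB) / GiB:.2f} GiB")
```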
- Question 1: Why do I need to multiply by 7.5 in the disk space estimation formula?
Answer: On one hand, the data in one single replica takes up about 2.5 times more space than that of the original data file (csv) according to test values. On the other hand, indexes take up additional space. Each indexed vertex or edge takes up 16 bytes of memory. The hard disk space occupied by the index can be empirically estimated as the total number of indexed vertices or edges * 50 bytes.
- Question 2: Why do we multiply the disk space and memory by 120%?
Answer: The extra 20% is for buffer.
- Question 3: How to get the number of RocksDB instances?
  Answer: The number of RocksDB instances = the number of graph spaces * the total number of directories specified by the `--data_path` parameter across all Storage services. For each graph space, every directory listed in the `--data_path` parameter in the `etc/nebula-storaged.conf` file corresponds to one RocksDB instance.
  Note
  Users can decrease the memory occupied by the bloom filter by adding `--enable_partitioned_index_filter=true` to `etc/nebula-storaged.conf`, but doing so may decrease the read performance in some random-seek cases.
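As a sketch of the formula in Question 3, the count multiplies the number of graph spaces by the total number of data directories across all Storage services. The host names and directory paths below are hypothetical examples.

```python
# RocksDB instance count = graph spaces * total data directories
# across all storaged processes (hypothetical example values).
num_graph_spaces = 3

# --data_path values from each storaged's etc/nebula-storaged.conf
data_paths = {
    "storaged-A": ["data/storage1", "data/storage2"],
    "storaged-B": ["data/storage1"],
}

total_data_dirs = sum(len(dirs) for dirs in data_paths.values())
num_rocksdb_instances = num_graph_spaces * total_data_dirs
print(num_rocksdb_instances)
```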
Caution
Each RocksDB instance takes up about 70 MB of disk space even when no data has been written yet. One partition corresponds to one RocksDB instance, so when the partition number is set very large, for example, 100, the graph space takes up a lot of disk space right after it is created.