Prepare resources for compiling, installing, and running NebulaGraph¶

This topic describes the requirements and suggestions for compiling and installing NebulaGraph, as well as how to estimate the resource you need to reserve for running a NebulaGraph cluster.

Reading guide¶

If you are reading this topic with the questions listed below, click them to jump to their answers.

What do I need to compile NebulaGraph?

What do I need to run NebulaGraph in a test environment?

What do I need to run NebulaGraph in a production environment?

How much memory and disk space do I need to reserve for my NebulaGraph cluster?

Requirements for compiling the NebulaGraph source code¶

Hardware requirements for compiling NebulaGraph¶

Item	Requirement
CPU architecture	x86_64
Memory	4 GB
Disk	10 GB, SSD

Supported operating systems for compiling NebulaGraph¶

For now, we can only compile NebulaGraph in the Linux system. We recommend that you use any Linux system with kernel version 2.6.32 or above.

Software requirements for compiling NebulaGraph¶

You must have the correct version of the software listed below to compile NebulaGraph. If they are not as required or you are not sure, follow the steps in Prepare software for compiling NebulaGraph to get them ready.

Software	Version	Note
glibc	2.17 or above	You can run `ldd --version` to check the glibc version.
make	Any stable version	-
m4	Any stable version	-
git	Any stable version	-
wget	Any stable version	-
unzip	Any stable version	-
xz	Any stable version	-
readline-devel	Any stable version	-
ncurses-devel	Any stable version	-
zlib-devel	Any stable version	-
gcc	7.5.0 or above	You can run `gcc -v` to check the gcc version.
gcc-c++	Any stable version	-
cmake	3.9.0 or above	You can run `cmake --version` to check the cmake version.
gettext	Any stable version	-
curl	Any stable version	-
redhat-lsb-core	Any stable version	-
libstdc++-static	Any stable version	Only needed in CentOS 8+, RedHat 8+, and Fedora systems.
libasan	Any stable version	Only needed in CentOS 8+, RedHat 8+, and Fedora systems.
bzip2	Any stable version	-

Other third-party software will be automatically downloaded and installed to the build directory at the configure (cmake) stage.

Prepare software for compiling NebulaGraph¶

This section guides you through the downloading and installation of software required for compiling NebulaGraph.

Install dependencies.

For CentOS, RedHat, and Fedora users, run the following commands.

$ yum update
$ yum install -y make \
                 m4 \
                 git \
                 wget \
                 unzip \
                 xz \
                 readline-devel \
                 ncurses-devel \
                 zlib-devel \
                 gcc \
                 gcc-c++ \
                 cmake \
                 gettext \
                 curl \
                 redhat-lsb-core \
                 bzip2
  // For CentOS 8+, RedHat 8+, and Fedora, install libstdc++-static and libasan as well
$ yum install -y libstdc++-static libasan

For Debian and Ubuntu users, run the following commands.

$ apt-get update
$ apt-get install -y make \
                     m4 \
                     git \
                     wget \
                     unzip \
                     xz-utils \
                     curl \
                     lsb-core \
                     build-essential \
                     libreadline-dev \
                     ncurses-dev \
                     cmake \
                     gettext

Check if the GCC and cmake on your host are in the right version. See Software requirements for compiling NebulaGraph for the required versions.

$ g++ --version
$ cmake --version

If your GCC and CMake are in the right version, then you are all set. If they are not, follow the sub-steps as follows.

Clone the nebula repository to your host.
```
$ git clone -b v2.6.2 https://github.com/vesoft-inc/nebula-common.git
```
Users can use the --branch or -b option to specify the branch to be cloned. For example, for 2.6.2, run the following command.
```
$ git clone --branch v2.6.2 https://github.com/vesoft-inc/nebula-common.git
```
Make nebula the current working directory.
```
$ cd nebula
```

Run the following commands to install and enable CMake and GCC.

// Install CMake.
$ ./third-party/install-cmake.sh cmake-install

// Enable CMake.
$ source cmake-install/bin/enable-cmake.sh

// Authorize the write privilege to the opt directory.
$ sudo mkdir /opt/vesoft && sudo chmod -R a+w /opt/vesoft

// Install GCC. Installing GCC to the opt directory requires the write privilege. And users can change it to other locations.
$ ./third-party/install-gcc.sh --prefix=/opt

// Enable GCC.
$ source /opt/vesoft/toolset/gcc/7.5.0/enable

Execute the script install-third-party.sh.
```
$ ./third-party/install-third-party.sh
```

Requirements and suggestions for installing NebulaGraph in test environments¶

Hardware requirements for test environments¶

Item	Requirement
CPU architecture	x86_64
Number of CPU core	4
Memory	8 GB
Disk	100 GB, SSD

Supported operating systems for test environments¶

For now, we can only install NebulaGraph in the Linux system. To install NebulaGraph in a test environment, we recommend that you use any Linux system with kernel version 3.9 or above.

Suggested service architecture for test environments¶

Process	Suggested number
metad (the metadata service process)	1
storaged (the storage service process)	1 or more
graphd (the query engine service process)	1 or more

For example, for a single-machine test environment, you can deploy 1 metad, 1 storaged, and 1 graphd processes in the machine.

For a more common test environment, such as a cluster of 3 machines (named as A, B, and C), you can deploy NebulaGraph as follows:

Machine name	Number of metad	Number of storaged	Number of graphd
A	1	1	1
B	None	1	1
C	None	1	1

Requirements and suggestions for installing NebulaGraph in production environments¶

Hardware requirements for production environments¶

Item	Requirement
CPU architecture	x86_64
Number of CPU core	48
Memory	96 GB
Disk	2 * 900 GB, NVMe SSD

Supported operating systems for production environments¶

For now, we can only install NebulaGraph in the Linux system. To install NebulaGraph in a production environment, we recommend that you use any Linux system with kernel version 3.9 or above.

Users can adjust some of the kernel parameters to better accommodate the need for running NebulaGraph. For more information, see kernel configuration.

Suggested service architecture for production environments¶

Danger

DO NOT deploy a cluster across IDCs.

Process	Suggested number
metad (the metadata service process)	3
storaged (the storage service process)	3 or more
graphd (the query engine service process)	3 or more

Each metad process automatically creates and maintains a replica of the metadata. Usually, you need to deploy three metad processes and only three.

The number of storaged processes does not affect the number of graph space replicas.

Users can deploy multiple processes on a single machine. For example, on a cluster of 5 machines (named as A, B, C, D, and E), you can deploy NebulaGraph as follows:

Machine name	Number of metad	Number of storaged	Number of graphd
A	1	1	1
B	1	1	1
C	1	1	1
D	None	1	1
E	None	1	1

Capacity requirements for running a NebulaGraph cluster¶

Users can estimate the memory, disk space, and partition number needed for a NebulaGraph cluster of 3 replicas as follows.

Resource	Unit	How to estimate	Description
Disk space for a cluster	Bytes	`the_sum_of_edge_number_and_vertex_number` * `average_bytes_of_properties` * 6 * 120%	-
Memory for a cluster	Bytes	[`the_sum_of_edge_number_and_vertex_number` * 15 + `the_number_of_RocksDB_instances` * (`write_buffer_size` * `max_write_buffer_number` + `rocksdb_block_cache`)] * 120%	`write_buffer_size` and `max_write_buffer_number` are RocksDB parameters. For more information, see MemTable. For details about `rocksdb_block_cache`, see Memory usage in RocksDB.
Number of partitions for a graph space	-	`the_number_of_disks_in_the_cluster` * `disk_partition_num_multiplier`	`disk_partition_num_multiplier` is an integer between 2 and 10 (both including). Its value depends on the disk performance. Use 2 for HDD.

Question 1: Why do we multiply the disk space and memory by 120%?
Answer: The extra 20% is for buffer.

Question 2: How to get the number of RocksDB instances?

Answer: Each directory in the --data_path item in the etc/nebula-storaged.conf file corresponds to a RocksDB instance. Count the number of directories to get the RocksDB instance number.

Note

Users can decrease the memory size occupied by the bloom filter by adding --enable_partitioned_index_filter=true in etc/nebula-storaged.conf. But it may decrease the read performance in some random-seek cases.

FAQ¶

About storage devices¶

NebulaGraph is designed and implemented for NVMe SSD. All default parameters are optimized for the SSD devices and require extremely high IOPS and low latency.

Due to the poor IOPS capability and long random seek latency, HDD is not recommended. Users may encounter many problems when using HDD.

Do not use remote storage devices, such as NAS or SAN. Do not connect an external virtual hard disk based on HDFS or Ceph.

Do not use RAID.

Use local SSD devices.

About CPU architecture¶

Enterpriseonly

Only NebulaGraph 2.6.2 Enterprise Edition can be run or compiled on ARM architectures (including Apple Mac M1 or Huawei Kunpeng). Contact inquiry@vesoft.com for business supports.

Last update: March 13, 2023