Skip to content

Enable AutoFDO for NebulaGraph

The AutoFDO can analyze the performance of an optimized program and use the program's performance information to guide the compiler to re-optimize the program. This document will help you to enable the AutoFDO for NebulaGraph.

More information about the AutoFDO, please refer AutoFDO Wiki.

Resource Preparations

Install Dependencies

  • Install perf

    sudo apt-get update
    sudo apt-get install -y linux-tools-common \
    linux-tools-generic \
    linux-tools-`uname -r`
    
  • Install autofdo tool

    sudo apt-get update
    sudo apt-get install -y autofdo
    

    Or you can compile the autofdo tool from source.

NebulaGraph Binary with Debug Version

For how to build NebulaGraph from source, please refer to the official document: Install NebulaGraph by compiling the source code. In the configure step, replace CMAKE_BUILD_TYPE=Release with CMAKE_BUILD_TYPE=RelWithDebInfo as below:

$ cmake -DCMAKE_INSTALL_PREFIX=/usr/local/nebula -DENABLE_TESTING=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo ..

Prepare Test Data

In our test environment, we use NebulaGraph Bench to prepare the test data and collect the profile data by running the FindShortestPath, Go1Step, Go2Step, Go3Step, InsertPersonScenario 5 scenarios.

Note

You can use your TopN queries in your production environment to collect the profile data, the performance can gain more in your environment.

Prepare Profile Data

Collect Perf Data For AutoFdo Tool

  1. After the test data preparation work done. Collect the perf data for different scenarios. Get the pid of storaged, graphd, metad.

    $ nebula.service status all
    [INFO] nebula-metad: Running as 305422, Listening on 9559
    [INFO] nebula-graphd: Running as 305516, Listening on 9669
    [INFO] nebula-storaged: Running as 305707, Listening on 9779
    
  2. Start the perf record for nebula-graphd and nebula-storaged.

    perf record -p 305516,305707 -b -e br_inst_retired.near_taken:pp -o ~/FindShortestPath.data
    

    Note

    Because the nebula-metad service contribution percent is small compared with nebula-graphd and nebula-storaged services. To reduce effort, we didn't collect the perf data for nebula-metad service.

  3. Start the benchmark test for FindShortestPath scenario.

    cd NebulaGraph-Bench 
    python3 run.py stress run -s benchmark -scenario find_path.FindShortestPath -a localhost:9669 --args='-u 100 -i 100000'
    
  4. After the benchmark finished, end the perf record by Ctrl + c.

  5. Repeat above steps to collect corresponding profile data for the rest Go1Step, Go2Step, Go3Step and InsertPersonScenario scenarios.

Create Gcov File

create_gcov --binary=$NEBULA_HOME/bin/nebula-storaged \
--profile=~/FindShortestPath.data \
--gcov=~/FindShortestPath-storaged.gcov \
-gcov_version=1

create_gcov --binary=$NEBULA_HOME/bin/nebula-graphd \
--profile=~/FindShortestPath.data \
--gcov=~/FindShortestPath-graphd.gcov \
-gcov_version=1

Repeat for Go1Step, Go2Step, Go3Step and InsertPersonScenario scenarios.

Merge the Profile Data

profile_merger ~/FindShortestPath-graphd.gcov \
~/FindShortestPath-storaged.gcov \
~/go1step-storaged.gcov \
~/go1step-graphd.gcov \
~/go2step-storaged.gcov \
~/go2step-graphd.gcov \
~/go3step-storaged.gcov \
~/go3step-master-graphd.gcov \
~/InsertPersonScenario-storaged.gcov \
~/InsertPersonScenario-graphd.gcov

You will get a merged profile which is named fbdata.afdo after that.

Recompile GraphNebula Binary with the Merged Profile

Recompile the GraphNebula Binary by passing the profile with compile option -fauto-profile.

diff --git a/cmake/nebula/GeneralCompilerConfig.cmake b/cmake/nebula/GeneralCompilerConfig.cmake
@@ -20,6 +20,8 @@ add_compile_options(-Wshadow)
 add_compile_options(-Wnon-virtual-dtor)
 add_compile_options(-Woverloaded-virtual)
 add_compile_options(-Wignored-qualifiers)
+add_compile_options(-fauto-profile=~/fbdata.afdo)

Note

When you use multiple fbdata.afdo to compile multiple times, please remember to make clean before re-compile, baucase only change the fbdata.afdo will not trigger re-compile.

Performance Test Result

Hardware & Software Environment

Key Value
CPU Processor# 2
Sockets 2
NUMA 2
CPU Type Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Cores per Processor 40C80T
Cache L1 data: 48KB L1 i: 32KB L2: 1.25MB per physical core L3: shared 60MB per processor
Memory Micron DDR4 3200MT/s 16GB16Micron DDR4 3200MT/s 16GB16
SSD Disk INTEL SSDPE2KE016T8
SSD R/W Sequential 3200 MB/s (read) / 2100 MB/s(write)
Nebula Version master with commit id 51d84a4ed7d2a032a337e3b996c927e3bc5d1415
Kernel 4.18.0-408.el8.x86_64

Test Results

Scenario Average Latency(LiB) Default Binary Optimized Binary with AutoFDO P95 Latency (LiB) Default Binary Optimized Binary with AutoFDO
FindShortestPath 1 8072.52 7260.10 1 22102.00 19108.00
2 8034.32 7218.59 2 22060.85 19006.00
3 8079.27 7257.24 3 22147.00 19053.00
4 8087.66 7221.39 4 22143.00 19050.00
5 8044.77 7239.85 5 22181.00 19055.00
STDDEVP 20.57 17.34 STDDEVP 41.41 32.36
Mean 8063.71 7239.43 Mean 22126.77 19054.40
STDDEVP/Mean 0.26% 0.24% STDDEVP/Mean 0.19% 0.17%
Opt/Default 100.00% 10.22% Opt/Default 100.00% 13.89%
Go1Step 1 422.53 418.37 1 838.00 850.00
2 432.37 402.44 2 866.00 815.00
3 437.45 407.98 3 874.00 836.00
4 429.16 408.38 4 858.00 838.00
5 446.38 411.32 5 901.00 837.00
STDDEVP 8.02 5.20 STDDEVP 20.63 11.30
Mean 433.58 409.70 Mean 867.40 835.20
STDDEVP/Mean 1.85% 1.27% STDDEVP/Mean 2.38% 1.35%
Opt/Default 100.00% 5.51% Opt/Default 100.00% 3.71%
Go2Step 1 2989.93 2824.29 1 10202.00 9656.95
2 2957.22 2834.55 2 10129.00 9632.40
3 2962.74 2818.62 3 10168.40 9624.70
4 2992.39 2817.27 4 10285.10 9647.50
5 2934.85 2834.91 5 10025.00 9699.65
STDDEVP 21.53 7.57 STDDEVP 85.62 26.25
Mean 2967.43 2825.93 Mean 10161.90 9652.24
STDDEVP/Mean 0.73% 0.27% STDDEVP/Mean 0.84% 0.27%
Opt/Default 100.00% 4.77% Opt/Default 100.00% 5.02%
Go3Step 1 93551.97 89406.96 1 371359.55 345433.50
2 92418.43 89977.25 2 368868.00 352375.20
3 92587.67 90339.25 3 365390.15 356198.55
4 93371.64 92458.95 4 373578.15 365177.75
5 94046.05 89943.44 5 373392.25 352576.00
STDDEVP 609.07 1059.54 STDDEVP 3077.38 6437.52
Mean 93195.15 90425.17 Mean 370517.62 354352.20
STDDEVP/Mean 0.65% 1.17% STDDEVP/Mean 0.83% 1.82%
Opt/Default 100.00% 2.97% Opt/Default 100.00% 4.36%
InsertPerson 1 2022.86 1937.36 1 2689.00 2633.45
2 1966.05 1935.41 2 2620.45 2555.00
3 1985.25 1953.58 3 2546.00 2593.00
4 2026.73 1887.28 4 2564.00 2394.00
5 2007.55 1964.41 5 2676.00 2581.00
STDDEVP 23.02 26.42 STDDEVP 57.45 82.62
Mean 2001.69 1935.61 Mean 2619.09 2551.29
STDDEVP/Mean 1.15% 1.37% STDDEVP/Mean 2.19% 3.24%
Opt/Default 100.00% 3.30% Opt/Default 100.00% 2.59%

Last update: February 19, 2024