Enable AutoFDO for NebulaGraph
AutoFDO analyzes the runtime performance of an optimized program and uses the collected profile to guide the compiler when it re-optimizes that program. This document describes how to enable AutoFDO for NebulaGraph.
For more information about AutoFDO, refer to the AutoFDO Wiki.
Resource Preparations
Install Dependencies
- Install perf
  sudo apt-get update
  sudo apt-get install -y linux-tools-common \
      linux-tools-generic \
      linux-tools-`uname -r`
- Install autofdo tool
  sudo apt-get update
  sudo apt-get install -y autofdo
  Or you can compile the autofdo tool from source.
NebulaGraph Binary with Debug Version
To build NebulaGraph from source, refer to the official document: Install NebulaGraph by compiling the source code.
In the configure step, replace CMAKE_BUILD_TYPE=Release with CMAKE_BUILD_TYPE=RelWithDebInfo, as shown below:
$ cmake -DCMAKE_INSTALL_PREFIX=/usr/local/nebula -DENABLE_TESTING=OFF -DCMAKE_BUILD_TYPE=RelWithDebInfo ..
Prepare Test Data
In our test environment, we use NebulaGraph Bench to prepare the test data, and we collect the profile data by running five scenarios: FindShortestPath, Go1Step, Go2Step, Go3Step, and InsertPersonScenario.
Note
You can instead collect the profile data with the TopN queries from your production environment. A profile that reflects your real workload is likely to yield a larger performance gain in that environment.
Prepare Profile Data
Collect Perf Data for the AutoFDO Tool
- After the test data is prepared, collect the perf data for each scenario. First, get the PIDs of storaged, graphd, and metad.
  $ nebula.service status all
  [INFO] nebula-metad: Running as 305422, Listening on 9559
  [INFO] nebula-graphd: Running as 305516, Listening on 9669
  [INFO] nebula-storaged: Running as 305707, Listening on 9779
- Start the perf record for nebula-graphd and nebula-storaged.
  perf record -p 305516,305707 -b -e br_inst_retired.near_taken:pp -o ~/FindShortestPath.data
  Note
  Because the nebula-metad service contributes only a small share compared with the nebula-graphd and nebula-storaged services, we did not collect perf data for nebula-metad, to reduce effort.
- Start the benchmark test for the FindShortestPath scenario.
  cd NebulaGraph-Bench
  python3 run.py stress run -s benchmark -scenario find_path.FindShortestPath -a localhost:9669 --args='-u 100 -i 100000'
- After the benchmark finishes, stop the perf record with Ctrl + C.
- Repeat the steps above to collect the corresponding profile data for the remaining scenarios: Go1Step, Go2Step, Go3Step, and InsertPersonScenario.
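The PID list passed to perf record -p can be read from the status output rather than copied by hand. The following is a minimal sketch, assuming the status lines keep the layout shown above; the function names nebula_pid and perf_pid_list are hypothetical helpers, not part of NebulaGraph.

```shell
# Read `nebula.service status all` output on stdin and print the PID of one
# service (graphd, storaged, or metad). Assumes lines shaped like:
#   [INFO] nebula-graphd: Running as 305516, Listening on 9669
nebula_pid() {
  awk -v svc="nebula-$1:" '$2 == svc { sub(/,$/, "", $5); print $5 }'
}

# Build the comma-separated "graphd,storaged" PID list that `perf record -p`
# expects, again reading the status output from stdin.
perf_pid_list() {
  local status
  status=$(cat)
  printf '%s,%s\n' \
    "$(printf '%s\n' "$status" | nebula_pid graphd)" \
    "$(printf '%s\n' "$status" | nebula_pid storaged)"
}
```

With the sample status output above, `nebula.service status all | perf_pid_list` prints 305516,305707, which can be substituted into the perf record command.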
Create Gcov File
create_gcov --binary=$NEBULA_HOME/bin/nebula-storaged \
    --profile=$HOME/FindShortestPath.data \
    --gcov=$HOME/FindShortestPath-storaged.gcov \
    -gcov_version=1
create_gcov --binary=$NEBULA_HOME/bin/nebula-graphd \
    --profile=$HOME/FindShortestPath.data \
    --gcov=$HOME/FindShortestPath-graphd.gcov \
    -gcov_version=1
Repeat for Go1Step, Go2Step, Go3Step and InsertPersonScenario scenarios.
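Repeating the two commands for every scenario and service gives ten invocations, which can be generated rather than typed. The sketch below only prints the commands; it assumes each perf file is named $HOME/<Scenario>.data and each output $HOME/<Scenario>-<service>.gcov, which is a naming convention chosen for illustration. The function name print_create_gcov_commands is hypothetical.

```shell
# Print one create_gcov command per (scenario, service) pair.
# Assumed naming: $HOME/<Scenario>.data -> $HOME/<Scenario>-<service>.gcov
print_create_gcov_commands() {
  local scenario service
  for scenario in FindShortestPath Go1Step Go2Step Go3Step InsertPersonScenario; do
    for service in storaged graphd; do
      printf 'create_gcov --binary=%s/bin/nebula-%s --profile=%s/%s.data --gcov=%s/%s-%s.gcov -gcov_version=1\n' \
        "$NEBULA_HOME" "$service" "$HOME" "$scenario" "$HOME" "$scenario" "$service"
    done
  done
}
```

Run `print_create_gcov_commands` to review the list, then `print_create_gcov_commands | sh` to execute it. Paths that contain spaces would need extra quoting, which this sketch does not handle.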
Merge the Profile Data
profile_merger ~/FindShortestPath-graphd.gcov \
    ~/FindShortestPath-storaged.gcov \
    ~/go1step-storaged.gcov \
    ~/go1step-graphd.gcov \
    ~/go2step-storaged.gcov \
    ~/go2step-graphd.gcov \
    ~/go3step-storaged.gcov \
    ~/go3step-master-graphd.gcov \
    ~/InsertPersonScenario-storaged.gcov \
    ~/InsertPersonScenario-graphd.gcov
After the merge, you get a single merged profile named fbdata.afdo.
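Before merging, it can help to confirm that every per-scenario .gcov file exists and is non-empty, since a mistyped name is easy to miss in a long argument list. A minimal sketch, with missing_profiles as a hypothetical helper name:

```shell
# Print each given path that is missing or empty; print nothing if all are fine.
missing_profiles() {
  local f
  for f in "$@"; do
    [ -s "$f" ] || printf '%s\n' "$f"
  done
}
```

For example, `missing_profiles ~/FindShortestPath-graphd.gcov ~/FindShortestPath-storaged.gcov` prints nothing when both files are present, and lists the offending paths otherwise.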
Recompile NebulaGraph Binary with the Merged Profile
Recompile the NebulaGraph binary, passing the merged profile through the compile option -fauto-profile.
diff --git a/cmake/nebula/GeneralCompilerConfig.cmake b/cmake/nebula/GeneralCompilerConfig.cmake
@@ -20,6 +20,8 @@ add_compile_options(-Wshadow)
add_compile_options(-Wnon-virtual-dtor)
add_compile_options(-Woverloaded-virtual)
add_compile_options(-Wignored-qualifiers)
+add_compile_options(-fauto-profile=$ENV{HOME}/fbdata.afdo)
Note
When you compile several times with different fbdata.afdo profiles, remember to run make clean before recompiling, because changing only fbdata.afdo does not trigger a recompile.
Performance Test Result
Hardware & Software Environment
Key | Value |
---|---|
CPU Processor# | 2 |
Sockets | 2 |
NUMA | 2 |
CPU Type | Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz |
Cores per Processor | 40C80T |
Cache | L1 data: 48KB L1 i: 32KB L2: 1.25MB per physical core L3: shared 60MB per processor |
Memory | Micron DDR4 3200MT/s 16GB x 16 |
SSD Disk | INTEL SSDPE2KE016T8 |
SSD R/W Sequential | 3200 MB/s (read) / 2100 MB/s (write) |
Nebula Version | master with commit id 51d84a4ed7d2a032a337e3b996c927e3bc5d1415 |
Kernel | 4.18.0-408.el8.x86_64 |
Test Results
Scenario | Run | Average Latency (LiB), Default Binary | Average Latency (LiB), Optimized Binary with AutoFDO | P95 Latency (LiB), Default Binary | P95 Latency (LiB), Optimized Binary with AutoFDO |
---|---|---|---|---|---|
FindShortestPath | 1 | 8072.52 | 7260.10 | 22102.00 | 19108.00 |
FindShortestPath | 2 | 8034.32 | 7218.59 | 22060.85 | 19006.00 |
FindShortestPath | 3 | 8079.27 | 7257.24 | 22147.00 | 19053.00 |
FindShortestPath | 4 | 8087.66 | 7221.39 | 22143.00 | 19050.00 |
FindShortestPath | 5 | 8044.77 | 7239.85 | 22181.00 | 19055.00 |
FindShortestPath | STDDEVP | 20.57 | 17.34 | 41.41 | 32.36 |
FindShortestPath | Mean | 8063.71 | 7239.43 | 22126.77 | 19054.40 |
FindShortestPath | STDDEVP/Mean | 0.26% | 0.24% | 0.19% | 0.17% |
FindShortestPath | Opt/Default | 100.00% | 10.22% | 100.00% | 13.89% |
Go1Step | 1 | 422.53 | 418.37 | 838.00 | 850.00 |
Go1Step | 2 | 432.37 | 402.44 | 866.00 | 815.00 |
Go1Step | 3 | 437.45 | 407.98 | 874.00 | 836.00 |
Go1Step | 4 | 429.16 | 408.38 | 858.00 | 838.00 |
Go1Step | 5 | 446.38 | 411.32 | 901.00 | 837.00 |
Go1Step | STDDEVP | 8.02 | 5.20 | 20.63 | 11.30 |
Go1Step | Mean | 433.58 | 409.70 | 867.40 | 835.20 |
Go1Step | STDDEVP/Mean | 1.85% | 1.27% | 2.38% | 1.35% |
Go1Step | Opt/Default | 100.00% | 5.51% | 100.00% | 3.71% |
Go2Step | 1 | 2989.93 | 2824.29 | 10202.00 | 9656.95 |
Go2Step | 2 | 2957.22 | 2834.55 | 10129.00 | 9632.40 |
Go2Step | 3 | 2962.74 | 2818.62 | 10168.40 | 9624.70 |
Go2Step | 4 | 2992.39 | 2817.27 | 10285.10 | 9647.50 |
Go2Step | 5 | 2934.85 | 2834.91 | 10025.00 | 9699.65 |
Go2Step | STDDEVP | 21.53 | 7.57 | 85.62 | 26.25 |
Go2Step | Mean | 2967.43 | 2825.93 | 10161.90 | 9652.24 |
Go2Step | STDDEVP/Mean | 0.73% | 0.27% | 0.84% | 0.27% |
Go2Step | Opt/Default | 100.00% | 4.77% | 100.00% | 5.02% |
Go3Step | 1 | 93551.97 | 89406.96 | 371359.55 | 345433.50 |
Go3Step | 2 | 92418.43 | 89977.25 | 368868.00 | 352375.20 |
Go3Step | 3 | 92587.67 | 90339.25 | 365390.15 | 356198.55 |
Go3Step | 4 | 93371.64 | 92458.95 | 373578.15 | 365177.75 |
Go3Step | 5 | 94046.05 | 89943.44 | 373392.25 | 352576.00 |
Go3Step | STDDEVP | 609.07 | 1059.54 | 3077.38 | 6437.52 |
Go3Step | Mean | 93195.15 | 90425.17 | 370517.62 | 354352.20 |
Go3Step | STDDEVP/Mean | 0.65% | 1.17% | 0.83% | 1.82% |
Go3Step | Opt/Default | 100.00% | 2.97% | 100.00% | 4.36% |
InsertPerson | 1 | 2022.86 | 1937.36 | 2689.00 | 2633.45 |
InsertPerson | 2 | 1966.05 | 1935.41 | 2620.45 | 2555.00 |
InsertPerson | 3 | 1985.25 | 1953.58 | 2546.00 | 2593.00 |
InsertPerson | 4 | 2026.73 | 1887.28 | 2564.00 | 2394.00 |
InsertPerson | 5 | 2007.55 | 1964.41 | 2676.00 | 2581.00 |
InsertPerson | STDDEVP | 23.02 | 26.42 | 57.45 | 82.62 |
InsertPerson | Mean | 2001.69 | 1935.61 | 2619.09 | 2551.29 |
InsertPerson | STDDEVP/Mean | 1.15% | 1.37% | 2.19% | 3.24% |
InsertPerson | Opt/Default | 100.00% | 3.30% | 100.00% | 2.59% |
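The figures in the Opt/Default rows appear to be the relative latency reduction of the optimized binary, that is, (1 - optimized mean / default mean) x 100. This reading is inferred from the numbers and is not stated in the table. A minimal sketch that recomputes it, with latency_reduction as a hypothetical helper name:

```shell
# Percentage by which the optimized mean latency is lower than the default one.
# Usage: latency_reduction <default-mean> <optimized-mean>
latency_reduction() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.2f\n", (1 - opt / base) * 100 }'
}

latency_reduction 8063.71 7239.43   # FindShortestPath, average latency means
latency_reduction 433.58 409.70     # Go1Step, average latency means
```

The two calls print 10.22 and 5.51, which agree with the Opt/Default values reported for those scenarios.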