Export data from NebulaGraph
Exchange allows you to export data from NebulaGraph to a CSV file or to another NebulaGraph space, which can be in a different NebulaGraph cluster. This topic describes the procedure.
Enterprise only
Only Exchange Enterprise Edition supports exporting data from NebulaGraph.
Preparation
This example was run on a Linux virtual machine. Prepare the following hardware and software before exporting data.
Hardware
Type | Information |
---|---|
CPU | 4 Intel(R) Xeon(R) Platinum 8260 CPU @ 2.30GHz |
Memory | 16 GB |
Hard disk | 50 GB |
System
CentOS 7.9.2009
Software
Name | Version |
---|---|
JDK | 1.8.0 |
Scala | 2.12.11 |
Spark | 2.4.7 |
NebulaGraph | 3.5.0 |
Dataset
In this example, NebulaGraph is the data source and stores the basketballplayer dataset. Its schema elements are as follows.
Element | Name | Property |
---|---|---|
Tag | player | name string, age int |
Tag | team | name string |
Edge type | follow | degree int |
Edge type | serve | start_year int, end_year int |
Steps
- Get the JAR file of Exchange Enterprise Edition from the NebulaGraph Enterprise Edition Package.
- Modify the configuration file.
Exchange Enterprise Edition provides the configuration templates `export_to_csv.conf` and `export_to_nebula.conf` for exporting NebulaGraph data. For details, see Exchange parameters. The core content of the configuration files used in this example is as follows:
- Export to a CSV file:
    # Use the command to submit the exchange job:

    # spark-submit \
    # --master "spark://master_ip:7077" \
    # --driver-memory=2G --executor-memory=30G \
    # --total-executor-cores=60 --executor-cores=20 \
    # --class com.vesoft.nebula.exchange.Exchange \
    # nebula-exchange-3.0-SNAPSHOT.jar -c export_to_csv.conf

    {
      # Spark config
      spark: {
        app: {
          name: NebulaGraph Exchange
        }
      }

      # Nebula Graph config
      # if you export nebula data to csv, please ignore these nebula config
      nebula: {
        address:{
          graph:["127.0.0.1:9669"]

          # the address of any of the meta services.
          # if your NebulaGraph server is in virtual network like k8s, please config the leader address of meta.
          meta:["127.0.0.1:9559"]
        }
        user: root
        pswd: nebula
        space: test

        # nebula client connection parameters
        connection {
          # socket connect & execute timeout, unit: millisecond
          timeout: 30000
        }

        error: {
          # max number of failures, if the number of failures is bigger than max, then exit the application.
          max: 32
          # failed data will be recorded in output path, format with ngql
          output: /tmp/errors
        }

        # use google's RateLimiter to limit the requests send to NebulaGraph
        rate: {
          # the stable throughput of RateLimiter
          limit: 1024
          # Acquires a permit from RateLimiter, unit: MILLISECONDS
          # if it can't be obtained within the specified timeout, then give up the request.
          timeout: 1000
        }
      }

      # Processing tags
      tags: [
        {
          # you can ignore the tag name when export nebula data to csv
          name: tag-name-1
          type: {
            source: nebula
            sink: csv
          }
          metaAddress:"127.0.0.1:9559"
          space:"test"
          label:"person"
          # config the fields you want to export from nebula
          fields: [nebula-field-0, nebula-field-1, nebula-field-2]
          noFields:false  # default false, if true, just export id
          partition: 60
          # config the path to save your csv file. if your file is not in hdfs, config "file:///path/test.csv"
          path: "hdfs://ip:port/path/person"
          separator: ","
          header: true
        }
      ]

      # process edges
      edges: [
        {
          # you can ignore the edge name when export nebula data to csv
          name: edge-name-1
          type: {
            source: nebula
            sink: csv
          }
          metaAddress:"127.0.0.1:9559"
          space:"test"
          label:"friend"
          # config the fields you want to export from nebula
          fields: [nebula-field-0, nebula-field-1, nebula-field-2]
          noFields:false  # default false, if true, just export id
          partition: 60
          # config the path to save your csv file. if your file is not in hdfs, config "file:///path/test.csv"
          path: "hdfs://ip:port/path/friend"
          separator: ","
          header: true
        }
      ]
    }
- Export to another graph space:
    # Use the command to submit the exchange job:

    # spark-submit \
    # --master "spark://master_ip:7077" \
    # --driver-memory=2G --executor-memory=30G \
    # --total-executor-cores=60 --executor-cores=20 \
    # --class com.vesoft.nebula.exchange.Exchange \
    # nebula-exchange-3.0-SNAPSHOT.jar -c export_to_nebula.conf

    {
      # Spark config
      spark: {
        app: {
          name: NebulaGraph Exchange
        }
      }

      # Nebula Graph config, just config the sink nebula information
      nebula: {
        address:{
          graph:["127.0.0.1:9669"]

          # the address of any of the meta services
          meta:["127.0.0.1:9559"]
        }
        user: root
        pswd: nebula
        space: test

        # nebula client connection parameters
        connection {
          # socket connect & execute timeout, unit: millisecond
          timeout: 30000
        }

        error: {
          # max number of failures, if the number of failures is bigger than max, then exit the application.
          max: 32
          # failed data will be recorded in output path, format with ngql
          output: /tmp/errors
        }

        # use google's RateLimiter to limit the requests send to NebulaGraph
        rate: {
          # the stable throughput of RateLimiter
          limit: 1024
          # Acquires a permit from RateLimiter, unit: MILLISECONDS
          # if it can't be obtained within the specified timeout, then give up the request.
          timeout: 1000
        }
      }

      # Processing tags
      tags: [
        {
          name: tag-name-1
          type: {
            source: nebula
            sink: client
          }
          # data source nebula config
          metaAddress:"127.0.0.1:9559"
          space:"test"
          label:"person"
          # mapping the fields of the original NebulaGraph to the fields of the target NebulaGraph.
          fields: [source_nebula-field-0, source_nebula-field-1, source_nebula-field-2]
          nebula.fields: [target_nebula-field-0, target_nebula-field-1, target_nebula-field-2]
          limit:10000
          vertex: _vertexId  # must be `_vertexId`
          # udf:{
          #   separator:"_"
          #   oldColNames:[field-0,field-1,field-2]
          #   newColName:new-field
          # }
          batch: 2000
          partition: 60
        }
      ]

      # process edges
      edges: [
        {
          name: edge-name-1
          type: {
            source: nebula
            sink: client
          }
          # data source nebula config
          metaAddress:"127.0.0.1:9559"
          space:"test"
          label:"friend"
          fields: [source_nebula-field-0, source_nebula-field-1, source_nebula-field-2]
          nebula.fields: [target_nebula-field-0, target_nebula-field-1, target_nebula-field-2]
          limit:1000
          source: _srcId  # must be `_srcId`
          # udf:{
          #   separator:"_"
          #   oldColNames:[field-0,field-1,field-2]
          #   newColName:new-field
          # }
          target: _dstId  # must be `_dstId`
          # udf:{
          #   separator:"_"
          #   oldColNames:[field-0,field-1,field-2]
          #   newColName:new-field
          # }
          ranking: source_nebula-field-2
          batch: 2000
          partition: 60
        }
      ]
    }
- Export data from NebulaGraph with the following command.
Note
Adjust the parameters of the driver and executor processes to match your machine configuration.
    <spark_install_path>/bin/spark-submit --master "spark://<master_ip>:7077" \
    --driver-memory=2G --executor-memory=30G \
    --total-executor-cores=60 --executor-cores=20 \
    --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-x.y.z.jar_path> \
    -c <conf_file_path>
The following is an example command to export the data to a CSV file.
    $ ./spark-submit --master "spark://192.168.10.100:7077" \
    --driver-memory=2G --executor-memory=30G \
    --total-executor-cores=60 --executor-cores=20 \
    --class com.vesoft.nebula.exchange.Exchange ~/exchange-ent/nebula-exchange-ent-3.5.0.jar \
    -c ~/exchange-ent/export_to_csv.conf
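Because the memory and core settings depend on the machine, it can help to keep them in one place. The following is a minimal sketch, not part of Exchange: the function name `submit_export` and the variables `MASTER_IP`, `DRIVER_MEMORY`, `EXECUTOR_MEMORY`, `TOTAL_EXECUTOR_CORES`, `EXECUTOR_CORES`, and `DRY_RUN` are hypothetical names chosen for illustration. Only the `spark-submit` options and the class name come from the command above, and the defaults repeat the values used in this topic.

```shell
#!/bin/sh
# submit_export JAR CONF
# Hypothetical wrapper around the spark-submit command shown in this topic.
# The environment variable names below are illustrative assumptions, not
# settings defined by Exchange or Spark (SPARK_HOME is the usual Spark
# convention for the install path).
submit_export() {
  jar="$1"
  conf="$2"
  # When DRY_RUN is set to a non-empty value, the command is printed
  # instead of being executed.
  ${DRY_RUN:+echo} "${SPARK_HOME:?set SPARK_HOME to the Spark install path}/bin/spark-submit" \
    --master "spark://${MASTER_IP:-127.0.0.1}:7077" \
    --driver-memory="${DRIVER_MEMORY:-2G}" \
    --executor-memory="${EXECUTOR_MEMORY:-30G}" \
    --total-executor-cores="${TOTAL_EXECUTOR_CORES:-60}" \
    --executor-cores="${EXECUTOR_CORES:-20}" \
    --class com.vesoft.nebula.exchange.Exchange \
    "$jar" -c "$conf"
}
```

Running it once with `DRY_RUN=1` and a smaller `EXECUTOR_MEMORY` prints the resulting command, so you can review it before submitting the job.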
- Check the exported data.
- Export to a CSV file:
Check whether the CSV file is successfully generated under the target path, and check the contents of the CSV file to ensure that the data export is successful.
    $ hadoop fs -ls /vertex/player
    Found 11 items
    -rw-r--r--   3 nebula supergroup          0 2021-11-05 07:36 /vertex/player/_SUCCESS
    -rw-r--r--   3 nebula supergroup        160 2021-11-05 07:36 /vertex/player/part-00000-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        163 2021-11-05 07:36 /vertex/player/part-00001-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        172 2021-11-05 07:36 /vertex/player/part-00002-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        172 2021-11-05 07:36 /vertex/player/part-00003-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        144 2021-11-05 07:36 /vertex/player/part-00004-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        173 2021-11-05 07:36 /vertex/player/part-00005-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        160 2021-11-05 07:36 /vertex/player/part-00006-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        148 2021-11-05 07:36 /vertex/player/part-00007-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        125 2021-11-05 07:36 /vertex/player/part-00008-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
    -rw-r--r--   3 nebula supergroup        119 2021-11-05 07:36 /vertex/player/part-00009-17293020-ba2e-4243-b834-34495c0536b3-c000.csv
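Listing the directory shows that files exist, but not how many records they hold. The following is a rough sketch of a content check for an export written to a local directory (a `file:///` path in the configuration). The function name `check_csv_export` is hypothetical. It assumes `header: true`, so the first line of each non-empty part file is treated as a header, and it assumes that no field contains an embedded newline. For an HDFS target, `hadoop fs -test -e <path>/_SUCCESS` and `hadoop fs -cat` can play the same roles.

```shell
#!/bin/sh
# check_csv_export DIR
# Hypothetical helper: checks a CSV export in a local directory and prints
# the total number of data rows across all part files.
check_csv_export() {
  dir="$1"
  # The _SUCCESS marker is written when the job commits its output.
  if [ ! -f "$dir/_SUCCESS" ]; then
    echo "missing _SUCCESS marker in $dir" >&2
    return 1
  fi
  total=0
  for f in "$dir"/part-*.csv; do
    # Skip the unexpanded pattern when no part file matches.
    [ -f "$f" ] || continue
    lines=$(( $(wc -l < "$f") ))
    # Assumes header: true, so one line per non-empty file is the header.
    if [ "$lines" -gt 0 ]; then
      total=$((total + lines - 1))
    fi
  done
  echo "$total"
}
```

The printed total can then be compared with the number of vertices or edges you expect for the exported tag or edge type.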
- Export to another graph space:
Log in to the new graph space and check the statistics through the `SUBMIT JOB STATS` and `SHOW STATS` commands to ensure the data export is successful.