NebulaGraph Importer¶
NebulaGraph Importer (Importer) is a standalone tool for importing data from CSV files into NebulaGraph. Importer reads local CSV files and then imports the data into the NebulaGraph database.
Scenario¶
Importer is used to import the contents of local CSV files into NebulaGraph.
Advantage¶
- Lightweight and fast: no complex environment is required, and data is imported quickly.
- Flexible filtering: You can flexibly filter CSV data through configuration files.
Release note¶
Prerequisites¶
Before using NebulaGraph Importer, make sure:

- The NebulaGraph service has been deployed. There are currently three deployment modes.
- The schema has been created in NebulaGraph, including the graph space, Tags, and Edge types, or is set by the parameter `clientSettings.postStart.commands`.
- The Golang environment has been deployed on the machine running Importer. For details, see Build Go environment.
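As an alternative to creating the schema manually, it can be set through `clientSettings.postStart.commands` as mentioned above. The following is a minimal sketch; the space, Tag, and Edge type names are illustrative, not defaults:

```yaml
# A hedged sketch of creating the schema via postStart; all names are illustrative.
clientSettings:
  space: basketballplayer
  postStart:
    commands: |
      CREATE SPACE IF NOT EXISTS basketballplayer(partition_num=10, replica_factor=1, vid_type=FIXED_STRING(20));
      USE basketballplayer;
      CREATE TAG IF NOT EXISTS player(name string, age int);
      CREATE EDGE IF NOT EXISTS follow(degree int);
    afterPeriod: 15s  # wait for the schema to take effect before inserting data
```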
Steps¶
Configure the YAML file and prepare the CSV files to be imported, then use the tool to batch write data to NebulaGraph.
Download binary package and run¶
1. Download the binary package directly and add execute permission to it.

2. Start the service.

   $ ./<binary_package_name> --config <yaml_config_file_path>
Source code compile and run¶
1. Clone the repository.

   $ git clone -b release-3.4 https://github.com/vesoft-inc/nebula-importer.git

   Note

   Use the correct branch. NebulaGraph 2.x and 3.x have different RPC protocols.

2. Access the `nebula-importer` directory.

   $ cd nebula-importer

3. Compile the source code.

   $ make build

4. Start the service.

   $ ./nebula-importer --config <yaml_config_file_path>
Note
For details about the YAML configuration file, see the configuration file description at the end of this topic.
No network compilation mode¶
If the server cannot connect to the Internet, it is recommended that you package the source code and dependencies on a machine that can connect to the Internet and then upload them to the server for compilation. The steps are as follows:
1. Clone the repository.

   $ git clone -b release-3.4 https://github.com/vesoft-inc/nebula-importer.git

2. Run the following commands to download and package the dependencies.

   $ cd nebula-importer
   $ go mod vendor
   $ cd .. && tar -zcvf nebula-importer.tar.gz nebula-importer

3. Upload the package to the server that cannot connect to the Internet.

4. Unpack and compile.

   $ tar -zxvf nebula-importer.tar.gz
   $ cd nebula-importer
   $ go build -mod vendor cmd/importer.go
Run in Docker mode¶
Instead of installing the Go environment locally, you can use Docker to pull the image of NebulaGraph Importer and mount the local configuration file and CSV data files into the container. The command is as follows:
$ docker run --rm -ti \
--network=host \
-v <config_file>:<config_file> \
-v <csv_data_dir>:<csv_data_dir> \
vesoft/nebula-importer:<version> \
--config <config_file>
- `<config_file>`: The absolute path to the local YAML configuration file.
- `<csv_data_dir>`: The absolute path to the local CSV data file.
- `<version>`: For NebulaGraph 3.x, fill in `v3`.
Note
A relative path is recommended. If you use a local absolute path, check that the path maps to the path in the Docker.
Configuration File Description¶
NebulaGraph Importer uses a configuration file (e.g. `nebula-importer/examples/v2/example.yaml`) to describe the files to be imported, the NebulaGraph server information, and more. You can refer to the example configuration files: Configuration without Header/Configuration with Header. This section describes the fields in the configuration file by category.
Note
If you download the binary package, create the configuration file manually.
Basic configuration¶
The example configuration is as follows:
version: v2
description: example
removeTempFiles: false
| Parameter | Default value | Required | Description |
| --- | --- | --- | --- |
| `version` | `v2` | Yes | Target version of the configuration file. |
| `description` | `example` | No | Description of the configuration file. |
| `removeTempFiles` | `false` | No | Whether to delete temporarily generated logs and error data files. |
Client configuration¶
The client configuration stores the configurations associated with NebulaGraph.
The example configuration is as follows:
clientSettings:
retry: 3
concurrency: 10
channelBufferSize: 128
space: test
connection:
user: user
password: password
address: 192.168.*.13:9669,192.168.*.14:9669
postStart:
commands: |
UPDATE CONFIGS storage:wal_ttl=3600;
UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = true };
afterPeriod: 8s
preStop:
commands: |
UPDATE CONFIGS storage:wal_ttl=86400;
UPDATE CONFIGS storage:rocksdb_column_family_options = { disable_auto_compactions = false };
| Parameter | Default value | Required | Description |
| --- | --- | --- | --- |
| `clientSettings.retry` | `3` | No | The number of retries after an nGQL statement fails to execute. |
| `clientSettings.concurrency` | `10` | No | The number of concurrent NebulaGraph clients. |
| `clientSettings.channelBufferSize` | `128` | No | The size of the cache queue per NebulaGraph client. |
| `clientSettings.space` | - | Yes | Specifies the NebulaGraph space to import the data into. Do not import into multiple spaces at the same time to avoid performance impact. |
| `clientSettings.connection.user` | - | Yes | The NebulaGraph user name. |
| `clientSettings.connection.password` | - | Yes | The password for the NebulaGraph user name. |
| `clientSettings.connection.address` | - | Yes | The addresses and ports of all Graph services. |
| `clientSettings.postStart.commands` | - | No | Operations to perform after connecting to the NebulaGraph server and before inserting data. |
| `clientSettings.postStart.afterPeriod` | - | No | The interval between executing the above commands and executing the insert commands, such as `8s`. |
| `clientSettings.preStop.commands` | - | No | Operations to perform before disconnecting from the NebulaGraph server. |
File configuration¶
The file configuration stores the configuration of data files and logs, and details about the Schema.
File and log configuration¶
The example configuration is as follows:
workingDir: ./data/
logPath: ./err/test.log
files:
- path: ./student.csv
failDataPath: ./err/student.csv
batchSize: 128
limit: 10
inOrder: false
type: csv
csv:
withHeader: false
withLabel: false
delimiter: ","
lazyQuotes: false
| Parameter | Default value | Required | Description |
| --- | --- | --- | --- |
| `workingDir` | - | No | If you have multiple directories containing data with the same file structure, you can use this parameter to switch between them. For example, with the configuration below, the values of `path` and `failDataPath` are automatically changed to `./data/student.csv` and `./data/err/student`. If you change `workingDir` to `./data1`, the paths change accordingly. The parameter can be either absolute or relative. |
| `logPath` | - | No | The path for exporting log information, such as errors during import. |
| `files.path` | - | Yes | The path for storing data files. If a relative path is used, it is merged with the directory of the current configuration file. You can use an asterisk (`*`) for fuzzy matching to import multiple files with similar names, but the files must have the same structure. |
| `files.failDataPath` | - | Yes | The path for storing data that failed to be inserted, so that it can be re-written later. |
| `files.batchSize` | `128` | No | The number of statements for inserting data in a batch. |
| `files.limit` | - | No | The limit on the number of rows of read data. |
| `files.inOrder` | - | No | Whether to insert rows in the file in order. Setting it to `false` can avoid the decrease in import rate caused by data skew. |
| `files.type` | - | Yes | The file type. |
| `files.csv.withHeader` | `false` | Yes | Whether the file has a header. |
| `files.csv.withLabel` | `false` | Yes | Whether the file has a label. |
| `files.csv.delimiter` | `","` | Yes | The delimiter for the CSV file. Only one-character string delimiters are supported. |
| `files.csv.lazyQuotes` | `false` | No | If `lazyQuotes` is `true`, a quote may appear in an unquoted field and a non-doubled quote may appear in a quoted field. |
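To illustrate how `workingDir` and fuzzy matching work together, here is a sketch; the file names and directory layout are illustrative:

```yaml
# Sketch: with workingDir set, the relative paths below resolve under ./data/.
workingDir: ./data/
logPath: ./err/import.log
files:
  - path: ./student_*.csv            # fuzzy matching: imports student_1.csv, student_2.csv, ...
    failDataPath: ./err/student.csv  # resolves to ./data/err/student.csv
    batchSize: 128
    type: csv
    csv:
      withHeader: false
      withLabel: false
      delimiter: ","
```

All matched files must share the same structure, because they are imported with the single Schema configuration attached to this `files` entry.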
Schema configuration¶
Schema configuration describes the Meta information of the current data file. Schema types are vertex and edge. Multiple vertexes or edges can be configured at the same time.
- vertex configuration
The example configuration is as follows:
schema:
type: vertex
vertex:
vid:
index: 1
function: hash
prefix: abc
tags:
- name: student
props:
- name: age
type: int
index: 2
- name: name
type: string
index: 1
- name: gender
type: string
- name: phone
type: string
nullable: true
- name: wechat
type: string
nullable: true
nullValue: "__NULL__"
| Parameter | Default value | Required | Description |
| --- | --- | --- | --- |
| `files.schema.type` | - | Yes | The Schema type. Possible values are `vertex` and `edge`. |
| `files.schema.vertex.vid.index` | - | No | The column number in the CSV file that corresponds to the vertex ID. |
| `files.schema.vertex.vid.function` | - | No | The function to generate the VIDs. Currently, only the `hash` function is supported. |
| `files.schema.vertex.vid.prefix` | - | No | The prefix added to the original vid. When `function` is also specified, the prefix is applied to the original vid before the `function`. |
| `files.schema.vertex.tags.name` | - | Yes | The Tag name. |
| `files.schema.vertex.tags.props.name` | - | Yes | The Tag property name, which must match the Tag property in NebulaGraph. |
| `files.schema.vertex.tags.props.type` | - | Yes | The property data type, supporting `bool`, `int`, `float`, `double`, `string`, `time`, `timestamp`, `date`, `datetime`, `geography`, `geography(point)`, `geography(linestring)`, and `geography(polygon)`. |
| `files.schema.vertex.tags.props.index` | - | No | The column number in the CSV file that corresponds to the property. |
| `files.schema.vertex.tags.props.nullable` | `false` | No | Whether this property can be `NULL`. Optional values are `true` and `false`. |
| `files.schema.vertex.tags.props.nullValue` | `""` | No | Ignored when `nullable` is `false`. The property is set to `NULL` when the column value equals `nullValue`. |
| `files.schema.vertex.tags.props.alternativeIndices` | - | No | Ignored when `nullable` is `false`. When the value obtained by `index` equals `nullValue`, the value is fetched from the CSV columns listed here in sequence until a value not equal to `nullValue` is found. |
| `files.schema.vertex.tags.props.defaultValue` | - | No | Ignored when `nullable` is `false`. The property default value, used when all the values obtained by `index` and `alternativeIndices` equal `nullValue`. |
Note
The sequence numbers of the columns in the CSV file start from 0; that is, the sequence number of the first column is 0, and the sequence number of the second column is 1.
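The interaction of `nullable`, `nullValue`, `alternativeIndices`, and `defaultValue` can be sketched for a single property as follows; the property name, column numbers, and values are illustrative:

```yaml
# Sketch of NULL handling for one property; not a complete configuration.
- name: phone
  type: string
  index: 4
  nullable: true
  nullValue: "__NULL__"      # a column value of "__NULL__" is treated as NULL
  alternativeIndices:
    - 5                      # if column 4 equals nullValue, try column 5
  defaultValue: "unknown"    # used when all candidate columns equal nullValue
```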
- edge configuration
The example configuration is as follows:
schema:
type: edge
edge:
name: follow
srcVID:
index: 0
function: hash
dstVID:
index: 1
function:
rank:
index: 2
props:
- name: grade
type: int
index: 3
| Parameter | Default value | Required | Description |
| --- | --- | --- | --- |
| `files.schema.type` | - | Yes | The Schema type. Possible values are `vertex` and `edge`. |
| `files.schema.edge.name` | - | Yes | The Edge type name. |
| `files.schema.edge.srcVID.index` | - | No | The column number in the CSV file that corresponds to the source vertex ID of the edge. |
| `files.schema.edge.srcVID.function` | - | No | The function to generate the source vertex ID. Currently, only the `hash` function is supported. |
| `files.schema.edge.dstVID.index` | - | No | The column number in the CSV file that corresponds to the destination vertex ID of the edge. |
| `files.schema.edge.dstVID.function` | - | No | The function to generate the destination vertex ID. Currently, only the `hash` function is supported. |
| `files.schema.edge.rank.index` | - | No | The column number in the CSV file that corresponds to the rank value of the edge. |
| `files.schema.edge.props.name` | - | Yes | The Edge type property name, which must match the Edge type property in NebulaGraph. |
| `files.schema.edge.props.type` | - | Yes | The property data type, supporting `bool`, `int`, `float`, `double`, `timestamp`, `string`, and `geo`. |
| `files.schema.edge.props.index` | - | No | The column number in the CSV file that corresponds to the property. |
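Putting the index fields of the `follow` example together, each CSV row maps to one edge as sketched below; the data values are illustrative:

```yaml
# Illustrative mapping for the follow edge configuration above.
# CSV row:  s100,s101,0,95
#   column 0 -> srcVID (hashed, because function is hash)
#   column 1 -> dstVID (used as-is, because function is empty)
#   column 2 -> rank
#   column 3 -> grade (type int)
```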
About the CSV file header¶
Depending on whether the CSV file has a header, Importer requires different settings in the configuration file. For relevant examples and explanations, refer to Configuration without Header and Configuration with Header.