This topic describes how to automatically generate a template configuration file when users use NebulaGraph Exchange, and introduces the configuration file application.conf.
Before configuring application.conf, it is recommended to copy the file and rename the copy according to the file type of the data source. For example, name the copy csv_application.conf if the data source is a CSV file.
The application.conf file contains the following types of configuration: Spark, Hive (optional), NebulaGraph, vertex, and edge configurations.
This topic lists only some Spark parameters. For more information, see Spark Configuration.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| spark.app.name | string | - | No | The name of the Spark application. |
| spark.driver.cores | int | 1 | No | The number of CPU cores used by the driver. Applies only in cluster mode. |
| spark.driver.maxResultSize | string | 1G | No | The total size limit (in bytes) of the serialized results of all partitions in a single Spark operation (such as collect). The minimum value is 1M, and 0 means unlimited. |
| spark.executor.memory | string | 1G | No | The amount of memory used by each Spark executor, which can be specified with a unit, such as 512M or 1G. |
| spark.cores.max | int | 16 | No | The maximum number of CPU cores requested for the application across the cluster (rather than from each node) when the driver runs in coarse-grained sharing mode on a standalone cluster or a Mesos cluster. If not set, Spark uses spark.deploy.defaultCores on a standalone cluster manager, or infinite (all available cores) on Mesos. |
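As a rough sketch, the Spark parameters above might appear in application.conf as the following HOCON fragment. The nesting is assumed to follow the dotted parameter names, and the application name is a placeholder.

```conf
# Spark-related settings; values mirror the defaults listed above.
spark: {
  app: {
    name: "NebulaGraph Exchange"   # placeholder application name
  }
  driver: {
    cores: 1
    maxResultSize: 1G
  }
  executor: {
    memory: 1G
  }
  cores: {
    max: 16
  }
}
```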
Users only need to configure parameters for connecting to Hive if Spark and Hive are deployed in different clusters. Otherwise, please ignore the following configurations.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| hive.warehouse | string | - | Yes | The warehouse path in HDFS. Enclose the path in double quotes and start with hdfs://. |
| hive.connectionURL | string | - | Yes | The URL of a JDBC connection. For example, "jdbc:mysql://127.0.0.1:3306/hive_spark?characterEncoding=UTF-8". |
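A minimal sketch of the corresponding Hive block, covering only the two parameters listed above. The warehouse path is a placeholder, and the nesting is assumed to follow the dotted parameter names.

```conf
# Only needed when Spark and Hive are deployed in different clusters.
hive: {
  # Placeholder path: it must be quoted and start with hdfs://
  warehouse: "hdfs://NAMENODE_HOST:9000/path/to/hive/warehouse/"
  # JDBC URL taken from the example in the description above
  connectionURL: "jdbc:mysql://127.0.0.1:3306/hive_spark?characterEncoding=UTF-8"
}
```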
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| nebula.address.graph | list[string] | ["127.0.0.1:9669"] | Yes | The addresses of all Graph services, including IPs and ports, separated by commas (,). Example: ["ip1:port1","ip2:port2","ip3:port3"]. |
| nebula.address.meta | list[string] | ["127.0.0.1:9559"] | Yes | The addresses of all Meta services, including IPs and ports, separated by commas (,). Example: ["ip1:port1","ip2:port2","ip3:port3"]. |
| nebula.user | string | - | Yes | The username with write permissions for NebulaGraph. |
| nebula.pswd | string | - | Yes | The account password. |
| nebula.space | string | - | Yes | The name of the graph space into which data is imported. |
| nebula.ssl.enable.graph | bool | false | Yes | Whether to enable SSL encryption between Exchange and the Graph services. If the value is true, SSL encryption is enabled and the following SSL parameters take effect. If Exchange runs on a multi-machine cluster, the files referenced by the following SSL-related paths must be stored in the same path on each machine. |
| nebula.ssl.sign | string | ca | Yes | Specifies the SSL signature type. Valid values are ca and self. |
| nebula.ssl.ca.param.caCrtFilePath | string | "/path/caCrtFilePath" | Yes | Specifies the storage path of the CA certificate. It takes effect when the value of nebula.ssl.sign is ca. |
| nebula.ssl.ca.param.crtFilePath | string | "/path/crtFilePath" | Yes | Specifies the storage path of the CRT certificate. It takes effect when the value of nebula.ssl.sign is ca. |
| nebula.ssl.ca.param.keyFilePath | string | "/path/keyFilePath" | Yes | Specifies the storage path of the key file. It takes effect when the value of nebula.ssl.sign is ca. |
| nebula.ssl.self.param.crtFilePath | string | "/path/crtFilePath" | Yes | Specifies the storage path of the CRT certificate. It takes effect when the value of nebula.ssl.sign is self. |
| nebula.ssl.self.param.keyFilePath | string | "/path/keyFilePath" | Yes | Specifies the storage path of the key file. It takes effect when the value of nebula.ssl.sign is self. |
| nebula.ssl.self.param.password | string | "nebula" | Yes | Specifies the password. It takes effect when the value of nebula.ssl.sign is self. |
| nebula.path.local | string | "/tmp" | No | The local SST file path, which needs to be set when users import SST files. |
| nebula.path.remote | string | "/sst" | No | The remote SST file path, which needs to be set when users import SST files. |
| nebula.path.hdfs.namenode | string | "hdfs://name_node:9000" | No | The NameNode path, which needs to be set when users import SST files. |
| nebula.connection.timeout | int | 3000 | No | The timeout for Thrift connections. Unit: ms. |
| nebula.connection.retry | int | 3 | No | The number of retries for Thrift connections. |
| nebula.execution.retry | int | 3 | No | The number of retries for executing nGQL statements. |
| nebula.error.max | int | 32 | No | The maximum number of failures during the import process. When the number of failures reaches the maximum, the submitted Spark job stops automatically. |
| nebula.error.output | string | /tmp/errors | No | The path to output error logs. Failed nGQL statement executions are saved in the error log. |
| nebula.rate.limit | int | 1024 | No | The limit on the number of tokens in the token bucket when importing data. |
| nebula.rate.timeout | int | 1000 | No | The timeout period for getting tokens from a token bucket. Unit: ms. |
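The connection-related parameters above could be combined roughly as follows. This is a sketch only: the addresses, username, password, and space name are placeholders, the nesting is assumed to follow the dotted parameter names, and the SSL and SST path parameters are left out for brevity.

```conf
nebula: {
  address: {
    graph: ["127.0.0.1:9669"]   # placeholder Graph service address
    meta: ["127.0.0.1:9559"]    # placeholder Meta service address
  }
  user: root                    # placeholder account with write permissions
  pswd: nebula                  # placeholder password
  space: test                   # placeholder graph space name

  connection: {
    timeout: 3000               # Thrift connection timeout in ms
    retry: 3
  }
  execution: {
    retry: 3                    # retries for nGQL statements
  }
  error: {
    max: 32                     # the job stops after this many failures
    output: "/tmp/errors"       # failed nGQL statements are logged here
  }
  rate: {
    limit: 1024                 # token bucket size
    timeout: 1000               # wait for a token, in ms
  }
}
```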
Note
NebulaGraph doesn't support vertices without tags by default. To import vertices without tags, enable vertices without tags in the NebulaGraph cluster, and then add the parameter nebula.enableTagless with the value true to the Exchange configuration.
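A minimal sketch of where the parameter would sit, assuming it is placed directly inside the nebula block as its dotted name suggests:

```conf
nebula: {
  # Other nebula settings (address, user, pswd, space, and so on) go here.
  enableTagless: true   # allow importing vertices without tags
}
```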
Vertex configurations differ by data source. Configuring vertices requires both the general parameters and the parameters specific to the data source in use.
Specify an import method. Optional values are client and SST.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| tags.writeMode | string | INSERT | No | The type of batch operation on data: batch insert, update, or delete. Valid values are INSERT, UPDATE, and DELETE. |
| tags.deleteEdge | bool | false | No | Whether to delete the related incoming and outgoing edges of the vertices when performing a batch delete operation. This parameter takes effect when tags.writeMode is DELETE. |
| tags.fields | list[string] | - | Yes | The headers or column names of the columns that correspond to properties. If the data has headers or column names, use those names directly. If a CSV file does not have a header, use the form [_c0, _c1, _c2] to represent the first column, the second column, the third column, and so on. |
| tags.nebula.fields | list[string] | - | Yes | Property names defined in NebulaGraph, the order of which must correspond to tags.fields. For example, [_c1, _c2] corresponds to [name, age], which means that values in the second column are the values of the property name, and values in the third column are the values of the property age. |
| tags.vertex.field | string | - | Yes | The column of vertex IDs. For example, when a CSV file has no header, users can use _c0 to indicate that values in the first column are vertex IDs. |
| tags.vertex.udf.separator | string | - | No | Used when merging multiple columns by custom rules. This parameter specifies the join character. |
| tags.vertex.udf.oldColNames | list | - | No | Used when merging multiple columns by custom rules. This parameter specifies the names of the columns to be merged. Multiple columns are separated by commas. |
| tags.vertex.udf.newColName | string | - | No | Used when merging multiple columns by custom rules. This parameter specifies the new column name. |
| tags.vertex.prefix | string | - | No | Adds the specified prefix to the VID. For example, if the VID is 12345, adding the prefix tag1 results in tag1_12345. The underscore cannot be modified. |
| tags.vertex.policy | string | - | No | Supports only the value hash. Performs hashing operations on VIDs of type string. |
| tags.batch | int | 256 | Yes | The maximum number of vertices written into NebulaGraph in a single batch. |
| tags.partition | int | 32 | Yes | The number of partitions to be created when the data is written to NebulaGraph. If tags.partition ≤ 1, the number of partitions to be created in NebulaGraph is the same as that in the data source. |
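The general vertex parameters above might be put together as in the sketch below. It assumes that tags is a list with one entry per tag, and that each entry carries name, type.source, and type.sink keys. Those three keys do not appear in the parameter list above and are assumptions here, as are the tag name and column choices. Source-specific parameters such as the file path are omitted.

```conf
tags: [
  {
    name: player                  # hypothetical tag name (assumed key)
    type: {
      source: csv                 # assumed key: kind of data source
      sink: client                # assumed key: import method
    }
    writeMode: INSERT             # INSERT, UPDATE, or DELETE
    fields: [_c1, _c2]            # second and third columns of a headerless CSV
    nebula.fields: [name, age]    # NebulaGraph properties, same order as fields
    vertex: {
      field: _c0                  # first column holds the vertex IDs
    }
    batch: 256                    # vertices written per batch
    partition: 32
  }
]
```

The order of fields and nebula.fields matters: the first entry of one is mapped to the first entry of the other, and so on.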
Specific parameters of Parquet/JSON/ORC data sources
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| tags.path | string | - | Yes | The path of vertex data files in HDFS. Enclose the path in double quotes and start with hdfs://. |
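For a Parquet source, a tag entry might look like the sketch below. The HDFS path, tag name, and column names are placeholders, and the name and type keys are assumptions that are not listed in the tables above.

```conf
tags: [
  {
    name: player                          # hypothetical tag name (assumed key)
    type: { source: parquet, sink: client }   # assumed keys
    # Placeholder path: quoted and starting with hdfs://
    path: "hdfs://NAMENODE_HOST:9000/path/to/vertex_player.parquet"
    fields: [name, age]                   # column names in the Parquet file
    nebula.fields: [name, age]            # matching NebulaGraph properties
    vertex: { field: id }                 # hypothetical VID column
    batch: 256
    partition: 32
  }
]
```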
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| tags.separator | string | , | Yes | The separator. The default value is a comma (,). For special characters, use the ASCII octal or Unicode hexadecimal escape: \001 or \u0001 for the control character ^A, \002 or \u0002 for ^B, and \003 or \u0003 for ^C. |
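A sketch of a headerless CSV source that sets the separator explicitly. The path and tag name are placeholders, and the name and type keys are assumptions that are not listed in the tables above.

```conf
tags: [
  {
    name: player                          # hypothetical tag name (assumed key)
    type: { source: csv, sink: client }   # assumed keys
    path: "hdfs://NAMENODE_HOST:9000/path/to/vertex_player.csv"   # placeholder
    # Comma is the default. For the control character ^A, the description
    # above suggests the escape \001 or \u0001 instead.
    separator: ","
    fields: [_c1, _c2]                    # headerless CSV: columns by position
    nebula.fields: [name, age]
    vertex: { field: _c0 }
    batch: 256
    partition: 32
  }
]
```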
The path of the source file specified to generate SST files.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| tags.repartitionWithNebula | bool | true | No | Whether to repartition data based on the number of partitions of graph spaces in NebulaGraph when generating the SST file. Enabling this function can reduce the time required to DOWNLOAD and INGEST SST files. |
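When generating SST files, the tag entry and the nebula.path parameters work together. The sketch below shows one possible combination. The name and type keys, the lowercase sst value, and the source path are assumptions, and the nesting is assumed to follow the dotted parameter names.

```conf
nebula: {
  path: {
    local: "/tmp"                          # local SST file path
    remote: "/sst"                         # remote SST file path
    hdfs.namenode: "hdfs://name_node:9000" # NameNode path
  }
}

tags: [
  {
    name: player                           # hypothetical tag name (assumed key)
    type: { source: csv, sink: sst }       # assumed keys; sink selects SST import
    path: "hdfs://name_node:9000/path/to/vertex_player.csv"   # placeholder source file
    repartitionWithNebula: true            # align with the graph space partitions
    fields: [_c1, _c2]
    nebula.fields: [name, age]
    vertex: { field: _c0 }
    batch: 256
    partition: 32
  }
]
```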
Edge configurations also differ by data source. Configuring edges requires both the general parameters and the parameters specific to the data source in use.
For the data source-specific parameters of edges, refer to the specific parameters of different data sources described above, and take care to use the edges prefix instead of tags.
The method specified to import data. Optional values are client and SST.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| edges.writeMode | string | INSERT | No | The type of batch operation on data: batch insert, update, or delete. Valid values are INSERT, UPDATE, and DELETE. |
| edges.fields | list[string] | - | Yes | The headers or column names of the columns that correspond to properties. If the data has headers or column names, use those names directly. If a CSV file does not have a header, use the form [_c0, _c1, _c2] to represent the first column, the second column, the third column, and so on. |
| edges.nebula.fields | list[string] | - | Yes | Property names defined in NebulaGraph, the order of which must correspond to edges.fields. For example, [_c2, _c3] corresponds to [start_year, end_year], which means that values in the third column are the values of the start year, and values in the fourth column are the values of the end year. |
| edges.source.field | string | - | Yes | The column of source vertices of edges. For example, _c0 indicates a value in the first column that is used as the source vertex of an edge. |
| edges.source.prefix | string | - | No | Adds the specified prefix to the VID. For example, if the VID is 12345, adding the prefix tag1 results in tag1_12345. The underscore cannot be modified. |
| edges.source.policy | string | - | No | Supports only the value hash. Performs hashing operations on VIDs of type string. |
| edges.target.field | string | - | Yes | The column of destination vertices of edges. For example, _c0 indicates a value in the first column that is used as the destination vertex of an edge. |
| edges.target.prefix | string | - | No | Adds the specified prefix to the VID. For example, if the VID is 12345, adding the prefix tag1 results in tag1_12345. The underscore cannot be modified. |
| edges.target.policy | string | - | No | Supports only the value hash. Performs hashing operations on VIDs of type string. |
| edges.ranking | int | - | No | The column of rank values. If not specified, all rank values are 0 by default. |
| edges.batch | int | 256 | Yes | The maximum number of edges written into NebulaGraph in a single batch. |
| edges.partition | int | 32 | Yes | The number of partitions to be created when the data is written to NebulaGraph. If edges.partition ≤ 1, the number of partitions to be created in NebulaGraph is the same as that in the data source. |
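An edge entry built from the general parameters above might look like the following sketch. The edge type name, file path, and column positions are placeholders, and the name, type, and path keys are assumed to work the same way as for tags.

```conf
edges: [
  {
    name: serve                            # hypothetical edge type name (assumed key)
    type: { source: csv, sink: client }    # assumed keys
    path: "hdfs://NAMENODE_HOST:9000/path/to/edge_serve.csv"   # placeholder
    writeMode: INSERT
    fields: [_c2, _c3]                     # third and fourth CSV columns
    nebula.fields: [start_year, end_year]  # matching edge properties
    source: {
      field: _c0                           # first column: source vertex ID
    }
    target: {
      field: _c1                           # second column: destination vertex ID
    }
    batch: 256                             # edges written per batch
    partition: 32
  }
]
```

Because no ranking column is specified here, all rank values default to 0.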
The path of the source file specified to generate SST files.
| Parameter | Type | Default value | Required | Description |
| --- | --- | --- | --- | --- |
| edges.repartitionWithNebula | bool | true | No | Whether to repartition data based on the number of partitions of graph spaces in NebulaGraph when generating the SST file. Enabling this function can reduce the time required to DOWNLOAD and INGEST SST files. |