
Options for import

After editing the configuration file, run the following command to import the specified source data into the NebulaGraph database.

Import data

<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path> 

Note

If property values contain Chinese characters, encoding errors may occur. Add the following options when submitting the Spark task:

--conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8
--conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8
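
For example, a complete submit command with both encoding options might look like the following (the JAR and configuration file paths are placeholders, as in the command above):

<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" \
--class com.vesoft.nebula.exchange.Exchange \
--conf spark.driver.extraJavaOptions=-Dfile.encoding=utf-8 \
--conf spark.executor.extraJavaOptions=-Dfile.encoding=utf-8 \
<nebula-exchange-2.x.y.jar_path> -c <application.conf_path>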

The following table lists command parameters.

Parameter | Required | Default value | Description
--class | Yes | - | Specify the main class of the driver.
--master | Yes | - | Specify the URL of the master process in a Spark cluster. For more information, see master-urls. Optional values are:
  local: Local mode. Runs Spark applications on a single thread. Suitable for importing small data sets in a test environment.
  yarn: Runs Spark applications on a YARN cluster. Suitable for importing large data sets in a production environment.
  spark://HOST:PORT: Connects to the specified Spark standalone cluster.
  mesos://HOST:PORT: Connects to the specified Mesos cluster.
  k8s://HOST:PORT: Connects to the specified Kubernetes cluster.
-c/--config | Yes | - | Specify the path of the configuration file.
-h/--hive | No | false | Specify whether importing Hive data is supported.
-D/--dry | No | false | Specify whether to check the format of the configuration file. This parameter only checks the format of the configuration file; it does not check the validity of the tags and edges configurations, and it does not import data. Do not add this parameter if you need to import data.
-r/--reload | No | - | Specify the path of the reload file that needs to be reloaded.

For more Spark parameter configurations, see Spark Configuration.
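
For example, to validate only the format of the configuration file before a real import, you can add the -D flag (a sketch; the paths are placeholders and the master URL depends on your environment):

<spark_install_path>/bin/spark-submit --master local \
--class com.vesoft.nebula.exchange.Exchange \
<nebula-exchange-2.x.y.jar_path> -c <application.conf_path> -D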

Note

  • The version number in the JAR file name depends on the JAR file that is actually compiled.
  • If you use the yarn mode to submit a job, refer to the following command, especially the two '--conf' options in the example (explained after the example).
$SPARK_HOME/bin/spark-submit --master yarn \
--class com.vesoft.nebula.exchange.Exchange \
--files application.conf \
--conf spark.driver.extraClassPath=./ \
--conf spark.executor.extraClassPath=./ \
nebula-exchange-3.8.0.jar \
-c application.conf
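
In this example, --files ships application.conf to the working directory of each YARN container, and the two --conf options add that working directory (./) to the driver and executor classpaths so that Exchange can locate the configuration file at runtime.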

Import the reload file

If some data fails to be imported, the failed data is stored in the reload file. Use the -r parameter to import the data in the reload file.

<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" --class com.vesoft.nebula.exchange.Exchange <nebula-exchange-2.x.y.jar_path> -c <application.conf_path> -r "<reload_file_path>" 
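
For example, assuming the failed records were written to /tmp/errors/reload (a hypothetical path; the actual location depends on the error output settings in your configuration file), the command could be:

<spark_install_path>/bin/spark-submit --master "spark://HOST:PORT" \
--class com.vesoft.nebula.exchange.Exchange \
<nebula-exchange-2.x.y.jar_path> -c <application.conf_path> -r "/tmp/errors/reload"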

If the import still fails, go to the Official Forum for help.


Last update: January 30, 2024