Nebula Exchange

Nebula Exchange (hereinafter referred to as Exchange) is an Apache Spark™ application for migrating data from distributed systems into Nebula Graph. Exchange supports migrating both batch data and stream data in different formats.

Use cases

Exchange applies to transforming the following data into vertices and edges in Nebula Graph:

  • Stream data stored in Kafka or Pulsar, including logs, online shopping records, online game player activities, social network information, financial trading data, and geospatial service data.
  • Telemetry data recorded by equipment connected to IDCs.
  • Batch data stored in relational databases such as MySQL or distributed file systems such as HDFS.

Benefits

  • Adaptable. Exchange supports importing data in many different formats and from many different sources into Nebula Graph for easy data migration.
  • Supports SST import. Exchange can transform data from different sources into SST files for importing.

    Note

    SST import is only supported on Linux.

  • Supports resuming from a breakpoint. To save time and improve efficiency, Exchange can resume an interrupted data transfer from the point where it stopped instead of starting over.

    Note

    For now, resuming from a breakpoint is only supported when importing Neo4j data.

  • Asynchronous operations. Exchange generates a write statement and then sends it to the Graph Service for data insertion.
  • Flexible. Exchange supports importing data with multiple tags and edge types that originate from different data formats or sources.
  • Supports statistics. Exchange uses Apache Spark™ accumulators to count successful and failed insertion operations.
  • Easy to use. Exchange uses the Human-Optimized Config Object Notation (HOCON) format for configuration files. HOCON is object-oriented and easy to understand and use.
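
The write-statement and statistics items above can be illustrated with a minimal Python sketch. It is not Exchange's implementation. The function names (build_insert_vertex, import_rows), the row layout, and the execute callback are hypothetical. The statement shape assumes the nGQL form INSERT VERTEX tag(props) VALUES "vid":(values). Plain counters stand in for Spark accumulators, and the batches are sent sequentially for clarity rather than asynchronously.

```python
import json


def build_insert_vertex(tag, fields, rows, vid_field):
    """Build one batched INSERT VERTEX statement from a list of dict rows.

    json.dumps is used only as a simple way to quote strings and render
    numbers; real data may need stricter escaping.
    """
    values = []
    for row in rows:
        props = ", ".join(json.dumps(row[f]) for f in fields)
        vid = json.dumps(str(row[vid_field]))
        values.append(f"{vid}:({props})")
    return f"INSERT VERTEX {tag}({', '.join(fields)}) VALUES {', '.join(values)};"


def import_rows(rows, batch_size, execute, tag, fields, vid_field):
    """Send rows in batches and count succeeded and failed rows.

    `execute` is a hypothetical callback that sends one statement to the
    graph service and returns True on success, False on failure.
    """
    succeeded = 0
    failed = 0
    for offset in range(0, len(rows), batch_size):
        batch = rows[offset:offset + batch_size]
        statement = build_insert_vertex(tag, fields, batch, vid_field)
        if execute(statement):
            succeeded += len(batch)
        else:
            failed += len(batch)
    return succeeded, failed


if __name__ == "__main__":
    sample = [
        {"id": "player100", "name": "Tim Duncan", "age": 42},
        {"id": "player101", "name": "Tony Parker", "age": 36},
    ]
    # A stand-in executor that only prints the statement and reports success.
    print(import_rows(sample, 1, lambda s: print(s) is None, "player", ["name", "age"], "id"))
```

Counting per batch rather than per row mirrors the idea that one statement either succeeds or fails as a whole. How Exchange attributes failures internally is not stated here.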

Data formats and origins

Exchange 2.0 can migrate data in the following formats or from the following origins.

  • Data stored in HDFS, including:
    • Apache Parquet
    • Apache ORC
    • JSON
    • CSV
  • Apache HBase™
  • Data warehouse: Hive
  • Graph database: Neo4j
  • Relational database: MySQL
  • Event streaming platform: Apache Kafka®
  • Message publishing/subscribing platform: Apache Pulsar 2.4.5

Last update: April 28, 2021