Nebula Graph CRUD

This topic describes the basic CRUD operations in Nebula Graph.

Graph space and Nebula Graph schema

A Nebula Graph instance consists of one or more graph spaces. Graph spaces are physically isolated from each other. You can use different graph spaces in the same instance to store different datasets.

Nebula Graph and graph spaces

To insert data into a graph space, define a schema for the graph database. Nebula Graph schema is based on the following components.

Schema component Description
Vertex Represents an entity in the real world. A vertex can have one or more tags.
Tag The type of a vertex. It defines a group of properties that describes a type of vertices.
Edge Represents a directed relationship between two vertices.
Edge type The type of an edge. It defines a group of properties that describes a type of edges.

For more information, see Nebula Graph schema (doc TODO).

In this topic, we use the following dataset to demonstrate basic CRUD operations.

The demo dataset

Check the machine status in the Nebula Graph cluster

First, we recommend that you check the machine status to make sure that all the Storage services are connected to the Meta Services. Run SHOW HOSTS as follows.

nebula> SHOW HOSTS;
+-------------+-----------+-----------+--------------+----------------------+------------------------+
| Host        | Port      | Status    | Leader count | Leader distribution  | Partition distribution |
+-------------+-----------+-----------+--------------+----------------------+------------------------+
| "storaged0" | 9779      | "ONLINE"  | 0            | "No valid partition" | "No valid partition"   |
+-------------+-----------+-----------+--------------+----------------------+------------------------+
| "storaged1" | 9779      | "ONLINE"  | 0            | "No valid partition" | "No valid partition"   |
+-------------+-----------+-----------+--------------+----------------------+------------------------+
| "storaged2" | 9779      | "ONLINE"  | 0            | "No valid partition" | "No valid partition"   |
+-------------+-----------+-----------+--------------+----------------------+------------------------+
| "Total"     | __EMPTY__ | __EMPTY__ | 0            | __EMPTY__            | __EMPTY__              |
+-------------+-----------+-----------+--------------+----------------------+------------------------+
Got 4 rows (time spent 1061/2251 us)

From the Status column of the table in the return message, you can see that all the Storage services are online.

Asynchronous implementation of creation and alteration

Nebula Graph implements the following creation or alteration operations asynchronously in the next heartbeat cycle. The operations won't take effect until they finish.

  • CREATE SPACE
  • CREATE TAG
  • CREATE EDGE
  • ALTER TAG
  • ALTER EDGE
  • CREATE TAG INDEX
  • CREATE EDGE INDEX

NOTE:The default heartbeat interval is 10 seconds. To modify the heartbeat interval, see Adjust heartbeat cycle (doc TODO).

To make sure the follow-up operations work as expected, take one of the following approaches:

  • Run SHOW or DESCRIBE statements accordingly to check the status of the objects, and make sure the creation or alteration is complete. If it is not, wait a few seconds and try again.
  • Wait for two heartbeat cycles, i.e., 20 seconds.

Create and use a graph space

nGQL syntax

  • Create a graph space:
    CREATE SPACE <graph_space_name> [(partition_num=<partition number>, replica_factor=<replica number>, vid_type=fixed_string(<string_number>))]
    
    Property Description
    partition_num Specifies the number of partitions in each replica. The suggested number is the number of hard disks in the cluster times 5. For example, if you have 3 hard disks in the cluster, we recommend that you set 15 partitions.
    replica_factor Specifies the number of replicas in the Nebula Graph cluster. The suggested number is 3 in a production environment and 1 in a test environment. The replica number must always be an odd number for the need of quorum-based voting.
    vid_type To insert a vertex with a string-typed VID, set the fixed_string length for the maximum VID length. The default fixed_string is 8, and recommend to be less than 64. fix_string(N) is similar to char(N) in SQL.
  • List graph spaces and check if the creation is successful:
    nebula> SHOW SPACES;
    
  • Use a graph space:
    USE <graph_space_name>
    

Examples

  1. Use the following statement to create a graph space named nba.

    nebula> CREATE SPACE nba(partition_num=15, replica_factor=1, vid_type=fixed_string(30));
    Execution succeeded (time spent 2817/3280 us)
    
  2. Check the partition distribution with SHOW HOSTS to make sure that the partitions are distributed in a balanced way.

    nebula> SHOW HOSTS;
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    | Host        | Port      | Status    | Leader count | Leader distribution | Partition distribution |
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    | "storaged0" | 9779      | "ONLINE"  | 5            | "nba:5"             | "nba:5"                |
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    | "storaged1" | 9779      | "ONLINE"  | 5            | "nba:5"             | "nba:5"                |
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    | "storaged2" | 9779      | "ONLINE"  | 5            | "nba:5"             | "nba:5"                |
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    | "Total"     | __EMPTY__ | __EMPTY__ | 15           | "nba:15"            | "nba:15"               |
    +-------------+-----------+-----------+--------------+---------------------+------------------------+
    Got 4 rows (time spent 1633/2867 us)
    

    If the Leader distribution is uneven, use BALANCE LEADER to redistribute the partitions. For more information, see BALANCE (doc TODO).

  3. Use the nba graph space.

    nebula> USE nba;
    Execution succeeded (time spent 1322/2206 us)
    

    You can use SHOW SPACES to check the graph space you created.

    nebula> SHOW SPACES;
    +------+
    | Name |
    +------+
    | nba  |
    +------+
    Got 1 rows (time spent 1235/1934 us)
    

Create tags and edge types

nGQL syntax

CREATE {TAG | EDGE} {<tag_name> | <edge_type>}(<property_name> <data_type>
[, <property_name> <data_type> ...]);

Examples

Create tags player and team, edge types follow and serve.

Component name Type Property
player Tag name (string), age (int)
team Tag name (string)
follow Edge type degree (int)
serve Edge type start_year (int), end_year (int)
nebula> CREATE TAG player(name string, age int);
Execution succeeded (time spent 2694/3116 us)

Thu, 15 Oct 2020 06:22:29 UTC

nebula> CREATE TAG team(name string);
Execution succeeded (time spent 2630/3002 us)

Thu, 15 Oct 2020 06:22:37 UTC

nebula> CREATE EDGE follow(degree int);
Execution succeeded (time spent 3087/3467 us)

Thu, 15 Oct 2020 06:22:43 UTC

nebula> CREATE EDGE serve(start_year int, end_year int);
Execution succeeded (time spent 2645/3123 us)

Thu, 15 Oct 2020 06:22:50 UTC

Insert vertices and edges

You can use the INSERT statement to insert vertices or edges based on existing tags or edge types.

nGQL syntax

  • Insert vertices:
    INSERT VERTEX <tag_name> (<property_name>[, <property_name>...])
    [, <tag_name> (<property_name>[, <property_name>...]), ...]
    {VALUES | VALUE} <vid>: (<property_value>[, <property_value>...])
    [, <vid>: (<property_value>[, <property_value>...];
    

    VID is short for vertex ID. A VID must be a unique string value in a graph space.

  • Insert edges:
    INSERT EDGE <edge_type> (<property_name>[, <property_name>...])
    {VALUES | VALUE} <src_vid> -> <dst_vid>[@<rank>] : (<property_value>[, <property_value>...])
    [, <src_vid> -> <dst_vid>[@<rank> : (<property_name>[, <property_name>...]), ...]
    

Examples

  • Insert vertices representing NBA players and teams:
    nebula> INSERT VERTEX player(name, age) VALUES "player100":("Tim Duncan", 42);
    Execution succeeded (time spent 2919/3485 us)
    
    Fri, 16 Oct 2020 03:41:00 UTC
    
    nebula> INSERT VERTEX player(name, age) VALUES "player101":("Tony Parker", 36);
    Execution succeeded (time spent 3007/3539 us)
    
    Fri, 16 Oct 2020 03:41:58 UTC
    
    nebula> INSERT VERTEX player(name, age) VALUES "player102":("LaMarcus Aldridge", 33);
    Execution succeeded (time spent 2449/2934 us)
    
    Fri, 16 Oct 2020 03:42:16 UTC
    
    nebula> INSERT VERTEX team(name) VALUES "team200":("Warriors"), "team201":("Nuggets");
    Execution succeeded (time spent 3514/4331 us)
    
    Fri, 16 Oct 2020 03:42:45 UTC
    
  • Insert edges representing the relations between NBA players and teams:
    nebula> INSERT EDGE follow(degree) VALUES "player100" -> "player101":(95);
    Execution succeeded (time spent 1488/1918 us)
    
    Wed, 21 Oct 2020 06:57:32 UTC
    
    nebula> INSERT EDGE follow(degree) VALUES "player100" -> "player102":(90);
    Execution succeeded (time spent 2483/2890 us)
    
    Wed, 21 Oct 2020 07:05:48 UTC
    
    nebula> INSERT EDGE follow(degree) VALUES "player102" -> "player101":(75);
    Execution succeeded (time spent 1208/1689 us)
    
    Wed, 21 Oct 2020 07:07:12 UTC
    
    nebula> INSERT EDGE serve(start_year, end_year) VALUES "player100" -> "team200":(1997, 2016), "player101" -> "team201":(1999,  2018);
    Execution succeeded (time spent 2170/2651 us)
    
    Wed, 21 Oct 2020 07:08:59 UTC
    

Read data

  • The GO statement traverses the database based on specific conditions. A GO traversal starts from one or more vertices, along one or more edges, and return information in a form specified in the YIELD clause.
  • The FETCH statement is used to get properties from vertices or edges.
  • The LOOKUP statement is based on indexes. It is used together with the WHERE clause to search for the data that meet the specific conditions.
  • The MATCH statement is the most commonly used statement for graph data querying. But, it relies on indexes to match data patterns in Nebula Graph.

nGQL syntax

  • GO
    GO [[<M> TO] <N> STEPS ] FROM <vertex_list>
    OVER <edge_type_list> [REVERSELY] [BIDIRECT]
    [WHERE <expression> [AND | OR expression ...])]
    YIELD [DISTINCT] <return_list>
    
  • FETCH

    • Fetch properties on tags:

      FETCH PROP ON {<tag_name> | <tag_name_list> | *} <vid_list>
      [YIELD [DISTINCT] <return_list>]
      
    • Fetch properties on edges:

      FETCH PROP ON <edge_type> <src_vid> -> <dst_vid>[@<rank>]
      [, <src_vid> -> <dst_vid> ...]
      [YIELD [DISTINCT] <return_list>]
      
  • LOOKUP
    LOOKUP ON {<tag_name> | <edge_type>} 
    WHERE <expression> [AND expression ...])]
    [YIELD <return_list>]
    
  • MATCH
    MATCH <pattern> [<WHERE clause>] RETURN <output>
    

Examples of GO

  • Find the vertices that VID "player100" follows.
    nebula> GO FROM "player100" OVER follow;
    +-------------+
    | follow._dst |
    +-------------+
    | player101   |
    +-------------+
    | player102   |
    +-------------+
    Got 2 rows (time spent 1935/2420 us)
    
  • Search for the players that the player with VID "player100" follows. Filter the players that the player with VID "player100" follows whose age is equal to or greater than 35. Rename the columns in the result with Teammate and Age.
    nebula> GO FROM "player100" OVER follow WHERE $$.player.age >= 35 \
    YIELD $$.player.name AS Teammate, $$.player.age AS Age;
    +-------------+-----+
    | Teammate    | Age |
    +-------------+-----+
    | Tony Parker | 36  |
    +-------------+-----+
    Got 1 rows (time spent 3871/4349 us)
    
    Clause/Sign Description
    YIELD Specifies what values or results you want to return from the query.
    $$ Represents the target vertices.
    \ A line-breaker.
  • Search for the players that the player with VID "player100" follows. Then Retrieve the teams of the players that the player with VID "player100" follows. To combine the two queries, use a pipe or a temporary variable.

    • With a pipe:

      nebula> GO FROM "player100" OVER follow YIELD follow._dst AS id | \
      GO FROM $-.id OVER serve YIELD $$.team.name AS Team, \
      $^.player.name AS Player;
      +---------+-------------+
      | Team    | Player      |
      +---------+-------------+
      | Nuggets | Tony Parker |
      +---------+-------------+
      Got 1 rows (time spent 2902/3496 us)
      
      Clause/Sign Description
      $^ Represents the source vertex of the edge.
      \| A pipe symbol that can combine multiple queries.
      $- Represents the output of the query before the pipe symbol.
    • With a temporary variable:

      NOTE: Once a compound statement is submitted to the server as a whole, the life cycle of the temporary variables in the statement ends.

      nebula> $var = GO FROM "player100" OVER follow YIELD follow._dst AS id; \
      GO FROM $var.id OVER serve YIELD $$.team.name AS Team, \
      $^.player.name AS Player;
      +---------+-------------+
      | Team    | Player      |
      +---------+-------------+
      | Nuggets | Tony Parker |
      +---------+-------------+
      Got 1 rows (time spent 3103/3711 us)
      

Example of FETCH

Use FETCH: Fetch the properties of the player with VID player100.

nebula> FETCH PROP ON player "player100";
+----------------------------------------------------+
| vertices_                                          |
+----------------------------------------------------+
| ("player100" :player{age: 42, name: "Tim Duncan"}) |
+----------------------------------------------------+
Got 1 rows (time spent 2006/2406 us)

Examples of Match and Lookup are provided after index is introduced in the following.

Update vertices and edges

You can use the UPDATE statement or the UPSERT statement to update existing data.

UPSERT is the combination of UPDATE and INSERT. If you update a vertex or an edge with UPSERT, it inserts a new vertex or edge if it does not exist.

Note: UPSERT operates in serial a (partition-based) order and therefore is slower comparing with INSERT OR UPDATE.

nGQL syntax

  • UPDATE vertices:
UPDATE VERTEX <vid> SET <properties to be updated>
[WHEN <condition>] [YIELD <columns>]
  • UPDATE edges:
UPDATE EDGE <source vid> -> <destination vid> [@rank] OF <edge_type>
SET <properties to be updated> [WHEN <condition>] [YIELD <columns to be output>]
  • UPSERT vertices or edges:
UPSERT {VERTEX <vid> | EDGE <edge_type>} SET <update_columns>
[WHEN <condition>] [YIELD <columns>]

Examples

  • UPDATE the name property of the vertex with VID "player100" and check the result with the FETCH statement:
    nebula> UPDATE VERTEX "player100" SET player.name = "Tim";
    Execution succeeded (time spent 3483/3914 us)
    
    Wed, 21 Oct 2020 10:53:14 UTC
    
    nebula> FETCH PROP ON player "player100";
    +---------------------------------------------+
    | vertices_                                   |
    +---------------------------------------------+
    | ("player100" :player{age: 42, name: "Tim"}) |
    +---------------------------------------------+
    Got 1 rows (time spent 2463/3042 us)
    
  • UPDATE the degree value of an edge and check the result with the FETCH statement:
    nebula> UPDATE EDGE "player100" -> "player101" OF follow SET degree = 96;
    Execution succeeded (time spent 3932/4432 us)
    
    nebula> FETCH PROP ON follow "player100" -> "player101";
    +----------------------------------------------------+
    | edges_                                             |
    +----------------------------------------------------+
    | [:follow "player100"->"player101" @0 {degree: 96}] |
    +----------------------------------------------------+
    Got 1 rows (time spent 2205/2800 us)
    
  • Insert a vertex with VID "player111" and UPSERT it.
    nebula> INSERT VERTEX player(name, age) VALUES "player111":("Ben Simmons", 22);
    Execution succeeded (time spent 2115/2900 us)
    
    Wed, 21 Oct 2020 11:11:50 UTC
    
    nebula> UPSERT VERTEX "player111" SET player.name = "Dwight Howard", player.age = $^.player.age + 11 \
    WHEN $^.player.name == "Ben Simmons" AND $^.player.age > 20 \
    YIELD $^.player.name AS Name, $^.player.age AS Age;
    +---------------+-----+
    | Name          | Age |
    +---------------+-----+
    | Dwight Howard | 33  |
    +---------------+-----+
    Got 1 rows (time spent 1815/2329 us)
    

Delete vertices and edges

nGQL syntax

  • Delete vertices:
DELETE VERTEX <vid1>[, <vid2>...]
  • Delete edges:
DELETE EDGE <edge_type> <src_vid> -> <dst_vid>[@<rank>]
[, <src_vid> -> <dst_vid>...]

Examples

  • Delete vertices:
nebula> DELETE VERTEX "team1", "team2";
Execution succeeded (time spent 4337/4782 us)
  • Delete edges:
nebula> DELETE EDGE follow "team1" -> "team2";
Execution succeeded (time spent 3700/4101 us)

About indexes

You can add indexes to tags or edge types with the CREATE INDEX statement.

Must-read for using index

  • Both MATCH and LOOKUP depend on index. But indexes can dramatically reduce the write performance. The performance reduction can be as much as 90% or even more. DO NOT use indexes in production environments unless you are fully aware of their influences on your service.
  • You MUST rebuild indexes for pre-existing data. Otherwise, the pre-existing data can't be indexed (and therefore can't be returned in Match or Lookup). For more information, see REBUILD INDEX.

nGQL syntax

Create an index:

CREATE {TAG | EDGE} INDEX [IF NOT EXISTS] <index_name>
ON {<tag_name> | <edge_name>} (prop_name_list);

Rebuild an index:

REBUILD {TAG | EDGE} INDEX <index_name>

Examples

Create an index for the name property on all vertices with the tag player.

nebula> CREATE TAG INDEX player_index_0 on player(name(20));
nebula> REBUILD TAG INDEX player_index_0;

NOTE: Define the index length when creating an index for a variable-length property. For more information, see CREATE INDEX

Examples of LOOKUP and MATCH (index-based)

Make sure there is an index for LOOKUP or MATCH to use. If there is not, create an index first.

Find the information of the vertex with the tag player and its value of the name property is "Tony Parker".

// Create an index on the player name property.
nebula> CREATE TAG INDEX player_name_0 on player(name(10));
Execution succeeded (time spent 3465/4150 us)

// Rebuild the index to make sure it takes effect on pre-existing data.
nebula> REBUILD TAG INDEX player_name_0
+------------+
| New Job Id |
+------------+
| 31         |
+------------+
Got 1 rows (time spent 2379/3033 us)

// Use LOOKUP to retrieve the vertex property.
nebula> LOOKUP ON player WHERE player.name == "Tony Parker" \
YIELD player.name, player.age;
+-------------+---------------+------------+
| VertexID    | player.name   | player.age |
+-------------+---------------+------------+
| "player101" | "Tony Parker" | 36         |
+-------------+---------------+------------+

// Use MATCH to retrieve the vertex.
nebula> MATCH (v:player{name:"Tony Parker"}) RETURN v;
+-----------------------------------------------------+
| v                                                   |
+-----------------------------------------------------+
| ("player101" :player{age: 36, name: "Tony Parker"}) |
+-----------------------------------------------------+
Got 1 rows (time spent 5132/6246 us)