Nebula Graph Query Language (nGQL)

About nGQL

nGQL is a declarative, textual query language like SQL, but for graphs. Unlike SQL, nGQL is all about expressing graph patterns. nGQL is a work in progress. We will add more features and further simplify the existing ones. For the time being, there may be inconsistencies between the syntax specification and the implementation.

Goals

Easy to learn
Easy to understand
Focus on online queries, while also providing a foundation for offline computation

Features

Syntax is close to SQL, but not exactly the same (easy to learn)
Expandable
Case insensitive
Support basic graph traversal
Support pattern matching
Support aggregation
Support graph mutation
Support distributed transactions (future release)
Statement composition, but NO statement embedding (easy to read)

Terminology

Graph Space : A physically isolated space for each distinct graph
Tag : A label associated with a list of properties
- Each tag has a name (human readable string), and internally each tag is assigned a 32-bit integer
- Each tag is associated with a list of properties; each property has a name and a type
- There can be dependencies between tags. A dependency is a constraint: for instance, if tag S depends on tag T, then tag S cannot exist unless tag T exists
Vertex : A node in the graph
- Each vertex has a unique 64-bit (signed integer) ID (VID)
- Each vertex can be associated with multiple tags
Edge : A link between two vertices
- Each edge can be uniquely identified by the tuple <src_vid, dst_vid, edge_type, rank>
- Edge type (ET) is a human readable string, internally it will be assigned a 32-bit integer. The edge type decides the property list (schema) on the edge.
- Edge rank is an immutable user-assigned 64-bit signed integer. It affects the order of edges of the same edge type between two vertices. The edge with a higher rank value comes first. When not specified, the default rank value is zero. Edges are currently ordered by the "binary coding order" of their ranks, i.e. 0, 1, 2, ..., 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1.
- Each edge can only be of one type
Path : A non-forked connection with multiple vertices and edges between them
- The length of a path is the number of the edges on the path, which is one less than the number of vertices
- A path can be represented by a list of vertices, edge types, and ranks. An edge is a special path of length 1

 <vid, <edge_type, rank>, vid, ...>
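The following Python sketch illustrates two points from the terminology above. It assumes an edge is keyed by source VID, destination VID, edge type, and rank (the field names in `EdgeKey` are hypothetical), and it reproduces the listed binary coding order by reading each signed 64-bit rank as its unsigned two's-complement bit pattern. It models the ordering only; it is not Nebula's storage code.

```python
from typing import List, NamedTuple

MASK64 = (1 << 64) - 1  # all 64 bits set


class EdgeKey(NamedTuple):
    """Hypothetical model of the tuple that identifies one edge."""
    src_vid: int
    dst_vid: int
    edge_type: int
    rank: int = 0  # the rank defaults to zero when not specified


def binary_order_key(rank: int) -> int:
    """Map a signed 64-bit rank to its two's-complement bit pattern read as unsigned."""
    return rank & MASK64


def path_length(vertices: List[int]) -> int:
    """The length of a path is its edge count, one less than its vertex count."""
    return len(vertices) - 1


ranks = [-1, 0, 2**63 - 1, -(2**63), 1, -(2**63) + 1]
print(sorted(ranks, key=binary_order_key))
# [0, 1, 9223372036854775807, -9223372036854775808, -9223372036854775807, -1]
```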

Language Specification at a Glance

You can skip this section if you are not familiar with BNF.

General

The entire set of statements can be categorized into three classes: query, mutation, and administration
Every statement can yield a data set as the result. Each data set contains a schema (column names and types) and multiple data rows

Composition

Statements can be composed in two ways:
- Statements can be piped together using the operator "|", much like pipes in shell scripts. The result yielded by the previous statement can be redirected to the next statement as its input
- Multiple statements can be batched together, separated by ";". The result of the last statement (or of a RETURN statement, if one is executed) is returned as the result of the batch
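A minimal Python sketch of the two composition modes, assuming a data set can be reduced to (column names, rows) and a statement to a function of its optional input. `pipe`, `batch`, `friend_ids`, and `double_ids` are hypothetical names; the sketch passes no input between batched statements and does not model RETURN.

```python
from typing import Callable, List, Optional, Tuple

# A data set is modeled as (column names, rows); column types are omitted.
DataSet = Tuple[List[str], List[tuple]]
# A statement consumes an optional input data set and yields a data set.
Statement = Callable[[Optional[DataSet]], DataSet]


def pipe(*stmts: Statement) -> Statement:
    """Model `s1 | s2 | s3`: each statement receives the previous statement's result."""
    def run(inp: Optional[DataSet] = None) -> DataSet:
        result = inp
        for stmt in stmts:
            result = stmt(result)
        return result
    return run


def batch(*stmts: Statement) -> Optional[DataSet]:
    """Model `s1; s2; s3`: run statements in order and return only the last result."""
    result: Optional[DataSet] = None
    for stmt in stmts:
        result = stmt(None)  # this sketch feeds no input into batched statements
    return result


# Hypothetical statements standing in for real nGQL queries.
def friend_ids(_inp: Optional[DataSet]) -> DataSet:
    return (["id"], [(101,), (102,)])


def double_ids(inp: Optional[DataSet]) -> DataSet:
    cols, rows = inp  # consumes the piped-in data set
    return (cols, [(row[0] * 2,) for row in rows])


print(pipe(friend_ids, double_ids)())  # (['id'], [(202,), (204,)])
```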

Data Types

Simple types: vid, double, int, bool, string, timestamp

vid: 64-bit signed integer, representing a vertex ID

Type Conversion

A simple typed value can be implicitly converted into a list
A list can be implicitly converted into a one-column tuple list
- "<type>_list" can be used as the column name
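As a rough illustration, the vid range and the two implicit conversions above can be sketched in Python. The helper names are hypothetical, and the sketch assumes a list is a Python list and a tuple list is (column names, rows).

```python
from typing import Any, List, Tuple

VID_MIN, VID_MAX = -(2**63), 2**63 - 1  # range of a 64-bit signed integer


def is_valid_vid(value: int) -> bool:
    """A vid is a 64-bit signed integer."""
    return isinstance(value, int) and VID_MIN <= value <= VID_MAX


def to_list(value: Any) -> List[Any]:
    """Implicitly convert a simple typed value into a one-element list."""
    return value if isinstance(value, list) else [value]


def to_tuple_list(values: List[Any], type_name: str) -> Tuple[List[str], List[tuple]]:
    """Implicitly convert a list into a one-column tuple list named "<type>_list"."""
    return ([f"{type_name}_list"], [(v,) for v in values])


print(to_tuple_list(to_list(100), "vid"))  # (['vid_list'], [(100,)])
```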

Common BNF

::=

<type> ::= |

::= vid (, vid)* | "{" vid (, vid)* "}"

<label> ::= [:alpha:] ([:alnum:] | "_")*

::= ("_")* <label>

::= <label>

::= (, )*

::= :<type>

::= ":"

::=

::= <tuple> (, <tuple>)* | "{" <tuple> (, <tuple>)* "}"

<tuple> ::= "(" VALUE (, VALUE)* ")"
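The lexical rules that survive intact above can be checked with small recognizers. This Python sketch assumes ASCII for the POSIX classes `[:alpha:]` and `[:alnum:]` (which can be locale-dependent) and treats a vid as a decimal integer literal. `is_label`, `is_underscore_label` (for the rule that prefixes a label with optional underscores), and `parse_vid_list` are hypothetical helpers, not part of any Nebula client.

```python
import re
from typing import List

# ASCII approximation of [:alpha:] ([:alnum:] | "_")*
LABEL = re.compile(r"[A-Za-z][A-Za-z0-9_]*")
# ("_")* <label>
UNDERSCORE_LABEL = re.compile(r"_*[A-Za-z][A-Za-z0-9_]*")
VID = re.compile(r"-?\d+")


def is_label(text: str) -> bool:
    return LABEL.fullmatch(text) is not None


def is_underscore_label(text: str) -> bool:
    return UNDERSCORE_LABEL.fullmatch(text) is not None


def parse_vid_list(text: str) -> List[int]:
    """Parse `vid (, vid)*` or `"{" vid (, vid)* "}"` into a list of integers."""
    body = text.strip()
    if body.startswith("{") or body.endswith("}"):
        if not (body.startswith("{") and body.endswith("}")):
            raise ValueError(f"unbalanced braces in {text!r}")
        body = body[1:-1]
    items = [item.strip() for item in body.split(",")]
    # At least one vid is required, so an empty body fails here too.
    if not all(VID.fullmatch(item) for item in items):
        raise ValueError(f"not a vid list: {text!r}")
    return [int(item) for item in items]


print(parse_vid_list("{1, 2, 3}"))  # [1, 2, 3]
```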

Statements

Choose a Graph Space

Nebula supports multiple graph spaces. Data in different graph spaces is physically isolated. Before executing a query, a graph space must be selected with the following statement:

USE

Return a Data Set

Return a single value or a data set:

RETURN

::= vid | | | <var>

Create a Tag

The following statement defines a new tag:

CREATE TAG ()

::= <label>
::= +
::= ,<type>
::= <label>

Create an Edge Type

The following statement defines a new edge type:

CREATE EDGE ()

::= <label>

Insert Vertices

The following statement inserts one or more vertices:

INSERT VERTEX [NO OVERWRITE] VALUES

::= () (, ())*
::= :() (, :())*
::= vid
::= (, )*
::= VALUE (, VALUE)*
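The vertex-insertion rules above are partly garbled. Assuming they describe a list of tags with their property names, followed by rows of the form vid:(values) whose values are matched to those properties in order, a minimal in-memory model looks like this. The schema and store layouts and `insert_vertices` are hypothetical; the sketch overwrites existing values and does not model NO OVERWRITE.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical schema: tag name -> ordered (property name, Python type) pairs.
Schema = Dict[str, List[Tuple[str, type]]]
# Hypothetical store: vid -> tag name -> property name -> value.
Store = Dict[int, Dict[str, Dict[str, Any]]]


def insert_vertices(store: Store, schema: Schema,
                    tags: List[Tuple[str, List[str]]],
                    rows: List[Tuple[int, List[Any]]]) -> None:
    """Insert rows; values are matched to the listed properties in order, across all tags."""
    targets = [(tag, prop) for tag, props in tags for prop in props]
    for vid, values in rows:
        if len(values) != len(targets):
            raise ValueError(f"vid {vid}: expected {len(targets)} values, got {len(values)}")
        # Validate the whole row before writing, so a bad row leaves no partial vertex.
        checked = []
        for (tag, prop), value in zip(targets, values):
            expected = dict(schema[tag])[prop]  # KeyError for an unknown tag or property
            if not isinstance(value, expected):
                raise TypeError(f"{tag}.{prop} expects {expected.__name__}")
            checked.append((tag, prop, value))
        vertex = store.setdefault(vid, {})  # one vertex can carry several tags
        for tag, prop, value in checked:
            vertex.setdefault(tag, {})[prop] = value
```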

Insert Edges

The following statement inserts one or more edges:

INSERT EDGE [NO OVERWRITE] [()] VALUES ()+

<edge_value> ::= -> [@ <rank>] :
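Assuming an edge value is written as a source vid, `->`, a destination vid, an optional `@` rank, then `:` and a parenthesized value list, the Python sketch below parses one and fills in the default rank of zero. `parse_edge_value` and `EdgeValue` are hypothetical, and values are returned as raw tokens without type conversion.

```python
import re
from typing import List, NamedTuple

# src -> dst [@ rank] : (values); whitespace is optional around each token.
EDGE_VALUE = re.compile(r"\s*(-?\d+)\s*->\s*(-?\d+)\s*(?:@\s*(-?\d+)\s*)?:\s*\((.*)\)\s*")


class EdgeValue(NamedTuple):
    src: int
    dst: int
    rank: int
    values: List[str]  # raw value tokens


def parse_edge_value(text: str) -> EdgeValue:
    match = EDGE_VALUE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an edge value: {text!r}")
    src, dst, rank, body = match.groups()
    # Naive split: a comma inside a quoted string would break this.
    values = [v.strip() for v in body.split(",")] if body.strip() else []
    return EdgeValue(int(src), int(dst), int(rank) if rank is not None else 0, values)


print(parse_edge_value("100 -> 200 @ 5 : (1, 2.5)"))
```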

Update a Vertex

The following statement updates a vertex:

UPDATE VERTEX SET <update_decl> [WHERE <conditions>] [YIELD ]

::= |
::= = <expression> {, = <expression>}+
::= () = () | () = <var>

Update an Edge

The following statement updates an edge:

UPDATE EDGE -> [@<rank>] OF SET [WHERE <conditions>] [YIELD ]

Traverse the Graph

The GO statement navigates from the given vertices to their neighbors according to the given conditions. It returns either a list of vertex IDs or a list of tuples:

GO [ STEPS] FROM [OVER [REVERSELY] ] [WHERE ] [YIELD ]

::= [data_set] [[AS] <label>]
::= vid | | | <var>
::= [AS <label>]
::= {, }*
::= <label>

::= <filter> {(AND | OR) <filter>}*
::= <expression> (> | >= | < | <= | == | !=) <expression> | <expression> IN <value_list>
::= {, }*
::= <expression> [AS <label>]

The WHERE clause applies only to the results that are going to be returned; it is not applied to intermediate results (see the description of the STEP[S] clause below).

When the STEP[S] clause is omitted, one step is implied.

When going out one step from the given vertices, all neighbors are checked against the WHERE clause, and only the results that satisfy it are returned.

When going out more than one step, the WHERE clause is applied only to the final results, not to the intermediate ones. For example:

GO 2 STEPS FROM me OVER friend WHERE birthday > "1988/1/1"

This query gets all of my FoF (friends of friends) whose birthday is after 1988/1/1. The filter is not applied to my direct friends (the first step).
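The step semantics can be simulated on a toy graph. In this Python sketch (all names hypothetical), `go` walks `steps` hops over (src, dst) pairs of one edge type, optionally against the edge direction to mimic REVERSELY, and applies `where` only to the vertices reached by the last hop. Dates are written in ISO format so that string comparison orders them correctly; each frontier is a set, which is a simplification rather than a statement about how Nebula handles duplicates.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple


def go(edges: Iterable[Tuple[int, int]], start: Iterable[int], steps: int = 1,
       where: Callable[[int], bool] = lambda vid: True,
       reversely: bool = False) -> List[int]:
    """Walk `steps` hops from `start`, filtering only the final hop with `where`."""
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for src, dst in edges:
        if reversely:
            neighbors[dst].append(src)  # follow edges from destination back to source
        else:
            neighbors[src].append(dst)
    frontier: Set[int] = set(start)
    for _ in range(steps):
        # Intermediate hops are NOT filtered by WHERE.
        frontier = {n for vid in frontier for n in neighbors[vid]}
    # WHERE is applied only to the vertices reached by the final step.
    return sorted(vid for vid in frontier if where(vid))


friend = [(1, 2), (1, 3), (2, 4), (3, 5)]  # vertex 1 stands for "me"
birthday = {2: "1980-01-01", 3: "1990-05-05", 4: "1992-01-01", 5: "1985-01-01"}
print(go(friend, [1], steps=2, where=lambda v: birthday[v] > "1988-01-01"))  # [4]
```

Here vertex 2 (born 1980) is kept as an intermediate hop even though it fails the filter, which is how vertex 4 is reached. If WHERE were applied at every hop, vertex 2 would be dropped after the first step, vertex 4 would never be reached, and the result would be empty.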

Search

The following statements look for vertices or edges that match certain conditions:

FIND VERTEX WHERE [YIELD ]

FIND EDGE WHERE [YIELD ]

Property Reference

It is common to refer to a property in a statement, for example in a WHERE clause or a YIELD clause. In nGQL, a property reference is defined as:

::= <object> "."
<object> ::= | | <var>
::= <label>
::= '[' "]"

<var> always starts with "$". There are two special variables: $- and $$.

$- refers to the input stream, while $$ refers to the destination objects.

User-defined property names start with a letter. A few system property names start with "_"; all property names starting with "_" are reserved.
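A minimal Python sketch of how `$-`, `$$`, and named variables could be resolved, assuming a reference has the form `<object>.<prop>` with only the property name after the last dot. `resolve` and `is_reserved` are hypothetical helpers, and references through tag or edge names are not modeled.

```python
from typing import Any, Dict


def is_reserved(prop_name: str) -> bool:
    """Property names starting with "_" are reserved for system properties."""
    return prop_name.startswith("_")


def resolve(ref: str, input_row: Dict[str, Any], dst: Dict[str, Any],
            variables: Dict[str, Dict[str, Any]]) -> Any:
    """Resolve `$-.prop` from the input row, `$$.prop` from the destination, `$name.prop` from a variable."""
    obj, dot, prop = ref.rpartition(".")
    if not dot or not obj.startswith("$"):
        raise ValueError(f"unsupported reference: {ref!r}")
    if obj == "$-":
        source = input_row  # the input stream
    elif obj == "$$":
        source = dst  # the destination object
    else:
        source = variables[obj[1:]]  # a named variable, without its "$"
    return source[prop]


print(resolve("$-.id", {"id": 7}, {}, {}))  # 7
```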

Built-in Properties

_id : Vertex ID
_type : Edge type
_src : Source ID of the edge
_dst : Destination ID of the edge
_rank : Edge rank number
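For illustration, the built-in properties can be derived from a vertex ID and an edge record. The text does not say whether `_type` reports the edge type's name or its internal 32-bit integer; this hypothetical sketch stores the name, and `Edge`, `vertex_builtins`, `edge_builtins`, and `builtin` are illustrative names only.

```python
from typing import Any, Dict, NamedTuple


class Edge(NamedTuple):
    """Hypothetical edge record; field names are illustrative."""
    src: int
    dst: int
    edge_type: str
    rank: int = 0  # default rank is zero


def vertex_builtins(vid: int) -> Dict[str, Any]:
    return {"_id": vid}


def edge_builtins(edge: Edge) -> Dict[str, Any]:
    return {"_type": edge.edge_type, "_src": edge.src, "_dst": edge.dst, "_rank": edge.rank}


def builtin(props: Dict[str, Any], name: str) -> Any:
    """Look up a built-in property, rejecting names without the reserved "_" prefix."""
    if not name.startswith("_"):
        raise KeyError(f"{name!r} is not a built-in property")
    return props[name]


print(edge_builtins(Edge(100, 200, "friend", 3)))
```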