Nebula Graph Query Language (nGQL)¶
About nGQL¶
nGQL is a declarative, textual query language like SQL, but for graphs. Unlike SQL, nGQL is all about expressing graph patterns. nGQL is a work in progress. We will add more features and further simplify the existing ones. For the time being, there may be inconsistencies between the syntax specification and the implementation.
Goals¶
- Easy to learn
- Easy to understand
- Focus on online queries while also providing a foundation for offline computation
Features¶
- Syntax is close to SQL, but not exactly the same (Easy to learn)
- Expandable
- Case insensitive
- Support basic graph traversal
- Support pattern matching
- Support aggregation
- Support graph mutation
- Support distributed transactions (future release)
- Statement composition, but NO statement embedding (Easy to read)
Terminology¶
- Graph Space : A physically isolated space for different graphs
- Tag : A label associated with a list of properties
  - Each tag has a name (a human-readable string); internally, each tag is assigned a 32-bit integer
  - Each tag is associated with a list of properties; each property has a name and a type
  - There can be dependencies between tags. A dependency is a constraint: for instance, if tag S depends on tag T, then tag S cannot exist unless tag T exists
- Vertex : A node in the graph
  - Each vertex has a unique 64-bit signed integer ID (VID)
  - Each vertex can be associated with multiple tags
- Edge : A link between two vertices
  - Each edge can be uniquely identified by a tuple
  - Edge type (ET) is a human-readable string; internally, it is assigned a 32-bit integer. The edge type determines the property list (schema) of the edge
  - Edge rank is an immutable, user-assigned 64-bit signed integer. It affects the order of edges of the same type between two vertices. The edge with a higher rank value comes first. When not specified, the rank defaults to zero. The current sorting basis is "binary coding order", i.e. 0, 1, 2, ..., 9223372036854775807, -9223372036854775808, -9223372036854775807, ..., -1
  - Each edge can only be of one type
- Path : A non-forked connection of multiple vertices and the edges between them
  - The length of a path is the number of edges on the path, which is one less than the number of vertices
  - A path can be represented by a list of vertices, edge types, and ranks. An edge is a special path with length == 1
    <vid, <edge_type, rank>, vid, ...>
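The "binary coding order" listed for edge ranks matches what you get when each signed 64-bit rank is compared by its unsigned two's-complement bit pattern. The following Python sketch reproduces only the listed sequence; `binary_order_key` is a hypothetical helper, and the sketch makes no claim about how the storage layer implements the ordering or in which direction it is iterated.

```python
MASK64 = (1 << 64) - 1  # all 64 bits set

def binary_order_key(rank: int) -> int:
    """Return the unsigned 64-bit bit pattern of a signed 64-bit rank."""
    if not -(1 << 63) <= rank < (1 << 63):
        raise ValueError("rank must fit in a signed 64-bit integer")
    return rank & MASK64

ranks = [-1, 2, 0, -9223372036854775808, 9223372036854775807, 1, -9223372036854775807]
ordered = sorted(ranks, key=binary_order_key)
print(ordered)
# [0, 1, 2, 9223372036854775807, -9223372036854775808, -9223372036854775807, -1]
```

Non-negative ranks keep their natural order, and every negative rank sorts after the largest positive one, because its sign bit makes the unsigned pattern larger.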
Language Specification at a Glance¶
Most readers can skip this section if they are not familiar with BNF.
General¶
- The entire set of statements can be categorized into three classes: query, mutation, and administration
- Every statement can yield a data set as the result. Each data set contains a schema (column name and type) and multiple data rows
Composition¶
- Statements can be composed in two ways:
  - Statements can be piped together using the operator "|", much like the pipe in shell scripts. The result yielded by the previous statement is redirected to the next statement as input
  - Multiple statements can be batched together, separated by ";". The result of the last statement (or of a RETURN statement, if one is executed) is returned as the result of the batch
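The two composition modes can be modeled with a minimal Python sketch (this is not nGQL itself). It assumes a statement is a function from a data set to a data set; `pipe`, `batch`, `go_over_friend`, `keep_even`, and the `FRIENDS` table are hypothetical names introduced only for illustration, and an early exit through RETURN is not modeled.

```python
from typing import Callable, Dict, List, Tuple

DataSet = List[Tuple]                    # rows only; the schema is omitted for brevity
Statement = Callable[[DataSet], DataSet]

def pipe(*statements: Statement) -> Statement:
    """Model of "|": each statement consumes the previous statement's result."""
    def run(rows: DataSet) -> DataSet:
        for statement in statements:
            rows = statement(rows)
        return rows
    return run

def batch(*statements: Statement) -> Statement:
    """Model of ";": statements run one after another; the last result is returned."""
    def run(rows: DataSet) -> DataSet:
        result: DataSet = []
        for statement in statements:
            result = statement(rows)     # no result is forwarded between statements
        return result
    return run

# Hypothetical toy data: vertex ID -> friend vertex IDs
FRIENDS: Dict[int, List[int]] = {1: [2, 3], 2: [4], 3: [4, 5]}

def go_over_friend(rows: DataSet) -> DataSet:
    return [(dst,) for (src,) in rows for dst in FRIENDS.get(src, [])]

def keep_even(rows: DataSet) -> DataSet:
    return [row for row in rows if row[0] % 2 == 0]

piped = pipe(go_over_friend, keep_even)([(1,)])
batched = batch(keep_even, go_over_friend)([(1,)])
print(piped)    # [(2,)]
print(batched)  # [(2,), (3,)]
```

In the piped form the filter sees the neighbors produced by the traversal; in the batched form each statement sees only the original input, and only the last result survives.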
Data Types¶
- Simple type: vid, double, int, bool, string, timestamp
- vid: 64-bit signed integer, representing a vertex ID
Type Conversion¶
- A simple typed value can be implicitly converted into a list
- A list can be implicitly converted into a one-column tuple list
- "<type>_list" can be used as the column name
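As a rough illustration of the two implicit conversions above, the following Python sketch wraps a simple value into a list and then into a one-column tuple list. The function names and the dictionary layout for a data set are assumptions made for this example; only the "<type>_list" column naming comes from the text.

```python
from typing import Any, Dict, List

def to_list(value: Any) -> List[Any]:
    """Simple typed value -> list (values that are already lists are kept)."""
    return value if isinstance(value, list) else [value]

def to_tuple_list(values: List[Any], type_name: str) -> Dict[str, list]:
    """List -> one-column tuple list whose column is named "<type>_list"."""
    return {
        "columns": [f"{type_name}_list"],
        "rows": [(v,) for v in values],
    }

data_set = to_tuple_list(to_list(7), "int")
print(data_set)  # {'columns': ['int_list'], 'rows': [(7,)]}
```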
Common BNF¶
<type> ::=
<label> ::= [:alpha:] ([:alnum:] | "_")*
<tuple> ::= "(" VALUE ("," VALUE)* ")"
<var> ::= "$" <label>
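The `<label>` and `<var>` rules translate directly into a regular expression. The sketch below is a hypothetical validator in Python, not part of Nebula; it assumes the alpha and alnum character classes cover ASCII letters and digits only.

```python
import re

# <label>: one letter, then any number of letters, digits, or "_"
LABEL_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def is_label(text: str) -> bool:
    return LABEL_RE.fullmatch(text) is not None

def is_var(text: str) -> bool:
    """<var> ::= "$" <label>"""
    return text.startswith("$") and is_label(text[1:])

for candidate in ["friend_1", "_id", "1abc", "$var", "$-", "$"]:
    print(candidate, is_label(candidate), is_var(candidate))
```

Note that the special variables `$-` and `$$` described under Property Reference do not match this `<var>` rule and would need separate handling.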
Statements¶
Choose a Graph Space¶
Nebula supports multiple graph spaces. Data in different graph spaces are physically isolated. Before executing a query, a graph space needs to be selected using the following statement
USE
Return a Data Set¶
Simply return a single value or a data set
RETURN
Create a Tag¶
The following statement defines a new tag
CREATE TAG
Create an Edge Type¶
The following statement defines a new edge type
CREATE EDGE
Insert Vertices¶
The following statement inserts one or more vertices
INSERT VERTEX [NO OVERWRITE]
Insert Edges¶
The following statement inserts one or more edges
INSERT EDGE [NO OVERWRITE]
edge_value ::=
Update a Vertex¶
The following statement updates a vertex
UPDATE VERTEX
Update an Edge¶
The following statement updates an edge
UPDATE EDGE
Traverse the Graph¶
Navigate from given vertices to their neighbors according to the given conditions. It returns either a list of vertex IDs, or a list of tuples
GO
[
::= [data_set] [[AS] <label>]
::= vid |
The WHERE clause applies only to the results that are going to be returned. It is not applied to intermediate results (see the detailed description of the STEP[S] clause)
When the STEP[S] clause is omitted, one step is implied
When going out one step from the given vertex, all neighbors are checked against the WHERE clause, and only results that satisfy it are returned
When going out more than one step, the WHERE clause is applied only to the final results, not to the intermediate results. Here is an example
GO 2 STEPS FROM me OVER friend WHERE birthday > "1988/1/1"
As you might guess, this query gets all my friends of friends (fof) whose birthday is after 1988/1/1. The filter is not applied to my friends (the first step).
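The filtering rule can be sketched in Python over an in-memory adjacency map. This is an illustration of the semantics described above, not the engine's implementation: `go`, `FRIEND`, and `BIRTHDAY` are hypothetical names, the dates are written in ISO format so that string comparison works, and collapsing duplicate vertices with a set is an assumption of this sketch.

```python
from typing import Callable, Dict, List

def go(graph: Dict[str, List[str]], start: str, steps: int = 1,
       where: Callable[[str], bool] = lambda vertex: True) -> List[str]:
    """Expand `steps` hops from `start`; apply `where` to the final hop only."""
    frontier = {start}
    for _ in range(steps):
        # Intermediate frontiers are never filtered.
        frontier = {dst for src in frontier for dst in graph.get(src, [])}
    return sorted(v for v in frontier if where(v))

FRIEND = {"me": ["ann", "bob"], "ann": ["cat"], "bob": ["dan"]}
BIRTHDAY = {"ann": "1980-05-01", "bob": "1990-02-10",
            "cat": "1992-07-04", "dan": "1975-03-09"}

def born_after_1988(vertex: str) -> bool:
    return BIRTHDAY[vertex] > "1988-01-01"

print(go(FRIEND, "me", 1, born_after_1988))  # ['bob']
print(go(FRIEND, "me", 2, born_after_1988))  # ['cat']
```

In the two-step call, `cat` is returned even though she is reached through `ann`, who fails the filter; the condition is checked only on the final results.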
Search¶
The following statements look for vertices or edges that match certain conditions
FIND VERTEX
WHERE
FIND EDGE
WHERE
Property Reference¶
It is common to refer to a property in a statement, for example in a WHERE or YIELD clause. In nGQL, a reference to a property is defined as
<object> ::=
<var> always starts with "$". There are two special variables: $- and $$.
$- refers to the input stream, while $$ refers to the destination objects
User-defined property names start with a letter. A few system property names start with "_", and all property names starting with "_" are reserved.
Built-in Properties¶
- _id : Vertex ID
- _type : Edge type
- _src : Source ID of the edge
- _dst : Destination ID of the edge
- _rank : Edge rank number
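The naming rules for properties can be summarized with a small Python sketch. `classify_property` and its return labels are hypothetical and exist only for illustration; the built-in names are the ones listed above.

```python
BUILT_IN = {
    "_id": "Vertex ID",
    "_type": "Edge type",
    "_src": "Source ID of the edge",
    "_dst": "Destination ID of the edge",
    "_rank": "Edge rank number",
}

def classify_property(name: str) -> str:
    """Classify a property name according to the rules in this section."""
    if name in BUILT_IN:
        return "built-in"
    if name.startswith("_"):
        return "reserved"   # every other name starting with "_" is reserved
    if name[:1].isalpha():
        return "user"       # ordinary property names start with a letter
    return "invalid"

for name in ["_src", "_custom", "birthday", "9lives", ""]:
    print(repr(name), classify_property(name))
```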