VID¶

In NebulaGraph, a vertex is uniquely identified by its ID, which is called a VID or a Vertex ID.

Features¶

The data types of VIDs are restricted to FIXED_STRING(<N>) or INT64. One graph space can only select one VID type.

A VID in a graph space is unique. It functions just as a primary key in a relational database. VIDs in different graph spaces are independent.

The VID generation method must be set by users, because NebulaGraph does not provide auto increasing ID, or UUID.

Vertices with the same VID will be identified as the same one. For example:
- A VID is the unique identifier of an entity, like a person's ID card number. A tag means the type of an entity, such as driver, and boss. Different tags define two groups of different properties, such as driving license number, driving age, order amount, order taking alt, and job number, payroll, debt ceiling, business phone number.
- When two INSERT statements (neither uses a parameter of IF NOT EXISTS) with the same VID and tag are operated at the same time, the latter INSERT will overwrite the former.
- When two INSERT statements with the same VID but different tags, like TAG A and TAG B, are operated at the same time, the operation of Tag A will not affect Tag B.

VIDs will usually be indexed and stored into memory (in the way of LSM-tree). Thus, direct access to VIDs enjoys peak performance.

VID Operation¶

NebulaGraph 1.x only supports INT64 while NebulaGraph 2.x supports INT64 and FIXED_STRING(<N>). In CREATE SPACE, VID types can be set via vid_type.

id() function can be used to specify or locate a VID.

LOOKUP or MATCH statements can be used to find a VID via property index.

Direct access to vertices statements via VIDs enjoys peak performance, such as DELETE xxx WHERE id(xxx) == "player100" or GO FROM "player100". Finding VIDs via properties and then operating the graph will cause poor performance, such as LOOKUP | GO FROM $-.ids, which will run both LOOKUP and | one more time.

VID Generation¶

VIDs can be generated via applications. Here are some tips:

(Optimal) Directly take a unique primary key or property as a VID. Property access depends on the VID.

Generate a VID via a unique combination of properties. Property access depends on property index.

Generate a VID via algorithms like snowflake. Property access depends on property index.

If short primary keys greatly outnumber long primary keys, do not enlarge the N of FIXED_STRING(<N>) too much. Otherwise, it will occupy a lot of memory and hard disks, and slow down performance. Generate VIDs via BASE64, MD5, hash by encoding and splicing.

If you generate int64 VID via hash, the probability of collision is about 1/10 when there are 1 billion vertices. The number of edges has no concern with the probability of collision.

Define and modify a VID and its data type¶

The data type of a VID must be defined when you create the graph space. Once defined, it cannot be modified.

A VID is set when you insert a vertex and cannot be modified.

Query `start vid` and global scan¶

In most cases, the execution plan of query statements in NebulaGraph (MATCH, GO, and LOOKUP) must query the start vid in a certain way.

There are only two ways to locate start vid:

For example, GO FROM "player100" OVER explicitly indicates in the statement that start vid is "player100".
For example, LOOKUP ON player WHERE player.name == "Tony Parker" or MATCH (v:player {name:"Tony Parker"}) locates start vid by the index of the property player.name.

Caution

For example, match (n) return n; returns an error: Scan vertices or edges need to specify a limit number, or limit number can not push down., because it is a global scan, you must use the LIMIT clause to limit the number of returns.

Last update: March 13, 2023