Data modeling¶

A data model is a model that organizes data and specifies how they are related to one another. This topic describes the Nebula Graph data model and provides suggestions for data modeling with Nebula Graph.

Data structures¶

Nebula Graph data model uses five data structures to store data. They are vertices, edges, properties, tags, and edge types.

Vertices: Vertices are used to store entities.
- In Nebula Graph, vertices are identified with vertex identifiers (i.e. VID). The VID must be unique in the same graph space.
- A vertex must have at least one tag.
Edges: Edges are used to connect vertices. An edge is a connection or behavior between two vertices.
- An edge is identified uniquely with a source vertex, an edge type, a rank value, and a destination vertex.
- Edges are directed. -> identifies the directions of edges. Edges can be traversed in either direction.
- An edge must have one and only one edge type.
- The rank value is an immutable user-assigned 64-bit signed integer. It identifies the edges with the same edge type between two vertices. Edges are sorted by their rank values. The edge with the greatest rank value is listed first. The default rank value is zero.
Properties: Properties are key-value pairs. Both vertices and edges are containers for properties.
Tags: Tags are used to categorize vertices. Vertices that have the same tag share the same definition of properties.
Edge types: Edge types are used to categorize edges. Edges that have the same edge type share the same definition of properties.

Directed property graph¶

Nebula Graph stores data in directed property graphs. A directed property graph has a set of vertices connected by edges. And the edges have directions. A directed property graph is represented as:

G = < V, E, P_V, P_E >

V is a set of vertices.
E is a set of directed edges.
P_V is the property of vertices.
P_E is the property of edges.

The following table is an example of the structure of the basketball player dataset. We have two types of vertices, that is player and team, and two types of edges, that is serve and like.

Element	Name	Property name (Data type)	Description
Tag	player	name (string) age (int)	Represents players in the team.
Tag	team	name (string)	Represents the teams.
Edge type	serve	start_year (int) end_year (int)	Represents actions taken by players in the team. An action links a player and a team and the direction is from a player to a team.
Edge type	like	likeness (int)	Represents actions taken by players in the team. An action links a player and another player and the direction is from one player to the other player.

Graph data modeling suggestions¶

This section provides general suggestions for modeling data in Nebula Graph.

NOTE: The following suggestions may not apply to some special scenarios. In these cases, find help in the Nebula Graph community.

Model for performance¶

There is no perfect method to model in Nebula Graph. Graph modeling depends on the questions that you want to know from the data. Your data drives your graph model. Graph data modeling is intuitive and convenient. Create your data model based on your business model. Test your model and gradually optimize it to fit your business. To get better performance, you can change or re-design your model multiple times.

Edges as properties¶

Traversal depth decreases the traversal performance. To decrease the traversal depth, use vertex properties instead of edges.

For example, to model a graph that have the name, age, and eye color elements, you can:

(RECOMMENDED) Create a tag person, then add the name, age, and eye color as its properties.
(WRONG WAY) Create a new tag eye color and a new edge type has, then create an edge to indicate that a person has an eye color.

The first modeling solution leads to much better performance. DO NOT use the second solution unless you have to.

Multiple properties under one tag are permitted. But make sure that tags are fine-grained. For more information, see the Granulated vertices section.

Granulated vertices¶

In graph modeling, use the data models with a higher level of granularity. Put a set of parallel properties into one tag, i.e., separate different concepts.

Use indexes correctly¶

Correct use of indexes speeds up queries, but indexes reduce the write performance by 90% or more. ONLY use indexes when you locate vertices or edges by their properties.

No long string properties on edges¶

Be careful when you create long string properties for edges. Nebula Graph supports storing such properties on edges. But note that these properties are stored both in the outgoing edges and the incoming edges. Thus be careful with the write amplification.

Last update: April 13, 2021