Full-text indexes

Full-text indexes support prefix, wildcard, regexp, and fuzzy searches on a string property.

You can use the WHERE clause to specify the search strings in LOOKUP statements.

Prerequisite

Before using the full-text index, make sure that you have deployed an Elasticsearch cluster and a Listener cluster. For more information, see Deploy Elasticsearch and Deploy Listener.

Precaution

Before using the full-text index, make sure that you know the restrictions.

A natural language search interprets the search string as a phrase in natural human language. The search is case-sensitive and, by default, matches the search string against the beginning of the property value. For example, suppose there are three vertices with the tag player, and the tag player contains the property name. The names of these three vertices are Kevin Durant, Tim Duncan, and David Beckham. After the full-text index on player.name is created, the prefix search statement LOOKUP ON player WHERE PREFIX(player.name,"D"); returns only David Beckham.
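
The prefix behavior described above can be pictured with a minimal Python sketch. It is an illustration only: it assumes, as stated above, that the match is case-sensitive and applied to the beginning of the whole property value. It does not call NebulaGraph or Elasticsearch, and `prefix_search` is a hypothetical helper.

```python
# Illustration only: emulates the case-sensitive prefix match described above.
# prefix_search is a hypothetical helper, not part of any NebulaGraph client.
names = ["Kevin Durant", "Tim Duncan", "David Beckham"]

def prefix_search(values, prefix):
    """Return the values that start with the given prefix (case-sensitive)."""
    return [v for v in values if v.startswith(prefix)]

print(prefix_search(names, "D"))  # ['David Beckham']
print(prefix_search(names, "d"))  # []
```

Kevin Durant and Tim Duncan are not returned because the letter D does not appear at the start of their names.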

Syntax

Create full-text indexes

CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} ([<prop_name>]);

Show full-text indexes

SHOW FULLTEXT INDEXES;

Rebuild full-text indexes

REBUILD FULLTEXT INDEX;

Caution

When there is a large amount of data, rebuilding the full-text index is slow. In that case, you can set snapshot_send_files=false in the configuration file of the Storage service (nebula-storaged.conf).

Drop full-text indexes

DROP FULLTEXT INDEX <index_name>;

Use query options

LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];

<expression> ::=
    PREFIX | WILDCARD | REGEXP | FUZZY

<return_list> ::=
    <prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...]

  • PREFIX(schema_name.prop_name, prefix_string, row_limit, timeout)
  • WILDCARD(schema_name.prop_name, wildcard_string, row_limit, timeout)
  • REGEXP(schema_name.prop_name, regexp_string, row_limit, timeout)
  • FUZZY(schema_name.prop_name, fuzzy_string, fuzziness, operator, row_limit, timeout)

    • fuzziness (optional): The maximum edit distance allowed for a match. The default value is AUTO. For other valid values and more information, see the Elasticsearch documentation.
    • operator (optional): The Boolean logic used to interpret the text. Valid values are OR (default) and AND.
  • row_limit (optional): The number of rows to return. The default value is 100.
  • timeout (optional): The timeout period. The default value is 200ms.
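
To make the fuzziness option more concrete, the following Python sketch computes a plain Levenshtein edit distance, that is, the number of single-character insertions, deletions, or substitutions needed to turn one string into another. This is a simplified illustration; Elasticsearch's own implementation may differ in details (for example, in how transposed characters are counted), and `edit_distance` is a hypothetical helper.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca with cb
        prev = curr
    return prev[-1]

# The misspelled search term is one deletion away from the stored term.
print(edit_distance("Dunncan", "Duncan"))  # 1
```

A search for the misspelled name "Tim Dunncan" can therefore still find "Tim Duncan" when an edit distance of 1 is allowed.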

Examples

// This example creates the graph space.
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

// This example signs in to the text service.
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);

// This example checks the text service status.
nebula> SHOW TEXT SEARCH CLIENTS;

// This example switches the graph space.
nebula> USE basketballplayer;

// This example adds the listener to the NebulaGraph cluster.
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:9789;

// This example checks the listener status. When the status is `Online`, the listener is ready.
nebula> SHOW LISTENER;

// This example creates the tag.
nebula> CREATE TAG IF NOT EXISTS player(name string, age int);

// This example creates the full-text index. The index name starts with "nebula_".
nebula> CREATE FULLTEXT TAG INDEX nebula_index_1 ON player(name);

// This example rebuilds the full-text index.
nebula> REBUILD FULLTEXT INDEX;

// This example shows the full-text index.
nebula> SHOW FULLTEXT INDEXES;
+------------------+-------------+-------------+--------+
| Name             | Schema Type | Schema Name | Fields |
+------------------+-------------+-------------+--------+
| "nebula_index_1" | "Tag"       | "player"    | "name" |
+------------------+-------------+-------------+--------+

// This example inserts the test data.
nebula> INSERT VERTEX player(name, age) VALUES \
  "Russell Westbrook": ("Russell Westbrook", 30), \
  "Chris Paul": ("Chris Paul", 33),\
  "Boris Diaw": ("Boris Diaw", 36),\
  "David West": ("David West", 38),\
  "Danny Green": ("Danny Green", 31),\
  "Tim Duncan": ("Tim Duncan", 42),\
  "James Harden": ("James Harden", 29),\
  "Tony Parker": ("Tony Parker", 36),\
  "Aron Baynes": ("Aron Baynes", 32),\
  "Ben Simmons": ("Ben Simmons", 22),\
  "Blake Griffin": ("Blake Griffin", 30);

// These examples run test queries.
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD id(vertex);
+-----------------+
| id(VERTEX)      |
+-----------------+
| "Boris Diaw"    |
| "Ben Simmons"   |
| "Blake Griffin" |
+-----------------+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age;
+-----------------+-----+
| name            | age |
+-----------------+-----+
| "Chris Paul"    | 33  |
| "Boris Diaw"    | 36  |
| "Blake Griffin" | 30  |
+-----------------+-----+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age | YIELD count(*);
+----------+
| count(*) |
+----------+
| 3        |
+----------+

nebula> LOOKUP ON player WHERE REGEXP(player.name, "R.*") YIELD player.name, player.age;
+---------------------+-----+
| name                | age |
+---------------------+-----+
| "Russell Westbrook" | 30  |
+---------------------+-----+

nebula> LOOKUP ON player WHERE REGEXP(player.name, ".*") YIELD id(vertex);
+---------------------+
| id(VERTEX)          |
+---------------------+
| "Danny Green"       |
| "David West"        |
...

nebula> LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan", AUTO, OR) YIELD player.name;
+--------------+
| name         |
+--------------+
| "Tim Duncan" |
+--------------+

// This example drops the full-text index.
nebula> DROP FULLTEXT INDEX nebula_index_1;
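
The WILDCARD and REGEXP results above can be cross-checked locally with a short Python sketch. It is only an approximation: it assumes case-sensitive matching against the whole property value, uses Python's `fnmatch` and `re` modules rather than Elasticsearch, and does not contact a NebulaGraph cluster.

```python
import re
from fnmatch import fnmatchcase

# The same names that the INSERT statement above stores in player.name.
names = [
    "Russell Westbrook", "Chris Paul", "Boris Diaw", "David West",
    "Danny Green", "Tim Duncan", "James Harden", "Tony Parker",
    "Aron Baynes", "Ben Simmons", "Blake Griffin",
]

# Approximates WILDCARD(player.name, "*ri*"): "*" matches any run of characters.
wildcard_hits = [n for n in names if fnmatchcase(n, "*ri*")]

# Approximates REGEXP(player.name, "R.*"): the pattern must match the whole value.
regexp_hits = [n for n in names if re.fullmatch(r"R.*", n)]

print(wildcard_hits)  # ['Chris Paul', 'Boris Diaw', 'Blake Griffin']
print(regexp_hits)    # ['Russell Westbrook']
```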

Full Text Queries

Full-text queries enable you to search for parsed text fields, using a parser with strict syntax to return content based on the query string provided. For details, see Query string query.

Syntax

Create full-text indexes

CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} (<prop_name> [,<prop_name>]...) [ANALYZER="<analyzer_name>"];
  • Composite indexes with multiple properties are supported when creating full-text indexes.
  • <analyzer_name> is the name of the analyzer. The default value is standard. To use other analyzers (e.g. IK Analysis), you need to make sure that the corresponding analyzer is installed in Elasticsearch in advance.

Show full-text indexes

SHOW FULLTEXT INDEXES;

Rebuild full-text indexes

REBUILD FULLTEXT INDEX;

Caution

When there is a large amount of data, rebuilding the full-text index is slow. In that case, you can set snapshot_send_files=false in the configuration file of the Storage service (nebula-storaged.conf).

Drop full-text indexes

DROP FULLTEXT INDEX <index_name>;

Use query options

LOOKUP ON {<tag> | <edge_type>} WHERE ES_QUERY(<index_name>, "<text>") YIELD <return_list> [| LIMIT [<offset>,] <number_rows>];

<return_list> ::=
    <prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...] [, id(vertex) [AS <prop_alias>]] [, score() AS <score_alias>]

  • index_name: The name of the full-text index.
  • score(): The score calculated by doing N degree expansion for the eligible vertices. The default value is 1.0. The higher the score, the higher the degree of match. The return value is sorted by default from highest to lowest score. For details, see Search and Scoring in Lucene.
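
The optional `| LIMIT [<offset>,] <number_rows>` clause skips the first `<offset>` rows and then returns at most `<number_rows>` rows. The following Python sketch mirrors that behavior with list slicing. The rows are those returned by the `*b*` query in the Examples section; `limit` is a hypothetical helper used only for illustration.

```python
def limit(rows, number_rows, offset=0):
    """Skip `offset` rows, then keep at most `number_rows` rows."""
    return rows[offset:offset + number_rows]

rows = ["Russell Westbrook", "Boris Diaw", "Aron Baynes",
        "Ben Simmons", "Blake Griffin"]

# Equivalent in spirit to "| LIMIT 2,3": skip two rows, return the next three.
print(limit(rows, 3, offset=2))  # ['Aron Baynes', 'Ben Simmons', 'Blake Griffin']
```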

Examples

// This example creates the graph space.
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

// This example signs in to the text service.
nebula> SIGN IN TEXT SERVICE (192.168.8.100:9200, HTTP);

// This example checks the text service status.
nebula> SHOW TEXT SEARCH CLIENTS;
+-----------------+-----------------+------+
| Type            | Host            | Port |
+-----------------+-----------------+------+
| "ELASTICSEARCH" | "192.168.8.100" | 9200 |
+-----------------+-----------------+------+

// This example switches the graph space.
nebula> USE basketballplayer;

// This example adds the listener to the NebulaGraph cluster.
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.100:9789;

// This example checks the listener status. When the status is `Online`, the listener is ready.
nebula> SHOW LISTENER;
+--------+-----------------+------------------------+-------------+
| PartId | Type            | Host                   | Host Status |
+--------+-----------------+------------------------+-------------+
| 1      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
| 2      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
| 3      | "ELASTICSEARCH" | ""192.168.8.100":9789" | "ONLINE"    |
+--------+-----------------+------------------------+-------------+

// This example creates the tag.
nebula> CREATE TAG IF NOT EXISTS player(name string, city string);

// This example creates a single-attribute full-text index.
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_1 ON player(name) ANALYZER="standard";

// This example creates a multi-attribute full-text index.
nebula> CREATE FULLTEXT TAG INDEX fulltext_index_2 ON player(name,city) ANALYZER="standard";

// This example rebuilds the full-text index.
nebula> REBUILD FULLTEXT INDEX;

// This example shows the full-text index.
nebula> SHOW FULLTEXT INDEXES;
+--------------------+-------------+-------------+--------------+------------+
| Name               | Schema Type | Schema Name | Fields       | Analyzer   |
+--------------------+-------------+-------------+--------------+------------+
| "fulltext_index_1" | "Tag"       | "player"    | "name"       | "standard" |
| "fulltext_index_2" | "Tag"       | "player"    | "name, city" | "standard" |
+--------------------+-------------+-------------+--------------+------------+

// This example inserts the test data.
nebula> INSERT VERTEX player(name, city) VALUES \
  "Russell Westbrook": ("Russell Westbrook", "Los Angeles"), \
  "Chris Paul": ("Chris Paul", "Houston"),\
  "Boris Diaw": ("Boris Diaw", "Houston"),\
  "David West": ("David West", "Philadelphia"),\
  "Danny Green": ("Danny Green", "Philadelphia"),\
  "Tim Duncan": ("Tim Duncan", "New York"),\
  "James Harden": ("James Harden", "New York"),\
  "Tony Parker": ("Tony Parker", "Chicago"),\
  "Aron Baynes": ("Aron Baynes", "Chicago"),\
  "Ben Simmons": ("Ben Simmons", "Phoenix"),\
  "Blake Griffin": ("Blake Griffin", "Phoenix");

// These examples run test queries.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Chris") YIELD id(vertex);
+--------------+
| id(VERTEX)   |
+--------------+
| "Chris Paul" |
+--------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Harden") YIELD properties(vertex);
+----------------------------------------------------------------+
| properties(VERTEX)                                             |
+----------------------------------------------------------------+
| {_vid: "James Harden", city: "New York", name: "James Harden"} |
+----------------------------------------------------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"Da*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX)                                               |
+------------------------------------------------------------------+
| {_vid: "David West", city: "Philadelphia", name: "David West"}   |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex);
+---------------------+
| id(VERTEX)          |
+---------------------+
| "Russell Westbrook" |
| "Boris Diaw"        |
| "Aron Baynes"       |
| "Ben Simmons"       |
| "Blake Griffin"     |
+---------------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | LIMIT 2,3;
+-----------------+
| id(VERTEX)      |
+-----------------+
| "Aron Baynes"   |
| "Ben Simmons"   |
| "Blake Griffin" |
+-----------------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex) | YIELD count(*);
+----------+
| count(*) |
+----------+
| 5        |
+----------+

nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX)          | score |
+---------------------+-------+
| "Russell Westbrook" | 1.0   |
| "Boris Diaw"        | 1.0   |
| "Aron Baynes"       | 1.0   |
| "Ben Simmons"       | 1.0   |
| "Blake Griffin"     | 1.0   |
+---------------------+-------+

// For documents whose name contains the letter `b`, the score is multiplied by a weighting factor of 4, while for documents whose name contains the letter `c`, the default weighting factor of 1 is used.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_1,"*b*^4 OR *c*") YIELD id(vertex), score() AS score;
+---------------------+-------+
| id(VERTEX)          | score |
+---------------------+-------+
| "Russell Westbrook" | 4.0   |
| "Boris Diaw"        | 4.0   |
| "Aron Baynes"       | 4.0   |
| "Ben Simmons"       | 4.0   |
| "Blake Griffin"     | 4.0   |
| "Chris Paul"        | 1.0   |
| "Tim Duncan"        | 1.0   |
+---------------------+-------+
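
The scores in the table above can be reproduced with a small Python model. It rests on assumptions that are not stated in this document: each matching wildcard clause contributes a constant score equal to its boost, the scores of matching clauses are added together, and matching ignores case because the standard analyzer lowercases indexed text. `score` is a hypothetical helper, not an Elasticsearch or NebulaGraph function.

```python
names = [
    "Russell Westbrook", "Chris Paul", "Boris Diaw", "David West",
    "Danny Green", "Tim Duncan", "James Harden", "Tony Parker",
    "Aron Baynes", "Ben Simmons", "Blake Griffin",
]

def score(name, clauses):
    """Sum the boost of every clause whose letter occurs in the lowercased name."""
    lowered = name.lower()
    return sum(boost for letter, boost in clauses if letter in lowered)

# Models the query string "*b*^4 OR *c*" (assumed constant score per clause).
clauses = [("b", 4.0), ("c", 1.0)]

# Keep only matching names, then sort from the highest score to the lowest.
hits = [(n, score(n, clauses)) for n in names if score(n, clauses) > 0]
hits.sort(key=lambda pair: -pair[1])

for name, value in hits:
    print(name, value)
```

Under these assumptions, the five names containing `b` score 4.0 and the two names containing only `c` score 1.0, in line with the query result.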

// When using a multi-attribute full-text index query, the conditions are matched within all properties of the index.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"*h*") YIELD properties(vertex);
+------------------------------------------------------------------+
| properties(VERTEX)                                               |
+------------------------------------------------------------------+
| {_vid: "Chris Paul", city: "Houston", name: "Chris Paul"}        |
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"}        |
| {_vid: "David West", city: "Philadelphia", name: "David West"}   |
| {_vid: "James Harden", city: "New York", name: "James Harden"}   |
| {_vid: "Tony Parker", city: "Chicago", name: "Tony Parker"}      |
| {_vid: "Aron Baynes", city: "Chicago", name: "Aron Baynes"}      |
| {_vid: "Ben Simmons", city: "Phoenix", name: "Ben Simmons"}      |
| {_vid: "Blake Griffin", city: "Phoenix", name: "Blake Griffin"}  |
| {_vid: "Danny Green", city: "Philadelphia", name: "Danny Green"} |
+------------------------------------------------------------------+

// When using a multi-attribute full-text index query, you can specify different search text for different properties.
nebula> LOOKUP ON player WHERE ES_QUERY(fulltext_index_2,"name:*b* AND city:Houston") YIELD properties(vertex);
+-----------------------------------------------------------+
| properties(VERTEX)                                        |
+-----------------------------------------------------------+
| {_vid: "Boris Diaw", city: "Houston", name: "Boris Diaw"} |
+-----------------------------------------------------------+
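
When such per-property conditions are assembled in application code, a small helper can build the query string. The following Python sketch is one possible approach; `build_es_query` and `build_lookup` are hypothetical helpers, they perform no escaping of special characters, and they only produce the statement text without sending it to NebulaGraph.

```python
def build_es_query(conditions, operator="AND"):
    """Join {property: text} pairs into a 'prop:text AND prop:text' string."""
    joiner = f" {operator} "
    return joiner.join(f"{prop}:{text}" for prop, text in conditions.items())

def build_lookup(tag, index_name, query):
    """Wrap the query string in a LOOKUP ... ES_QUERY statement."""
    return (f'LOOKUP ON {tag} WHERE ES_QUERY({index_name},"{query}") '
            f'YIELD properties(vertex);')

query = build_es_query({"name": "*b*", "city": "Houston"})
print(query)  # name:*b* AND city:Houston
print(build_lookup("player", "fulltext_index_2", query))
```

The second `print` call produces the same statement as the example above, without the `nebula>` prompt.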

// This example drops the single-attribute full-text index.
nebula> DROP FULLTEXT INDEX fulltext_index_1;

Last update: July 5, 2023