Skip to content

Full-text indexes

Full-text indexes are used to do prefix, wildcard, regexp, and fuzzy search on a string property.

You can use the WHERE clause to specify the search strings in LOOKUP statements.

Prerequisite

Before using the full-text index, make sure that you have deployed a Elasticsearch cluster and a Listener cluster. For more information, see Deploy Elasticsearch and Deploy Listener.

Precaution

Before using the full-text index, make sure that you know the restrictions.

A natural language search interprets the search string as a phrase in natural human language. The search is case-sensitive and by default prefixes the string with a match. For example, there are three vertices with the tag player. The tag player contains the property name. The name of these three vertices are Kevin Durant, Tim Duncan, and David Beckham. Now that the full-text index of player.name is established, only David Beckham will be queried when using the prefix search statement LOOKUP ON player WHERE PREFIX(player.name,"D");.

Syntax

Create full-text indexes

CREATE FULLTEXT {TAG | EDGE} INDEX <index_name> ON {<tag_name> | <edge_name>} ([<prop_name>]);

Show full-text indexes

SHOW FULLTEXT INDEXES;

Rebuild full-text indexes

REBUILD FULLTEXT INDEX;

Caution

When there is a large amount of data, rebuilding full-text index is slow, you can modify snapshot_send_files=false in the configuration file of Storage service(nebula-storaged.conf).

Drop full-text indexes

DROP FULLTEXT INDEX <index_name>;

Use query options

LOOKUP ON {<tag> | <edge_type>} WHERE <expression> [YIELD <return_list>];

<expression> ::=
    PREFIX | WILDCARD | REGEXP | FUZZY

<return_list>
    <prop_name> [AS <prop_alias>] [, <prop_name> [AS <prop_alias>] ...]
  • PREFIX(schema_name.prop_name, prefix_string, row_limit, timeout)
  • WILDCARD(schema_name.prop_name, wildcard_string, row_limit, timeout)
  • REGEXP(schema_name.prop_name, regexp_string, row_limit, timeout)
  • FUZZY(schema_name.prop_name, fuzzy_string, fuzziness, operator, row_limit, timeout)

    • fuzziness (optional): Maximum edit distance allowed for matching. The default value is AUTO. For other valid values and more information, see Elasticsearch document.
    • operator (optional): Boolean logic used to interpret the text. Valid values are OR (default) and AND.
  • row_limit (optional): Specifies the number of rows to return. The default value is 100.
  • timeout (optional): Specifies the timeout time. The default value is 200ms.

Examples

// This example creates the graph space.
nebula> CREATE SPACE IF NOT EXISTS basketballplayer (partition_num=3,replica_factor=1, vid_type=fixed_string(30));

// This example signs in the text service.
nebula> SIGN IN TEXT SERVICE (127.0.0.1:9200, HTTP);

// This example checks the text service status.
nebula> SHOW TEXT SEARCH CLIENTS;

// This example switches the graph space.
nebula> USE basketballplayer;

// This example adds the listener to the NebulaGraph cluster.
nebula> ADD LISTENER ELASTICSEARCH 192.168.8.5:9789;

// This example checks the listener status. When the status is `Online`, the listener is ready.
nebula> SHOW LISTENER;

// This example creates the tag.
nebula> CREATE TAG IF NOT EXISTS player(name string, age int);

// This example creates the full-text index. The index name starts with "nebula_".
nebula> CREATE FULLTEXT TAG INDEX nebula_index_1 ON player(name);

// This example rebuilds the full-text index.
nebula> REBUILD FULLTEXT INDEX;

// This example shows the full-text index.
nebula> SHOW FULLTEXT INDEXES;
+------------------+-------------+-------------+--------+
| Name             | Schema Type | Schema Name | Fields |
+------------------+-------------+-------------+--------+
| "nebula_index_1" | "Tag"       | "player"    | "name" |
+------------------+-------------+-------------+--------+

// This example inserts the test data.
nebula> INSERT VERTEX player(name, age) VALUES \
  "Russell Westbrook": ("Russell Westbrook", 30), \
  "Chris Paul": ("Chris Paul", 33),\
  "Boris Diaw": ("Boris Diaw", 36),\
  "David West": ("David West", 38),\
  "Danny Green": ("Danny Green", 31),\
  "Tim Duncan": ("Tim Duncan", 42),\
  "James Harden": ("James Harden", 29),\
  "Tony Parker": ("Tony Parker", 36),\
  "Aron Baynes": ("Aron Baynes", 32),\
  "Ben Simmons": ("Ben Simmons", 22),\
  "Blake Griffin": ("Blake Griffin", 30);

// These examples run test queries.
nebula> LOOKUP ON player WHERE PREFIX(player.name, "B") YIELD id(vertex);
+-----------------+
| id(VERTEX)      |
+-----------------+
| "Boris Diaw"    |
| "Ben Simmons"   |
| "Blake Griffin" |
+-----------------+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") YIELD player.name, player.age;
+-----------------+-----+
| name            | age |
+-----------------+-----+
| "Chris Paul"    | 33  |
| "Boris Diaw"    | 36  |
| "Blake Griffin" | 30  |
+-----------------+-----+

nebula> LOOKUP ON player WHERE WILDCARD(player.name, "*ri*") | YIELD count(*);
+----------+
| count(*) |
+----------+
| 3        |
+----------+

nebula> LOOKUP ON player WHERE REGEXP(player.name, "R.*") YIELD player.name, player.age;
+---------------------+-----+
| name                | age |
+---------------------+-----+
| "Russell Westbrook" | 30  |
+---------------------+-----+

nebula> LOOKUP ON player WHERE REGEXP(player.name, ".*") YIELD id(vertex);
+---------------------+
| id(VERTEX)          |
+---------------------+
| "Danny Green"       |
| "David West"        |
...

nebula> LOOKUP ON player WHERE FUZZY(player.name, "Tim Dunncan", AUTO, OR) YIELD player.name;
+--------------+
| name         |
+--------------+
| "Tim Duncan" |
+--------------+

// This example drops the full-text index.
nebula> DROP FULLTEXT INDEX nebula_index_1;

Last update: February 19, 2024