GROUP BY¶

The GROUP BY clause can be used to aggregate data.

OpenCypher Compatibility¶

This topic applies to native nGQL only.

You can also use the count() function to aggregate data.

nebula>  MATCH (v:player)<-[:follow]-(:player) RETURN v.player.name AS Name, count(*) as cnt ORDER BY cnt DESC;
+----------------------+-----+
| Name                 | cnt |
+----------------------+-----+
| "Tim Duncan"         | 10  |
| "LeBron James"       | 6   |
| "Tony Parker"        | 5   |
| "Chris Paul"         | 4   |
| "Manu Ginobili"      | 4   |
+----------------------+-----+
...

Syntax¶

The GROUP BY clause groups the rows with the same value. Then operations such as counting, sorting, and calculation can be applied.

The GROUP BY clause works after the pipe symbol (|) and before a YIELD clause.

| GROUP BY <var> YIELD <var>, <aggregation_function(var)>

The aggregation_function() function supports avg(), sum(), max(), min(), count(), collect(), and std().

Examples¶

The following statement finds all the vertices connected directly to vertex "player100", groups the result set by player names, and counts how many times the name shows up in the result set.

nebula> GO FROM "player100" OVER follow BIDIRECT \
        YIELD properties($$).name as Name \
        | GROUP BY $-.Name \
        YIELD $-.Name as Player, count(*) AS Name_Count;
+---------------------+------------+
| Player              | Name_Count |
+---------------------+------------+
| "Shaquille O'Neal"  | 1          |
| "Tiago Splitter"    | 1          |
| "Manu Ginobili"     | 2          |
| "Boris Diaw"        | 1          |
| "LaMarcus Aldridge" | 1          |
| "Tony Parker"       | 2          |
| "Marco Belinelli"   | 1          |
| "Dejounte Murray"   | 1          |
| "Danny Green"       | 1          |
| "Aron Baynes"       | 1          |
+---------------------+------------+

The following statement finds all the vertices connected directly to vertex "player100", groups the result set by source vertices, and returns the sum of degree values.

nebula> GO FROM "player100" OVER follow \
        YIELD src(edge) AS player, properties(edge).degree AS degree \
        | GROUP BY $-.player \
        YIELD sum($-.degree);
+----------------+
| sum($-.degree) |
+----------------+
| 190            |
+----------------+

For more information about the sum() function, see Built-in math functions.

Implicit GROUP BY¶

The usage of GROUP BY in the above nGQL statements that explicitly write GROUP BY and act as grouping fields is called explicit GROUP BY, while in openCypher, the GROUP BY is implicit, i.e., GROUP BY groups fields without explicitly writing GROUP BY. The explicit GROUP BY in nGQL is the same as the implicit GROUP BY in openCypher, and nGQL also supports the implicit GROUP BY. For the implicit usage of GROUP BY, see Stack Overflow.

For example, to look up the players over 34 years old with the same length of service, you can use the following statement:

nebula> LOOKUP ON player WHERE player.age > 34 YIELD id(vertex) AS v | \
        GO FROM $-.v OVER serve YIELD serve.start_year AS start_year, serve.end_year AS end_year | \
        YIELD $-.start_year, $-.end_year, count(*) AS count | \
        ORDER BY $-.count DESC | LIMIT 5;
+---------------+-------------+-------+
| $-.start_year | $-.end_year | count |
+---------------+-------------+-------+
| 2018          | 2019        | 3     |
| 2007          | 2012        | 2     |
| 1998          | 2004        | 2     |
| 2017          | 2018        | 2     |
| 2010          | 2011        | 2     |
+---------------+-------------+-------+

Last update: November 10, 2023