Clauses

MATCH

Matching nodes

The MATCH clause is the cornerstone around which the Sparksee Cypher Language queries are constructed. With the MATCH clause, the user specifies the pattern to look for in the graph.

Patterns in a MATCH clause are composed of “Paths”, separated by commas. Paths connect one or more nodes through edges. Restrictions such as the node or edge type, the direction of the connecting edges and node and edge properties can be specified in order to restrict the patterns to be matched.

For each pattern matched, one row is returned containing all the variables that have been bound in the pattern. These variables can then be used in subsequent clauses such as other MATCH, expressions in predicates or RETURN statements, grouping, etc.

The simplest use of the MATCH clause is to match all the nodes of the graph as follows:

MATCH (n)
RETURN *

This query pattern consists of a single path with a single node. Additionally, the type and the properties of the node can be specified as follows, where all the nodes of type person whose name is “John”:

MATCH (n:person { name: 'John' })
RETURN *

Matching nodes and edges

We can specify more complex paths connecting nodes through edges. The following query matches all pairs of “person” and “movie” connected through a “role” edge:

MATCH (n:person)-[:role]->(m:movie)
RETURN *

Similarly to nodes, we can specify properties on edges as follows, where only the roles of property “type” equals “actor” are matched:

MATCH (n:person)-[:role {type : "actor"}]->(m:movie)
RETURN *

Paths, are not limited to just one step, longer paths can be expressed by chaining multiple edges:

MATCH (n:person)-[:role {type : "actor"}]->(:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *

where all pairs of persons (bound to n and q) that exercised as an “actor” in the same movie are searched for. Note that due to the rules described in Node homomorphism, Edge Isomorphism, both n and q could match the same node. The previous query, could also be expressed using two separate paths as follows:

MATCH (n:person)-[:role {type : "actor"}]->(m:movie), (m:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *

with the difference that in this case we would also return the movie “m”. Finally, we could use two MATCH clauses to enforce “n” and “q” to be different:

MATCH (n:person)-[:role {type : "actor"}]->(m:movie)
MATCH (m:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *

where conceptually, two patterns are matched independently and joined through the common variables.

OPTIONAL MATCH

The OPTIONAL MATCH is a variant of MATCH clause that, as its name indicates, optionally checks for the presence of a pattern. It is the equivalent of an outer join in SQL, in this case when joining the result of two MATCH clauses. For instance, the following query matches all persons of the graph and “optionally”, returns the movies where these have exercised as an “actor”:

MATCH (n:person)
OPTIONAL MATCH (n:person)-[:role { type : "actor" }]->(m:movie)
RETURN *

In this query, all the persons would be returned, regardless whether these have been actors in a movie or not. If a person does not optionally match the second pattern, NULL will be returned for the “m” column. However, for each movie she participated as an “actor”, a row will be returned. If, instead of using OPTIONAL MATCH, we used a regular MATCH in the previous query, only those pairs of “person” “n” acting in a “movie” “m” would be returned.

Note that a query cannot start with an OPTIONAL MATCH clause.

RETURN

The RETURN clause is used to decide what to project in the result. The easiest way to use the RETURN clause is by using the * operator, which will return all the variables bound, following the following rules:

All variables bound in the query are returned
The order of the columns is the same as the order of occurrence of the variables in the query, reading from left to right first, and then from top to bottom.
Variables occurring more than once will only be returned once, and that of the first occurrence.

For instance, in the following query:

MATCH (n:person)-[r1:role {type : 'actor'}]-(m:movie),
MATCH (n:person)-[r2:role {type : 'director'}]-(q:movie)
RETURN *

the returned columns will be: n, r1, m, r2, q, in that order. Columns are always named after the expression name, unless an alias is specified.

Users can, however, customize what to project, including properties of nodes an edges. The following query will be equivalent to the previous one:

MATCH (n:person)-[r1:role {type : 'actor'}]-(m:movie),
MATCH (n:person)-[r2:role {type : 'director'}]-(q:movie)
RETURN n, r1, m, r2, q

To specify the projection of a property value, just use the “var.property” convention, as follows:

MATCH (n:person),
RETURN n, n.name, n.age

where both the id, the “name” and the “age” of the “person” nodes are returned as columns named “n”, “n.name” and “n.age” respectively. One can also rename the columns using aliases, as fallows:

MATCH (n:person),
RETURN n AS ID, n.name AS NAME, n.age AS AGE

where the returned columns will be named as “ID”, “NAME” and “AGE” respectively. Finally, arbitrary expressions can be returned:

MATCH (n:person),
RETURN n AS ID, 
       CASE 
          WHEN n.age >= 18 THEN "adult" 
          ELSE "child" 
       END AS CATEGORY, 
       timestamp() - n.timestamp AS ELAPSED_TIME,

Grouping

In RETURN expressions, one can perform grouping operations and compute statistics on these groups. When a grouping operation is specified in a projection, the other projected columns become the grouping columns used to form the groups. For instance, we can write the following query to count the number of persons of each age as follows:

MATCH (n:person)
RETURN n.age, count(*) AS FREQUENCY

Instead, we can count the number of non-NULL values of a particular expression. For instance, the following query returns the number of persons with a non-NULL time.

MATCH (n:person)
RETURN n.age, count(n.time) AS COUNT_NON_NULL

Also, we can use the DISTINCT word to compute the number of distinct values in a group. For instance, the following query computes the number of distinct names on each age group:

MATCH (n:person)
RETURN n.age, count(DISTINCT n.name) AS COUNT_DISTINCT_NAMES

The following, are the different grouping functions available.

Aggregate Functions
Operator	Description
count(*)	Count the number of returned values
count(X)	Counts the number of values different than NULL
min(X)	Gets the mínimum of the values
max(X)	Gets the maximum of the values
avg(X)	Gets the average of the values
sum(X)	Gets the sum of the values

Additionally, all the aggregate functions can be modified with the DISTINCT modifier (e.g. sum(DISTINCT X), count(DISTINCT X), etc)

WITH

The WITH clause is used to chain query parts in such a way that the output of one part is piped into the input of the other. With the WITH clause you can manipulate the output of the query before being consumed further, for instance, by deciding what to project, limiting the number of outputs, filter on aggregate values, etc.

The WITH is used like the RETURN clause. It creates a projection for the data, and after it, only the projected elements can be referenced in the remaining of the query. The * operator can be used to return all variables, aliases can be assigned to columns, etc. For instance, the following query returns the age groups where the frequency is larger than a threshold:

MATCH (n:person)
WITH n.age, count(*) AS FREQUENCY
WHERE FREQUENCY > 10
RETURN *

WHERE

The WHERE clause is used to filter the rows using predicates. WHERE must appear in conjunction with a MATCH, OPTIONAL MATCH or WITH clause. With the WHERE clause, the user must specify a boolean expression to filter the rows. For instance, the following query filters the persons of the graph by their age:

MATCH (n:person)
WHERE n.age > 18
RETURN n

Similarly, we can express the same query using WHERE in conjunction with a WITH clause:

MATCH (n:person)
WITH n, n.age AS AGE
WHERE AGE > 18
return n

ORDER BY

The ORDER BY clause is used to sort the rows using different criteria. ORDER BY is a sub-clause of WITH and RETURN, and as such it must appear in conjunction with one of these two other clauses. Rows can be sorted by different columns, and the sorting direction (ascending or descending) can be specified independently for each of them. For instance, the following query sorts the persons by their “age” ascendingly:

MATCH (n:person)
RETURN n, n.age AS AGE
ORDER BY AGE ASC

Similarly, we can sort the edges descendingly:

MATCH (n:person)
RETURN n, n.age AS AGE
ORDER BY AGE DESC

We can sort rows by multiple columns using different criteria for each column. For instance, we can sort first by “age” descendingly and then by “name” ascendingly as follows:

MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC

Finally, we can use ORDER BY with the WITH clause. The previous query could be rewritten as fallows:

MATCH (n:person)
WITH n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
RETURN *

SKIP

The SKIP clause is used to specify from which point the rows are being included in the result. SKIP clause is a sub-clause of RETURN and WITH, and as such it must appear in conjunction with one of those. Note that without an ORDER BY clause before the SKIP, no guarantees are given to which rows are being skipped. For instance, we can write a query to filter the first 10 persons sorted by name descendingly:

MATCH (n:person)
RETURN n, n.name AS NAME
ORDER BY NAME DESC 
SKIP 10

Also, more complex expressions can be used in the SKIP clause, but always without referencing any variable and resolving into a result of INTEGER or LONG type:

MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
SKIP toInteger(timestamp()/10000000)

LIMIT

The LIMIT clause is used to specify from which point the rows are being included in the result. LIMIT clause is a sub-clause of RETURN and WITH, and as such it must appear in conjunction with one of those. Note that without an ORDER BY clause before the LIMIT, no guarantees are given to which rows are conserved. For instance, we can write a query to only keep the first 10 persons sorted by name descendingly:

MATCH (n:person)
RETURN n, n.name AS NAME
ORDER BY NAME DESC 
LIMIT 10

Also, more complex expressions can be used in the LIMIT clause, but always without referencing any variable and resolving into a result of INTEGER or LONG type:

MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
LIMIT toInteger(timestamp()/10000000)

UNION

The UNION clause performs the union of multiple chained queries. Such queries must always contain the same number of columns and these need to have the same name. For example, the following query returns the union of nodes of type “person” and “movie”, with a column containing their type

MATCH (n:person)
WITH n as ID, type(n) AS TYPE
UNION
MATCH (m:movie)
WITH m as ID, type(m) AS TYPE
RETURN *

Back to Index

Sparksee openCypher

by Sparsity Technologies