The MATCH clause is the cornerstone around which the Sparksee Cypher Language queries are constructed. With the MATCH clause, the user specifies the pattern to look for in the graph.
Patterns in a MATCH clause are composed of “Paths”, separated by commas. Paths connect one or more nodes through edges. Restrictions such as the node or edge type, the direction of the connecting edges and node and edge properties can be specified in order to restrict the patterns to be matched.
For each pattern matched, one row is returned containing all the variables that have been bound in the pattern. These variables can then be used in subsequent clauses such as other MATCH, expressions in predicates or RETURN statements, grouping, etc.
The simplest use of the MATCH clause is to match all the nodes of the graph as follows:
MATCH (n)
RETURN *
This query pattern consists of a single path with a single node. Additionally, the type and the properties of the node can be specified as follows, where all the nodes of type person whose name is “John”:
MATCH (n:person { name: 'John' })
RETURN *
We can specify more complex paths connecting nodes through edges. The following query matches all pairs of “person” and “movie” connected through a “role” edge:
MATCH (n:person)-[:role]->(m:movie)
RETURN *
Similarly to nodes, we can specify properties on edges as follows, where only the roles of property “type” equals “actor” are matched:
MATCH (n:person)-[:role {type : "actor"}]->(m:movie)
RETURN *
Paths, are not limited to just one step, longer paths can be expressed by chaining multiple edges:
MATCH (n:person)-[:role {type : "actor"}]->(:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *
where all pairs of persons (bound to n and q) that exercised as an “actor” in the same movie are searched for. Note that due to the rules described in Node homomorphism, Edge Isomorphism, both n and q could match the same node. The previous query, could also be expressed using two separate paths as follows:
MATCH (n:person)-[:role {type : "actor"}]->(m:movie), (m:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *
with the difference that in this case we would also return the movie “m”. Finally, we could use two MATCH clauses to enforce “n” and “q” to be different:
MATCH (n:person)-[:role {type : "actor"}]->(m:movie)
MATCH (m:movie)<-[:role {type : "actor"}]-(q:person)
RETURN *
where conceptually, two patterns are matched independently and joined through the common variables.
The OPTIONAL MATCH is a variant of MATCH clause that, as its name indicates, optionally checks for the presence of a pattern. It is the equivalent of an outer join in SQL, in this case when joining the result of two MATCH clauses. For instance, the following query matches all persons of the graph and “optionally”, returns the movies where these have exercised as an “actor”:
MATCH (n:person)
OPTIONAL MATCH (n:person)-[:role { type : "actor" }]->(m:movie)
RETURN *
In this query, all the persons would be returned, regardless whether these have been actors in a movie or not. If a person does not optionally match the second pattern, NULL will be returned for the “m” column. However, for each movie she participated as an “actor”, a row will be returned. If, instead of using OPTIONAL MATCH, we used a regular MATCH in the previous query, only those pairs of “person” “n” acting in a “movie” “m” would be returned.
Note that a query cannot start with an OPTIONAL MATCH clause.
The RETURN clause is used to decide what to project in the result. The easiest way to use the RETURN clause is by using the * operator, which will return all the variables bound, following the following rules:
For instance, in the following query:
MATCH (n:person)-[r1:role {type : 'actor'}]-(m:movie),
MATCH (n:person)-[r2:role {type : 'director'}]-(q:movie)
RETURN *
the returned columns will be: n, r1, m, r2, q, in that order. Columns are always named after the expression name, unless an alias is specified.
Users can, however, customize what to project, including properties of nodes an edges. The following query will be equivalent to the previous one:
MATCH (n:person)-[r1:role {type : 'actor'}]-(m:movie),
MATCH (n:person)-[r2:role {type : 'director'}]-(q:movie)
RETURN n, r1, m, r2, q
To specify the projection of a property value, just use the “var.property” convention, as follows:
MATCH (n:person),
RETURN n, n.name, n.age
where both the id, the “name” and the “age” of the “person” nodes are returned as columns named “n”, “n.name” and “n.age” respectively. One can also rename the columns using aliases, as fallows:
MATCH (n:person),
RETURN n AS ID, n.name AS NAME, n.age AS AGE
where the returned columns will be named as “ID”, “NAME” and “AGE” respectively. Finally, arbitrary expressions can be returned:
MATCH (n:person),
RETURN n AS ID,
CASE
WHEN n.age >= 18 THEN "adult"
ELSE "child"
END AS CATEGORY,
timestamp() - n.timestamp AS ELAPSED_TIME,
In RETURN expressions, one can perform grouping operations and compute statistics on these groups. When a grouping operation is specified in a projection, the other projected columns become the grouping columns used to form the groups. For instance, we can write the following query to count the number of persons of each age as follows:
MATCH (n:person)
RETURN n.age, count(*) AS FREQUENCY
Instead, we can count the number of non-NULL values of a particular expression. For instance, the following query returns the number of persons with a non-NULL time.
MATCH (n:person)
RETURN n.age, count(n.time) AS COUNT_NON_NULL
Also, we can use the DISTINCT word to compute the number of distinct values in a group. For instance, the following query computes the number of distinct names on each age group:
MATCH (n:person)
RETURN n.age, count(DISTINCT n.name) AS COUNT_DISTINCT_NAMES
The following, are the different grouping functions available.
Operator | Description |
---|---|
count(*) | Count the number of returned values |
count(X) | Counts the number of values different than NULL |
min(X) | Gets the mínimum of the values |
max(X) | Gets the maximum of the values |
avg(X) | Gets the average of the values |
sum(X) | Gets the sum of the values |
Additionally, all the aggregate functions can be modified with the DISTINCT modifier (e.g. sum(DISTINCT X), count(DISTINCT X), etc)
The WITH clause is used to chain query parts in such a way that the output of one part is piped into the input of the other. With the WITH clause you can manipulate the output of the query before being consumed further, for instance, by deciding what to project, limiting the number of outputs, filter on aggregate values, etc.
The WITH is used like the RETURN clause. It creates a projection for the data, and after it, only the projected elements can be referenced in the remaining of the query. The * operator can be used to return all variables, aliases can be assigned to columns, etc. For instance, the following query returns the age groups where the frequency is larger than a threshold:
MATCH (n:person)
WITH n.age, count(*) AS FREQUENCY
WHERE FREQUENCY > 10
RETURN *
The WHERE clause is used to filter the rows using predicates. WHERE must appear in conjunction with a MATCH, OPTIONAL MATCH or WITH clause. With the WHERE clause, the user must specify a boolean expression to filter the rows. For instance, the following query filters the persons of the graph by their age:
MATCH (n:person)
WHERE n.age > 18
RETURN n
Similarly, we can express the same query using WHERE in conjunction with a WITH clause:
MATCH (n:person)
WITH n, n.age AS AGE
WHERE AGE > 18
return n
The ORDER BY clause is used to sort the rows using different criteria. ORDER BY is a sub-clause of WITH and RETURN, and as such it must appear in conjunction with one of these two other clauses. Rows can be sorted by different columns, and the sorting direction (ascending or descending) can be specified independently for each of them. For instance, the following query sorts the persons by their “age” ascendingly:
MATCH (n:person)
RETURN n, n.age AS AGE
ORDER BY AGE ASC
Similarly, we can sort the edges descendingly:
MATCH (n:person)
RETURN n, n.age AS AGE
ORDER BY AGE DESC
We can sort rows by multiple columns using different criteria for each column. For instance, we can sort first by “age” descendingly and then by “name” ascendingly as follows:
MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
Finally, we can use ORDER BY with the WITH clause. The previous query could be rewritten as fallows:
MATCH (n:person)
WITH n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
RETURN *
The SKIP clause is used to specify from which point the rows are being included in the result. SKIP clause is a sub-clause of RETURN and WITH, and as such it must appear in conjunction with one of those. Note that without an ORDER BY clause before the SKIP, no guarantees are given to which rows are being skipped. For instance, we can write a query to filter the first 10 persons sorted by name descendingly:
MATCH (n:person)
RETURN n, n.name AS NAME
ORDER BY NAME DESC
SKIP 10
Also, more complex expressions can be used in the SKIP clause, but always without referencing any variable and resolving into a result of INTEGER or LONG type:
MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
SKIP toInteger(timestamp()/10000000)
The LIMIT clause is used to specify from which point the rows are being included in the result. LIMIT clause is a sub-clause of RETURN and WITH, and as such it must appear in conjunction with one of those. Note that without an ORDER BY clause before the LIMIT, no guarantees are given to which rows are conserved. For instance, we can write a query to only keep the first 10 persons sorted by name descendingly:
MATCH (n:person)
RETURN n, n.name AS NAME
ORDER BY NAME DESC
LIMIT 10
Also, more complex expressions can be used in the LIMIT clause, but always without referencing any variable and resolving into a result of INTEGER or LONG type:
MATCH (n:person)
RETURN n, n.age AS AGE, n.name AS NAME
ORDER BY AGE DESC, NAME ASC
LIMIT toInteger(timestamp()/10000000)
The UNION clause performs the union of multiple chained queries. Such queries must always contain the same number of columns and these need to have the same name. For example, the following query returns the union of nodes of type “person” and “movie”, with a column containing their type
MATCH (n:person)
WITH n as ID, type(n) AS TYPE
UNION
MATCH (m:movie)
WITH m as ID, type(m) AS TYPE
RETURN *