Nested subqueries and subquery chaining
Stefan Plantikow stefan.plantikow@neo4j.org
Petra Selmer petra.selmer@neo4j.org



Page 1: petra.selmer@neo4j.org Petra Selmer … · 2019-10-18 · subquery chaining Stefan Plantikow stefan.plantikow@neo4j.org Petra Selmer petra.selmer@neo4j.org. Subqueries are self-contained


Subqueries are self-contained queries running within the scope of an outer query

<outer-query>
{
  <inner-query>
}
<next-outer-query> (continued)

What?

1. Read-only nested subqueries
2. Read/Write updating subqueries
3. Set operations
4. Chained subqueries

Not in this talk: Subqueries in expressions (scalar, lists, existential)

What?

CIP2017-04-20 - Query combinators for set operations
https://github.com/opencypher/openCypher/pull/227

CIP2016-06-22 - Nested, updating, and chained subqueries
https://github.com/opencypher/openCypher/pull/100

+ CIPs for subqueries in expressions not covered by this talk

Relevant CIPs

Queries are easier to:

• construct

• maintain

• read

Why?

Subqueries enable:

• Composition of query pipelines

• Programmatic query composition

• Post-processing of results

• Multiple write actions for each record

Why?

1. Read-only, nested subqueries


<inner-query>

Any complete, read-only query "... RETURN" enclosed within { }

May be uncorrelated or correlated
(<inner-query> may use variables from the outer query)

Read-only subqueries may be nested at an arbitrary depth

Preliminaries

Regular subqueries: MATCH { <inner-query> }

Optional subqueries: OPTIONAL MATCH { <inner-query> }

Mandatory subqueries: MANDATORY MATCH { <inner-query> }

Variants

<inner-query> is evaluated for each record from <outer-query>

Variables varouter from <outer-query> are visible to <inner-query>

Variables varinner are introduced and returned by <inner-query> (as output records)

Variables from varouter and varinner are available to <next-outer-query> (as result records)

Semantics

<inner-query> may omit variables from varouter
But omitted variables are re-added to the final result records

<inner-query> may shadow and thus alter any variable in varouter

But the CIP recommends a warning if <inner-query>:
• Discards a variable vi in varouter
• Re-introduces vi

Semantics

Result records available to the <next-outer-query>:

All output records returned by the <inner-query>, amended with missing variables from varouter

Semantics: Regular MATCH
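
The regular MATCH semantics can be modelled outside Cypher. The following is a minimal Python sketch, not the CIP's implementation: records are plain dicts, `inner` stands in for <inner-query>, and the names `regular_match`, `liked_brands` and the `LIKES` data are hypothetical.

```python
from typing import Callable, Iterable, Iterator

Record = dict

def regular_match(outer: Iterable[Record],
                  inner: Callable[[Record], Iterable[Record]]) -> Iterator[Record]:
    """Model of MATCH { <inner-query> }: evaluate inner once per outer record."""
    for outer_rec in outer:
        for out_rec in inner(outer_rec):
            # Amend the output record with the outer variables it omitted.
            # Variables returned by the inner query (including shadowed ones) win.
            yield {**outer_rec, **out_rec}

# Hypothetical data: which brands each user likes.
LIKES = {"ann": ["Acme", "Bolt"], "bob": []}

def liked_brands(rec: Record) -> Iterable[Record]:
    # Correlated inner query: reads `u` from the outer record, returns only `name`.
    return [{"name": brand} for brand in LIKES[rec["u"]]]

outer_records = [{"u": "ann"}, {"u": "bob"}]
result = list(regular_match(outer_records, liked_brands))
```

In this model the outer record for `bob` contributes nothing to the result, because its inner query returns no output records.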

Result records available to the <next-outer-query>:

All output records returned by the <inner-query>, amended with missing variables from varouter

Generate an error if no output records are returned by <inner-query>

Semantics: Mandatory MATCH
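
A minimal Python sketch of the mandatory variant, under the assumption that the error is raised for the first outer record whose inner query returns nothing. The name `mandatory_match` and the choice of `LookupError` are illustrative only; the slides do not specify an error type.

```python
from typing import Callable, Iterable, Iterator

Record = dict

def mandatory_match(outer: Iterable[Record],
                    inner: Callable[[Record], Iterable[Record]]) -> Iterator[Record]:
    """Model of MANDATORY MATCH { <inner-query> }."""
    for outer_rec in outer:
        outputs = list(inner(outer_rec))
        if not outputs:
            # Assumption: an empty inner result is an error for this outer record.
            raise LookupError(f"no output records for outer record {outer_rec!r}")
        for out_rec in outputs:
            # Same amendment rule as the regular variant.
            yield {**outer_rec, **out_rec}

# Hypothetical data: which brands each user likes.
LIKES = {"ann": ["Acme"], "bob": []}

def liked_brands(rec: Record) -> Iterable[Record]:
    return [{"name": brand} for brand in LIKES[rec["u"]]]

ok = list(mandatory_match([{"u": "ann"}], liked_brands))

try:
    list(mandatory_match([{"u": "bob"}], liked_brands))
    failed = False
except LookupError:
    failed = True
```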

Result records available to the <next-outer-query>:

Either all output records returned by the <inner-query> (if there is at least one), or a single record with the same fields as the output records, where every vi in varinner is set to null

In both cases, amended with missing variables from varouter

Semantics: Optional MATCH
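
A minimal Python sketch of the optional variant. Because a Python function cannot report its output fields when it returns nothing, the sketch assumes the inner variables are passed in explicitly as `inner_vars`; that parameter and the name `optional_match` are hypothetical, and `None` plays the role of null.

```python
from typing import Callable, Iterable, Iterator, Sequence

Record = dict

def optional_match(outer: Iterable[Record],
                   inner: Callable[[Record], Iterable[Record]],
                   inner_vars: Sequence[str]) -> Iterator[Record]:
    """Model of OPTIONAL MATCH { <inner-query> }."""
    for outer_rec in outer:
        outputs = list(inner(outer_rec))
        if not outputs:
            # No output records: substitute one record with every inner variable null.
            outputs = [{var: None for var in inner_vars}]
        for out_rec in outputs:
            yield {**outer_rec, **out_rec}

# Hypothetical data: which brands each user likes.
LIKES = {"ann": ["Acme"], "bob": []}

def liked_brands(rec: Record) -> Iterable[Record]:
    return [{"name": brand} for brand in LIKES[rec["u"]]]

result = list(optional_match([{"u": "ann"}, {"u": "bob"}], liked_brands, ["name"]))
```

Unlike the regular variant, every outer record yields at least one result record here.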

MATCH {
  MATCH ...
  RETURN *
  UNION
  MATCH ...
  RETURN *
}
WITH *
WHERE ...
RETURN *

Example: Post-UNION processing

MATCH (u:User {id: $userId})
MATCH (f:Farm {id: $farmId})-[:IS_IN]->(country:Country)
MATCH {
  MATCH (u)-[:LIKES]->(b:Brand)-[:PRODUCES]->(p:Lawnmower)
  RETURN b.name AS name
  UNION
  MATCH (u)-[:LIKES]->(b:Brand)-[:PRODUCES]->(v:Vehicle)
  WHERE v.leftHandDrive = country.leftHandDrive
  RETURN b.name AS name
}
RETURN f, name

Example: Correlated subquery

2. Read/Write updating subqueries

<updating-query>

Can be any updating query, but must not end with RETURN (no data is returned)

Preliminaries

<updating-query>

May be uncorrelated or correlated
(<updating-query> may use variables from the outer query)

Updating subqueries may be nested at an arbitrary depth

Read-only nested subqueries may not contain updating subqueries

FOREACH removed from Cypher - now obsolete

Preliminaries

Simple updating subqueries:
DO { <updating-query> }

Conditionally-updating subqueries:
DO
  [WHEN <predicate> THEN { <updating-query> }]+
  [ELSE { <updating-query> }]
END

Variants

<updating-query> is run for each incoming record

Executing DO does not affect the cardinality

Input records are passed on to <next-outer-query>

Conditional DO: This happens whether or not the incoming record is eligible for processing by <updating-query>

Semantics
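
The pass-through behaviour of DO can be sketched in Python. This is an illustrative model only: `do`, `merge_child` and the `edges` set (a stand-in for graph updates) are hypothetical names, not part of the proposal.

```python
from typing import Callable, Iterable, Iterator

Record = dict

def do(records: Iterable[Record],
       update: Callable[[Record], object]) -> Iterator[Record]:
    """Model of DO { <updating-query> }: side effects only, records pass through."""
    for rec in records:
        update(rec)  # anything the update produces is suppressed
        yield rec    # the input record is passed on unchanged

# Hypothetical "graph": a plain set of (parent, child id) edges.
edges: set = set()

def merge_child(rec: Record) -> None:
    edges.add((rec["r"], rec["x"]))

incoming = [{"r": "root", "x": x} for x in (1, 2, 3)]
outgoing = list(do(incoming, merge_child))
```

Three records go in and the same three come out, so the cardinality is unchanged while the updates have been applied once per record.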


A query can end with a DO subquery in the same way as an updating query currently can end with any update clause

All varinner introduced by <updating-query> are suppressed by DO

DO can be nested within another DO


Semantics

DO
  [WHEN <predicate> THEN { <updating-query> }]+
  [ELSE { <updating-query> }]
END

Conditions in WHEN are evaluated in order

<updating-query> is evaluated for the first true condition, falling through to ELSE if no true condition is found

Otherwise (no true condition and no ELSE), no updates will occur

Semantics: conditional variant
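
A minimal Python sketch of the conditional variant, assuming each WHEN branch is a (predicate, update) pair and ELSE is an optional callable. The names `do_when`, `branches` and `otherwise` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence, Tuple

Record = dict
Branch = Tuple[Callable[[Record], bool], Callable[[Record], object]]

def do_when(records: Iterable[Record],
            branches: Sequence[Branch],
            otherwise: Optional[Callable[[Record], object]] = None) -> Iterator[Record]:
    """Model of DO WHEN ... THEN {...} [ELSE {...}] END."""
    for rec in records:
        for predicate, update in branches:
            if predicate(rec):
                update(rec)  # only the first true condition fires
                break
        else:
            # No WHEN condition was true: fall through to ELSE if present.
            if otherwise is not None:
                otherwise(rec)
        yield rec  # passed on whether or not any update ran

incoming = [{"x": x} for x in (1, 2, 3, 4)]

odd, even = [], []
is_odd = lambda rec: rec["x"] % 2 == 1
with_else = list(do_when(incoming,
                         [(is_odd, lambda rec: odd.append(rec["x"]))],
                         otherwise=lambda rec: even.append(rec["x"])))

odd_only = []
no_else = list(do_when(incoming, [(is_odd, lambda rec: odd_only.append(rec["x"]))]))
```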

Old version:

MATCH (r:Root)
FOREACH(x IN range(1, 10) |
  MERGE (c:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
)

New version:

MATCH (r:Root)
UNWIND range(1, 10) AS x
DO {
  MERGE (c:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
}

Example: FOREACH vs DO

MATCH (r:Root)
UNWIND range(1, 10) AS x
DO WHEN x % 2 = 1 THEN {
  MERGE (c:Odd:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
}
ELSE {
  MERGE (c:Even:Child {id: x})
  MERGE (r)-[:PARENT]->(c)
}
END

Example: Conditional DO

3. Set operations


Set operations combine results from two queries into one

<left-hand-query>
<SET-OPERATION>
<right-hand-query>

What?

UNION

UNION ALL

UNION MAX

INTERSECT

INTERSECT ALL

EXCEPT

EXCEPT ALL

EXCLUSIVE UNION

EXCLUSIVE UNION MAX

What?

• Rule: <left-hand-query> and <right-hand-query> return same variables (fields) in same order

• UNION: set union
• INTERSECT: set intersection
• EXCEPT: set difference
• EXCLUSIVE UNION: exclusive set union (symmetric difference)

Set operation semantics
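
The four duplicate-free operations correspond to ordinary set operators. A minimal Python sketch, assuming each record is a tuple of field values in a fixed order (which is what the rule on identical fields in identical order guarantees); the sample data is made up.

```python
# Each query result is a list of records; a record is a tuple of field values.
left = [("a",), ("a",), ("b",)]
right = [("b",), ("c",)]

def union(l, r):
    return set(l) | set(r)            # set union

def intersect(l, r):
    return set(l) & set(r)            # set intersection

def except_(l, r):
    return set(l) - set(r)            # set difference

def exclusive_union(l, r):
    return set(l) ^ set(r)            # records in exactly one of the two inputs

results = {
    "UNION": union(left, right),
    "INTERSECT": intersect(left, right),
    "EXCEPT": except_(left, right),
    "EXCLUSIVE UNION": exclusive_union(left, right),
}
```

Duplicates in the inputs (the repeated `("a",)`) disappear, since these variants work on sets rather than multisets.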


Set operation semantics


• UNION

• INTERSECT

Set operation semantics

• EXCLUSIVE UNION

• EXCEPT

Multiset operations

For a record that occurs n times in <left-hand-query> and k times in <right-hand-query>, the number of occurrences in the result is:

• UNION ALL: n + k
• INTERSECT ALL: min(n, k)
• EXCEPT ALL: max(0, n - k)
• UNION MAX: max(n, k)
• EXCLUSIVE UNION MAX: max(n, k) - min(n, k)

• Rule: <left-hand-query> and <right-hand-query> return same variables (fields) in same order

• UNION ALL: multiset union
• INTERSECT ALL: multiset intersection
• EXCEPT ALL: multiset difference (0-bounded)

Multiset operation semantics

• Rule: <left-hand-query> and <right-hand-query> return same variables (fields) in same order

• UNION MAX: max-bounded multiset union (largest number of duplicates from either input query)

• EXCLUSIVE UNION MAX: exclusive multiset union (excess duplicates from either input query over the other)

Multiset operation semantics
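
The multiset variants can be checked with Python's `collections.Counter`, whose `+`, `&`, `-` and `|` operators compute per-record sums, minima, 0-bounded differences and maxima. This is a sketch of the counting rules only; the sample multiplicities are made up.

```python
from collections import Counter

# Multiplicity of each record in the two query results (hypothetical data).
left = Counter({"a": 3, "b": 1})
right = Counter({"a": 1, "c": 2})

union_all = left + right                             # sum of multiplicities
intersect_all = left & right                         # minimum of multiplicities
except_all = left - right                            # difference, bounded at 0
union_max = left | right                             # maximum of multiplicities
exclusive_union_max = (left | right) - (left & right)  # maximum minus minimum
```

For record `a` (3 occurrences on the left, 1 on the right) this gives 4, 1, 2, 3 and 2 occurrences respectively.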


All set operations are left-associative

Q1 UNION Q2 INTERSECT Q3

is equivalent to

((Q1 UNION Q2) INTERSECT Q3)


Using multiple set operations
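
Left-associativity matters because the grouping changes the result. A small Python sketch with made-up sets Q1, Q2 and Q3; the helper `evaluate` is hypothetical and simply folds the operators from left to right.

```python
import operator

UNION = operator.or_
INTERSECT = operator.and_

def evaluate(first, rest):
    """Fold a chain of (operator, query result) pairs from left to right."""
    result = set(first)
    for op, query in rest:
        result = op(result, set(query))
    return result

Q1, Q2, Q3 = {1, 2}, {3}, {3, 4}

# Q1 UNION Q2 INTERSECT Q3, read left-associatively: ((Q1 UNION Q2) INTERSECT Q3)
left_assoc = evaluate(Q1, [(UNION, Q2), (INTERSECT, Q3)])

# The other grouping, for comparison only: Q1 UNION (Q2 INTERSECT Q3)
right_grouped = Q1 | (Q2 & Q3)
```

Here the left-associative reading yields `{3}`, while the other grouping would yield `{1, 2, 3}`.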

• OTHERWISE [ALL] computes the logical choice between <left-hand-query> and <right-hand-query>

• Becomes <left-hand-query> if non-empty, <right-hand-query> otherwise

• OTHERWISE does not preserve duplicates
• OTHERWISE ALL preserves duplicates

• Rule: <left-hand-query> and <right-hand-query> return same variables (fields) in same order

OTHERWISE [ALL]
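
A minimal Python sketch of the choice semantics, assuming records are hashable tuples. The function names are hypothetical, and keeping the first occurrence when removing duplicates is an arbitrary choice of this sketch.

```python
def otherwise_all(left, right):
    """OTHERWISE ALL: the left result if non-empty, else the right; duplicates kept."""
    left = list(left)
    return left if left else list(right)

def otherwise(left, right):
    """OTHERWISE: same choice, with duplicates removed."""
    chosen = otherwise_all(left, right)
    return list(dict.fromkeys(chosen))  # de-duplicate, keeping first occurrences

left = [("a",), ("a",)]
right = [("z",), ("z",)]

kept = otherwise_all(left, right)
deduplicated = otherwise(left, right)
fallback = otherwise([], right)
```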


• CROSS computes the cross product (cartesian product) between <left-hand-query> and <right-hand-query>

• Rule: <left-hand-query> and <right-hand-query> must return different variables

CROSS
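
A minimal Python sketch of CROSS over dict records, using `itertools.product`. Rejecting shared variables with a `ValueError` is an assumption of this sketch; the slide only states the rule that the two sides must return different variables.

```python
from itertools import product

def cross(left, right):
    """Model of <left-hand-query> CROSS <right-hand-query> over dict records."""
    left, right = list(left), list(right)
    for l_rec, r_rec in product(left, right):
        shared = l_rec.keys() & r_rec.keys()
        if shared:
            # Rule: both sides must return different variables.
            raise ValueError(f"variables returned by both sides: {sorted(shared)}")
        yield {**l_rec, **r_rec}

users = [{"u": "ann"}, {"u": "bob"}]
brands = [{"b": "Acme"}, {"b": "Bolt"}]
pairs = list(cross(users, brands))
```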


4. Chained subqueries


Chain arbitrary queries:

<query1> composedWith <query2> composedWith <query3> ...

Returned variables from <queryN> are passed on to <queryN+1>

What?

• Query pipeline construction:
  • Factor out subqueries
  • Post-union processing
  • Execute read-only query after updating query

• Programmatic composition:
  • result.cypher("WITH a")

Why?

MATCH {
  MATCH {
    MATCH {
      ...
      // death by curly
      ...
    }
  }
}

What won't work

● Requires deep nesting

● Would require putting updating queries inside read-only queries (instead of after them)

MATCH ...
RETURN ...
UNION
MATCH ...
RETURN ...
WITH ...
MATCH ...
RETURN ...

Our proposal

Recast WITH after RETURN as query combinator (similar to UNION)

● Linear flow of subqueries

● Post-union processing

<query-without-combinator>
<query-combinator>
<query-without-combinator>
...

Our proposal

Integrate query combinators and set operations

● Halve level of nesting

<query1>
WITH ...
<query2>
WITH ...
<query3>
...

Data-dependent composition

Pass variables and rows from one query to the next

<query1>
THEN
<query2>
THEN
<query3>
...

Data-independent composition

Passes no variables and records from one query to the next

Instead: Reduces cardinality to single empty record

WITH ... <query>

Declares expected input variables
Useful for programmatic composition: result.cypher("WITH a")

THEN ... <query>

Drops incoming records
(Reduces cardinality to single empty record)

WITH and THEN at start of query


Like set operations, chained subqueries are left-associative!

<query1> WITH ... <query2> THEN <query3> ...

is equivalent to

(((<query1> WITH ... <query2>) THEN <query3>) ...)


Complex chained queries
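
The difference between WITH and THEN in a left-associative chain can be sketched in Python by treating each query as a function from a list of records to a list of records. The combinator names `with_` and `then`, the three sample queries and the `log` list are hypothetical; this models only the record flow, not the WITH projection itself.

```python
from typing import Callable, List

Record = dict
Query = Callable[[List[Record]], List[Record]]

def with_(q1: Query, q2: Query) -> Query:
    """Data-dependent composition: q2 receives the records returned by q1."""
    return lambda records: q2(q1(records))

def then(q1: Query, q2: Query) -> Query:
    """Data-independent composition: q1 runs, but q2 starts from one empty record."""
    def composed(records: List[Record]) -> List[Record]:
        q1(records)       # evaluated for its effects; its records are dropped
        return q2([{}])   # cardinality reduced to a single empty record
    return composed

log: List[str] = []

def query1(records):
    log.append("query1")
    return [{**rec, "x": x} for rec in records for x in (1, 2, 3)]

def query2(records):
    log.append("query2")
    return [rec for rec in records if rec["x"] > 1]

def query3(records):
    log.append("query3")
    return [{**rec, "done": True} for rec in records]

piped = with_(query1, query2)([{}])

# <query1> WITH ... <query2> THEN <query3>, grouped as ((query1 WITH query2) THEN query3)
chained = then(with_(query1, query2), query3)([{}])
```

In this model `piped` carries `x` through to the second query, while `chained` ends with a single record that contains only what `query3` produced.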

<queryN> cannot contain set operations or query combinators
But it can contain nested subqueries again!

MATCH { ... } RETURN ...
UNION
MATCH { ... } RETURN ...
WITH ...
RETURN ...

Complex chained queries


Discussion

Multiple graphs add the ability to return graphs in addition to variables

Set operations are defined over sets of records
Set operations may also be defined for graphs
=> This extends to Cypher with support for multiple graphs

Chaining is about passing variables to the follow-up query
=> This extends to Cypher with support for multiple graphs

Extension to multiple graphs

Adding subqueries to Cypher will massively increase expressivity

Chained subqueries provide a powerful composition mechanism

Subqueries naturally extend to Cypher for multiple graphs

Summary


Thank you