Generating Efficient Execution Plans for Vertically Partitioned XML Databases

Patrick Kling, M. Tamer Özsu, and Khuzaima Daudjee

University of Waterloo, David R. Cheriton School of Computer Science

VLDB 2011



The Problem

• Centralized query evaluation techniques for XML are well understood

• These techniques do not scale to large collection sizes and heavy workloads

• Goal: use distribution to improve scalability

• Focus on end-to-end cost of query evaluation


Distributed XML Query Evaluation: Two Scenarios

• Integrating multiple data sources
    • Fragmentation is determined by existing data sources
    • Need flexible fragmentation model to express this

• Distribution for performance
    • Choose fragmentation to suit workload
    • Can use more constrained fragmentation model
    • Fragmentation specification allows for distributed query optimization


Outline

1 Fragmenting XML Collections

2 Querying Distributed XML Collections
    Query Model
    Distributed Query Evaluation
    Improving Performance

3 Performance Evaluation

4 Conclusion


Fragmenting XML Collections

• Ad-hoc fragmentation

• Structure-based fragmentation


Ad-hoc fragmentation

• Cut arbitrary edges in document tree

• Highly flexible (good for data integration)

• No explicit fragmentation specification

• Limited potential for exploiting fragmentation characteristics for query optimization

• Not a suitable choice for this work


Structure-based Fragmentation

• Fragmentation according to characteristics of data or schema

• Yields a fragmentation specification that can be exploited for query optimization

• Better choice when distributing for performance


Our Fragmentation Model

• Focus on simplicity and precise fragmentation specification

• Focus on partitioning collection (replication is orthogonal)

• Follow semantics of relational fragmentation techniques
    • Horizontal fragmentation (based on predicates/selection)
    • Vertical fragmentation (based on partitioning of schema/projection)
    • Hybrid fragmentation (combination of horizontal and vertical steps)


Vertical Fragmentation

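[Figure: a vertically fragmented document; proxy nodes (P) and root proxy nodes (RP) link the fragments]

Fragment f^V_1:
    author_2
        P^{1→2}_13
        P^{1→3}_14

Fragment f^V_2:
    RP^{1→2}_13
        name_2
            first_2: Jane
            last_2: Dean

Fragment f^V_3:
    RP^{1→3}_14
        pubs_2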


Vertical Fragmentation Specification

Vertical fragmentation is specified by a fragmentation schema.

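[Figure: fragmentation schema. Schema edges are annotated with the cardinalities ONCE, OPT, or MULT.]

Fragment f^V_1: author, agent

Fragment f^V_2: name, first, last

Fragment f^V_3: pubs, book

Fragment f^V_4: chapter, reference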
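The partitioning idea behind the fragmentation schema can be sketched in a few lines of Python. This is only an illustration: the dictionary layout, the fragment keys ("fV1" to "fV4"), and the helper names are assumptions, not the authors' representation, and the sketch assumes every element label occurs in exactly one fragment.

```python
# Hypothetical encoding of the example fragmentation schema:
# fragment name -> element labels stored in that fragment.
SCHEMA_FRAGMENTS = {
    "fV1": {"author", "agent"},
    "fV2": {"name", "first", "last"},
    "fV3": {"pubs", "book"},
    "fV4": {"chapter", "reference"},
}


def fragment_of(label):
    """Return the fragment whose part of the schema contains `label`."""
    for fragment, labels in SCHEMA_FRAGMENTS.items():
        if label in labels:
            return fragment
    raise KeyError(label)


def is_partition(fragments):
    """Check that no element label is assigned to more than one fragment."""
    seen = set()
    for labels in fragments.values():
        if seen & labels:
            return False
        seen |= labels
    return True
```

A lookup such as fragment_of("reference") is what a query processor needs later when it annotates pattern nodes with fragments.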


Query model

XQ, subset of XPath

• Nested paths with child and descendant steps

• Explicit node tests and wild cards

• Value constraints (numeric or textual)

Q := σ | ∗ | Q//Q | Q/Q | Q[q]

q := Q | . = str | . ≠ str | . = num | . ≠ num | . ≤ num | . < num | . ≥ num | . > num


Query Example

“Find all references in publications written by authors whose first name is ‘William’ and whose last name is ‘Shakespeare’”

/author[./name[./first = "William" and ./last = "Shakespeare"]]//reference

• Node tests

• Value constraints

• Structural constraints


Tree Patterns

[Figure: tree pattern for the example query; each edge is labeled with its axis]

author
    / name
        / first [. = 'William']
        / last [. = 'Shakespeare']
    // reference

• Pattern nodes with node tests and value constraints

• Edges annotated with XPath axes

• Extraction point nodes


Evaluating Tree Pattern Queries

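[Figure: matching the tree pattern against a document tree]

Tree pattern:

author
    / name
        / first [. = 'William']
        / last [. = 'Shakespeare']
    // a^e_1 : reference

Document:

author_4
    name_4
        first_4: William
        last_4: Shakespeare
    pubs_4
        book_4
            chapter_4
                reference_4
            chapter_5

Result: [a^e_1 = reference_4]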
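The match shown above can be reproduced with a small navigational evaluator. This is a minimal sketch, not the evaluation strategy used in the paper: the Node and Pattern classes, the `ident` field, and the match function are hypothetical names introduced for illustration. Branches are checked existentially, and every match of an extraction point node is returned.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Document node; `ident` is an illustrative label such as 'reference4'."""
    ident: str
    label: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Pattern:
    """Tree pattern node: node test, optional value constraint, incoming axis."""
    test: str                      # element name or "*" wildcard
    value: Optional[str] = None    # value constraint (. = 'str'), if any
    axis: str = "/"                # "/" child or "//" descendant
    extract: bool = False          # True for extraction point nodes
    children: List["Pattern"] = field(default_factory=list)


def descendants(node):
    """Yield all proper descendants of `node` in document order."""
    for child in node.children:
        yield child
        yield from descendants(child)


def match(pattern, node):
    """Return extracted nodes if `node` matches `pattern`, else None."""
    if pattern.test not in ("*", node.label):
        return None
    if pattern.value is not None and node.text != pattern.value:
        return None
    results = [node] if pattern.extract else []
    for sub in pattern.children:
        pool = node.children if sub.axis == "/" else descendants(node)
        found = False
        for candidate in pool:
            extracted = match(sub, candidate)
            if extracted is not None:
                found = True
                results.extend(extracted)
        if not found:          # a required branch has no match
            return None
    return results


doc = Node("author4", "author", children=[
    Node("name4", "name", children=[
        Node("first4", "first", "William"),
        Node("last4", "last", "Shakespeare"),
    ]),
    Node("pubs4", "pubs", children=[
        Node("book4", "book", children=[
            Node("chapter4", "chapter", children=[Node("reference4", "reference")]),
            Node("chapter5", "chapter"),
        ]),
    ]),
])

query = Pattern("author", children=[
    Pattern("name", children=[
        Pattern("first", value="William"),
        Pattern("last", value="Shakespeare"),
    ]),
    Pattern("reference", axis="//", extract=True),
])

matches = match(query, doc)
```

Here `matches` holds the single node bound to the extraction point, which corresponds to the result line of the figure.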


Evaluating Tree Pattern Queries

• Various centralized approaches exist
    • Navigating document trees
    • Structural joins

• We leverage these for distributed query evaluation


Querying Vertically Distributed XML Collections

• Input
    • Fragmentation-unaware tree pattern query
    • Fragmentation schema

• Tasks
    • Annotate tree pattern nodes with corresponding fragments
    • Decompose tree pattern into sub-patterns for individual fragments
    • Convert sub-patterns to local plans using existing techniques (each site is free to choose local strategy)
    • Generate distributed execution plan that specifies how results are combined
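The first two tasks can be sketched as follows. This is an illustrative simplification under stated assumptions: FRAGMENT_OF, FRAGMENT_PARENT, and the tuple encoding of the pattern are hypothetical, every element label is assumed to belong to exactly one fragment, and wildcard node tests are not handled. The sketch annotates pattern nodes with fragments and derives which proxy / root proxy pairs the decomposition has to introduce, including intermediate fragments crossed by a descendant edge.

```python
# Hypothetical encoding of the running example: element label -> fragment,
# and the parent of each fragment in the fragmentation schema.
FRAGMENT_OF = {
    "author": "fV1", "agent": "fV1",
    "name": "fV2", "first": "fV2", "last": "fV2",
    "pubs": "fV3", "book": "fV3",
    "chapter": "fV4", "reference": "fV4",
}
FRAGMENT_PARENT = {"fV2": "fV1", "fV3": "fV1", "fV4": "fV3"}

# Tree pattern node: (node test, axis from parent, children).
PATTERN = ("author", None, [
    ("name", "/", [("first", "/", []), ("last", "/", [])]),
    ("reference", "//", []),
])


def annotate(node):
    """Pair each pattern node with the fragment that stores its label."""
    label, axis, children = node
    return (label, FRAGMENT_OF[label], axis, [annotate(c) for c in children])


def fragment_chain(ancestor, descendant):
    """Fragments on the schema path from `ancestor` down to `descendant`."""
    chain = [descendant]
    while chain[-1] != ancestor:
        chain.append(FRAGMENT_PARENT[chain[-1]])
    return chain[::-1]


def proxy_hops(node, parent_fragment=None, hops=None):
    """Collect the (from, to) fragment hops that need a proxy pair."""
    if hops is None:
        hops = []
    label, _axis, children = node
    fragment = FRAGMENT_OF[label]
    if parent_fragment is not None and fragment != parent_fragment:
        chain = fragment_chain(parent_fragment, fragment)
        for hop in zip(chain, chain[1:]):
            if hop not in hops:
                hops.append(hop)
    for child in children:
        proxy_hops(child, fragment, hops)
    return hops
```

For the example pattern, the hops are fV1 to fV2, fV1 to fV3, and fV3 to fV4: the edge from author to reference crosses fV3 even though no pattern node is annotated with that fragment.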


Querying Vertically Distributed XML Collections

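• Annotate tree pattern nodes

• Decompose tree pattern

• Convert sub-patterns into local plans

• Generate distributed execution plan

[Figure: tree pattern annotated with fragments]

author (f^V_1)
    / name (f^V_2)
        / first [. = 'William'] (f^V_2)
        / last [. = 'Shakespeare'] (f^V_2)
    // reference (f^V_4)

[Figure: decomposition into one sub-pattern per fragment]

q^1_1(f^V_1):
    author
        / a^p_2 : P^{1→2}_∗
        // a^p_3 : P^{1→3}_∗

q^2_1(f^V_2):
    a^rp_2 : RP^{1→2}_∗
        / name
            / first [. = 'William']
            / last [. = 'Shakespeare']

q^3_1(f^V_3):
    a^rp_3 : RP^{1→3}_∗
        // a^p_4 : P^{3→4}_∗

q^4_1(f^V_4):
    a^rp_4 : RP^{3→4}_∗
        // a^e_1 : reference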


Querying Vertically Distributed XML Collections

[Figure: distributed execution plan that joins the local plans p^i_1(f^V_i) of the sub-patterns q^i_1(f^V_i) on proxy IDs]

⋈_{id(a^p_3) = id(a^rp_3)}
    ⋈_{id(a^p_2) = id(a^rp_2)}
        p^1_1(f^V_1)
        p^2_1(f^V_2)
    ⋈_{id(a^p_4) = id(a^rp_4)}
        p^3_1(f^V_3)
        p^4_1(f^V_4)
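The cross-fragment joins in this plan are equi-joins on proxy IDs. A minimal sketch with a hash join, assuming each local plan returns a list of dictionaries keyed by pattern node names; the tuple layout, the IDs, and the id_join helper are made up for illustration and are not taken from the paper.

```python
def id_join(left, right, left_key, right_key):
    """Hash join: pair tuples whose proxy ID equals a root proxy ID."""
    index = {}
    for r in right:
        index.setdefault(r[right_key], []).append(r)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


# Hypothetical local results, one list per fragment.
p1 = [{"ap2": 13, "ap3": 14}, {"ap2": 15, "ap3": 16}]   # authors with proxies
p2 = [{"arp2": 13}]                                     # names that satisfy the constraints
p3 = [{"arp3": 14, "ap4": 21}, {"arp3": 16, "ap4": 22}]
p4 = [{"arp4": 21, "ae1": "reference4"}, {"arp4": 22, "ae1": "reference9"}]

# Same shape as the plan in the figure.
left = id_join(p1, p2, "ap2", "arp2")
right = id_join(p3, p4, "ap4", "arp4")
result = id_join(left, right, "ap3", "arp3")
```

In this made-up data only the first author satisfies the name constraints, so the second reference is computed in fV4 and then discarded by the top join. This is the waste that the techniques on the following slides target.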


Improving Distributed Execution Plans

• Pruning irrelevant fragments

• Join order

• Push cross-fragment joins into local plans


Pushing Cross-Fragment Joins

Large fraction of local results are discarded by cross-fragment join

[Figure: fragment f^V_4, with one sub-tree per root proxy node]

RP^{3→4}_19
    chapter_2
        reference_2
RP^{3→4}_20
    chapter_3
        reference_3
RP^{3→4}_21
    chapter_4
        reference_4

• Idea: only access relevant sub-trees in fragment

• Avoid computing irrelevant local results

• Use pipelining to push cross-fragment join into local plan
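The effect of pushing the join can be illustrated with generators. This is a sketch under assumptions: the fragment is modeled as a dictionary keyed by root proxy ID (standing in for an index on root proxy nodes), and the function names and the `accessed` bookkeeping list are hypothetical.

```python
# Hypothetical contents of fragment fV4: root proxy ID -> (chapter, references).
FRAGMENT_FV4 = {
    19: ("chapter2", ["reference2"]),
    20: ("chapter3", ["reference3"]),
    21: ("chapter4", ["reference4"]),
}


def local_plan_unfiltered(fragment):
    """Evaluate the sub-pattern under every root proxy node."""
    for proxy_id, (_chapter, references) in fragment.items():
        for ref in references:
            yield proxy_id, ref


def local_plan_pushed(fragment, incoming_proxy_ids, accessed):
    """Evaluate only the sub-trees whose root proxy ID arrives from upstream."""
    for proxy_id in incoming_proxy_ids:      # pipelined input
        subtree = fragment.get(proxy_id)     # lookup by root proxy ID
        if subtree is None:
            continue
        accessed.append(proxy_id)            # record which sub-trees were touched
        _chapter, references = subtree
        for ref in references:
            yield proxy_id, ref


upstream = iter([21])                        # proxy IDs produced by the upstream plan
accessed = []
pushed = list(local_plan_pushed(FRAGMENT_FV4, upstream, accessed))
unfiltered = list(local_plan_unfiltered(FRAGMENT_FV4))
```

The unfiltered plan produces three local results, of which two would be discarded later; the pushed plan touches a single sub-tree.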


A Local Query Plan

[Figure: local plan p^2_1(f^V_2), based on structural joins]

π_{a^rp_2}
    ⋉_{a_1/a_3}
        ⋉_{a_1/a_2}
            ⋈_{a^rp_2/a_1}
                scan_{a^rp_2 : RP^{1→2}_∗}
                scan_{a_1 : name}
            σ_{a_2 = 'William'}
                scan_{a_2 : first}
        σ_{a_3 = 'Shakespeare'}
            scan_{a_3 : last}

• Plan scans root proxy nodes in fragment

• Idea: filter these root proxy nodes before evaluating remainder of plan

• Works for navigating plans and plans based on structural joins (shown here)


A Local Query Plan

[Figure: the same local plan p^2_1(f^V_2) with a join added directly above the root proxy scan]

π_{a^rp_2, ...}
    ⋉_{a_1/a_3}
        ⋉_{a_1/a_2}
            ⋈_{a^rp_2/a_1}
                ⋈
                    . . .
                    scan_{a^rp_2 : RP^{1→2}_∗}
                scan_{a_1 : name}
            σ_{a_2 = 'William'}
                scan_{a_2 : first}
        σ_{a_3 = 'Shakespeare'}
            scan_{a_3 : last}


Pushing Cross-Fragment Joins

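[Figure: distributed plan with the cross-fragment joins pushed into the local plans p^{2'}_1, p^{3'}_1, and p^{4'}_1]

p^{4'}_1(f^V_4)
    ⋈_{id(a^p_4) = id(a^rp_4)}
        p^{3'}_1(f^V_3)
            ⋈_{id(a^p_3) = id(a^rp_3)}
                p^{2'}_1(f^V_2)
                    ⋈_{id(a^p_2) = id(a^rp_2)}
                        p^1_1(f^V_1)
                        scan_{a^rp_2 : RP^{1→2}_∗}
                scan_{a^rp_3 : RP^{1→3}_∗}
        scan_{a^rp_4 : RP^{3→4}_∗}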


Pushing Cross-Fragment Joins: Implementation

• Can use full pipelining if both inputs to join are ordered

• Alternatively, can use index on root proxy nodes

• Full parallelism after first tuple received by local plan
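When both inputs arrive ordered by proxy ID, the join can be evaluated as a streaming merge join that emits results as soon as IDs line up. A minimal sketch, assuming each input is an iterator of (proxy ID, payload) pairs sorted in ascending order with unique IDs on each side; the function name and tuple layout are illustrative, not taken from the implementation described in the talk.

```python
def merge_join(left, right):
    """Streaming merge join of two ID-ordered inputs with unique IDs."""
    left, right = iter(left), iter(right)
    l = next(left, None)
    r = next(right, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left, None)         # left ID has no partner, skip it
        elif l[0] > r[0]:
            r = next(right, None)        # right ID has no partner, skip it
        else:
            yield l[0], l[1], r[1]       # IDs match: emit immediately
            l = next(left, None)
            r = next(right, None)


joined = list(merge_join(
    [(13, "a"), (14, "b"), (16, "c")],
    [(14, "x"), (15, "y"), (16, "z")],
))
```

Because the function is a generator, a consumer can start working on the first joined tuple before either input is exhausted.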


Pushing Cross-Fragment Joins

• Avoids accessing large portion of sub-trees within a fragment

• Can only be fully used in left-deep plans

• Decreases flexibility (e.g., where joins are performed)


Label Path Filtering

• Cross-fragment join pushing works well but decreases flexibility

• Goal: find a solution that can obtain partial benefit for scenarios where join pushing cannot be applied

• Idea: use selection instead of join to filter out some root proxy nodes


Label Path Filtering

• Assign to each proxy node the label path from the document root

• Filter for label paths that are compatible with the query

[Figure: example document tree; the label path shown is /author/pubs/book]

author_4
    name_4
        first_4: William
        last_4: Shakespeare
    pubs_4
        book_4
            chapter_4
                reference_4
            chapter_5


Label Path Filtering

[Figure: distributed plan with label path filtering in the local plan for f^V_4]

⋈_{id(a^rp_4) = id(a^p_4)}
    ⋈_{id(a^rp_3) = id(a^p_3)}
        ⋈_{id(a^rp_2) = id(a^p_2)}
            p^1_1(f^V_1)
            p^2_1(f^V_2)
        p^3_1(f^V_3)
    p^{4'}_1(f^V_4)
        σ_{label(a^rp_4) = /author/pubs/book}
            scan_{a^rp_4 : RP^{3→4}_∗}

• Assume there are two types of publications: book and article

• Can use selection to filter chapters based on publication type


Label Path Filtering

• Can be used in more cases

• Retains higher degree of flexibility

• Benefit is more limited (does not filter all irrelevant root proxy nodes)


Determining the Best Distributed Execution Plan

• Join pushing and label path filtering are not always advantageous

• Determine best execution plan using cost model


Performance Evaluation

• Implemented techniques within Natix

• 12 GB XMark collection (auction data)

• One Amazon EC2 instance for each of 10 vertical fragments


Performance Evaluation

• XPathMark queries (with few filtering value constraints)

• Modified, more selective XPathMark queries

A1 /site/closed_auctions/closed_auction/annotation/description/text/keyword

A2 //closed_auction//keyword

A3 /site/closed_auctions/closed_auction//keyword

A4 /site/closed_auctions/closed_auction[annotation/description/text/keyword]/date

A5 /site/closed_auctions/closed_auction[descendant::keyword]/date

A6 /site/people/person[profile/gender and profile/age]/name

B7 //person[profile/@income]/name

A1S /site/closed_auctions/closed_auction[price > 600]/annotation/description/text/keyword

A2S //closed_auction[price > 600]//keyword

A3S /site/closed_auctions/closed_auction[price > 600]//keyword

A4S /site/closed_auctions/closed_auction[price > 600][annotation/description/text/keyword]/date

A5S /site/closed_auctions/closed_auction[price > 600][descendant::keyword]/date

A6S /site/people/person[starts-with(name, 'Ry')][profile/gender and profile/age]/name

B7S //person[starts-with(name, 'Ry')][profile/@income]/name


Performance Evaluation: XPathMark

[Figure: bar chart of response time (seconds, axis up to 1400) for queries A1, A2, A3, A4, A5, A6, and B7, comparing the plans cent, dist, and push]


Performance Evaluation: Selective XPathMark

[Figure: bar chart of response time (seconds, axis from 0 to 1400) for the queries labeled A1, A2, A3, A4, A5, A6, and B7 on the axis, comparing the plans cent, dist, and push]


Conclusions

• Distribution can make XML query evaluation more scalable

• Join pushing can significantly improve query performance

• A cost model is essential for finding the optimal technique for a given query


References

[1] Patrick Kling, M. Tamer Özsu, Khuzaima Daudjee: Generating Efficient Execution Plans for Vertically Partitioned XML Databases, PVLDB 2010.

[2] Patrick Kling, M. Tamer Özsu, Khuzaima Daudjee: Scaling XML Query Processing: Distribution, Localization and Pruning, DAPD 2011.
