Generating Eﬃcient Execution Plans for Vertically ...Vertical Fragmentation Speciﬁcation...

Generating Efficient Execution Plans for

Vertically Partitioned XML Databases

Patrick Kling, M. Tamer Ozsu, and Khuzaima Daudjee

University of WaterlooDavid R. Cheriton School of Computer Science

VLDB 2011

The Problem

• Centralized query evaluation techniques for XML wellunderstood

• These techniques do not scale to large collection sizes andheavy workloads

• Goal: use distribution to improve scalability

• Focus on end-to-end cost of query evaluation

Distributed XML Query Evaluation: Two Scenarios

• Integrating multiple data sources• Fragmentation is determined by existing data sources• Need flexible fragmentation model to express this

• Distribution for performance• Choose fragmentation to suit workload• Can use more constrained fragmentation model• Fragmentation specification allows for distributed query

optimization

Distributed XML Query Evaluation: Two Scenarios

• Integrating multiple data sources• Fragmentation is determined by existing data sources• Need flexible fragmentation model to express this

• Distribution for performance• Choose fragmentation to suit workload• Can use more constrained fragmentation model• Fragmentation specification allows for distributed query

optimization

Outline

1 Fragmenting XML Collections

2 Querying Distributed XML CollectionsQuery ModelDistributed Query EvaluationImproving Performance

3 Performance Evaluation

4 Conclusion

Outline

4 Conclusion

Fragmenting XML Collections

• Ad-hoc fragmentation

• Structure-based fragmentation

Ad-hoc fragmentation

• Cut arbitrary edges in document tree

• Highly flexible (good for data integration)

• No explicit fragmentation specification

• Limited potential for exploiting fragmentation characteristicsfor query optimization

• Not a suitable choice for this work

Structure-based Fragmentation

• Fragmentation according to characteristics of data or schema

• Yields a fragmentation specification that can be exploited forquery optimization

• Better choice when distributing for performance

Our Fragmentation Model

• Focus on simplicity and precise fragmentation specification

• Focus on partitioning collection (replication is orthogonal)

• Follow semantics of relational fragmentation techniques• Horizontal fragmentation (based on predicates/selection)• Vertical fragmentation (based on partitioning of

schema/projection)• Hybrid fragmentation (combination of horizontal and vertical

steps)

Our Fragmentation Model

• Focus on simplicity and precise fragmentation specification

• Focus on partitioning collection (replication is orthogonal)

• Follow semantics of relational fragmentation techniques• Horizontal fragmentation (based on predicates/selection)• Vertical fragmentation (based on partitioning of

schema/projection)• Hybrid fragmentation (combination of horizontal and vertical

steps)

Vertical Fragmentation

author2

P1→213 P1→3

RP1→213

first2

RP1→314

Vertical Fragmentation Specification

Vertical fragmentation is specified by a fragmentation schema.

author

chapter

reference

OPT ONCE

Outline

4 Conclusion

Query model

XQ, subset of XPath

• Nested paths with child and descendant steps

• Explicit node tests and wild cards

• Value constraints (numeric or textual)• Q := σ | ∗ | Q//Q | Q/Q |Q[q]

q := Q | . = / 6= str | . = / 6= / ≤ / < / ≥ / > num

Query Example

“Find all references in publications written by authors whose firstname is ‘William’ and whose last name is ‘Shakespeare’ ”

Query Example

/author[./name[./first = “William”and

./last = “Shakespeare”]]//reference

Query Example

• Node tests

Query Example

• Node tests

• Value constraints

Query Example

• Node tests

• Value constraints

• Structural constraints

Tree Patterns

author

first.=’William’

last.=’Shakespeare’

reference

Tree Patterns

author

reference

• Pattern nodes with nodetests and value constraints

Tree Patterns

author

reference

Tree Patterns

author

reference

• Edges annotated with XPathaxes

Tree Patterns

author

reference

• Edges annotated with XPathaxes

• Extraction point nodes

Evaluating Tree Pattern Queries

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

author

ae1reference

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

[ae1 = reference4]16

• Various centralized approaches exist• Navigating document trees• Structural joins

• We leverage these for distributed query evaluation

Querying Vertically Distributed XML Collections

• Input• Fragmentation-unaware tree pattern query• Fragmentation schema

• Tasks• Annotate tree pattern nodes with corresponding fragments• Decompose tree pattern into sub-patterns for individual

fragments• Convert sub-patterns to local plans using existing techniques

(each site is free to choose local strategy)• Generate distributed execution plan that specifies how results

are combined

• Annotate tree pattern nodes

• Decompose tree pattern

• Convert sub-patterns into local plans

• Generate distributed execution plan

author

reference

• Annotate tree pattern nodes

• Convert sub-patterns into local plans

• Generate distributed execution plan

author f V1

name f V2

first.=’William’ f V2

last.=’Shakespeare’ f V2

reference f V4

• Annotate tree patternnodes

• Convert sub-patternsinto local plans

• Generate distributedexecution plan

author f V1

name f V2

reference f V4

author f V1

name f V2

reference f V4

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

author f V1

name f V2

reference f V4

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

⋊⋉id(ap3)=id(arp3 )

p11(fV1 ) p21(f

p31(fV3 ) p41(f

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

p11(fV1 ) p21(f

p31(fV3 ) p41(f

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

p11(fV1 ) p21(f

p31(fV3 ) p41(f

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

p11(fV1 ) p21(f

p31(fV3 ) p41(f

author

P1→2∗

P1→3∗

RP1→2∗

RP1→3∗

P3→4∗

RP3→4∗

ae1reference

Improving Distributed Execution Plans

• Pruning irrelevant fragments

• Join order

• Push cross-fragment joins into local plans

Improving Distributed Execution Plans

• Pruning irrelevant fragments

• Join order

• Push cross-fragment joins into local plans

Pushing Cross-Fragment Joins

Large fraction of local results are discarded by cross-fragment join

RP3→419

chapter2

reference2

RP3→420

chapter3

reference3

RP3→421

chapter4

reference4

RP3→419

chapter2

reference2

RP3→420

chapter3

reference3

RP3→421

chapter4

reference4

RP3→419

chapter2

reference2

RP3→420

chapter3

reference3

RP3→421

chapter4

reference4

• Idea: only access relevant sub-trees in fragment

• Avoid computing irrelevant local results

• Use pipelining to push cross-fragment join into local plan

A Local Query Plan

πarp2

⋉a1/a3

⋉a1/a2

⋊⋉arp2 /a1

scanarp2 :RP1→2∗

scana1:name

σa2=’William’

scana2:first

σa3=’Shakespeare’

scana3:last

A Local Query Plan

πarp2

⋉a1/a3

⋉a1/a2

⋊⋉arp2 /a1

scana1:name

σa2=’William’

scana2:first

scana3:last

• Plan scans root proxynodes in fragment

• Idea: filter these rootproxy nodes beforeevaluating remainderof plan

• Works for navigatingplans and plans basedon structural joins(shown here)

A Local Query Plan

πarp2 ,...

⋉a1/a3

⋉a1/a2

⋊⋉arp2 /a1

⋊⋉

. . . scanarp2 :RP1→2∗

scana1:name

σa2=’William’

scana2:first

scana3:last

• Plan scans root proxynodes in fragment

• Idea: filter these rootproxy nodes beforeevaluating remainderof plan

• Works for navigatingplans and plans basedon structural joins(shown here)

1 (fV4 )

1 (fV3 )

1 (fV2 )

p11(fV1 ) scanarp2 :RP1→2

1 (fV4 )

1 (fV3 )

1 (fV2 )

1 (fV4 )

1 (fV3 )

1 (fV2 )

1 (fV4 )

1 (fV3 )

1 (fV2 )

Pushing Cross Fragment Joins: Implementation

• Can use full pipelining if both inputs to join are ordered

• Alternatively, can use index on root proxy nodes

• Full parallelism after first tuple received by local plan

Pushing Cross Fragment Joins

• Avoids accessing large portion of sub-trees within a fragment

• Can only be fully used in left-deep plans

• Decreases flexibility (e.g., where joins are performed)

Label Path Filtering

• Cross-fragment join pushing works well but decreases flexibility

• Goal: find a solution that can obtain partial benefit forscenarios where join pushing cannot be applied

• Idea: use selection instead of join to filter out some root proxynodes

• Assign to each proxy node the label path from the documentroot

• Filter for label paths that are compatible with the query

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

• Assign to each proxy node the label path from the documentroot

• Filter for label paths that are compatible with the query

author4

first4

William

Shakespeare

chapter4

reference4

chapter5

/author/pubs/book

⋊⋉id(arp4 )=id(ap4)

p11(fV1 ) p21 ∗ (f

p31(fV3 )

1 (fV4 )

σlabel(arp4 )=/author/pubs/book

scanarp4 =(RP3→4∗

• Assume there are twotypes of publications:book and article

• Can use selection tofilter chapters basedon publication type

• Can be used in more cases

• Retains higher degree of flexibility

• Benefit is more limited (does not filter all irrelevant root proxynodes)

Determining the Best Distributed Execution Plan

• Join pushing and label path filtering are not alwaysadvantageous

• Determine best execution plan using cost model

Outline

4 Conclusion

Performance Evaluation

• Implemented techniques within Natix

• 12 GB XMark collection (auction data)

• 1 Amazon EC2 instance for each of each of 10 verticalfragments

Performance Evaluation

• XPathMark queries (with few filtering value constraints)

• Modified, more selective XPathMark queries

A1 /site/closed auctions/closed auction/annotation/description/text/keyword

A2 //closed auction//keyword

A3 /site/closed auctions/closed auction//keyword

A4 /site/closed auctions/closed auction[annotation/description/text/keyword]/date

A5 /site/closed auctions/closed auction[descendant::keyword]/date

A6 /site/people/person[profile/gender and profile/age]/name

B7 //person[profile/@income]/name

A1S /site/closed auctions/closed auction[price > 600]/annotation/description/text/keyword

A2S //closed auction[price > 600]//keyword

A3S /site/closed auctions/closed auction[price > 600]//keyword

A4S /site/closed auctions/closed auction[price > 600][annotation/description/text/keyword]/date

A5S /site/closed auctions/closed auction[price > 600][descendant::keyword]/date

A6S /site/people/person[starts-with(name, ’Ry’)][profile/gender and profile/age]/name

B7S //person[starts-with(name, ’Ry’)][profile/@income]/name

Performance Evaluation: XPathMark

A1 A2 A3 A4 A5 A6 B7

centdist

Performance Evaluation: Selective XPathMark

A1 A2 A3 A4 A5 A6 B7

centdist

Conclusions

• Distribution can make XML query evaluation more scalable

• Join pushing can significantly improve query performance

• A cost model is essential for finding the optimal technique fora given query

References

[1] Patrick Kling, M. Tamer Ozsu, Khuzaima Daudjee: GeneratingEfficient Execution Plans for Vertically Partitioned XMLDatabases, PVLDB 2010.

[2] Patrick Kling, M. Tamer Ozsu, Khuzaima Daudjee: ScalingXML Query Processing: Distribution, Localization and Pruning,DAPD 2011.

Generating Eﬃcient Execution Plans for Vertically ...Vertical Fragmentation Speciﬁcation...

Documents

Fragmentation and vertical intra-industry trade in East Asiavenus.unive.it/mvolpe/Articolo 3.pdf · Fragmentation and vertical intra-industry ... two-digit level of the Harmonized

Surviving IPv6 Fragmentation · IPv6 and Packet Fragmentation IPv6 made two major changes to IP’s handling of packet fragmentation: •The fragmentation control header has been

Neck fragmentation

UG Fragmentation

Design SBDT-Vertical Fragmentation file2 Pendekatan GROUPING (pendekatankelompok) dimulaidenganmenempatkansetiap attributkedalamsebuahfragmen yang lain sampaikriterianyaterpenuhi SPLITTING

Materials: Projectable showing horizontal and … · Materials: Projectable showing horizontal and vertical fragmentation I. What is a Distributed Database System? ... remotely distributed

Globalization & Fragmentation

Rock Fragmentation

070326 Fragmentation

RTOF Tutorial 4. Fragmentation patterns Molecules are dissociated/ionized by electron impact Fragmentation pattern depends on electron energy Fragmentation

Chessboard Distributions and Random Vectors with Speciﬁed Marginals and Covariance ... · 2014. 1. 17. · Chessboard Distributions and Random Vectors with Speciﬁed Marginals

Fragmentation Functions Fragmentation Functions and Fragmentation Processes Stefan Kretzer Brookhaven National Laboratory & RIKEN-BNL XXXIV International

QRS Fragmentation

habitat fragmentation & CORRIDORS X - lbprastdp.staff.ipb ...lbprastdp.staff.ipb.ac.id/files/2010/12/Habitat-fragmentation.pdf · habitat fragmentation & CORRIDORS habitat fragmentation

A Review on Fragmentation Techniques in Distributed Database€¦ · Keywords - Fragmentation, Distributed Database System, Horiz ontal fragmentation algorithm, Vertical fragmentation

I. Fragmentation des habitats Fragmentation et passages à

Luftfahrtkarte ICAO – Carte aéronautique OACI ... · Airspace vertical limits in ft or FL as speciﬁed in each case. 7185 3000m 4200m 4800m 3600m 2400m 1800m 1200m 900m ... Projection

Fragmentation & Advertising

Distributed DBMS architecture - DoACT, AKTmazsola.iit.uni-miskolc.hu/tempus/discom/doc/db/tema04.pdf · Distributed DBMS architecture ... • (Vertical) fragmentation is similar to

Digital Fragmentation