XML Native Query Processing Chun-shek Chan Mahesh Marathe Wednesday, February 12, 2003


Page 1: XML Native Query Processing

XML Native Query Processing

Chun-shek Chan, Mahesh Marathe

Wednesday, February 12, 2003

Page 2: XML Native Query Processing

Topics

• XML Indexing
– “Accelerating XPath Location Steps”, Torsten Grust, ACM SIGMOD 2002

• XML Query Optimization
– “Multi-level Operator Combination in XML Query Processing”, Shurug Al-Khalifa and H.V. Jagadish, ACM CIKM 2002

Page 3: XML Native Query Processing

XML Query Languages

• XPath
– Developed by the World Wide Web Consortium (W3C)
– Version 1.0 became a W3C Recommendation on November 16, 1999
– Version 2.0 is a working draft

Page 4: XML Native Query Processing

XML Query Languages

• XQuery
– Also developed by the World Wide Web Consortium
– Currently a working draft

Page 5: XML Native Query Processing

Axes on XPath Tree

• There are 13 axes according to the XPath 2.0 Technical Report
– Forward axes: child, descendant, attribute, self, descendant-or-self, following-sibling, following, namespace (deprecated)
– Reverse axes: parent, ancestor, preceding-sibling, preceding, ancestor-or-self

Page 6: XML Native Query Processing

XML Traversal and Storage

• Tree-based traversal

• Efficient storage is challenging
– Especially for relational databases, which deal with tuples and are not designed to handle recursion or nested elements

Page 7: XML Native Query Processing

Proposed Solutions

• “Indexing and Querying XML Data for Regular Path Expressions”, Li and Moon, VLDB 2001

• “A Fast Index for Semistructured Data”, Cooper, Sample, Franklin, Hjaltason and Shadmon, VLDB 2001

• “DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases”, Goldman and Widom, VLDB 1997

Page 8: XML Native Query Processing

Problems with Proposed Solutions

• The solutions focus on supporting the / and // location steps; support for the rest of XPath is inadequate.

• The proposals rely on technologies outside the relational domain.

Page 9: XML Native Query Processing

Author’s Proposal

• XPath Accelerator

• Works entirely within a relational database.

• Uses traditional relational (SQL) syntax for queries.

• Benefits from advanced index technologies, such as R-trees.

Page 10: XML Native Query Processing

XPath Tree Traversal

• Context node: the starting point of any traversal

• Location steps: syntactically separated by /, evaluated from left to right
– A step’s axis establishes a subset of document nodes (a document region)

Page 11: XML Native Query Processing

XPath Forward Axes

• Child

• Descendant

• Attribute

• Self

• Descendant-or-self

• Following-sibling

• Following

• Namespace

Page 12: XML Native Query Processing

XPath Reverse Axes

• Parent

• Ancestor

• Preceding-sibling

• Preceding

• Ancestor-or-self

Page 13: XML Native Query Processing

Sample XML Tree

• a (root): children b and f

• b: child c

• c: children d and e

• f: children g and h

• h: children i and j

Page 14: XML Native Query Processing

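Encoding XML Document Regions

• Formula: for any node v, the whole document is $v/\text{ancestor} \cup v/\text{descendant} \cup v/\text{following} \cup v/\text{preceding} \cup v/\text{self}$

• The five regions are disjoint, so each node appears exactly once in this formula

• What are the ways to uniquely identify different nodes?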

Page 15: XML Native Query Processing

Numbering Nodes

• Grust: preorder and postorder ranks (pre, post)

• Tatarinov: global, local, and Dewey order

• Li & Moon: (order, size) pairs
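As a minimal sketch of Grust's numbering, assume the sample tree is encoded as a nested Python tuple (a hypothetical representation, with the shape a(b(c(d, e)), f(g, h(i, j)))). A single depth-first traversal assigns the preorder rank when a node is entered and the postorder rank when it is left:

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the sample tree: (tag, [children]).
Tree = Tuple[str, List["Tree"]]

SAMPLE: Tree = ("a", [
    ("b", [("c", [("d", []), ("e", [])])]),
    ("f", [("g", []), ("h", [("i", []), ("j", [])])]),
])


def pre_post(tree: Tree) -> Dict[str, Tuple[int, int]]:
    """Return {tag: (pre, post)}; assumes tags are unique, as in the sample."""
    ranks: Dict[str, Tuple[int, int]] = {}
    counter = {"pre": 0, "post": 0}

    def visit(node: Tree) -> None:
        tag, children = node
        pre = counter["pre"]                  # preorder rank: assigned on entry
        counter["pre"] += 1
        for child in children:
            visit(child)
        ranks[tag] = (pre, counter["post"])   # postorder rank: assigned on exit
        counter["post"] += 1

    visit(tree)
    return ranks


if __name__ == "__main__":
    for tag, (pre, post) in sorted(pre_post(SAMPLE).items()):
        print(f"{tag}: pre={pre} post={post}")
```

The first line printed is `a: pre=0 post=9`. The root is first in preorder and last in postorder, and the leaf d gets pre 3 and post 0.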

Page 16: XML Native Query Processing

XML Document Regions

[Figure: nodes a to j of the sample tree plotted in the pre/post plane; x-axis: preorder rank (0 to 9), y-axis: postorder rank (0 to 9)]

• Descendants?

• Ancestors?

• Preceding?

• Following?
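The four questions can be answered by reading quadrants of the pre/post plane relative to a context node v. The sketch below hard-codes the (pre, post) ranks that a preorder/postorder traversal gives for the sample tree, assuming the shape a(b(c(d, e)), f(g, h(i, j))). The helper names are hypothetical. Each node lands in exactly one region, which also illustrates the partition formula for document regions.

```python
from typing import Dict, Set, Tuple

# (pre, post) ranks of the sample tree a(b(c(d, e)), f(g, h(i, j))).
PREPOST: Dict[str, Tuple[int, int]] = {
    "a": (0, 9), "b": (1, 3), "c": (2, 2), "d": (3, 0), "e": (4, 1),
    "f": (5, 8), "g": (6, 4), "h": (7, 7), "i": (8, 5), "j": (9, 6),
}


def region(v: str, w: str) -> str:
    """Region of node w relative to context node v, read off the pre/post plane."""
    pv, qv = PREPOST[v]
    pw, qw = PREPOST[w]
    if pw == pv:
        return "self"  # pre ranks are unique, so w is v
    if pw > pv:  # right of v
        return "descendant" if qw < qv else "following"
    return "preceding" if qw < qv else "ancestor"  # left of v


def partition(v: str) -> Dict[str, Set[str]]:
    """Group every node into exactly one of the five regions of v."""
    regions: Dict[str, Set[str]] = {}
    for w in PREPOST:
        regions.setdefault(region(v, w), set()).add(w)
    return regions


if __name__ == "__main__":
    for name, nodes in sorted(partition("f").items()):
        print(name, sorted(nodes))
```

With pre on the x-axis and post on the y-axis, the descendants of v lie to its lower right, the ancestors to its upper left, the preceding nodes to its lower left, and the following nodes to its upper right.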

Page 17: XML Native Query Processing

XPath Tree Node Descriptor

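• desc(v) = {pre(v), post(v), par(v), att(v), tag(v)}: the preorder rank, the postorder rank, the preorder rank of the parent, the attribute flag, and the tag name of v

• window(α,v) = {a condition for each field of desc(v)}

• Example: window(child,v) = {(pre(v),∞), [0,post(v)), pre(v), false, *}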

Page 18: XML Native Query Processing

XPath Query Windows

| Axis α | pre | post | par | att | tag |
|---|---|---|---|---|---|
| child | (pre(v),∞) | [0,post(v)) | pre(v) | false | * |
| descendant | (pre(v),∞) | [0,post(v)) | * | false | * |
| descendant-or-self | [pre(v),∞) | [0,post(v)] | * | false | * |
| parent | par(v) | (post(v),∞) | * | false | * |
| ancestor | [0,pre(v)) | (post(v),∞) | * | false | * |
| ancestor-or-self | [0,pre(v)] | [post(v),∞) | * | false | * |
| following | (pre(v),∞) | (post(v),∞) | * | false | * |
| preceding | [0,pre(v)) | [0,post(v)) | * | false | * |
| following-sibling | (pre(v),∞) | (post(v),∞) | par(v) | false | * |
| preceding-sibling | [0,pre(v)) | [0,post(v)) | par(v) | false | * |
| attribute | (pre(v),∞) | [0,post(v)) | pre(v) | true | * |
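The window table can be sketched as a set of Python predicates. The `Desc` tuple, `WINDOWS`, and `step` are hypothetical names, and the ranks are hard-coded for the sample tree, assuming the shape a(b(c(d, e)), f(g, h(i, j))). Each window combines range tests on pre and post, an equality test on par where one is required, and a check of the att flag. The tag column is treated as a separate name test. The preceding and preceding-sibling windows are taken as closed at 0. Otherwise, a node with post rank 0 (d in the sample) would be missed.

```python
from typing import Callable, Dict, List, NamedTuple, Optional


class Desc(NamedTuple):
    """Node descriptor desc(v) = (pre, post, par, att, tag)."""
    pre: int
    post: int
    par: Optional[int]  # pre rank of the parent; None for the root
    att: bool           # True for attribute nodes
    tag: str


# window(alpha, v) as a predicate on a candidate w; the bounds 0 and infinity
# are implicit because ranks are non-negative integers.
Window = Callable[[Desc, Desc], bool]

WINDOWS: Dict[str, Window] = {
    "child": lambda v, w: v.pre < w.pre and w.post < v.post and w.par == v.pre and not w.att,
    "descendant": lambda v, w: v.pre < w.pre and w.post < v.post and not w.att,
    "descendant-or-self": lambda v, w: v.pre <= w.pre and w.post <= v.post and not w.att,
    "parent": lambda v, w: w.pre == v.par and w.post > v.post and not w.att,
    "ancestor": lambda v, w: w.pre < v.pre and w.post > v.post and not w.att,
    "ancestor-or-self": lambda v, w: w.pre <= v.pre and w.post >= v.post and not w.att,
    "following": lambda v, w: w.pre > v.pre and w.post > v.post and not w.att,
    "preceding": lambda v, w: w.pre < v.pre and w.post < v.post and not w.att,
    "following-sibling": lambda v, w: (w.pre > v.pre and w.post > v.post
                                       and w.par == v.par and not w.att),
    "preceding-sibling": lambda v, w: (w.pre < v.pre and w.post < v.post
                                       and w.par == v.par and not w.att),
    "attribute": lambda v, w: v.pre < w.pre and w.post < v.post and w.par == v.pre and w.att,
}


def step(nodes: List[Desc], v: Desc, axis: str, name: str = "*") -> List[str]:
    """Evaluate v/axis::name by testing every descriptor against the window."""
    inside = WINDOWS[axis]
    return [w.tag for w in nodes if inside(v, w) and name in ("*", w.tag)]


# Sample tree (hypothetical shape a(b(c(d, e)), f(g, h(i, j)))), no attributes.
NODES = [Desc(pre, post, par, False, tag) for tag, pre, post, par in [
    ("a", 0, 9, None), ("b", 1, 3, 0), ("c", 2, 2, 1), ("d", 3, 0, 2), ("e", 4, 1, 2),
    ("f", 5, 8, 0), ("g", 6, 4, 5), ("h", 7, 7, 5), ("i", 8, 5, 7), ("j", 9, 6, 7)]]
BY_TAG = {n.tag: n for n in NODES}
```

For example, `step(NODES, BY_TAG["a"], "child")` yields `["b", "f"]`, and `step(NODES, BY_TAG["i"], "ancestor")` yields `["a", "f", "h"]`. This linear scan only shows what each window means. The point of the accelerator is to let an index answer these range conditions.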

Page 19: XML Native Query Processing

XPath Evaluation

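• Given an XPath expression e and an axis α, the location step e/α is evaluated by collecting, for each context node v in the result of e, every node v’ that falls inside window(α,v):
– query(e/α) = SELECT v’.* FROM query(e) v, accel v’ WHERE v’ INSIDE window(α,v)

• INSIDE stands for the range and equality conditions of the window. Once these conditions are spelled out, the pseudo-SQL can be flattened into a plain relational query with a flat n-ary self-join of the accel table.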
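Here is a minimal sketch of that flattening. Python's built-in `sqlite3` module serves as a stand-in relational backend. The table layout, the `ROWS` data (the sample tree, assuming the shape a(b(c(d, e)), f(g, h(i, j)))), and the chosen path are assumptions for illustration. Each INSIDE predicate is expanded into the conditions of the matching row of the window table, so the two-step path `/descendant::c/child::*` becomes one self-join. DISTINCT removes duplicates, and ORDER BY pre returns the result in document order.

```python
import sqlite3
from typing import List

# Hypothetical accel rows for the sample tree: (pre, post, par, att, tag).
# att is stored as 0/1 because SQLite has no boolean type.
ROWS = [
    (0, 9, None, 0, "a"), (1, 3, 0, 0, "b"), (2, 2, 1, 0, "c"), (3, 0, 2, 0, "d"),
    (4, 1, 2, 0, "e"), (5, 8, 0, 0, "f"), (6, 4, 5, 0, "g"), (7, 7, 5, 0, "h"),
    (8, 5, 7, 0, "i"), (9, 6, 7, 0, "j"),
]

# /descendant::c/child::* from the root, flattened into a single self-join.
QUERY = """
SELECT DISTINCT v2.pre, v2.tag
FROM accel AS v0, accel AS v1, accel AS v2
WHERE v0.pre = 0                                  -- context node: the root
  AND v1.pre > v0.pre AND v1.post < v0.post       -- window(descendant, v0)
  AND v1.att = 0 AND v1.tag = 'c'                 -- name test c
  AND v2.pre > v1.pre AND v2.post < v1.post       -- window(child, v1)
  AND v2.par = v1.pre AND v2.att = 0
ORDER BY v2.pre                                   -- document order
"""


def evaluate() -> List[str]:
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE accel (pre INTEGER PRIMARY KEY, post INTEGER, "
                    "par INTEGER, att INTEGER, tag TEXT)")
        con.executemany("INSERT INTO accel VALUES (?, ?, ?, ?, ?)", ROWS)
        return [tag for _pre, tag in con.execute(QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    print(evaluate())
```

The query returns the children of c, namely d and e. Each additional location step would add one more `accel` alias and one more group of window predicates.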

Page 20: XML Native Query Processing

XML Instance Loading

• Loading an XML instance into the database means mapping its nodes into the descriptor table.

• The element nodes can be loaded into the relational table with the callback procedures described in the paper.

• Element contents are stored in a separate table.
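One plausible way to implement such callbacks uses an event-based (SAX) parser. The sketch below uses Python's standard `xml.sax` module. It is an illustration under assumptions, not the paper's code. The `AccelLoader` class is hypothetical: it assigns pre on `startElement` and post on `endElement`, and it keeps a stack of open elements to find par. It loads element nodes only and skips attributes and text, which would need additional rows and the separate content table.

```python
import xml.sax
from typing import List, Optional, Tuple

Row = Tuple[int, int, Optional[int], bool, str]  # (pre, post, par, att, tag)


class AccelLoader(xml.sax.ContentHandler):
    """Hypothetical loader: element nodes only; attributes and text are skipped."""

    def __init__(self) -> None:
        super().__init__()
        self.next_pre = 0
        self.next_post = 0
        self.open: List[Tuple[int, str]] = []  # (pre, tag) of currently open elements
        self.rows: List[Row] = []

    def startElement(self, name, attrs):
        self.open.append((self.next_pre, name))  # pre rank assigned at the start tag
        self.next_pre += 1

    def endElement(self, name):
        pre, tag = self.open.pop()
        par = self.open[-1][0] if self.open else None  # enclosing element, if any
        self.rows.append((pre, self.next_post, par, False, tag))  # post at the end tag
        self.next_post += 1


def load(xml_text: str) -> List[Row]:
    handler = AccelLoader()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return sorted(handler.rows)  # rows in pre order (pre ranks are unique)


SAMPLE_XML = "<a><b><c><d/><e/></c></b><f><g/><h><i/><j/></h></f></a>"

if __name__ == "__main__":
    for row in load(SAMPLE_XML):
        print(row)
```

The whole document is loaded in one streaming pass, and the stack never holds more entries than the document's depth.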

Page 21: XML Native Query Processing

Potential Issues

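• Insertion of a node
– The ranks must be renumbered: every node that follows the new node in document order gets new pre and post ranks, and its ancestors get new post ranks

• Deletion of a node
– Only its entry in the accelerator table needs to be removed; the remaining ranks keep their relative order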
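The renumbering cost can be checked on the sample tree with a small sketch. The nested-tuple encoding, the new leaf x, and the helper names are hypothetical. Inserting x as the last child of b changes the ranks of x's ancestors (a and b) and of every node that follows x in document order. Deleting e leaves a gap in the numbering but keeps the relative order that the windows rely on.

```python
from typing import Dict, List, Set, Tuple

Tree = Tuple[str, List["Tree"]]


def pre_post(tree: Tree) -> Dict[str, Tuple[int, int]]:
    """{tag: (pre, post)}; tags are assumed unique."""
    ranks: Dict[str, Tuple[int, int]] = {}
    counter = {"pre": 0, "post": 0}

    def visit(node: Tree) -> None:
        tag, children = node
        pre = counter["pre"]
        counter["pre"] += 1
        for child in children:
            visit(child)
        ranks[tag] = (pre, counter["post"])
        counter["post"] += 1

    visit(tree)
    return ranks


BEFORE: Tree = ("a", [("b", [("c", [("d", []), ("e", [])])]),
                      ("f", [("g", []), ("h", [("i", []), ("j", [])])])])
# Same tree with a hypothetical new leaf x appended as the last child of b.
AFTER: Tree = ("a", [("b", [("c", [("d", []), ("e", [])]), ("x", [])]),
                     ("f", [("g", []), ("h", [("i", []), ("j", [])])])])


def renumbered(before: Tree, after: Tree) -> Set[str]:
    """Existing nodes whose (pre, post) pair changes because of the insertion."""
    old, new = pre_post(before), pre_post(after)
    return {tag for tag in old if old[tag] != new[tag]}


def still_valid_after_delete() -> bool:
    """Deleting e only removes its row; the descendant window of c still finds d."""
    ranks = pre_post(BEFORE)
    del ranks["e"]
    pc, qc = ranks["c"]
    return {t for t, (p, q) in ranks.items() if p > pc and q < qc} == {"d"}
```

Here `renumbered(BEFORE, AFTER)` is `{"a", "b", "f", "g", "h", "i", "j"}`. Only c, d, and e, which precede x, keep their ranks.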

Page 22: XML Native Query Processing

Node Descriptor Indexing

• Efficiently supported by R-trees.

• Can also be supported by B-trees.
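With a one-dimensional B-tree, only one of the two window ranges can drive the index scan, and the other becomes a filter. An R-tree over the (pre, post) points can restrict both dimensions at once. The sketch below emulates a B-tree on pre with a sorted Python list and `bisect`. This is an assumption for illustration, not a real index. It also reports how many entries the scan touches.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical stand-in for a B-tree on pre: entries kept sorted by pre rank.
# (pre, post, tag) for the sample tree a(b(c(d, e)), f(g, h(i, j))).
INDEX: List[Tuple[int, int, str]] = [
    (0, 9, "a"), (1, 3, "b"), (2, 2, "c"), (3, 0, "d"), (4, 1, "e"),
    (5, 8, "f"), (6, 4, "g"), (7, 7, "h"), (8, 5, "i"), (9, 6, "j"),
]
PRE_KEYS = [pre for pre, _post, _tag in INDEX]


def descendants(v_pre: int, v_post: int) -> Tuple[List[str], int]:
    """Descendant window of v: range scan on pre > pre(v), then filter post < post(v).

    Returns the matching tags and the number of index entries scanned."""
    start = bisect_right(PRE_KEYS, v_pre)  # first entry with pre > pre(v)
    scanned = INDEX[start:]
    hits = [tag for _pre, post, tag in scanned if post < v_post]
    return hits, len(scanned)


if __name__ == "__main__":
    print(descendants(1, 3))  # descendants of b
```

For b, the scan touches 8 entries to return 3 descendants. The open-ended pre range is what makes this wasteful, because the window places no upper bound on pre.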

Page 23: XML Native Query Processing

Example of pre/post rank distribution

Page 24: XML Native Query Processing

Shrink-wrapping the //-axis

• Optimizing the window for the descendant axis

• For each node, we need to bound the pre and post ranks of its leftmost and rightmost leaf nodes.

• For any node v in a tree t, we have pre(v) − post(v) + size(v) = level(v), where size(v) is the number of descendants of v and level(v) is its depth (the root has level 0)

• For a leaf node v’, size(v’) = 0, therefore pre(v’) − post(v’) = level(v’) ≤ height(t)
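The identity and the leaf bound can be checked mechanically on the sample tree. In this sketch, the tree encoding and the `annotate`/`check` helpers are hypothetical. size(v) counts the descendants of v, and level(v) is its depth, with the root at level 0.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the sample tree: (tag, [children]).
Tree = Tuple[str, List["Tree"]]

SAMPLE: Tree = ("a", [
    ("b", [("c", [("d", []), ("e", [])])]),
    ("f", [("g", []), ("h", [("i", []), ("j", [])])]),
])


def annotate(tree: Tree) -> Dict[str, Dict[str, int]]:
    """Compute pre, post, size (number of descendants) and level for each node."""
    info: Dict[str, Dict[str, int]] = {}
    counter = {"pre": 0, "post": 0}

    def visit(node: Tree, level: int) -> int:
        tag, children = node
        pre = counter["pre"]
        counter["pre"] += 1
        size = sum(1 + visit(child, level + 1) for child in children)
        info[tag] = {"pre": pre, "post": counter["post"], "size": size, "level": level}
        counter["post"] += 1
        return size

    visit(tree, 0)
    return info


def check(info: Dict[str, Dict[str, int]]) -> int:
    """Assert the identity for every node and the bound for every leaf; return height(t)."""
    height = max(n["level"] for n in info.values())
    for tag, n in info.items():
        assert n["pre"] - n["post"] + n["size"] == n["level"], tag
        if n["size"] == 0:  # leaf
            assert n["pre"] - n["post"] == n["level"] <= height, tag
    return height


if __name__ == "__main__":
    print("height(t) =", check(annotate(SAMPLE)))
```

Running it prints `height(t) = 3`. For example, f has pre 5, post 8, size 4, and level 1, and 5 − 8 + 4 = 1.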

Page 25: XML Native Query Processing

Shrink-wrapping the //-axis

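• For the rightmost leaf v’ of node v: post(v) = post(v’) + (level(v’) − level(v))

• Substituting post(v’) = pre(v’) − level(v’) from the leaf equation gives pre(v’) = post(v) + level(v), and therefore: pre(v’) ≤ post(v) + height(t)

• For the leftmost leaf v’’ of node v, pre(v’’) = pre(v) + (level(v’’) − level(v)), which similarly gives post(v’’) = pre(v) − level(v) ≥ pre(v) − height(t)

• These formulas can be used to shrink the windows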

Page 26: XML Native Query Processing

Shrink-wrapping the //-axis

• Original window{ (pre(v),∞), [0,post(v)), *, false, * }

• New window{ (pre(v),post(v)+height(t)], [pre(v)−height(t),post(v)), *, false, * }

• Similar techniques can be used to optimize the query windows of other axes.
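A small sketch can confirm on the sample tree that the shrunk window returns exactly the same descendants as the original one, while bounding the pre range that a scan has to cover. The sample ranks (assuming the shape a(b(c(d, e)), f(g, h(i, j))), height 3) and the helper names are hypothetical.

```python
from typing import Dict, Set, Tuple

# (pre, post) ranks of the sample tree a(b(c(d, e)), f(g, h(i, j))), whose height is 3.
RANKS: Dict[str, Tuple[int, int]] = {
    "a": (0, 9), "b": (1, 3), "c": (2, 2), "d": (3, 0), "e": (4, 1),
    "f": (5, 8), "g": (6, 4), "h": (7, 7), "i": (8, 5), "j": (9, 6),
}
HEIGHT = 3


def original_window(v: str) -> Set[str]:
    """{ (pre(v), inf), [0, post(v)) }"""
    pv, qv = RANKS[v]
    return {w for w, (p, q) in RANKS.items() if p > pv and q < qv}


def shrunk_window(v: str) -> Set[str]:
    """{ (pre(v), post(v) + height], [pre(v) - height, post(v)) }"""
    pv, qv = RANKS[v]
    return {w for w, (p, q) in RANKS.items()
            if pv < p <= qv + HEIGHT and pv - HEIGHT <= q < qv}


def pre_range_size(v: str, shrink: bool) -> int:
    """Nodes whose pre rank alone falls in the window, i.e. what a scan on pre touches."""
    pv, qv = RANKS[v]
    upper = qv + HEIGHT if shrink else float("inf")
    return sum(1 for p, _q in RANKS.values() if pv < p <= upper)


if __name__ == "__main__":
    for v in RANKS:
        print(v, sorted(shrunk_window(v)), pre_range_size(v, False), pre_range_size(v, True))
```

For b, the pre range shrinks from 8 candidates to 5 (pre 2 to 6), and both windows still return exactly c, d, and e.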

Page 27: XML Native Query Processing

Shrink-wrapping the //-axis

Page 28: XML Native Query Processing

Finding Leaves in an XML Tree

Page 29: XML Native Query Processing

XPath Traversals with and without shrunk windows

| Query | Shrunk | Not shrunk | # Nodes |
|---|---|---|---|
| //open_auction//description | 0.2 | 53 | 120 |
| //open_auction//description//listitem | 0.32 | 55.5 | 126 |
| //open_auction//description//listitem//keyword | 0.34 | 124 | 90 |

Page 30: XML Native Query Processing

XPath Accelerator v. Edge Map

Page 31: XML Native Query Processing

R-Tree v. B-Tree

Page 32: XML Native Query Processing

Performance for the ancestor axis

Page 33: XML Native Query Processing

Performance: XPath Accelerator v. EE/EA-Join

[Chart: time in seconds (y-axis, 0 to 7) for XPath Accelerator vs. EE/EA-Join on Shakespeare (//ACT//SPEECH) and NITF (//block/attribute::dir)]

Page 34: XML Native Query Processing

Capabilities of XPath Accelerator

• Runs on top of a relational backend to leverage its stability, scalability, and performance.

• Supports the whole family of XPath axes in an adequate manner.

• Can originate XPath traversals in arbitrary context nodes.

• Provides the groundwork for effective cost estimation for XPath queries.

Page 35: XML Native Query Processing

XML Query Optimization

• Macro-level algebra: manipulates sets of trees directly
– Heavyweight, but more directly expressive

• Micro-level algebra: manipulates sets of elements

• In both algebras, the basic operators are “intuitive” unit operations such as selections, projections, joins, and set operations.

Page 36: XML Native Query Processing

XQuery Expression and Pattern Tree

Page 37: XML Native Query Processing

Macro-algebra

• A macro-algebra would implement this entire expression as a single pattern-tree based selection operator (to select matching books), followed by a projection operator (to return titles).

Page 38: XML Native Query Processing

Micro-algebra

• A micro-algebra would break up the selection pattern into one selection operator per node (e.g. (tag=“book”), (tag=“year” && content > 1995)) and one containment join operator per edge.

• Result of sequence of joins would then be projected on the book element, after which its title can be obtained as before.

Page 39: XML Native Query Processing

Query Processing Implementation

1. Identify lists of candidate elements in the database to match each node in the specified structural pattern.

2. Find combinations of candidate elements, one from each list, that satisfy the required structural relationships.

3. Apply any conditions that involve multiple nodes in the structural pattern to eliminate some combinations.
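The micro-algebra plan and the three steps above can be sketched on a tiny, hypothetical bibliography. The book titles and years are made up, and nodes carry the pre/post encoding from the first half of the deck. The pattern follows the example on the micro-algebra slide: find books with a year after 1995 and return their titles. The joins here are naive nested loops, chosen for clarity rather than speed.

```python
from typing import List, NamedTuple, Optional, Tuple


class Node(NamedTuple):
    pre: int
    post: int
    tag: str
    text: Optional[str]


# Hypothetical bibliography: library(book(title, year), book(title, year), book(title, year)).
NODES = [
    Node(0, 9, "library", None),
    Node(1, 2, "book", None), Node(2, 0, "title", "XML Basics"), Node(3, 1, "year", "1994"),
    Node(4, 5, "book", None), Node(5, 3, "title", "Query Processing"), Node(6, 4, "year", "1999"),
    Node(7, 8, "book", None), Node(8, 6, "title", "Relational Indexing"), Node(9, 7, "year", "2001"),
]


def select(pred) -> List[Node]:
    """Step 1: one selection per pattern node produces a candidate list."""
    return [n for n in NODES if pred(n)]


def containment_join(us: List[Node], vs: List[Node]) -> List[Tuple[Node, Node]]:
    """Step 2: one containment join per pattern edge (u is an ancestor of v)."""
    return [(u, v) for u in us for v in vs if u.pre < v.pre and v.post < u.post]


def titles_of_recent_books() -> List[str]:
    books = select(lambda n: n.tag == "book")
    years = select(lambda n: n.tag == "year" and int(n.text) > 1995)
    titles = select(lambda n: n.tag == "title")
    # Join on the book-year edge, then project on book (dict.fromkeys drops duplicates).
    matching = list(dict.fromkeys(b for b, _y in containment_join(books, years)))
    # Step 3 (conditions spanning several pattern nodes) is a no-op for this pattern.
    return [t.text for _b, t in containment_join(matching, titles)]
```

The result is `["Query Processing", "Relational Indexing"]`. A macro-algebra plan would evaluate the same pattern as a single tree-selection operator.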

Page 40: XML Native Query Processing

Containment Join

• Given two sets of elements U and V, a containment join returns pairs of elements (u,v) such that
– u ∈ U and v ∈ V
– u “contains” v

• i.e. node u is an ancestor of node v in the tree representation

Page 41: XML Native Query Processing

Containment Join Implementation

• Three main options:
– Scan the entire database
– Use an index to find candidate nodes for one end of the join, and navigate from there
– Use indices to find candidate nodes for both ends of the join, and compute a containment join between these candidate sets
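For the third option, the candidate sets can be joined with a stack-based structural join. The sketch below follows the style of the Stack-Tree-Desc algorithm of Al-Khalifa et al. (ICDE 2002). It is adapted here to the pre/post encoding, and that adaptation, the `Node` type, and the sample data are assumptions for illustration. Both inputs must be sorted by pre rank. The stack always holds a chain of nested ancestor candidates, so each input list is read once. The result is checked against a naive nested-loop join.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    pre: int
    post: int
    tag: str


# Sample tree a(b(c(d, e)), f(g, h(i, j))) in pre order (hypothetical literal).
NODES = [Node(0, 9, "a"), Node(1, 3, "b"), Node(2, 2, "c"), Node(3, 0, "d"),
         Node(4, 1, "e"), Node(5, 8, "f"), Node(6, 4, "g"), Node(7, 7, "h"),
         Node(8, 5, "i"), Node(9, 6, "j")]


def contains(u: Node, v: Node) -> bool:
    """u is a proper ancestor of v under the pre/post encoding."""
    return u.pre < v.pre and v.post < u.post


def nested_loop_join(us: List[Node], vs: List[Node]) -> List[Tuple[str, str]]:
    """Baseline: test every (u, v) pair."""
    return [(u.tag, v.tag) for v in vs for u in us if contains(u, v)]


def stack_join(us: List[Node], vs: List[Node]) -> List[Tuple[str, str]]:
    """Stack-based containment join; us and vs must be sorted by pre rank.

    The stack holds a chain of nested candidates from us, each containing the
    one above it. Output pairs are grouped by descendant, in document order."""
    out: List[Tuple[str, str]] = []
    stack: List[Node] = []
    i = j = 0
    while j < len(vs):
        # Merge both inputs by pre rank; on a tie, handle the descendant first,
        # because a node is never its own proper ancestor.
        take_u = i < len(us) and us[i].pre < vs[j].pre
        nxt = us[i] if take_u else vs[j]
        # Every stack entry has a smaller pre rank than nxt, so it contains nxt
        # exactly when its post rank is larger; otherwise it cannot contain any
        # later node either, and is popped.
        while stack and stack[-1].post < nxt.post:
            stack.pop()
        if take_u:
            stack.append(nxt)
            i += 1
        else:
            out.extend((u.tag, nxt.tag) for u in stack)
            j += 1
    return out
```

With `us` holding a, b, f, and h, joining against i yields `[("a", "i"), ("f", "i"), ("h", "i")]`. The work is proportional to the input sizes plus the number of output pairs, instead of the product of the two list sizes.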

Page 42: XML Native Query Processing

Projection Merging

Page 43: XML Native Query Processing

Set Operations

• Union compatibility is not an issue.
– In the relational world, union compatibility is an important consideration for set operations.
– In XML, since heterogeneous collections are allowed, this is not an issue.

Page 44: XML Native Query Processing

Union in XML

• Given two pattern trees PT1 and PT2, let PTC be a common component of the two pattern trees such that:
– PT1 − PTC = PT’1 and PT2 − PTC = PT’2, where PT’1 and PT’2 are both trees
– Node i in PTC has a node j in PT’1 such that edge (i,j) is in PT1 if and only if node i also has some node k in PT’2 such that edge (i,k) is in PT2.

Page 45: XML Native Query Processing

Different Pattern Trees and Plans

Page 46: XML Native Query Processing

Micro-operator Merging: New Access Methods

• At macro-level, we considered a pattern tree selection as a single heavyweight operator.

• At micro-level, the approach is to break up a pattern tree selection into multiple containment join operators.

Page 47: XML Native Query Processing

Performance: Union

[Chart: union performance, No Push vs. Push (y-axis 0 to 200)]

Page 48: XML Native Query Processing

Performance: Intersection

[Chart: intersection performance, No Push vs. Push (y-axis 0 to 120)]

Page 49: XML Native Query Processing

Performance by Query Structure

[Chart: performance by query structure (Pair, Twig, Chain1, Chain2), log-scale y-axis from 1 to 10000; series: Union No Push, Union Push, Intersection No Push, Intersection Push]

Page 50: XML Native Query Processing

Parent-Child Join Performance

[Chart: parent-child join performance for Not Pushed, Macro Push, and Micro Push (y-axis 0 to 70)]

Page 51: XML Native Query Processing

Ancestor-Descendant Join Performance

[Chart: ancestor-descendant join performance for Not Pushed, Macro Push, and Micro Push (y-axis 0 to 60)]

Page 52: XML Native Query Processing

Performance Comparison for Different Pushes

Page 53: XML Native Query Processing

Conclusions

• It is not enough to consider XML query optimization purely at the micro-algebra or purely at the macro-algebra level, with simple operators.

• One has to consider access methods for combinations of operators, switching between the micro and macro levels as needed.