A Framework for Using Materialized XPath Views in XML Query Processing

Dapeng He, Wei Jin



Introduction

XML languages, such as XQuery, XSLT, and SQL/XML, employ XPath as the search and extraction language. XPath expressions often define complicated navigation, resulting in expensive query processing, especially when executed over large collections of documents. As a result, optimizing XPath expressions is vital for processing XML queries efficiently.

This paper proposes a framework for exploiting materialized XPath views to process XML queries. It develops an XPath matching algorithm to determine when such views can be used to answer a user query containing XPath expressions.


Introduction

There are two main problems associated with answering XML queries using materialized XPath views. First, XPath query containment must be checked to make sure that a view can be used to answer a query. Second, a compensation expression must be constructed that computes the query result from the information available in the view.


Introduction

We address the XPath query containment problem with an XPath matching algorithm. The containment problem was shown to be co-NP-complete even for a restricted subset of XPath. We propose an efficient polynomial-time matching algorithm that is sound and works in most practical cases.

The algorithm builds on the observation, made in prior work, that a total node mapping from view nodes to query nodes implies containment for conjunctive XPath expressions. We extend this observation to a more expressive subset of XPath that includes value predicates, disjunction, and the axes allowed in XQuery.


XPath Matching Algorithm

Here we present an algorithm to decide whether a given XPath view can be used to answer a user query. The algorithm finds tree mappings between the view and query expression trees and records them in a match structure. If a mapping exists, the view can potentially be used to evaluate the XPath expression in the user query.

In the remainder of this presentation, we first introduce our XPath representation, then describe the basic algorithm, followed by an extension to handle comparison predicates.


XPath Representation

We represent XPath expressions as labeled binary trees, called XPS trees. An XPS node is labeled with its axis and test, where the axis is either the special "root" axis or one of the six axes allowed in XQuery: "child", "descendant", "self", "attribute", "descendant-or-self", or "parent". The test is a name test, a wildcard test, or a kind test.

The first child of an XPS node is called predicate, and it can be a conjunction (and), a disjunction (or), a comparison operator (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), a constant, or an XPath Step (XPS) node. The second child, called next, points to the next step, and is always an XPS node.
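As an illustration only, the node structure just described might be sketched in Java as follows. The class XpsNode, its field names, and the convention of giving non-step nodes (and, or, comparisons, constants) the axis label "predicate" are assumptions; the example builds the XPS tree for //order/lineitem[@price] by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical XPS node: field names and the "predicate" label for non-step nodes are assumptions.
class XpsNode {
    static final Set<String> AXES = Set.of("root", "child", "descendant", "self",
            "attribute", "descendant-or-self", "parent");

    final String axis;   // one of AXES for step nodes; "predicate" for and/or/comparison/constant nodes
    final String test;   // name test, "*" wildcard, kind test such as "text()", or operator/constant label
    final int seq;       // sequence number (0 for non-step nodes in this sketch)
    XpsNode predicate;   // first child
    XpsNode next;        // second child: the next step

    XpsNode(String axis, String test, int seq) {
        if (!axis.equals("predicate") && !AXES.contains(axis)) {
            throw new IllegalArgumentException("unsupported axis: " + axis);
        }
        this.axis = axis;
        this.test = test;
        this.seq = seq;
    }

    // Steps on the main path (this node, then next, next, ...), ignoring predicates.
    List<String> mainPath() {
        List<String> out = new ArrayList<>();
        for (XpsNode n = this; n != null; n = n.next) {
            out.add(n.axis + "::" + n.test);
        }
        return out;
    }

    // Hand-built XPS tree for //order/lineitem[@price].
    static XpsNode example() {
        XpsNode root = new XpsNode("root", "root", 1);
        XpsNode order = new XpsNode("descendant", "order", 2);
        XpsNode lineitem = new XpsNode("child", "lineitem", 3);
        lineitem.predicate = new XpsNode("attribute", "price", 4);
        root.next = order;
        order.next = lineitem;
        return root;
    }
}
```

For this tree, mainPath() returns [root::root, descendant::order, child::lineitem]; the @price step hangs off lineitem as its predicate child rather than lying on the main path.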


Examples of XPath and XPS Trees


XPS Tree Construction

To meet the specific needs of XPS tree construction, we implemented the XPS node structure, with Axis, Test, and Sequence Number fields, in Java from scratch without any auxiliary tool. The same node structure also represents the components of predicates: conjunctions (and), disjunctions (or), comparison operators (<, ≤, >, ≥, =, ≠, eq, ne, lt, le, gt, ge), and constants.

To cope with complex XPath expressions, we parse them recursively, building subtrees that handle complicated predicate conditions. For example, the predicate of an XPath step may contain a nested XPath expression, and a predicate condition may contain multiple conjunction, disjunction, or comparison operators.
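The recursive construction can be sketched as a small recursive-descent parser. This is a hypothetical Java sketch, not the authors' parser: its grammar covers only child, descendant, and attribute steps, name and wildcard tests, and/or, and comparisons with numeric constants (kind tests are omitted). It also assumes that an operator node stores its left operand in predicate and its right operand in next, with chains of and/or nested to the right; under that assumption, dump() prints the same axis/test sequence as the example listings on the next two slides, without the sequence numbers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical node: operator and constant nodes reuse the step fields (axis = "predicate").
class Xps {
    final String axis, test;
    Xps predicate, next;
    Xps(String axis, String test) { this.axis = axis; this.test = test; }
}

// Recursive-descent parser for an assumed XPath subset (not the authors' grammar):
//   path := ('/' | '//') step       step := ['@'] (name | '*') ['[' or ']'] [('/' | '//') step]
//   or   := and ['or' or]           and  := cmp ['and' and]
//   cmp  := operand [op operand]    operand := number | relative step chain
class XpsParser {
    private static final Pattern TOKEN = Pattern.compile(
            "\\s*(//|/|\\[|\\]|@|\\*|<=|>=|!=|<|>|=|[A-Za-z_]\\w*|\\d+(?:\\.\\d+)?)");
    private static final Set<String> OPS = Set.of("<", "<=", ">", ">=", "=", "!=");
    private final List<String> toks = new ArrayList<>();
    private int pos = 0;

    XpsParser(String src) {
        String s = src.strip();
        Matcher m = TOKEN.matcher(s);
        for (int at = 0; at < s.length(); at = m.end()) {
            m.region(at, s.length());
            if (!m.lookingAt()) throw new IllegalArgumentException("bad input at offset " + at);
            toks.add(m.group(1));
        }
    }

    Xps parse() {
        Xps root = new Xps("root", "root");
        String sep = take();
        if (!sep.equals("/") && !sep.equals("//")) throw new IllegalArgumentException("expected / or //");
        root.next = step(sep);
        if (pos != toks.size()) throw new IllegalArgumentException("unexpected " + peek());
        return root;
    }

    private String peek() { return pos < toks.size() ? toks.get(pos) : ""; }
    private String take() {
        if (pos >= toks.size()) throw new IllegalArgumentException("unexpected end of input");
        return toks.get(pos++);
    }
    private void expect(String t) { if (!take().equals(t)) throw new IllegalArgumentException("expected " + t); }

    // One step plus, recursively, its predicate subtree and the rest of the step chain.
    private Xps step(String sep) {
        String axis = sep.equals("//") ? "descendant" : "child";
        if (peek().equals("@")) { take(); axis = "attribute"; }
        Xps node = new Xps(axis, take());
        if (peek().equals("[")) { take(); node.predicate = or(); expect("]"); }
        if (peek().equals("/") || peek().equals("//")) node.next = step(take());
        return node;
    }

    private Xps or()  { return chain(this::and, "or", "OR"); }
    private Xps and() { return chain(this::cmp, "and", "AND"); }

    // "a kw b kw c" becomes OP(a, OP(b, c)): left operand in predicate, the rest in next.
    private Xps chain(Supplier<Xps> operand, String kw, String label) {
        Xps left = operand.get();
        if (!peek().equals(kw)) return left;
        take();
        Xps node = new Xps("predicate", label);
        node.predicate = left;
        node.next = chain(operand, kw, label);
        return node;
    }

    private Xps cmp() {
        Xps left = operand();
        if (!OPS.contains(peek())) return left;
        Xps node = new Xps("predicate", take());
        node.predicate = left;
        node.next = operand();
        return node;
    }

    private Xps operand() {
        return peek().matches("\\d.*") ? new Xps("predicate", take()) : step("");
    }

    // Preorder listing (node, predicate subtree, next subtree), one "axis test" pair per line.
    static String dump(Xps n) {
        if (n == null) return "";
        return n.axis + " " + n.test + "\n" + dump(n.predicate) + dump(n.next);
    }
}
```

Nested predicates fall out of the recursion: step() calls or() for a bracketed predicate, and operand() calls step() again for a relative path inside it.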


Example of XPS Tree Structure

View = //order[lineitem/@price>130 and @count>100 and itemNum=10]

XPS tree in preorder (axis, test, sequence number; indentation shows nesting):

root root 1
  descendant order 2
    predicate AND 0
      predicate > 0
        child lineitem 5
          attribute price 6
        predicate 130 0
      predicate AND 0
        predicate > 0
          attribute count 11
          predicate 100 0
        predicate = 0
          child itemNum 15
          predicate 10 0

This example shows how multiple conjunctions are handled, as well as an XPath step whose predicate contains a nested XPath expression (lineitem/@price).


Example of XPS Tree Structure

View = //order[@price>150 and discount[count>10 and itemNum=100] and ordeNum=101]

XPS tree in preorder (axis, test, sequence number; indentation shows nesting):

root root 1
  descendant order 2
    predicate AND 0
      predicate > 0
        attribute price 5
        predicate 150 0
      predicate AND 0
        child discount 9
          predicate AND 0
            predicate > 0
              child count 12
              predicate 10 0
            predicate = 0
              child itemNum 16
              predicate 100 0
        predicate = 0
          child ordeNum 21
          predicate 101 0

This example handles a nested predicate condition and multiple AND operators.


Basic Matching Algorithm

The algorithm described here traverses both the view and the query expression trees and computes all possible mappings from XPS nodes of the view to XPS nodes of the query expression, in a single top-down pass over the view tree.

The basic algorithm is summarized as a table of rules over four functions, each of which evaluates to a Boolean. The algorithm is invoked by the initial call matchStep(v.root, q.root), and a match exists if this call evaluates to true. For each function, the first rule whose condition is satisfied is fired.
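The rule table is not reproduced in this text, so the following is only a simplified Java sketch in the same spirit, not the authors' rules. It assumes conjunctive predicates, the child, descendant, and attribute axes, and name or wildcard tests, and it checks for a total mapping from view nodes to query nodes, starting from matchStep(viewRoot, queryRoot). The classes Step and BasicMatcher are hypothetical; predicate steps and the next step are both stored as children, and results are memoized per (view node, query node) pair.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simplified XPS node: predicate steps (conjunctive only) and the next step
// are all stored as children, each child carrying its own axis.
class Step {
    final String axis;   // "root", "child", "descendant" or "attribute"
    final String test;   // a name, or "*" for a wildcard
    final List<Step> children = new ArrayList<>();

    Step(String axis, String test) { this.axis = axis; this.test = test; }

    Step with(Step... kids) { children.addAll(List.of(kids)); return this; }
}

class BasicMatcher {
    private final Map<List<Step>, Boolean> memo = new HashMap<>();

    // True if view node v, together with everything below it, can be mapped onto query node q.
    boolean matchStep(Step v, Step q) {
        List<Step> key = List.of(v, q);   // Step uses identity equality, so this keys on the node pair
        Boolean known = memo.get(key);
        if (known != null) return known;
        boolean ok = v.test.equals("*") || v.test.equals(q.test);
        for (Step vc : v.children) {
            if (!ok) break;
            boolean found = false;
            for (Step qc : candidates(q, vc.axis)) {
                if (matchStep(vc, qc)) { found = true; break; }
            }
            ok = found;   // every view child needs at least one image in the query
        }
        memo.put(key, ok);
        return ok;
    }

    // Query nodes below q that a view child with the given axis may map to.
    static List<Step> candidates(Step q, String axis) {
        List<Step> out = new ArrayList<>();
        for (Step qc : q.children) {
            if (axis.equals("descendant")) {
                if (!qc.axis.equals("attribute")) { out.add(qc); out.addAll(candidates(qc, axis)); }
            } else if (qc.axis.equals(axis)) {
                out.add(qc);
            }
        }
        return out;
    }
}
```

With V = //*[@*] and Q1 = //order/lineitem[@price and discount], matchStep returns true because //* can map to lineitem and @* to @price; for V = //order/discount against the same Q1 it returns false, because discount is not a child of order in Q1.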


Basic Matching Algorithm


Using this algorithm, we can handle the situation where the query expression is more restrictive than the view definition.

For example, the view V = //*[@*], which contains all XML element nodes that have an attribute, can be used to evaluate Q1 = //order/lineitem[@price and discount], as shown in Figure 2. Dotted lines denote the mapping.

Rule 1.2 states that if a view node v maps to a node in one disjunct of a query predicate, then v must also map to some node in the other disjunct. For example, the same V of Figure 2 cannot be used to evaluate Q = //order/lineitem[@price or price], which asks for lineitem nodes that have either a price attribute or a price element.
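The disjunction rule can be illustrated in isolation. In this hypothetical Java sketch, a query predicate is a tree of and/or nodes over (axis, name) leaves, and a single view predicate step must map into one conjunct of an and, but into every disjunct of an or. The Pred and DisjunctionRule names are assumptions, and the sketch ignores nested paths and comparisons.

```java
import java.util.List;

// Hypothetical query-predicate tree: leaves are (axis, name) steps; inner nodes are "and" / "or".
class Pred {
    final String kind, axis, name;
    final List<Pred> kids;

    private Pred(String kind, String axis, String name, List<Pred> kids) {
        this.kind = kind; this.axis = axis; this.name = name; this.kids = kids;
    }

    static Pred leaf(String axis, String name) { return new Pred("leaf", axis, name, List.of()); }
    static Pred and(Pred... ps) { return new Pred("and", null, null, List.of(ps)); }
    static Pred or(Pred... ps) { return new Pred("or", null, null, List.of(ps)); }
}

class DisjunctionRule {
    // Can a single view predicate step (axis, test; "*" is a wildcard) be mapped into query predicate p?
    static boolean maps(String vAxis, String vTest, Pred p) {
        switch (p.kind) {
            case "leaf":
                return p.axis.equals(vAxis) && (vTest.equals("*") || vTest.equals(p.name));
            case "and":   // one conjunct suffices: every answer satisfies all conjuncts
                return p.kids.stream().anyMatch(k -> maps(vAxis, vTest, k));
            case "or":    // every disjunct must be covered: an answer may satisfy only one of them
                return p.kids.stream().allMatch(k -> maps(vAxis, vTest, k));
            default:
                throw new IllegalArgumentException("unknown kind " + p.kind);
        }
    }
}
```

With the view predicate @*, the query predicate [@price and discount] is accepted, while [@price or price] is rejected because @* cannot map into the element disjunct price.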


Basic Matching Algorithm

When a view node has a "descendant" axis, we must keep looking for matches further down the query tree, even if the current query node matches (rule 1.3). For example, in Figure 2, we try to map XPS2 (//*) to XPS5 (//order), XPS6 (/lineitem), and XPS9 (/discount).
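The Figure 2 example of rule 1.3 can be replayed with a hypothetical Java sketch. It enumerates, in preorder, the element nodes below the query root that a view step with a descendant axis is tried against, continuing downward even after a node matches. The QNode and DescendantRule classes are assumptions; the sequence numbers 5, 6, and 9 come from the text, while the numbers used for the query root and @price are placeholders, and the and node is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical query node. Only 5, 6 and 9 are taken from the text; 4 and 8 are placeholders.
class QNode {
    final int seq;
    final String axis, test;
    final List<QNode> kids = new ArrayList<>();

    QNode(int seq, String axis, String test, QNode... kids) {
        this.seq = seq; this.axis = axis; this.test = test;
        this.kids.addAll(List.of(kids));
    }
}

class DescendantRule {
    // Collects, in preorder, every element node strictly below q: the targets tried for a
    // view step with a descendant axis. Descent continues even past a node that matches.
    static void targets(QNode q, List<QNode> out) {
        for (QNode k : q.kids) {
            if (k.axis.equals("attribute")) continue;   // attributes are not reached by the descendant axis
            out.add(k);
            targets(k, out);
        }
    }

    // Targets where the //*[@*] step actually maps: elements that have an attribute step.
    static List<Integer> mappedTargets(QNode qRoot) {
        List<QNode> tried = new ArrayList<>();
        targets(qRoot, tried);
        List<Integer> out = new ArrayList<>();
        for (QNode t : tried) {
            if (t.kids.stream().anyMatch(k -> k.axis.equals("attribute"))) out.add(t.seq);
        }
        return out;
    }

    // Q1 = //order/lineitem[@price and discount]
    static QNode q1() {
        return new QNode(4, "root", "root",
                new QNode(5, "descendant", "order",
                        new QNode(6, "child", "lineitem",
                                new QNode(8, "attribute", "price"),
                                new QNode(9, "child", "discount"))));
    }
}
```

The targets tried for //* are 5, 6, and 9; of these, only lineitem (6) has an attribute step, so mappedTargets returns [6].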




Recording the Match

Why do we need to record the match?

Exponential number of mappings: the basic matching algorithm may generate an exponential number of tree mappings. For example, the view //a//a…//a and the query /a/a…/a may have exponentially many distinct tree mappings.

Redundant work: the matchStep() function would be called multiple times with the same parameters.
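The blow-up, and the effect of recording results, can be checked on the //a//a…//a versus /a/a…/a example with a small Java sketch. It simplifies matching to positions on two linear paths (an assumption): view step i may map to any query step below the one chosen for view step i - 1, and the last view step must map to the last query step. countMappings enumerates every mapping, while matchStep memoizes each (view step, query step) pair; all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Assumed simplification: the view is k steps of //a and the query is n steps of /a.
class RedundancyDemo {
    static long naiveCalls = 0;

    // Enumerates every tree mapping; identical (i, j) subproblems are explored again and again.
    static long countMappings(int i, int j, int k, int n) {
        naiveCalls++;
        if (i == k - 1) return j == n - 1 ? 1 : 0;
        long total = 0;
        for (int next = j + 1; next < n; next++) total += countMappings(i + 1, next, k, n);
        return total;
    }

    static long totalMappings(int k, int n) {
        long total = 0;
        for (int j = 0; j < n; j++) total += countMappings(0, j, k, n);
        return total;
    }

    // Boolean matching with a memo table: each (view step, query step) pair is decided at most once.
    static boolean matchStep(int i, int j, int k, int n, Map<Integer, Boolean> memo) {
        Integer key = i * n + j;
        Boolean known = memo.get(key);
        if (known != null) return known;
        boolean result = false;
        if (i == k - 1) {
            result = (j == n - 1);
        } else {
            for (int next = j + 1; next < n && !result; next++) result = matchStep(i + 1, next, k, n, memo);
        }
        memo.put(key, result);
        return result;
    }

    // Number of distinct (view step, query step) pairs evaluated, or -1 if no match exists.
    static int memoizedPairs(int k, int n) {
        Map<Integer, Boolean> memo = new HashMap<>();
        boolean found = false;
        for (int j = 0; j < n && !found; j++) found = matchStep(0, j, k, n, memo);
        return found ? memo.size() : -1;
    }
}
```

For k = 10 view steps and n = 20 query steps, totalMappings returns 92378, that is C(19, 9), after even more recursive calls, whereas the memoized check evaluates at most k * n = 200 pairs.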


Recording the Match

What do we record?

Match matrix: a matrix whose cells correspond to pairs (v_i, q_j) of view and query XPS nodes. Each cell holds one of the values "empty", "true", or "false".

Directed edges between cells: they represent the context of each mapping. An edge (i, j) → (k, l) means that matchStep(v_k, q_l) was called from matchStep(v_i, q_j). Because matching proceeds top-down, the edges form a DAG (directed acyclic graph).


Recording the Match

Benefits: recording the match reduces the running time to polynomial, and it makes it possible to handle comparison predicates.


Example of Match Matrix

View = //order/*[@price]; Query = //order[LineItem/@price]

View tree (parent → child): root 1 → //order 2 → /* 3 → @price 4
Query tree (parent → child): root 5 → //order 6 → /LineItem 7 → @price 8


Example of Match Matrix

Match matrix (rows: view nodes; columns: query nodes; blank cells are empty):

V \ Q        root 5   //order 6   /LineItem 7   @price 8
root 1       True
//order 2             True        False
/* 3                              True
@price 4                                        True
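The matrix above can be reproduced with a hypothetical Java sketch that records every matchStep call: cells maps (view, query) sequence-number pairs to true or false (absent pairs are "empty"), and edges stores the DAG edges (i, j) → (k, l). Unlike a plain Boolean check, it does not stop at the first successful target, so all mappings are recorded, in line with rule 1.3. The class names and the treatment of predicate and next steps as plain children are assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical numbered XPS node; predicate steps and the next step are both kept as children.
class MNode {
    final int seq;
    final String axis, test;
    final List<MNode> kids = new ArrayList<>();

    MNode(int seq, String axis, String test, MNode... kids) {
        this.seq = seq; this.axis = axis; this.test = test;
        this.kids.addAll(List.of(kids));
    }
}

class RecordingMatcher {
    // (view seq, query seq) -> true/false; a pair that is absent corresponds to "empty".
    final Map<List<Integer>, Boolean> cells = new LinkedHashMap<>();
    // {i, j, k, l}: matchStep(v_k, q_l) was called from matchStep(v_i, q_j).
    final List<int[]> edges = new ArrayList<>();

    boolean matchStep(MNode v, MNode q) {
        Boolean known = cells.get(List.of(v.seq, q.seq));
        if (known != null) return known;   // the same pair is never matched twice
        boolean ok = v.test.equals("*") || v.test.equals(q.test);
        for (MNode vc : v.kids) {
            if (!ok) break;
            boolean any = false;
            for (MNode qc : targets(q, vc.axis)) {
                edges.add(new int[] {v.seq, q.seq, vc.seq, qc.seq});
                any |= matchStep(vc, qc);   // non-short-circuit: every target is tried and recorded
            }
            ok = any;
        }
        cells.put(List.of(v.seq, q.seq), ok);
        return ok;
    }

    // Query nodes below q that a view child with the given axis may map to.
    static List<MNode> targets(MNode q, String axis) {
        List<MNode> out = new ArrayList<>();
        for (MNode qc : q.kids) {
            if (axis.equals("descendant")) {
                if (!qc.axis.equals("attribute")) { out.add(qc); out.addAll(targets(qc, axis)); }
            } else if (qc.axis.equals(axis)) {
                out.add(qc);
            }
        }
        return out;
    }
}
```

Running it on the example yields true for (1, 5), (2, 6), (3, 7), and (4, 8), false for (2, 7) because the test "order" differs from "LineItem", and four recorded edges.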


Handling Comparison Predicates

Format: L op R, where L and R are each either an XPS node or a constant, and op is one of <, <=, >, >=, =.

Comparisons impose logical constraints. For example, with V = //order/*[@price > 60] and Q = //order[lineitem/@price > 30], the view cannot be used to answer the query: a line item priced at 45 satisfies Q but is not contained in V.


Handling Comparison Predicates

Example: V = //order/*[@price > 60]

View tree: root 1 → //order 2 → /* 3; the predicate of /* 3 is the comparison > 4, with operands @price 5 and 60 6.


Handling Comparison Predicates

Two types of comparisons:
Local predicates: n op constant (e.g., @price > 60)
Intra-document joins: n op m (e.g., @price > @salary)

Normalization:
Local predicates: replace the comparison operator with the subtree rooted at n, and add the comparison to the filter list.
Intra-document joins: replace the comparison operator with "and", and add the comparison to the filter list.
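The two normalization rules can be sketched in Java as a recursive rewrite over a predicate tree that also collects the filter list. PNode, Normalizer, and the "n op constant" / "n op m" string format of filter entries are assumptions for illustration; comparisons with the constant on the left are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical predicate-tree node with an explicit kind; left/right play the predicate/next roles.
class PNode {
    enum Kind { STEP, CONST, OP }
    final Kind kind;
    String label;
    final int seq;
    PNode left, right;

    PNode(Kind kind, String label, int seq, PNode left, PNode right) {
        this.kind = kind; this.label = label; this.seq = seq; this.left = left; this.right = right;
    }
}

class Normalizer {
    static final Set<String> CMP = Set.of("<", "<=", ">", ">=", "=");
    final List<String> filters = new ArrayList<>();   // entries "n op constant" or "n op m"

    PNode normalize(PNode p) {
        if (p == null) return null;
        if (p.kind == PNode.Kind.OP && CMP.contains(p.label)) {
            PNode l = p.left, r = p.right;
            if (l.kind == PNode.Kind.STEP && r.kind == PNode.Kind.CONST) {   // local predicate
                filters.add(l.seq + " " + p.label + " " + r.label);
                return normalize(l);        // the comparison is replaced by the subtree rooted at n
            }
            if (l.kind == PNode.Kind.STEP && r.kind == PNode.Kind.STEP) {    // intra-document join
                filters.add(l.seq + " " + p.label + " " + r.seq);
                p.label = "AND";            // the comparison operator becomes "and"
                p.left = normalize(l);
                p.right = normalize(r);
                return p;
            }
            return p;                       // other shapes are left unchanged in this sketch
        }
        p.left = normalize(p.left);
        p.right = normalize(p.right);
        return p;
    }
}
```

For V = //order/*[@price > 60], normalizing the /* 3 node replaces > 4 with @price 5 and records "5 > 60"; for [@price > @salary] it relabels node 4 as AND and records "5 > 6", which corresponds to the examples on the next two slides.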


Handling Comparison Predicates

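Example of a local predicate: V = //order/*[@price > 60]

View tree: root 1 → //order 2 → /* 3.
Before normalization, the predicate of /* 3 is the comparison > 4, with operands @price 5 and 60 6.
After normalization, the predicate of /* 3 is @price 5, and the filter list contains ("5", ">", "60").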


Handling Comparison Predicates

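Example of an intra-document join: V = //order/*[@price > @salary]

View tree: root 1 → //order 2 → /* 3.
Before normalization, the predicate of /* 3 is the comparison > 4, with operands @price 5 and @salary 6.
After normalization, > 4 is replaced by AND 4 with children @price 5 and @salary 6, and the filter list contains ("5", ">", "6").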


Handling Comparison Predicates

Check restrictions for local predicates:
V: …@price > 60…
Q: …@price > 40…
This fails the restriction check, since a price of 50 satisfies Q but not V.

Check restrictions for intra-document joins:
V: …salary <= bonus[christmas]
Q: …salary and bonus[christmas]
This fails the restriction check, since Q does not require salary <= bonus.
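A possible Java sketch of both checks follows. For local predicates it tests whether the query's condition on a mapped node implies the view's condition, reasoning over a dense numeric domain (an assumption; this never accepts a false implication for integers, but can miss some that hold only for integers). For intra-document joins it simply looks for the translated view filter among the query's filters, also in flipped form. RestrictionCheck and its methods are hypothetical names, not part of the described framework.

```java
import java.util.Map;
import java.util.Set;

class RestrictionCheck {
    // Local predicates: does "x qOp qc" (the query's condition on the mapped node) imply
    // "x vOp vc" (the view's condition)? qOp == null means the query has no condition at all.
    static boolean implies(String qOp, double qc, String vOp, double vc) {
        if (qOp == null) return false;
        boolean lower = qOp.equals(">") || qOp.equals(">=") || qOp.equals("=");   // bounded below
        boolean upper = qOp.equals("<") || qOp.equals("<=") || qOp.equals("=");   // bounded above
        switch (vOp) {
            case ">":  return lower && (qc > vc || (qc == vc && qOp.equals(">")));
            case ">=": return lower && qc >= vc;
            case "<":  return upper && (qc < vc || (qc == vc && qOp.equals("<")));
            case "<=": return upper && qc <= vc;
            case "=":  return qOp.equals("=") && qc == vc;
            default:   throw new IllegalArgumentException("unsupported operator " + vOp);
        }
    }

    // Intra-document joins: the view filter (vn op vm), translated through the node mapping,
    // must appear among the query's filters (a purely syntactic check in this sketch).
    static boolean joinImplied(int vn, String op, int vm, Map<Integer, Integer> map, Set<String> queryFilters) {
        Integer qn = map.get(vn), qm = map.get(vm);
        if (qn == null || qm == null) return false;
        return queryFilters.contains(qn + " " + op + " " + qm)
                || queryFilters.contains(qm + " " + flip(op) + " " + qn);
    }

    static String flip(String op) {
        switch (op) {
            case "<":  return ">";
            case ">":  return "<";
            case "<=": return ">=";
            case ">=": return "<=";
            default:   return op;   // "=" is symmetric
        }
    }
}
```

Here implies(">", 40, ">", 60) is false, reproducing the failed check above, while implies(">", 80, ">", 60) is true; joinImplied succeeds for a mapping 4 → 9, 5 → 10 only if the query filters contain "9 > 10".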


Matching Intra-document Joins

Clean-up for intra-document joins:
Remove all dangling edges, i.e., edges whose source or target matrix cell is not set to true.
Remove orphan node matches: matrix cells with value true that do not have at least one incoming edge are set to false.


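Matching Intra-document Joins

Clean-up example: V = //a[@b > @c]; Q = //a[@b > @c]/a[@b and @c]

View tree (after normalization): root 1 → //a 2; the predicate of //a 2 is AND 3 with children @b 4 and @c 5. Filter: (4, >, 5)
Query tree (after normalization): root 6 → //a 7 → /a 11; the predicate of //a 7 is AND 8 with children @b 9 and @c 10, and the predicate of /a 11 is AND 12 with children @b 13 and @c 14. Filter: (9, >, 10)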


Matching Intra-document Joins

Clean-up example (continued): match matrix (rows: view nodes; columns: query nodes; T = true)

V \ Q     root 6  //a 7   @b 9    @c 10   /a 11   @b 13   @c 14
root 1    T
//a 2             T                       T
@b 4                      T                       T
@c 5                              T                       T
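The clean-up on this example can be sketched in Java, using the view-tree numbers @b 4 and @c 5. The sketch assumes the restriction check has already set the cell for (//a 2, /a 11) to false, because /a 11 has @b and @c but no @b > @c filter, and it repeats the two clean-up steps until nothing changes (the fixpoint loop is an assumption). Cleanup and the cell/edge encoding are hypothetical; the root cell is exempt from the orphan rule.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Cleanup {
    // cells: (view seq, query seq) -> true/false; each edge {i, j, k, l} goes from cell (i, j) to cell (k, l).
    static void cleanUp(Map<List<Integer>, Boolean> cells, List<int[]> edges, List<Integer> rootCell) {
        boolean changed = true;
        while (changed) {   // assumption: repeat both steps until a fixpoint is reached
            // Step 1: remove dangling edges whose source or target cell is not true.
            changed = edges.removeIf(e -> !isTrue(cells, e[0], e[1]) || !isTrue(cells, e[2], e[3]));
            // Step 2: a true cell (other than the root cell) without an incoming edge is an orphan.
            for (Map.Entry<List<Integer>, Boolean> c : cells.entrySet()) {
                if (!c.getValue() || c.getKey().equals(rootCell)) continue;
                int k = c.getKey().get(0), l = c.getKey().get(1);
                boolean hasIncoming = edges.stream().anyMatch(e -> e[2] == k && e[3] == l);
                if (!hasIncoming) {
                    c.setValue(false);
                    changed = true;
                }
            }
        }
    }

    static boolean isTrue(Map<List<Integer>, Boolean> cells, int i, int j) {
        return Boolean.TRUE.equals(cells.get(List.of(i, j)));
    }

    // The clean-up example: V = //a[@b > @c], Q = //a[@b > @c]/a[@b and @c]. Fills edges, returns cells.
    static Map<List<Integer>, Boolean> example(List<int[]> edges) {
        Map<List<Integer>, Boolean> cells = new LinkedHashMap<>();
        int[][] matched = {{1, 6}, {2, 7}, {2, 11}, {4, 9}, {5, 10}, {4, 13}, {5, 14}};
        for (int[] c : matched) cells.put(List.of(c[0], c[1]), true);
        cells.put(List.of(2, 11), false);   // assumed outcome of the restriction check on /a 11
        int[][] es = {{1, 6, 2, 7}, {1, 6, 2, 11}, {2, 7, 4, 9}, {2, 7, 5, 10}, {2, 11, 4, 13}, {2, 11, 5, 14}};
        for (int[] e : es) edges.add(e);
        cleanUp(cells, edges, List.of(1, 6));
        return cells;
    }
}
```

After clean-up, the edges into (4, 13) and (5, 14) are gone, so both cells become orphans and are set to false; the matches of //a 2, @b 4, and @c 5 against //a 7, @b 9, and @c 10 remain true.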


Complexity of the Algorithm

The size of the match matrix is O(|V| * |Q|), where |V| and |Q| are the numbers of XPS nodes in the view and query expressions, respectively.

The number of edges in the DAG is O(|V| * |Q|^2). Each matrix cell can have at most |Q| incoming edges, because by construction an edge (i, j) → (k, l) may exist only if v_i is the parent of v_k.


Complexity of the Algorithm

The cost of constructing the matrix is also polynomial. The matchStep function has only |V| * |Q| distinct sets of parameters, and by definition of the match matrix, the same pair of nodes cannot be matched more than once. In the worst case (rule 1.3), a function call may expand into |Q| function calls. Thus the algorithm runs in O(|V| * |Q|^2) time.


References

A. Balmin, F. Özcan, K. S. Beyer, R. J. Cochrane, and H. Pirahesh. A Framework for Using Materialized XPath Views in XML Query Processing. IBM Almaden Research Center, San Jose, CA.

S. Chaudhuri, R. Krishnamurthy, S. Potamianos, and K. Shim. Optimizing queries with materialized views. In Proceedings of ICDE, pages 190-200, 1995.

A. Deutsch and V. Tannen. Containment and integrity constraints for XPath. In Proceedings of KRDB, 2001.

J. Goldstein and P. Larson. Optimizing queries using materialized views: A practical, scalable solution. In Proceedings of SIGMOD, Santa Barbara, CA, 2001.


A. Y. Levy, A. O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views. In Proceedings of PODS, pages 95-104, 1995.

G. Miklau and D. Suciu. Containment and equivalence for an XPath fragment. In Proceedings of PODS, pages 65-76, 2002.


Questions?

Thank you