WIDM 2002 DSRG, Worcester Polytechnic Institute
Honey, I Shrunk the XQuery! An XML Algebra Optimization Approach
Xin Zhang, Bradford Pielech and Elke A. Rundensteiner
XML and Relational
XML
Flexible and powerful way to:
1) Represent data on the web
2) Exchange data between applications
Relational Database
1) Widely used to store business data
2) Efficient, reliable, secure
3) Provides standard querying (SQL)
XML + RDB: the look and feel of an XML query system combined with the maturity and technology support of an RDB

Architecture
XAT: XML Algebra Tree
[Figure: system architecture.
Components: XAT Generator, XAT Decorrelator, XAT Merger, XAT Optimizer, SQL Generator, XAT Executor, RDBMS.
Inputs: User XQuery, View XQuery.
Intermediate data: User XAT, View XAT, SQL, Tuples.
Output: User Query Results in XML.
Data layers: XML Document, with Virtual XML Documents defined on top of it.]
GOAL: XQuery level optimization

Running Example

Relational tables:

book
Bid  Title
001  TCP/IP Illustrated
002  Data on the Web

prices
Bid  Price
001  65.95
002  34.95

Default XML view (dxv.xml):
<dxv>
  <book>
    <row><bid>001</bid><title>TCP/IP Illustrated</title></row>
    <row><bid>002</bid><title>Data on the Web</title></row>
  </book>
  <prices>
    <row><bid>001</bid><price>65.95</price></row>
    <row><bid>002</bid><price>34.95</price></row>
  </prices>
</dxv>

View query:
<prices>
FOR $book IN document("dxv.xml")/book/row,
    $prices IN document("dxv.xml")/prices/row
WHERE $book/bid = $prices/bid
RETURN
  <book>$book/title, $prices/price</book>
</prices>

View result (prices.xml):
<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
</prices>

User query:
<result>
FOR $t IN document("prices.xml")/book/title
RETURN $t
</result>

User query result:
<results>
  <title>TCP/IP Illustrated</title>
  <title>Data on the Web</title>
</results>
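
To make the running example concrete, here is a minimal Python sketch that evaluates the view query and then the user query naively, with no optimization, over the default XML view. The function names `view` and `user_query` are illustrative and not part of the Rainbow system, and the output element is named `results` to match the result shown on the slide.

```python
import xml.etree.ElementTree as ET

# Default XML view of the two relational tables, as shown in the running example.
DXV = (
    "<dxv>"
    "<book>"
    "<row><bid>001</bid><title>TCP/IP Illustrated</title></row>"
    "<row><bid>002</bid><title>Data on the Web</title></row>"
    "</book>"
    "<prices>"
    "<row><bid>001</bid><price>65.95</price></row>"
    "<row><bid>002</bid><price>34.95</price></row>"
    "</prices>"
    "</dxv>"
)


def view(dxv_root):
    """View query: join book rows and price rows on bid, build <prices>."""
    prices = ET.Element("prices")
    for book in dxv_root.findall("book/row"):
        for price in dxv_root.findall("prices/row"):
            if book.findtext("bid") == price.findtext("bid"):
                out = ET.SubElement(prices, "book")
                ET.SubElement(out, "title").text = book.findtext("title")
                ET.SubElement(out, "price").text = price.findtext("price")
    return prices


def user_query(prices_root):
    """User query: collect every book/title of the view result."""
    results = ET.Element("results")
    for title in prices_root.findall("book/title"):
        ET.SubElement(results, "title").text = title.text
    return results


result = user_query(view(ET.fromstring(DXV)))
titles = [t.text for t in result]
```

The naive evaluation builds every `<book>` element, including the prices, only to throw most of it away; that wasted work is what the optimizations in the rest of the talk aim to remove.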

User Query
<result>
FOR $t IN document("prices.xml")/book/title
RETURN $t
</result>

User XML Algebra Tree (XAT)
[Figure: operator tree, listed root first. Operator symbols were lost in extraction; "->" marks the output column.]
1: col3
2: T <results>$t</result> -> col3
3: Agg
6: R0, book/title -> $t
7: S "prices.xml" -> R0

View Query
<prices>
FOR $book IN document("dxv.xml")/book/row,
    $prices IN document("dxv.xml")/prices/row
WHERE $book/bid = $prices/bid
RETURN
  <book>$book/title, $prices/price</book>
</prices>

View XML Algebra Tree (XAT)
[Figure: operator tree. Operator symbols were lost in extraction; "->" marks the output column.]
11: T <prices>col5</prices> -> col4
12: Agg
22: T <book>[col10][col12]</book> -> col5
23: $book, title -> col10
25: $prices, price -> col12
26: col6=col7
27: $book, bid -> col6
28: $prices, bid -> col7
31: (label lost)
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3

Merged XML Algebra Tree (XAT)
[Figure: the user query XAT (operators 1, 2, 3, 6, 7) placed on top of the view query XAT (operators 11, 12, 22, 23, 25, 26, 27, 28, 31, 14, 15, 20, 21). Operator 7 now reads col4 -> R0 in place of S "prices.xml" -> R0; all other operator labels are unchanged.]

Outline
XAT Optimization:
  XAT Rewrite
  XAT Cleanup
Preliminary Evaluation
Related Work
Summary

XAT Rewrite
Query optimization at the logical level.
Goal:
  Redundancy elimination.
  Computation pushdown.
Technique:
  Equivalence rewrite rules.
Heuristics:
  Push down Navigates.
  Remove construction of intermediate results.
  Combine multiple operators.

Before Navigation Pushdown
[Figure: the merged XAT before rewriting. The user query operators (1, 2, 3, 6, 7) sit above the view query operators (11, 12, 22, 23, 25, 26, 27, 28, 31, 14, 15, 20, 21); operator labels are the same as in the merged XAT.]

After Navigation Pushdown
[Figure: XAT after navigation pushdown, user query operators above view query operators. Operator symbols were lost in extraction; "->" marks the output column.]
User query:
1: col3
2: T <results>$t</result> -> col3
3: Agg
6: R0, book/title -> $t
View query:
11: T <prices>col5</prices> -> R0
12: Agg
22: T <book>[col10][col12]</book> -> col5
26: col6=col7
31: (label lost)
23: $book, title -> col10
27: $book, bid -> col6
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
25: $prices, price -> col12
28: $prices, bid -> col7
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3

After Tagger Cancel Out
[Figure: XAT after the taggers of the view query cancel out against the navigation of the user query. Indentation shows the parent/child structure; "->" marks the output column.]
1: col3
  2: T <results>$t</result> -> col3
    3: Agg
      31: JOIN col6=col7
        23: $book, title -> $t
          27: $book, bid -> col6
            14: R1, /book/row -> $book
              15: S "dxv.xml" -> R1
        25: $prices, price -> col12
          28: $prices, bid -> col7
            20: R3, /prices/row -> $prices
              21: S "dxv.xml" -> R3
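
The idea behind the tagger cancel-out can be illustrated with a small Python sketch: a navigation over a constructed element is resolved directly to the column that was placed inside it, so the element never has to be built. This is a toy under stated assumptions, not the rewrite rule used by the authors: the tables `TAGGERS` and `NAVIGATED` and the function `cancel_out` are hypothetical, and the Agg operator between the two taggers is ignored.

```python
# Hypothetical summary of the construction operators of the view XAT:
# output column -> (element name it builds, columns placed inside it).
TAGGERS = {
    "R0": ("prices", ["col5"]),            # tagger 11, after merging
    "col5": ("book", ["col10", "col12"]),  # tagger 22
}

# Element names held by columns that were filled by navigations
# (operator 23 navigates to title, operator 25 navigates to price).
NAVIGATED = {"col10": "title", "col12": "price"}


def element_name(column):
    """Name of the elements stored in a column."""
    if column in TAGGERS:
        return TAGGERS[column][0]
    return NAVIGATED[column]


def cancel_out(start_column, path):
    """Resolve a child-axis path over constructed elements to source columns."""
    columns = [start_column]
    for step in path.split("/"):
        matched = []
        for column in columns:
            if column not in TAGGERS:
                raise ValueError(f"{column} is not built by a tagger")
            inner = TAGGERS[column][1]
            matched += [c for c in inner if element_name(c) == step]
        columns = matched
    return columns


resolved = cancel_out("R0", "book/title")
```

In the slides, the navigation `book/title` from `R0` resolves to the column that operator 23 produces, which is why that operator is shown writing `$t` directly once the taggers are gone.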

XAT Cleanup
Why:
  The SQL engine cannot reduce redundancy in the XQuery.
How:
  Data redundancy: schema cleanup
    Each operator produces, consumes and modifies some columns.
    The minimum schema is then computed.
  Tree redundancy: unused operator cutting
    Cutting matrix generation.
    Required columns analysis.
    Operator cutting.

XAT Operator Properties
Produced
  Desc: New columns generated by the operator.
  Example: (symbol lost), S, T
Consumed
  Desc: Columns required by the operator.
  Example: (symbols lost)
Modified
  Desc: Columns modified by the operator.
  Example: (symbols lost)

Schema Computation
Node  Parent  Produced   Consumed      Old Schema
1             {}         {col3}        {col3}
2     1       {col3}     {$t}          {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
3     2       {}         {}            {R1, $book, col6, $t, R3, $prices, col7, col12}
31    3       {}         {col6, col7}  {R1, $book, col6, $t, R3, $prices, col7, col12}
23    31      {$t}       {$book}       {R1, $book, col6, $t}
27    23      {col6}     {$book}       {R1, $book, col6}
14    27      {$book}    {R1}          {R1, $book}
15    14      {R1}       {}            {R1}
25    31      {col12}    {$prices}     {R3, $prices, col7, col12}
28    25      {col7}     {$prices}     {R3, $prices, col7}
20    28      {$prices}  {R3}          {R3, $prices}
21    20      {R3}       {}            {R3}
[Figure: the XAT after tagger cancel-out, operators 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21.]
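
The Old Schema column of the Schema Computation table can be reproduced with a short bottom-up pass: each operator passes along everything its children expose and adds what it produces itself. The Python sketch below is one plausible reading of the table, not the authors' implementation; the dictionary `XAT` is transcribed from the Node, Parent, Produced and Consumed columns, and the helper names are made up for this example.

```python
# node: (parent, produced columns, consumed columns), transcribed from the
# Schema Computation table.
XAT = {
    1: (None, set(), {"col3"}),
    2: (1, {"col3"}, {"$t"}),
    3: (2, set(), set()),
    31: (3, set(), {"col6", "col7"}),
    23: (31, {"$t"}, {"$book"}),
    27: (23, {"col6"}, {"$book"}),
    14: (27, {"$book"}, {"R1"}),
    15: (14, {"R1"}, set()),
    25: (31, {"col12"}, {"$prices"}),
    28: (25, {"col7"}, {"$prices"}),
    20: (28, {"$prices"}, {"R3"}),
    21: (20, {"R3"}, set()),
}


def children(node):
    """Operators whose parent is the given node."""
    return [n for n, (parent, _, _) in XAT.items() if parent == node]


def old_schema(node):
    """Columns available at a node: its own output plus all columns below it."""
    schema = set(XAT[node][1])
    for child in children(node):
        schema |= old_schema(child)
    return schema
```

For operator 1 this pass yields the full nine-column set that the Schema Cleanup Result slide lists as its original schema, whereas the table on this slide shows only {col3} for that row.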

Schema Computation
P = column produced by the operator, C = column consumed by the operator, . = not used.
#    Parent  col3  $t  col6  col7  $book  R1  col12  $prices  R3  New Schema
1    .       C     .   .     .     .      .   .      .        .   {col3}
2    1       P     C   .     .     .      .   .      .        .   {col3}
3    2       .     .   .     .     .      .   .      .        .   {$t}
31*  3       .     .   C     C     .      .   .      .        .   {$t}
23   31      .     P   .     .     C      .   .      .        .   {col6, $t}
27   23      .     .   P     .     C      .   .      .        .   {$book, col6}
14   27      .     .   .     .     P      C   .      .        .   {$book}
15   14      .     .   .     .     .      P   .      .        .   {R1}
25   31      .     .   .     .     .      .   P      C        .   {col7, col12}
28   25      .     .   .     P     .      .   .      C        .   {$prices, col7}
20   28      .     .   .     .     .      .   .      P        C   {$prices}
21   20      .     .   .     .     .      .   .      .        P   {R3}
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.
Intuition: Don't keep anything that's not used later.

Schema Cleanup Result
Node  Original Schema                                        Minimum Schema
1     {col3, R1, $book, col6, $t, R3, $prices, col7, col12}  {col3}
2     {col3, R1, $book, col6, $t, R3, $prices, col7, col12}  {col3}
3     {R1, $book, col6, $t, R3, $prices, col7, col12}        {$t}
31    {R1, $book, col6, $t, R3, $prices, col7, col12}        {$t}
23    {R1, $book, col6, $t}                                  {col6, $t}
27    {R1, $book, col6}                                      {$book, col6}
14    {R1, $book}                                            {$book}
15    {R1}                                                   {R1}
25    {R3, $prices, col7, col12}                             {col7, col12}
28    {R3, $prices, col7}                                    {$prices, col7}
20    {R3, $prices}                                          {$prices}
21    {R3}                                                   {R3}
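
One rule that reproduces every Minimum Schema entry in the table above is: an operator keeps the columns it produces itself, plus any column from below that some ancestor consumes or that the query finally outputs. The Python sketch below applies that rule; it is an assumption that happens to match the slide, not a statement of the authors' algorithm, and `XAT`, `old_schema`, `ancestors` and `minimum_schema` are names chosen for this example.

```python
# node: (parent, produced columns, consumed columns), transcribed from the
# Schema Computation slides.
XAT = {
    1: (None, set(), {"col3"}),
    2: (1, {"col3"}, {"$t"}),
    3: (2, set(), set()),
    31: (3, set(), {"col6", "col7"}),
    23: (31, {"$t"}, {"$book"}),
    27: (23, {"col6"}, {"$book"}),
    14: (27, {"$book"}, {"R1"}),
    15: (14, {"R1"}, set()),
    25: (31, {"col12"}, {"$prices"}),
    28: (25, {"col7"}, {"$prices"}),
    20: (28, {"$prices"}, {"R3"}),
    21: (20, {"R3"}, set()),
}


def old_schema(node):
    """Original schema: the node's own columns plus everything below it."""
    schema = set(XAT[node][1])
    for child, (parent, _, _) in XAT.items():
        if parent == node:
            schema |= old_schema(child)
    return schema


def ancestors(node):
    """Yield the parent, grandparent, ... of a node up to the root."""
    parent = XAT[node][0]
    while parent is not None:
        yield parent
        parent = XAT[parent][0]


def minimum_schema(node, output_columns=frozenset({"col3"})):
    """Produced columns plus inherited columns that are still used above."""
    needed_above = set(output_columns)
    for ancestor in ancestors(node):
        needed_above |= XAT[ancestor][2]
    return XAT[node][1] | (old_schema(node) & needed_above)


minimum = {node: minimum_schema(node) for node in XAT}
```

Note that operator 25 still carries col12 under this rule even though nothing above it uses that column; removing such operators altogether is the job of the operator cutting step.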

XAT Cleanup
Schema cleanup
  Each operator produces, consumes and modifies some columns.
  The minimum schema is then computed.
Unused operator cutting
  Cutting matrix generation.
  Required columns analysis.
  Operator cutting.

Cutting Matrix
Purpose:
  Get rid of the unused operators.
Equations:
  Propagation of modified.
  Propagation of required.
  Identify cuttable nodes.

Matrix Computation
P = produced, C = consumed, . = not used. The dashes in row 3 are as shown on the slide.
#    Parent  col3  $t  col6  col7  $book  R1  col12  $prices  R3  Cut?
1    .       C     .   .     .     .      .   .      .        .
2    1       P     C   .     .     .      .   .      .        .
3    2       -     -   -     -     -      -   -      -        -
31*  3       .     .   C     C     .      .   .      .        .
23   31      .     P   .     .     C      .   .      .        .
27   23      .     .   P     .     C      .   .      .        .
14   27      .     .   .     .     P      C   .      .        .
15   14      .     .   .     .     .      P   .      .        .
25   31      .     .   .     .     .      .   P      C        .
28   25      .     .   .     P     .      .   .      C        .
20   28      .     .   .     .     .      .   .      P        C
21   20      .     .   .     .     .      .   .      .        P
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.

Matrix Computation (Cont. 1)
P = produced, C = consumed, M = modified, R = required, . = not used. The slide shows four R marks in row 1; their column positions here are inferred from the operators that survive cutting.
#    Parent  col3  $t  col6  col7  $book  R1  col12  $prices  R3  Cut?
1    .       R     R   .     .     R      R   .      .        .
2    1       P     C   .     .     .      .   .      .        .
3    2       -     M   -     -     -      -   -      -        -
31*  3       .     .   C     C     .      .   .      .        .
23   31      .     P   .     .     C      .   .      .        .
27   23      .     .   P     .     C      .   .      .        .
14   27      .     .   .     .     P      C   .      .        .
15   14      .     .   .     .     .      P   .      .        .
25   31      .     .   .     .     .      .   P      C        .
28   25      .     .   .     P     .      .   .      C        .
20   28      .     .   .     .     .      .   .      P        C
21   20      .     .   .     .     .      .   .      .        P
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.
Intuition: Give me only the required columns in order to get the final result.

Matrix Computation (Cont. 2)
P = produced, C = consumed, M = modified, R = required, X = operator is cut, . = not used. The slide shows four R marks in row 1; their column positions here are inferred from the operators that survive cutting.
#    Parent  col3  $t  col6  col7  $book  R1  col12  $prices  R3  Cut?
1    .       R     R   .     .     R      R   .      .        .
2    1       P     C   .     .     .      .   .      .        .
3    2       -     M   -     -     -      -   -      -        -
31*  3       .     .   C     C     .      .   .      .        .   X
23   31      .     P   .     .     C      .   .      .        .
27   23      .     .   P     .     C      .   .      .        .   X
14   27      .     .   .     .     P      C   .      .        .
15   14      .     .   .     .     .      P   .      .        .
25   31      .     .   .     .     .      .   P      C        .   X
28   25      .     .   .     P     .      .   .      C        .   X
20   28      .     .   .     .     .      .   .      P        C   X
21   20      .     .   .     .     .      .   .      .        P   X
*We assume Join didn't modify $t. Otherwise, only node 25 will be deleted.
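
The required-columns analysis behind the Cut? column can be sketched as a single top-down pass: start from the output column, keep an operator only if it produces or modifies a column that is required so far, and add the consumed columns of every kept operator to the required set. The Python code below is a sketch under that assumption, not the authors' equations (which did not survive extraction); `XAT` and `cut_unused` are illustrative names, and the modified column of Agg is taken from the M mark in row 3 of the matrix.

```python
# node: (parent, produced, consumed, modified), listed parents first.
# Transcribed from the cutting matrix; Join (31) is assumed not to modify $t.
XAT = {
    1: (None, set(), {"col3"}, set()),
    2: (1, {"col3"}, {"$t"}, set()),
    3: (2, set(), set(), {"$t"}),
    31: (3, set(), {"col6", "col7"}, set()),
    23: (31, {"$t"}, {"$book"}, set()),
    27: (23, {"col6"}, {"$book"}, set()),
    14: (27, {"$book"}, {"R1"}, set()),
    15: (14, {"R1"}, set(), set()),
    25: (31, {"col12"}, {"$prices"}, set()),
    28: (25, {"col7"}, {"$prices"}, set()),
    20: (28, {"$prices"}, {"R3"}, set()),
    21: (20, {"R3"}, set(), set()),
}


def cut_unused(xat, output_columns):
    """Return (required columns, operators to cut) for a parents-first XAT."""
    required = set(output_columns)
    cut = []
    for node, (parent, produced, consumed, modified) in xat.items():
        is_root = parent is None
        if is_root or (produced | modified) & required:
            # The operator contributes to the result, so its inputs are needed.
            required |= consumed
        else:
            cut.append(node)
    return required, cut


required, cut = cut_unused(XAT, {"col3"})

# Variant for the footnote: the Join is assumed to modify $t.
join_modifies_t = dict(XAT)
join_modifies_t[31] = (3, set(), {"col6", "col7"}, {"$t"})
_, cut_with_join = cut_unused(join_modifies_t, {"col3"})
```

The pass relies on the dictionary listing every operator after its parent, which holds for the order used here. A complete implementation would also reattach each surviving operator to its nearest surviving ancestor, for example operator 14 under operator 23 once 27 is cut; that step is omitted.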

XAT after Cutting
[Figure: the XAT after tagger cancel-out (operators 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21) is reduced to the tree below. Indentation shows the parent/child structure; "->" marks the output column.]
1: col3
  2: T <results>$t</result> -> col3
    3: Agg
      23: $book, title -> $t
        14: R1, /book/row -> $book
          15: S "dxv.xml" -> R1

SQL Generated
For the XAT before cutting:
SELECT "$book".title AS "$t", "$book".bid AS "col6", "$prices".price AS "col12", "$prices".bid AS "col7"
FROM book "$book", prices "$prices"
WHERE "col6" = "col7"

For the XAT after cutting:
SELECT "$book".title AS "$t"
FROM book "$book"
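
As a quick sanity check on the two generated statements, the Python sketch below loads the running example into an in-memory SQLite database and compares the titles returned by the join query with those returned by the reduced query. It is only an illustration: plain aliases (`b`, `p`, `t`) replace the quoted `"$book"` style names, the join condition is written on the base columns, and an `ORDER BY` is added so the comparison is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book (bid TEXT, title TEXT);
    CREATE TABLE prices (bid TEXT, price REAL);
    INSERT INTO book VALUES ('001', 'TCP/IP Illustrated'), ('002', 'Data on the Web');
    INSERT INTO prices VALUES ('001', 65.95), ('002', 34.95);
    """
)

# Shape of the SQL generated before operator cutting: a join of both tables.
FULL_SQL = """
    SELECT b.title AS t, b.bid AS col6, p.price AS col12, p.bid AS col7
    FROM book b, prices p
    WHERE b.bid = p.bid
    ORDER BY b.bid
"""

# Shape of the SQL generated after operator cutting: a single-table scan.
REDUCED_SQL = "SELECT b.title AS t FROM book b ORDER BY b.bid"

full_titles = [row[0] for row in conn.execute(FULL_SQL)]
reduced_titles = [row[0] for row in conn.execute(REDUCED_SQL)]
```

The two title lists agree here because every book has exactly one matching price row, which corresponds to the assumption on the slides that the Join does not modify `$t`. With missing or duplicate price rows the join would change the result, and the Join could not be cut.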

Preliminary Evaluation
Experiment setup
  XQuery over Kweelt parser.
  PIII 800, 256 MB, Win 2k Pro.
Data setup
  Synthetic data.
  Synthetic queries.
Query execution
  Native XML engine.

Performance Gain in Execution
[Figure: execution time in ms (axis from 100 to 100,000,000) against the number of elements in the XML dataset (10 to 10,000), for four configurations: None, Rewrite, Cleanup, Rewrite+Cleanup.]

Query Engine Overhead
[Figure: pie chart of the time spent in Generation, Rewrite, Decorrelation and Cleanup (ms). The slices shown are 1%, 42%, 2% and 55%; which slice belongs to which phase is not recoverable from the extracted text. Total: 32,522 ms.]

Related Work
Rainbow:
  Optimizes on the XAT (static analysis).
  Algebra-level rewriting.
SQL optimization:
  Algebra-based optimization.
  Static analysis.
XQuery by views:
  Optimize in SQL.
XPERANTO [VLDBJ2000]:
  XQGM vs. XAT.
  Extension by UDFs for XML features.
SilkRoute [IEEE2001(24:2)]:
  Generates SQL efficiently.
AGORA [VLDB2000]:
  Syntax-level rewriting.

Summary
Efficient XQuery processing
XML Algebra Tree (XAT)
XAT optimization:
  Rewrite using equivalence rules
  Cleanup
    Schema cleanup
    Operator cutting
Prototype system implementation.

Questions? (Futures!)
http://davis.wpi.edu/dsrg/rainbow
https://sourceforge.net/projects/rainbow-engine/
Special thanks: Brian Murphy, Luping Ding, DSRG group.

Schema Computation
Node  Parent  Produced   Consumed      Minimum Schema
1             {}         {col3}        {col3}
2     1       {col3}     {$t}          {col3}
3     2       {}         {}            {$t}
31    3       {}         {col6, col7}  {$t}
23    31      {$t}       {$book}       {col6, $t}
27    23      {col6}     {$book}       {$book, col6}
14    27      {$book}    {R1}          {$book}
15    14      {R1}       {}            {R1}
25    31      {col12}    {$prices}     {col7, col12}
28    25      {col7}     {$prices}     {$prices, col7}
20    28      {$prices}  {R3}          {$prices}
21    20      {R3}       {}            {R3}
[Figure: the XAT after tagger cancel-out, operators 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21.]
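
After Tagger Cancel Out
[Figure: variant of the XAT after tagger cancel-out in which the condition col6=col7 is still a separate operator 26 above operator 31. Operator symbols were lost in extraction; "->" marks the output column.]
User query:
1: col3
2: T <results>$t</result> -> col3
3: Agg
View query:
26: col6=col7
31: (label lost)
23: $book, title -> $t
27: $book, bid -> col6
14: R1, /book/row -> $book
15: S "dxv.xml" -> R1
25: $prices, price -> col12
28: $prices, bid -> col7
20: R3, /prices/row -> $prices
21: S "dxv.xml" -> R3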