22
Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA The 15 th International World Wide Web Conference May 2006 Edinburgh, Scotland

Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Embed Size (px)

Citation preview

Page 1: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML

Shuohao Zhang and Curtis DyresonSchool of E.E. and Computer Science

Washington State UniversityPullman, Washington, USA

The 15th International World Wide Web ConferenceMay 2006

Edinburgh, Scotland

Page 2: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

• Hierarchical model vs. relational model• Codd: symmetric exploitation of data

part/project works on some, but not all

• Path expressions are asymmetric• Currently, all XML query languages use path expressions

1970’s Database Controversy

Part

Project

Project

Part

Commit

Project Part

Page 3: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Querying Data with Path Expressions

• Task Find books by E. F. Codd

• XQuery return doc("author.xml")//author[name= 'E. F. Codd']/book

name

author

book book

titletitle publisherpublisher price price

Addison Wesley Academic PressDB 46.95 Automata 9.99

E. F. Codd

Page 4: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Same Data, Different Structure

• Same task Find books by E. F. Codd

• Need different XQuery return doc("book.xml")//book[author/name='E. F. Codd']

publisher

book book

titletitle authorauthor price price

Addison Wesley

DB 46.95 Automata 9.99name

E. F. Codd

publisher

Academic Pressname

Codd

name

author

book book

titletitle publisherpublisher price price

Addison Wesley Academic PressDB 46.95 Automata 9.99

E. F. Codd

Page 5: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Goal

• Make same query work on different structures

• Useful when there is lack of schema knowledge heterogeneous data irregular data schema evolution

• Factor off problem of different label sets, others are working on it

Page 6: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Existing Axes are Directional

preceding followingdescendent

ancestor

self

Page 7: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Proposal: A Non-directional Axis

preceding followingdescendent

ancestor

self

Page 8: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Proposal: A Non-directional Axis

preceding followingdescendent

ancestor

self

Page 9: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Proposal: A Non-directional Axis

preceding followingdescendent

ancestor

self

Page 10: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

The Closest Axis• Syntax

closest:: ->name is abbreviation for closest::name

• Semantics a function that takes a context node and returns a sequence of

closest nodes

Page 11: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Closest Axis of the First Title

• closest::* Returns a list of five nodes

• closest::price Returns the first price node

name

author

book book

titletitle publisherpublisher price price

Page 12: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

• Node selection restricted by minimal type distance The minimal distance between a title and a price is 2

• closest::price Returns an empty list

When the First Book Lacks a Price

name

author

book book

titletitle publisherpublisher price

Page 13: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

• closest::name for each book?

• Root-to-node path type author/name author/book/publisher/name

Type Distance is Crucial

name

author

book book

titletitle publisherpublisher price

name

Page 14: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Querying with the Closest Axes

Closest axis-enabled

XQuery evaluation

engine

Query

Same query --return doc("any.xml")->author[->name='E. F. Codd']->book

Query Result#2

Result#3

Query

Result#1

Page 15: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Querying with Directional Axes

XQuery

evaluation

engine

Query#1 -- return doc("author.xml")//author[name= 'E. F. Codd']/book

Query#2 -- ……

Query#3 -- return doc("book.xml")//book[author/name='E. F. Codd']

Result#2

Result#3

Result#1

Page 16: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Find the closest price for title Non-directional expression closest::price

Directional (path) expression parent::*/child::price

In-memory Implementation

• Naïve approach Compute Closest for every node Time complexity is O(sn2)

s: number of labels in the signature n: number of nodes

• Converting to a path expression

name

author

book

title publisher price

Page 17: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Experiment

• Compare directional vs. nondirectional for $b in doc("bib.xml")//title/closest::publisher

return $b

for $b in doc("bib.xml")//title/..//publisher

return $b

• Implemented closest in

eXist (an XML DBMS)

0

200

400

600

800

1000

1200

1400

1600

2500

0

5000

0

7500

0

1000

00

1250

00

1500

00

Number of Nodes

Tim

e (m

illi

seco

nd

s)

descendant

closest

Page 18: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Persistent Implementation

• Take advantage of type indexes• LCA-join

Every Closest pair related via an LCA Idea is to merge lists of types

O(sn)

Page 19: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Related Work

• Data integration TSIMMIS

Garcia-Molina et al. (Journal of Intelligent Information Systems 1997) YAT

Christophides, Cluet, Simèon (SIGMOD Record June 2000) Silkroute

Fernandez, Tan, Suciu (WWW 2000)

• LCA-related techniques Schmidt, Kersten, Windhouwer (ICDE 2001) Cohen, Mamou, Kanza, Sagiv (VLDB 2003) Li, Yu, Jagadish (VLDB 2004)

Page 20: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Related Research Projects

• XML Restructuring Zhang, Dyreson (IIWeb 2006)

• XML Compaction Zhang, Dyreson, Dang (DASFAA 2006)

• Common theme – symmetric exploitation!

Page 21: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Symmetrically Exploiting XML: Zhang, Dyreson

Conclusion

• Current XQuery depends on path expressions

• A path expression is directional (asymmetric) May break down if structure changes

• The closest axis is non-directional (symmetric) Simple in syntax

Can be easily integrated in XQuery Can be implemented efficiently

In-memory Persistent

Page 22: Symmetrically Exploiting XML Shuohao Zhang and Curtis Dyreson School of E.E. and Computer Science Washington State University Pullman, Washington, USA

Thank You!