On Efficient Matching of Streaming XML Documents and Queries

Laks V.S. Lakshmanan (1)

P. Sailaja (2)

(1) University of British Columbia, Canada

(2) Indian Inst. of Tech., Bombay, India (work performed while visiting IIT-Bombay).

UBC, Canada. EDBT 2002, Prague.

Outline

I. Motivating Applications

II. Problem

III. Dual Index

IV. Algorithms

V. Experiments

VI. Summary & Future Work

Motivating Application 1

Information dissemination in the large

• Numerous data sources on the web
• Traditional means: search and browse
• Alternative: publish and subscribe
  • System matches (new) data to subscribers' interests
  • Periodic notification

Motivating Application 2

Supply chain automation

• Catalog of products and services from suppliers (data)
• Registered sets of requirements (subscriptions) from manufacturing units
• Notify relevant consumers upon arrival of new data

Other applications include electronic auctioning, online shopping, etc.

Problem

Matching specifications (of products, services, etc.) to requirements (subscriptions) efficiently.

• Specs are akin to data.
• Requirements are akin to queries.
• Data may stream through.
• Quickly determine which subscribers/users a piece of data is relevant to.

Problem

Traditional setting:

• Large DB
• One (at most a few) query at a time

Our problem:

• A small DB (a tuple, XML doc, etc.)
• Large no. of queries
• Dual to traditional problem

Focus of this paper:

• Data = XML docs
• Queries = a fragment of XPath

Problem (Formalized)

Given

• an XML document
• a large number of XPath queries

Determine which queries are answered by each element (formalized using matching).

Query labeling: label each node with the sets of queries answered by the subtree rooted there.

The naïve approach doesn't scale with the no. of queries.

Main challenge: a small number (1 or 2) of passes over the data tree.
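To make the scaling concern concrete, here is a minimal Python sketch of a query-at-a-time baseline. It is not the paper's method. The catalog document, the query ids, and the path expressions are all hypothetical, and the queries are limited to the small XPath subset that Python's `xml.etree.ElementTree` supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document (not taken from the slides).
DOC = """
<catalog>
  <part><name>A2D</name><brand>AMD</brand></part>
  <part><name>X100</name><brand>Other</brand></part>
</catalog>
"""

# Hypothetical subscriptions, written in ElementTree's limited XPath subset.
QUERIES = {
    "Q1": ".//part[brand='AMD']",
    "Q2": ".//part[name='X100']",
    "Q3": ".//part[brand='Intel']",
}


def naive_match(root, queries):
    """Evaluate every query separately: one search of the document per query."""
    matched = {}
    for qid, path in queries.items():
        hits = root.findall(path)
        if hits:
            matched[qid] = hits
    return matched


root = ET.fromstring(DOC)
matched = naive_match(root, QUERIES)
print(sorted(matched))  # ['Q1', 'Q2']
```

The work grows with the number of registered queries because each one triggers its own search of the document, which is the cost the dual index is meant to avoid.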

An Example Query

<Result>
  FOR $p IN document("catalog.xml")//part,
      $b IN $p/brand,
      $q IN $p//part
  WHERE A2D IN $q/name AND AMD IN $q/brand
  RETURN $p
</Result>

Problem (An Example)

Dual Index

• Traditional index: quickly localize the search for data matching a query pattern
• Dual index: for each primitive pattern, determine the (sub)queries to which it is relevant
• Choice of primitive patterns depends on the type of data (e.g., XML vs. relational)
• And on the classes of queries considered (e.g., chains vs. trees)

Tree Dual Index

Primitive "access path" questions to be answered:

• For a constant c, what are its leaf appearances?
• For a tag t, what are its non-leaf appearances? What query nodes are its pc- and ad-children?

Example:

[Figure: example query trees P and Q, with nodes labeled by tag (a, b, c) and node number]

Index entry for a:

DI(a)[L]: (P, 3, {}), (Q, 6, {4,6})

DI(a)[N]: (P, 1, F, {2,3}, {}), (P, 4, T, {6}, {5})
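One way to picture the index entry for tag a is as a small in-memory map. The following Python sketch is an assumption for illustration only. The field names (`m`, `nodes`, `flag`, `pc`, `ad`) and the `lookup` helper are hypothetical, since the slide gives just the tuple shapes, and the meaning of the T/F flag is not stated on the slide.

```python
from collections import namedtuple

# Field names are illustrative; the slide only shows the tuple shapes.
LeafEntry = namedtuple("LeafEntry", "query m nodes")                # (P, m, {v1..vk})
NonLeafEntry = namedtuple("NonLeafEntry", "query node flag pc ad")  # (P, l, B, C, D)

# The index entry for tag "a", copied from the slide.
DI = {
    "a": {
        "L": [
            LeafEntry("P", 3, frozenset()),
            LeafEntry("Q", 6, frozenset({4, 6})),
        ],
        "N": [
            NonLeafEntry("P", 1, False, frozenset({2, 3}), frozenset()),
            NonLeafEntry("P", 4, True, frozenset({6}), frozenset({5})),
        ],
    }
}


def lookup(di, tag, kind):
    """Return the entries for a tag; kind is "L" (leaf) or "N" (non-leaf)."""
    return di.get(tag, {}).get(kind, [])


print(lookup(DI, "a", "N")[1].pc)  # frozenset({6})
```

A lookup by tag answers the "access path" questions in one step: the `pc` and `ad` fields hold the query nodes that are pc- and ad-children of the given query node.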

Tree Labeling Algorithm – 3 Lists

[Figure: data tree, nodes numbered 1-15 and tagged a, b, c, d]

3 lists (conceptually):

• TML(u): tuples (Query, query node, DN, ans-node)
• PL(u): entries (P,l,m,x):rel
• QL(u): query ids

Tree Labeling Algo. – TML base case

[Figure: data tree, nodes numbered 1-15 and tagged a, b, c, d]

(P, m, {v1, …, vk}) ∈ DI(t)[L]
  ⇒ (P, v1, m, ?), …, (P, vk, m, ?) ∈ TML(u), whenever u.tag = t.

If vi = m, replace ? by u.

E.g.: DI(a)[L] has (Q, 6, {4,6}). So add (Q,4,6,?) and (Q,6,6,?) to TML(). Here (Q,6,6,?) becomes (Q,6,6,i), i = 1, 5, 8, 11.
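The base-case rule can be sketched in a few lines of Python. This is an illustrative reading of the slide rather than the authors' code. The function name `tml_base` is hypothetical, tuples are plain Python tuples, and `None` stands in for the unknown answer node written as "?" on the slide.

```python
def tml_base(leaf_entries, u):
    """Base case: build TML(u) from DI(t)[L], where t is the tag of data node u.

    leaf_entries holds tuples (query, m, nodes) as in DI(t)[L].
    None plays the role of "?" (answer node not yet known).
    """
    tml = set()
    for query, m, nodes in leaf_entries:
        for v in nodes:
            ans = u if v == m else None  # if vi = m, the answer node is u itself
            tml.add((query, v, m, ans))
    return tml


# DI(a)[L] from the slide, applied to data node 5 (which is tagged a).
DI_A_LEAF = [("P", 3, frozenset()), ("Q", 6, frozenset({4, 6}))]
result = tml_base(DI_A_LEAF, 5)
```

For node 5 this yields (Q,4,6,?) and (Q,6,6,5), matching the slide's example with i = 5.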

Tree Labeling Algo. – TML → PL

[Figure: data tree, nodes numbered 1-15 and tagged a, b, c, d]

(P,l,m,x) ∈ TML(u)
  ⇒ (P,l,m,x):child ∈ PL(parent(u)),
    (P,l,m,x):desc ∈ PL(anc(u)).

E.g.: (Q,4,6,?) ∈ PL(5). So (Q,4,6,?):child ∈ PL(4), and (Q,4,6,?):desc ∈ PL(i), i = 3, 2, 1.

Optimizations possible, but suppressed.
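A minimal Python sketch of this propagation step follows, without any of the suppressed optimizations. The function name `propagate` is hypothetical. The parent map covers only the chain 5 → 4 → 3 → 2 → 1 that the example implies, and the sketch assumes the tuple (Q,4,6,?) starts out in TML(5).

```python
from collections import defaultdict


def propagate(tml, parent):
    """Push every TML(u) tuple to PL of u's parent (:child) and higher ancestors (:desc)."""
    pl = defaultdict(set)
    for u, tuples in tml.items():
        anc, rel = parent.get(u), "child"
        while anc is not None:
            for t in tuples:
                pl[anc].add((t, rel))
            # Everything above the parent sees the tuple as a descendant.
            anc, rel = parent.get(anc), "desc"
    return pl


# Only the ancestor chain of node 5 is modeled here.
PARENT = {5: 4, 4: 3, 3: 2, 2: 1}
pl = propagate({5: {("Q", 4, 6, None)}}, PARENT)
```

Walking the whole ancestor chain for every tuple is the unoptimized form; it is shown only to mirror the rule as stated.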

Tree Labeling Algo. – TML inductive case

[Figure: data tree, nodes numbered 1-15 and tagged a, b, c, d]

(P,l,B,C,D) ∈ DI(t)[N], and
  ∀c ∈ C: (P,c,m,y):child ∈ PL(u), and
  ∀d ∈ D: (P,d,m,y):rel ∈ PL(u)
  ⇒ (P,l,m,x) ∈ TML(u).

If l = m, set x to u.

E.g.: (P,4,T,{6},{5}) ∈ DI(a)[N]. (P,6,3,?) ∈ TML(12), so (P,6,3,?):child ∈ PL(11). Similarly, (P,5,3,?):desc ∈ PL(11). So (P,4,3,?) ∈ TML(11).
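The inductive rule can be sketched as follows. Several points are assumptions on my part rather than statements from the slide: the function name `tml_inductive` is hypothetical, ":rel" is read as "either child or desc", the B flag is ignored because its meaning is not given, and when l differs from m the answer node is simply carried over from the matching PL entries.

```python
def tml_inductive(nonleaf_entries, pl_u, u):
    """Inductive case: derive TML(u) tuples from DI(t)[N] and PL(u).

    nonleaf_entries: tuples (query, l, flag, pc, ad) as in DI(t)[N].
    pl_u: set of ((query, node, m, ans), rel) with rel in {"child", "desc"}.
    """
    tml = set()
    for query, l, _flag, pc, ad in nonleaf_entries:
        mine = [(t, rel) for (t, rel) in pl_u if t[0] == query]
        for m in {t[2] for (t, _rel) in mine}:
            as_child = {t[1] for (t, rel) in mine if t[2] == m and rel == "child"}
            as_any = {t[1] for (t, _rel) in mine if t[2] == m}
            # All pc-children must appear as children, all ad-children as child or desc.
            if pc <= as_child and ad <= as_any:
                if l == m:
                    answers = {u}  # if l = m, the answer node is u
                else:
                    answers = {t[3] for (t, _rel) in mine
                               if t[2] == m and t[1] in (pc | ad)}
                tml.update((query, l, m, x) for x in answers)
    return tml


# The slide's example: DI(a)[N] and PL(11).
DI_A_NONLEAF = [
    ("P", 1, False, frozenset({2, 3}), frozenset()),
    ("P", 4, True, frozenset({6}), frozenset({5})),
]
PL_11 = {(("P", 6, 3, None), "child"), (("P", 5, 3, None), "desc")}
result = tml_inductive(DI_A_NONLEAF, PL_11, 11)
```

On the slide's data this derives (P,4,3,?) for node 11, while the entry for query node 1 does not fire because its pc-children 2 and 3 are absent from PL(11).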

Tree Labeling Algo. – QL

[Figure: data tree, nodes numbered 1-15 and tagged a, b, c, d]

• TML and PL feed each other.
• QL is a special case of TML.
• P ∈ QL(u) iff (P,1,m,u) ∈ TML(x) for some node x.
• E.g.: (P,1,3,9) ∈ TML(1), so P ∈ QL(9); and (Q,1,6,5) ∈ TML(2), so Q ∈ QL(5).
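The two examples suggest a simple way to read QL off the TML lists, sketched below in Python. The function name `query_labels` is hypothetical. The sketch assumes the query root is always numbered 1, as in both examples, and it scans the lists directly instead of doing the top-down pass the paper uses.

```python
from collections import defaultdict


def query_labels(tml):
    """Collect QL: a query P labels node x when some TML list holds (P, 1, m, x)."""
    ql = defaultdict(set)
    for _u, tuples in tml.items():
        for query, node, _m, ans in tuples:
            # Query node 1 is assumed to be the query root; ans must be resolved.
            if node == 1 and ans is not None:
                ql[ans].add(query)
    return ql


# The two tuples from the slide's example.
TML = {1: {("P", 1, 3, 9)}, 2: {("Q", 1, 6, 5)}}
ql = query_labels(TML)
```

Tuples for non-root query nodes, or with an unresolved answer node, contribute nothing to QL in this reading.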

Tree Labeling – Summary

• Labeling completed in two passes:
  • pass 1: compute TML/PL (bottom-up)
  • pass 2: compute QL (top-down)
• No. of I/O invocations is 2 * # data tree nodes.
• Other algorithms in paper: chain labeling; chain split labeling of trees
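The two-pass bound in the summary above can be illustrated with a traversal skeleton in Python. This only counts node visits and does no labeling. The 5-node tree is hypothetical, and equating one node visit with one I/O invocation is an assumption made for the illustration.

```python
def two_pass(children, root):
    """Visit every node once bottom-up (pass 1) and once top-down (pass 2)."""
    visits = []

    def bottom_up(u):
        for c in children.get(u, []):
            bottom_up(c)
        visits.append(("pass1", u))  # where TML/PL would be computed

    def top_down(u):
        visits.append(("pass2", u))  # where QL would be computed
        for c in children.get(u, []):
            top_down(c)

    bottom_up(root)
    top_down(root)
    return visits


# Hypothetical 5-node tree: 1 has children 2 and 5; 2 has children 3 and 4.
CHILDREN = {1: [2, 5], 2: [3, 4]}
visits = two_pass(CHILDREN, 1)
print(len(visits))  # 10
```

With 5 nodes the skeleton records 10 visits, that is, 2 * # data tree nodes.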

Experiments

matchMaker implementation:

• JDK1.3 and C++
• Storage: BerkeleyDB 3.17
• Dual index stored on disk
• Lists manipulated in memory
• Intel PIII, 1GB RAM, 512K cache, Linux 7.0

Data sets:

• Generated using IBM's XML Gen tool
• Conforming to GEDCOM DTD (genealogical data) (about 120 elements)

Experiments

• Document depth 10; avg fanout: [2, 5]
• Chain labeling algorithm is at least 5 times faster than the query-at-a-time approach
• For tree labeling, query-at-a-time doesn't produce results in reasonable time!
• Focus of experiments (for trees): direct tree labeling algorithm vs. chain split algorithm (not discussed)

Experiments

Related Work

• Documents: user profile match (IR)
• Notion of standing queries has a long history: e.g., Tapestry, TriggerMan, NiagaraCQ, etc.
• Publish-and-subscribe: Fabret et al. 00, 01. Patterns: boolean combo of relOp comp value
• XFilter 00, 01.
  • Only determine if doc contains an answer
  • Multiple answers in one doc not considered

Related Work

XTrie approach:

• Decompose query tree into ad-free chains
• Index using trie
• Determine only if a doc contains an answer

Main distinguishing features of matchMaker:

• Answers located
• Multiple answers per doc
• All proposed algorithms: guaranteed resource bounds (e.g., #passes, I/O)

Summary & Future Work

• Matching large no. of queries to XML data trees (as they stream through)
• Dual to usual query processing
• Dual index (chains vs. trees)
• Algorithms for query labeling of data trees
• Making algorithms more efficient (single pass algorithm for chains: done)
• Expanding classes of queries handled
• Algebra for this dual query processing problem?
