Querying Infinite Databases
Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)
Itay Maman
049011 Student Symposium, 5 July 2006

Page 1: Querying Infinite Databases

1

Querying Infinite Databases

Safety of Datalog Queries over Infinite Databases (Sagiv and Vardi ’90)
Queries and Computation on the Web (Abiteboul and Vianu ’97)

Itay Maman
049011 Student Symposium, 5 July 2006

Page 2: Querying Infinite Databases

2/19

Simple Technion Queries…

(Domain: The Technion’s students database)

• Q1: Which courses did Gidi attend?
    SELECT course FROM students WHERE name='Gidi'

• Q2: Which students took 234218?
    SELECT name FROM students WHERE course='234218'

courses
course   name
234218   Gidi
236703   Gidi
234218   Dina
…        …

Page 3: Querying Infinite Databases

3/19

Simple Web Queries…

• Q3: Which pages does my home page link to?
    SELECT target FROM links WHERE source='www.geocities.com/mysite'

• Q4: Which pages link to my home page?
    SELECT source FROM links WHERE target='www.geocities.com/mysite'

• Q4 is challenging:
    No matter how long my web-crawler works…
    … I can never find all incoming links of a page!
    This is an infinite query
      • The more you crawl the more answers you get
        (In Q3 the size of the result set is bounded)

links
source                     target
www.google.com             www.google.co.il
www.geocities.com/mysite   www.ynet.co.il
www.cnn.com                www.geocities.com/mysite
…                          …
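The contrast between Q3 and Q4 can be sketched in Python using the three rows of the links table. This is only an illustration: the dictionary `LINKS_BY_SOURCE` stands in for fetching a page and reading its links, and the function names `outgoing` and `incoming_so_far` are hypothetical.

```python
# Toy stand-in for the web: the only access path is "fetch a page, read its links".
LINKS_BY_SOURCE = {
    "www.google.com": ["www.google.co.il"],
    "www.geocities.com/mysite": ["www.ynet.co.il"],
    "www.cnn.com": ["www.geocities.com/mysite"],
}

def outgoing(source):
    """Q3: a single lookup, so the result is complete and bounded."""
    return set(LINKS_BY_SOURCE.get(source, []))

def incoming_so_far(target, crawled_pages):
    """Q4: the answer is only as complete as the set of pages crawled so far."""
    return {page for page in crawled_pages if target in outgoing(page)}

MY_SITE = "www.geocities.com/mysite"
q3 = outgoing(MY_SITE)
q4_short_crawl = incoming_so_far(MY_SITE, ["www.google.com"])
q4_longer_crawl = incoming_so_far(MY_SITE, ["www.google.com", "www.cnn.com"])
```

Crawling more pages can only add answers to Q4, and no finite crawl certifies that the answer is complete.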

Page 4: Querying Infinite Databases

4/19

Leading questions

• What does an infinite DB look like?
• Can we evaluate a query over an infinite DB?
• Can we determine the finiteness of a query?

• But first, some Datalog…

Page 5: Querying Infinite Databases

5/19

Datalog

• Why Datalog?
    Supports recursion/transitive closure (which SQL lacks unless recursive queries are available)
      • Recursion is essential in large data-sets
    Terminates if DB is finite
    Very simple

• program = A collection of rules
• rule = A head term, followed by ":-" and a sequence of body terms

• In our program:
    Three rules
    Two queries (AKA: IDB): g(X), small(X,Y)
    One table (AKA: EDB): before(X,Y)
    A goal predicate from which execution starts
      • We choose g(X) as the goal

g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

Page 6: Querying Infinite Databases

6/19

Finiteness

• A DB is finite if every table is a finite set
    before(X,Y) = { (0,1), (1,2), (2,3) }

• Possible evaluation schemes:
    Brute force
    Bottom up
      • Optimizations

• The requirement: Finiteness of tables

• The guarantee: Termination of the Datalog program

Page 7: Querying Infinite Databases

7/19

Infinity

• Here is another definition for our table
    before(X,Y) = { (X,X+1) | X ≥ 0 }

• We now have an infinite DB
    The problem: we cannot iterate over the tuples in the set
    The solution: Top-down algorithm

• Such tables are quite common
    The internet links relation
      links(X,Y) = { (X,Y) | page X links to page Y }
    Java’s subclassing relation
      extends(X,Y) = { (X,Y) | class X extends Y }

Leading question: What does an infinite DB look like?
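One way to picture such a table in code is as a set of access functions rather than stored rows. The Python sketch below models before this way; the function names (`before_contains`, `before_by_x`, `before_by_y`, `before_all`) are illustrative assumptions, not part of the original slides.

```python
from itertools import count, islice

def before_contains(x, y):
    """Membership test for before = { (X, X+1) | X >= 0 }."""
    return x >= 0 and y == x + 1

def before_by_x(x):
    """Tuples whose first column is bound to x (at most one)."""
    return {(x, x + 1)} if x >= 0 else set()

def before_by_y(y):
    """Tuples whose second column is bound to y (at most one)."""
    return {(y - 1, y)} if y >= 1 else set()

def before_all():
    """Enumerate the whole table; this generator never finishes on its own."""
    for x in count(0):
        yield (x, x + 1)

# Only a finite prefix of the table can ever be materialized.
first_three = list(islice(before_all(), 3))
```

Lookups with one column bound return finite sets, while a full scan does not terminate, which is why iterating over the tuples is not an option.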

Page 8: Querying Infinite Databases

8/19

Example: Top-down evaluation

g(W) :- small(W,2).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }

• b : before
• s : small
• ⋈ : Join (binds tighter than ∪)

s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)

g(W) = s(W,2)
     = b(W,2) ∪ s(W,Z) ⋈ b(Z,2)
     = {(1,2)} ∪ s(W,1) ⋈ {(1,2)}
     = {(1,2)} ∪ [b(W,1) ∪ s(W,Z) ⋈ b(Z,1)] ⋈ {(1,2)}
     = {(1,2)} ∪ [{(0,1)} ∪ s(W,0) ⋈ {(0,1)}] ⋈ {(1,2)}
     = {(1,2)} ∪ [{(0,1)} ∪ [b(W,0) ∪ s(W,Z) ⋈ b(Z,0)] ⋈ {(0,1)}] ⋈ {(1,2)}
     = {(1,2)} ∪ [{(0,1)} ∪ [∅ ∪ s(W,Z) ⋈ ∅] ⋈ {(0,1)}] ⋈ {(1,2)}
     = {(1,2)} ∪ [{(0,1)} ∪ ∅ ⋈ {(0,1)}] ⋈ {(1,2)}
     = {(1,2)} ∪ {(0,1)} ⋈ {(1,2)}
     = {(1,2)} ∪ {(0,2)}
     = {(1,2), (0,2)}

Page 9: Querying Infinite Databases

9/19

Top-down evaluation

• The Top-down algorithm
    Init: assign r ← body of the goal
    Loop:
      • (Intelligently) Pick a term, t, from r
      • If t is a query term:
          Replace it with the union of the rules indicated by t
      • If t is a table term:
          Replace it with the set generated by the table
      • Replace s ⋈ ∅ expressions (in r) with ∅
      • Replace s ∪ ∅ expressions (in r) with s
      • Evaluate relational algebra expressions (if both sides are known)
    Stop if no further replacements can be made

Leading question: Can we evaluate a query over an infinite DB?
    Yes
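For the small(W,2) program, the top-down idea can be sketched as a recursive Python function. This is a simplified illustration rather than the general algorithm: it hard-codes the three rules, and the helper `before_by_y` (lookup of before with its second column bound) is an assumed access path to the infinite table.

```python
def before_by_y(y):
    """Tuples of before = { (X, X+1) | X >= 0 } whose second column is y."""
    return {(y - 1, y)} if y >= 1 else set()

def small_by_y(y):
    """Evaluate s(W, y) = b(W, y) ∪ (s(W, Z) ⋈ b(Z, y)) with y bound."""
    result = set(before_by_y(y))        # table term b(W, y)
    for (z, _) in before_by_y(y):       # table term b(Z, y) binds Z
        for (w, _) in small_by_y(z):    # query term s(W, Z), expanded recursively
            result.add((w, y))          # join on Z, keep the pair (W, y)
    return result

def g():
    """Goal g(W) :- small(W, 2)."""
    return {w for (w, _) in small_by_y(2)}

answer = g()
```

When b(Z, y) is empty the inner query is never expanded, which plays the role of replacing s ⋈ ∅ with ∅; that is what makes the recursion stop at 0.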

Page 10: Querying Infinite Databases

10/19

Infinite Queries

• Can the top-down algorithm run forever?
    Yes

• Case 1: A table that returns an infinite result
    evenProduct(X,Y) = { (X,Y) | X*Y mod 2 = 0 }
    divides(X,Y) = { (X,Y) | X mod Y = 0 }
    links(X,Y) = { (X,Y) | page X links to page Y }

• weak-safety: all intermediate results are finite

• Result #1 (Sagiv and Vardi ’90): Weak-safety is decidable given F/C (finiteness constraints) of tables
    F/C of evenProduct: None
    F/C of divides: X => Y
    F/C of links: X => Y
    (X => Y: each value of X appears with only finitely many values of Y)
    Algorithm: Tracking flow of values from assigned variables
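The value-flow idea can be illustrated for a single rule body with a small Python sketch. It is not the decision procedure of Sagiv and Vardi, which handles whole recursive programs; it only propagates "finitely bound" status from constants through finiteness constraints. The data layout, the uppercase-variable convention, and the names `is_var`, `finitely_bound_vars` and `body_is_finite` are all assumptions made for this example.

```python
def is_var(arg):
    """Assumed convention: variables are strings starting with an uppercase letter."""
    return isinstance(arg, str) and arg[:1].isupper()

def finitely_bound_vars(body, constraints):
    """body: list of (predicate, args). constraints: predicate -> list of
    (source_positions, target_positions), read as 'sources => targets'."""
    bound = set()
    changed = True
    while changed:
        changed = False
        for pred, args in body:
            for sources, targets in constraints.get(pred, []):
                # A source position is known if it holds a constant or a bound variable.
                if all(not is_var(args[i]) or args[i] in bound for i in sources):
                    for j in targets:
                        if is_var(args[j]) and args[j] not in bound:
                            bound.add(args[j])
                            changed = True
    return bound

def body_is_finite(body, constraints):
    """True if every variable in the body is reached by the value flow."""
    variables = {a for _, args in body for a in args if is_var(a)}
    return variables <= finitely_bound_vars(body, constraints)

# F/C from the slide: divides and links have X => Y, evenProduct has none.
FC = {"divides": [((0,), (1,))], "links": [((0,), (1,))]}
```

For instance, `body_is_finite([("divides", (12, "Y"))], FC)` holds because the constant 12 flows to Y, while `divides("X", 3)` and any use of evenProduct leave a variable unbounded.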

Page 11: Querying Infinite Databases

11/19

Infinite Queries (cont.)

• Can the top-down algorithm run forever?
    Yes

• Case 2: The algorithm’s recursion never stops
    A query/table is used in its “unbounded” direction

g(W) :- small(2,W).
small(A,B) :- before(A,B).
small(X,Y) :- small(X,Z), before(Z,Y).
before(X,Y) = { (X,X+1) | X ≥ 0 }

s(X,Y) = b(X,Y) ∪ s(X,Z) ⋈ b(Z,Y)

g(W) = s(2,W)
     = b(2,W) ∪ s(2,Z) ⋈ b(Z,W)
     = {(2,3)} ∪ s(2,Z) ⋈ b(Z,W)
     = {(2,3)} ∪ [b(2,Z) ∪ s(2,Z’) ⋈ b(Z’,Z)] ⋈ b(Z,W)
     …

• Results #2-3 (Sagiv and Vardi ’90):
    Termination is undecidable in the general case
    Termination is decidable if all queries are unary
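The unbounded direction can be made concrete with a lazy Python generator for small(2,W). This is only a sketch: it relies on before having exactly one successor per value, and the names `before_by_x` and `small_by_x` are hypothetical.

```python
from itertools import islice

def before_by_x(x):
    """Tuples of before = { (X, X+1) | X >= 0 } whose first column is x."""
    return {(x, x + 1)} if x >= 0 else set()

def small_by_x(x):
    """Lazily enumerate the tuples of s(x, W) with the first column bound to x."""
    frontier = x
    while True:
        successors = before_by_x(frontier)
        if not successors:
            return                      # no successor: the answer set is finite
        for (_, y) in successors:
            yield (x, y)
            frontier = y                # keep following before forward

# The generator for x = 2 never ends by itself, so only a prefix is taken.
first_four = list(islice(small_by_x(2), 4))
```

Every step produces a new answer, so collecting all of small(2,W) would never finish; the same program asked in the other direction, small(W,2), stops after two steps.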

Page 12: Querying Infinite Databases

12/19

Infinite Queries (summ.)

• We can automatically determine weak-safety
• We cannot (automatically) determine termination

• But, one can analytically prove that a given query over a given DB is finite
    E.g., our small(W,2) program

Leading question: Can we determine the finiteness of a query?
    No

Page 13: Querying Infinite Databases

13/19

The Web as a DB

• The web data model (WDM):
    A scheme of a DB that can represent the web graph
    Just three tables:
      urls = { u | u is a url of a web-page }
      links = { (u1,u2) | u1 links to u2; u1, u2 ∈ urls }
      words = { (u,w) | w appears in page u; u ∈ urls }

• Result #4 (Abiteboul and Vianu ’97):
    If a Datalog program with no literals halts over an infinite DB, its result is ∅
      • => A non-trivial query (over an infinite DB) must have a literal

Page 14: Querying Infinite Databases

14/19

Web - Machines

• Browsing Machine
    A weakly safe Datalog program (over WDM)
    At least one URL literal

• Searching/Browsing Machine
    An unsafe Datalog program (over WDM)
      • Evaluates queries in parallel
    Allowed literal types: URLs, Words

• Claims #1-2 (Abiteboul and Vianu ’97):
    Browsing machine:
      • Represents a user following static links from a page
    Searching/Browsing machine:
      • Also allows the user to access a search engine

Page 15: Querying Infinite Databases

15/19

Discussion: Finite approximation

• Relational Database servers are very popular
    Such DBs are finite

• Also, computing a table on demand may be slow
    Better performance at batch processing

The challenge: Build a finite replacement for an infinite DB

• Formally:
    Given a finite query, q, over an infinite DB
      • (Finiteness of q proved analytically)
    Build a finite database such that q over the finite database yields the same result as q over the infinite DB

Page 16: Querying Infinite Databases

16/19

Discussion: Finite approximation

• Example: Our small(W,2) program
    A finite, sound table: before(X,Y) = { (0,1), (1,2) }
    A finite, unsound table: before(X,Y) = { (0,1) }

• The process:
    Compute the transitive closure of the before relation
    Start from the literal ‘2’ at the right-hand side position

• Condition: the table graph must end with a sink
    In before the sink is the vertex ‘0’

• => We can build a finite DB
    Sadly, in the web-graph no such sink exists
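The process for the small(W,2) example can be sketched in Python as a backward walk over before, starting from the literal 2. The helper `before_by_y` and the function name `finite_before_for` are assumptions introduced for this illustration.

```python
def before_by_y(y):
    """Tuples of before = { (X, X+1) | X >= 0 } whose second column is y."""
    return {(y - 1, y)} if y >= 1 else set()

def finite_before_for(target):
    """Collect the tuples of before that are reachable backwards from `target`."""
    table, pending, seen = set(), [target], set()
    while pending:
        y = pending.pop()
        if y in seen:
            continue
        seen.add(y)
        for (x, _) in before_by_y(y):
            table.add((x, y))           # keep the tuple in the finite replacement
            pending.append(x)           # continue the walk from its left-hand value
    return table

finite_table = finite_before_for(2)
```

The walk stops because vertex 0 has no predecessor in before; on a graph without such an end point the loop could keep discovering new vertices indefinitely.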

Page 17: Querying Infinite Databases

17/19

Discussion: Temporality

• Crawling takes time
• The subject may change while crawling
    The DB is a snapshot which never happened

• (Open Question):
    Can we decide whether a result was really “true” at some point?

Page 18: Querying Infinite Databases

18/19

More issues

• Relational algebra over large relations: BDD

• Negation: Stratified Datalog

Page 19: Querying Infinite Databases

19/19

- Questions ? -

Page 21: Querying Infinite Databases

21/19

Datalog

• Semantics: ???
• Straightforward mapping to Relational Algebra??

g(X) :- small(X,2).
small(X,Y) :- before(X,Y).
small(X,Y) :- small(X,Z), before(Z,Y).

Page 22: Querying Infinite Databases

22/19

Example: Bottom-up evaluation

before
X  Y
0  1
1  2
2  3

Initialization: Translate the EDBs into relations

Page 23: Querying Infinite Databases

23/19

Example: Bottom-up evaluation

apply small(X,Y) :- before(X,Y).

before       small
X  Y         X  Y
0  1         0  1
1  2         1  2
2  3         2  3

Page 24: Querying Infinite Databases

24/19

Example: Bottom-up evaluation

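apply small(X,Y) :- small(X,Z), before(Z,Y).

First pass: join small(X,Z) with before(Z,Y) and add the new tuples to small

small        before       small (after the pass)
X  Z         Z  Y         X  Y
0  1         0  1         0  1
1  2         1  2         1  2
2  3         2  3         2  3
                          0  2
                          1  3

Second pass: join the enlarged small(X,Z) with before(Z,Y) again

small        before       small (after the pass)
X  Z         Z  Y         X  Y
0  1         0  1         0  1
1  2         1  2         1  2
2  3         2  3         2  3
0  2                      0  2
1  3                      1  3
                          0  3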

Page 25: Querying Infinite Databases

25/19

Example: Bottom-up evaluation

apply g(X) :- small(X,2).

small        g
X  Y         X
0  1         1
1  2         0
2  3
0  2
1  3
0  3

Page 26: Querying Infinite Databases

26/19

Finiteness

before(X,Y) = { (0,1), (1,2), (2,3) }

• The Bottom-up algorithm:
    Init:
      • For each EDB, p, assign r(p) ← Relation of all tuples satisfying p
      • For each IDB, p, assign r(p) ← ∅
    Loop:
      • Choose a rule p(…) :- t1(…), t2(…), … tn(…)
      • t ← join of all r(ti), where 1 ≤ i ≤ n
      • r(p) ← r(p) ∪ t
    Continue until a fix-point is reached

• Requires: Finiteness of EDBs
• Ensures: Termination
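For the three-rule program, the bottom-up loop can be sketched in Python as a naive fix-point computation. This is a minimal illustration that hard-codes the rules instead of interpreting arbitrary Datalog; the function name `bottom_up` is an assumption.

```python
def bottom_up(before):
    """Naive bottom-up evaluation of the small/g program over a finite before table."""
    small = set()                                   # r(small) starts empty
    while True:
        new = set(before)                           # small(X,Y) :- before(X,Y).
        new |= {(x, y)                              # small(X,Y) :- small(X,Z), before(Z,Y).
                for (x, z) in small
                for (z2, y) in before
                if z == z2}
        if new <= small:                            # fix-point: nothing new was derived
            break
        small |= new
    g = {x for (x, y) in small if y == 2}           # g(X) :- small(X,2).
    return small, g

small, g = bottom_up({(0, 1), (1, 2), (2, 3)})
```

On the finite table from the example the loop reaches its fix-point after a few passes; with an infinite before the very first step, copying the table, would already fail to terminate.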