48
CS848: Topics in Databases: Foundations of Query Optimization Topics Covered Databases QL Query containment More on QL

CS848: Topics in Databases: Foundations of Query Optimization Topics Covered Databases QL Query containment More on QL

Embed Size (px)

Citation preview

CS848: Topics in Databases: Foundations of Query Optimization

Topics Covered

Databases

QL

Query containment

More on QL

CS848: Topics in Databases: Foundations of Query Optimization

A simple case of finding a query plan

Subsystem3

Subsystem2

Subsystem1

SQL

Global Schema

A single table: T

Open, Scan

SQL Server

A “local as view (LAV) integration schema”: T ´ Q2. User submits Q1. Query optimizer must determine if a scan of T suffices. True iff Q1 is equivalent to Q2.

CS848: Topics in Databases: Foundations of Query Optimization

In the beginning …

Infinite countable sets of each of the following kinds of symbols:

C = {C1, C2, … } (primitive concepts)

A = {A1, A2, …} [ {B1, B2, …} (attributes)

R = {R1, R2, … } (roles)

Conventions: Attributes (resp. primitive concepts and roles) correspond to words in lower case or to positive integers (resp. words in upper case and words in mixed case).

CS848: Topics in Databases: Foundations of Query Optimization

For a particular database I

h, (¢)Ii

where is a countable possibly infinite domain, and where for each symbol

(C)I µ

(A)I : !

(R)I µ ( £ )

CS848: Topics in Databases: Foundations of Query Optimization

Partial Databases (Aboxes)

e

e : {C1, … , Cn}

e2e1

A

e2e1

R

e 2

e 2 (Ci)I

(A)I(e1) = e2

(e1, e2) 2 (R)I

e2e1 e1 e2

e2e1 e1 e2

CS848: Topics in Databases: Foundations of Query Optimization

Relational Databases

“John”“Mary”

3333

name ageEMP

e1 : {EMP}

e2 : {EMP}

33

“Mary”

“John”name

name

age

CS848: Topics in Databases: Foundations of Query Optimization

Relational Databases (cont’d)

{e1, e2} µ (EMP)I

(name)I(e1) = “John”

(age)I(e1) = (age)I(e2) = 33

{e1, e2, “John”, 33, “Mary”} µ

e1 e2

“John” ? 33

e1 : {EMP}

e2 : {EMP}

33

“Mary”

“John”name

name

age

CS848: Topics in Databases: Foundations of Query Optimization

Dialects of QL

(expressiveness)

(semantics)Conjunctive QL

withbag semantics†

Positive QL First order QLConjunctive QL

First order QLwith

bag semantics

Positive QLwith

bag semantics‡

†[Khizder et al., 1999], ‡[Lui et al., 2002]

CS848: Topics in Databases: Foundations of Query Optimization

Conjunctive QL

Q ::= D as A (quantification)

| A1 = A2.R (unnest)

| A1.Pf1 = A2.Pf2 (selection)

| elim A1, … , An Q (projection)

| true (null tuple)

| from Q1, Q2 (natural join)

| ( Q )

D ::= THING | C (basic description)

Pf ::= id | A.Pf (path function)

CS848: Topics in Databases: Foundations of Query Optimization

Well Formed Queries: (Q)

(D as A)´ {A}

(A1 = A2.R) ´ {A1, A2}

(A1.Pf1 = A2.Pf2) ´ {A1, A2}

(elim A1, … , An Q) ´ {A1, … , An}

(true) ´ ;

(from Q1, Q2) ´ (Q1) [ (Q2)

Require {A1, … , An} µ (Q) for projection operators.

CS848: Topics in Databases: Foundations of Query Optimization

Tuples and Bags

A (duplicate) tuple t with attribute bindings for attributes {A1, … , An} over a database I = h, (¢)Ii has the general form

hA1 : e1, … , An : en , cnt : ii,

where {e1, … , en} µ , “cnt” is a distinct attribute not used in queries, and i a positive integer.

A set of duplicate tuples that contain the same attribute bindings is called a bag.

CS848: Topics in Databases: Foundations of Query Optimization

Operations on Tuples

(t) ´ set of attributes occurring in t, excluding cnt.

t@cnt ´ integer i such that “cnt : i” occurs in t

t@A ´ element e 2 such that “A : e” occurs in t; defined only when A 2 (t)

t[{A1, … , An}] ´ {A1 : t@A1, … , An : t@An}; defined only when {A1, … , An} µ (t)

[t] ´ t[(t)]

CS848: Topics in Databases: Foundations of Query Optimization

SemanticsThe meaning of a query Q, denoted «Q¬, is a function that maps databases to bags. The behavior of this function on a particular database I = h, (¢)Ii is defined as follows.

«THING as A¬(I) ´ {hA : e, cnt : 1i : e 2 }

«C as A¬(I) ´ {hA : e, cnt : 1i : e 2 (C)I}

«A1 = A2.R¬(I) ´ {hA1 : e1, A2 : e2, cnt : 1i : (e2, e1) 2 RI}

«A1.Pf1 = A2.Pf2¬(I)´ {hA1 : e1, A2 : e2, cnt : 1i : (Pf1)I(e1) = (Pf2)I(e2)}

where

(id)I ´ {(e, e) : e 2 }

(A.Pf )I ´ {(e1, e2) : (Pf )I((A)I(e1)) = e2}

CS848: Topics in Databases: Foundations of Query Optimization

Semantics (cont’d)

«elim A1, … , An Q ¬(I) ´ ; , if not well formed; otherwise {hA1 : t@A1, … , An : t@An , cnt : 1i : t 2 «Q¬(I)}

«true¬(I) ´ {hcnt : 1i}

«from Q1, Q2¬(I) ´ {t : (t) = (Q1) [ (Q2) Æ 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : t@cnt = t1@cnt £ t2@cnt Æ t[(t1)] = [t1] Æ t[(t2)] = [t2]}

CS848: Topics in Databases: Foundations of Query Optimization

Syntactic Sugar

A1. .An.id ´ A1. .An

select distinct A1, … , An Q ´ elim A1, … , An Q

select * Q ´ Q

Q1 where Q2 ´ from Q1, Q2

Q1 and Q2 ´ from Q1, Q2

from ´ true

from Q1, Q2, … , Qn ´ from (from Q1, Q2, …) Qn

CS848: Topics in Databases: Foundations of Query Optimization

Examples

The names of employees who have the same age as another employee with a given name.

select distinct :p, namefrom EMP as e, ( select distinct :p, e1

from EMP as e1, EMP as e2where e1.age = e2.age and e2.name = :p )

where e.name = nameand e.id = e1.id

CS848: Topics in Databases: Foundations of Query Optimization

Method Calls (more syntactic sugar)

A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.Pfn

select distinct A1, … , An

´ from C as A where A.1 = A1.Pf1 and … and A.n = An.Pfn

A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) as An

´ A1.Pf1.C(A2.Pf2, … , An-1.Pfn-1) = An.id

CS848: Topics in Databases: Foundations of Query Optimization

Examples (cont’d)

select distinct namefrom EMP as e, ( select distinct e1

from EMP as e1, EMP as e2where e2.age.+(e2.age) = e1.age )

where e.name = nameand e.id = e1.id

The names of employees who have an age double that of another employee.

CS848: Topics in Databases: Foundations of Query Optimization

Conjunctive Datalog (more syntactic sugar)

C(A1, … , An)

select distinct A1, … , An

´ from C as A where A.1 = A1.id and … and A.n = An.id

(A1, … , Am) :- Q1, … , Qn.

´ select distinct A1, … , Am from Q1, … , Qn

CS848: Topics in Databases: Foundations of Query Optimization

Positive QL

Q ::= empty A1, … , An (empty set)

| Q1 union Q2 (union)

(empty A1, …, An) ´ {A1, …, An}

(Q1 union all Q2) ´ (Q1)

Require (Q1) = (Q2) in union operations.

CS848: Topics in Databases: Foundations of Query Optimization

Semantics

«empty A1, … , An¬(I) ´ ;

«Q1 union Q2¬(I)

´ {t : t@cnt = 1 Æ (t) = (Q1) Æ (t) = (Q2) Æ (

( 9 t1 2 «Q1¬(I) : [t] = [t1] Æ :9 t2 2 «Q2¬(I) : [t] = [t2] ) Ç ( 9 t2 2 «Q2¬(I) : [t] = [t2] Æ :9 t1 2 «Q1¬(I) : [t] = [t1] ) Ç ( 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1] Æ [t] = [t2] ) )}

CS848: Topics in Databases: Foundations of Query Optimization

First Order QL

Q ::= Q1 minus Q2 (difference)

(Q1 minus Q2) ´ (Q1)

Require (Q1) = (Q2) in difference operations.

CS848: Topics in Databases: Foundations of Query Optimization

Semantics

«Q1 minus Q2¬(I)

´ {t : t@cnt = 1 Æ (t) = (Q1) Æ (t) = (Q2) Æ ( 9t1 2 «Q1¬(I) : [t] = [t1] ) Æ ( :9t2 2 «Q2¬(I) : [t] = [t2] )}

CS848: Topics in Databases: Foundations of Query Optimization

QL with Duplicates

Q ::= select A1, … , An Q (duplicate preserving projection)

| Q1 union all Q2 (bag union)

| Q1 minus all Q2 (bag difference)

CS848: Topics in Databases: Foundations of Query Optimization

Well Formed Queries (cont’d)

(select A1, … , An Q) ´ {A1, … , An}

(Q1 union all Q2 ) ´ (Q1)

(Q1 minus all Q2) ´ (Q1)

Require (Q1) = (Q2) in bag union and bag difference operations, and that {A1, … , An} µ (Q) in (duplicate preserving) projection operations.

CS848: Topics in Databases: Foundations of Query Optimization

Semantics

«select A1, … , An Q ¬(I)

´ ; , if not well formed and representable†; otherwise

{hA1 : t1@A1, … , An : t1@An , cnt : ni : t1 2

«Q¬(I)

Æ n = (t2@cnt) } t2 2 «Q¬(I) :

t2[{A1, … , An}] = t1 [{A1, … , An}]

†The selection operation is representable on database I iff, for everyt1 2 «Q¬(I),

|{t2 2 «Q¬(I) : t2[{A1, … , An}] = t1 [{A1, … , An}]}|

is finite.

CS848: Topics in Databases: Foundations of Query Optimization

Example

A duplicate preserving projection operation that is not representable in any database with an infinite domain.

select e1 from THING as e1, THING as e2

Observation: All well-formed duplicate preserving projection operations on databases with finite domains are representable.

CS848: Topics in Databases: Foundations of Query Optimization

Semantics (cont’d) «Q1 union all Q2¬(I)

´ ; , if not well formed; otherwise

{t 2 «Q1¬(I) : :9 t2 2 «Q2¬(I) : [t] = [t2]}

[ {t 2 «Q2¬(I) : :9 t1 2 «Q1¬(I) : [t] = [t1]}

[ {t : 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1]} Æ [t] =

[t2]

Æ t@cnt = t1@cnt + t2@cnt} «Q1 minus all Q2¬(I)

´ ; , if not well formed; otherwise

{t 2 «Q1¬(I) : :9 t2 2 «Q2¬(I) : [t] = [t2]}

[ {t : 9 t1 2 «Q1¬(I), t2 2 «Q2¬(I) : [t] = [t1]} Æ [t] = [t2]

Æ t@cnt = t1@cnt t2@cnt

Æ t1@cnt t2@cnt }

CS848: Topics in Databases: Foundations of Query Optimization

Summary

at, =, elim,true, from,

select

at, =, elim,true, from,

empty, union

at, =, elim,true, from,

empty, union,minus

at, =, elim,true, from

at, =, elim,true, from,

select,empty, union all,

minus all

at, =, elim,true, from,

select,empty, union all

(conjunctive)

(bag semantics)

(set semantics)

(positive) (first order)

CS848: Topics in Databases: Foundations of Query Optimization

Query Contexts

An expression Q[] in the language QL enriched by an additional terminal symbol [] is called a query context.

For a query Q1 2 QL, the expression Q1[Q2] denotes the syntactical substitution of Q2 for [].

Q2 is compatible with Q1 if Q1[Q2] 2 QL. For example, Q2 is compatible with Q1 in the following.

Q2: EMP as e where e.name = :p

Q1: select distinct :p, d from DEPT as d, [] where d = e.dept

CS848: Topics in Databases: Foundations of Query Optimization

The Query Equivalence Problem

Q1 is equivalent to Q2 for database I, written I ² (Q1 ´ Q2 ),if «Q1¬(I) = «Q2¬(I).

A query equivalence dependency E has the form (Q1 ´ Q2).E = (Q1 ´ Q2) is an axiom if, for any database I, I ² (Q1 ´ Q2 ).

A query equivalence problem for a given set of query equivalence dependencies is to determine if a given member of the set is an axiom.

CS848: Topics in Databases: Foundations of Query Optimization

Some Axioms

Question: Is it true that any E with the following form is an axiom?

(elim A1, … , Am Q1)[elim B1, … , Bn Q2] ´ elim A1, … , Am Q1 [Q2]

Answer: No. However, any such E is an axiom if any attribute in

(Q2) – {B1, … , Bn})

does not occur in query context

(elim A1, … , Am Q1 []).

CS848: Topics in Databases: Foundations of Query Optimization

Excluding variable reuse in QL

Q has an occurrence of variable reuse if there is a query context Q1[] and a query of the form

elim A1, … , An Q2

or of the form

select A1, … , An Q2

such that Q = Q1[Q2] and there exists A in ((Q2) – {A1, … , An}) thatalso occurs in Q1[].

Observation: For any Q1, there exists an equivalent class of query Q2 that has no occurrence of variable reuse.

CS848: Topics in Databases: Foundations of Query Optimization

The Query Containment Problem

Q1 is contained in Q2 for database I, written I ² (Q1 v Q2), if, for any tuple t1 in «Q1¬(I), there exists t2 in «Q2¬(I) such that [t1] = [t2] and t1@cnt t2@cnt.

A query containment dependency C has the form (Q1 v Q2). C = (Q1 v Q2) is an axiom if, for any database I, I ² (Q1 v Q2).

A query containment problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom.

CS848: Topics in Databases: Foundations of Query Optimization

Equivalence and Containment

Observation: Equivalence reduces to containment.

Q1 ´ Q2 iff Q1 v Q2 and Q2 v Q1

Observation: Containment reduces to equivalence in first order QL.

Q1 v Q2 iff (Q1 minus all Q2) ´ empty (Q1)

CS848: Topics in Databases: Foundations of Query Optimization

Some Complexity Results

Theorem: The query equivalence and containment problems for conjunctive QL is NP-complete.†

†Chandra, A. K. and P. M. Merlin. Optimal implementation of conjunctive queries in relational databases. Proc. Ninth Annual ACM Symposium on the Theory of Computing, pp. 77–90, 1977.

CS848: Topics in Databases: Foundations of Query Optimization

A Decision Procedure

Theorem: The following procedure decides if C = (Q1 v Q2) is anaxiom for conjunctive QL.†

1. Freeze the body of Q1 by creating a partial database consisting of individuals that include its variables.

2. If the tuple

hA1 : A1 , … , An : An , cnt : 1i occurs in «Q2¬(I), where (Q1) = {A1, … , An}, then return true;

otherwise return false.‡

†Derived from [Ullman, 1999].‡Use forced semantics for selection operations.

CS848: Topics in Databases: Foundations of Query Optimization

Obtaining a Partial Database from Q

A1.A2. .Am = B1.B2. .Bn

A1

A2 Am…

B1

B2 Bn…

THING as A A

C as A A : {C}

A1 = A2.R

A2

RA1

CS848: Topics in Databases: Foundations of Query Optimization

Deriving Partial Databases (cont’d)

w : L u : L1

A

v : L2

A

w : L u : L1

A

v : L2

A

u : L1 v : L3

A

x : L4

Aw : L2

u : L1 v : L3

A

x : L4

Aw : L2

CS848: Topics in Databases: Foundations of Query Optimization

Deriving Partial Databases (cont’d)

n1 : L1 n2 : L2 n1 : L1 [ L2 n2 : L1 [ L2

n1 : L1 n2 : L2 n3 : L3

n1 : L1 n2 : L2 n3 : L3

CS848: Topics in Databases: Foundations of Query Optimization

Evaluating Selections on Partial Databases

Note that selection conditions can navigate missing attribute values. In such cases, assume a forced semantics. In particular, two nodes n1 and n2 satisfy a selection condition iff the condition has the form

n1.Pf1.Pf = n2.Pf2.Pf

where (Pf1)I(n1) and (Pf2)I(n2) are defined and lead to nodes connected by an equality arc.

CS848: Topics in Databases: Foundations of Query Optimization

Some Complexity Results (cont’d)

Theorem: The query equivalence problem for conjunctive QL with bag semantics is NP-complete.

Observation: The complexity of the query containment problem for conjunctive QL with bag semantics remains open at this time.

Example:† In conjunctive QL with bag semantics, the query containment dependency Q1 v Q2 is an axiom, where Q1 and Q2 have the respective definitions

select x, z select x, z from P as x, R as z from P as x, R as z where x = u.Q and z = v.Q where y = u.Q and y = v.Q

† [Chaudhuri and Vardi, 1993]

CS848: Topics in Databases: Foundations of Query Optimization

The Query Membership Problem

A database schema, denoted T, consists of a finite set {C1, … , Cn} of query containment dependencies.

C is an axiom relative to database schema T = {C1, … , Cn}, writtenT ² C, if, for any database I, I ² C if I ² Ci for each i.

A query membership problem for a given set of query containment dependencies is to determine if a given member of the set is an axiom relative to a given database schema also consisting of members of the set.

CS848: Topics in Databases: Foundations of Query Optimization

More Complexity Results

Theorem: The query membership problem for conjunctive QL is undecidable.

Theorem: The query membership problem for first order QL is equivalent to the query containment problem for first order QL.

Proof: Assignment.

CS848: Topics in Databases: Foundations of Query Optimization

More on QL

Defining database schema

Expressing access plans

CS848: Topics in Databases: Foundations of Query Optimization

Modeling Generalization Taxonomies

Consider a simple object-oriented schema language consisting of sentencesof the following form.†

class C {A1 : ref C1, … , Am : ref Cm} [isa C1 , … , Cn];

Assignment: Encode a fixed collection of such sentences as a databaseschema in conjunctive QL. Your encoding should be as compact as possibleand should enable the following questions to be expressed as querycontainment dependencies over your schema.

1. Is C a defined class?

2. Is attribute A defined on class C?

3. Can an object reside in both class C1 and class C2?†Assume that any object in a database was created with respect to a singleclass.

CS848: Topics in Databases: Foundations of Query Optimization

Modeling Pipelined Query Access Plans

(syntax) (defn of (¢))

(parameter) Q ::= (PARAM as A) {A}(index scan) | (from C as A, A.1 = B1, … , A.n = Bn) {A}(nested loops) | (from Q1, Q2) (Q1) [ (Q2)(noop) | (select A1, … , An Q) (Q) Å {A1, … , An}(record field access) | (A1 = A2.B) {A1}(comparison) | (A1 = A2) ;(catenation) | (Q1 union all Q2) (Q1) Å (Q2)(cut) | (elim A1, … , An Q) ;

| …

Require:1. ((Q2) – (Q2)) µ (Q1) for nested loops, and2. (Q) = (Q) for top-level queries.

CS848: Topics in Databases: Foundations of Query Optimization

Alternative Semantics

Require richer models theories for

1. sort operations, and

2. named cuts.