19
Dependency Problems for Future Database Systems: Ingres Project D Adrian Hudnott Warwick Postgraduate Colloquium in Computer Science 2008

Dependency Problems for Future Database Systems: Ingres Project D Adrian Hudnott Warwick Postgraduate Colloquium in Computer Science 2008

  • View
    215

  • Download
    2

Embed Size (px)

Citation preview

Dependency Problems for

Future Database Systems:Ingres Project D

Adrian Hudnott

Warwick Postgraduate Colloquium in Computer

Science 2008

Research Problems• The Third Manifesto is the subtitle of a book written by Chris Date and

Hugh Darwen that defines what a relational database system should and should not do

• The Third Manifesto and related publications do not include any implementation techniques

• There is no scalable implementation of any part of The Manifesto currently in existence

• There are concerns that some of the demands of The Manifesto cannot be met using extensions of techniques in use by SQL DBMS

• When these were catalogued all of the difficult ones had a common theme: constraints

• All the problems involve the DBMS efficiently deducing a fact using known properties about the data source

• Little or no access to the actual data

Research Problems• Constraint Enforcement

• Semantic Query Optimization

• Multiple Simultaneous Assignment

• Updates to Views

• Interface with SQL Databases

• Determination of Type (compile- and run-time)

Database Representation• Use Datalog (like Prolog)

• E.g.supplier(s1, smith, 20, london).supplier(s2, jones, 10, paris). supplier(s3, blake, 30, paris). supplier(s4, clarke, 20, london).supplier(s5, adams, 30, athens).

Views

• Expressed using Datalog rules

• E.g.supplier_city(C) :-

supplier(SN, N, ST, C).

Notation• Extensional Database: EDB

• Intensional Database: IDB

• Statistics: Stat

• Constraints (invariant): Inv

• Assignments: Ri := U

i

• Queries: Qi(t)

• Boolean expressions: Bi(t)

• Cost function: c(exp)

Constraint Enforcement

RUInv

EDBStatRUIDBInv

/

/

Or, alternatively:

RU¬Inv

EDBStatRUIDBInv

/

/

(Prove Safety)

(Prove Failure)

N.B. Minimize access to the database (EDB)

Updates in the Presence of Views

• Example insertion:

supplier(s6, carter, 30, oxford).• R is “supplier”• U is <previous definition>

OR <tuple is the one above>• supplier_city(C) :- supplier(SN, N, ST, C); (SN=s6, N=carter, ST=30, C=oxford).

RUIDB

/

Materialized Views• Materialized View: contents of a view cached

to avoid re-calculation.

• Can be modelled using constraintsCREATE TABLE supplier (City PRIMARY KEY

REFERENCES supplier_city,...

CREATE TABLE supplier_city (City PRIMARY KEY REFERENCES supplier);

• And vice versa: A constraint is a view that is always empty.

Multiple Assignment: Naive

Implementation

• Example: X := Y, Y := X;

• Internal representation:

atomic {

Z := X;

X := Y;

Y := Z;

}

Multiple Assignment

tUIDBEDBStat

tRUURUIDBEDBStat

'

/'/

)'

/'/

(

tUIDBEDBStat

tRUURUIDBEDBStat

Inv

IIF U’ is the first assignment:

.... R’:= U’, R1 := U1, …, Rn := Un;

Multiple Assignment Idea1) Semantically optimize RHS expressions

2) Place RHS expressions into canonical form and find

subexpressions equal to other RHS expressions

3) Build a dependency graph

4) Depth first search to find cycles

5) Break cycles by creating simulated copies

6) Schedule and execute assignments

Transaction Repair

• Transaction repair is modifying an

update request so that it conforms with

the constraints

• Compensating actions

– E.g. ON DELETE CASCADE

• Research on more advanced repairs

– Generates more than one repair

View Update =

Transaction Repair + Disambiguation

DELETE FROM supplier_city WHERE City = ‘London’;

Transaction repair needed:

DELETE FROM supplier WHERE city=‘London’;

For more complex cases, disambiguation is needed

– Default values, rules, etc.

Semantic Optimization

tQtQ

StatIDBInv

21

No access to the data

and:

21 QC<QC

(Equivalence)

(Lower Cost)

• NULLs are flawed for many reasons

– Simple example: B OR NOT B is NULL if B is NULL

• Can consider missing data as free variables

• Problem is to efficiently determine if a formula is true

for all possible values of the free variable

Missing Information

])/[()(:)dom( and

)()()(

22

21

axtBtBax

tBtBtQEDBStatIDBInv

Current Progress• Catalogued all known issues requiring investigation• Outlined solutions or literature for the lesser problems

– E.g. comparison of large values

• Devised formal semantics for type checking Tutorial D programs

• Elaborated on the problems requiring original research in the context of dependency

• Resolved perceived ambiguities in The Manifesto• Detailed ideas for future development into a solution

to the multiple assignment problem• Completed a literature review on constraint checking

techniques

Project D RoadmapFurther literature review & code familiarization Summer 2008

Address multiple assignment problem Oct 2008 – Jan 2009

Development Phase 1 Feb – Aug 2009

Development Phase 2 Aug 2009 – Mar 2010

Thesis write-up Apr – Sep 2010

First complete build Mid – Late 2011

Product released Early 2012

Summing up Project D

• Essentially about relieving the industry of the burdens that have been imposed by SQL implementations in the past 30 years

• We think that the relational model can be properly implemented without SQL’s restrictions and ad hoc additions

– … but requires innovative research

• Project D is still in very early development

• However, some small successes already