33
1 CS511 Advanced Database Management Systems Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture slides

CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

Embed Size (px)

Citation preview

Page 1: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

1CS511 Advanced Database Management Systems

Lecture 4:

DBMS ArchitectureSept. 6 2006

ChengXiang Zhai

Most slides are adapted from Kevin Chang’s lecture slides

Page 2: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

2CS511 Advanced Database Management Systems

DBMS Mission Statement

• Simply: maintenance and computation of data

• But how to do it?

Data Operations Results

Page 3: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

3CS511 Advanced Database Management Systems

DBMS Architecture

Query Executor

Buffer Manager

Storage Manager

Storage

Transaction Manager

Logging & Recovery

Lock Manager

Buffers Lock Tables

Main Memory

User/Web Forms/Applications/DBAquery transaction

Query Optimizer

Query Rewriter

Query Parser

Files & Access Methods

Page 4: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

4CS511 Advanced Database Management Systems

A Design Dilemma

• To what extent should we reuse OS services?

• Reuse as much as we can

– Performance problem (inefficient)

– Lack of control (incorrect crash recovery)

• Replicating some OS functions (“mini OS”)

– Have its own buffer pool

– Directly manage record structures with files

– …

Page 5: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

5CS511 Advanced Database Management Systems

OS vs. DBMS

• Conjecture: Perhaps pretty close

• Proof:

– There exists someone who can write popular textbooks in both OS and DBMS!

[Operating | Database] System Concepts

– Jim Gray is from OS background!

Page 6: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

6CS511 Advanced Database Management Systems

OS vs. DBMS Similarities??

• What do they manage?

• What do they provide?

Page 7: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

7CS511 Advanced Database Management Systems

OS vs. DBMS: Similarities

• Purpose of an OS:

– managing hardware

– presenting interface abstraction to applications

• DBMS is in some sense an OS?

– DBMS manages data

– presenting interface abstraction to applications

• Both as API for application development!

Page 8: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

8CS511 Advanced Database Management Systems

Applications built upon DBMS• ERP: Enterprise Resource Planning

– SAP, Baan, PeopleSoft, Oracle, IBM,...

• CRM: Customer Relationship Management– E.phiphany, Siebel, Vantive, Oracle, IBM, ...

• SCM: Supply Chain Management – Trilogy, i2, Oracle, IBM, ...

• A lot more in the Info Tech era: – e-business software

– scientific data

– multimedia

– data analysis and decision support

Page 9: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

9CS511 Advanced Database Management Systems

OS vs. DBMS: Related Concepts

• Process Management What DB concepts? – process synchronization

– deadlock handling

• Storage management What DB concepts?– virtual memory

– file system

• Protection and security What DB concepts?– authentication

– access control

Page 10: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

10CS511 Advanced Database Management Systems

OS vs. DBMS: Differences??

Page 11: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

11CS511 Advanced Database Management Systems

OS vs. DBMS: Differences

• DBMS: Top-down to encapsulate high-level semantics!

– Data

• data with particular logical structures

– Queries

• query language with well defined operations

– Transactions

• transactions with ACID properties

• OS: Bottom-up to present low-level hardware

Page 12: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

12CS511 Advanced Database Management Systems

DBMS on top of OS: Relations vs. File system

• Data object abstraction– file: array of characters

– relation: set of tuples

• Physical contiguity: – large DB files want clustering of blocks

– extent: larger granularity allocation unit• sol1: managing raw disks by DBMS

• sol2: simulate by managing free spaces in DBMS

• Multiple trees (access methods)– file access: directory hierarchy (user access method)

– block access: inodes

– tuple access: DBMS indexes

Page 13: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

13CS511 Advanced Database Management Systems

DBMS on top of OS: BM vs. VM

• Query-aware replacement needed for performance

– not always LRU

• Examples?

– how about sort-merge join??

– how about nested-loop join??

Page 14: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

14CS511 Advanced Database Management Systems

DBMS on top of OS: BM vs. VM

• System-controlled replacement needed for correctness

– not always LRU

• Examples?

Page 15: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

15CS511 Advanced Database Management Systems

Not Really OS Problems: Deferred Update Semantics

• Update emp.sal = 0.8*emp.sal if emp.sal > mgr.sal

empname sal manager

Smith 10k Brown

Jones 9k

Brown 11k Jones

– what are the possible semantics?

• INGRES solution: deferred updates

– buffer updates in intentions list for actual updates (also serve as redo log)

– an example of “needing buffer knowledge in DBMS”, so perhaps not sensible to do BM totally in OS

Page 16: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

16CS511 Advanced Database Management Systems

As the data model and application context change, so does the DBMS architecture…

Page 17: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

17CS511 Advanced Database Management Systems

Post-Relational DB Projects

• Motivation:– RDBMS not powerful enough for non-administrative data-

intensive applications such as: CAD/CAM, GIS…

• Buzz terms: object-oriented, extensible

• Sample projects– Postgres: U.C. Berkeley

– Starburst: IBM Almaden – “highly extensible”

• after System R (relational), R* (distributed)

• ultimately finding its way into IBM DB2 UDB

– Exodus: U. Wisconsin

• not a complete DB; an OO-style storage manager toolkit

• followed by Shore at Wisconsin, Predator at Cornell

Page 18: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

18CS511 Advanced Database Management Systems

POSTGRES: Post INGRES

Stonebraker, U.C. Berkeley

• 1977-1985: INGRES

– among the first relational DB implementation

Ingres Inc. --> .. acquired by Computer Associates

• 1986-1994: POSTGRES

– among the first object-relational DB implementation

Illustra acauqired by Informix

– PostgreSQL (the SQL version)

Page 19: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

19CS511 Advanced Database Management Systems

RDBMS: the Relational Root

• Data model: (Codd, 1970’s)

– a database is a set of relations

– relation of n attributes: a set of n-tuples

– n-tuple: (v1, …, vn), where vi is in domain Si

Page 20: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

20CS511 Advanced Database Management Systems

Relational Model: Normal Forms

• Basic: 1NF (First Normal Form)

– implicitly required in the relation model

– definition:

• only simple domains of atomic elements (Codd)

• simple domains represent the base (built-in) types

– ? why?

• “Stronger” normal forms:

– 4NF, Boyce-Codd Normal Form, 3NF, 2NF, …

– ? why?

Page 21: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

21CS511 Advanced Database Management Systems

Normalizing Relations: Example• Unnormalized relation of book “objects”:

• Normalized relations: by decomposition

• ?? Problems of the relational model?

Books: title authors dategreat future {smith, jones}4/01/01career {jones} 7/12/00

Books: title day month yeargreat future 4 1 01career 7 12 00

Books: title authorsgreat future smithgreat future jonescareer jones

Page 22: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

22CS511 Advanced Database Management Systems

Relational Model Problems“A relational DB is like a garage that forces you to take your car apart and

store the pieces in little drawers.” (some researcher)

• “Object” notion lost by decomposition– non-intuitive: object is decomposed into several relations– inefficient: a lot of online assembling by joins

• Base types are too restrictive– integers and strings are very primitive

– data “types” are typically application specific

• Relational algebra is the only allowed operation– simple, declarative, but also restrictive

– application = host language + embedded SQL

• ?? How to remedy these problems?

Page 23: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

23CS511 Advanced Database Management Systems

Quest for a Richer Model?

• Object-oriented data model

• Extensible ADTs

• Programming-language constructs

Page 24: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

24CS511 Advanced Database Management Systems

ORDBMS vs. OODBMS

• Question: How important is the relation?

• ORDBMS:

– RDBMS + OO features #

– query-based

• OODBMS:

– OO PL + database features (persistent objects)

– programming-based

• Meeting in the middle

Page 25: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

25CS511 Advanced Database Management Systems

Stonebraker’s Matrix

• Prediction: ORDBMS will dominate

– evidence: big DB players are all on this side

Simple Data Complex Data

Query RDBMS ORDBMS

No Query File System OODBMS

Page 26: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

26CS511 Advanced Database Management Systems

Object Orientation Concepts

• Classes:– classes as types

– encapsulation: interface + implementation

– inheritance: building class hierarchies

• Objects:– complex objects:

• built from constructors, e.g., set-of, array, nested objs

– object identity (OID):• system generated as unique object reference

• enables (efficient) object linking and navigation

Page 27: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

27CS511 Advanced Database Management Systems

POSTGRES Data Model

POSTGRES data model:

• OO constructs

– classes as relations

• object (class instance) = tuple

• object-id = tuple-id

• method = attribute or function of attributes

– inheritance (multiple parents)

• ADT constructs:

– types

– functions

Page 28: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

28CS511 Advanced Database Management Systems

POSTGRES Functions• Arbitrary C functions

– e.g.: overpaid(Employee)

– arbitrary semantics-- not optimized

– no fancy access methods-- typically sequential scan

• Binary operators– “hints” to provide semantics

– extensible access methods• extensible B+tree or user-defined index

• PostQuel procedures– parameterized queries as functions

– e.g.: sal-lookup(name):

retrieve Emp.salary where Emp.name = name

Page 29: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

29CS511 Advanced Database Management Systems

POSTGRES Storage SystemWe were guided by a missionary zeal to do something different…

• No-overwrite system

• Logging:– old values are not overwritten-- no value logging necessary

– log only needs to keep transaction state (commit/abort/going)

– ?? crash recovery-- how?

• Vacuum-cleaner daemon to archive historical data

• Advantages:– recovery is cheap

– time travel is easy

Page 30: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

30CS511 Advanced Database Management Systems

Storage System: Problems• Problems

– flushing differential data (why?) by commit time can be costly• unless “stable” main memory

• more costly than sequentially writing out logs – why ??

– reads have to stitch together current picture

• And, yes, there are lots details unexplored or unexplained

Page 31: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

31CS511 Advanced Database Management Systems

Questing for the Right Models

Speaking about knowledge representation– The simple relational model is by far the only successful KR paradigm.

When the relational model came along, the network guys resisted and their companies went under. …

When the OO model came along, the relational guys absorb its best, and their companies prospered again!

-- Jeffery Ullman

Page 32: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

32CS511 Advanced Database Management Systems

What You Should Know

• What are some major limitations of services provided by an OS in supporting a DBMS?

• In response to such limitations, what does a DBMS do?

• As the data model and task environment change, the architecture will also need to change

Page 33: CS511 Advanced Database Management Systems 1 Lecture 4: DBMS Architecture Sept. 6 2006 ChengXiang Zhai Most slides are adapted from Kevin Chang’s lecture

33CS511 Advanced Database Management Systems

Carry Away Messages• One usually doesn’t fit all!

– An OS is designed to serve all kinds of applications, so it’s not optimal for supporting a DBMS

– Other examples: a search engine is designed to serve all kinds of people, so it’s not optimal for a particular person (personalized search)

• When a problem is recognized, there are often opportunities for breakthroughs in multiple areas

– DBMS could take over OS functions

– OS could provide more opportunities for customization

• From “day 1”, high efficiency has been the primary challenge/concern in designing and implementing a DBMS; reliability may be the second major concern

– In contrast, “accuracy of answers” is at least as important as efficiency for a Web search engine

– In the future, accuracy of answers will likely become more important