CMPT 454, Simon Fraser University, Fall 2009, Martin Ester 1 Database Systems II Introduction


Database Systems II

Introduction


Database Systems I Recap

A Database Management System (DBMS) is a software package designed to store, manage and retrieve databases.

A Database System (DBS) consists of two components:
- the DBMS
- the database.


Database Systems I Recap

Why use a DBS?
- Logical data independence.
- Physical data independence.
- Efficient access.
- Reduced application development time.
- Data integrity and security.
- Concurrent access / concurrency control.
- Recovery from crashes.


Database Systems I Recap

A data model is a collection of concepts for describing data (a formal language!).

A schema is a description of a particular collection of data (database), using the given data model.

The relational data model is the most widely used model today.

Main concept: relation, basically a table with rows and columns.

Every relation has a schema, which describes the columns, or fields.


Database Systems I Recap

The conceptual schema defines the logical structure of the whole database.

An external schema (view) describes how some user sees the data (restricted access, derived data).

The physical schema describes the storage and index structures of the database.

Physical Schema

Conceptual Schema

View 1 View 2 View 3


Database Systems I Recap

Relational database: a set of relations

Relation: made up of 2 parts:

Instance: a table, with rows and columns.
#Rows = cardinality, #attributes = degree / arity.

Schema: specifies name of relation, plus name and type of each attribute.
e.g. Students(sid: string, name: string, login: string, age: integer, gpa: real).

Can think of a relation as a set of rows or tuples (i.e., all rows are distinct).


Database Systems I Recap

Relational algebra: mathematical query language which forms the basis for “real” languages (e.g. SQL), and for implementation.

Five basic operations:

union, set-difference, selection, projection, cartesian product.

Shortcuts for common operations: join, division.
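As a rough illustration, the five basic operations can be sketched in Python with relations modeled as sets of tuples. The function names and the convention of addressing attributes by position are assumptions made for this sketch, not part of the course material. The join shortcut is expressed as a cartesian product followed by a selection.

```python
from itertools import product

# A relation is modeled as a set of tuples; attributes are addressed by position.

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def selection(r, predicate):
    return {t for t in r if predicate(t)}

def projection(r, positions):
    return {tuple(t[i] for i in positions) for t in r}

def cartesian_product(r, s):
    return {a + b for a, b in product(r, s)}

def join(r, s, i, j):
    """Equi-join shortcut: product, then selection on r[i] == s[j]."""
    offset = len(next(iter(r))) if r else 0  # arity of r
    return selection(cartesian_product(r, s), lambda t: t[i] == t[offset + j])

R = {(1, "a"), (2, "b")}
S = {(1, "x"), (3, "y")}
print(join(R, S, 0, 0))
```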


Database Systems I Recap

SQL: the standard practical query language for relational databases.

Schema modifications: create, alter, drop (delete) table.

Instance modifications: insert, delete, update tuples of a table.

Queries to retrieve a specified set of tuples (what).

Queries are declarative: they describe what to retrieve, not how, which allows the DBS to find the most efficient way to process a query.


Database Systems I Recap

SELECT [DISTINCT] target-list
FROM relation-list
WHERE qualification

relation-list: A list of relation names (possibly with a range-variable after each name).

target-list: A list of attributes of relations in relation-list.

qualification: Comparisons (“Attr op const” or “Attr1 op Attr2”, where op is one of <, >, =, ≤, ≥, ≠) combined using AND, OR and NOT.


Database Systems I Recap

The semantics of an SQL query are defined in terms of the following conceptual evaluation strategy:

Compute the cross-product of relation-list.

Selection of the tuples satisfying qualifications.

Projection onto the attributes that are in target-list.

If DISTINCT is specified, eliminate duplicate rows.

A query optimizer will find more efficient strategies to compute the same answers.
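The conceptual evaluation strategy can be sketched directly in Python. This is only an illustration of the semantics, not how a real DBS evaluates queries. The function name `evaluate`, the dictionary-based row format, and the sample relations S and E are all assumptions made for the example.

```python
from itertools import product

def evaluate(relations, qualification, target, distinct=False):
    """Conceptual evaluation of SELECT [DISTINCT] target FROM relations WHERE qualification.

    relations: dict mapping a relation name to a list of row dicts
    qualification: function taking {relation name: row} and returning a bool
    target: list of (relation name, attribute) pairs
    """
    names = list(relations)
    result = []
    # Cross-product of all relations in the relation-list.
    for combo in product(*(relations[n] for n in names)):
        env = dict(zip(names, combo))
        # Selection: keep only combinations satisfying the qualification.
        if not qualification(env):
            continue
        # Projection onto the attributes in the target-list.
        row = tuple(env[rel][attr] for rel, attr in target)
        # Duplicate elimination, only if DISTINCT was requested.
        if distinct and row in result:
            continue
        result.append(row)
    return result

# Hypothetical sample data.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
enrolled = [{"sid": 1, "cid": "454"}, {"sid": 1, "cid": "354"}, {"sid": 2, "cid": "454"}]

same_sid = lambda env: env["S"]["sid"] == env["E"]["sid"]
print(evaluate({"S": students, "E": enrolled}, same_sid, [("S", "name")], distinct=True))
```

The cross-product grows with the product of the relation sizes, which is why a query optimizer looks for cheaper strategies that return the same answers.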


A Simple DBS Implementation

[Figure: a DBS with inputs labeled Relations and SQL Statements and output labeled Results.]


A Simple DBS Implementation

Relations stored in files (ASCII), e.g., relation R is in /usr/db/R.txt

    Smith # 123 # CS
    Jones # 522 # EE
    .
    .

Schema file (ASCII) in /usr/db/schema.txt

    R1 # A # INT # B # STR …
    R2 # C # STR # A # INT …
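A minimal sketch of how such '#'-separated lines might be parsed, assuming each schema line lists the relation name followed by alternating attribute names and types. The function names and the complete schema line for R used below are hypothetical, since the slides only show truncated schema entries.

```python
def parse_schema_line(line):
    """Parse 'R # A # STR # B # INT' into ('R', [('A', 'STR'), ('B', 'INT')])."""
    fields = [f.strip() for f in line.split("#")]
    name, rest = fields[0], fields[1:]
    # Attribute names and types alternate after the relation name.
    attributes = list(zip(rest[0::2], rest[1::2]))
    return name, attributes

def parse_tuple_line(line, attributes):
    """Convert one data line into a dict, using the types from the schema."""
    converters = {"INT": int, "STR": str}
    values = [v.strip() for v in line.split("#")]
    return {attr: converters[typ](val) for (attr, typ), val in zip(attributes, values)}

# Hypothetical complete schema entry for relation R.
name, attrs = parse_schema_line("R # A # STR # B # INT # C # STR")
print(parse_tuple_line("Smith # 123 # CS", attrs))
```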


A Simple DBS Implementation

Sample query

& select * from R #

Relation R
A      B    C
SMITH  123  CS

&


A Simple DBS Implementation

Sample session

Query result sent to printer

& select * from R | LPR #
&


A Simple DBS Implementation

Creating a new relation T

& select * from R where R.A < 100 | T #
&


A Simple DBS Implementation

Processing single table queries

To process “select * from R where condition”:
(1) Read dictionary to get R attributes
(2) Read R file. For each line:
    (a) Check condition
    (b) If OK, display
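The steps above could look roughly like this in Python. The function name `select_all`, the `db_dir` parameter, and the assumption that the schema file holds one '#'-separated line per relation (name, then alternating attribute names and types) are illustrative choices, not the course's actual implementation. The sketch returns the matching rows instead of displaying them.

```python
import os

def select_all(db_dir, relation, condition):
    """Return the rows of `relation` satisfying `condition` (a function on a row dict)."""
    # (1) Read the dictionary (schema file) to get the attributes of the relation.
    attributes = None
    with open(os.path.join(db_dir, "schema.txt")) as schema:
        for line in schema:
            fields = [f.strip() for f in line.split("#")]
            if fields[0] == relation:
                attributes = list(zip(fields[1::2], fields[2::2]))
                break
    if attributes is None:
        raise KeyError(f"unknown relation {relation!r}")

    # (2) Read the relation file line by line.
    rows = []
    with open(os.path.join(db_dir, relation + ".txt")) as data:
        for line in data:
            if not line.strip():
                continue  # skip blank lines
            values = [v.strip() for v in line.split("#")]
            row = {name: int(v) if typ == "INT" else v
                   for (name, typ), v in zip(attributes, values)}
            # (a) Check the condition; (b) keep the row if it holds.
            if condition(row):
                rows.append(row)
    return rows
```

Every call scans the whole file, whatever the condition, which is one of the weaknesses this simple design has.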


A Simple DBS Implementation

Processing single table queries creating a new table

To process “select * from R where condition | T”:
(1) Process select as before
(2) Write results to new file T
(3) Append new line to dictionary
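Steps (2) and (3) might be sketched as follows, assuming the select has already produced a list of row dicts. The function name `store_result`, its parameters, and the '#'-separated file layout are assumptions for illustration.

```python
import os

def store_result(db_dir, new_relation, attributes, rows):
    """Persist query results as a new relation.

    attributes: list of (name, type) pairs, e.g. [("A", "INT"), ("B", "STR")]
    rows: list of dicts keyed by attribute name
    """
    # (2) Write the results to a new file named after the relation.
    with open(os.path.join(db_dir, new_relation + ".txt"), "w") as out:
        for row in rows:
            out.write(" # ".join(str(row[name]) for name, _ in attributes) + "\n")

    # (3) Append a new line describing the relation to the dictionary.
    entry = " # ".join([new_relation] + [part for pair in attributes for part in pair])
    with open(os.path.join(db_dir, "schema.txt"), "a") as schema:
        schema.write(entry + "\n")
```

The two writes are not atomic: a crash between them would leave a data file without a dictionary entry, which hints at why recovery needs dedicated support.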


A Simple DBS Implementation

Processing multi-table queries

To process “select A,B from R,S where condition”:
(1) Read dictionary to get R,S attributes
(2) Read R file, for each line:
    (a) Read S file, for each line:
        (i) Create join tuple A,B from R,S
        (ii) Check condition
        (iii) Display if OK
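The nested-loop procedure can be sketched on in-memory rows as below. The function name, the "R."/"S." prefixes used to qualify attribute names, and the sample data are assumptions for this example. A file-based version would re-read the S file once per line of R.

```python
def nested_loop_query(r_rows, s_rows, condition, target):
    """Evaluate a two-relation query by pairing every R row with every S row."""
    result = []
    for r in r_rows:                      # (2) for each line of R
        for s in s_rows:                  # (a) for each line of S
            # (i) Create the join tuple, qualifying attribute names by relation.
            joined = {"R." + k: v for k, v in r.items()}
            joined.update({"S." + k: v for k, v in s.items()})
            # (ii) Check the condition; (iii) keep the projected tuple if OK.
            if condition(joined):
                result.append(tuple(joined[a] for a in target))
    return result

# Hypothetical sample data.
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
S = [{"A": 1, "C": 10}, {"A": 2, "C": 20}, {"A": 2, "C": 30}]
print(nested_loop_query(R, S, lambda t: t["R.A"] == t["S.A"], ["R.B", "S.C"]))
```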


What’s wrong with this Implementation?

Tuple layout on disk, e.g.,
- Change string from ‘Cat’ to ‘Cats’ and we have to rewrite the entire file
- ASCII storage is expensive: each byte of a decimal integer carries one of only 10 values instead of 256, so integers need about 2.4 times as many bytes as in binary
- Deletions are expensive
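The space overhead of ASCII integers is easy to check for a single value. The snippet below compares the decimal text of the largest 32-bit signed integer with its fixed-width binary encoding. The choice of value and of a 4-byte little-endian layout are assumptions for illustration.

```python
import struct

value = 2147483647  # largest 32-bit signed integer

ascii_bytes = str(value).encode("ascii")   # one byte per decimal digit
binary_bytes = struct.pack("<i", value)    # fixed 4-byte binary encoding

print(len(ascii_bytes), len(binary_bytes))  # prints "10 4"
```

Fixed-width binary fields also allow a value to be overwritten in place, whereas a text value that grows by one character shifts everything stored after it.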


What’s wrong with this Implementation?

Search very expensive, e.g.,
- Cannot find tuple with given key quickly
- Always have to read full relation


What’s wrong with this Implementation?

Inefficient query processing, e.g.,

    select *
    from R,S
    where R.A = S.A and S.B > 1000

Simple implementation has quadratic runtime complexity
- Do selection first?
- More efficient join?
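To make the two questions concrete, the sketch below contrasts the naive nested-loop evaluation of this query with one possible alternative: apply the selection on S.B first, then join through a hash table on A. The function names and sample data are hypothetical, and a hash join is just one of several more efficient join strategies.

```python
from collections import defaultdict

def naive_join(r_rows, s_rows):
    """Compare every R row with every S row: |R| * |S| comparisons."""
    return [(r, s) for r in r_rows for s in s_rows
            if r["A"] == s["A"] and s["B"] > 1000]

def hash_join_with_pushdown(r_rows, s_rows):
    """Filter S first, then probe a hash table on A: roughly |R| + |S| work plus output."""
    index = defaultdict(list)
    for s in s_rows:
        if s["B"] > 1000:            # selection applied before the join
            index[s["A"]].append(s)
    return [(r, s) for r in r_rows for s in index.get(r["A"], [])]

# Hypothetical sample data.
R = [{"A": 1}, {"A": 2}, {"A": 3}]
S = [{"A": 1, "B": 500}, {"A": 2, "B": 2000}, {"A": 2, "B": 3000}, {"A": 4, "B": 5000}]
print(hash_join_with_pushdown(R, S))
```

Both functions return the same pairs, but the second avoids comparing each R row against S rows that the selection has already ruled out.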


What’s wrong with this Implementation?

No buffer manager
    In particular, need caching

No concurrency control
No concept of transactions
    Need to enforce ACID properties

No API
    No interaction with other DBS


DBS Architecture

[Figure: DBS architecture. Labeled components: User, User Transaction, Query Parser, Strategy Selector, Transaction Manager, Concurrency Control, Recovery Manager, Buffer Manager, File Manager, Lock Table, Log, M.M. Buffer, Statistical Data, Indexes, User Data, System Data.]


Outline Database Systems II

Secondary storage management
- disks, records and files, . . .

Index structures
- B-trees, hash tables, multi-dimensional indexes

Query execution
- one-pass algorithms, two-pass algorithms, index-based algorithms

Query compiler
- parsing and preprocessing, query optimization, cost estimation


Outline Database Systems II

Crash recovery
- disk failures, stable storage, logging, …

Concurrency Control
- correctness, locks, scheduling, …

Transaction Processing
- logs, deadlocks, serializability, …

Data Mining
- knowledge discovery in databases, association rules


Marking Scheme

Assignments 40%
- paper and pencil, no programming

Midterm exam 15%
- covering all material up to and including query optimization

Final exam 45%
- covering all the material

No alternative marking scheme


Tentative Schedule

October 21: other instructor or class canceled
October 28: midterm exam
December 2: last class
December 16: final exam


References

Textbook
- Database Systems: The Complete Book, Garcia-Molina, Ullman, and Widom, Prentice Hall, 2008: 2nd edition
- relevant sections listed in schedule on class website, study these sections in advance!

Recommended book
- Database Management Systems, Ramakrishnan and Gehrke, McGraw Hill, 2003: 3rd edition

Lecture slides
- based on slides by Hector Garcia-Molina and Martin Theobald,
- posted on the class website.