01 intro

Database Management Systems

Prof. Weining Zhang
Dept. of Computer Science
University of Texas at San Antonio

W. Zhang Introduction 2

Overview

We study the internals of DBMSs
Principles of relational DBMSs
• Emphasis on query & transaction processing techniques
Advanced database systems & applications
• OODBMS, XML databases, data warehousing, OLAP, data mining
Course work includes homework, 2 midterm exams, no final exam
Programming assignments in Java

Teaching Staff

Instructor: Prof. Weining Zhang
Office: SB 4.01.19
Phone: 458-5557
Email: [email protected]
Office hours: MW 5:00-6:00 pm, T 4:00-5:00 pm, and by appointment

Communication

Web page: http://www.cs.utsa.edu/~wzhang/cs5443/home
• Contains everything about the course: syllabus, announcements, assignments, project, lecture notes, etc.
• You should check the course web pages regularly.
Mailing list: [email protected]
• Include your CS email address; you may need to forward emails to your regular email address

Textbooks

Required textbook:
• Database Management Systems, 3rd ed., by Ramakrishnan & Gehrke
Recommended textbooks:
• Principles of Distributed Database Systems, by M. Ozsu & P. Valduriez
• Database Systems: The Complete Book, by Garcia-Molina, Ullman & Widom
• Database System Concepts, 5th ed., by Silberschatz, Korth & Sudarshan

Other Textbooks

Fundamentals of Database Systems, 5th ed., by Elmasri & Navathe
Other database books in the Main Library

Prerequisite

CS3743 or equivalent, or extensive experience with databases & DB applications
Strong Java programming skills
• Data structures, algorithms, OO programming, etc.
Mathematics including logic, sets, algebra, …

Grading

Programming assignments 20%
Homework 20%
Midterm I 25%
Midterm II 25%
Intangibles 10%

Programming Assignments

Implement several components of a simple DBMS called Minibase (Java version), such as
• Buffer Manager
• Heap File
• Hash-based Index
• Relational operators
• Query processing
Work in groups of 2
Programming in Java, on Linux or Windows; the Eclipse IDE is recommended

Introduction to Database Systems

A database system consists of
• Database management system: the software
• Databases: the data
A DBMS needs to provide
• persistent data storage
• declarative query language for efficient data retrieval
• shared access to data by different applications
• data security
• data integrity
• …

An RDBMS Architecture

[Figure: RDBMS architecture. Web forms, application front ends, and the SQL interface send SQL commands to the DBMS. Components inside the DBMS: query evaluation engine (parser, optimizer, plan executor, operator evaluator); files & access methods; buffer manager & disk manager; concurrency control (transaction manager, lock manager); recovery manager. Stored on disk: system catalog, index files, data files.]

Storage Management

Data is stored on disk and processed in main memory
Since disk I/Os are costly, search structures, such as indexes, must be used to achieve efficient data access
DBMS components that manage different types of storage include
• Disk Manager: manages pages on the disk drive
• Buffer Manager: manages pages in the main-memory buffer
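
The split between the two managers can be illustrated with a minimal sketch. Everything here is hypothetical and is not the Minibase API: the class names `SimpleDiskManager` and `SimpleBufferPool`, the in-memory map standing in for a disk, and the choice of LRU as the replacement policy are all assumptions made for illustration. Pinning and dirty pages are left out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a disk manager: pages live in a map,
// and every read is counted as one disk I/O.
class SimpleDiskManager {
    private final Map<Integer, byte[]> pages = new HashMap<>();
    private int reads = 0;

    void writePage(int pageId, byte[] data) {
        pages.put(pageId, data.clone());
    }

    byte[] readPage(int pageId) {
        byte[] data = pages.get(pageId);
        if (data == null) throw new IllegalArgumentException("no such page: " + pageId);
        reads++;
        return data.clone();
    }

    int readCount() {
        return reads;
    }
}

// Hypothetical buffer pool: holds a fixed number of pages in memory
// and evicts the least recently used page when it is full.
class SimpleBufferPool {
    private final SimpleDiskManager disk;
    private final LinkedHashMap<Integer, byte[]> frames;

    SimpleBufferPool(SimpleDiskManager disk, int capacity) {
        this.disk = disk;
        // accessOrder = true orders entries from least to most recently used.
        this.frames = new LinkedHashMap<Integer, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] getPage(int pageId) {
        byte[] page = frames.get(pageId);   // a hit also refreshes recency
        if (page == null) {
            page = disk.readPage(pageId);   // a miss costs one disk read
            frames.put(pageId, page);
        }
        return page;
    }

    public static void main(String[] args) {
        SimpleDiskManager disk = new SimpleDiskManager();
        for (int id = 0; id < 3; id++) disk.writePage(id, new byte[] {(byte) id});
        SimpleBufferPool pool = new SimpleBufferPool(disk, 2);
        int[] requests = {0, 1, 0, 2, 0};
        for (int id : requests) pool.getPage(id);
        System.out.println("page requests: " + requests.length + ", disk reads: " + disk.readCount());
    }
}
```

In the `main` method, five page requests against a two-frame pool cost three disk reads, because the repeated requests for page 0 are served from memory.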

File Organization

Data records are logically organized in files and physically stored on disk pages
File organization must consider the format and size of data records
In addition to simple files of raw data, a DBMS also maintains search structures, such as
• Ordering
• Hashing
• Indexing
to reduce access costs
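
As a rough illustration of how a search structure reduces access cost, the sketch below compares a page-by-page scan with a hash lookup. The class `TinyHeapFile` and its methods are hypothetical; the sketch assumes unique keys, keeps the hash index in memory, and counts each page touched as one simulated I/O.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical heap file: records are appended to fixed-capacity pages in arrival order.
class TinyHeapFile {
    private final int recordsPerPage;
    private final List<List<int[]>> pages = new ArrayList<>();        // each record is {key, value}
    private final Map<Integer, Integer> hashIndex = new HashMap<>();  // key -> page number
    int pagesRead = 0;                                                // simulated I/O counter

    TinyHeapFile(int recordsPerPage) {
        this.recordsPerPage = recordsPerPage;
    }

    void insert(int key, int value) {
        if (pages.isEmpty() || pages.get(pages.size() - 1).size() == recordsPerPage) {
            pages.add(new ArrayList<>());
        }
        pages.get(pages.size() - 1).add(new int[] {key, value});
        hashIndex.put(key, pages.size() - 1);
    }

    // Without a search structure: read pages in order until the key is found.
    Integer findByScan(int key) {
        for (List<int[]> page : pages) {
            pagesRead++;
            for (int[] rec : page) {
                if (rec[0] == key) return rec[1];
            }
        }
        return null;
    }

    // With the hash index: go straight to the one page that holds the key.
    Integer findByIndex(int key) {
        Integer pageNo = hashIndex.get(key);
        if (pageNo == null) return null;
        pagesRead++;
        for (int[] rec : pages.get(pageNo)) {
            if (rec[0] == key) return rec[1];
        }
        return null;
    }

    public static void main(String[] args) {
        TinyHeapFile file = new TinyHeapFile(10);
        for (int k = 0; k < 100; k++) file.insert(k, k * 10);
        file.findByScan(95);
        System.out.println("scan read " + file.pagesRead + " pages");
        file.pagesRead = 0;
        file.findByIndex(95);
        System.out.println("index read " + file.pagesRead + " pages");
    }
}
```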

Query Processing

A DBMS evaluates declarative queries by executing an optimal query plan that is expressed using relational algebraic operations
A DBMS must evaluate algebraic operations efficiently
The algorithms and the costs of relational algebraic operations, such as selection and join, depend critically on
• types of query conditions
• specifics of file organizations
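
To see why the type of condition matters, consider two ways of evaluating the same equi-join. This is a minimal in-memory sketch, not how Minibase implements joins: the class `JoinSketch`, the tuple layout (`int[]` with the join key in position 0), and the output layout are assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class JoinSketch {
    // Nested-loops join: the test inside the loop could be any condition,
    // but every pair of tuples is compared.
    static List<int[]> nestedLoopsJoin(List<int[]> r, List<int[]> s) {
        List<int[]> out = new ArrayList<>();
        for (int[] a : r) {
            for (int[] b : s) {
                if (a[0] == b[0]) out.add(new int[] {a[0], a[1], b[1]});
            }
        }
        return out;
    }

    // Hash join: only applicable to equality conditions.
    // Builds a hash table on r, then probes it once per tuple of s.
    static List<int[]> hashJoin(List<int[]> r, List<int[]> s) {
        Map<Integer, List<int[]>> table = new HashMap<>();
        for (int[] a : r) {
            table.computeIfAbsent(a[0], k -> new ArrayList<>()).add(a);
        }
        List<int[]> out = new ArrayList<>();
        for (int[] b : s) {
            for (int[] a : table.getOrDefault(b[0], Collections.<int[]>emptyList())) {
                out.add(new int[] {a[0], a[1], b[1]});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> r = Arrays.asList(new int[] {1, 10}, new int[] {2, 20});
        List<int[]> s = Arrays.asList(new int[] {2, 200}, new int[] {3, 300});
        for (int[] t : hashJoin(r, s)) System.out.println(Arrays.toString(t));
    }
}
```

Both methods return the same tuples for an equality condition; the hash join avoids comparing every pair, but it could not be used for a condition such as `a[0] < b[0]`.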

Query Optimization

For ease of use, query languages are declarative. The system must figure out an efficient evaluation plan
The goal is to answer a query with as few disk I/Os as possible
The system uses statistics about the data & heuristics to decide how to process the query
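
A toy version of this decision is sketched below. The cost formulas are deliberately crude assumptions (a full scan reads every page; an unclustered index costs a fixed traversal charge plus up to one page read per matching record), and the names `PlanChooser`, `scanCost`, `unclusteredIndexCost`, and `choosePlan` are hypothetical. A real optimizer considers many more plans and statistics.

```java
class PlanChooser {
    // Full file scan: reads every data page once.
    static long scanCost(long numPages) {
        return numPages;
    }

    // Unclustered index (assumed model): a fixed cost to traverse the index,
    // then up to one page read per matching record.
    static long unclusteredIndexCost(long numRecords, double selectivity, long indexTraversalCost) {
        return indexTraversalCost + (long) Math.ceil(numRecords * selectivity);
    }

    // Pick whichever plan has the lower estimated number of disk I/Os.
    static String choosePlan(long numPages, long numRecords, double selectivity, long indexTraversalCost) {
        long scan = scanCost(numPages);
        long index = unclusteredIndexCost(numRecords, selectivity, indexTraversalCost);
        return index < scan ? "index" : "scan";
    }

    public static void main(String[] args) {
        // Statistics (assumed): 1,000 pages holding 100,000 records.
        System.out.println("selectivity 0.1%: " + choosePlan(1000, 100_000, 0.001, 3));
        System.out.println("selectivity 10%:  " + choosePlan(1000, 100_000, 0.10, 3));
    }
}
```

Under this model the index wins for a highly selective condition, while the plain scan wins once a large fraction of the records qualifies.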

Transaction Processing

A transaction models the execution of a database application, which typically updates the data in databases.
Transaction management must deal with concurrent transactions and possible system failures.

Recovery

The recovery manager protects data integrity in case of a system crash.
The system guarantees that either all operations of a transaction are performed or none of them are, and that updates made by completed transactions are persistent.
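
The all-or-nothing guarantee can be sketched with an undo log that saves each old value before it is overwritten. This is an in-memory illustration only: the class `UndoLogStore` and its methods are hypothetical, it handles one transaction at a time, and it does not model persistence, which in a real recovery manager requires a log kept on stable storage.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical store that remembers old values so an unfinished transaction can be undone.
class UndoLogStore {
    private final Map<String, Integer> data = new HashMap<>();
    private final Deque<Object[]> undoLog = new ArrayDeque<>();  // entries are {key, oldValueOrNull}

    Integer get(String key) {
        return data.get(key);
    }

    // Record the old value first, then apply the update.
    void write(String key, int value) {
        undoLog.push(new Object[] {key, data.get(key)});
        data.put(key, value);
    }

    // Commit: the updates stay, so the undo information is discarded.
    void commit() {
        undoLog.clear();
    }

    // Abort: restore the old values in reverse order of the writes.
    void abort() {
        while (!undoLog.isEmpty()) {
            Object[] entry = undoLog.pop();
            String key = (String) entry[0];
            if (entry[1] == null) {
                data.remove(key);
            } else {
                data.put(key, (Integer) entry[1]);
            }
        }
    }

    public static void main(String[] args) {
        UndoLogStore store = new UndoLogStore();
        store.write("A", 100);
        store.commit();
        store.write("A", 50);   // transfer 50 from A ...
        store.write("B", 50);   // ... to B, then fail before commit
        store.abort();
        System.out.println("A=" + store.get("A") + ", B=" + store.get("B"));
    }
}
```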

Concurrency Control

Concurrent execution of application programs is essential for good DBMS performance.
• Need to keep the CPU busy while performing I/O operations (frequent & relatively slow).
Interleaving actions of different user programs can lead to inconsistency: e.g., a check is cleared while the account balance is being computed.
The concurrency control subsystem ensures such problems don’t arise: users can pretend they are using a single-user system.
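
The check-clearing example can be sketched with a lock that makes each read-modify-write step indivisible. This is only an analogy using a Java lock on a single object; the class `LockedAccount` and its methods are hypothetical, and a DBMS lock manager works on database objects and whole transactions rather than on one method call.

```java
import java.util.concurrent.locks.ReentrantLock;

class LockedAccount {
    private final ReentrantLock lock = new ReentrantLock();
    private long balance;

    LockedAccount(long initial) {
        balance = initial;
    }

    // The balance test and the update happen under one lock,
    // so two checks can never both see the same old balance.
    boolean clearCheck(long amount) {
        lock.lock();
        try {
            if (balance < amount) return false;
            balance -= amount;
            return true;
        } finally {
            lock.unlock();
        }
    }

    long balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedAccount account = new LockedAccount(100);
        Runnable clearOneCheck = () -> account.clearCheck(60);
        Thread t1 = new Thread(clearOneCheck);
        Thread t2 = new Thread(clearOneCheck);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("final balance: " + account.balance());
    }
}
```

Whichever thread acquires the lock first clears its check; the other then sees a balance of 40 and is refused, so the account is never overdrawn.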

Advanced Hashing & Indexing

Relational DBMSs support hashing & B+ tree indexing
New DBMSs & DB applications need more sophisticated search structures
• Hashing with variable-size hash tables or multiple keys
• Indexes for spatial, multidimensional data (common in multimedia DBSs, data warehousing, OLAP, …)

Distributed DBMS

Modern corporations have data, control, & applications distributed globally
Multiple databases at geographically dispersed locations need to cooperate to answer queries over distributed data
Concurrent transaction processing and recovery are still major issues

Parallel DBMS

Both centralized and distributed databases may use multiple processors to evaluate queries
Parallel system architecture requires new algorithms for query evaluation and optimization
Performance concerns include
• Ability to scale up
• Ability to speed up

XML & Semistructured DBMS

Data in RDB, OODB, & ORDB are structured (with rigid schemas)
Data on the Web (and in other applications) are semistructured
• HTML, XML, text, …
Need new concepts and techniques
• Data model, query language
• Query processing & optimization
• Storage management
• Update, transaction processing, CC, …

Data Warehousing & OLAP

Corporations need to put all available data to use when making vital business decisions
Need technology to integrate data from all sources and keep it up to date
Need advanced tools to analyze, summarize, and view data in various ways
Issues:
• Data cube model
• OLAP operations
• Query processing, indexing, views, …
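
As a small illustration of viewing the same data in different ways, the sketch below applies two commonly described OLAP operations, roll-up and slice, to a two-dimensional set of sales facts. The slide does not name specific operations, so the choice of these two is an assumption, as are the class `CubeSketch`, the `Sale` record layout, and the sample data.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CubeSketch {
    // A fact: two dimension values (region, product) and one measure (amount).
    static final class Sale {
        final String region;
        final String product;
        final int amount;

        Sale(String region, String product, int amount) {
            this.region = region;
            this.product = product;
            this.amount = amount;
        }
    }

    // Roll-up: sum the measure per region, aggregating away the product dimension.
    static Map<String, Integer> rollUpByRegion(List<Sale> sales) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Sale s : sales) totals.merge(s.region, s.amount, Integer::sum);
        return totals;
    }

    // Slice: fix one product and report the measure per region for that product only.
    static Map<String, Integer> sliceByProduct(List<Sale> sales, String product) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Sale s : sales) {
            if (s.product.equals(product)) totals.merge(s.region, s.amount, Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
            new Sale("East", "pen", 10),
            new Sale("East", "ink", 5),
            new Sale("West", "pen", 7));
        System.out.println("by region: " + rollUpByRegion(sales));
        System.out.println("pens only: " + sliceByProduct(sales, "pen"));
    }
}
```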

Data Mining

Data contains important patterns useful for making sound business decisions
Databases need tools to discover knowledge embedded in data
• Associations
• Clusters
• Classifications
Useful for business trend analysis, fraud detection, diagnosis, market prediction, …
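
For associations, the usual measures are support and confidence. The slide does not define them, so the definitions in the comments below are the standard ones supplied as an assumption; the class `AssociationSketch` and the sample baskets are hypothetical. The sketch only scores one given rule and does not search for rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AssociationSketch {
    // Support: fraction of transactions that contain every item in the itemset.
    static double support(List<Set<String>> transactions, Set<String> items) {
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // Confidence of the rule lhs -> rhs: support(lhs and rhs together) / support(lhs).
    static double confidence(List<Set<String>> transactions, Set<String> lhs, Set<String> rhs) {
        Set<String> both = new HashSet<>(lhs);
        both.addAll(rhs);
        return support(transactions, both) / support(transactions, lhs);
    }

    static Set<String> basket(String... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    public static void main(String[] args) {
        List<Set<String>> transactions = Arrays.asList(
            basket("bread", "milk"),
            basket("bread", "butter"),
            basket("bread", "milk", "butter"),
            basket("milk"));
        System.out.println("support(bread) = " + support(transactions, basket("bread")));
        System.out.println("confidence(bread -> milk) = "
            + confidence(transactions, basket("bread"), basket("milk")));
    }
}
```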

Topics

Relational algebra and calculus
Storage & File Management
• Disk manager, buffer manager, indexing, hashing
Query Evaluation & Optimization
• Access methods, selection, joins, etc.
• Query optimization methods
Transaction Processing
• Crash Recovery
• Concurrency Control

Topics (cont.)

Distributed Database Systems
• Database design
• Query processing & optimization
• Concurrency control & recovery
Parallel Database Systems
XML databases
Data Warehousing and OLAP
Data Mining, …