Upload
others
View
9
Download
0
Embed Size (px)
Citation preview
Database Systems IFoundations of Databases
Summer term 2010
Melanie [email protected]
Database Systems Group, University of Tübingen
1
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Chapter 1Introduction
1. Introduction2. ER-Modeling3. Relational model(ing)4. Relational algebra5. SQL6. Programming
• After completing this chapter, you should be able to:
‣ Explain basic notions: database state, schema, query, update, data model, DDL, DML,
‣ Explain the role of the DBMS,
‣ Explain data independence, declarativity, and the three-schema architecture,
‣ Name different classes of users of a database application system,
‣ Name some DBMS tools.
2
Foundations of Databases | Summer term 2010 Melanie Herschel | University of Tübingen
Chapter 1Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
3
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Task of a DatabaseQuery Answering
• What is a database? Difficult question. There is no precise and generally accepted definition.
• Naïve approach: The main task of a database system (DBS) is to answer certain questions (queries) about a subset of the real world, e.g.,
Questioning a DBS
Which homeworks has Ann Smith completed?
12Database System
4
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Task of a DatabaseData Storage
• The DBS acts only as a storage for information. The information must first be entered and kept current.
• In a sense, a DBS is a computerized version of a card-index box / filing cabinet (but more powerful and efficient).
Keeping a DBS current
Ann Smith has done Homework 3 and
received 10 points for it.OKDatabase System
5
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Task of a DatabaseEfficient Data Retrieval and Combination
• Normal database systems do not perform particularly complicated computations on the stored data in order to answer questions.
• However, a DBS can retrieve the requested data quickly from a huge set of data (giga bytes, tera bytes, ≫ main memory).
• A DBS can also aggregate / combine several pieces of stored data to answer more complex questions (“Compute the average points for Homework 3”).
6
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Task of a DatabaseSQL
• Above, the question “Which homework has Ann Smith completed?” was shown in natural (English) language.
• Making machines understand natural language is a tough task (and bears a large potential for misunderstanding).
• Therefore, questions (or queries) are normally written in a formal language, these days typically SQL.
SQL
• SQL: Structured Query Language, developement started in 1986, current version SQL:2003.
• Pronounced S-Q-L or Sequel.
7
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
State, Query, Update
• The set of stored data is called the database state.
• Entering, modifying, or deleting information (i.e., making updates) changes the database state.
State & query
Update
currentstate
SQL queryAnswer
SELECT Homework FROM SolvedWHERE Student = ‘Ann Smith’
currentstate
Update
INSERT INTO SolvedVALUES (‘Ann Smith’, 3, 10)
newstate
8
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Structured Information
• Each database can store only information of a predeclared structure (a limited domain of disclosure):
• Because the data are structured, not simply text, complex query formulations are possible, e.g., “How many homeworks has each student done?”.
Update
Today’s special in the cafeteria is pizza. ErrorHomework DBS
9
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Structured Information
• Actually, a database system stores only plain data (character, strings, numbers), and not information.
• Data becomes information by interpretation.
• Therefore, real-world concepts like students, homework, cafeterias, etc., need to be defined/declared before the database can be used.
A pure text database?
Which types of questions could we pose on a DBS storing text (character strings) only with no further structure provided?
10
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
State vs. Schema
Database schema
• Formal definition of the structure of the database contents.
• Determines the possible database states.
• Defined only once (when DB is created).
• In a programming language, this corresponds to variable declarations (assigning a type to a variable)
Variable declaration
Example of a variable declaration in C: short int i;Possible states of variable i: -32768 ! i ! 32767
11
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
State vs. Schema
Database state (instance of the schema)
• Contains the actual data, structured according to the schema.
• Changes often (whenever database information is updated).
• Corresponds to current contents/values of a programming language variable.
Variable state change
In state s, variable i has value 41. Now, perform state change (s to s’) via assignment i = i + 1.
12
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
State vs. Schema
•In the relational model, the data is structured in form of tables (relations).
•Each table has a name, a sequence of named columns (attributes), and a set of rows (tuples).
A table
Student Homework Points
Ann Smith 1 10
Ann Smith 2 8
Michael Jones 1 9
Michael Jones 2 9
SOLVEDDB schema
DB state
13
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data Model
•Defines a formal language (syntax and semantics) for
‣Declaring database schema
‣Querying the current database state
‣Changing the database state
‣ “Data model” is, regrettably, widely used for “database schema”
•Examples
• (Network model, hierarchical model,) relational model, entity relationship model, object-oriented models, UML, XML.
14
Foundations of Databases | Summer term 2010 Melanie Herschel | University of Tübingen
Chapter 1Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
15
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DBMS
•A Database Management System (DBMS) is an application–independent software system that implements a data model, i.e., allows for
• Definition of a DB schema for some concrete application,
• Storage of an instance of this schema on, e.g., a disk,
• Querying the current instance (database state),
• Changing the database state.
16
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DBMS
• Normal users do not need to use SQL for their daily tasks of data entry or data lookup.
• These users use application programs that have been developed specifically for this task and offer a more accessible user interface.
• Internally, these application programs translate the user requests into SQL statements (queries, updates) in order to communicate with the DBMS.
17
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DBMS
• Often, several different application programs are used to access the same centralized database.
• For example, the Homework DBS might provide:
‣ A read–only web interface for students.
‣ A program used by the TA (Hiwi) to load homework and exam points.
‣ A program that prints a report for the professor used to assign grades.
• The interactive SQL interface (SQL console) that comes with the DBMS is simply yet another way to access the DBMS.
18
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DBMS
Accessing a DBMS
User 1 User 2
Application Program
DBMS Tool(e.g., SQL console)
Database Management System (DBMS)
DBschema
DBstate
19
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DB Application Systems
• Often, different users access the same database concurrently (i.e., at the same time, touching the same data).
• The DBMS is usually implemented as a background server process (or set of such processes) that is accessed over the network by application programs (clients).
• One can also view the DBMS as an extension of the operating system (a more powerful file system).
20
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DB Application Systems
Client-Server Architecture
ClientUser 1
(Application)
ClientUser 2
(SQL console)
...Network
Server(DBMS)
21
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DB Application Systems
Three-Tier Architecture
Server(DBMS)
Application Server(App. Web Server)
Thin client(Browser)
22
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
DB Application SystemsA Recap of Database Vocabulary
• A database (DB) consists of a DB schema and a DB state.
• A database management system (DBMS) is a software system that implements a data model (e.g., a Relational DBMS (RDBMS) implements the relational model).
• A database system (DBS) consists of a DBMS and a database.
• A database application system consists of a DBS and a set of application programs.
23
Foundations of Databases | Summer term 2010 Melanie Herschel | University of Tübingen
Chapter 1Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
24
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Persistent Storage
When the output is a function of the input only, no persistent storage is necessary.
Computing the factorial (n n!)
Today
Tomorrow
5 120factorial
5 120factorial
25
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Persistent Storage
But the output can also be a function of the input and and a persistent state.
Determining how many points Ann obtained for all her homework.
Today
Tomorrow
Ann 20Homework points
30Homework pointsAnn
26
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Persistent Storage
A DBS provides persistent state
Ann 30Homework points
Input Output
Persistent state
Persistent information
• Information that lives longer than a single process (program execution).
• Survives power outage and a reboot of the operating system.
27
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Persistent Storage
Which of the following processes/devices need persistent storage? If so, for which particular task?
1.Web browser
2.Pocket calculator
3.Mobile phone
4.Screen saver
5.DVD recorder
28
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Typed Persistent Data Classical Way to Implement Persistence
•Information needed in subsequent program invocations is saved into a file.
•The operating system (OS) maintains the file on disk.
•Disks provide persistent memory: the contents is not lost if the machine is switched off or the OS is rebooted.
•File systems are predecessors of modern DBMS.
29
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Typed Persistent DataImplementing Persistence with Files
• OS files are usually nothing but sequences of bytes.
• A record structure must be defined on top of this (much like in Assembler languages):
• The record and file structure is contained only in the programmers’ heads.
• The OS file system cannot prevent misinterpretation, overflows, etc., because it is not aware of the file structure.
Typed Persistent Data (1)
37
• Classical way to implement persistence:
� Information needed in subsequent program invocations issaved into a file.
� The operating system (OS) maintains the file on disk.
� Disks provide persistent memory: the contents is not lost ifthe machine is switched off or the OS is rebooted.
OS files and persistence
The above statement is basically true but care should
be taken nevertheless. Why?
� File systems are predecessors of modern DBMS.
Typed Persistent Data (2)
38
• Implementing persistence with files:
� OS files are usually nothing but sequences of bytes.
� A record structure must be defined on top of this (muchlike in Assembler languages):
0 40 42 44
A n n S m i t h . . . 0 3 1 0
� The record and file structure is contained only in theprogrammers’ heads.
�� The OS file system cannot prevent misinterpretation,overflows, etc., because it is not aware of the file structure
30
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Typed Persistent DataImplementing Persistence with a DBMS
• The structure of the information to be stored must be defined in a way the DBMS understands:
• The file structure is formally documented.
• The system can detect type errors in application programs.
• Simplified programming (higher abstraction level).
SQL DDL command
CREATE TABLE SOLVED (STUDENT VARCHAR(40), HOMEWORK NUMERIC(2), POINTS! NUMERIC(2)
)
31
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
A Subprogram Library
• Most DBMSs use OS files to store the data. (Some use raw disk device access.)
• One can view a DBMS as a subprogram library that can be used for file access.
• Compared with the direct OS system calls for file access, the DBMS offers higher level operations.
• The DBMS offers a wide varietry of algorithms that one would otherwise have to program.
32
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
A Subprogram Library
•For instance, a typical Relational DBMS contains routines for
‣ sorting (e.g., external merge sort),
‣ searching (e.g., B-trees),
‣ file space management, buffer management,
‣ aggregation, statistical evaluation.
•The algorithms are optimized for large data sets (that do not fit into main memory).
•The DBMS also offers multi-user support (locking) and safety measures to protect data against system crashes.
33
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data Independence
•The DBMS is a layer of software above the OS files. The files can be accessed only via the DBMS.
•The DBMS may change the file structure internally (reorder records, splits files, etc.) for performance reasons.
•This goes unnoticed by the application program.
•Compare with the idea of abstract data types: The implementation changes, the interface is kept stable.
34
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data IndependenceTypical Example
•At the beginning, a professor used the homeworks DB only for his courses in the current term.
•Since the DB was small and there were relatively few accesses, it was sufficient to store the data as a heap file.
•Later, the entire university used the DB, and information of previous courses had to be kept for some time.
•DB size grows significantly, DB access much more frequently.
•An index file (e.g., a B-tree) is now needed to provide fast access.
35
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data IndependenceWithout DBMS
•Using the new B-tree index to access the file must be explicitly built into the lookup (query) commands.
•Thus, application programs need to be changed if the mode of file access is changed.
•If one forgets to change a seldolmly used application program, and this program does not update the index when the data has been updated, the DB becomes inconsistent.
36
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data IndependenceWith Relational DBMS
•Already at the interface, the system completely hides the(non-)existence of indexes on files.
•Queries and updates do not have to and cannot refer to indexes.
•The system automatically
1.Modifies the index in case of data updates,
2.Uses the index to evaluate queries against the indexed data when advantageous.
37
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data Independence
Conceptual Schema (“interface”):
•Only logical information content of the database, relevant to the subset of the real world modelled in the DB.
•Simplified view of the DB: physical storage details hidden.
Internal/Physical Schema (“implementation”):
•Indexes,
•Division of tables among disks,
•Storage management if tables grow or shrink,
•Placement of new rows in a table (sort order, clustering).
38
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data Independence
•The user enters a query (e.g., in SQL) that only refers to the conceptual schema.
•The DBMS translates this into a query/program (execution plan) which refers to the the internal schema. This is done by the the query optimizer.
•The DBMS executes the translated query on the persistent instance of the internal schema.
•The DBMS translates the result back to the conceptual level.
39
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Data Independence
Changing the internal schema
Conceptual schema
Old internal schema (no B-tree index)
New internal schema (with B-tree index)
Same conceptual schema
40
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Declarative Languages
•Physical data independence requires that the query language (SQL) cannot refer to indexes.
•Declarative query languages go one step further:
‣ Queries should only describe what information is sought,
‣ but should not prescribe any particular method how to compute/retrieve the desired information.
Kowalski
Algorithm = Logic + Control
Imperative/procedural languages: explicit control, implicit logicDeclarative/descriptive languages: implicit control, explicit logic
41
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Declarative Languages
•SQL is a declarative language. The user describes conditions the requested data is required to fulfill:
•Often, simpler formulations of the same query are possible, with SQL users do not have to think about efficient execution.
•More concise than imperative programming: less expensive program development and maintenance.
SQL Query
SELECT S.POINTSFROM SOLVED SWHERE S.STUDENT = ‘ANN SMITH’ AND S.HOMEWORK = 3
42
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Declarative Languages
•Declarative query languages
‣allow powerful optimizers (no evaluation method is prescribed)
‣need powerful optimizers (na "ıve evaluation is almost always too inefficient).
•Independence of current hardware technology and software quality: Today’s queries will use tomorrow’s DBMS setup and algorithms when a new version of the DBMS is released.
43
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Logical Data Independence
•Logical data independence allows for changes to the logical information content of the database.
•Such changes are obviously restricted to additions to the logical information content.
•Example: add column SUBMISSION_DATE to table SOLVED.
•Such additions may be required for new applications.
•It should not be necessary to change old applications only because records now contain additional information.
44
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Logical Data Independence
•Logical data independence is important when there are application programs with distinct, but overlapping information needs.
•Logical data independence also helps to integrate previously distinct databases.
•In earlier times, every department of a company had its own DB/data files.
•Now, businesses generally aim at one central DB.
45
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Logical Data IndependenceRedundancy
•If a company uses more than one DB, the information in these databases will normally overlap, i.e., some pieces of information will be stored several times.
•Data is called redundant if it can be derived from other data and knowledge internal to the application.
•Problems with redundancy:
•Duplicates data entry and update efforts.
•Sooner or later, data copies will get out-of-sync and thus inconsistent.
•Wastes storage space, also on backup media.
46
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Logical Data IndependenceExternal Schemas/Views
•Logical data independence requires a third level of database schemas, the external schemas or views.
•Each user (department, . . . ) may have an individual view of the data.
•An external view contains a subset of the information in the database, maybe slightly restructured. Views may also be vital because of security reasons.
•In contrast, the conceptual schema describes the complete information content of the database.
47
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Three-Schema Architecture
Three-schema architecture [ANSI/Sparc 1978]
User 1 User n
External schema 1 External schema n
Conceptual schema
Internal schema
...
Storeddata
48
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
More DBMS FunctionsTransactions
•Transactions are sequences of DB commands (queries and updates) are executed as an atomic unit (“all or nothing”).
•DBMS may crash during/after a sequence of commands is/has been executed. The DBMS then performs undo /redo.
•Support for backup and recovery.
•Support of concurrent users.
•Each user is given the illusion to be the only DB user at any time. DBMS performs locking and conflict detection.
49
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
More DBMS FunctionsSecurity & Integrity
Security
•Access rights: Who may perform which operations on which table?
•Auditing: DBMS remembers who did what/when.
Integrity
•The DBMS checks that the entered data is plausible/complete (such checks may also span several tables).
•DBMS rejects updates (insertions and deletions) which would violate defined business rules.
50
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
More DBMS FunctionsData Dictionary
Metadata (“data about data”, schema, user list, access rights) is availble in system tables, for instance:
System tables
Table_Name Owner
SOLVED Melanie
SYS_TABLES SYS
SYS_COLUMNS SYS
SYS_TABLESTable_Name Seq Col_Name
SOLVED 1 Student
SOLVED 2 Homework
SOLVED 3 Points
SYS_TABLES 1 Table_Name
SYS_TABLES 2 Owner
SYS_COLUMNS 1 Table_Name
SYS_COLUMNS 2 Seq
SYS_COLUMNS 3 Col_Name
SYS_COLUMNS
51
Foundations of Databases | Summer term 2010 Melanie Herschel | University of Tübingen
Chapter 1Introduction
• Basic Database Notions
• Database Management Systems (DBMS)
• Programmer’s View, Data Independence
• Database Users and Database Tools
52
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Database UsersDatabase Administrator (DBA)
•Should know about all schemas, may change the conceptual and the internal schema (creates tables, creates/drops indexes). Can damage everything.
•Gives access rights to users. Ensures security.
•Monitors system performance (Transaction throughput #TX/s, #concurrent users, index sizes, ...)
•Monitors available disk space and installs new disks.
•Ensures that backup copies of the data are made. Does recovery after disk failures, etc.
53
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Database UsersApplication Programmer
•Writes programs for standard, all-day tasks, to be used by the naïve users (see below):
•safe data entry,
•report generation,
•data browsing.
•Knows SQL well, plus programming languages and development tools.
•Usually supervised by DBA.
•Might do conceptual schema design (knows which table the application will need to access/create).
54
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Database UsersEnd Users
Sophisticated User (one kind of “end user”)
•Knows SQL and/or some query tools, may use SQL console.
•Does non-standard aggregations/evaluations of the data without help from application programmers.
•May generate complex queries.
Naïve User (the other kind of “end user”)
•Uses DB only via application programs, often unaware of existence of DBMS back-end.
•Primarily data entry user, simple browsing-style queries against external views.
55
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Database Tools
•Interactive SQL console
•Graphical/menu-based query tools
•Interface for DB access from standard programing lanugages (C, C++, Java)
•Tools for form-based DB application (4GL)
•Report generators
•Web interface
•Tools for data import/export, backup & recovery, performance monitoring, . . .
56
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Summary
•Functions of database systems
‣Persistence
‣ Integration/redundancy avoidance
‣Physical and logical data independence
‣Subprogram library: many algorithms built-in, especially tuned for external memory access (disks)
‣Query and update evaluation
57
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Summary
•Functions of database systems (continued)
‣High data safety and availability (Backup & Recovery)
‣Combinations of operations into atomic transactions
‣Multi-user support: synchronization of concurrent accesses
‣ Integrity enforcement
‣View management
‣Security via data access control
‣System catalog management (metadata)
58
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Summary
•The main goal of the DBMS is to give the user a simplified view on the persistent storage, i.e., to hide any complications introduced by the DBMS physical layer.
•The user does not worry about
‣physical storage details
‣different information needs of other users
‣efficient query formulation
‣possibility of system crashes/disk failures
‣presence of concurrent users accessing identical data subsets.
59
Foundations of Databases | Summer term 2010 | Melanie Herschel | University of Tübingen
Excercise
Data-intensive programming
•Suppose homework points data is stored in a line-structured text file with the format
Student Name:Homework Number:Points
e.g.,
Ann Smith:3:10
•Suppose you have to write a C program that prints the total number of points per student (students sorted alphabetically).
•How would you judge the programming effort (in terms of lines and time)?
• In SQL, this takes 4 lines and approx. 1 1/2 minutes.
60