32
1 Lec1 - Introduction CSCI-4380 Database Systems

Lec1 - Introduction

Embed Size (px)

DESCRIPTION

Lec1 - Introduction. CSCI-4380 Database Systems. Content of CSCI4380. Course Wiki Page. Why Database Systems?. Massive amounts of data being collected in different business and scientific disciplines Banking, Airlines, Human Resources, Finance, Health - PowerPoint PPT Presentation

Citation preview

Page 1: Lec1 - Introduction

1

Lec1 - Introduction

CSCI-4380 Database Systems

Page 2: Lec1 - Introduction

2

Content of CSCI4380

Course Wiki Page

Page 3: Lec1 - Introduction

Why Database Systems? Massive amounts of data being collected in different

business and scientific disciplines Banking, Airlines, Human Resources, Finance, Health WWW, Web Search, Online Stores (Amazon, eBay, etc) Biology, Chemistry, Materials science, Astronomy, Ecology,

Geology, Economics, and many more Provide for a systematic way to address the challenges

across/at the intersection of the diverse fields Enable Data-science & informatics: web science

and bio-, chem-, eco-, geo-, astro-, materials- informatics

Page 4: Lec1 - Introduction

Economics & Finance

Page 5: Lec1 - Introduction

World Wide Web

Page 6: Lec1 - Introduction

Bioinformatics Datasets:

Genomes Protein

structure DNA/Protein

arrays Interaction

Networks Pathways Metagenomics

Integrative Science Systems Biology Network Biology

Page 7: Lec1 - Introduction

Astro-Informatics: US National Virtual

Observatory (NVO) New Astronomy Local vs.

Distant Universe

Rare/exotic objects

Census of active galactic nuclei

Search extra-solar planets

Turn anyone into an astronomer

Page 8: Lec1 - Introduction

Ecological Informatics

Analyze complex ecological data from a highly-distributed set of field stations, laboratories, research sites, and individual researchers

Page 9: Lec1 - Introduction

Geo-Informatics

Page 10: Lec1 - Introduction

Cheminformatics

N

N

Cl

O

AAACCTCATAGGAAGCATACCAGGAATTACATCA…

Structural Descriptors

Physiochemical Descriptors

Topological Descriptors

Geometrical Descriptors

Page 11: Lec1 - Introduction

Materials Informatics

Page 12: Lec1 - Introduction

Massive and distributed datasets: tera-/peta-scale

Various modalities: Tables Images Video Audio Text, hyper-text, “semantic” text Networks Spreadsheets Multi-lingual

Database Challenges

Page 13: Lec1 - Introduction

Bigger Picture: New Science of Data

New data models: dynamic, streaming, etc.

New mining, learning, and statistical algorithms that offer timely and reliable inference and information extraction: online, approximate

Self-aware, intelligent continuous data monitoring and management

Data and model compression Data provenance Data security and privacy Data sensation: visual, aural, tactile Knowledge validation: domain experts

Page 14: Lec1 - Introduction

DATABASE MANAGEMENT SYSTEMS (DBMS)

14

Page 15: Lec1 - Introduction

File Processing

Data are stored in files with interface between programs and files.

Various access methods exist (e.g., sequential, indexed, random)

One file corresponds to one or several programs.

PROGRAM 1

DataManagement FILE 1

FILE 2 Red

unda

nt D

ata

PROGRAM 2

DataManagement

PROGRAM 3

DataManagement

FileSystemServices

Page 16: Lec1 - Introduction

Problems With File Systems

Data are highly redundant sharing limited and at the file level

Data is unstructured “flat” files

High maintenance costs data dependence; accessing data is difficult ensuring data consistency and controlling

access to data Sharing granularity is very coarse Difficulties in developing new applications

Page 17: Lec1 - Introduction

USER 1

PROGRAM 1

PROGRAM 2

IntegratedDatabase

DBMS

Query Processor

Transaction Mgr

Database Approach

Page 18: Lec1 - Introduction

What is a Database?

A database (DB) is a structured collection of data about the entities that exist in the environment that is being modeled.

The structure of the database is determined by the abstract data model that is used.

A database management system (DBMS) is the generalized tool that facilitates the management of and access to the database.

Page 19: Lec1 - Introduction

Typical DBMS ArchitectureNaïve users Application

programmersCasual users Database

administrator

FormsApplicationFront ends DML Interface CLI DDL

Indexes SystemCatalog

Datafiles

DDLCompiler

Disk Space Manager

Buffer Manager

File & Access Methods

Query Evaluation Engine

SQL Commands

Rec

ove

ry M

anag

er

Transaction&

LockManager

DBMS

Page 20: Lec1 - Introduction

Data Model Formalism that defines what the

structure of the data are within a file between files

File systems can at best specify data organization within one file

Alternatives for business/scientific data Hierarchical; Network Relational; Object-oriented Semi-structured

This structure is usually called the schema A schema can have many instances

Page 21: Lec1 - Introduction

Conceptual Data Model

Conceptual models describe what data is to be stored and how.

What are the rules that define integrity?

What are the common processes?

What actions should be taken if integrity is compromised?

Who is allowed to access what information?

Page 22: Lec1 - Introduction

Physical Data Model

Physical models describe how data is stored on disk and what access methods are available to it

Which data type is assigned to which attribute ?

What indices are available

Data independence is the separation of the physical implementation from the conceptual view of the database.

Data independence is enhanced by:

Query optimization

Transaction processing

Page 23: Lec1 - Introduction

Invisibility (transparency) of the details of conceptual organization, storage structure and access strategy to the users Logical

• transparency of the conceptual organization• transparency of logical access strategy

Physical• transparency of the physical storage

organization• transparency of physical access paths

Data Independence

Page 24: Lec1 - Introduction

DBMS Languages

Data Definition Language (DDL) Defines conceptual schema, external schema, and

internal schema, as well as mappings between them Language used for each level may be different The definitions and the generated information is

stored in system catalog

Data Manipulation Language (DML) Can be

• embedded query language in a host language• “stand-alone” query language

Can be• Procedural: specify where and how (navigational)• Declarative: specify what

Page 25: Lec1 - Introduction

Database Functionality Integrated schema

Users have uniform view of data They see things only as relations (tables)

in the relational model Declarative integrity and consistency

enforcement 24000 Salary 250000 No employee can have a salary greater

than his/her manager. User specifies and system enforces.

Individualized views Restrictions to certain relations Reorganization of relations for certain

users

Page 26: Lec1 - Introduction

Database Functionality (cont’d)

Declarative access (relational model) Query language - SQL

• Find the names of all electrical engineers.SELECT ENAMEFROM EMPWHERE TITLE = “Elect. Eng.”

• Find the names of all employees who have worked on a project as a manager for more than 12 months.SELECT EMP.ENAMEFROM EMP,WORKSWHERE RESP = “Manager”AND DUR > 12AND EMP.ENO = WORKS.ENO

System determined execution Query processor & optimizer

Page 27: Lec1 - Introduction

Database Functionality (cont’d)

Transactions Execute user requests as atomic units May contain one query or multiple queries Provide

• Concurrency transparency– Multiple users may access the database, but they each see the

database as their own personal data– Concurrency control

• Failure transparency– Even when system failure occurs, database consistency is not

violated– Logging and recovery

Page 28: Lec1 - Introduction

Database Functionality (cont’d)

ACID Transaction Properties Atomicity

• All-or-nothing property Consistency

• Each transaction is correct and does not violate database consistency

Isolation• Concurrent transactions do not interfere with

each other Durability

• Once the transaction completes its work (commits), its effects are guaranteed to be reflected in the database regardless of what may occur

Page 29: Lec1 - Introduction

Types of DBMS

Relational – data model based on tables

Network – data model based on graphs with records as nodes and relationships between records as edges

Hierarchical – data model based on trees

Semi-structured – XML based on trees/graphs

Object-Oriented – data model based on the object-oriented programming paradigm

Object-Relational – model objects via tables

Distributed – composed of several independent DBMSs running at the nodes of a communications network

Page 30: Lec1 - Introduction

Brief History of Databases 1960s:

Early 1960s: Charles Bachmann developed first DBMS at Honeywell (IDS)• Network model where data relationships are

represented as a graph. Late 1960s: First commercially successful

DBMS developed at IBM (IMS)• Hierarchical model where data relationships are

represented as a tree• Still in use today (SABRE reservations;

Travelocity) Late 1960s: Conference On DAta Systems

Languages (CODASYL) model defined. This is the network model, but more standardized.

Page 31: Lec1 - Introduction

Brief History of Databases 1970s:

1970: Ted Codd defined the relational data model at IBM San Jose Laboratory (now IBM Almaden)

Two major projects start (both were operational in late 1970s)• INGRES at University of California,

Berkeley– Became commercial INGRES, followed-up by

POSTGRES which was incorporated into Informix

• System R at IBM San Jose Laboratory– Became DB2

1976: Peter Chen defined the Entity-Relationship (ER) model

Page 32: Lec1 - Introduction

Brief History of Databases 1980s

Maturation of relational database technology SQL standardization (mid-to-late 1980s)

through ISO The real growth period

1990s Continued expansion of relational technology

and improvement of performance Distribution becomes a reality New data models: object-oriented, deductive Late 1990s: incorporation of object-orientation

in relational DBMSs Object-Relational DBMS New application areas: Data warehousing and

OLAP, Web and Internet, interest in text and multimedia