Introduction to Relational Databases

La Serena School for Data Science: Applied Tools for Data-driven Sciences
August 2019

Mauro San Martín
Universidad de La Serena



Contents

• Introduction

• Part I. Relational Databases Concepts

• Part II. Relational Databases Hands-On

• Part III. NoSQL


Introduction


LSSDS2019 Introduction to RDBs Introduction

Scientific Data Management

[Diagram: data sources; scientific data management (collect, organize, store, transform, update, and query); custom-made programs and tools; storage as collections of files or databases (SQL, NoSQL).]


A key choice…

Collection of Files
- Single user
- Static data
- Dataset fits in main memory

Database Systems
- SQL
  - Centralized storage (and updates)
  - Complete and consistent answers
- NoSQL
  - Distributed storage (and updates)
  - Partial answers and eventual consistency


Why Databases?

Because it is not trivial to satisfy our information needs under our current computing and storage models and resources.


But…

How is an information need fulfilled?


Two steps

- Locate. We need at least a notion of where each piece of data or information should be.
- Combine and select. We must combine several pieces of information and choose among them.

This might be easy and fast (if we have a system) or VERY time-consuming (if not).


But…

How is an information need stated?


When formulating an information need, what would you prefer?

- Elaborate a detailed retrieval plan in terms of the organization of the storage, e.g. file locations and formats,

or

- Express it in terms of the information entities from the problem domain, e.g. conditions that data elements must satisfy to be part of the answer?
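To make the two options concrete, the sketch below answers the same information need twice: first with a hand-written retrieval plan tied to a CSV layout, then as a declarative SQL query. The table name `readings`, its columns, and the sample values are hypothetical, and Python's built-in `sqlite3` module is only a stand-in for a full DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data (sensor, value), for illustration only.
raw = "sensor,value\na,3.4\nb,4.0\nc,2.1\n"

# Option 1: a retrieval plan tied to the storage format (parse, scan, filter).
plan_result = []
for row in csv.DictReader(io.StringIO(raw)):
    if float(row["value"]) > 3.0:
        plan_result.append(row["sensor"])

# Option 2: state the condition the answer must satisfy; the DBMS plans the retrieval.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("a", 3.4), ("b", 4.0), ("c", 2.1)])
sql_result = [r[0] for r in conn.execute(
    "SELECT sensor FROM readings WHERE value > 3.0 ORDER BY sensor")]

print(plan_result, sql_result)
```

Both variants return the same sensors; only the second one is independent of where and how the data is stored.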


A relevant technical remark

Not all memory is made equal

The memory in a computer system is organized as a hierarchy (this constrains our storage and retrieval models).

[Diagram: memory hierarchy. CPU, then levels 1, 2, …, n: cache, main memory, secondary memory, … Cache and main memory are non-persistent; secondary memory is persistent. Speed and cost (size) run from fast and expensive near the CPU to slow and cheap at the far end.]


Databases

Requirements
- To query and keep updated a shared and meaningful collection of data,
- which is too big to fit in main memory and requires persistence.

Definitions
- Database: an organized and self-describing collection of data, with an intended meaning, and maintained with a purpose.
- Database Management System (DBMS): software system designed and implemented to define, maintain


There are several types of DBMS

Each one addresses different use cases: types of data and information needs.
- Relational / SQL
- Graph
- NoSQL and NewSQL (column stores, key-value stores, hstore, etc.)

Interesting example:
- Qserv (LSST)


Part I. Relational Databases

Concepts (an extremely brief introduction)


RDBs at a glance

• E. F. Codd, 1970: "A Relational Model of Data for Large Shared Data Banks"

• Main characteristics
  - One simple data structure: relation (table)
  - Solid mathematical foundations
  - Several comprehensive implementations available (PostgreSQL, MySQL, Oracle, SQL Server, etc.)

• Industry standard since the 1980s


Relational Data Model

The relational data model
- Data structure: relations/tables, i.e. collections of tuples
- Operations (update + query): Structured Query Language (SQL), based on relational algebra and calculus
- Integrity constraints: data type, not null, referential integrity

[Diagram labels: Modeling Data, Update, Query]
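As a small illustration of integrity constraints, the sketch below declares a NOT NULL constraint and a referential integrity (foreign key) constraint and shows the DBMS rejecting updates that violate them. The tables `device` and `reading` are hypothetical, and SQLite (through Python's `sqlite3` module) is used only because it needs no server; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE device (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""CREATE TABLE reading (
    device_id INTEGER NOT NULL REFERENCES device(id),
    value REAL NOT NULL)""")
conn.execute("INSERT INTO device VALUES (1, 'sensor-1')")
conn.execute("INSERT INTO reading VALUES (1, 3.4)")  # valid: device 1 exists

rejected = []
for label, stmt in [
    ("not null", "INSERT INTO device VALUES (2, NULL)"),
    ("referential integrity", "INSERT INTO reading VALUES (99, 1.0)"),
]:
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError:
        rejected.append(label)  # the DBMS refused the inconsistent update

print(rejected)
```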


What is an RDBMS?

A Relational DataBase Management System is the software that implements a relational database.

- The user issues a query and receives an answer. The focus is on application logic, not data management.
- The RDBMS keeps data secure and consistent.

[Diagram: information need, SQL query, query result (table). RDBMS components shown: query compiler, query optimizer, runtime query evaluator, concurrency control, DB backup, DB recovery, schema (metadata), stored database (instance).]


RDBMS Comfort Zone

Relational Databases (SQL)

An RDBMS performs better when …

• Data is complete, homogeneous and well defined.

• All data is together (in the same database).

• Answers must be complete and fully consistent.

• Vertical scaling is possible.


RDBMS Objects

• Tables represent data: a collection of records.
  Record: set of attributes (columns) that represents a fact in the real world.

• Views: named queries

• Indices: improve search and access time

• Functions: extend query language

Example table:

ObjectID  A    B
ID1       3.4  a
ID2       4.0  b
ID2       2.1  c
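The four kinds of objects can be tried out together in a few lines. The following is a minimal sketch using Python's `sqlite3` module; the table `obj`, the view `large_a`, the index `idx_obj_a`, the function `twice`, and the rows are all hypothetical names and values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Table: a collection of records (hypothetical rows).
conn.execute("CREATE TABLE obj (object_id TEXT PRIMARY KEY, a REAL, b TEXT)")
conn.executemany("INSERT INTO obj VALUES (?, ?, ?)",
                 [("ID1", 3.4, "a"), ("ID2", 4.0, "b"), ("ID3", 2.1, "c")])

# View: a named query that can be used like a table.
conn.execute("CREATE VIEW large_a AS SELECT object_id, a FROM obj WHERE a > 3.0")

# Index: lets the DBMS search on column a without scanning the whole table.
conn.execute("CREATE INDEX idx_obj_a ON obj (a)")

# Function: extends the query language with user-defined logic.
conn.create_function("twice", 1, lambda x: 2 * x)

rows = conn.execute(
    "SELECT object_id, twice(a) FROM large_a ORDER BY object_id").fetchall()
print(rows)
```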


Building a DB

• Design a schema
  Tables (columns, types, and keys), integrity constraints, and other objects. Avoid data duplication, null values, and update anomalies.

  - SQL as Data Definition Language
    create table myTable(number int, letter char)
    drop table myTable

• Load data into the DB:
  - Bulk loading from SQL dumps, CSV files, etc.
  - Insert individual records (SQL)


Querying the DB

Map data from the DB to the information needed.

[Diagram: query evaluation, with stages Database, Data Collection, Intermediate Result, Sort / Group / Aggregate, and Result, annotated with costs: reading data, storing temporary data, sorting data.]


SQL: Querying the DB

• Basic query structure
  SELECT: definition of the output table
  FROM: identification of source tables
  WHERE: optional condition (filter or join)

• Additional blocks
  GROUP BY: group-defining criteria
  HAVING: optional condition on aggregate values
  ORDER BY: sorting criteria for the result
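A single query can use all six blocks. The sketch below runs one against a hypothetical table `obs(obj, band, mag)` with made-up rows, using Python's `sqlite3` module as a stand-in DBMS; the comments inside the SQL text map each clause to its role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obj TEXT, band TEXT, mag REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?, ?)", [
    ("s1", "g", 18.0), ("s1", "r", 17.0),
    ("s2", "g", 20.0), ("s2", "r", 19.0),
    ("s3", "g", 22.0),
])

query = """
SELECT obj, COUNT(*) AS n, AVG(mag) AS mean_mag   -- output table
FROM obs                                          -- source table
WHERE mag < 21.0                                  -- condition on rows
GROUP BY obj                                      -- one group per object
HAVING COUNT(*) >= 2                              -- condition on aggregate values
ORDER BY mean_mag                                 -- sorting criteria
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Here WHERE discards the row of `s3` before grouping, while HAVING would discard any group left with fewer than two rows.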


Query Evaluation

Note that query results are also tables ⇒ query composition
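Because a query result is itself a table, it can appear wherever a table is expected, for instance in the FROM clause of another query. The following sketch assumes a hypothetical table `obs(obj, mag)` with made-up rows and uses Python's `sqlite3` module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (obj TEXT, mag REAL)")
conn.executemany("INSERT INTO obs VALUES (?, ?)",
                 [("s1", 18.0), ("s1", 17.0), ("s2", 20.0), ("s2", 19.0)])

# Inner query: one row per object with its mean magnitude.
inner = "SELECT obj, AVG(mag) AS mean_mag FROM obs GROUP BY obj"

# Outer query: treats the inner result as just another table.
outer = f"SELECT obj FROM ({inner}) AS per_obj WHERE mean_mag < 18.0"

bright = conn.execute(outer).fetchall()
print(bright)
```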

[Diagram: the Select, From, and Where blocks of a query mapped onto the query evaluation stages (Database, Data Collection, Intermediate Result, Sort / Group / Aggregate, Result).]


Query Complexity (cost)

• Data volume
  - I/O-based cost model: number of reads from and writes to persistent storage

• Query complexity
  - table size: n, number of tables: k
  - projections and selections (search): O(1) to O(log n) to O(n)
  - joins: O(n) to O(n^k)
  - group and aggregates (sort): O(n log n)
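The range of join costs can be seen in a toy experiment with k = 2 tables. The sketch below is an illustration only: it counts in-memory steps rather than disk reads and writes, and the two functions are simplified textbook strategies, not the algorithms of any particular DBMS.

```python
def nested_loop_join(left, right):
    """Compare every pair of rows: about n * n comparisons for two tables of size n."""
    comparisons = 0
    out = []
    for lk, lv in left:
        for rk, rv in right:
            comparisons += 1
            if lk == rk:
                out.append((lk, lv, rv))
    return out, comparisons


def hash_join(left, right):
    """Build a lookup table on one input, then probe it: about n + n steps."""
    index = {}
    for rk, rv in right:
        index.setdefault(rk, []).append(rv)
    steps = len(right)
    out = []
    for lk, lv in left:
        steps += 1
        for rv in index.get(lk, []):
            out.append((lk, lv, rv))
    return out, steps


n = 300
left = [(i, f"l{i}") for i in range(n)]
right = [(i, f"r{i}") for i in range(n)]
nl_rows, nl_cost = nested_loop_join(left, right)
hj_rows, hj_cost = hash_join(left, right)
print(nl_cost, hj_cost)
```

Both strategies produce the same rows, but the step counts differ by a factor of n / 2.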


Updates

• Update: add and modify data.
  - Warning: updates may render the database inconsistent

• Transactions and ACID
  - Atomicity
  - Consistency
  - Isolation
  - Durability

[Diagram: a transaction groups Operation 1, Operation 2, Operation 3, ..., Operation n.]
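Atomicity and consistency can be observed with a two-operation transaction. In the sketch below, the table `account`, its CHECK constraint, and the amounts are hypothetical; Python's `sqlite3` connection is used as a context manager, which commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE account (
    name TEXT PRIMARY KEY,
    balance INTEGER CHECK (balance >= 0))""")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()


def transfer(amount):
    """Move `amount` from a to b as one transaction: all operations or none."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = 'b'", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = 'a'", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30)       # both updates are applied
failed = transfer(500)  # second update violates the CHECK; the first is undone too
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```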


SQL: Updating the DB

• SQL as Data Manipulation Language
  - Inserting new records in tables
    insert into myTable values(1, 'a')
  - Updating data in existing records
    update myTable set letter = 'b' where number = 1
  - Removing records from tables
    delete from myTable where number = 1


Update Complexity (cost)

• Data volume
  - I/O-based cost model: number of reads from and writes to persistent storage

• Update complexity
  - table size: n
  - search: O(1) to O(log n) to O(n)
  - integrity constraints must be checked; referential integrity constraints may propagate the task across the database


Part II. Relational Databases

Practice


Executing Queries

• Parametric

• SQL
  - System console
  - Applications and web interfaces

• From code
  - Parametric from the programmer's perspective
  - Languages + libraries
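One way to read "parametric from the programmer's perspective" is a fixed SQL text with placeholders whose values are supplied at run time. The sketch below assumes Python with its built-in `sqlite3` library as the "language + library" pair; the helper `letters_above` is a hypothetical name, and other libraries may use a different placeholder syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (number INT PRIMARY KEY, letter CHAR)")
conn.executemany("INSERT INTO myTable VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])


def letters_above(threshold):
    """Run a fixed query whose parameter is bound by the library, not pasted into the SQL text."""
    cur = conn.execute(
        "SELECT letter FROM myTable WHERE number > ? ORDER BY number", (threshold,))
    return [row[0] for row in cur]


print(letters_above(1))
```

Binding values through placeholders, instead of building the SQL string by hand, also avoids quoting mistakes and SQL injection.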


Sloan Web Interface

• Example database
  - Source: Sloan Digital Sky Survey, DR15 (dozens of tables and views, millions of records); see their SQL tutorial:
    http://skyserver.sdss.org/dr12/en/help/howto/search/searchhowtohome.aspx
  - Interactive web interface: small answers, exploratory purposes.
    http://skyserver.sdss.org/dr15/en/tools/search/sql.aspx
  - CasJobs: web interface for batch jobs
    http://skyserver.sdss.org/casjobs/


Query Practice

• Practice database
  - Data collected in the last hours from two devices:
    BME680: temperature, humidity, barometric pressure, and air quality.
    TSL2561: luminosity sensor (broadband, infrared, and illuminance in lux).
  - Example schema: two tables (one per device) and several thousand records.
    bme680(time, temperature, voc, humidity, pressure, altitude)
    tsl2561(time, broadband, infrared, lux)
  - You can follow the examples in the notebook provided (update the server IP address!)
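For readers without access to the course server, the schema can be mimicked locally. The sketch below recreates the two tables in an in-memory SQLite database and joins them on `time`. The column types, the rows, and the assumption that both devices report identical timestamps are made up for illustration; this is not the course notebook and not real sensor data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE bme680 (
    time TEXT, temperature REAL, voc REAL, humidity REAL, pressure REAL, altitude REAL)""")
conn.execute("CREATE TABLE tsl2561 (time TEXT, broadband INTEGER, infrared INTEGER, lux REAL)")

# Made-up rows for illustration only; not real sensor data.
conn.executemany("INSERT INTO bme680 VALUES (?, ?, ?, ?, ?, ?)", [
    ("2019-08-20 10:00", 15.0, 50.0, 40.0, 1010.0, 30.0),
    ("2019-08-20 11:00", 17.0, 52.0, 38.0, 1009.0, 30.0),
])
conn.executemany("INSERT INTO tsl2561 VALUES (?, ?, ?, ?)", [
    ("2019-08-20 10:00", 300, 100, 120.0),
    ("2019-08-20 11:00", 500, 150, 250.0),
])

# Join both devices on the timestamp and keep only the bright readings.
rows = conn.execute("""
    SELECT b.time, b.temperature, t.lux
    FROM bme680 AS b, tsl2561 AS t
    WHERE b.time = t.time AND t.lux > 200
""").fetchall()
print(rows)
```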


Building a DB (1/2)

• Design the schema
  - Tables: columns, types, and primary keys
  - Good design: avoid data duplication and NULLs
  - Basic design-improving strategy: divide offending tables (new groups of columns)

• Implement the schema
  create table myTable(number int primary key, letter char)


Building a DB (2/2)

• Insert and remove a record from a table:
  insert into myTable values(1, 'a')
  delete from myTable where number = 1

• Bulk load data into the schema
  - A sequence of insertions is slow, especially if integrity constraints are present.
  - Prefer bulk loading functions like copy
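SQLite has no `copy` command, so the sketch below shows the closest equivalent available from Python's `sqlite3` module: sending all rows of a CSV source in one `executemany` call inside a single transaction, instead of issuing and committing one insert at a time. The CSV text is a hypothetical stand-in for a file on disk; on a server DBMS, its native bulk loader would normally be the better choice.

```python
import csv
import io
import sqlite3

csv_text = "number,letter\n1,a\n2,b\n3,c\n"  # stands in for a CSV file on disk

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (number INT PRIMARY KEY, letter CHAR)")

reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip the header row
with conn:  # a single transaction for the whole batch
    conn.executemany(
        "INSERT INTO myTable VALUES (?, ?)",
        ((int(number), letter) for number, letter in reader))

count = conn.execute("SELECT COUNT(*) FROM myTable").fetchone()[0]
print(count)
```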


Part III. NoSQL

Not only SQL


Beyond RDBMS

Maybe an RDBMS is not a good match for my problem ...

• RDBMS limitations
  - Cost of ACID
  - Horizontal scaling

• Relaxing DBMS requirements
  - NoSQL

• Direct access to data


NoSQL Comfort Zone

• Data is massive, heterogeneous, and distributed.

• Partial and eventually consistent answers are acceptable.

• Data must always be available.

• Horizontal scaling is preferred (or vertical scaling is not practical).


NoSQL Databases

• Aggregate
  Key: identifies each aggregate
  Data: heterogeneous collections of attributes as name/value pairs.

• Main types
  - Key-value stores: fast to retrieve data with unknown structure
  - Document databases: (mostly) tree-structured data
  - Column-family stores
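The aggregate idea can be imitated with a plain dictionary. In the toy sketch below, the store only knows keys and opaque serialized values, and two aggregates with different attributes live side by side. The keys, attribute names, and values are hypothetical, and a Python `dict` is of course not a distributed key-value store.

```python
import json

store = {}  # key -> serialized aggregate; the store does not interpret the value


def put(key, aggregate):
    store[key] = json.dumps(aggregate)


def get(key):
    return json.loads(store[key])


# Two aggregates with different (heterogeneous) attributes; the first is tree-structured.
put("obj:1", {"ra": 10.5, "dec": -30.1, "mags": {"g": 18.0, "r": 17.0}})
put("obj:2", {"name": "candidate", "notes": ["variable?"]})

fetched = get("obj:1")
print(fetched["mags"]["r"], sorted(get("obj:2")))
```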


CAP Theorem

• Consistency

• Availability

• Partition Tolerance

Choose two!


Query Evaluation

• Map-Reduce
  Parallel (cluster) data-processing pattern.

• Two steps
  - Map
    Input is an aggregate, output is a bunch of key-value pairs.
    Each map is independent (across aggregates in all the cluster).
  - Reduce
    Map results are collected, sorted, and combined.
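The two steps can be sketched on a single machine. The example below counts tags across a few hypothetical aggregates: `map_step` turns one aggregate into key-value pairs, and `reduce_step` combines all values collected for one key. It runs sequentially; a real Map-Reduce framework would run the independent map calls in parallel across a cluster.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical aggregates, for illustration only.
aggregates = [
    {"id": 1, "tags": ["star", "variable"]},
    {"id": 2, "tags": ["galaxy"]},
    {"id": 3, "tags": ["star"]},
]


def map_step(aggregate):
    """Map: one aggregate in, a list of (key, value) pairs out."""
    return [(tag, 1) for tag in aggregate["tags"]]


def reduce_step(key, values):
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


# Collect and sort the map results so that equal keys become adjacent.
pairs = [pair for agg in aggregates for pair in map_step(agg)]
pairs.sort(key=itemgetter(0))

counts = dict(
    reduce_step(key, [value for _, value in group])
    for key, group in groupby(pairs, key=itemgetter(0)))
print(counts)
```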


Summary

• RDBMS
  - Tables: collections of records with keys and integrity constraints.
  - SQL queries: basic, join, groups, and aggregates.

• An RDBMS is usually better than a collection of files.

• An RDBMS is not always the best solution. Management in main memory? NoSQL?
