October 1, 2017 Sam Siewert CS317 File and Database Systems Lecture 6 – DBMS Development Lifecycle http://dilbert.com/strips/comic/1998-03-23/

CS317 File and Database Systemsmercury.pr.erau.edu/~siewerts/cs317/documents/... · Discrete Event Simulation - MATLAB SimEvents, SimPy, SystemC [Hardware/System Oriented] Sam Siewert




DBMS Analysis and Design

DBMS Development Lifecycle


For Discussion… Software Engineering vs. DBMS Analysis and Design

1. Modern SWE Lifecycles Include Feedback – E.g. DevOps, Spiral, Extreme Programming in Agile, and even Waterfall with Feedback

2. Software Engineering Initiated by 1968 NATO Conference on the Software Crisis [Paper on Blackboard]

3. No Mention of Databases; Data Processing Gets More Attention – Focus on Programming and Programming Languages [COBOL mentioned – CODASYL (Conference on Data Systems Languages, 1959) Has DDL and DML]

4. SA/SD, Yourdon/DeMarco – Dataflow [Source, Sink, Store, Flow, Process]

5. ER Models (Chen) Useful for RDBMS - http://mysqlworkbench.org/ , Relational part of UML OOD


Dataflow – Data Processing [SA/SD]
Simple Voice / IP – One End-Point Shown
Hardware Sources/Sinks, Stored Audio Buffers
Dataflow Between Source/Sink, Record, Playback, Streaming, Network Transport Interface

RTECS, Sam Siewert, 2006 - ISBN-13: 978-1584504689
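
The five SA/SD dataflow elements (source, sink, store, flow, process) can be sketched with Python generators. This is only an illustration of the vocabulary, not the design from the referenced book: the function names, the integer "samples", and the gain value are all assumptions made up for the example.

```python
from collections import deque

def microphone(samples):
    """Source: emits raw audio samples (hypothetical stand-in for hardware)."""
    yield from samples

def amplify(flow, gain):
    """Process: transforms each sample travelling along the flow."""
    for sample in flow:
        yield sample * gain

def record(flow, store):
    """Store: copies each sample into a buffer while passing it downstream."""
    for sample in flow:
        store.append(sample)
        yield sample

def speaker(flow):
    """Sink: consumes the flow (here it just collects what was 'played')."""
    return list(flow)

# Stored audio buffer that keeps only the newest 4 samples (assumed size).
audio_buffer = deque(maxlen=4)

# The flow is the chain of generators connecting source to sink.
played = speaker(record(amplify(microphone([1, 2, 3, 4, 5]), gain=2), audio_buffer))
print(played)              # [2, 4, 6, 8, 10]
print(list(audio_buffer))  # [4, 6, 8, 10]
```

Each generator only sees the data handed to it, which mirrors how a dataflow diagram describes what moves between elements without saying how each element is implemented.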


Sakila ER Model - Logical Design


Tables, Views, SQL/PSM Routines, Triggers in ER Diagram
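
A minimal sketch of how a table, a trigger, and a view fit together, using Python's built-in sqlite3 module. The schema is loosely modeled on Sakila's film tables but heavily simplified, so treat every column and the `cheap_films` view as assumptions rather than the actual Sakila DDL. SQLite has no SQL/PSM stored routines, so that part of the diagram is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (
    film_id     INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    rental_rate REAL NOT NULL
);
CREATE TABLE film_text (
    film_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- Trigger: copy every new film into film_text automatically.
CREATE TRIGGER ins_film AFTER INSERT ON film
BEGIN
    INSERT INTO film_text (film_id, title) VALUES (NEW.film_id, NEW.title);
END;
-- View: a read-only projection that clients can query like a table.
CREATE VIEW cheap_films AS
    SELECT film_id, title FROM film WHERE rental_rate < 1.00;
""")

conn.execute("INSERT INTO film VALUES (1, 'ACADEMY DINOSAUR', 0.99)")
conn.execute("INSERT INTO film VALUES (2, 'ACE GOLDFINGER', 4.99)")

trigger_rows = conn.execute("SELECT * FROM film_text ORDER BY film_id").fetchall()
view_rows = conn.execute("SELECT * FROM cheap_films").fetchall()
print(trigger_rows)  # [(1, 'ACADEMY DINOSAUR'), (2, 'ACE GOLDFINGER')]
print(view_rows)     # [(1, 'ACADEMY DINOSAUR')]
```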


CASE Tools
Computer Aided Software Engineering - Schemas
– MySQL Workbench – DBMS Logical Design (Schema)
– Modelio UML & SysML - SE310
– Visio UML Templates - Design Edit Only!

Software Design Automation – Requirements, Architecture, High-Level Design, Detailed-Design [Sometimes Executable Simulations with State Machines], Test Cases for Verification and Validation
– Rational Software Tools, Telelogic - Acquired by IBM

Discrete Event Simulation - MATLAB SimEvents, SimPy, SystemC [Hardware/System Oriented]
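
To show what a discrete event simulation does, here is a small sketch that uses only the standard library's heapq as the event queue. It is not SimPy, SimEvents, or SystemC; those tools provide far richer process and resource models. The scenario (three queries arriving at a single database server with a fixed 2-second service time) is an assumption chosen for illustration.

```python
import heapq

ARRIVE, DEPART = 0, 1  # event kinds; arrivals sort before departures on time ties

def simulate(arrivals, service_time):
    """Single-server FIFO queue. Returns a list of (departure_time, name)."""
    events = [(t, ARRIVE, name) for name, t in arrivals]
    heapq.heapify(events)
    server_free_at = 0.0
    log = []
    while events:
        # Jump the clock straight to the next scheduled event.
        now, kind, name = heapq.heappop(events)
        if kind == ARRIVE:
            start = max(now, server_free_at)       # wait if the server is busy
            server_free_at = start + service_time
            heapq.heappush(events, (server_free_at, DEPART, name))
        else:
            log.append((now, name))
    return log

log = simulate([("q1", 0.0), ("q2", 1.0), ("q3", 5.0)], service_time=2.0)
print(log)  # [(2.0, 'q1'), (4.0, 'q2'), (7.0, 'q3')]
```

The key idea is that simulated time advances from event to event rather than in fixed ticks, which is what distinguishes this style from a time-stepped simulation.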


Embedded Databases
Mobile Devices that Synchronize with Cloud
Service Provisioning and Billing Systems – E.g. Telecommunications, Digital Cable, ISPs
E.g. Oracle Berkeley DB
E.g. http://sqlite.org/
Android SQLite - http://www.androidhive.info/2011/11/android-sqlite-database-tutorial/
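
A rough sketch of the embedded pattern: the database runs inside the application process, and locally recorded rows are pushed to a cloud service later. The `usage_event` table, the `synced` flag, and the `upload` callback are hypothetical names for this example; a real device would open a file path instead of `:memory:` and call an actual network API.

```python
import sqlite3

# In-process database; no separate server to install or connect to.
local = sqlite3.connect(":memory:")
local.execute("""
    CREATE TABLE usage_event (
        event_id INTEGER PRIMARY KEY,
        minutes  INTEGER NOT NULL,
        synced   INTEGER NOT NULL DEFAULT 0
    )
""")

def record_usage(minutes):
    """Store an event locally, even when the device is offline."""
    with local:
        local.execute("INSERT INTO usage_event (minutes) VALUES (?)", (minutes,))

def sync(upload):
    """Send unsynced rows through `upload` (hypothetical cloud call), then mark them."""
    pending = local.execute(
        "SELECT event_id, minutes FROM usage_event WHERE synced = 0 ORDER BY event_id"
    ).fetchall()
    if pending:
        upload(pending)
        with local:
            local.executemany(
                "UPDATE usage_event SET synced = 1 WHERE event_id = ?",
                [(event_id,) for event_id, _ in pending],
            )
    return len(pending)

billing_inbox = []          # stands in for the cloud billing system
record_usage(12)
record_usage(30)
first = sync(billing_inbox.extend)
second = sync(billing_inbox.extend)
print(first, second)   # 2 0
print(billing_inbox)   # [(1, 12), (2, 30)]
```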


DBMS Design

DBMS Development Lifecycle


DBMS Design Lifecycle
Planning - Goals and Objectives
System Definition
– Customer interviews
– Requirements analysis and capture
– ER and Schema design prototyping with user feedback
Requirements review and refinement [SE300, SE310]
Logical DBE - Schema design, deployment, test data, normalization, referential integrity, client interfaces [views and connectors]
Physical DBE - Selection of DBMS, indexing, installation and scaling, performance
Data conversion and load
Testing - QA [SE420]
Maintain and improve


[Figure: DBMS development lifecycle diagram. Box labels as they appear on the slide:]
Planning
System Definition (Architecture, Views, IM)
Requirements Refinement SRS
Logical DBE (Schema - CASE)
Physical DBE (DBMS physical storage, indexing, scaling)
Strategy for DB Development
Client App Design
Select DBMS (RDBMS, OODBMS, NoSQL)
Load Data
Test
Maintain
Design levels: Conceptual, Logical, Physical


Planning
Outline Goals and Objectives
Basic Schedule
Detail Work Breakdown and Tasks
Cost - TCO = CAPEX + OPEX
Scaling, Disaster Recovery, Data center, Storage and Server Technologies
Security - Physical, Logical, and Best Practices
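
The cost line, TCO = CAPEX + OPEX, can be worked through with a short calculation. The sketch below assumes OPEX is a flat annual figure accumulated over a planning horizon, and every dollar amount is invented for illustration.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """TCO = CAPEX + OPEX, with OPEX summed over the planning horizon."""
    return capex + annual_opex * years

# Hypothetical comparison over 5 years: buy hardware vs. rent cloud capacity.
on_prem = total_cost_of_ownership(capex=120_000, annual_opex=30_000, years=5)
cloud = total_cost_of_ownership(capex=0, annual_opex=60_000, years=5)
print(on_prem)  # 270000
print(cloud)    # 300000
```

With these made-up numbers the on-premises option is cheaper over five years, but the ranking flips for a horizon under four years, which is why the planning horizon belongs in the estimate.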


System Definition
Hosting (Architecture)
– Public or Private Cloud
– Private SAN, NAS, or DAS - Scale-Up or Scale-Out
– Co-Location for Data center?
– Private Data center - 3-tier, 4-tier, N-tier

User Views
– Collect requirements through user views by function (engineering, accounting, management, manufacturing, sales, etc.)
– By job (sustaining engineer, data entry, point-of-sale, supervisor, etc.)

Views define Information Models
IM helps to define major requirements


Representation of a Database System with Multiple User Views

CB - Ref.


Centralized Approach to Managing Multiple User Views

CB - Ref.


View Integration Approach to Managing Multiple User Views

CB - Ref.
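
The view integration approach keeps a separate local model per user view and merges them into one global model afterwards. A very reduced sketch of that merge step follows, assuming each local model is just a mapping from entity name to attribute names. The sales and accounting views are hypothetical, and real integration also has to resolve naming conflicts and differing relationships, which this ignores.

```python
def integrate_views(*local_models):
    """Merge per-view models (entity -> set of attributes) into one global model."""
    global_model = {}
    for model in local_models:
        for entity, attributes in model.items():
            # Entities seen in several views end up with the union of their attributes.
            global_model.setdefault(entity, set()).update(attributes)
    return global_model

sales_view = {
    "Customer": {"customer_id", "name"},
    "Order": {"order_id", "customer_id", "total"},
}
accounting_view = {
    "Customer": {"customer_id", "credit_limit"},
    "Invoice": {"invoice_id", "order_id"},
}

global_model = integrate_views(sales_view, accounting_view)
print(sorted(global_model))              # ['Customer', 'Invoice', 'Order']
print(sorted(global_model["Customer"]))  # ['credit_limit', 'customer_id', 'name']
```

Under the centralized approach the two requirement lists would instead be merged first and a single model drawn from the combined list.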


Conceptual Design - Analysis
Stick to High Level Information Model
– ER (Traditional Relational approach), EER adds class hierarchy
– UML, SysML (OO approach) - Class diagram is EER+methods, but UML has many more diagrams and models
– Top down - define major entities and key relations or classes

Views (keyword in SQL), but also Use Cases
– Interview stakeholders
– Draft a User’s Guide
– Write a data dictionary (bottom up approach for data domain, attribute analysis for entities)

Conceptual IM should be easily discussed with all stakeholders
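
A data dictionary can start as nothing more than a structured list of attributes with their domains. The sketch below is one possible shape for it, with a helper that checks a sample record against the entries. The `Employee` entity and its attributes are assumptions invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Attribute:
    entity: str
    name: str
    domain: type       # the data domain, reduced here to a Python type
    required: bool
    description: str

# Hypothetical entries gathered bottom-up from stakeholder interviews.
DATA_DICTIONARY = [
    Attribute("Employee", "employee_id", int, True, "Unique badge number"),
    Attribute("Employee", "full_name", str, True, "Name as printed on the badge"),
    Attribute("Employee", "hire_date", str, False, "ISO 8601 date, if known"),
]

def violations(entity, record):
    """List the ways a record disagrees with the dictionary entries for an entity."""
    problems = []
    for attr in DATA_DICTIONARY:
        if attr.entity != entity:
            continue
        value = record.get(attr.name)
        if value is None:
            if attr.required:
                problems.append(f"{attr.name}: missing")
        elif not isinstance(value, attr.domain):
            problems.append(f"{attr.name}: expected {attr.domain.__name__}")
    return problems

print(violations("Employee", {"employee_id": 7, "full_name": "Ada"}))  # []
print(violations("Employee", {"employee_id": "7"}))
# ['employee_id: expected int', 'full_name: missing']
```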


Requirements
Focus on Information Required by User
– Data descriptions (dictionaries)
– Data generation and ingest
– Data use (e.g. reports, documents, mobile query, analytics, decision support, compliance, meta-data for files, other?)

Client Application Requirements (Parallel Development)
File system requirements (Parallel Development)
Use Centralized (Single IM) or View Integration (IM by Use Case, to be integrated)


Start Parallel Efforts - Select DBMS
Kick-off Traditional Software Analysis, Design, Development for Client Applications
Kick-off File systems to coordinate with RDBMS
Select DBMS by Type (Candidates)
– Relational (SQL)
– OODBMS (C++ or other OOPL + Data = Persistent Objects)
– NoSQL - Key / Value, Documents, No Schema (per se)
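
The difference between a relational candidate and a key/value document candidate shows up in where the schema lives. This sketch contrasts the two, with sqlite3 standing in for an RDBMS and a plain dict of JSON strings standing in for a key/value store. The `device` records are hypothetical, and a real NoSQL product adds distribution, indexing, and persistence that a dict does not have.

```python
import json
import sqlite3

# Relational: the schema is declared up front and enforced by the DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE device (device_id TEXT PRIMARY KEY, model TEXT NOT NULL)")
conn.execute("INSERT INTO device VALUES ('d1', 'router')")
relational_model = conn.execute(
    "SELECT model FROM device WHERE device_id = ?", ("d1",)
).fetchone()[0]

# Key/value with document values: each value carries its own structure.
kv_store = {}
kv_store["d1"] = json.dumps({"model": "router"})
kv_store["d2"] = json.dumps({"model": "modem", "firmware": "2.1"})  # extra field, no ALTER TABLE

document = json.loads(kv_store["d2"])
print(relational_model)      # router
print(document["firmware"])  # 2.1
```

Lookups by key are trivial in the second form, but any query across documents (for example, all devices of one model) has to be written by the application or supported by the store's own indexing.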


Logical Design
RDBMS - Relational Schema (ER/EER)
– Start work on Schema using High Level IM
– Domains, Attributes, Tables, Relations, Keys, etc.

OODBMS - Class Hierarchy and Object Interaction
– Use UML (SE310)
– Consider SysML (Extension to UML for Systems)

NoSQL - Key / Value Searches and Indexing
– Open research topic
– Columnar design – E.g. Google BigQuery [https://cloud.google.com/bigquery]
– Web-based REST (Representational State Transfer)
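
For the RDBMS branch, domains and keys from the information model become column constraints in the schema. Below is a small sketch in SQLite via Python's sqlite3 module. The `department` and `employee` tables are assumptions for illustration, and the `violates` helper is a hypothetical convenience, not part of any library.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,              -- key
    name    TEXT NOT NULL UNIQUE              -- candidate key
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id),
    salary  REAL NOT NULL CHECK (salary >= 0) -- domain restriction
);
""")
conn.execute("INSERT INTO department VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (10, 1, 85000.0)")
conn.commit()

def violates(sql):
    """Return True if the DBMS rejects the statement as a constraint violation."""
    try:
        conn.execute(sql)
        conn.commit()
        return False
    except sqlite3.IntegrityError:
        conn.rollback()
        return True

domain_enforced = violates("INSERT INTO employee VALUES (11, 1, -5.0)")
key_enforced = violates("INSERT INTO department VALUES (2, 'Engineering')")
print(domain_enforced, key_enforced)  # True True
```

Pushing these rules into the schema means every client application gets the same checks without re-implementing them.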


Physical Design [Part 2]
Choose Storage and Technology
– Battery Backed RAM
– Phase Change Memory
– Host-Bus Flash
– SSD
– HDD

Install on Block Storage Partitions
– SAN or DAS (No File system)
– Scale with SAN or DAS Host Channels (scale-up)
– Block RAID

Install on File system
– NAS, pNFS, GPFS, etc. - Scale out!
– Local (scale-up)
– File RAID

Indexing Method Selection [Part 2 of our course]
Selected DBMS Physical Features
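
One way to see what an indexing choice changes is to ask the DBMS for its query plan before and after creating an index. The sketch below does this with SQLite's EXPLAIN QUERY PLAN. The `orders` table and the index name are assumptions, and the exact wording of the plan text varies between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

def query_plan(sql):
    """Return the human-readable detail column of SQLite's plan for a query."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 42"

before = query_plan(query)   # no usable index: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(query)    # the planner can now search via the index

print(before)
print(after)
```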


DBMS Selection Criteria


Performance (Transactions/second, Latency, TPC)
Reliability, Availability, RPO/RTO – Recovery Point/Time Objective
Features (E.g. De-duplication, Logging, Import/Export, …)
Ease of Use (SQL Compliance, GUI, Installation)
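
RPO and RTO become concrete once timestamps are attached to a failure. In this sketch, the data-loss window is the time between the last good backup and the failure, and the outage is the time from failure to restored service. The timestamps and the one-hour and two-hour objectives are made-up values.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_backup, failure):
    """Window of work that is lost: everything after the last recoverable point."""
    return failure - last_backup

def achieved_rto(failure, service_restored):
    """How long the service was unavailable."""
    return service_restored - failure

# Hypothetical incident timeline.
last_backup = datetime(2017, 10, 1, 2, 0)
failure = datetime(2017, 10, 1, 2, 45)
restored = datetime(2017, 10, 1, 4, 15)

rpo = achieved_rpo(last_backup, failure)
rto = achieved_rto(failure, restored)
meets_objectives = rpo <= timedelta(hours=1) and rto <= timedelta(hours=2)

print(rpo)               # 0:45:00
print(rto)               # 1:30:00
print(meets_objectives)  # True
```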


Load Data
Dump Existing Database to Files or SQL
For RDBMS to RDBMS, Migrate from Old Schema to New
For Files, No-SQL, or OODBMS to RDBMS
– Parse files (data processing), generate SQL DML inserts
– Data entry (customer driven, or enterprise driven)

Make before Break [Parallel, Shut-down, cut-over]
Big Bang [Re-build enterprise data]
Evolutionary [Migrate data to new from old over time]
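
The file-to-RDBMS path (parse files, generate DML inserts) might look like the sketch below, which reads a CSV export and loads it with parameterized INSERT statements. The `part` table and the inline CSV text are assumptions that stand in for a real legacy export.

```python
import csv
import io
import sqlite3

# Stand-in for a file dumped from the old system.
legacy_export = io.StringIO(
    "part_no,description,qty\n"
    "P-100,Bracket,40\n"
    "P-200,Hinge,15\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE part (part_no TEXT PRIMARY KEY, description TEXT NOT NULL, qty INTEGER NOT NULL)"
)

# Parse step: convert each text row into typed values for the new schema.
rows = [
    (record["part_no"], record["description"], int(record["qty"]))
    for record in csv.DictReader(legacy_export)
]

# Load step: one transaction, so a bad row rolls back the whole batch.
with conn:
    conn.executemany("INSERT INTO part VALUES (?, ?, ?)", rows)

loaded = conn.execute("SELECT COUNT(*), SUM(qty) FROM part").fetchone()
print(loaded)  # (2, 55)
```

Comparing row counts and simple totals against the source, as in the last query, is a cheap sanity check before cut-over.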


Testing
Data Integrity (Bad data entry prevention)
Referential Integrity (Update, Delete, Insert focus)
Performance (Transactions per second)
– Custom benchmarks
– SPC (RDBMS or No-SQL) or TPC (RDBMS, Big Data)

Resilience (Data loss protection)
– Single, Double, or Triple Fault
– Availability During Recovery (Hot, Warm, Cold Spares)

Disaster Recovery (loss of client connectivity or data center)
RPO, RTO - Recovery Point and Recovery Time Objectives
– Transactions lost that need restart (replay)
– Time until transactions can be serviced or re-run

Upgrades and scaling while in service
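
A referential integrity test covers the three operations named above: insert, delete, and update. The sketch below checks that each one is rejected when it would leave an orphaned row. The `customer` and `rental` tables are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, which other DBMSs do not need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY);
CREATE TABLE rental (
    rental_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
INSERT INTO customer VALUES (1);
INSERT INTO rental VALUES (100, 1);
""")

def rejected(sql):
    """Return True if the statement fails with an integrity error."""
    try:
        conn.execute(sql)
        conn.commit()
        return False
    except sqlite3.IntegrityError:
        conn.rollback()
        return True

# Insert: a child row may not point at a parent that does not exist.
insert_blocked = rejected("INSERT INTO rental VALUES (101, 2)")
# Delete: a parent row may not vanish while children still reference it.
delete_blocked = rejected("DELETE FROM customer WHERE customer_id = 1")
# Update: a referenced key may not change out from under its children.
update_blocked = rejected("UPDATE customer SET customer_id = 9 WHERE customer_id = 1")

print(insert_blocked, delete_blocked, update_blocked)  # True True True
```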
