October 1, 2017 Sam Siewert CS317 File and Database Systems Lecture 6 – DBMS Development Lifecycle http://dilbert.com/strips/comic/1998-03-23/


DBMS Analysis and Design

DBMS Development Lifecycle


For Discussion… Software Engineering vs. DBMS Analysis and Design

1. Modern SWE Lifecycles Include Feedback – E.g. DevOps, Spiral, Extreme Programming in Agile, and even Waterfall with Feedback

2. Software Engineering Initiated by 1968 NATO Conference on the Software Crisis [Paper on Blackboard]

3. No Mention of Databases, Data Processing More So – Focus on Programming and Programming Languages [COBOL mentioned – CODASYL (Conference on Data Systems Languages, 1959) – Has DDL and DML]

4. SA/SD, Yourdon/DeMarco – Dataflow [Source, Sink, Store, Flow, Process]

5. ER Models (Chen) Useful for RDBMS - http://mysqlworkbench.org/, Relational part of UML OOD


Dataflow – Data Processing [SA/SD]

Simple Voice / IP – One End-Point Shown

Hardware Sources/Sinks, Stored Audio Buffers

Dataflow Between Source/Sink, Record, Playback, Streaming, Network Transport Interface

RTECS, Sam Siewert, 2006 - ISBN-13: 978-1584504689

Sakila ER Model - Logical Design


Tables, Views, SQL/PSM Routines, Triggers in ER Diagram

CASE Tools

Computer Aided Software Engineering - Schemas
– MySQL Workbench – DBMS Logical Design (Schema)
– Modelio UML & SysML - SE310
– Visio UML Templates - Design Edit Only!

Software Design Automation
– Requirements, Architecture, High-Level Design, Detailed-Design [Sometimes Executable Simulations with State Machines], Test Cases for Verification and Validation

– Rational Software Tools, Telelogic - Acquired by IBM

Discrete Event Simulation - MATLAB SimEvents, SimPy, SystemC [Hardware/System Oriented]


Embedded Databases

Mobile Devices that Synchronize with Cloud

Service Provisioning and Billing Systems – E.g. Telecommunications, Digital Cable, ISPs

E.g. Oracle Berkeley DB

E.g. http://sqlite.org/

Android SQLite - http://www.androidhive.info/2011/11/android-sqlite-database-tutorial/
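
As a minimal sketch of the embedded-database idea, the following uses Python's standard `sqlite3` module, which runs the SQLite engine inside the application process, so no separate server is involved. The `playlist` table, its columns, and the `synced` flag are hypothetical names chosen for illustration, and an in-memory database stands in for an on-device file.

```python
import sqlite3

def build_embedded_db(path=":memory:"):
    """Open (or create) an embedded SQLite database and load sample rows."""
    conn = sqlite3.connect(path)  # pass a file path for on-device storage
    conn.execute(
        "CREATE TABLE IF NOT EXISTS playlist ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " synced INTEGER NOT NULL DEFAULT 0)"  # 0 = not yet pushed to cloud
    )
    conn.executemany(
        "INSERT INTO playlist (title) VALUES (?)",
        [("Track A",), ("Track B",)],
    )
    conn.commit()
    return conn

def pending_sync(conn):
    """Return titles that still need to be synchronized."""
    rows = conn.execute(
        "SELECT title FROM playlist WHERE synced = 0 ORDER BY id"
    )
    return [title for (title,) in rows]

conn = build_embedded_db()
print(pending_sync(conn))
```

A real mobile client would mark rows as synced only after the cloud service acknowledges them; that step is omitted here.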


DBMS Design

DBMS Development Lifecycle


DBMS Design Lifecycle

Planning - Goals and Objectives

System Definition
– Customer interviews
– Requirements analysis and capture
– ER and Schema design prototyping with user feedback

Requirements review and refinement [SE300, SE310]

Logical DBE - Schema design, deployment, test data, normalization, referential integrity, client interfaces [views and connectors]

Physical DBE - Selection of DBMS, indexing, installation and scaling, performance

Data conversion and load

Testing - QA [SE420]

Maintain and improve


[Figure: DBMS development lifecycle flow diagram. Labels: Planning; System Definition (Architecture, Views, IM); Requirements Refinement (SRS); Logical DBE (Schema - CASE); Physical DBE (DBMS physical storage, indexing, scaling); Strategy for DB Development; Client App Design; Select DBMS (RDBMS, OODBMS, NoSQL); Load Data; Test; Maintain. Design levels: Conceptual, Logical, Physical.]

Planning

Outline Goals and Objectives

Basic Schedule

Detail Work Breakdown and Tasks

Cost - TCO = CAPEX + OPEX

Scaling, Disaster Recovery, Data center, Storage and Server Technologies

Security - Physical, Logical, and Best Practices
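
The cost line above, TCO = CAPEX + OPEX, can be illustrated with a short calculation. This is only a sketch: it assumes OPEX is a flat annual amount summed over a planning horizon, and every figure below is hypothetical.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """TCO = CAPEX + OPEX, with OPEX summed over the planning horizon."""
    return capex + annual_opex * years

# Hypothetical figures: privately owned servers vs. a hosted service.
on_prem = total_cost_of_ownership(capex=120_000, annual_opex=30_000, years=5)
hosted = total_cost_of_ownership(capex=0, annual_opex=60_000, years=5)
print(on_prem, hosted)
```

Changing the horizon changes which option is cheaper, which is why the schedule and the cost estimate are planned together.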


System Definition

Hosting (Architecture)
– Public or Private Cloud
– Private SAN, NAS, or DAS - Scale-Up or Scale-Out
– Co-Location for Data center?
– Private Data center - 3-tier, 4-tier, N-tier

User Views
– Collect user views by function (engineering, accounting, management, manufacturing, sales, etc.)
– By job (sustaining engineer, data entry, point-of-sale, supervisor, etc.)

Views define Information Models

IM helps to define major requirements
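
One way to make user views concrete is with SQL views over a shared base table. The sketch below uses Python's standard `sqlite3` module; the `employee` table and the two views (`v_accounting`, `v_engineering`) are hypothetical and only illustrate that each function sees a different slice of the same data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT NOT NULL,
    salary REAL NOT NULL
);
INSERT INTO employee (name, department, salary) VALUES
    ('Ada', 'engineering', 100.0),
    ('Bob', 'sales', 80.0);

-- Accounting view: needs pay data for every employee.
CREATE VIEW v_accounting AS
    SELECT id, name, salary FROM employee;

-- Engineering supervisor view: own department only, no salary.
CREATE VIEW v_engineering AS
    SELECT id, name FROM employee WHERE department = 'engineering';
""")

def columns_of(view):
    """Return the column names a user view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [d[0] for d in cursor.description]

print(columns_of("v_accounting"))
print(columns_of("v_engineering"))
```

Listing the columns each view needs is one input to the information model: the base table must hold the union of what the views require.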


Representation of a Database System with Multiple User Views

CB - Ref.


Centralized Approach to Managing Multiple User Views

CB - Ref.


View Integration Approach to Managing Multiple User Views

CB - Ref.

Conceptual Design - Analysis

Stick to High Level Information Model
– ER (Traditional Relational approach), EER adds class hierarchy
– UML, SysML (OO approach) - Class diagram is EER + methods, but UML has many more diagrams and models
– Top down - define major entities and key relations or classes

Views (keyword in SQL), but also Use Cases
– Interview stakeholders
– Draft a User’s Guide
– Write a data dictionary (bottom up approach for data domain, attribute analysis for entities)

Conceptual IM should be easily discussed with all stakeholders


Requirements

Focus on Information Required by User
– Data descriptions (dictionaries)
– Data generation and ingest
– Data use (e.g. reports, documents, mobile query, analytics, decision support, compliance, meta-data for files, other?)

Client Application Requirements (Parallel Development)

File system requirements (Parallel Development)

Use Centralized (Single IM) or View Integration (one IM per Use Case, integrated afterward)


Start Parallel Efforts - Select DBMS

Kick-off Traditional Software Analysis, Design, Development for Client Applications

Kick-off File systems to coordinate with RDBMS

Select DBMS by Type (Candidates)
– Relational (SQL)
– OODBMS (C++ or other OOPL + Data = Persistent Objects)
– NoSQL - Key / Value, Documents, No Schema (per se)
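
To contrast two of the candidate types, here is a rough sketch in Python. The relational side uses the standard `sqlite3` module; the key/value side uses a plain `dict` holding JSON documents as a stand-in for a real NoSQL store, which is an assumption made for illustration. The `film` table and the `film:N` key format are hypothetical, and OODBMS persistence is not shown.

```python
import json
import sqlite3

# Relational: the schema is declared up front and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE film (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
conn.execute("INSERT INTO film (id, title) VALUES (1, 'Example Film')")
relational_title = conn.execute(
    "SELECT title FROM film WHERE id = ?", (1,)
).fetchone()[0]

# Key/value with documents: a dict stands in for the store. Each value is
# a JSON document, and documents need not share the same fields.
store = {}
store["film:1"] = json.dumps({"title": "Example Film"})
store["film:2"] = json.dumps({"title": "Another Film", "tags": ["demo"]})
document_title = json.loads(store["film:1"])["title"]

print(relational_title, document_title)
```

The second document carries a `tags` field the first lacks; a relational table would need a schema change (or a second table) to hold it.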


Logical Design

RDBMS - Relational Schema (ER/EER)
– Start work on Schema using High Level IM
– Domains, Attributes, Tables, Relations, Keys, etc.

OODBMS - Class Hierarchy and Object Interaction
– Use UML (SE310)
– Consider SysML (Extension to UML for Systems)

NoSQL - Key / Value Searches and Indexing
– Open research topic
– Columnar design
– E.g. Google BigQuery [https://cloud.google.com/bigquery]
– Web-based REST (Representational State Transfer)
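
For the RDBMS branch, the step from information model to schema can be sketched as follows, using Python's standard `sqlite3` module. The `customer` and `rental` tables are hypothetical (loosely in the spirit of a rental store), and the comments mark where domains, keys, and relationships appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,            -- primary key
    email       TEXT NOT NULL UNIQUE,           -- candidate key
    active      INTEGER NOT NULL DEFAULT 1
                CHECK (active IN (0, 1))        -- domain restricted to 0/1
);
CREATE TABLE rental (
    rental_id   INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customer (customer_id),  -- foreign key
    rented_on   TEXT NOT NULL                   -- ISO-8601 date string
);
""")

def list_tables():
    """Return the names of the tables defined in the schema."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]

print(list_tables())
```

A CASE tool such as MySQL Workbench generates comparable DDL from an ER diagram; writing it by hand here just makes the mapping visible.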


Physical Design [Part 2]

Choose Storage and Technology
– Battery Backed RAM
– Phase Change Memory
– Host-Bus Flash
– SSD
– HDD

Install on Block Storage Partitions
– SAN or DAS (No File system)
– Scale with SAN or DAS Host Channels (scale-up)
– Block RAID

Install on File system
– NAS, PNFS, GPFS, etc. - Scale out!
– Local (scale-up)
– File RAID

Indexing Method Selection [Part 2 of our course]

Selected DBMS Physical Features
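
As a small illustration of why index selection belongs to physical design, the sketch below asks SQLite (through Python's standard `sqlite3` module) how it would run the same lookup before and after an index exists. The `customer` table and the index name `idx_customer_last_name` are hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT)")
conn.executemany(
    "INSERT INTO customer (last_name) VALUES (?)",
    [(f"name{i}",) for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return the planner's description of how SQLite would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text

lookup = "SELECT id FROM customer WHERE last_name = ?"
before = query_plan(lookup, ("name42",))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_customer_last_name ON customer (last_name)")
after = query_plan(lookup, ("name42",))   # index available: index search

print(before)
print(after)
```

The logical schema is identical in both cases; only the physical access path changes.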


Three-Level ANSI-SPARC Architecture and Phases of Database Design

CB - Ref.

DBMS Selection Criteria

Performance (Transactions/second, Latency, TPC)

Reliability, Availability, RPO/RTO – Recovery Point/Time Objective

Features (E.g. De-duplication, Logging, Import/Export, …)

Ease of Use (SQL Compliance, GUI, Installation)

Load Data

Dump Existing Database to Files or SQL

For RDBMS to RDBMS, Migrate from Old Schema to New

For Files, No-SQL, or OODBMS to RDBMS
– Parse files (data processing), generate SQL DML inserts
– Data entry (customer driven, or enterprise driven)

Make before Break [Parallel, Shut-down, cut-over]

Big Bang [Re-build enterprise data]

Evolutionary [Migrate data to new from old over time]
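
The "parse files, generate SQL DML inserts" path can be sketched as below with Python's standard `csv` and `sqlite3` modules. The CSV layout, the `customer` table, and the sample rows are hypothetical; parameterized inserts are used rather than building SQL strings by hand, so values are quoted safely.

```python
import csv
import io
import sqlite3

# Hypothetical export from a legacy flat-file system.
LEGACY_EXPORT = """id,name,city
1,Ada,Prescott
2,Bob,Boulder
"""

def load_csv(conn, text):
    """Parse CSV text and insert each record; return the number of rows loaded."""
    reader = csv.DictReader(io.StringIO(text))
    rows = [(int(r["id"]), r["name"], r["city"]) for r in reader]
    with conn:  # one transaction: commit on success, roll back on error
        conn.executemany(
            "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", rows
        )
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
loaded = load_csv(conn, LEGACY_EXPORT)
print(loaded)
```

Loading inside a single transaction means a malformed record leaves the target table unchanged, which suits a cut-over rehearsal.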


Testing

Data Integrity (Bad data entry prevention)

Referential Integrity (Update, Delete, Insert focus)

Performance (Transactions per second)
– Custom benchmarks
– SPC (RDBMS or No-SQL) or TPC (RDBMS, Big Data)

Resilience (Data loss protection)
– Single, Double, or Triple Fault
– Availability During Recovery (Hot, Warm, Cold Spares)

Disaster Recovery (loss of client connectivity or data center)

RPO, RTO - Recovery Point and Recovery Time Objectives
– Transactions lost that need restart (replay)
– Time until transactions can be serviced or re-run

Upgrades and scaling while in service
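
The data-integrity and referential-integrity checks listed above can be exercised with a few deliberately bad statements. The sketch below uses Python's standard `sqlite3` module; the `department` and `employee` tables and the age range are hypothetical, and a real QA suite would cover far more cases.

```python
import sqlite3

def make_db():
    """Build a small schema with a foreign key and a CHECK constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        dept_id INTEGER NOT NULL REFERENCES department (id),
        age INTEGER NOT NULL CHECK (age BETWEEN 16 AND 120)
    );
    INSERT INTO department (id, name) VALUES (1, 'engineering');
    INSERT INTO employee (id, dept_id, age) VALUES (1, 1, 30);
    """)
    return conn

def is_rejected(conn, sql):
    """Return True if the statement violates a declared constraint."""
    try:
        with conn:  # roll back automatically if the statement fails
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

conn = make_db()
results = {
    "insert_orphan": is_rejected(
        conn, "INSERT INTO employee (id, dept_id, age) VALUES (2, 99, 30)"),
    "delete_parent": is_rejected(
        conn, "DELETE FROM department WHERE id = 1"),
    "update_to_orphan": is_rejected(
        conn, "UPDATE employee SET dept_id = 99 WHERE id = 1"),
    "bad_age": is_rejected(
        conn, "INSERT INTO employee (id, dept_id, age) VALUES (3, 1, 7)"),
    "valid_insert": is_rejected(
        conn, "INSERT INTO employee (id, dept_id, age) VALUES (4, 1, 40)"),
}
print(results)
```

The first three cases map to the Insert, Delete, and Update focus of referential integrity; the fourth is a bad-data-entry check, and the last confirms that valid data still gets through.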
