October 1, 2017 Sam Siewert
CS317 File and Database Systems
Lecture 6 – DBMS Development Lifecycle
http://dilbert.com/strips/comic/1998-03-23/
DBMS Analysis and Design
DBMS Development Lifecycle
For Discussion… Software Engineering vs. DBMS Analysis and Design
1. Modern SWE Lifecycles Include Feedback – E.g. DevOps, Spiral, Extreme Programming in Agile, and even Waterfall with Feedback
2. Software Engineering Initiated by 1968 NATO Conference on the Software Crisis [Paper on Blackboard]
3. No Mention of Databases, Data Processing More So – Focus on Programming and Programming Languages [COBOL mentioned – CODASYL (Conference on Data Systems Languages, 1959) – Has DDL and DML]
4. SA/SD, Yourdon/DeMarco – Dataflow [Source, Sink, Store, Flow, Process]
5. ER Models (Chen) Useful for RDBMS - http://mysqlworkbench.org/ , Relational part of UML OOD
Dataflow – Data Processing [SA/SD]
Simple Voice / IP – One End-Point Shown
Hardware Sources/Sinks, Stored Audio Buffers
Dataflow Between Source/Sink, Record, Playback, Streaming, Network Transport Interface
RTECS, Sam Siewert, 2006 - ISBN-13: 978-1584504689
Sakila ER Model - Logical Design
Tables, Views, SQL/PSM Routines, Triggers in ER Diagram
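As a rough illustration of how a table, a view, and a trigger relate, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names are assumptions loosely modeled on a film-rental schema and are not the actual Sakila definitions. SQLite has no SQL/PSM stored routines, so only the table, view, and trigger are shown.

```python
import sqlite3

# Hypothetical slice of a film-rental schema; names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE film (
    film_id     INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    rental_rate REAL NOT NULL,
    last_update TEXT
);

-- View: a stored query that presents a subset of the table.
CREATE VIEW cheap_films AS
    SELECT film_id, title FROM film WHERE rental_rate < 1.00;

-- Trigger: stamp last_update whenever the rental rate changes.
CREATE TRIGGER film_touch AFTER UPDATE OF rental_rate ON film
BEGIN
    UPDATE film SET last_update = datetime('now') WHERE film_id = NEW.film_id;
END;
""")

conn.execute("INSERT INTO film (title, rental_rate) VALUES ('ALPHA', 0.99)")
conn.execute("INSERT INTO film (title, rental_rate) VALUES ('BETA', 4.99)")
conn.execute("UPDATE film SET rental_rate = 0.50 WHERE title = 'BETA'")

# The view reflects the update; the trigger has stamped the changed row.
cheap_titles = [row[0] for row in conn.execute(
    "SELECT title FROM cheap_films ORDER BY title")]
stamped = conn.execute(
    "SELECT last_update IS NOT NULL FROM film WHERE title = 'BETA'").fetchone()[0]
```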
CASE Tools
Computer Aided Software Engineering - Schemas
– MySQL Workbench – DBMS Logical Design (Schema)
– Modelio UML & SysML - SE310
– Visio UML Templates - Design Edit Only!
Software Design Automation – Requirements, Architecture, High-Level Design, Detailed-Design [Sometimes Executable Simulations with State Machines], Test Cases for Verification and Validation
– Rational Software Tools, Telelogic - Acquired by IBM
Discrete Event Simulation - MATLAB SimEvents, SimPy, SystemC [Hardware/System Oriented]
Embedded Databases
Mobile Devices that Synchronize with Cloud
Service Provisioning and Billing Systems – E.g. Telecommunications, Digital Cable, ISPs
E.g. Oracle Berkeley DB
E.g. http://sqlite.org/
Android SQLite - http://www.androidhive.info/2011/11/android-sqlite-database-tutorial/
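A minimal sketch of the embedded pattern, assuming Python's built-in sqlite3 module: the database runs inside the application process, records events locally, and marks them as synced once they have been handed to an upload function. The `usage_event` table and the `upload` callable are hypothetical stand-ins for a real device schema and cloud API.

```python
import sqlite3

def open_device_db(path=":memory:"):
    """Open (or create) the on-device database; no server process is involved."""
    conn = sqlite3.connect(path)
    conn.execute("""CREATE TABLE IF NOT EXISTS usage_event (
                        id     INTEGER PRIMARY KEY,
                        detail TEXT NOT NULL,
                        synced INTEGER NOT NULL DEFAULT 0)""")
    return conn

def record_event(conn, detail):
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO usage_event (detail) VALUES (?)", (detail,))

def sync_pending(conn, upload):
    """Pass unsynced rows to `upload` (hypothetical cloud call) and mark them synced."""
    rows = conn.execute(
        "SELECT id, detail FROM usage_event WHERE synced = 0 ORDER BY id").fetchall()
    with conn:
        for row_id, detail in rows:
            upload(detail)
            conn.execute("UPDATE usage_event SET synced = 1 WHERE id = ?", (row_id,))
    return len(rows)

db = open_device_db()
record_event(db, "call: 120 s")
record_event(db, "data: 3 MB")
sent = []                                    # stands in for the cloud service
first_sync = sync_pending(db, sent.append)   # uploads both pending events
second_sync = sync_pending(db, sent.append)  # nothing left to upload
```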
DBMS Design
DBMS Development Lifecycle
DBMS Design Lifecycle
Planning - Goals and Objectives
System Definition
– Customer interviews
– Requirements analysis and capture
– ER and Schema design prototyping with user feedback
Requirements review and refinement [SE300, SE310]
Logical DBE - Schema design, deployment, test data, normalization, referential integrity, client interfaces [views and connectors]
Physical DBE - Selection of DBMS, indexing, installation and scaling, performance
Data conversion and load
Testing - QA [SE420]
Maintain and improve
[Figure: lifecycle diagram. Boxes: Planning; System Definition (Architecture, Views, IM); Requirements Refinement SRS; Logical DBE (Schema - CASE); Physical DBE (DBMS physical storage, indexing, scaling); Strategy for DB Development; Client App Design; Select DBMS (RDBMS, OODBMS, NoSQL); Load Data; Test; Maintain. Design levels: Conceptual, Logical, Physical.]
Planning
Outline Goals and Objectives
Basic Schedule
Detail Work Breakdown and Tasks
Cost - TCO = CAPEX + OPEX
Scaling, Disaster Recovery, Data center, Storage and Server Technologies
Security - Physical, Logical, and Best Practices
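The cost line (total cost of ownership as capital plus operating expense) can be worked as simple arithmetic. The sketch below assumes a flat annual OPEX over the planning horizon and no discounting; all figures are hypothetical and only show how two hosting options might be compared.

```python
def total_cost_of_ownership(capex, annual_opex, years):
    """TCO = CAPEX + OPEX, with OPEX accumulated over the planning horizon."""
    return capex + annual_opex * years

# Hypothetical figures for illustration only.
on_premises = total_cost_of_ownership(capex=100_000, annual_opex=20_000, years=5)
public_cloud = total_cost_of_ownership(capex=0, annual_opex=45_000, years=5)
```

With these made-up numbers the on-premises option totals 200,000 and the cloud option 225,000 over five years; a shorter horizon would reverse the ranking, which is why the horizon belongs in the plan.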
System Definition
Hosting (Architecture)
– Public or Private Cloud
– Private SAN, NAS, or DAS - Scale-Up or Scale-Out
– Co-Location for Data center?
– Private Data center - 3-tier, 4-tier, N-tier
User Views
– Collect user views by function (engineering, accounting, management, manufacturing, sales, etc.)
– By job (sustaining engineer, data entry, point-of-sale, supervisor, etc.)
Views define Information Models
IM helps to define major requirements
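One way to picture user views by function is as SQL views over shared base data. The sketch below uses Python's built-in sqlite3 module; the `work_order` table and both view names are hypothetical, chosen only to show engineering and accounting seeing different columns of the same rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE work_order (
    order_id INTEGER PRIMARY KEY,
    part     TEXT NOT NULL,
    status   TEXT NOT NULL,
    cost     REAL NOT NULL
);
-- Each function gets its own view of the same base table.
CREATE VIEW engineering_view AS SELECT order_id, part, status FROM work_order;
CREATE VIEW accounting_view  AS SELECT order_id, cost FROM work_order;
INSERT INTO work_order (part, status, cost) VALUES ('valve', 'open', 12.5);
""")

def columns_of(view):
    """Return the column names a given user view exposes."""
    cursor = conn.execute(f"SELECT * FROM {view}")
    return [col[0] for col in cursor.description]

engineering_columns = columns_of("engineering_view")
accounting_columns = columns_of("accounting_view")
```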
Representation of a Database System with Multiple User Views
CB - Ref.
Centralized Approach to Managing Multiple User Views
CB - Ref.
View Integration Approach to Managing Multiple User Views
CB - Ref.
Conceptual Design - Analysis
Stick to High Level Information Model
– ER (Traditional Relational approach), EER adds class hierarchy
– UML, SysML (OO approach) - Class diagram is EER+methods, but UML has many more diagrams and models
– Top down - define major entities and key relations or classes
Views (keyword in SQL), but also Use Cases
– Interview stakeholders
– Draft a User’s Guide
– Write a data dictionary (bottom up approach for data domain, attribute analysis for entities)
Conceptual IM should be easily discussed with all stakeholders
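A data dictionary can start as a plain list of attribute descriptions before any schema exists. The following is a small sketch of that bottom-up step; the entities, attributes, and domains are hypothetical examples, and the helper that looks for attributes shared across entities is just one possible way to surface candidate relationships.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DictionaryEntry:
    entity: str
    attribute: str
    domain: str
    meaning: str

# Hypothetical entries gathered from stakeholder interviews.
data_dictionary = [
    DictionaryEntry("Customer", "customer_id", "positive integer", "Unique customer number"),
    DictionaryEntry("Customer", "name", "text, up to 80 characters", "Legal name"),
    DictionaryEntry("Order", "order_id", "positive integer", "Unique order number"),
    DictionaryEntry("Order", "customer_id", "positive integer", "Customer who placed the order"),
]

def entities(dictionary):
    """List the distinct entities named in the dictionary."""
    return sorted({entry.entity for entry in dictionary})

def shared_attributes(dictionary):
    """Attributes that appear under more than one entity hint at relationships."""
    seen = {}
    for entry in dictionary:
        seen.setdefault(entry.attribute, set()).add(entry.entity)
    return {attr: sorted(ents) for attr, ents in seen.items() if len(ents) > 1}

entity_names = entities(data_dictionary)
candidate_links = shared_attributes(data_dictionary)
```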
Requirements
Focus on Information Required by User
– Data descriptions (dictionaries)
– Data generation and ingest
– Data use (e.g. reports, documents, mobile query, analytics, decision support, compliance, meta-data for files, other?)
Client Application Requirements (Parallel Development)
File system requirements (Parallel Development)
Use Centralized (Single IM) or View Integration (one IM per Use Case, integrated later)
Start Parallel Efforts - Select DBMS
Kick-off Traditional Software Analysis, Design, Development for Client Applications
Kick-off File systems to coordinate with RDBMS
Select DBMS by Type (Candidates)
– Relational (SQL)
– OODBMS (C++ or other OOPL + Data = Persistent Objects)
– NoSQL - Key / Value, Documents, No Schema (per se)
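To make the relational versus key/value distinction concrete, the sketch below stores the same customer both ways. It assumes Python's built-in sqlite3 for the relational side and uses a plain dict of JSON strings as a stand-in for a NoSQL store; no real NoSQL product is involved, and all names and values are hypothetical.

```python
import json
import sqlite3

# Relational: the schema is declared up front and every row must fit it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")
conn.execute("INSERT INTO customer (id, name, city) VALUES (1, 'Ada', 'Prescott')")
relational_city = conn.execute(
    "SELECT city FROM customer WHERE id = 1").fetchone()[0]

# Key/value with document values: a dict stands in for the store.
kv_store = {}
kv_store["customer:1"] = json.dumps({"name": "Ada", "city": "Prescott"})
# A differently shaped record needs no schema change.
kv_store["customer:2"] = json.dumps({"name": "Bob", "phones": ["555-0100"]})
document_city = json.loads(kv_store["customer:1"])["city"]
```

The relational side can answer ad hoc queries across columns; the key/value side can only fetch by key unless the application builds its own indexes.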
Logical Design
RDBMS - Relational Schema (ER/EER)
– Start work on Schema using High Level IM
– Domains, Attributes, Tables, Relations, Keys, etc.
OODBMS - Class Hierarchy and Object Interaction
– Use UML (SE310)
– Consider SysML (Extension to UML for Systems)
NoSQL - Key / Value Searches and Indexing
– Open research topic
– Columnar design – E.g. Google BigQuery [https://cloud.google.com/bigquery]
– Web-based REST (Representational State Transfer)
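For the RDBMS branch, domains, attributes, and keys map directly onto DDL. Here is a minimal sketch using Python's built-in sqlite3 module with a hypothetical department/employee schema; it declares a domain constraint with CHECK, a primary key, and a foreign key, then reads the keys back from SQLite's catalog pragmas.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.executescript("""
CREATE TABLE department (
    dept_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    salary  REAL NOT NULL CHECK (salary >= 0),               -- domain constraint
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)  -- foreign key
);
""")

# table_info rows: (cid, name, type, notnull, dflt_value, pk)
primary_key = [row[1] for row in conn.execute("PRAGMA table_info(employee)") if row[5]]

# foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
foreign_keys = [(row[3], row[2], row[4])
                for row in conn.execute("PRAGMA foreign_key_list(employee)")]
```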
Physical Design [Part 2]
Choose Storage and Technology
– Battery Backed RAM
– Phase Change Memory
– Host-Bus Flash
– SSD
– HDD
Install on Block Storage Partitions
– SAN or DAS (No File system)
– Scale with SAN or DAS Host Channels (scale-up)
– Block RAID
Install on File system
– NAS, PNFS, GPFS, etc. - Scale out!
– Local (scale-up)
– File RAID
Indexing Method Selection [Part 2 of our course]
Selected DBMS Physical Features
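As a small illustration of why index selection is a physical design decision, the sketch below asks SQLite (through Python's built-in sqlite3 module) for its query plan before and after an index is created. The table, data, and index name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so the code only checks whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer (name, city) VALUES (?, ?)",
                 [(f"user{i}", f"city{i % 50}") for i in range(1000)])

QUERY = "SELECT name FROM customer WHERE city = 'city7'"

def query_plan(conn, sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

plan_before = query_plan(conn, QUERY)   # no index on city yet
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")
plan_after = query_plan(conn, QUERY)    # planner can now use the new index
```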
Three-Level ANSI-SPARC Architecture and Phases of Database Design
CB - Ref.
DBMS Selection Criteria
Performance (Transactions/second, Latency, TPC)
Reliability, Availability, RPO/RTO – Recovery Point/Time Objective
Features (E.g. De-duplication, Logging, Import/Export, …)
Ease of Use (SQL Compliance, GUI, Installation)
Load Data
Dump Existing Database to Files or SQL
For RDBMS to RDBMS, Migrate from Old Schema to New
For Files, No-SQL, or OODBMS to RDBMS
– Parse files (data processing), generate SQL DML inserts
– Data entry (customer driven, or enterprise driven)
Make before Break [Parallel, Shut-down, cut-over]
Big Bang [Re-build enterprise data]
Evolutionary [Migrate data to new from old over time]
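The "parse files, generate SQL DML inserts" step can be sketched with Python's built-in csv and sqlite3 modules. The CSV content and the `customer` table below are hypothetical; a real migration would read a dump file from disk and would need validation and error reporting on top of this.

```python
import csv
import io
import sqlite3

# Hypothetical legacy export; in practice this would be a dump file on disk.
legacy_dump = io.StringIO("id,name,city\n1,Ada,Prescott\n2,Bob,Boulder\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)")

def load_customers(conn, source):
    """Parse CSV rows and insert them with a parameterized DML statement."""
    reader = csv.DictReader(source)
    rows = [(int(r["id"]), r["name"], r["city"]) for r in reader]
    with conn:  # one transaction: all rows load or none do
        conn.executemany(
            "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)", rows)
    return len(rows)

loaded = load_customers(conn, legacy_dump)
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Using parameter placeholders rather than building INSERT strings by hand avoids quoting errors in the source data.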
Testing
Data Integrity (Bad data entry prevention)
Referential Integrity (Update, Delete, Insert focus)
Performance (Transactions per second)
– Custom benchmarks
– SPC (RDBMS or No-SQL) or TPC (RDBMS, Big Data)
Resilience (Data loss protection)
– Single, Double, or Triple Fault
– Availability During Recovery (Hot, Warm, Cold Spares)
Disaster Recovery (loss of client connectivity or data center)
RPO, RTO - Recovery Point and Recovery Time Objectives
– Transactions lost that need restart (replay)
– Time until transactions can be serviced or re-run
Upgrades and scaling while in service
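The data integrity and referential integrity checks can be written as small automated tests. The sketch below, using Python's built-in sqlite3 module and a hypothetical account/payment schema, confirms that a bad value, an orphan insert, a parent delete, and an orphaning update are each rejected by the DBMS.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
    conn.executescript("""
        CREATE TABLE account (
            id      INTEGER PRIMARY KEY,
            balance REAL NOT NULL CHECK (balance >= 0)
        );
        CREATE TABLE payment (
            id         INTEGER PRIMARY KEY,
            account_id INTEGER NOT NULL REFERENCES account(id),
            amount     REAL NOT NULL
        );
        INSERT INTO account (id, balance) VALUES (1, 50.0);
        INSERT INTO payment (account_id, amount) VALUES (1, 10.0);
    """)
    return conn

def is_rejected(conn, sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        with conn:
            conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

db = make_db()
# Data integrity: a negative balance violates the CHECK constraint.
bad_value_rejected = is_rejected(
    db, "INSERT INTO account (id, balance) VALUES (2, -5.0)")
# Referential integrity on insert, delete, and update.
orphan_insert_rejected = is_rejected(
    db, "INSERT INTO payment (account_id, amount) VALUES (99, 1.0)")
parent_delete_rejected = is_rejected(db, "DELETE FROM account WHERE id = 1")
orphan_update_rejected = is_rejected(
    db, "UPDATE payment SET account_id = 99 WHERE id = 1")
```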