September 23, 2015 Sam Siewert CS317 File and Database Systems Lecture 5, Part-2 – ORDBMS http://www.ibmbigdatahub.com/video/ibm-big-data-minute-drowning-petabytes

CS317 File and Database Systemsmercury.pr.erau.edu/~siewerts/cs317/documents/...Vendors of RDBMSs conscious of threat and promise of OODBMS. Agree that RDBMSs not currently suited


September 23, 2015 Sam Siewert

CS317 File and Database Systems

Lecture 5, Part-2 – ORDBMS
http://www.ibmbigdatahub.com/video/ibm-big-data-minute-drowning-petabytes


SQL Theory and Standards

DBMS Design (Connolly-Begg Chapter 10)

Part-2 Development Lifecycle


For Discussion… Big Data – Velocity, Volume, Variety, Veracity [2014]

1. Daily – 2.5 quintillion bytes (2,500,000,000,000,000,000), or about 2.5 Exabytes (roughly 2.2 EiB), or 46,566,128 50GB Blu-Ray Discs (IBM Estimate)

2. Annually – With 7.5 billion in global population, each person produces/consumes about 2.25 unique Blu-Rays per Year, or 23 DVDs (assuming even distribution – unlikely)

3. Annually – If produced/consumed by US population alone – 53 Blu-Rays per Year or 564 DVDs per person

4. Data in Total is 40 trillion gigabytes or 800 billion Blu-Rays, for just over 100 (unique) Blu-Rays per person globally

5. Data by Powers of 10 and 2 – 2^64 bytes is 16 Exabytes of Addressable Data [PC limit]

6. Max Data Velocity is 100 Gbps, the Fastest Ethernet [8b/10b – 10 billion bytes per second]

7. How much is Truly Unique Data vs. Duplicated?

8. What is the Quality (Veracity) of this Data?
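The estimates above can be re-derived with a few lines of arithmetic. This is only a back-of-envelope sketch: the disc capacities (50 * 2^30 bytes per Blu-Ray for the daily figure, 50 decimal GB for the total) and the US population of about 320 million are assumptions chosen because they reproduce the slide's numbers, not values stated on the slide.

```python
# Back-of-envelope check of the Big Data estimates.
# Assumed (not stated on the slide): binary 50GB discs for the daily figure,
# decimal 50GB discs for the total, and a US population of ~320 million.
DAILY_BYTES = 2.5e18            # 2.5 quintillion bytes per day (IBM estimate)
BLU_RAY_BYTES = 50 * 2**30      # assumed capacity of one "50GB" Blu-Ray
WORLD_POPULATION = 7.5e9        # figure used on the slide
US_POPULATION = 320e6           # assumed

blu_rays_per_day = DAILY_BYTES / BLU_RAY_BYTES
blu_rays_per_year = blu_rays_per_day * 365
per_person_world = blu_rays_per_year / WORLD_POPULATION
per_person_us = blu_rays_per_year / US_POPULATION

total_blu_rays = 40e12 * 1e9 / 50e9        # 40 trillion GB at 50 GB per disc
total_per_person = total_blu_rays / WORLD_POPULATION
addressable_eib = 2**64 / 2**60            # 64-bit byte address space, in EiB

print(f"{blu_rays_per_day:,.0f} Blu-Rays per day")
print(f"{per_person_world:.2f} Blu-Rays per person per year (world)")
print(f"{per_person_us:.0f} Blu-Rays per person per year (US only)")
print(f"{total_blu_rays:,.0f} Blu-Rays in total, {total_per_person:.0f} per person")
print(f"{addressable_eib:.0f} EiB addressable with 64 bits")
```

The per-person results land close to the slide's rounded values; small differences come from rounding and from the capacity assumptions.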


Big Data

Volume and Velocity Can Be Estimated as Shown
– Disk drives shipped and in use
– Online data only, or removable and archive media as well?
– Bit-rot (media eventually fails, limited storage lifetime)

Variety, Depends on Level of Data Duplication
– Enterprise Storage System Deduplication – E.g. EMC Deduplication
– Internet Archive [petabytes] and Wayback machine, http://www.loc.gov/about/general-information/ [traditional volumes], Stanford Digital Repository, National Archives, National A/V Conservation

Veracity, perhaps Most Challenging Part
– Is the Data Correct – Not Corrupted
– Is it Valid – From a Known, Trusted Source, Corresponding to Metadata Description
– Has the Data Been Processed and if so, How?
– Is it Raw Data (from a sensor, user, other)?
– Veracity is difficult – E.g. http://berkeleyearth.org/about-data-set


Quiz #2

Let’s Go Over it …


Quiz #2

Average was 68.3, Std. Deviation was 17.5 - Primarily Need to Study Book More

Quiz #1 – 81.5, 8.5 (Ideal) – Mostly from In-Class Notes

Let’s Go Over Solutions Now with Book Citations

Solutions Provide References Back to the Book – Posted on Canvas as Well


Quiz #2 - Review


Equi-join is a specific type of Theta-Join where the Predicate tests for EQUIVALENCE ONLY
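A small sketch of the distinction, using Python's built-in sqlite3 module rather than MySQL. The Staff and Grade tables and their rows are hypothetical, made up only to show that the two joins differ in nothing but the comparison operator in the predicate.

```python
import sqlite3

# In-memory database with two hypothetical relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary INTEGER);
    CREATE TABLE Grade (grade TEXT PRIMARY KEY, minSalary INTEGER);
    INSERT INTO Staff VALUES ('S1', 'White', 30000), ('S2', 'Beech', 12000);
    INSERT INTO Grade VALUES ('A', 30000), ('B', 10000);
""")

# Theta-join: the predicate may use any comparison operator (here >=).
theta_rows = conn.execute(
    "SELECT s.staffNo, g.grade FROM Staff s JOIN Grade g "
    "ON s.salary >= g.minSalary ORDER BY s.staffNo, g.grade"
).fetchall()

# Equi-join: same shape of query, but the predicate tests equality only.
equi_rows = conn.execute(
    "SELECT s.staffNo, g.grade FROM Staff s JOIN Grade g "
    "ON s.salary = g.minSalary ORDER BY s.staffNo, g.grade"
).fetchall()

print(theta_rows)
print(equi_rows)
```

With this data the equi-join result is a subset of the theta-join result, since every pair that satisfies = also satisfies >=.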

Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam


Quiz #2 - Review


See p. 119, 132
1) Selection [Restriction]
2) Projection [Projection]
3) Union [Join – Specific Union]
4) Set Difference [Codd Omits]
5) Cartesian Product [Permutation]

Encouraged! See Class Notes and Example of TC, RA, and Use of DISTINCT


Required [Except Intersection]

Pearson Education © 2014

Intersection can be composed as R – (R – S)
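The identity can be checked on small relations modeled as Python sets of tuples. The relations R and S below are made-up, union-compatible examples.

```python
# Two union-compatible relations modeled as sets of tuples (hypothetical data).
R = {("S1", "White"), ("S2", "Beech"), ("S3", "Ford")}
S = {("S2", "Beech"), ("S3", "Ford"), ("S4", "Howe")}

# R - S keeps the tuples found only in R; removing those from R leaves
# exactly the tuples that R shares with S.
composed = R - (R - S)

print(sorted(composed))
print(composed == (R & S))
```

This is why Intersection does not need to be a primitive operation: Set Difference alone is enough to express it.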


Nice to Have! - Relational Algebra Operations – Composed from Required



PK, FK EQUIVALENCE

Book Says that EQUIVALENCE for Equi-Join is a Predicate that Uses "=" – p. 126 (bottom)

This is Simplistic, especially for Multi-table Joins and PKs formed from more than One Attribute

E.g. if (X == Y) Can in Fact Involve a Complex Comparison
– E.g. if X is a vector = [1, 1, 3] and Y is a vector, then EQUIVALENCE requires Comparison of Each Component
– if ((X[0] == Y[0]) && (X[1] == Y[1]) && (X[2] == Y[2]))

Likewise, Consider Simple Tuples of FirstName, LastName, DoB [PK=FirstName, LastName]

Another Relation [FK=FirstName, LastName] with Street Address, City, Zipcode
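A minimal sketch of the composite-key case, using Python's sqlite3 module. The Person and Address table names, column types, and sample rows are assumptions for illustration. The equi-join predicate must compare every component of the key, just as the vector comparison compares every element.

```python
import sqlite3

# Component-wise equality on vectors: Python tuples compare element by element.
X, Y = (1, 1, 3), (1, 1, 3)
vectors_equal = X == Y

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (
        FirstName TEXT, LastName TEXT, DoB TEXT,
        PRIMARY KEY (FirstName, LastName)
    );
    CREATE TABLE Address (
        FirstName TEXT, LastName TEXT,
        Street TEXT, City TEXT, Zipcode TEXT,
        FOREIGN KEY (FirstName, LastName)
            REFERENCES Person (FirstName, LastName)
    );
    INSERT INTO Person VALUES ('Ann', 'Lee', '1990-01-01'),
                              ('Ann', 'Ray', '1985-05-05');
    INSERT INTO Address VALUES ('Ann', 'Lee', '1 Main St', 'Prescott', '86301');
""")

# Correct equi-join: equality on BOTH attributes of the composite key.
full_key_rows = conn.execute("""
    SELECT p.FirstName, p.LastName, p.DoB, a.City
    FROM Person p JOIN Address a
      ON p.FirstName = a.FirstName AND p.LastName = a.LastName
    ORDER BY p.LastName
""").fetchall()

# Comparing only one component wrongly pairs Ann Ray with Ann Lee's address.
partial_key_rows = conn.execute("""
    SELECT p.FirstName, p.LastName, p.DoB, a.City
    FROM Person p JOIN Address a
      ON p.FirstName = a.FirstName
    ORDER BY p.LastName
""").fetchall()

print(full_key_rows)
print(partial_key_rows)
```

The second query returns an extra row, which shows why a single "=" is too simple a description of EQUIVALENCE for multi-attribute keys.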


Join Cheat Sheet
http://www.codeproject.com/KB/database/Visual_SQL_Joins/Visual_SQL_JOINS_orig.jpg


JOINS You Must Know

MySQL Join Support – Inner, Cross, Left, Right, Outer, Natural, Multi-table with Predicates (Theta and Equi-Join)
Cross-Join [p. 171, Matches Theory p. 126]
Theta-Join [p. 170 – 3 Table Join]
Equi-Join [p. 168-169]
Natural-Join (Rarely Used, but Matches Theory on p. 127)
Inner-Join (Not in Book! But, Common in MySQL)
Alternative Form – Nested Queries [p. 164]
Other Joins You are Not Responsible For (Less Useful)
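The join forms listed above can be tried side by side. The sketch below uses Python's sqlite3 module instead of MySQL (the SQL shown is the same for these forms, but check MySQL's own documentation for its specifics). The Client and Viewing tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Client (clientNo TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Viewing (clientNo TEXT, propertyNo TEXT);
    INSERT INTO Client VALUES ('C1', 'Kay'), ('C2', 'Ritchie'), ('C3', 'Stewart');
    INSERT INTO Viewing VALUES ('C1', 'P4'), ('C1', 'P9'), ('C2', 'P4');
""")

def run(sql):
    """Run one query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()

# Cross-join: Cartesian product, 3 clients x 3 viewings.
cross = run("SELECT * FROM Client CROSS JOIN Viewing")

# Inner-join with an equality predicate (an equi-join).
inner = run("SELECT c.name, v.propertyNo FROM Client c "
            "INNER JOIN Viewing v ON c.clientNo = v.clientNo "
            "ORDER BY c.name, v.propertyNo")

# Natural-join: implicit equality on the shared column name clientNo.
natural = run("SELECT name, propertyNo FROM Client NATURAL JOIN Viewing "
              "ORDER BY name, propertyNo")

# Left (outer) join: clients without viewings are kept, padded with NULL.
left = run("SELECT c.name, v.propertyNo FROM Client c "
           "LEFT JOIN Viewing v ON c.clientNo = v.clientNo "
           "ORDER BY c.name, v.propertyNo")

# Alternative form: nested query instead of a join.
nested = run("SELECT name FROM Client "
             "WHERE clientNo IN (SELECT clientNo FROM Viewing) ORDER BY name")

for label, rows in [("cross", cross), ("inner", inner), ("natural", natural),
                    ("left", left), ("nested", nested)]:
    print(label, len(rows), rows)
```

Note that the nested form returns each matching client once, while the join forms return one row per matching viewing.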


Connolly-Begg Chapter 9

ORDBMS Extensions to SQL (SQL:2011)

Part-2


Unstructured Data

BLOBs - Binary Large Objects
– Images
– Digital Video and Audio – Digital Media
– Binary Data (Documents and Code), Perhaps Proprietary
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Moose-to-Skeleton.png
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Sled-Dogs.jpg
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/korean-air-profile.jpg

CLOBs – Character Large Objects
– Log files and Traces (IT)
– Transaction Logs
– XML, HTML, XDS, etc. [Web documents typically via HTTP, HTTPS]
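A rough sketch of storing both kinds of large object, using Python's sqlite3 module. The Media table, its column names, and the sample data are hypothetical. SQLite is used only because it ships with Python; it treats a column declared CLOB as text, and other DBMSs handle large objects differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Media (id INTEGER PRIMARY KEY, image BLOB, trace CLOB)"
)

# Binary data: the 8-byte signature that starts every PNG file.
png_signature = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Character data: a made-up log trace.
log_text = "2015-09-23 10:00:01 INFO transaction 42 committed\n" * 3

# Python bytes are stored as a BLOB value, Python str as a text value.
conn.execute(
    "INSERT INTO Media (image, trace) VALUES (?, ?)",
    (png_signature, log_text),
)

image, trace, image_type, trace_type = conn.execute(
    "SELECT image, trace, typeof(image), typeof(trace) FROM Media"
).fetchone()

print(image_type, len(image), "bytes")
print(trace_type, len(trace), "characters")
```

The binary value comes back byte for byte, while the character value can still be searched with ordinary string functions such as LIKE.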


OO Concepts – “Real World”

OOA – Object Oriented Analysis
– Define Class Hierarchies (Abstract Classes with Attributes) and Interfaces (Public, Private) and Methods (Operations)
– Inheritance and Multiple Inheritance

OOD – OO Design
– Encapsulation of Methods with Data (Attributes) for Abstract and Derived Classes
– Instantiation and Use of Objects [Use Cases]

OOP – Object Oriented Programming (Java, C++, …)
– Programming Language – Direct Implementation of OOD
– Implementation of Re-useable OO Code Libraries
  Boost - http://www.boost.org/
  OpenCV [C++ version]
  Many More … in other OOPLs


Classes Useful in Real World

E.g. Biology – Kingdom, Phylum, Class, Order, Family, Genus, Species [Multiple Inheritance Examples], Proven Use

Parts – Components compose Sub-system(s) compose System(s) compose System of Systems

Supports Re-Use of Objects Instantiated from Class Hierarchy

Multiple Inheritance – Odd?

Can be Abstract, Derived and Concrete

– E.g. Mathematical, Data Structures, Image Processing

– Organization of Information (Classes in Ontological Web Language)

– Simulation of Physical Systems

– Most Often Software Libraries


http://en.wikipedia.org/wiki/Platypus#mediaviewer/File:Wild_Platypus_4.jpg

https://www.youtube.com/watch?v=kDay5OWDPn4#t=26


Quick Review of OO [not just C++]

Encapsulation of Data and Methods in an Instantiated Object

Objects are Instances from a Class Hierarchy
– Classes Define Encapsulated Data and Methods
  Virtual Functions can Be Refined
  Pure Virtual Functions Defined in Abstract Classes must be Refined
– Can Inherit Data and Methods from Parent Classes
– Can In Fact Have Multiple Inheritance
– Instantiated Objects Call Dynamically Bound Methods [Determined at Runtime]

Enables Semantic Overload [Can be Done without OO too]
– Overloaded Functions (Methods), Resolved by Type Signatures or Subtype/Sub-class
– Overloaded Operators (E.g. math operators work not only on integers and real numbers, but also vectors, matrices, and complex numbers)
– Derived Data Types from Base types

Polymorphism
– Parametric – Re-useable Templates (E.g. Ada and Java Generic, C++ Template)
– Functional Semantic Overloading
– Dynamic or Subtype or Subclass Polymorphism using Late Binding

OOPLs – Smalltalk to more current Java, C++, Ada95, … CLOS
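Several of these ideas fit into one short sketch. It is written in Python rather than C++ or Java, and the Shape, Rectangle, Square, and Vector classes are hypothetical examples. An abstract method plays the role of a pure virtual function, describe() relies on late binding, and Vector overloads + and ==. Multiple inheritance and parametric polymorphism are not shown.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Abstract base class: encapsulates a name and declares area()."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def area(self):
        """Counterpart of a pure virtual function: subclasses must refine it."""

    def describe(self):
        # self.area() is bound at runtime to the subclass implementation.
        return f"{self.name} with area {self.area():.1f}"


class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width, self.height = width, height

    def area(self):
        return self.width * self.height


class Square(Rectangle):
    """Derived class: inherits data and area() from Rectangle."""

    def __init__(self, side):
        super().__init__(side, side)
        self.name = "square"


class Vector:
    """Operator overloading: + and == work component by component."""

    def __init__(self, *components):
        self.components = tuple(components)

    def __add__(self, other):
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __eq__(self, other):
        return self.components == other.components


# Subtype polymorphism: one call site, behavior chosen by the runtime type.
descriptions = [shape.describe() for shape in (Rectangle(2, 3), Square(2))]
vector_sum = Vector(1, 1, 3) + Vector(1, 0, 0)
print(descriptions)
print(vector_sum.components)
```

Trying to instantiate Shape directly raises a TypeError, which is how Python enforces that the abstract method must be refined first.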


Operator and Function Overloading

What is Required to Be OO?

Common Consensus is – Encapsulation, Class Hierarchy, Polymorphism (Parametric & Subtype or Subclass with Late Binding), Inheritance

Operator Overloading Not Required (E.g. Java Frowns Upon, No Support)

Some PLs have OO Features, but not All

http://en.wikipedia.org/wiki/Operator_overloading


Storing Objects in Relational Databases

One approach to achieving persistence with an OOPL is to use an RDBMS as the underlying storage engine.
– O2 – merged with Informix and acquired by IBM
– ObjectStore - http://www.objectstore.com/
– Objectivity - http://www.objectivity.com/products/objectivitydb
– Versant - http://www.actian.com/products/operational-databases/

Requires mapping class instances (i.e. objects) to one or more tuples distributed over one or more relations.

To handle class hierarchy, have two basic tasks to perform:
(1) design relations to represent class hierarchy;
(2) design how objects will be accessed.

Pearson Education © 2009


Mapping Classes to Relations

Number of strategies for mapping classes to relations, although each results in a loss of semantic information.

(1) Map each class or subclass to a relation:

Staff (staffNo, fName, lName, position, sex, DOB, salary)
Manager (staffNo, bonus, mgrStartDate)
SalesPersonnel (staffNo, salesArea, carAllowance)
Secretary (staffNo, typingSpeed)


Mapping Classes to Relations

(2) Map each subclass to a relation:

Manager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)
SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea, carAllowance)
Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)

(3) Map the hierarchy to a single relation:

Staff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate, salesArea, carAllowance, typingSpeed, typeFlag)
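Strategies (1) and (3) can be compared on a toy database. The sketch below uses Python's sqlite3 module, trims the attribute lists to a few columns, and invents the sample rows; StaffFlat is a hypothetical name chosen so that both designs can live in one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Strategy (1): one relation per class; a subclass row shares the
    -- superclass primary key, so rebuilding an object needs a join.
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT, salary INTEGER);
    CREATE TABLE Manager (
        staffNo TEXT PRIMARY KEY REFERENCES Staff (staffNo),
        bonus INTEGER
    );
    INSERT INTO Staff VALUES ('SL21', 'White', 30000), ('SG37', 'Beech', 12000);
    INSERT INTO Manager VALUES ('SL21', 2000);

    -- Strategy (3): the whole hierarchy in a single relation; typeFlag says
    -- which subclass a row belongs to, and unused attributes are NULL.
    CREATE TABLE StaffFlat (
        staffNo TEXT PRIMARY KEY, lName TEXT, salary INTEGER,
        bonus INTEGER, typingSpeed INTEGER, typeFlag TEXT
    );
    INSERT INTO StaffFlat VALUES
        ('SL21', 'White', 30000, 2000, NULL, 'Manager'),
        ('SG37', 'Beech', 12000, NULL, 60, 'Secretary');
""")

# Strategy (1): a Manager object is spread over two tuples, joined on staffNo.
managers_joined = conn.execute("""
    SELECT s.staffNo, s.lName, s.salary, m.bonus
    FROM Staff s JOIN Manager m ON s.staffNo = m.staffNo
""").fetchall()

# Strategy (3): a Manager object is one tuple, selected by its type flag.
managers_flat = conn.execute("""
    SELECT staffNo, lName, salary, bonus
    FROM StaffFlat WHERE typeFlag = 'Manager'
""").fetchall()

print(managers_joined)
print(managers_flat)
```

Both queries recover the same Manager, but in neither design does the schema itself record that Manager is a subclass of Staff, which is the loss of semantic information mentioned above.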


ORDBMSs

RDBMSs currently dominant database technology with estimated sales of US$24 billion in 2011, expected to grow to US$37 billion by 2016.

Vendors of RDBMSs conscious of threat and promise of OODBMS.

Agree that RDBMSs not currently suited to advanced database applications, and added functionality is required.

Reject claim that extended RDBMSs will not provide sufficient functionality or will be too slow to cope adequately with new complexity.

Can remedy shortcomings of relational model by extending model with OO features.


ORDBMSs - Features

OO features being added include:
– user-extensible types,
– encapsulation,
– inheritance,
– polymorphism,
– dynamic binding of methods,
– complex objects including non-1NF objects,
– object identity.


ORDBMSs - Features

However, no single extended relational model.

All models:
– share basic relational tables and query language,
– all have some concept of ‘object’,
– some can store methods (or procedures or triggers).

Some analysts predict ORDBMS will have 50% larger share of market than RDBMS.


Stonebraker’s View


Advantages of ORDBMSs

Resolves many of known weaknesses of RDBMS.

Reuse and sharing:
– reuse comes from ability to extend server to perform standard functionality centrally;
– gives rise to increased productivity both for developer and end-user.

Preserves significant body of knowledge and experience gone into developing relational applications.


Disadvantages of ORDBMSs

Complexity.
Increased costs.
Proponents of relational approach believe simplicity and purity of relational model are lost.
Some believe RDBMS is being extended for what will be a minority of applications.
OO purists not attracted by extensions either.
SQL now extremely complex.


SQL:2011 - New OO Features

Type constructors for row types and reference types.
User-defined types (distinct types and structured types) that can participate in supertype/subtype relationships.
User-defined procedures, functions, methods, and operators.
Type constructors for collection types (arrays, sets, lists, and multisets).
Support for large objects – BLOBs and CLOBs.
Recursion.
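Two of the listed features, recursion and user-defined functions, can be tried with Python's sqlite3 module. This is only a sketch: SQLite does not implement the SQL:2011 user-defined type system, its functions are registered from the host language rather than with CREATE FUNCTION, and the Staff table, the level_name function, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, managerNo TEXT);
    INSERT INTO Staff VALUES ('S1', NULL), ('S2', 'S1'), ('S3', 'S2'), ('S4', 'S1');
""")

# Recursive query: everyone who reports to S1, directly or through others.
REPORTS_CTE = """
    WITH RECURSIVE Reports(staffNo, depth) AS (
        SELECT staffNo, 1 FROM Staff WHERE managerNo = 'S1'
        UNION ALL
        SELECT s.staffNo, r.depth + 1
        FROM Staff s JOIN Reports r ON s.managerNo = r.staffNo
    )
"""
reports = conn.execute(
    REPORTS_CTE + "SELECT staffNo, depth FROM Reports ORDER BY staffNo"
).fetchall()

# User-defined function: a Python callable registered under an SQL name.
conn.create_function(
    "level_name", 1, lambda depth: "direct" if depth == 1 else "indirect"
)
labelled = conn.execute(
    REPORTS_CTE + "SELECT staffNo, level_name(depth) FROM Reports ORDER BY staffNo"
).fetchall()

print(reports)
print(labelled)
```

Without recursion, the number of self-joins would have to match the depth of the management chain, which is not known in advance.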
