A Comparison of SQL and NoSQL Databases
Keith W. Hare, JCC Consulting, Inc.
Convenor, ISO/IEC JTC1 SC32 WG3
13 May 2011 Metadata Open Forum 1
ISO/IEC JTC1/SC32/WG2 N1537
Abstract
NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing. In fact, one website lists over a hundred different NoSQL databases.
This presentation reviews the features common to the NoSQL databases and compares those features to the features and capabilities of SQL databases.
Who Am I?
Muskingum College, 1980, BS in Biology and Computer Science
Senior Consultant with JCC Consulting, Inc. since 1985 – high performance database systems
Ohio State – Masters in Computer & Information Science, 1985
SQL Standards committees since 1988
Vice Chair, INCITS H2 since 2003
Convenor, ISO/IEC JTC1 SC32 WG3 since 2005
Topics
SQL Databases
  SQL Standard
  SQL Characteristics
  SQL Database Examples
NoSQL Databases
  NoSQL Definition
  General Characteristics
  NoSQL Database Types
  NoSQL Database Examples
Standard SQL
The following is a short, incomplete history of the SQL Standards – ISO/IEC 9075
1987 – Initial ISO/IEC Standard
1989 – Referential Integrity
1992 – SQL2
1995 – SQL/CLI (ODBC)
1996 – SQL/PSM – Procedural Language extensions
1999 – User Defined Types
2003 – SQL/XML
2008 – Expansions and corrections
2011 (or 2012) – System Versioned and Application Time Period Tables
SQL Characteristics
Data stored in columns and tables
Relationships represented by data
Data Manipulation Language
Data Definition Language
Transactions
Abstraction from physical layer
SQL Physical Layer Abstraction
Applications specify what, not how
Query optimization engine
Physical layer can change without modifying applications
  Create indexes to support queries
  In Memory databases
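The point that the physical layer can change without modifying applications can be sketched with Python's built-in sqlite3 module. This is only an illustration: the person table, the city values, and the index name are hypothetical, and SQLite stands in for any SQL database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany(
    "INSERT INTO person (city) VALUES (?)",
    [("Granville",), ("Columbus",), ("Granville",)],
)

# The application only ever issues this text: it says what, not how.
QUERY = "SELECT id FROM person WHERE city = ? ORDER BY id"

before = conn.execute(QUERY, ("Granville",)).fetchall()

# Physical change only: add an index to support the query.
conn.execute("CREATE INDEX person_city ON person (city)")

after = conn.execute(QUERY, ("Granville",)).fetchall()

# The optimizer decides whether to use the index; plan text varies by version.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("Granville",)).fetchall()

print(before, after)
print(plan)
```

The query text and its result are the same before and after the index is created; only the access plan chosen by the optimizer may differ.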
Data Manipulation Language (DML)
Data manipulated with Select, Insert, Update, & Delete statements
  Select T1.Column1, T2.Column2 …
  From Table1, Table2 …
  Where T1.Column1 = T2.Column1 …
Data Aggregation
Compound statements
Functions and Procedures
Explicit transaction control
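A minimal sketch of the join and aggregation items above, run through Python's sqlite3 module. The table contents and the extra Name column are assumptions made up for illustration; the slide's statement is schematic and its elided parts are not filled in here.

```python
import sqlite3

# Hypothetical tables and rows, only to make the schematic Select concrete.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Column1 INTEGER, Name TEXT);
    CREATE TABLE Table2 (Column1 INTEGER, Column2 TEXT);
    INSERT INTO Table1 VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO Table2 VALUES (1, 'x'), (1, 'y'), (3, 'z');
""")

# Join: rows are matched on the shared Column1 value.
join_rows = conn.execute("""
    SELECT T1.Column1, T2.Column2
    FROM Table1 AS T1, Table2 AS T2
    WHERE T1.Column1 = T2.Column1
    ORDER BY T2.Column2
""").fetchall()

# Aggregation: count the Table2 rows per key.
counts = conn.execute("""
    SELECT Column1, COUNT(*) FROM Table2
    GROUP BY Column1 ORDER BY Column1
""").fetchall()

print(join_rows)
print(counts)
```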
Data Definition Language
Schema defined at the start
Create Table (Column1 Datatype1, Column2 Datatype2, …)
Constraints to define and enforce relationships
  Primary Key
  Foreign Key
  Etc.
Triggers to respond to Insert, Update, & Delete
Stored Modules
Alter …
Drop …
Security and Access Control
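As a rough illustration of constraints defining and enforcing relationships, the sketch below declares a primary key and a foreign key in SQLite and shows an orphan row being refused. The author and abstract tables are hypothetical; the PRAGMA is needed because SQLite leaves foreign key enforcement off by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE abstract (
        abstract_id INTEGER PRIMARY KEY,
        author_id   INTEGER NOT NULL REFERENCES author(author_id),
        title       TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO author VALUES (1, 'Keith W. Hare')")
conn.execute("INSERT INTO abstract VALUES (10, 1, 'SQL Standard and NoSQL Databases')")

def try_orphan_insert():
    """Attempt to insert an abstract whose author does not exist."""
    try:
        conn.execute("INSERT INTO abstract VALUES (11, 99, 'No such author')")
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

orphan_result = try_orphan_insert()
print(orphan_result)
```

The relationship is enforced by the database itself, so every application that writes to these tables gets the same check.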
Transactions – ACID Properties
Atomic – All of the work in a transaction completes (commit) or none of it completes
Consistent – A transaction transforms the database from one consistent state to another consistent state. Consistency is defined in terms of constraints.
Isolated – The results of any changes made during a transaction are not visible until the transaction has committed.
Durable – The results of a committed transaction survive failures
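The Atomic and Consistent properties can be sketched with a two-step transfer in SQLite. The account table and its CHECK constraint are assumptions chosen for illustration; isolation and durability are not demonstrated by this in-memory example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('a', 100), ('b', 0)")
conn.commit()

def transfer(amount):
    """Move amount from 'a' to 'b'; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = 'b'",
                (amount,),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = 'a'",
                (amount,),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(30)
ok_large = transfer(500)  # would make 'a' negative, so the constraint fails
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok_small, ok_large, balances)
```

In the failed transfer the credit to 'b' has already been applied when the constraint fires; the rollback undoes it, so the database moves only between consistent states.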
SQL Database Examples
Commercial
  IBM DB2
  Oracle RDBMS
  Microsoft SQL Server
  Sybase SQL Anywhere
Open Source (with commercial options)
  MySQL
  Ingres
Significant portions of the world’s economy use SQL databases!
NoSQL Definition
From www.nosql-database.org:
Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontal scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply as: schema-free, easy replication support, simple API, eventually consistent / BASE (not ACID), a huge data amount, and more.
NoSQL Products/Projects
http://www.nosql-database.org/ lists 122 NoSQL Databases
  Cassandra
  CouchDB
  Hadoop & HBase
  MongoDB
  StupidDB
  Etc.
NoSQL Distinguishing Characteristics
Large data volumes
  Google’s “big data”
Scalable replication and distribution
  Potentially thousands of machines
  Potentially distributed around the world
Queries need to return answers quickly
Mostly query, few updates
Asynchronous Inserts & Updates
Schema-less
ACID transaction properties are not needed – BASE
CAP Theorem
Open source development
BASE Transactions
Acronym contrived to be the opposite of ACID
  Basically Available,
  Soft state,
  Eventually Consistent
Characteristics
  Weak consistency – stale data OK
  Availability first
  Best effort
  Approximate answers OK
  Aggressive (optimistic)
  Simpler and faster
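A toy model may help make "stale data OK" and "eventually consistent" concrete. The classes below are hypothetical and are not taken from any NoSQL product: a write is acknowledged after reaching one replica, and the other replicas catch up when queued updates are delivered.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Toy model: writes land on one replica and reach the others later."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Acknowledge after updating replica 0 only (availability first).
        self.replicas[0].data[key] = value
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, value))

    def read(self, key, replica_index):
        # May return stale data (or None) until propagation catches up.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Deliver all queued updates; afterwards the replicas agree.
        while self.pending:
            i, key, value = self.pending.popleft()
            self.replicas[i].data[key] = value

store = EventuallyConsistentStore()
store.write("title", "v1")
stale = store.read("title", replica_index=2)   # read before propagation
store.propagate()
fresh = store.read("title", replica_index=2)   # read after propagation
print(stale, fresh)
```

A reader that hits replica 2 before propagation sees no value at all, which is the weak-consistency trade made in exchange for a fast acknowledgement.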
Brewer’s CAP Theorem
A distributed system can support only two of the following characteristics:
  Consistency
  Availability
  Partition tolerance
The slides from Brewer’s July 2000 talk do not define these characteristics.
Consistency
all nodes see the same data at the same time – Wikipedia
client perceives that a set of operations has occurred all at once – Pritchett
  More like Atomic in ACID transaction properties
Availability
node failures do not prevent survivors from continuing to operate – Wikipedia
Every operation must terminate in an intended response – Pritchett
Partition Tolerance
the system continues to operate despite arbitrary message loss – Wikipedia
Operations will complete, even if individual components are unavailable – Pritchett
NoSQL Database Types
Discussing NoSQL databases is complicated because there are a variety of types:
  Column Store – Each storage block contains data from only one column
  Document Store – stores documents made up of tagged elements
  Key-Value Store – Hash table of keys
Other Non-SQL Databases
XML Databases
Graph Databases
Codasyl Databases
Object Oriented Databases
Etc…
Will not address these today
NoSQL Example: Column Store
Each storage block contains data from only one column
Example: Hadoop/HBase
  http://hadoop.apache.org/
  Yahoo, Facebook
Example: Ingres VectorWise
  Column Store integrated with an SQL database
  http://www.ingres.com/products/vectorwise
Column Store Comments
More efficient than row (or document) store if:
  Multiple rows/records/documents are inserted at the same time so updates of column blocks can be aggregated
  Retrievals access only some of the columns in a row/record/document
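The second condition above can be illustrated by laying the same data out both ways. The three-row table is hypothetical, and "block" here simply means one Python list; real column stores add compression and on-disk layout that this sketch ignores.

```python
# Hypothetical table with columns id, name, amount.
rows = [
    (1, "alpha", 10),
    (2, "beta", 20),
    (3, "gamma", 30),
]

# Row store: each storage block holds one whole row.
row_blocks = [list(r) for r in rows]

# Column store: each storage block holds the values of one column.
column_blocks = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Summing one column reads a single block in the column layout...
total_from_columns = sum(column_blocks["amount"])
blocks_read_column_store = 1

# ...but touches every row block in the row layout.
total_from_rows = sum(block[2] for block in row_blocks)
blocks_read_row_store = len(row_blocks)

print(total_from_columns, blocks_read_column_store)
print(total_from_rows, blocks_read_row_store)
```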
NoSQL Example: Document Store
Example: CouchDB
  http://couchdb.apache.org/
  BBC
Example: MongoDB
  http://www.mongodb.org/
  Foursquare, Shutterfly
JSON – JavaScript Object Notation
CouchDB JSON Example
{
  "_id": "guid goes here",
  "_rev": "314159",
  "type": "abstract",
  "author": "Keith W. Hare",
  "title": "SQL Standard and NoSQL Databases",
  "body": "NoSQL databases (either no-SQL or Not Only SQL) are currently a hot topic in some parts of computing.",
  "creation_timestamp": "2011/05/10 13:30:00 +0004"
}
CouchDB JSON Tags
"_id"
  GUID – Global Unique Identifier
  Passed in or generated by CouchDB
"_rev"
  Revision number
  Versioning mechanism
"type", "author", "title", etc.
  Arbitrary tags
  Schema-less
  Could be validated after the fact by user-written routine
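A user-written validation routine of the kind mentioned in the last item might look like the sketch below. It uses only Python's json module, not CouchDB's own validation hooks, and the set of required tags is an assumed application rule.

```python
import json

# Assumed application rule; the document store itself requires none of these.
REQUIRED_TAGS = {"_id", "type", "author", "title"}

def missing_tags(doc_text):
    """Return the sorted required tags that are absent from a JSON document."""
    doc = json.loads(doc_text)
    return sorted(REQUIRED_TAGS - set(doc))

complete = (
    '{"_id": "guid goes here", "_rev": "314159", "type": "abstract", '
    '"author": "Keith W. Hare", "title": "SQL Standard and NoSQL Databases"}'
)
partial = '{"_id": "another guid", "type": "abstract"}'

print(missing_tags(complete))
print(missing_tags(partial))
```

Because the store accepts both documents, the check runs after the fact and it is up to the application to decide what to do with a document that fails it.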
NoSQL Examples: Key-Value Store
Hash tables of Keys
Values stored with Keys
Fast access to small data values
Example – Project-Voldemort
  http://www.project-voldemort.com/
  LinkedIn
Example – MemCacheDB
  http://memcachedb.org/
  Backend storage is Berkeley-DB
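The core idea, a hash table of keys with values stored alongside them, can be sketched with a Python dict. The class and method names are hypothetical and do not reflect the Project-Voldemort or MemCacheDB interfaces.

```python
class KeyValueStore:
    """In-memory key-value store backed by a Python dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        self._table[key] = value

    def get(self, key, default=None):
        # Lookup is by exact key only; there is no query over the values.
        return self._table.get(key, default)

    def delete(self, key):
        self._table.pop(key, None)

store = KeyValueStore()
store.put("session:42", b"user=alice")
hit = store.get("session:42")
store.delete("session:42")
miss = store.get("session:42")
print(hit, miss)
```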
Map Reduce
Technique for indexing and searching large data volumes
Two Phases, Map and Reduce
Map
  Extract sets of Key-Value pairs from underlying data
  Potentially in Parallel on multiple machines
Reduce
  Merge and sort sets of Key-Value pairs
  Results may be useful for other searches
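The two phases can be sketched in a single process. This is a simplified illustration with hypothetical documents: it builds a word-to-document index, and it runs the map calls in a plain loop where a real system would distribute them across machines.

```python
from collections import defaultdict

documents = {
    "doc1": "sql and nosql",
    "doc2": "nosql databases",
}

def map_phase(doc_id, text):
    """Map: extract (key, value) pairs from one piece of underlying data."""
    return [(word, doc_id) for word in text.split()]

def reduce_phase(pairs):
    """Reduce: merge and sort the pairs into word -> sorted document list."""
    merged = defaultdict(set)
    for word, doc_id in pairs:
        merged[word].add(doc_id)
    return {word: sorted(ids) for word, ids in sorted(merged.items())}

# Each map call is independent, so the calls could run on separate machines.
all_pairs = []
for doc_id, text in documents.items():
    all_pairs.extend(map_phase(doc_id, text))

index = reduce_phase(all_pairs)
print(index)
```

The resulting index is the kind of reduce output that can serve later searches, for example finding every document that mentions "nosql".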
Map Reduce
Map Reduce techniques differ across products
Implemented by application developers, not by underlying software
Map Reduce Patent
Google granted US Patent 7,650,331, January 2010
System and method for efficient large-scale data processing
A large-scale data processing system and method includes one or more application-independent map modules configured to read input data and to apply at least one application-specific map operation to the input data to produce intermediate data values, wherein the map operation is automatically parallelized across multiple processors in the parallel processing environment. A plurality of intermediate data structures are used to store the intermediate data values. One or more application-independent reduce modules are configured to retrieve the intermediate data values and to apply at least one application-specific reduce operation to the intermediate data values to provide output data.
Storing and Modifying Data
Syntax varies
  HTML
  JavaScript
  Etc.
Asynchronous – Inserts and updates do not wait for confirmation
Versioned
Optimistic Concurrency
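Versioning and optimistic concurrency usually work together: a writer states which revision it read, and the store rejects the write if that revision is no longer current. The sketch below is a hypothetical in-memory model and does not reproduce any particular product's revision scheme.

```python
class ConflictError(Exception):
    """Raised when a writer's revision no longer matches the stored one."""

class VersionedStore:
    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=None):
        # No locks are taken; the revision check happens at write time.
        current_rev = self._docs.get(doc_id, (None, None))[0]
        if current_rev != expected_rev:
            raise ConflictError(doc_id)
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev

store = VersionedStore()
rev1 = store.put("abstract", {"title": "draft"})

# Two writers both read revision 1; the first write wins.
rev2 = store.put("abstract", {"title": "writer A"}, expected_rev=rev1)
try:
    store.put("abstract", {"title": "writer B"}, expected_rev=rev1)
    conflict = False
except ConflictError:
    conflict = True

print(rev2, conflict, store.get("abstract"))
```

The losing writer has to re-read the document and retry, which is cheap when conflicts are rare and is the reason the approach is called optimistic.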
Retrieving Data
Syntax Varies
  No set-based query language
  Procedural program languages such as Java, C, etc.
Application specifies retrieval path
No query optimizer
Quick answer is important
May not be a single “right” answer
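What "application specifies retrieval path" means in practice can be sketched as a hand-written scan. The documents and the helper function are hypothetical; the point is that the loop and the tests are written by the programmer rather than chosen by a query optimizer.

```python
documents = [
    {"_id": "1", "type": "abstract", "author": "A"},
    {"_id": "2", "type": "slide", "author": "A"},
    {"_id": "3", "type": "abstract", "author": "B"},
]

def find_abstract_ids(docs, author):
    """Walk every document and test it by hand; no optimizer picks the path."""
    matches = []
    for doc in docs:
        # .get() is used because a schema-less document may lack either tag.
        if doc.get("type") == "abstract" and doc.get("author") == author:
            matches.append(doc["_id"])
    return matches

ids_for_a = find_abstract_ids(documents, "A")
print(ids_for_a)
```

If the data later needs a faster access path, such as a lookup table keyed by author, the application code itself has to change to use it.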
Open Source
Small upfront software costs
Suitable for large scale distribution on commodity hardware
NoSQL Summary
NoSQL databases reject:
  Overhead of ACID transactions
  “Complexity” of SQL
  Burden of up-front schema design
  Declarative query expression
  Yesterday’s technology
Programmer responsible for
  Step-by-step procedural language
  Navigating access path
Summary
SQL Databases
  Predefined Schema
  Standard definition and interface language
  Tight consistency
  Well defined semantics
NoSQL Database
  No predefined Schema
  Per-product definition and interface language
  Getting an answer quickly is more important than getting a correct answer
Questions?
Web References
“NoSQL -- Your Ultimate Guide to the Non-Relational Universe!” http://nosql-database.org/links.html
“NoSQL (RDBMS)” http://en.wikipedia.org/wiki/NoSQL
PODC Keynote, July 19, 2000. Towards Robust Distributed Systems. Dr. Eric A. Brewer. Professor, UC Berkeley. Co-Founder & Chief Scientist, Inktomi. www.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf
“Brewer's CAP Theorem” posted by Julian Browne, January 11, 2009. http://www.julianbrowne.com/article/viewer/brewers-cap-theorem
“How to write a CV” Geek & Poke Cartoon http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html
Web References
“Exploring CouchDB: A document-oriented database for Web applications”, Joe Lennon, Software developer, Core International. http://www.ibm.com/developerworks/opensource/library/os-couchdb/index.html
“Graph Databases, NOSQL and Neo4j” Posted by Peter Neubauer on May 12, 2010 at: http://www.infoq.com/articles/graph-nosql-neo4j
“Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison”, Kristóf Kovács. http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
“Distinguishing Two Major Types of Column-Stores” Posted by Daniel Abadi on March 29, 2010 http://dbmsmusings.blogspot.com/2010/03/distinguishing-two-major-types-of_29.html
Web References
“MapReduce: Simplified Data Processing on Large Clusters”, Jeffrey Dean and Sanjay Ghemawat, December 2004. http://labs.google.com/papers/mapreduce.html
“Scalable SQL”, ACM Queue, Michael Rys, April 19, 2011 http://queue.acm.org/detail.cfm?id=1971597
“a practical guide to noSQL”, Posted by Denise Miura on March 17, 2011 at http://blogs.marklogic.com/2011/03/17/a-practical-guide-to-nosql/
Books
“CouchDB The Definitive Guide”, J. Chris Anderson, Jan Lehnardt and Noah Slater. O’Reilly Media Inc., Sebastopol, CA, USA. 2010
“Hadoop The Definitive Guide”, Tom White. O’Reilly Media Inc., Sebastopol, CA, USA. 2011
“MongoDB The Definitive Guide”, Kristina Chodorow and Michael Dirolf. O’Reilly Media Inc., Sebastopol, CA, USA. 2010