NoSQL Databases
Vincent Leroy
Database
• Large-scale data processing
  – First 2 classes: Hadoop, Spark
  – Perform some computation/transformation over a full dataset
  – Process all data
• Selective query
  – Access a specific part of the dataset
  – Manipulate only the data needed (1 record among millions)
→ Database system
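The difference between the two access patterns can be sketched in a few lines of Python. The records and the `by_id` index below are made up for illustration, and the dataset is kept small; a real database system maintains such indexes on disk.

```python
# Hypothetical dataset: 100,000 small records.
records = [{"id": i, "value": i * 2} for i in range(100_000)]

# Large-scale processing: every record is read (what a Hadoop/Spark job does).
total = sum(r["value"] for r in records)

# Selective query: build an index once, then fetch one record directly
# without scanning the others.
by_id = {r["id"]: r for r in records}
one = by_id[12_345]

print(total)
print(one)
```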
Processing/Database Link
[Figure: a Database connected to a Batch Job (Hadoop, Spark), a Stream Job (Spark, Storm), and Application 1, Application 2, Application 3. Arrow labels: load data, write results, insert records, serve queries. Examples given: sentiment analysis, Twitter trends page.]
Different types of databases
• So far we used HDFS
  – A file system can be seen as a very basic database
  – Directories/files to organize data
  – Very simple queries (file system path)
  – Very good scalability, fault tolerance…
• Other end of the spectrum: Relational Databases
  – SQL query language, very expressive
  – Limited scalability (generally 1 server)
Size/Complexity
[Figure: database types placed on two axes, Size and Complexity. Labels in the figure: Graph DB, Relational DB, Document DB, Column DB, Key/Value DB, Filesystem.]
The NoSQL Jungle
Goal of these slides
• Present an overview of the NoSQL landscape
  – Trade-off in choosing a solution
  – Theorems and principles
• Not a manual to learn specific DBs
  – Too many of them
  – Not that complicated (especially K/V stores)
  – Focus on Neo4j graph DB in lab work
Relational Databases: SQL
• SQL language born 1974
  – Still used by most data processing systems (including Spark)
→ Learn it! Don't be a victim of the NoSQL hype!
Relational Databases model
• Data organized as tables
  – Row = record
  – Column = attribute
• Relations between tables
  – Integrity constraints
SELECT title FROM courses NATURAL JOIN takes_courses GROUP BY ClassID HAVING count(*) > 10
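To see the query run, here is a minimal sketch using Python's built-in `sqlite3` module. The table schemas and the sample rows are assumptions (the slide only names the tables, `title`, and `ClassID`); the `StudentID` column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the slide does not give the column definitions.
conn.executescript("""
    CREATE TABLE courses (ClassID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE takes_courses (
        StudentID INTEGER,
        ClassID INTEGER REFERENCES courses(ClassID)
    );
""")
conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [(1, "Databases"), (2, "Compilers")])
# 12 students take course 1, only 3 take course 2.
conn.executemany("INSERT INTO takes_courses VALUES (?, ?)",
                 [(s, 1) for s in range(12)] + [(s, 2) for s in range(3)])

# Titles of the courses taken by more than 10 students.
popular = conn.execute("""
    SELECT title
    FROM courses NATURAL JOIN takes_courses
    GROUP BY ClassID
    HAVING count(*) > 10
""").fetchall()
print(popular)
```

The natural join matches rows on the only shared column, `ClassID`, so each group holds the enrolments of one course.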
ACID properties
• Atomicity
  – Transactions are all or nothing (e.g. when adding a bi-directional friendship relation, it's added both ways or not at all)
• Consistency
  – Only valid data is written (e.g. cannot say a student takes a course that is not in the courses table)
• Isolation
  – When multiple transactions execute simultaneously, they appear as if they were executed sequentially (aka serializability)
• Durability
  – When data has been written and validated, it is permanent (i.e. no data loss, even in the case of some failures)
→ Easy life for the developer
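The friendship example for atomicity can be sketched with `sqlite3`. The `friends` table and the `add_friendship` helper are hypothetical names chosen for this illustration; the point is only that a failure in the second insert also undoes the first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (a TEXT, b TEXT, UNIQUE (a, b))")
conn.execute("INSERT INTO friends VALUES ('bob', 'alice')")  # pre-existing row
conn.commit()

def add_friendship(conn, x, y):
    """Insert both directions in one transaction; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO friends VALUES (?, ?)", (x, y))
            conn.execute("INSERT INTO friends VALUES (?, ?)", (y, x))
        return True
    except sqlite3.IntegrityError:
        return False

# The second insert ('bob', 'alice') violates UNIQUE, so the first insert
# ('alice', 'bob') is rolled back as well: all or nothing.
ok = add_friendship(conn, "alice", "bob")
rows = conn.execute("SELECT a, b FROM friends ORDER BY a, b").fetchall()
print(ok, rows)
```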
Why NoSQL then?
• What does NoSQL mean?
  – No SQL
  – New SQL
  – Not only SQL…
• The strong properties of SQL databases limit their ability to scale to very large datasets
  – Relax some properties to deal with larger datasets (ACID)
  – But at what cost?
• SQL is very structured (each record has the same columns…), Web data often is not
  – Semi-structured data
  – Unstructured data
  – Graph data
CAP
• Consistency
  – When multiple operations execute simultaneously, it appears as if they were executed one after the other (the I of ACID)
• Availability
  – Every request received by a non-failed node must be answered
• Partition tolerance
  – System must respond correctly even if the network fails
CAP theorem
• Impossible to have all 3 simultaneously
  – Choose CA, CP, or AP
  – In a centralized system, no need for P
    • Relational databases have CA
  – In a distributed system, you cannot ignore P
    • Distributed databases choose CP or AP
CAP intuition
[Figure labels: Client 1, Client 2, Partition; values shown: A:2, B:5, A:3, B:6, A:3.]
2 solutions:
• Refuse to answer in case of partition
• Answer but risk inconsistencies
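The two solutions can be played out on a toy model. This is a sketch under simplifying assumptions: the `Cluster` class, its `mode` flag, and the `Unavailable` exception are invented for illustration, there are exactly two replicas, and a CP cluster refuses every request while partitioned.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses to answer during a partition."""

class Cluster:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replicas = [{"A": 2, "B": 5}, {"A": 2, "B": 5}]
        self.partitioned = False

    def _check(self):
        # CP: give up availability rather than risk inconsistency.
        if self.partitioned and self.mode == "CP":
            raise Unavailable("partition in progress")

    def write(self, replica, key, value):
        self._check()
        # During a partition an AP cluster can only update the local replica.
        targets = [replica] if self.partitioned else range(len(self.replicas))
        for i in targets:
            self.replicas[i][key] = value

    def read(self, replica, key):
        self._check()
        return self.replicas[replica][key]

ap = Cluster("AP")
ap.partitioned = True
ap.write(0, "A", 3)        # Client 1 updates A on its side of the partition
stale = ap.read(1, "A")    # Client 2 is answered, but with the old value

cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write(0, "A", 3)
    refused = False
except Unavailable:
    refused = True

print(stale, refused)
```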
NoSQL and CAP
Weaker consistency models
• Eventual consistency
  – When there is no partition, DB is consistent
  – In case of partition, DB can return stale data
  – Once the partition is gone, there is a time limit on how long it takes for consistency to return
• Different levels of consistency (consistency/cost trade-off)
  – Causal consistency
  – Read-your-writes consistency
  – Session consistency
  – Monotonic read consistency
  – Monotonic write consistency
→ Again, many choices, so many different systems
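Two of the levels listed above, read-your-writes and monotonic reads, can be illustrated with a client-side guard. This is only one possible way to obtain these guarantees, and the `Replica` and `Session` classes are hypothetical: the session remembers the highest version it has written or read and refuses replicas that lag behind it.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        # Replication only ever moves a replica forward.
        if version > self.version:
            self.version, self.value = version, value

class Session:
    """Client-side guard for read-your-writes and monotonic reads."""
    def __init__(self):
        self.min_version = 0

    def write(self, replica, value):
        new_version = replica.version + 1
        replica.apply(new_version, value)
        self.min_version = max(self.min_version, new_version)

    def read(self, replica):
        if replica.version < self.min_version:
            raise LookupError("replica too stale for this session")
        self.min_version = replica.version
        return replica.value

fresh, lagging = Replica(), Replica()
session = Session()
session.write(fresh, "x")
first = session.read(fresh)            # the session sees its own write

try:
    session.read(lagging)              # lagging has not received the write yet
    rejected = False
except LookupError:
    rejected = True

lagging.apply(fresh.version, fresh.value)   # replication catches up
later = session.read(lagging)
print(first, rejected, later)
```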
Vector clocks & conflict detection
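The figures for these slides did not survive, so here is a minimal sketch of the general vector clock technique, not necessarily the exact scenario shown in the lecture. A clock maps node names to counters; the node names `n1` and `n2` and the function names are chosen for illustration. Two versions conflict when neither clock is less than or equal to the other.

```python
def increment(clock, node):
    """Return a copy of `clock` with the counter of `node` advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a, b):
    """Entry-wise maximum of two clocks."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def compare(a, b):
    """Return 'equal', 'before', 'after' or 'concurrent' for a relative to b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither dominates: a conflict to resolve

v0 = increment({}, "n1")        # {'n1': 1}
va = increment(v0, "n1")        # updated again on n1
vb = increment(v0, "n2")        # updated independently on n2
print(compare(v0, va))          # v0 is an ancestor of va
print(compare(va, vb))          # conflicting versions

# After resolving the conflict, the new version descends from both.
resolved = increment(merge(va, vb), "n1")
print(compare(vb, resolved))
```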
Key/Value store
• 2 basic operations, similar to the HashMap data structure
  – Put(K, V)
  – Get(K)
• Often used for caching information in memory
  – Facebook uses them a lot
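A minimal in-memory sketch of the two operations and of the caching use case follows. `KVStore`, `load_profile`, and `cached_profile` are hypothetical names; `load_profile` stands in for a slow database query.

```python
class KVStore:
    """In-memory key/value store with the two basic operations."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

calls = []  # records how often the slow path is taken

def load_profile(user_id):
    # Stand-in for an expensive query against the main database.
    calls.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}

cache = KVStore()

def cached_profile(user_id):
    profile = cache.get(user_id)
    if profile is None:              # cache miss: compute and remember
        profile = load_profile(user_id)
        cache.put(user_id, profile)
    return profile

cached_profile(7)
cached_profile(7)                    # served from the cache
print(calls)
```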
Column/Tabular DB
• Data organized as rows with a primary key
  – Flexible format, each row can have different fields in a column family
  – Relies on HDFS for fault tolerance (the case for HBase)
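The data model can be pictured as nested maps: row key, then column family, then column. The sketch below is a toy in-memory model with hypothetical names (`ColumnTable`, the `info` and `stats` families); it does not reproduce the API of any particular column store.

```python
class ColumnTable:
    """Rows keyed by a primary key; families are fixed, columns are free-form."""
    def __init__(self, families):
        self.families = frozenset(families)
        self.rows = {}  # row key -> {family -> {column -> value}}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get(self, row_key, family):
        # Return a copy so callers cannot modify the stored row.
        return dict(self.rows.get(row_key, {}).get(family, {}))

users = ColumnTable(["info", "stats"])
users.put("u1", "info", "name", "Ada")
users.put("u1", "info", "email", "ada@example.org")
users.put("u2", "info", "name", "Bob")      # u2 has no email column
users.put("u2", "stats", "logins", 3)

print(users.get("u1", "info"))
print(users.get("u2", "info"))
```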
Document DB
• Data stored as documents (often JSON)
• Richer than K/V stores
  – Insert
  – Delete
  – Update
  – Find
  – Aggregation functions (Map, Reduce…)
  – Indexes
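The operations listed above can be sketched on a toy collection of JSON-compatible dictionaries. The `Collection` class and the sample documents are invented for illustration and do not follow the API of any specific document database; indexes are left out.

```python
import json
from functools import reduce

class Collection:
    """Toy document collection; documents are JSON-compatible dicts."""
    def __init__(self):
        self.docs = []

    def insert(self, doc):
        # Round-trip through JSON to copy the document and reject non-JSON values.
        self.docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria):
        return [d for d in self.docs
                if all(d.get(k) == v for k, v in criteria.items())]

    def update(self, criteria, changes):
        matched = self.find(**criteria)
        for d in matched:
            d.update(changes)
        return len(matched)

    def delete(self, **criteria):
        matched = self.find(**criteria)
        self.docs = [d for d in self.docs if not any(d is m for m in matched)]
        return len(matched)

posts = Collection()
posts.insert({"author": "ann", "likes": 3, "tags": ["nosql"]})
posts.insert({"author": "bob", "likes": 5})            # no "tags" field
posts.insert({"author": "ann", "likes": 4, "draft": True})

posts.update({"author": "bob"}, {"likes": 6})
posts.delete(draft=True)

# Aggregation in map/reduce style: total likes per author.
likes = reduce(
    lambda acc, kv: {**acc, kv[0]: acc.get(kv[0], 0) + kv[1]},
    map(lambda d: (d["author"], d["likes"]), posts.docs),
    {},
)
print(likes)
```

Unlike rows in a relational table, the documents do not need to share the same fields.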
Graph DB
• Represent data as graphs
  – Nodes/relationships with properties as K/V pairs
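A property graph can be sketched as a dictionary of nodes plus a list of typed relationships, each carrying its own key/value properties. The `Graph` class, the `FRIEND` relationship type, and the sample data are assumptions made for this illustration.

```python
class Graph:
    def __init__(self):
        self.nodes = {}   # node id -> properties (K/V pairs)
        self.edges = []   # (source id, relationship type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel, dst, **props):
        self.edges.append((src, rel, dst, props))

    def neighbors(self, src, rel):
        return [dst for s, r, dst, _ in self.edges if s == src and r == rel]

g = Graph()
g.add_node("alice", age=30)
g.add_node("bob", age=25)
g.add_node("carol", age=41)
g.add_edge("alice", "FRIEND", "bob", since=2015)
g.add_edge("bob", "FRIEND", "carol", since=2019)

# A two-hop traversal: friends of Alice's friends.
friends_of_friends = {
    fof
    for friend in g.neighbors("alice", "FRIEND")
    for fof in g.neighbors(friend, "FRIEND")
}
print(friends_of_friends)
```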
Graph DB: Neo4j
• Rich data format
  – Query language as pattern matching
  – Limited scalability
    • Replication to scale reads, writes need to be done to every replica
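Why replication helps reads but not writes can be seen by counting the work done per replica. The following is a generic model of full replication, not Neo4j's actual API or clustering protocol; `ReplicatedStore` and its counters are hypothetical.

```python
import itertools

class ReplicatedStore:
    """Every replica holds all the data; reads are spread round-robin."""
    def __init__(self, n):
        self.replicas = [{} for _ in range(n)]
        self.writes_per_replica = [0] * n
        self.reads_per_replica = [0] * n
        self._next = itertools.cycle(range(n))

    def write(self, key, value):
        # A write must reach every replica, so each one does the full work.
        for i, replica in enumerate(self.replicas):
            replica[key] = value
            self.writes_per_replica[i] += 1

    def read(self, key):
        # A read is served by a single replica, so the load is shared.
        i = next(self._next)
        self.reads_per_replica[i] += 1
        return self.replicas[i].get(key)

store = ReplicatedStore(3)
for k in range(6):
    store.write(k, k * k)
values = [store.read(k) for k in range(6)]

print(store.writes_per_replica)   # each replica handled all 6 writes
print(store.reads_per_replica)    # the 6 reads were split across replicas
```

Adding replicas lowers the read load per machine, while the write load per machine stays the same.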