NoSQL Databases - imaglig-membres.imag.fr/.../uploads/sites/125/2017/11/NoSQL.pdfDatabases – SQL...

NoSQL Databases

Vincent Leroy

Database

•  Large-scale data processing
   –  First 2 classes: Hadoop, Spark
   –  Perform some computation/transformation over a full dataset
   –  Process all data
•  Selective query
   –  Access a specific part of the dataset
   –  Manipulate only the data needed (1 record among millions)
→ Database system

Processing/Database Link

[Figure: a database connected to a batch job (Hadoop, Spark), a stream job (Spark, Storm), and Applications 1, 2 and 3. Arrow labels: load data, write results, insert records, serve queries. Examples given: sentiment analysis, Twitter trends page.]

Different types of databases

•  So far we used HDFS
   –  A file system can be seen as a very basic database
   –  Directories/files to organize data
   –  Very simple queries (file system path)
   –  Very good scalability, fault tolerance…
•  Other end of the spectrum: Relational Databases
   –  SQL query language, very expressive
   –  Limited scalability (generally 1 server)

Size/Complexity

[Figure: families of data stores placed on two axes, size and complexity: Filesystem, Key/Value DB, Column DB, Document DB, Relational DB, Graph DB.]

The NoSQL Jungle

Goal of these slides

•  Present an overview of the NoSQL landscape
   –  Trade-off in choosing a solution
   –  Theorems and principles
•  Not a manual to learn specific DBs
   –  Too many of them
   –  Not that complicated (especially K/V stores)
   –  Focus on Neo4j graph DB in lab work

Relational Databases: SQL

•  SQL language born in 1974
   –  Still used by most data processing systems (including Spark)
→ Learn it! Don't be a victim of the NoSQL hype!

Relational Databases model

•  Data organized as tables
   –  Row = record
   –  Column = attribute
•  Relations between tables
   –  Integrity constraints

Select title from courses natural join takes_courses group by ClassID having count(*) > 10
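The query above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: only the table names, ClassID and title come from the slide, while the StudentID column, the sample rows and the foreign key are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema and data; only courses, takes_courses, ClassID and
# title appear in the slide's query.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
    CREATE TABLE courses (
        ClassID INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    CREATE TABLE takes_courses (
        StudentID INTEGER NOT NULL,
        ClassID   INTEGER NOT NULL REFERENCES courses(ClassID),
        PRIMARY KEY (StudentID, ClassID)
    );
""")

conn.executemany("INSERT INTO courses VALUES (?, ?)",
                 [(1, "Databases"), (2, "Compilers")])
# 12 students take course 1, only 3 take course 2.
enrolments = [(s, 1) for s in range(12)] + [(s, 2) for s in range(3)]
conn.executemany("INSERT INTO takes_courses VALUES (?, ?)", enrolments)

# Titles of courses taken by more than 10 students.
popular = conn.execute("""
    SELECT title
    FROM courses NATURAL JOIN takes_courses
    GROUP BY ClassID
    HAVING COUNT(*) > 10
""").fetchall()
print(popular)

# Integrity constraint: a student cannot take a course that does not exist.
try:
    conn.execute("INSERT INTO takes_courses VALUES (99, 42)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("invalid enrolment rejected:", rejected)
```

Stricter SQL engines may require title to appear in the GROUP BY clause as well; SQLite accepts the query as written.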

ACID properties

•  Atomicity
   –  Transactions are all or nothing (e.g. when adding a bi-directional friendship relation, it is added both ways or not at all)
•  Consistency
   –  Only valid data is written (e.g. cannot say a student takes a course that is not in the courses table)
•  Isolation
   –  When multiple transactions execute simultaneously, they appear as if they were executed sequentially (aka serializability)
•  Durability
   –  When data has been written and validated, it is permanent (i.e. no data loss, even in the case of some failures)
→ Easy life for the developer
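The friendship example used for atomicity can be sketched with sqlite3. The friends table, the names and the add_friendship helper are hypothetical; the point is only that both rows are written or neither is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per direction of a friendship.
conn.execute("CREATE TABLE friends (a TEXT, b TEXT, PRIMARY KEY (a, b))")
conn.commit()

def add_friendship(conn, x, y):
    """Insert both directions in one transaction; undo everything on failure."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute("INSERT INTO friends VALUES (?, ?)", (x, y))
            conn.execute("INSERT INTO friends VALUES (?, ?)", (y, x))
        return True
    except sqlite3.IntegrityError:
        return False

ok = add_friendship(conn, "alice", "bob")

# A stray one-way row makes the second insert of the next call fail.
conn.execute("INSERT INTO friends VALUES ('dave', 'carol')")
conn.commit()
failed = add_friendship(conn, "carol", "dave")

rows = conn.execute("SELECT a, b FROM friends ORDER BY a, b").fetchall()
print(ok, failed, rows)
```

After the failed call, the (carol, dave) row inserted by its first statement is gone again, which is the all-or-nothing behaviour the slide describes.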

Why NoSQL then?

•  What does NoSQL mean?
   –  No SQL
   –  New SQL
   –  Not only SQL…
•  SQL's strong properties limit its ability to scale to very large datasets
   –  Relax some properties (ACID) to deal with larger datasets
   –  But at what cost?
•  SQL is very structured (each record has the same columns…), Web data often is not
   –  Semi-structured data
   –  Unstructured data
   –  Graph data

CAP

•  Consistency
   –  When multiple operations execute simultaneously, it appears as if they were executed one after the other (similar to the I of ACID)
•  Availability
   –  Every request received by a non-failed node must be answered
•  Partition tolerance
   –  System must respond correctly even if the network fails

CAP theorem

•  Impossible to have all 3 simultaneously
   –  Choose CA, CP, or AP
   –  In a centralized system, no need for P
      •  Relational databases have CA
   –  In a distributed system, you cannot ignore P
      •  Distributed databases choose CP or AP

CAP intuition

[Figure: Client 1 and Client 2 separated by a partition; values shown in the diagram: A:2, B:5, A:3, B:6, A:3.]

2 solutions:
•  Refuse to answer in case of partition
•  Answer but risk inconsistencies
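The two solutions can be illustrated with a deliberately crude model of two replicas and a breakable link. The class and its "CP"/"AP" switch are hypothetical and do not correspond to any real database; the values loosely echo the A:2 and A:3 labels of the figure.

```python
class TwoReplicaStore:
    """Two copies of the data; `partitioned` simulates a broken network link."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{}, {}]
        self.partitioned = False

    def write(self, replica, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Solution 1: refuse to answer rather than diverge.
                raise RuntimeError("unavailable during partition")
            # Solution 2: answer locally; the other replica is now out of date.
            self.replicas[replica][key] = value
            return
        for copy in self.replicas:  # healthy network: update both copies
            copy[key] = value

    def read(self, replica, key):
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable during partition")
        return self.replicas[replica].get(key)


# AP: Client 1 writes to replica 0, Client 2 reads a stale value from replica 1.
ap = TwoReplicaStore("AP")
ap.write(0, "A", 2)
ap.partitioned = True
ap.write(0, "A", 3)
fresh, stale = ap.read(0, "A"), ap.read(1, "A")

# CP: the same write is refused while the partition lasts.
cp = TwoReplicaStore("CP")
cp.write(0, "A", 2)
cp.partitioned = True
try:
    cp.write(0, "A", 3)
    refused = False
except RuntimeError:
    refused = True

print(fresh, stale, refused)
```

A real CP system would typically keep serving the side of the partition that holds a majority; refusing everything is a simplification made for brevity.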

NoSQL and CAP

Weaker consistency models

•  Eventual consistency
   –  When there is no partition, DB is consistent
   –  In case of partition, DB can return stale data
   –  Once partition is gone, there is a time limit on how long it takes for consistency to return
•  Different levels of consistency (consistency/cost trade-off)
   –  Causal consistency
   –  Read-your-writes consistency
   –  Session consistency
   –  Monotonic read consistency
   –  Monotonic write consistency
→ Again, many choices, so many different systems

Vector clocks & conflict detection
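As a minimal sketch of the idea named in the title, assuming the usual definition of a vector clock (one counter per node, kept with each version of a value), the code below shows how two versions are compared and when a conflict is detected. The node names and helper functions are hypothetical, and this is not a reconstruction of the slides' own example.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def merge(a, b):
    """Element-wise maximum: the smallest clock that dominates both inputs."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def compare(a, b):
    """Return 'equal', 'before', 'after' or 'concurrent' for clocks a and b."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"  # neither version descends from the other: a conflict

v0 = increment({}, "n1")   # first write, handled by node n1
v1 = increment(v0, "n1")   # n1 updates the value again
v2 = increment(v0, "n2")   # meanwhile n2 updates the same starting version

print(compare(v0, v1))     # v1 descends from v0
print(compare(v1, v2))     # v1 and v2 were written independently

# Once the conflict is reconciled, the merged clock dominates both versions.
resolved = increment(merge(v1, v2), "n1")
print(compare(v1, resolved), compare(v2, resolved))
```

How the two conflicting values themselves are reconciled (keep one, keep both, ask the application) is a separate policy decision that the clocks do not settle.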

Key/Value store

•  2 basic operations, similar to the HashMap data structure
   –  Put(K, V)
   –  Get(K)
•  Often used for caching information in memory
   –  Facebook uses them a lot
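A minimal in-memory sketch of the two operations and of the caching use case. KeyValueStore, slow_profile_lookup and the "profile:" key format are hypothetical names chosen for illustration, not the API of any particular product.

```python
class KeyValueStore:
    """In-memory store exposing only put and get, like a hash map."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is absent


backend_calls = []

def slow_profile_lookup(user_id):
    """Stand-in for an expensive query against the main database."""
    backend_calls.append(user_id)
    return {"id": user_id, "name": f"user{user_id}"}

def cached_profile(cache, user_id):
    """Look in the cache first; fall back to the backend and remember the answer."""
    key = f"profile:{user_id}"
    value = cache.get(key)
    if value is None:
        value = slow_profile_lookup(user_id)
        cache.put(key, value)
    return value


cache = KeyValueStore()
first = cached_profile(cache, 7)   # miss: goes to the backend
second = cached_profile(cache, 7)  # hit: served from memory
print(first == second, backend_calls)
```

The store never looks inside the value, which is what keeps the model simple and easy to distribute across machines by key.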

Column/Tabular DB

•  Data organized as rows with a primary key
   –  Flexible format, each row can have different fields in a column family
   –  Relies on HDFS for fault tolerance
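The data layout can be pictured as nested maps: row key, then column family, then column. The sketch below assumes that column families are fixed when the table is created while columns inside a family are free-form; ColumnTable and the sample data are hypothetical, and storage and fault tolerance are not modelled.

```python
class ColumnTable:
    """row key -> column family -> column -> value, all held in memory."""

    def __init__(self, families):
        self.families = frozenset(families)  # declared up front
        self.rows = {}

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        row = self.rows.setdefault(row_key, {})
        row.setdefault(family, {})[column] = value

    def get_family(self, row_key, family):
        """Return a copy of all columns stored for this row in one family."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


users = ColumnTable(families=["info", "stats"])
users.put("u1", "info", "name", "Ada")
users.put("u1", "info", "email", "ada@example.org")
users.put("u2", "info", "name", "Bob")
users.put("u2", "info", "phone", "555-0100")  # a column that u1 does not have
users.put("u2", "stats", "logins", 3)

print(users.get_family("u1", "info"))
print(users.get_family("u2", "info"))
```

Rows u1 and u2 share the "info" family but not the same set of columns, which is the flexibility the slide refers to.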

Document DB

•  Data stored as documents (often JSON)
•  Richer than K/V stores
   –  Insert
   –  Delete
   –  Update
   –  Find
   –  Aggregation functions (Map, Reduce…)
   –  Indexes
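The operations listed above can be sketched on top of plain dictionaries. DocumentCollection and its method names are hypothetical and only loosely inspired by real document stores; indexes are left out, so find simply scans every document.

```python
class DocumentCollection:
    """JSON-like documents (dicts) stored under automatically assigned ids."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def update(self, doc_id, **fields):
        self._docs[doc_id].update(fields)

    def delete(self, doc_id):
        self._docs.pop(doc_id, None)

    def find(self, **criteria):
        """Return copies of the documents whose fields equal all the criteria."""
        return [dict(doc, _id=doc_id) for doc_id, doc in self._docs.items()
                if all(doc.get(k) == v for k, v in criteria.items())]

    def map_reduce(self, map_fn, reduce_fn):
        """map_fn yields (key, value) pairs; reduce_fn folds the values of one key."""
        groups = {}
        for doc in self._docs.values():
            for key, value in map_fn(doc):
                groups.setdefault(key, []).append(value)
        return {key: reduce_fn(key, values) for key, values in groups.items()}


people = DocumentCollection()
a = people.insert({"name": "Ada", "city": "Grenoble", "langs": ["Python"]})
b = people.insert({"name": "Bob", "city": "Lyon"})
c = people.insert({"name": "Cleo", "city": "Grenoble", "age": 31})

people.update(b, city="Grenoble")
people.delete(a)

names = [doc["name"] for doc in people.find(city="Grenoble")]
per_city = people.map_reduce(lambda doc: [(doc["city"], 1)],
                             lambda city, ones: sum(ones))
print(names, per_city)
```

Unlike a key/value store, the database looks inside the documents here, which is what makes find and aggregation possible.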

Graph DB

•  Represent data as graphs
   –  Nodes/relationships with properties as K/V pairs

Graph DB: Neo4j

•  Rich data format
   –  Query language as pattern matching
   –  Limited scalability
      •  Replication to scale reads, writes need to be done to every replica
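To make the property-graph model and the idea of querying by pattern concrete, here is a small in-memory sketch. It is neither Cypher nor the Neo4j API: the Graph class, its match method and the sample data are hypothetical.

```python
class Graph:
    """Nodes and relationships, each carrying properties as key/value pairs."""

    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = []  # (source id, relationship type, target id, properties)

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, src, rel_type, dst, **props):
        self.edges.append((src, rel_type, dst, props))

    def match(self, rel_type, **src_props):
        """Pattern (a {src_props})-[rel_type]->(b): return the matching (a, b) ids."""
        return [(src, dst) for src, kind, dst, _ in self.edges
                if kind == rel_type
                and all(self.nodes[src].get(k) == v for k, v in src_props.items())]


g = Graph()
g.add_node("alice", name="Alice", age=30)
g.add_node("bob", name="Bob")
g.add_node("carol", name="Carol")
g.add_edge("alice", "KNOWS", "bob", since=2015)
g.add_edge("bob", "KNOWS", "carol", since=2017)

def friends_of_friends(graph, name):
    """Chain the pattern twice: (a {name})-[KNOWS]->(b)-[KNOWS]->(c)."""
    direct = [b for _, b in graph.match("KNOWS", name=name)]
    return [c for b, c in graph.match("KNOWS") if b in direct]

print(g.match("KNOWS", name="Alice"))
print(friends_of_friends(g, "Alice"))
```

A graph database runs such traversals over stored relationships directly, where a relational database would answer the same question with repeated joins.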
