(Distributed) (Structured) Storage Systems



Transcript

Distributed, Document-Store Databases

(Distributed) (Structured) Storage Systems
Mark Feltner

Big Data
  • 2.5 Petabytes/day: Wal-Mart's transaction database
  • 40 Terabytes/second: CERN
  • 1 Terabyte/day: NYSE Trading data
  • 10 billion: Facebook photos

Overview
  • Theory
  • Algorithms
  • Implementations & Technology

Relational databases

ACID
  • Atomicity: All-or-nothing
  • Consistency: Data is always in a valid state
  • Isolation: Concurrent transactions result in the same state as if they had been executed serially
  • Durability: COMMIT means the transaction is permanent across all clients

Non-relational databases

Key-value

Document-oriented

Graphs
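The three non-relational data models listed above can be contrasted with a minimal sketch in plain Python. The key names, field names, and edge labels below are hypothetical illustrations, not the schema of any particular database; the song data is borrowed from the row/column example in these slides.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"song:1": "Breaking the Law"}

# Document-oriented: each value is a self-describing, nested record.
document = {
    "_id": 1,
    "title": "Breaking the Law",
    "artist": {"name": "Judas Priest"},
    "album": {"title": "British Steel", "year": 1980},
}

# Graph: nodes plus labeled, directed edges between them.
nodes = {"judas_priest", "british_steel", "breaking_the_law"}
edges = [
    ("judas_priest", "RECORDED", "british_steel"),
    ("british_steel", "CONTAINS", "breaking_the_law"),
]


def neighbors(node, label):
    """Return nodes reachable from `node` over edges with the given label."""
    return [dst for src, lbl, dst in edges if src == node and lbl == label]


print(key_value["song:1"])
print(document["album"]["year"])
print(neighbors("judas_priest", "RECORDED"))
```

The key-value store can only answer "what is stored under this key?", the document can be queried by nested field, and the graph is queried by following relationships.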

Distributed Systems

Fallacies of Distributed Computing
  • The network is reliable.
  • Latency is zero.
  • Bandwidth is infinite.
  • The network is secure.
  • Topology doesn't change.
  • There is one administrator.
  • Transport cost is zero.
  • The network is homogeneous.

CAP Theorem

Consistency
  • Eventual consistency
  • There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time. (Gilbert, Lynch)

Availability
  • For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. (Gilbert, Lynch)

Partition Tolerance
  • In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost. (Gilbert, Lynch)

(CA || CP || AP) ?
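The CP-versus-AP choice can be made concrete with a toy simulation. This is a minimal sketch under strong assumptions: two replicas of a single value, a hypothetical `Cluster` class, and a simple "highest version wins" rule when the partition heals. Real systems use far more careful protocols.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.value = None
        self.version = 0


class Cluster:
    """Two replicas of one value; `mode` is "CP" or "AP" (hypothetical model)."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica("a"), Replica("b")
        self.partitioned = False

    def write(self, replica, value):
        other = self.b if replica is self.a else self.a
        if self.partitioned and self.mode == "CP":
            # CP: refuse the write rather than let the replicas diverge.
            raise RuntimeError("unavailable during partition")
        replica.version += 1
        replica.value = value
        if not self.partitioned:
            # Messages get through, so the other replica is updated too.
            other.value, other.version = replica.value, replica.version
        return "ok"

    def heal(self):
        """End the partition; replicas converge on the higher version.

        Ties are broken arbitrarily in favor of replica a.
        """
        self.partitioned = False
        newest = max(self.a, self.b, key=lambda r: r.version)
        for r in (self.a, self.b):
            r.value, r.version = newest.value, newest.version


# CP: the write is rejected while the network is partitioned.
cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write(cp.a, "x")
    cp_result = "ok"
except RuntimeError:
    cp_result = "unavailable"

# AP: the write succeeds, replica b is stale until the partition heals.
ap = Cluster("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap.write(ap.a, "v2")
stale = ap.b.value
ap.heal()
```

In the AP run, replica `b` keeps serving the old value during the partition and only catches up after `heal()`, which is the behavior usually described as eventual consistency.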

Algorithms

Row- versus column-orientation

Title                 Artist         Album            Year
Breaking the Law      Judas Priest   British Steel    1980
Aces High             Iron Maiden    Powerslave       1984
Kickstart My Heart    Motley Crue    Dr. Feelgood     1989
Raining Blood         Slayer         Reign in Blood   1986
I Wanna Be Somebody   W.A.S.P.       W.A.S.P.         1984

Row-oriented Data Storage Model:
Breaking the Law, Judas Priest, British Steel, 1980,
Aces High, Iron Maiden, Powerslave, 1984,
Kickstart My Heart, Motley Crue, Dr. Feelgood, 1989,
Raining Blood, Slayer, Reign in Blood, 1986,
I Wanna Be Somebody, W.A.S.P., W.A.S.P., 1984

Column-oriented Data Storage Model:
Breaking the Law, Aces High, Kickstart My Heart, Raining Blood, I Wanna Be Somebody,
Judas Priest, Iron Maiden, Motley Crue, Slayer, W.A.S.P.,
British Steel, Powerslave, Dr. Feelgood, Reign in Blood, W.A.S.P.,
1980, 1984, 1989, 1986, 1984

Comparison of Row- vs. Column-Orientation
  • CREATE
  • SELECT
  • MAX, MIN, SUM, AVG, MapReduce
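The two storage models can be sketched with ordinary Python containers. This is an illustration of the layouts only, not how any real engine stores data on disk; the function names are hypothetical.

```python
FIELDS = ("title", "artist", "album", "year")

# Row-oriented: each record is stored together.
rows = [
    ("Breaking the Law", "Judas Priest", "British Steel", 1980),
    ("Aces High", "Iron Maiden", "Powerslave", 1984),
    ("Kickstart My Heart", "Motley Crue", "Dr. Feelgood", 1989),
    ("Raining Blood", "Slayer", "Reign in Blood", 1986),
    ("I Wanna Be Somebody", "W.A.S.P.", "W.A.S.P.", 1984),
]

# Column-oriented: each field is stored together.
columns = {name: [row[i] for row in rows] for i, name in enumerate(FIELDS)}


def max_year_row_store(rows):
    """Aggregate over a row store: every record is visited to pick out one field."""
    return max(row[3] for row in rows)


def max_year_column_store(columns):
    """Aggregate over a column store: only the 'year' column is read."""
    return max(columns["year"])


def insert_row_store(rows, record):
    """Creating a record is a single append in a row store."""
    rows.append(record)


def insert_column_store(columns, record):
    """Creating a record touches every column in a column store."""
    for name, value in zip(FIELDS, record):
        columns[name].append(value)


print(max_year_row_store(rows), max_year_column_store(columns))
```

The sketch reflects the comparison on the slide: writing a whole record (CREATE) is one operation in the row layout, while aggregates such as MAX, MIN, SUM, and AVG read a single contiguous list in the column layout.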

Technology

Implementations

BigTable
  • High performance
  • MapReduce
  • Powers: Google Reader, Maps, Book Search, YouTube, Gmail,

Hadoop
  • MapReduce
  • Yahoo!
  • World Record Holder!
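MapReduce appears on several of these slides, so a minimal single-process sketch of the idea may help. It uses word counting as the example job and runs in one Python process; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = map_reduce(["big data big systems", "distributed systems"])
print(counts)
```

In a distributed framework the map and reduce calls would run on many machines, with the shuffle step moving data between them; here everything is sequential for clarity.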

Cassandra
  • Key-value
  • MapReduce
  • Facebook
  • Eventual consistency
  • Scalable, fault-tolerant

MySQL
  • Relational
  • LAMP
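The all-or-nothing behavior that ACID promises for relational databases can be sketched with Python's built-in `sqlite3` module, used here as a stand-in for MySQL so the example needs no server. The `accounts` table and the `transfer` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back.
        return False


ok = transfer(conn, "alice", "bob", 50)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer credits `bob` before the debit fails, yet the rollback undoes that credit, so the database never exposes a half-finished transaction.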

Redis
  • Key-value
  • What it lacks in durability, it makes up for in speed / simplicity.
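The durability-for-speed trade-off can be illustrated with a toy in-memory store that only persists data when a snapshot is taken. This is a sketch loosely inspired by snapshot-style persistence; the `SnapshotStore` class is hypothetical and is not Redis's API or implementation.

```python
import json
import os
import tempfile


class SnapshotStore:
    """In-memory key-value store that is durable only up to the last snapshot."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value  # memory only: fast, but not yet durable

    def get(self, key):
        return self.data.get(key)

    def snapshot(self):
        """Write the whole dataset to disk, replacing the previous snapshot."""
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp, self.path)


path = os.path.join(tempfile.mkdtemp(), "store.json")
store = SnapshotStore(path)
store.set("a", 1)
store.snapshot()
store.set("b", 2)  # written after the snapshot, so it exists only in memory

# Simulate a crash and restart by loading a fresh store from disk.
recovered = SnapshotStore(path)
print(recovered.get("a"), recovered.get("b"))
```

Every `set` is a dictionary update with no disk I/O, which is where the speed comes from; the cost is that anything written after the last snapshot is lost on a crash.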

HBase
  • MapReduce
  • Hadoop + HDFS
  • Java and REST API
  • Column-oriented
  • Excellent fault-tolerance
  • Replication
  • Streaming

Neo4J
  • Graph Database

MongoDB
  • Document-oriented

Conclusions
  • Pick the right tool for the job.