(Distributed) (Structured) Storage Systems
Mark Feltner
Distributed, Document-Store Databases
Big Data
- 2.5 Petabytes/day: Wal-Mart's transaction database
- 40 Terabytes/second: CERN
- 1 Terabyte/day: NYSE trading data
- 10 billion: Facebook photos
Overview
- Theory
- Algorithms
- Implementations & Technology
Relational databases
ACID
- Atomicity: all-or-nothing.
- Consistency: data is always in a valid state.
- Isolation: serially executed transactions result in the same state as concurrent transactions.
- Durability: COMMIT means the transaction is permanent across all clients.
Non-relational databases
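Looking back at the relational side for a moment, the "all-or-nothing" atomicity property from the ACID list can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `accounts` table, the `transfer` and `balances` helpers, and the sample balances are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        # Used as a context manager, the connection commits if the block
        # succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit is undone too.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
```

In this sketch an overdrawing transfer leaves both rows untouched, even though the credit to the destination account had already been executed inside the transaction.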
Fallacies of Distributed Computing
- The network is reliable.
- Latency is zero.
- Bandwidth is infinite.
- The network is secure.
- Topology doesn't change.
- There is one administrator.
- Transport cost is zero.
- The network is homogeneous.
Consistency
"There must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time." (Gilbert, Lynch)
Note: this is the definition of atomic (linearizable) consistency; eventual consistency is a weaker guarantee.

Availability
"For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response." (Gilbert, Lynch)

Partition Tolerance
"In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost." (Gilbert, Lynch)

(CA || CP || AP) ?
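The CP-versus-AP choice can be made concrete with a toy model. The sketch below is not a real replication protocol: the `Replica` and `Cluster` classes, the two-node layout, and the `partitioned` flag are all hypothetical, and a real CP system would typically keep serving requests on the majority side of a partition rather than refusing every write.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas that are either connected or partitioned.

    mode "CP": refuse writes while partitioned (gives up availability).
    mode "AP": accept writes locally while partitioned (gives up consistency).
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [Replica("n1"), Replica("n2")]
        self.partitioned = False

    def write(self, node, key, value):
        """Write through replica index `node`; return True if accepted."""
        if not self.partitioned:
            # Healthy network: every replica receives the update.
            for replica in self.replicas:
                replica.data[key] = value
            return True
        if self.mode == "CP":
            return False  # stay consistent by rejecting the request
        self.replicas[node].data[key] = value  # stay available, replicas diverge
        return True

    def read(self, node, key):
        """Read the value as seen by replica index `node`."""
        return self.replicas[node].data.get(key)
```

With `partitioned` set to `True`, a `"CP"` cluster rejects the write and both replicas keep agreeing, while an `"AP"` cluster accepts it and the two replicas return different values for the same key until they are reconciled.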
Algorithms

Row- versus column-orientation

Title                Artist        Album           Year
Breaking the Law     Judas Priest  British Steel   1980
Aces High            Iron Maiden   Powerslave      1984
Kickstart My Heart   Motley Crue   Dr. Feelgood    1989
Raining Blood        Slayer        Reign in Blood  1986
I Wanna Be Somebody  W.A.S.P.      W.A.S.P.        1984

Row-oriented data storage model:
Breaking the Law, Judas Priest, British Steel, 1980
Aces High, Iron Maiden, Powerslave, 1984
Kickstart My Heart, Motley Crue, Dr. Feelgood, 1989
Raining Blood, Slayer, Reign in Blood, 1986
I Wanna Be Somebody, W.A.S.P., W.A.S.P., 1984

Column-oriented data storage model:
Breaking the Law, Aces High, Kickstart My Heart, Raining Blood, I Wanna Be Somebody
Judas Priest, Iron Maiden, Motley Crue, Slayer, W.A.S.P.
British Steel, Powerslave, Dr. Feelgood, Reign in Blood, W.A.S.P.
1980, 1984, 1989, 1986, 1984

Comparison of Row- vs. Column-Orientation
- CREATE
- SELECT
- MAX, MIN, SUM, AVG
- MapReduce
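The two layouts can be sketched in a few lines of Python using the song table from the slide. The function names (`to_row_store`, `to_column_store`, and so on) are hypothetical, and plain lists stand in for on-disk storage; the point is only to show which values end up adjacent in each model.

```python
COLUMNS = ("title", "artist", "album", "year")

ROWS = [
    ("Breaking the Law", "Judas Priest", "British Steel", 1980),
    ("Aces High", "Iron Maiden", "Powerslave", 1984),
    ("Kickstart My Heart", "Motley Crue", "Dr. Feelgood", 1989),
    ("Raining Blood", "Slayer", "Reign in Blood", 1986),
    ("I Wanna Be Somebody", "W.A.S.P.", "W.A.S.P.", 1984),
]


def to_row_store(rows):
    """Row-oriented: all fields of one record are stored next to each other."""
    return [value for row in rows for value in row]


def to_column_store(rows):
    """Column-oriented: all values of one field are stored next to each other."""
    return {name: [row[i] for row in rows] for i, name in enumerate(COLUMNS)}


def read_record(row_store, index):
    """Fetch one whole record: a single contiguous slice in the row layout."""
    width = len(COLUMNS)
    return tuple(row_store[index * width:(index + 1) * width])


def max_year_row_store(row_store):
    """Aggregate over one field: must stride across every record."""
    return max(row_store[COLUMNS.index("year")::len(COLUMNS)])


def max_year_column_store(column_store):
    """Aggregate over one field: reads a single contiguous column."""
    return max(column_store["year"])
```

In this model, reading a whole record is one contiguous slice of the row store, while an aggregate such as MAX over `year` touches one contiguous list in the column store but has to skip across every record in the row store.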
Technology
Implementations
BigTable
- High performance
- MapReduce
- Powers: Google Reader, Maps, Book Search, YouTube, Gmail
Hadoop
- MapReduce
- Yahoo!
- World Record Holder!
Cassandra
- Key-value
- MapReduce
- Facebook
- Eventual consistency
- Scalable, fault-tolerant
Redis
- Key-value
- What it lacks in durability, it makes up for in speed and simplicity.
HBase
- MapReduce
- Hadoop + HDFS
- Java and REST API
- Column-oriented
- Excellent fault-tolerance
- Replication
- Streaming
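MapReduce recurs across several of the systems above, so a minimal single-process sketch of the idea may help. This is not the Hadoop or BigTable API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) are hypothetical, and word counting is used only because it is the customary small example. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence over the input records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each record is mapped independently and each key is reduced independently, both phases can in principle be split across machines, which is what makes the model attractive for the data volumes listed at the start of the deck.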
Conclusions
- Pick the right tool for the job.