(Distributed) (Structured) Storage Systems Mark Feltner

Page 1: (Distributed) (Structured) Storage Systems

(Distributed) (Structured) Storage Systems

Mark Feltner

Page 2: (Distributed) (Structured) Storage Systems
Page 3: (Distributed) (Structured) Storage Systems

Big Data

2.5 Petabytes/day: Wal-Mart's transaction database

40 Terabytes/second: CERN

1 Terabyte/day: NYSE Trading data

10 billion: Facebook photos

Page 4: (Distributed) (Structured) Storage Systems

Overview

Theory

Algorithms

Implementations & Technology

Page 5: (Distributed) (Structured) Storage Systems

Relational databases

Page 6: (Distributed) (Structured) Storage Systems

ACID

Page 7: (Distributed) (Structured) Storage Systems

Atomicity

All-or-nothing
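
A minimal sketch of all-or-nothing behaviour, assuming Python's built-in sqlite3 module and a hypothetical accounts table (the table, the names, and the transfer helper are illustrative, not from the slides). A transfer that fails halfway is rolled back, so neither update survives.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction (hypothetical helper)."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # both updates are applied
failed = transfer(conn, "alice", "bob", 500)  # the debit is undone by the rollback
print(ok, failed, balances(conn))
```

The second transfer debits the source account before the check fails. The rollback is what restores the earlier balance.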

Page 8: (Distributed) (Structured) Storage Systems

Consistency

Data is always in a valid state

Page 9: (Distributed) (Structured) Storage Systems

Isolation

Concurrent transactions leave the data in the same state as if they had been executed serially

Page 10: (Distributed) (Structured) Storage Systems

Durability

Once a transaction COMMITs, its changes are permanent and visible to all clients, even after a crash

Page 11: (Distributed) (Structured) Storage Systems

Non-relational databases

Page 12: (Distributed) (Structured) Storage Systems

Key-value
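
As a rough illustration of the key-value model, the sketch below wraps a Python dict in a small put/get/delete interface. The KeyValueStore class and its method names are assumptions made for this example, not the API of any particular product.

```python
class KeyValueStore:
    """Toy in-memory key-value store: opaque values looked up by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Overwrites any previous value stored under the same key.
        self._data[key] = value

    def get(self, key, default=None):
        # The store knows nothing about the value's structure; it just returns it.
        return self._data.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("song:1", "Breaking the Law")
store.put("song:1", "Aces High")  # same key, so the value is replaced
print(store.get("song:1"), store.get("song:2", "missing"))
```

There is no query by value here. Lookups go through the key, which is what makes the model easy to partition across machines.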

Page 13: (Distributed) (Structured) Storage Systems

Document-oriented
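
A small sketch of the document model, assuming documents are JSON-like nested dicts and that a hypothetical find helper matches on top-level fields. The two documents reuse songs from the table later in this deck and deliberately have different shapes, since a document store does not require a fixed schema.

```python
import json

# Two documents in the same collection with different structure.
songs = [
    {
        "title": "Aces High",
        "artist": "Iron Maiden",
        "album": {"name": "Powerslave", "year": 1984},  # nested sub-document
    },
    {
        "title": "Raining Blood",
        "artist": "Slayer",
        "album": "Reign in Blood",  # plain string instead of a sub-document
        "year": 1986,
    },
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


slayer_docs = find(songs, artist="Slayer")
as_json = json.dumps(songs[0], sort_keys=True)  # documents serialise directly
print(slayer_docs[0]["title"])
print(as_json)
```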

Page 14: (Distributed) (Structured) Storage Systems

Graphs
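
To make the graph model concrete, here is a sketch that stores labelled edges in an adjacency map and walks them breadth-first. The RELEASED and CONTAINS labels and the helper names are invented for this example; the artists, albums, and songs come from the table later in this deck.

```python
from collections import deque

# (source node, relationship label, destination node)
edges = [
    ("Iron Maiden", "RELEASED", "Powerslave"),
    ("Powerslave", "CONTAINS", "Aces High"),
    ("Judas Priest", "RELEASED", "British Steel"),
    ("British Steel", "CONTAINS", "Breaking the Law"),
]


def build_adjacency(edge_list):
    """Map each node to its outgoing (label, neighbour) pairs."""
    adjacency = {}
    for src, label, dst in edge_list:
        adjacency.setdefault(src, []).append((label, dst))
        adjacency.setdefault(dst, [])
    return adjacency


def reachable(adjacency, start):
    """Breadth-first traversal: nodes reachable from start, in visit order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _label, neighbour in adjacency.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


graph = build_adjacency(edges)
print(reachable(graph, "Iron Maiden"))
```

Following relationships is a traversal over stored edges rather than a join, which is the main appeal of the model.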

Page 15: (Distributed) (Structured) Storage Systems

Distributed Systems

Page 16: (Distributed) (Structured) Storage Systems

Fallacies of Distributed Computing

1. The network is reliable.
2. Latency is zero.
3. Bandwidth is infinite.
4. The network is secure.
5. Topology doesn't change.
6. There is one administrator.
7. Transport cost is zero.
8. The network is homogeneous.

Page 17: (Distributed) (Structured) Storage Systems

CAP Theorem

Page 18: (Distributed) (Structured) Storage Systems

Consistency

Eventual consistency

“…there must exist a total order on all operations such that each operation looks as if it were completed at a single instant. This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.” (Gilbert, Lynch)

Page 19: (Distributed) (Structured) Storage Systems

Availability

“For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response” (Gilbert, Lynch)

Page 20: (Distributed) (Structured) Storage Systems

Partition Tolerance

“In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another. When a network is partitioned, all messages sent from nodes in one component of the partition to nodes in another component are lost” (Gilbert, Lynch)

Page 21: (Distributed) (Structured) Storage Systems

(CA || CP || AP) ?
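
The trade-off can be sketched with a toy two-replica cluster. Everything here is an assumption made for illustration: the Cluster and Unavailable names, the "CP"/"AP" mode flag, and the rule that a CP cluster refuses all requests while partitioned (real systems typically use quorums so that a majority side can keep working). The sketch only shows the choice a system faces once messages between replicas are lost.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses a request during a partition."""


class Cluster:
    """Toy cluster of two replicas that either stays consistent or stays available."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replicas = [{"x": 0}, {"x": 0}]
        self.partitioned = False  # True when the replicas cannot exchange messages

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:  # healthy network: replicate everywhere
                replica[key] = value
        elif self.mode == "CP":
            raise Unavailable("cannot replicate the write during a partition")
        else:
            self.replicas[node][key] = value  # AP: accept locally, replicas diverge

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        return self.replicas[node][key]


ap = Cluster("AP")
ap.partitioned = True
ap.write(0, "x", 1)
stale = ap.read(1, "x")  # still answers, but with the old value

cp = Cluster("CP")
cp.partitioned = True
try:
    cp.write(0, "x", 1)
    cp_refused = False
except Unavailable:
    cp_refused = True    # stays consistent by giving up availability

print(stale, cp_refused)
```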

Page 22: (Distributed) (Structured) Storage Systems

Algorithms

Page 23: (Distributed) (Structured) Storage Systems

Row- versus column-orientation

Title               | Artist       | Album          | Year
--------------------|--------------|----------------|-----
Breaking the Law    | Judas Priest | British Steel  | 1980
Aces High           | Iron Maiden  | Powerslave     | 1984
Kickstart My Heart  | Motley Crue  | Dr. Feelgood   | 1989
Raining Blood       | Slayer       | Reign in Blood | 1986
I Wanna Be Somebody | W.A.S.P.     | W.A.S.P.       | 1984

Page 24: (Distributed) (Structured) Storage Systems

Row-oriented

Data Storage Model:

Breaking the Law, Judas Priest, British Steel, 1980, Aces High, Iron Maiden, Powerslave, 1984, Kickstart My Heart, Motley Crue, Dr. Feelgood, 1989, Raining Blood, Slayer, Reign in Blood, 1986, I Wanna Be Somebody, W.A.S.P., W.A.S.P., 1984

Page 25: (Distributed) (Structured) Storage Systems

Column-oriented

Data Storage Model:

Breaking the Law, Aces High, Kickstart My Heart, Raining Blood, I Wanna Be Somebody, Judas Priest, Iron Maiden, Motley Crue, Slayer, W.A.S.P., British Steel, Powerslave, Dr. Feelgood, Reign in Blood, W.A.S.P., 1980, 1984, 1989, 1986, 1984

Page 26: (Distributed) (Structured) Storage Systems

Comparison of Row- vs. Column-Orientation

Row-oriented: CREATE, SELECT

Column-oriented: MAX, MIN, SUM, AVG, …
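
The two layouts can be compared with a short sketch over the song table from the earlier slide. Python lists stand in for on-disk storage here, and the FIELDS tuple and to_columns helper are names invented for the example. The point is only which values sit next to each other in each layout.

```python
FIELDS = ("title", "artist", "album", "year")

# Row-oriented: each record is stored together.
rows = [
    ("Breaking the Law", "Judas Priest", "British Steel", 1980),
    ("Aces High", "Iron Maiden", "Powerslave", 1984),
    ("Kickstart My Heart", "Motley Crue", "Dr. Feelgood", 1989),
    ("Raining Blood", "Slayer", "Reign in Blood", 1986),
    ("I Wanna Be Somebody", "W.A.S.P.", "W.A.S.P.", 1984),
]


def to_columns(records):
    """Column-oriented: all values of one field are stored together."""
    return {name: [record[i] for record in records] for i, name in enumerate(FIELDS)}


columns = to_columns(rows)

# Fetching a whole record: one lookup in the row layout,
# one lookup per column in the column layout.
record = rows[1]
record_from_columns = tuple(columns[name][1] for name in FIELDS)

# Aggregates: the column layout scans a single list of years,
# while the row layout has to step through every full record.
max_year = max(columns["year"])
min_year = min(columns["year"])
avg_year = sum(columns["year"]) / len(columns["year"])
max_year_from_rows = max(row[3] for row in rows)

print(record == record_from_columns, min_year, max_year, avg_year)
```

Inserting a record shows the reverse cost: the row layout appends one tuple, while the column layout has to append to every column.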

Page 27: (Distributed) (Structured) Storage Systems

MapReduce
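
A single-process sketch of the MapReduce pattern, assuming the usual three steps: map each record to key/value pairs, group the pairs by key, and reduce each group. The function names are illustrative and nothing here is distributed. In a real framework the map and reduce calls run in parallel on many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


def word_mapper(line):
    # Emit (word, 1) for every word in the line.
    for word in line.lower().split():
        yield word, 1


def sum_reducer(_key, values):
    return sum(values)


titles = ["Breaking the Law", "Aces High", "Raining Blood", "Reign in Blood"]
counts = map_reduce(titles, word_mapper, sum_reducer)
print(counts["blood"], counts["the"])
```

Because each mapper call sees one record and each reducer call sees one key, both phases can be split across workers without coordination beyond the shuffle.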

Page 28: (Distributed) (Structured) Storage Systems

Technology

Page 29: (Distributed) (Structured) Storage Systems

Implementations

Page 30: (Distributed) (Structured) Storage Systems

BigTable

High performance

MapReduce

Powers: Google Reader, Maps, Book Search, YouTube, Gmail, …

Page 31: (Distributed) (Structured) Storage Systems

Hadoop

MapReduce

Yahoo!

World Record Holder!

Page 32: (Distributed) (Structured) Storage Systems

Cassandra

Key-value

MapReduce

Facebook

Eventual consistency

Scalable, fault-tolerant

Page 33: (Distributed) (Structured) Storage Systems

MySQL

Relational

LAMP

Page 34: (Distributed) (Structured) Storage Systems

Redis

Key-value

What it lacks in durability, it makes up for in speed and simplicity.

Page 35: (Distributed) (Structured) Storage Systems

HBase

MapReduce

Hadoop + HDFS

Java and REST API

Column-oriented

Excellent fault-tolerance

Replication

Streaming

Page 36: (Distributed) (Structured) Storage Systems

Neo4J

Graph Database

Page 37: (Distributed) (Structured) Storage Systems

MongoDB

Document-oriented

Page 38: (Distributed) (Structured) Storage Systems

Conclusions

Pick the right tool for the job.

Page 39: (Distributed) (Structured) Storage Systems