dumpFS: A Distributed Storage Solution. Project for Distributed Systems, Carnegie Mellon University. Bruno Garrancho, Eugénio Pinto, Nuno Loureiro. Tuesday, December 21, 2010.

DumpFS - A Distributed Storage Solution

Project page: http://softwarelivre.sapo.pt/dumpfs

Page 1: DumpFS - A Distributed Storage Solution

Distributed Systems

Carnegie Mellon University
Project for Distributed Systems

dumpFS
A Distributed Storage Solution

• Bruno Garrancho
• Eugénio Pinto
• Nuno Loureiro

Tuesday, December 21, 2010

Page 2: DumpFS - A Distributed Storage Solution

Acknowledgements

• Prof. António Casimiro
• Prof. Bill Nace

Page 4: DumpFS - A Distributed Storage Solution

Motivation

• Current demand for massive storage
• Commodity hardware
• Simple semantics of the web context
• Alternative solutions: too generic, too complex, extra overhead, too expensive
• Not end-user demand

Page 5: DumpFS - A Distributed Storage Solution

Goals

• Availability
• Performance
• Scalability

Page 6: DumpFS - A Distributed Storage Solution

How it works

• Black-box storage
• API/middleware for developers
• Web, Web & Web...
• Streams, Streams & Streams...
• WORM (write once, read many)

Page 7: DumpFS - A Distributed Storage Solution

Architecture

[Figure: architecture diagram. Labels: End User (×5), Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 16: DumpFS - A Distributed Storage Solution

Architecture

[Figure: final build of the architecture diagram, in which the labels Cerebrum, Storage (...), Monitor, dumpFS, API, Application, and End User appear more than once.]

Page 17: DumpFS - A Distributed Storage Solution

Architecture - PUT

[Figure: animated PUT flow on the architecture diagram. Labels: End User (×5), Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 23: DumpFS - A Distributed Storage Solution

Architecture - GET

[Figure: animated GET flow on the architecture diagram. Labels: End User (×5), Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 31: DumpFS - A Distributed Storage Solution

Revisiting the goals

• Availability
• Performance
• Scalability

How do we provide these properties?

Page 32: DumpFS - A Distributed Storage Solution

Monitoring

• Heartbeat (between all nodes)
  - Detection of failures
• Distributed system state (local node state sent to the cerebrums)
  - CPU load
  - Disk space
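The monitoring scheme above could be sketched roughly as follows. This is a minimal Python illustration, not the project's Erlang implementation: `NodeTable`, `report`, `online`, and the three-missed-reports threshold are hypothetical names and values. Only the 5-second interval and the {load; disk} payload come from the slides.

```python
from typing import Dict, List

REPORT_INTERVAL_S = 5.0  # from the slides: "5 secs {load; disk}"
MISSED_REPORTS = 3       # assumption: missed reports tolerated before a node counts as offline


class NodeTable:
    """Hypothetical cerebrum-side table holding the last state report of each node."""

    def __init__(self) -> None:
        self._last: Dict[str, dict] = {}

    def report(self, node: str, load: float, disk: float, now: float) -> None:
        """Record a node's CPU load and available disk space, stamped with the receive time."""
        self._last[node] = {"load": load, "disk": disk, "seen": now}

    def online(self, now: float) -> List[str]:
        """Return the nodes whose latest report is recent enough to treat them as alive."""
        limit = REPORT_INTERVAL_S * MISSED_REPORTS
        return sorted(n for n, s in self._last.items() if now - s["seen"] <= limit)
```

Time is passed in explicitly so the sketch stays deterministic; a running node would presumably use a monotonic clock instead.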

Page 37: DumpFS - A Distributed Storage Solution

Distributed System State

[Figure: two Cerebrum nodes and two Storage nodes, each with Monitor, Server, and HTTP API components. Annotation: "5 secs {load; disk}". Two charts with axis ticks 0, 25, 50, 75, 100.]

Page 38: DumpFS - A Distributed Storage Solution

Availability

Tolerance to failures

• Crash failures & broken links
  - Heartbeat: only online nodes are selected
  - Replicated files
  - Replicated components

Page 39: DumpFS - A Distributed Storage Solution

Tolerance to failures

[Figure: architecture diagram (original slide 15). Labels: End User (×5), Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 46: DumpFS - A Distributed Storage Solution

Tolerance to failures

[Figure: architecture diagram (original slide 16). Labels: End User (×5), Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 51: DumpFS - A Distributed Storage Solution

Tolerance to failures

[Figure: architecture diagram with an added "LB" element (original slide 17). Labels: End User (×5), LB, Application, API, dumpFS, Cerebrum, Storage (...), Monitor.]

Page 56: DumpFS - A Distributed Storage Solution

Performance

• Cerebrums provide only file locations to the API, not data
• The primary storage node replicates the file in parallel while receiving data (PUT)
• Probabilistic weighted node selection for PUT and GET operations
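The "replicate while receiving" point can be illustrated with a small Python sketch. It is an assumption-laden simplification: `put_with_replication` is a hypothetical name, the replicas are plain writable streams standing in for network connections, and each chunk is forwarded in turn rather than truly concurrently. The point it shows is only that forwarding starts before the whole file has arrived.

```python
import io
from typing import BinaryIO, Iterable, List


def put_with_replication(chunks: Iterable[bytes], primary: BinaryIO,
                         replicas: List[BinaryIO]) -> int:
    """Store each incoming chunk locally and forward it to every replica
    before reading the next chunk, instead of waiting for the full file."""
    total = 0
    for chunk in chunks:
        primary.write(chunk)
        for replica in replicas:
            replica.write(chunk)  # hypothetical stand-in for a send to a replica node
        total += len(chunk)
    return total  # number of bytes received
```

For example, feeding the chunks `b"dump"` and `b"FS"` into one primary and two `io.BytesIO` replicas leaves all three streams holding the same six bytes.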

Page 57: DumpFS - A Distributed Storage Solution

Performance

Probabilistic weighted node selection

• PUT uses available disk space
• GET uses CPU load

Node A: available disk space 57%
Node B: available disk space 47%

Should node A always be selected in PUT operations?

Page 58: DumpFS - A Distributed Storage Solution

Performance

Probabilistic weighted node selection

Node A: available disk space 57%
Node B: available disk space 47%

Rand(A) = Rand(1..57)
Rand(B) = Rand(1..47)

Rand(B) can be greater than Rand(A), but the probability that this happens is < 50%.

Use Rand(Node) instead of the direct value!
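A minimal Python sketch of the Rand(Node) idea, assuming integer weights of at least 1 and independent uniform draws. The names `pick_node` and `prob_b_beats_a` are hypothetical, and breaking ties in favour of the first node listed is an assumption, since the slides do not say how ties are handled.

```python
import random
from fractions import Fraction
from typing import Dict, Optional


def pick_node(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Draw Rand(1..weight) for every node and return the node with the highest draw.

    Weights must be integers >= 1. On a tie, the node listed first wins (assumption).
    """
    rng = rng or random.Random()
    draws = {name: rng.randint(1, weight) for name, weight in weights.items()}
    return max(draws, key=draws.get)


def prob_b_beats_a(a: int, b: int) -> Fraction:
    """Exact P(Rand(1..b) > Rand(1..a)), found by counting all equally likely pairs."""
    wins = sum(1 for x in range(1, a + 1) for y in range(1, b + 1) if y > x)
    return Fraction(wins, a * b)
```

With the values from the slide, `prob_b_beats_a(57, 47)` evaluates to 23/57, roughly 40%, which agrees with the "< 50%" claim. For GET, a weight would have to be derived from CPU load, for instance 100 minus the load percentage; that mapping is an assumption and is not stated in the slides.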

Page 59: DumpFS - A Distributed Storage Solution

Scalability

DumpFS allows:

• Redundant DB
• Partitioning for “infinite” growth
• Straightforward storage addition
• Clusters of clusters

Page 60: DumpFS - A Distributed Storage Solution

Technology

• REST / HTTP
• Erlang!!! - server
• .NET - client API

Page 61: DumpFS - A Distributed Storage Solution

What didn’t work

• Our graphic design skills
• HDD I/O
• Time

Page 62: DumpFS - A Distributed Storage Solution

Future work

• Delete & garbage collection
• Read operations at arbitrary locations in files

Page 63: DumpFS - A Distributed Storage Solution

The END!

Questions?