Open the New Door. Yosuke Hara, Oct 26, 2013 (rev 2.2). The Lion of Storage Systems.

[RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Rakuten Technology Conference 2013 "LeoFS - Open the New Door" Yosuke Hara (Rakuten)

Page 1: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Open the New Door

Yosuke Hara Oct 26, 2013 (rev 2.2)

The Lion of Storage Systems

Page 2: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Started as an OSS project on July 4, 2012
www.leofs.org

LeoFS is "Unstructured Big Data Storage for the Web" and a highly available, distributed, eventually consistent storage system.

Organizations can use LeoFS to store lots of data efficiently, safely and inexpensively.

Page 3: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Motivation

Page 4: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Motivation (As of 2010)

Expensive Storage Problems:

1. High Costs (Initial Costs, Running Costs)
2. Possibility of "SPOF"
3. Does NOT Scale Easily: Storage Expansion is difficult during periods of increasing data

=> Get Away From Using "Expensive H/W Based Storages"

Page 5: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

REST-API / AWS S3-API

The Lion of Storage Systems

HIGH Availability

HIGH Cost Performance Ratio

HIGH Scalability

LeoFS Non Stop

Velocity: Low Latency
Minimum Resources

Volume: Petabyte / Exabyte
Variety: Photo, Movie, Unstructured-data

3 Vs in 3 HIGHs

Page 6: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Overview

Page 7: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview

No Master, No SPOF

Request from Web Applications / Browsers w/REST-API / S3-API
-> Load Balancer
-> LeoFS-Gateway: REST over HTTP (80/443)
-> LeoFS-Storage nodes, each with a Storage Engine/Router, Metadata and Object Store: RPC (4369)

LeoFS-Manager: Monitor, GUI Console; RPC (4369)

Other port labels in the diagram: (4000, 4010, 4020), (10020, 10021)

Page 8: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Gateway

Page 9: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Gateway

Stateless Proxy + Object Cache

Handles HTTP Requests and Responses (REST-API / S3-API)
Built-in "Object Cache Mechanism" [ Memory Cache, Disc Cache ]
Uses Consistent Hashing to decide the primary node

Clients -> Gateway(s) -> Storage Cluster

Choosing Replica Target Node(s)

RING: 2^128 (MD5)
# of replicas = 3
KEY = “bucket/leofs.key”
Hash = md5(Filename)
Replica targets: Primary Node, Secondary-1, Secondary-2
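
The replica selection described on this slide can be sketched roughly as follows. This is a minimal illustration, not LeoFS's actual implementation: the `Ring` class, the node names, and the use of 64 virtual nodes per storage node are assumptions made for the example. Only the MD5-based 2^128 ring and the replica count of 3 come from the slide.

```python
import bisect
import hashlib


def md5_int(text: str) -> int:
    """Map a string onto the 2^128 ring using MD5, as the slide suggests."""
    return int.from_bytes(hashlib.md5(text.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hash ring; vnodes_per_node is an assumption."""

    def __init__(self, nodes, vnodes_per_node=64):
        if not nodes:
            raise ValueError("at least one storage node is required")
        points = sorted(
            (md5_int(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._hashes = [h for h, _ in points]
        self._owners = [n for _, n in points]
        self._node_count = len(set(nodes))

    def replicas(self, key: str, n: int = 3):
        """Return [primary, secondary-1, ...]: the first n distinct nodes
        found walking clockwise from the key's position on the ring."""
        n = min(n, self._node_count)
        i = bisect.bisect_right(self._hashes, md5_int(key))
        chosen = []
        while len(chosen) < n:
            owner = self._owners[i % len(self._owners)]
            if owner not in chosen:
                chosen.append(owner)
            i += 1
        return chosen


ring = Ring([f"storage_{i}" for i in range(5)])  # hypothetical node names
primary, *secondaries = ring.replicas("bucket/leofs.key")
```

Because every gateway can compute the same ring from the same node list, no central lookup is needed to find the primary node, which fits the "No Master, No SPOF" claim.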

Page 10: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Storage

Page 11: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Storage

Gateway -> Storage (Storage Cluster)

Automatically replicates an Object and its Metadata to Remote Node(s)
Uses "Consistent Hashing" for Replication in the Storage Cluster
"P2P"

Choosing Replica Target Node(s)

RING: 2^128 (MD5)
# of replicas = 3
KEY = “bucket/leofs.key”
Hash = md5(Filename)
Replica targets: Primary Node, Secondary-1, Secondary-2

Page 12: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Storage

Storage Engine consists of "Object Storage" and "Metadata Storage"
Built-in "Replicator" and "Recoverer" w/Queue for Eventual Consistency

Request From Gateway -> LeoFS Storage

Storage Engine Workers ...
Storage Engine: Metadata + Object Container
Metadata: Keeps an in-memory index of all data
Object Container: Manages "Log Structured File"

Replicator, Repairer w/Queue ...

Page 13: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Storage Engine - Retrieve an object from the storage

< META DATA > ID, Filename, Offset, Size, Checksum

Header
File
Footer

< META DATA > Id, Filename, Offset, Size, Checksum (MD5), Version#

Storage Engine Worker
Object Container, Metadata Storage

Page 14: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Storage Engine - Retrieve an object from the storage

< META DATA > ID, Filename, Offset, Size, Checksum

Header
File
Footer

< META DATA > Id, Filename, Offset, Size, Checksum (MD5), Version#

Storage Engine Worker
Object Container, Metadata Storage

Insert a metadata
Append an object into the object container
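
The two steps on this slide (append the object to the container, insert its metadata) and the matching read path can be sketched as below. This is a simplified illustration under stated assumptions: `ObjectContainer` is a hypothetical class, an in-memory `io.BytesIO` stands in for the log-structured file, and the metadata holds only offset, size, and MD5 checksum.

```python
import hashlib
import io


class ObjectContainer:
    """Hypothetical append-only container with an in-memory metadata index."""

    def __init__(self):
        self._log = io.BytesIO()  # stands in for the log-structured file
        self._index = {}          # key -> (offset, size, md5 checksum)

    def put(self, key: str, data: bytes) -> None:
        # Append the object at the end of the container.
        offset = self._log.seek(0, io.SEEK_END)
        self._log.write(data)
        # Insert (or replace) the metadata; an older copy stays in the log
        # as garbage until the container is compacted.
        self._index[key] = (offset, len(data), hashlib.md5(data).hexdigest())

    def get(self, key: str) -> bytes:
        # Look up offset and size in the metadata, then read from the log.
        offset, size, checksum = self._index[key]
        self._log.seek(offset)
        data = self._log.read(size)
        if hashlib.md5(data).hexdigest() != checksum:
            raise IOError("checksum mismatch for " + key)
        return data


container = ObjectContainer()
container.put("bucket/leofs.key", b"hello")
payload = container.get("bucket/leofs.key")
```

A read costs one index lookup plus one seek, and a write is a sequential append, which is the usual reason for choosing a log-structured layout.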

Page 15: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Storage Engine - Remove unnecessary objects from the storage

Compact

Old Object Container/Metadata

Storage Engine Worker

New Object Container/Metadata

Storage Engine Worker
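
Compaction as drawn on this slide copies only the objects that the metadata still references into a new container and rebuilds the metadata with the new offsets. The function below is a rough sketch of that idea; the `compact` name and the `{key: (offset, size)}` index shape are assumptions made for the example.

```python
def compact(old_log: bytes, old_index: dict):
    """Copy live objects into a new container and rebuild the metadata.

    old_index maps key -> (offset, size). Bytes in old_log that no index
    entry points at (overwritten or deleted objects) are dropped.
    """
    new_log = bytearray()
    new_index = {}
    for key, (offset, size) in old_index.items():
        new_index[key] = (len(new_log), size)
        new_log += old_log[offset:offset + size]
    return bytes(new_log), new_index


# "v1" is an overwritten object that nothing references any more.
old_log = b"v1" + b"xyz" + b"v2!"
old_index = {"a": (5, 3), "b": (2, 3)}
new_log, new_index = compact(old_log, old_index)
```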

Page 16: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Storage - Data Structure/Relationship of an object

<Metadata>
Fields: {VNodeId, Key}, KeySize, CustomMeta Size, File Size, Offset, Version, Time-stamp, Checksum
Field groups are annotated "for Sync" and "for Retrieve a File (Object)"

<Needle>
Header (Metadata - Fixed length): Checksum, KeySize, DataSize, User-Meta Size, Offset, Version, Time-stamp
Body (Variable Length): {VNodeId, Key}, Actual File, User-Meta
Footer (8B)

<Object Container>
Super-block, Needle-1, Needle-2, Needle-3, Needle-4, Needle-5
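
A needle with a fixed-length header, a variable-length body, and an 8-byte footer could be packed and parsed along these lines. The slide gives the field names but not their widths or byte order, so the `HEADER` layout, the zero-filled footer, and the function names here are all assumptions chosen only to make the example concrete.

```python
import hashlib
import struct

# Assumed layout: 16-byte MD5 checksum, three 32-bit sizes (key, data,
# user-meta), then 64-bit offset, version, and timestamp. Big-endian.
HEADER = struct.Struct(">16sIIIQQQ")
FOOTER = b"\x00" * 8  # the slide only states that the footer is 8 bytes


def encode_needle(key, data, user_meta, offset, version, timestamp):
    """Serialize one needle: fixed header, variable body, footer."""
    header = HEADER.pack(
        hashlib.md5(data).digest(),
        len(key), len(data), len(user_meta),
        offset, version, timestamp,
    )
    return header + key + data + user_meta + FOOTER


def decode_needle(buf):
    """Parse a needle and verify the checksum of the stored object."""
    checksum, ksize, dsize, msize, offset, version, timestamp = (
        HEADER.unpack_from(buf, 0)
    )
    pos = HEADER.size
    key = buf[pos:pos + ksize]
    pos += ksize
    data = buf[pos:pos + dsize]
    pos += dsize
    user_meta = buf[pos:pos + msize]
    if hashlib.md5(data).digest() != checksum:
        raise ValueError("checksum mismatch")
    return {"key": key, "data": data, "user_meta": user_meta,
            "offset": offset, "version": version, "timestamp": timestamp}
```

Keeping the sizes in a fixed-length header means a reader can find the end of each needle without scanning the body, so a container can be walked needle by needle.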

Page 17: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Storage - Large Object Support

To Equalize Disk Usage of Every Storage Node
To Realize High I/O efficiency and High Availability

[ WRITE Operation ]
Client(s) -> Gateway -> Storage Cluster

An Original Object’s Metadata: Original Object Name, Original Object Size, # of Chunks
Chunked Objects: chunk-0, chunk-1, chunk-2, chunk-3

Every chunked object and metadata are replicated in the cluster
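
Splitting a large object into chunks plus a parent metadata record can be sketched as follows. The chunk key format (`name/chunk-i`), the function names, and the dictionary-based metadata are assumptions for illustration; the slide only states that the parent metadata records the original name, the original size, and the number of chunks.

```python
def split_object(name: str, data: bytes, chunk_size: int):
    """Return (parent_metadata, chunks) for a large object."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = {
        f"{name}/chunk-{i}": data[start:start + chunk_size]
        for i, start in enumerate(range(0, len(data), chunk_size))
    }
    parent = {"name": name, "size": len(data), "num_chunks": len(chunks)}
    return parent, chunks


def join_object(parent: dict, chunks: dict) -> bytes:
    """Reassemble the original object from its chunks, in order."""
    data = b"".join(
        chunks[f"{parent['name']}/chunk-{i}"]
        for i in range(parent["num_chunks"])
    )
    if len(data) != parent["size"]:
        raise ValueError("reassembled size does not match metadata")
    return data
```

Since each chunk has its own key, each one hashes to its own position on the ring, so the chunks of one large object spread across storage nodes instead of filling a single node's disk.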

Page 18: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Manager

Page 19: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

LeoFS Overview - Manager

Operate LeoFS - Gateway and Storage Cluster
"RING Monitor" and "NodeState Monitor"

Manager(s) -> Gateway(s), Storage Cluster

Monitor: RING, Node State
Operate: status, suspend, resume, detach, whereis, ...

Page 20: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

New Features

Page 21: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

"Insight"

Page 22: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

New Features - LeoInsight (v1.0)

Give Insight into the State of LeoFS for Keeping Availability

1. To control requests from Clients to LeoFS
2. To check and see "Traffic info" and "State of Every Node"

Page 23: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

New Features - LeoInsight (v1.0)

LeoFS: Gateway, Storage Cluster, Manager

Distributed Queue (ElkDB)
- Traffic-Info from Gateway
- Probes of a Node from Gateway/Storage/Manager
- Consume MSG

TimeSeriesDB (Savannah)
- Persists calculated statistics-data

REST-API (JSON)
- Retrieve

Notifier
- Notify

Operate LeoFS

Page 24: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

More Scalability & More Availability

Page 25: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

New Features - Multi Data Center Data Replication (v1.0)

Regions: Tokyo, Singapore, US, Europe

HIGH-Scalability, HIGH-Availability + Easy Operation for Admins

NO SPOF
NO Performance Degradation

Page 26: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

v1.0 - Multi Data Center Data Replication

"Leo Storage Platform" [ 3 Regions & 5 Replicas ]

DC-1 [replicas:3], DC-2 [replicas:1], DC-3 [replicas:1]
Diagram labels: Storage cluster, Manager cluster, Client, Application(s)
Request to the Target Region
Monitor and Replicate each “RING” and “System Configuration”

Method of MDC-Replication:
- Async: Bulked Transfer
- Sync+Tran: Consensus Algorithm

DC-1 Configuration:
- Method of Replication:
- Consistency Level:
  - local-quorum: [N=3, W=2, R=1, D=2]
  - # of target DC(s): 2
  - # of replicas per DC: 1
>> Total of Replicas: 5
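
The arithmetic behind the DC-1 configuration can be checked with a small sketch. The `LocalQuorum` class and helper names are hypothetical. N, W, and R carry the usual quorum meanings, and D is read here as the number of acknowledgements needed for a delete, which is an assumption the slide does not spell out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalQuorum:
    n: int  # replicas kept in the local DC
    w: int  # acknowledgements required for a write
    r: int  # replies required for a read
    d: int  # acknowledgements required for a delete (assumed meaning)


def total_replicas(q: LocalQuorum, target_dcs: int, replicas_per_dc: int) -> int:
    """Local replicas plus one set of replicas in every target DC."""
    return q.n + target_dcs * replicas_per_dc


def quorum_met(acks: int, required: int) -> bool:
    """An operation succeeds once enough replicas have acknowledged it."""
    return acks >= required


def reads_overlap_writes(q: LocalQuorum) -> bool:
    """True only if every read quorum intersects every write quorum."""
    return q.r + q.w > q.n


dc1 = LocalQuorum(n=3, w=2, r=1, d=2)
replicas = total_replicas(dc1, target_dcs=2, replicas_per_dc=1)  # 3 + 2 * 1
```

With N=3, W=2, R=1 the sum R + W equals N rather than exceeding it, so a read may hit the one replica a write has not reached yet. That matches the deck's description of LeoFS as eventually consistent.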

Page 27: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

v1.0 - Multi Data Center Data Replication

1) 3 replicas are written in "Local Region"

"Leo Storage Platform" [ 3 Regions & 5 Replicas ]

DC-1 [replicas:3], DC-2 [replicas:1], DC-3 [replicas:1]
Diagram labels: Storage cluster, Manager cluster, Client, Application(s)
Request to the Target Region
Monitor and Replicate each “RING” and “System Configuration”

Method of MDC-Replication:
- Async: Bulked Transfer
- Sync+Tran: Consensus Algorithm

DC-1 Configuration:
- Method of Replication:
- Consistency Level:
  - local-quorum: [N=3, W=2, R=1, D=2]
  - # of target DC(s): 2
  - # of replicas per DC: 1
>> Total of Replicas: 5

Page 28: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

v1.0 - Multi Data Center Data Replication

2) Sync (or Async) Replication to Other Region(s)

"Leo Storage Platform" [ 3 Regions & 5 Replicas ]

DC-1 [replicas:3], DC-2 [replicas:1], DC-3 [replicas:1]
Diagram labels: Storage cluster, Manager cluster, Client, Application(s)
Request to the Target Region
Monitor and Replicate each “RING” and “System Configuration”

Leader: DC1.node_0 - Primary
Local-follower: DC1.node_1, DC1.node_2
Remote-follower: DC2.node_3, DC3.node_4

Page 29: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

v1.0 - Multi Data Center Data Replication

3) Replication for Geographical Optimization

"Leo Storage Platform" [ 3 Regions & 5 Replicas ]

DC-1 Tokyo, DC-2 Singapore, DC-3 US, DC-4 Europe
[replicas:3] [replicas:1] [replicas:1]
Diagram labels: Storage cluster, Manager cluster, Client, Application(s)
Request to the Target Region
Monitor and Replicate each “RING” and “System Configuration”

Local Region   Remote-1    Remote-2
Tokyo          Singapore   US
Singapore      Tokyo       Europe
Europe         US          Singapore
US             Europe      Tokyo

Page 30: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

"Center"

Page 31: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

New Features - LeoCenter

Web-based administrative console for inspecting and manipulating LeoFS Storage Clusters and LeoFS Gateway

Operate LeoFS

Admin Tools

Access Log Analysis

Page 32: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Access Log Analysis (β)

Page 33: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

REST-API / AWS S3-API

The Lion of Storage Systems

HIGH Availability

HIGH Cost Performance Ratio

HIGH Scalability

LeoFS Non Stop

Velocity: Low Latency
Minimum Resources

Volume: Petabyte / Exabyte
Variety: Photo, Movie, Unstructured-data

3 Vs in 3 HIGHs

Page 34: RakutenTechConf2013] [D-3_1] LeoFS - Open the New Door

Set Sail for “Cloud Storage”

Website: www.leofs.org
Twitter: @LeoFastStorage
Facebook: www.facebook.com/org.leofs