
BlueSky: A Cloud-Backed File System for the Enterprise

Michael Vrable Stefan Savage Geoffrey M. Voelker

University of California, San Diego
Computer Science and Engineering Department

February 16, 2012

Vrable, Savage, Voelker (UCSD) BlueSky February 16, 2012 1 / 16


Computing Services for the Enterprise

• Our work is focused primarily on small/medium-sized organizations
• These organizations run a number of computing services, such as e-mail and shared file systems
• Often brings significant cost:
  ◦ Purchasing hardware
  ◦ Operating hardware
  ◦ Managing services
• Outsourcing these services to the cloud offers the possibility to lower costs


. . . Migrated to the Cloud

Some services are already migrating to the cloud. . .

Network file systems have not yet migrated, but still have potential benefits:

• File system size entirely elastic: simpler provisioning
• Cloud provides durability for file system data
• Hardware reliability less important
• Integration with cloud backup

We build and analyze a prototype system, BlueSky, to investigate how to do so


Cloud Computing Offerings

Spectrum of service models:

• Software-as-a-Service: Complete integrated service from a provider
• Platform/Infrastructure-as-a-Service: Building blocks for custom applications

In both cases:

• Infrastructure moved within network
• Reduce/eliminate need for hardware maintenance
• Reduce need for ahead-of-time capacity planning

SaaS: Easy to set up
PaaS/IaaS: More choice among service providers, potentially lower cost


Challenges

Cloud storage (e.g., Amazon S3) acts much like another level in the storage hierarchy but brings new design constraints:

• New interface
  ◦ Only supports writing complete objects
  ◦ Does support random read access
• Performance
  ◦ High latency from network round trips
  ◦ Random access adds little penalty
• Security
  ◦ Data privacy is a concern
• Cost
  ◦ Cost is very explicit
  ◦ Unlimited capacity, but need to delete to save money
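
The interface constraints above can be illustrated with a toy in-memory object store. This is only a sketch: `ObjectStore` and its `put`/`get`/`delete` methods are hypothetical names chosen for illustration, not the Amazon S3 API. Writes always replace a complete object, while reads may ask for an arbitrary byte range.

```python
from typing import Optional


class ObjectStore:
    """Toy in-memory stand-in for a cloud object store (hypothetical API)."""

    def __init__(self):
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # Only whole-object writes: an existing object is replaced entirely.
        self._objects[key] = bytes(data)

    def get(self, key: str, offset: int = 0,
            length: Optional[int] = None) -> bytes:
        # Random read access: return an arbitrary byte range of the object.
        data = self._objects[key]
        end = len(data) if length is None else offset + length
        return data[offset:end]

    def delete(self, key: str) -> None:
        # Capacity is unlimited, but stored bytes are billed until deleted.
        del self._objects[key]


store = ObjectStore()
store.put("segment-0", b"hello, cloud storage")
middle = store.get("segment-0", offset=7, length=5)  # b"cloud"
```

There is deliberately no "append" or "overwrite range" method, which is the constraint that motivates batching updates into larger objects.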


BlueSky: Approach

• For ease of deployment, do not change software stack on clients
• Clients simply pointed at a new server, continue to speak NFS/CIFS
• Deploy a local proxy to translate requests before sending to the cloud
• Provides lower-latency responses to clients when possible by caching data
• Implements write-back caching
• Encrypts data before storage to cloud for confidentiality


BlueSky: Approach

• BlueSky adopts a log-structured design
  ◦ Each log segment uploaded all at once
  ◦ Random access allowed for downloads
• Log cleaner can be run in the cloud (e.g., on Amazon EC2) for faster, cheaper access to storage
• Log cleaner can run concurrently with active proxy
• Cleaner not given full access to file system data
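
As a rough illustration of the log-structured idea (upload a segment all at once, read back individual items by byte range), the sketch below packs log items into one segment and keeps an index of offsets. The function names and the index layout are assumptions made for this example; the slides do not describe BlueSky's actual segment format.

```python
def build_segment(items):
    """Concatenate log items into one segment; return (segment, index).

    index[i] is the (offset, length) of items[i] inside the segment, so a
    reader can later fetch just that byte range.
    """
    segment = bytearray()
    index = []
    for item in items:
        index.append((len(segment), len(item)))
        segment.extend(item)
    return bytes(segment), index


def read_item(segment, index, i):
    # Stand-in for a byte-range download of a single log item.
    offset, length = index[i]
    return segment[offset:offset + length]


items = [b"inode-6", b"block-0-data", b"block-1-data"]
segment, index = build_segment(items)
second = read_item(segment, index, 1)
```

One upload carries all three items, yet a later read of the second item touches only its own bytes.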


File System Design

[Figure: file system layout in the cloud log; objects are marked as unencrypted or encrypted]

Checkpoint
  Last segments seen: cleaner: 3, proxy: 12
  Inode maps: [0, 4095], [4096, 8191]

Inode map [0, 4095]
  2, 3, 5, 6, 11, 200

Inode 6
  Type: regular file
  Owner: root
  Size: 48 KB
  Data blocks: 0, 1

Data Block
  Inode number: 6
  Length: 32 KB

Data Block
  Inode number: 6
  Length: 16 KB

Cloud Log Directories:
  Proxy: Segment #11 #12
  Cleaner: Segment #2 #3 #4
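
The object types in this figure can be modeled with a few small records. The sketch below is illustrative only: the class and field names are assumptions, and only the sample values (inode 6, blocks of 32 KB and 16 KB, the two inode map ranges, last segments seen) are taken from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class DataBlock:
    inode_number: int
    length_kb: int


@dataclass
class Inode:
    number: int
    file_type: str
    owner: str
    blocks: list = field(default_factory=list)

    @property
    def size_kb(self) -> int:
        # File size is the sum of its data block lengths.
        return sum(b.length_kb for b in self.blocks)


@dataclass
class Checkpoint:
    last_segment_seen: dict   # e.g. {"cleaner": 3, "proxy": 12}
    inode_map_ranges: list    # inclusive (low, high) inode-number ranges

    def range_for(self, inode_number: int):
        # Find which inode map covers the given inode number.
        for low, high in self.inode_map_ranges:
            if low <= inode_number <= high:
                return (low, high)
        raise KeyError(inode_number)


inode6 = Inode(6, "regular file", "root",
               [DataBlock(6, 32), DataBlock(6, 16)])
checkpoint = Checkpoint({"cleaner": 3, "proxy": 12},
                        [(0, 4095), (4096, 8191)])
```

Starting from the checkpoint, a reader finds the inode map covering inode 6, then the inode, then its two data blocks, which together account for the 48 KB file size.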


Architecture

[Figure: proxy architecture. Labels: Front Ends (NFS, CIFS); Back Ends (S3, WAS); Encryption; Resource Managers (Memory, Disk, Network); Client Requests, Client Responses; Disk Journal Writes, Disk Cache Reads; Segment Writes, Range Reads]

• Proxy internally buffers updates briefly in memory
• File system updates are serialized and journaled to local disk
• File system is periodically checkpointed: log items are aggregated into segments and stored to cloud
• On cache miss, log items fetched back from cloud and stored on local disk
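
The write and read paths listed above can be sketched as a toy write-back proxy. Everything here is a simplification under stated assumptions: the `Proxy` class is hypothetical, Python lists and dicts stand in for the local disk journal, the disk cache and the cloud, and encryption is omitted.

```python
class Proxy:
    """Toy write-back proxy: journal locally, upload segments on checkpoint."""

    def __init__(self):
        self.journal = []      # stands in for the local on-disk journal
        self.pending = []      # log items not yet uploaded to the cloud
        self.cloud = {}        # segment number -> list of log items
        self.cache = {}        # key -> value held in the local cache
        self.next_segment = 0

    def write(self, key, value):
        # Record the update in the journal before acknowledging it.
        self.journal.append((key, value))
        self.pending.append((key, value))
        self.cache[key] = value

    def checkpoint(self):
        # Aggregate pending log items into one segment and store it.
        if self.pending:
            self.cloud[self.next_segment] = list(self.pending)
            self.next_segment += 1
            self.pending.clear()

    def read(self, key):
        if key in self.cache:
            return self.cache[key]
        # Cache miss: fetch the newest matching item back from the cloud
        # and keep it locally for later reads.
        for seg in sorted(self.cloud, reverse=True):
            for k, v in reversed(self.cloud[seg]):
                if k == key:
                    self.cache[key] = v
                    return v
        raise KeyError(key)


proxy = Proxy()
proxy.write("a", b"1")
proxy.write("b", b"2")
proxy.checkpoint()
proxy.cache.clear()          # simulate a cold cache
value = proxy.read("a")
```

Clients see the local journal's latency on writes; the cloud round trip is paid only at checkpoint time and on cache misses.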


Cloud Storage Performance

• We are assuming that users will have fast connectivity to cloud providers (if not now, then in the near future)
• Latency is a fundamental problem (unless cloud data centers built near to customers)

[Figure: Effective Upload Bandwidth (Mbps) vs. Object Size (bytes), log-log axes; legend values: 1, 2, 4, 8, 16, 32]

• Network RTT: 30 ms to standard (US-East) S3 region, 12 ms to US-West region
• Proxy can fully utilize bandwidth to cloud
• Results argue for larger objects, parallel uploads
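
A back-of-the-envelope model suggests why larger objects and parallel uploads help. It assumes each upload costs exactly one round trip plus the raw transfer time, and that parallel streams overlap their round trips until the link saturates. The 1000 Mbps link speed in the example is a hypothetical value, and real uploads also pay connection setup and protocol overhead that this sketch ignores, so it is not a reproduction of the measured curves.

```python
def effective_mbps(size_bytes, rtt_s, link_mbps, parallel=1):
    """Simplified model: each upload costs one RTT plus transfer time."""
    bits = size_bytes * 8
    transfer_s = bits / (link_mbps * 1e6)
    per_stream = bits / (rtt_s + transfer_s) / 1e6
    # Parallel streams overlap their round trips, up to the link capacity.
    return min(parallel * per_stream, link_mbps)


RTT_US_EAST = 0.030  # 30 ms, from the slide
LINK_MBPS = 1000     # hypothetical link speed for illustration

small = effective_mbps(10_000, RTT_US_EAST, LINK_MBPS)
large = effective_mbps(100_000_000, RTT_US_EAST, LINK_MBPS)
small_parallel = effective_mbps(10_000, RTT_US_EAST, LINK_MBPS, parallel=8)
```

Under these assumptions a 10 KB object is dominated by the round trip, while a 100 MB object amortizes it almost completely; running eight small uploads in parallel recovers part of the gap.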


Application Performance

Simple benchmark: unpack Linux kernel sources, checksum kernel sources, compile a kernel

                              Unpack    Check    Compile
                              (write)   (read)   (R/W)
Local NFS server              10:50     0:26     4:23
NFS server in EC2             65:39     26:26    74:11
BlueSky/S3-West
  warm proxy cache            5:10      0:33     5:50
  cold proxy cache            -         26:12    7:10
  full segment prefetch       -         1:49     6:45
BlueSky/S3-East
  warm proxy                  5:08      0:35     5:53
  cold proxy cache            -         57:26    8:35
  full segment prefetch       -         3:50     8:07


Read Performance Microbenchmark

[Figure: Single-Client Request Stream. Read Latency (ms) vs. Proxy Cache Size (% Working Set); series: 32 KB, 128 KB, 1024 KB]

• Read performance depends on working set/cache size ratio
• At 100% hit rate, comparable to local NFS server
• Even at 50% hit rate, latency within about 2× to 3× of local case


Write Performance Microbenchmark

[Figure: Latency vs. Write Rate with Constrained Upload. Average Write Latency (ms/1 MB write) vs. Client Write Rate (MB/s): 2-Minute Burst; series: 128 MB Write Buffer, 1 GB Write Buffer]

• Configure network to constrain bandwidth to cloud at 100 Mbps
• Write performance: similar to local disk, unless write rate exceeds cloud bandwidth and write-back cache fills
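
The point at which the write-back cache fills can be estimated with simple arithmetic. The sketch assumes the proxy uploads at the full constrained link rate for the whole burst, that 100 Mbps corresponds to 12.5 MB/s, and that 1 GB is 1024 MB. The function names and the 20 MB/s example rate are hypothetical; this is a model, not the measured behavior.

```python
def seconds_until_full(buffer_mb, write_mb_s, upload_mbps):
    """Time for the write-back buffer to fill, or None if it never does."""
    upload_mb_s = upload_mbps / 8        # megabits/s -> megabytes/s
    surplus = write_mb_s - upload_mb_s   # net growth of buffered data
    if surplus <= 0:
        return None
    return buffer_mb / surplus


def burst_fits(buffer_mb, write_mb_s, upload_mbps, burst_s=120):
    # True if a burst of the given length ends before the buffer fills.
    t = seconds_until_full(buffer_mb, write_mb_s, upload_mbps)
    return t is None or t >= burst_s


fits_small_buffer = burst_fits(128, 20, 100)    # 128 MB buffer, 20 MB/s
fits_large_buffer = burst_fits(1024, 20, 100)   # 1 GB buffer, 20 MB/s
```

At 20 MB/s the surplus over the 12.5 MB/s upload rate is 7.5 MB/s, so under this model a 128 MB buffer fills in about 17 seconds while a 1 GB buffer outlasts the 2-minute burst.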


Aggregate Performance: SPECsfs2008

[Figure: Achieved Operations per Second vs. Requested Operations per Second (second x-axis: Working Set Size (GB)); series: Local NFS, BlueSky, BlueSky (100 Mbps), BlueSky (norange)]

• Models a richer workload mix
• BlueSky is comparable to local NFS (as before, slight advantage on writes from log-structured design)
• Performance is less predictable with a constrained network link
• Fetching full segments is a big loss with mostly random access


Monetary Cost: SPECsfs2008

Normalized cost: cost per million SPECsfs operations
(for S3 prices: $0.12/GB download, $0.01/1000–10000 ops)

                          Down     Op      Total    (Up)
Log-structured baseline   $0.18    $0.09   $0.27    $0.56
No aggregation            0.17     2.91    3.08     0.56
Full segment downloads    25.11    0.09    25.20    1.00

• Log-structured design minimizes cost for cloud storage operations
• Support for random access on reads (byte-range request) needed for low cost
• Storage cost also an important consideration, but less sensitive to system design
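
The Down, Op and Total columns follow a simple pricing formula, sketched below. The download price is the $0.12/GB quoted above. The slide gives a range of 1,000 to 10,000 operations per $0.01, so the default of 10,000 is an assumption, and the example inputs (2 GB, 50,000 requests) are hypothetical values chosen only to exercise the function.

```python
def cloud_cost(gb_downloaded, requests,
               price_per_gb=0.12, price_per_batch=0.01,
               requests_per_batch=10_000):
    """Return (download_cost, operation_cost, total) in dollars."""
    down = gb_downloaded * price_per_gb
    ops = requests / requests_per_batch * price_per_batch
    return down, ops, down + ops


# Hypothetical workload: 2 GB downloaded using 50,000 requests.
down, ops, total = cloud_cost(2, 50_000)
```

The two terms pull in opposite directions: many small requests raise the operation cost, as in the "No aggregation" row, while fetching whole segments raises the download cost, as in the "Full segment downloads" row.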


Conclusions

• BlueSky is a prototype file server backed by cloud storage
• Prototype supports multiple client protocols (NFS, CIFS) and storage backends (Amazon S3, Windows Azure)
• Allows clients to transparently move to cloud-backed storage
• Performance comparable to local storage when most accesses hit in the cache
• Design is informed by cost models of current cloud providers
