Replication, History, and Grafting in the Ori File System
Ali José Mashtizadeh, Andrea Bittau, Yifeng Frank Huang, David Mazières
Stanford University


Page 1: Replication, History, and Grafting in the Ori File System

Replication, History, and Grafting in the Ori File System
Ali José Mashtizadeh, Andrea Bittau, Yifeng Frank Huang, David Mazières

Stanford University

Page 2: Replication, History, and Grafting in the Ori File System

Managed Storage

Managed storage: $1/GB/year to $5–10/GB+
Local storage: $0.04/GB

Page 3: Replication, History, and Grafting in the Ori File System

What’s missing? Data management

• Availability – Data is always live.

• Accessibility – Data is globally accessible.

• Durability – Data is never lost. (History, snapshots, backup)

• Usability – Collaboration and version control are easy.

Page 4: Replication, History, and Grafting in the Ori File System

Ori File System

Goal: All the benefits of Managed Storage, implemented with hardware you already own.

Local Storage $0.04/GB

Page 5: Replication, History, and Grafting in the Ori File System

Two Main Usage Models

Personal storage – Public Folders
Shared storage – Public Folders

Page 6: Replication, History, and Grafting in the Ori File System

Managed storage limitations today

Bandwidth
- Limited by WAN bandwidth

Privacy

Storage cost
- $ per GB of managed solutions

Poor integration of replication, versioning & sharing
- Copying files across machines
- Apple Time Machine, Windows 8 File History

Applications implement their own versioning
- Emailing documents, distributed version control

Page 7: Replication, History, and Grafting in the Ori File System

Idea: Leverage trends to do better

Fast LANs
Big disks
Mobile storage

Page 8: Replication, History, and Grafting in the Ori File System

Disk vs WAN Throughput Growth

[Chart: growth of Internet speed vs. disk space, 1990–2013, log scale. The gap translates into a 468x transfer-time gap: 14 hours vs. 278 days.]

Page 9: Replication, History, and Grafting in the Ori File System

Ori design principles

Store not just files but file history

- Take advantage of disk space

Replicate files and history widely

- Make replication easy and instantaneous

- No master replica (OK if any device fails)

- Uses LAN speed and disk space

Use history for sharing

Page 10: Replication, History, and Grafting in the Ori File System

Ori Provides

Replication

History

File Sharing with History (Grafting)

Recovery

Public Folders

Page 11: Replication, History, and Grafting in the Ori File System

History

Page 12: Replication, History, and Grafting in the Ori File System

SFSRO/Git-like Data Model

Content Addressable Storage

SHA-256 Hash

Globally unique namespace

Deduplication

[Diagram: a chain of commits (newest to older), each pointing to a tree of trees and blobs; large files are split into fragment blobs, and shared blobs are deduplicated across trees.]
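The data model above can be sketched in a few lines. This is a hypothetical, minimal illustration (not Ori's implementation): every object is named by the SHA-256 hash of its bytes, which gives a globally unique namespace and free deduplication.

```python
import hashlib

# Minimal sketch of content-addressable storage (SFSRO/Git-style).
# The dict stands in for on-disk object storage.
store = {}  # object id (hex SHA-256) -> bytes

def put(data: bytes) -> str:
    oid = hashlib.sha256(data).hexdigest()
    store[oid] = data          # storing identical content twice is a no-op
    return oid

def get(oid: str) -> bytes:
    return store[oid]

a = put(b"hello world")
b = put(b"hello world")        # same content -> same id: deduplicated
assert a == b and len(store) == 1
```

Because the name *is* the hash of the content, any two replicas that store the same bytes agree on the name without coordination.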

Page 13: Replication, History, and Grafting in the Ori File System

Apply DVCS Techniques

Merge diverging replicas

Detect conflicts

- No magic bullets for all file types

- Make “merge base” available

- 3-way merge line-oriented files

Provide convenient tools

- History, snapshots, branches, …

Page 14: Replication, History, and Grafting in the Ori File System

Storage Layout

Objects are deduplicated, compressed, and stored

Log structured storage (files on your local file system)

An index is used to look up object locations
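The layout described above can be sketched as follows. This is an illustrative stand-in (a `BytesIO` buffer instead of real packfiles on the local file system): compressed objects are appended to a log, and an index maps each object's hash to its offset and length.

```python
import hashlib
import io
import zlib

# Sketch of log-structured object storage with an index.
pack = io.BytesIO()   # stand-in for an append-only packfile
index = {}            # object id -> (offset, length) in the pack

def append(data: bytes) -> str:
    oid = hashlib.sha256(data).hexdigest()
    if oid in index:                      # already stored: deduplicated
        return oid
    compressed = zlib.compress(data)
    offset = pack.seek(0, io.SEEK_END)    # always write at the log's tail
    pack.write(compressed)
    index[oid] = (offset, len(compressed))
    return oid

def lookup(oid: str) -> bytes:
    offset, length = index[oid]
    pack.seek(offset)
    return zlib.decompress(pack.read(length))

oid = append(b"file contents")
assert lookup(oid) == b"file contents"
```

Appending keeps writes sequential; the index turns a read into one seek plus a decompress.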

Page 15: Replication, History, and Grafting in the Ori File System

Replication
Simplify data management

Page 16: Replication, History, and Grafting in the Ori File System

Today

Backup
Centralized file storage (Dropbox)
SCP/Rsync/AirDrop

Page 17: Replication, History, and Grafting in the Ori File System

Egalitarian Replication

Page 18: Replication, History, and Grafting in the Ori File System

Replication subsumes backup

Crash!

Recover with Replication

Background Fetch optimization makes replica creation feel instantaneous

Page 19: Replication, History, and Grafting in the Ori File System

Replication in Ori

Opportunistic replication (use the LAN)
- Bulk transport over SSH

Automatic device discovery and synchronization
- UDP multicast messages at a 5-second interval
- Set a cluster name and symmetric key
- Protected by AES-CBC
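The announcement logic above can be sketched as follows. This is a hypothetical illustration: the cluster name and key are made up, the UDP multicast send is omitted, and (since Ori protects announcements with AES-CBC, which the Python standard library lacks) the sketch substitutes HMAC-SHA256 authentication with the shared symmetric key.

```python
import hashlib
import hmac
import json
import time

# Sketch of cluster-scoped discovery announcements.
# Stand-in: HMAC-SHA256 instead of Ori's AES-CBC; no actual UDP socket.
CLUSTER = "home-cluster"          # assumed cluster name
KEY = b"shared-symmetric-key"     # assumed pre-shared symmetric key

def make_announcement(host):
    """Build the packet a device would multicast every 5 seconds."""
    body = json.dumps({"cluster": CLUSTER, "host": host,
                       "time": int(time.time())}).encode()
    tag = hmac.new(KEY, body, hashlib.sha256).digest()
    return tag + body

def accept(packet):
    """Return the announcement, or None if the key doesn't match."""
    tag, body = packet[:32], packet[32:]
    expected = hmac.new(KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None               # wrong cluster key: ignore
    return json.loads(body)

pkt = make_announcement("laptop")
assert accept(pkt)["host"] == "laptop"
```

Devices without the key cannot produce a valid tag, so foreign announcements on the same LAN are silently dropped.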

Page 20: Replication, History, and Grafting in the Ori File System

Replicate Deltas

A delta consists of a collection of objects

Versioning makes replication easy!

[Diagram: two deltas (Δ) over the commit/tree/blob graph; each delta carries only the objects new to that commit.]

Page 21: Replication, History, and Grafting in the Ori File System

Protocol

Content Addressable Storage:

Objects are identical on disk and wire

- No rewriting of objects

Reference Counting:

Decompress metadata to update reference counts

- Decompression is faster than compression

Page 22: Replication, History, and Grafting in the Ori File System

Distributed Fetch

[Diagram: destination connected to a nearby peer over a fast LAN (Gbps) and to the source over a WAN (Mbps); the peer may hold an unrelated file system.]

Depends on content-addressable storage
Trades off storage for bandwidth
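A minimal sketch of why this works, under made-up names: since objects are content addressed, a replica can pull each object from the fastest source that claims to have it (even a LAN peer holding an unrelated file system) and verify the hash, falling back to the WAN origin when the peer lacks or corrupts it.

```python
import hashlib

def fetch(oid, sources):
    """Fetch object `oid` from the first source (fastest first) whose
    copy verifies against the content hash."""
    for src in sources:                    # e.g. [lan_peer, wan_origin]
        data = src.get(oid)
        if data is not None and hashlib.sha256(data).hexdigest() == oid:
            return data                    # verified: source need not be trusted
    raise KeyError(oid)

blob = b"shared source file"
oid = hashlib.sha256(blob).hexdigest()
lan_peer = {oid: blob}                     # nearby peer with overlapping content
wan_origin = {oid: blob}

# Normal case: served from the fast LAN peer.
assert fetch(oid, [lan_peer, wan_origin]) == blob

# Bad peer: hash check fails, so we fall through to the origin.
corrupt_peer = {oid: b"tampered"}
assert fetch(oid, [corrupt_peer, wan_origin]) == blob
```

The hash check is what makes the storage-for-bandwidth trade safe: any copy that verifies is as good as the origin's.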

Page 23: Replication, History, and Grafting in the Ori File System

Grafting: File Sharing with History

Page 24: Replication, History, and Grafting in the Ori File System

Collaboration Today

Over email / Cloud / Version control

Page 25: Replication, History, and Grafting in the Ori File System

File Sharing with Versioning

We want the file system to manage versioning and sharing

Require no forethought in setting up version control

No more insane naming: Presentation_Alice_Final_Bob_2_Final.pptx

Page 26: Replication, History, and Grafting in the Ori File System

Grafting in Ori

[Diagram: Alice's commit history A1 → A2 → A3 and Bob's B1 → B2 → B3; the graft copies Alice's commits into Bob's history as A1*, A2*, A3*, with cross-repository links tracking Alice's latest snapshot.]

Page 27: Replication, History, and Grafting in the Ori File System

Conflicts in Ori

Detects conflicts using history

Automatic merging when possible

Otherwise, provide files for a 3-way merge: file, file:conflict, file:base

Conflicts rarely occur in the single-user model

Conflicts more likely with Grafts

– merges are explicit
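The history-based conflict rule above can be sketched in a few lines. This is an illustrative simplification (whole-file comparison rather than Ori's actual merge machinery): knowing the merge base lets the file system tell one-sided edits, which merge automatically, from two-sided edits, which surface the three versions for a manual 3-way merge.

```python
# Sketch of history-based conflict detection for one file.
# base   = the file at the common ancestor (merge base)
# ours   = our replica's version, theirs = the other replica's version
def merge(base, ours, theirs):
    if ours == theirs or theirs == base:
        return {"file": ours}              # identical, or only we changed it
    if ours == base:
        return {"file": theirs}            # only they changed it
    return {"file": ours,                  # both changed it: real conflict,
            "file:conflict": theirs,       # expose all three versions as
            "file:base": base}             # file, file:conflict, file:base

assert merge(b"v1", b"v2", b"v1") == {"file": b"v2"}
assert merge(b"v1", b"v1", b"v3") == {"file": b"v3"}
assert "file:conflict" in merge(b"v1", b"v2", b"v3")
```

Without history there is no base, so every divergence looks like a conflict; with it, most merges are one-sided and resolve automatically.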

Page 28: Replication, History, and Grafting in the Ori File System

Mobile Devices: Sneakernets!

Page 29: Replication, History, and Grafting in the Ori File System

Today: Device space underutilized

iCloud, Google Drive,Office 365/SkyDrive

Page 30: Replication, History, and Grafting in the Ori File System

Data Carriers: Phone Storage Space

[Chart: smartphone storage capacity (GB), Oct 2006 – Dec 2014, steadily rising.]

Page 31: Replication, History, and Grafting in the Ori File System

Fast wireless networks

[Chart: per-stream Wi-Fi bandwidth (Mbps, log scale), Oct 1995 – Dec 2014, across 802.11, 802.11b, 802.11g, 802.11n, 802.11ac, and 802.11ad; 4–8 streams with MIMO.]

Page 32: Replication, History, and Grafting in the Ori File System

Sneakernets


Page 34: Replication, History, and Grafting in the Ori File System

Sneakernets

Average commute in the US: 25 minutes
Carry 16 GB of storage

5.2 Gbps Effective Bandwidth

Page 35: Replication, History, and Grafting in the Ori File System

“ Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.”

- Andrew S. Tanenbaum

Page 36: Replication, History, and Grafting in the Ori File System

Performance

Page 37: Replication, History, and Grafting in the Ori File System

Performance

File system benchmarks: Filebench

Network file system: Source code build

* Everything measured on an SSD, except the network benchmark

Page 38: Replication, History, and Grafting in the Ori File System

File system in User Space (FUSE)

Ori is built using FUSE

Baseline against the FUSE loopback

Compare: ext4, ori, loopback

[Diagram: the benchmark runs in user space against the FUSE driver (orifs or loopback) via the FUSE kernel module; ext4 runs directly in the kernel; all configurations back onto the same SSD.]

Page 39: Replication, History, and Grafting in the Ori File System

Architecture

[Diagram: orifs (the FUSE driver) keeps FS metadata in memory (directories, fstat) and a staging area holding file data only; libOri provides LocalStorage, HttpStorage, SSHStorage, and a connection manager over object storage (packfiles of blob, tree, and commit objects) plus an index and metadata, all stored on ext4.]

Page 40: Replication, History, and Grafting in the Ori File System

Filebench: Synthetic Workloads

[Chart: operations/s, normalized, for the fileserver, webserver, varmail, webproxy, and networkfs workloads, comparing ext4, ori, and loopback. Higher is better. The * marks the network benchmark, which was not run on the SSD.]

Page 41: Replication, History, and Grafting in the Ori File System

Ori vs NFS: Remote compile

[Chart: remote compile time, lower is better.
LAN (1 Gbps): NFSv3 20.45 s, NFSv4 19.45 s, Ori 11.33 s, Ori w/BF 16.04 s (40% longer than Ori).
WAN (2/20 Mbps, 17 ms): NFSv3 54.85 s, NFSv4 44.07 s, Ori 15.30 s, Ori w/BF 19.34 s (23% longer than Ori).]

BF = on-demand background fetch

Page 42: Replication, History, and Grafting in the Ori File System

Related Work

Network File Systems – AFP, CIFS, LBFS, NFS, Shark, …

Distributed File Systems – AFS, …

Disconnected File Systems – Coda, Ficus, JetFile, Intermezzo, …

Archival File Systems – Elephant, Plan 9, WAFL, Wayback, ZFS, …

Version Control – Git, Mercurial, …

Application Solutions – Bayou, Dropbox, …

Page 43: Replication, History, and Grafting in the Ori File System

Lessons Learned

Hardware and use cases have evolved

File systems need to catch up!

Replication is no longer just for data-centers

Keeping file history should be the default

Mobile devices create an opportunity for better solutions

- Fast LAN, Large Storage, Sneakernets

Page 44: Replication, History, and Grafting in the Ori File System

Future Work

Application Support for Merging on Ori

- API complications
- Merges can surprise applications and users
- Event notification?

Integrating Grafting and Orisync

Authentication

Page 45: Replication, History, and Grafting in the Ori File System

Questions?
Visit: http://ori.scs.stanford.edu/

Available for OS X, Linux, and FreeBSD

See paper for details on additional features

Page 47: Replication, History, and Grafting in the Ori File System

Backup Slides

Page 48: Replication, History, and Grafting in the Ori File System

Mobile Device Battery Life

Use 802.11 (or USB) – Better for battery life

Some platforms have:

- Periodic callbacks (opportunistically optimize battery life)

- Geofencing callbacks (wake up when arriving at a location)

Page 49: Replication, History, and Grafting in the Ori File System

Bonnie: IO Benchmark

[Chart: operations per second for 16K read, 16K write, and 16K rewrite, comparing ext4, ori, and loopback. Higher is better.]

Page 50: Replication, History, and Grafting in the Ori File System

Distributed Fetch - Performance

[Chart: time to pull the Python 3.2.3 source over a 110 ms Internet link (290/530 KB up/down), with a nearby peer holding either Python 2.7.3 or 3.2.3. Distributed pull: 7.75 s; partially distributed pull: 132.05 s; remote pull: 170.79 s.]

Page 51: Replication, History, and Grafting in the Ori File System

Ori vs NFS

|           | NFSv3 LAN | NFSv3 WAN | NFSv4 LAN | NFSv4 WAN | Ori LAN | Ori WAN | Ori on-demand LAN | Ori on-demand WAN |
|-----------|-----------|-----------|-----------|-----------|---------|---------|-------------------|-------------------|
| Replicate |     –     |     –     |     –     |     –     | 0.49 s  | 2.93 s  |         –         |         –         |
| Configure |  8.14 s   |  21.52 s  |  7.25 s   |  15.54 s  | 0.66 s  | 0.66 s  |      1.01 s       |      1.33 s       |
| Build     |  12.32 s  |  33.33 s  |  12.20 s  |  28.54 s  | 9.50 s  | 9.55 s  |      11.45 s      |      12.77 s      |
| Snapshot  |     –     |     –     |     –     |     –     | 0.19 s  | 0.19 s  |      2.72 s       |      3.37 s       |
| Push      |     –     |     –     |     –     |     –     | 0.49 s  | 1.58 s  |      0.85 s       |      1.89 s       |
| Total     |  20.45 s  |  54.85 s  |  19.45 s  |  44.07 s  | 11.33 s | 15.30 s |      16.04 s      |      19.34 s      |