Alluxio (formerly Tachyon) Open Source Memory Speed Virtual Distributed Storage Haoyuan Li CEO, Alluxio, Inc.

Page 1: Open Source Memory Speed Virtual Distributed Storage

Alluxio (formerly Tachyon): Open Source Memory Speed Virtual Distributed Storage

Haoyuan Li, CEO, Alluxio, Inc.

Page 2: Open Source Memory Speed Virtual Distributed Storage

Rebranded from Tachyon to Alluxio!

Tachyon

Alluxio

Page 3: Open Source Memory Speed Virtual Distributed Storage

Rebranded from Tachyon to Alluxio!

http://www.alluxio.com/blog/

Page 4: Open Source Memory Speed Virtual Distributed Storage

About Alluxio

• Team
– Alluxio creators and top developers/committers (all top 8 committers)

• Investors

Page 5: Open Source Memory Speed Virtual Distributed Storage

Performance Trend: Memory is Fast

• RAM throughput increasing exponentially
• Disk throughput increasing slowly
• Memory-locality key to interactive response times

Page 6: Open Source Memory Speed Virtual Distributed Storage

Price Trend: Memory is Cheaper

Source: jcmit.com

Page 7: Open Source Memory Speed Virtual Distributed Storage

The Big Data Ecosystem Today

Page 8: Open Source Memory Speed Virtual Distributed Storage

What is Alluxio?

• Alluxio: Memory Speed Virtual Distributed Storage
• Enables virtualized data across multiple types of storage

Page 9: Open Source Memory Speed Virtual Distributed Storage

Open Source Community Growth

[Chart: number of contributors (from git commit history) over time, Dec-2012 to Mar-2016, y-axis 0 to 350, with releases v0.2 through v0.8 marked]

Page 10: Open Source Memory Speed Virtual Distributed Storage

Open Source Community Growth

[Chart: number of contributors (from git commit history) over time, Dec-2012 to Mar-2016, y-axis 0 to 350, with releases v0.2 through v0.8, v1.0, and v1.1 marked]

Page 11: Open Source Memory Speed Virtual Distributed Storage

Open Source Alluxio System

• The fastest growing open source project in big data

• Over 250 contributors from over 100 organizations

Page 12: Open Source Memory Speed Virtual Distributed Storage

Alluxio Benefits

• Flexibility
– Enable new workloads across any storage system
– Unified namespace enables applications to access data in any storage system

• Agility
– Work with the framework of your choice
– Work with the storage of your choice

• Performance
– High-performance data access

• Cost
– Grow storage and compute independently

• Any application can access any data from any storage at memory speed.
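
The unified namespace idea can be illustrated with a small mount-table lookup. This is a minimal sketch under stated assumptions, not Alluxio's implementation: the function name `resolve`, the dictionary layout, and the example storage URIs are all hypothetical. It maps a path in the single namespace to a path in whichever storage system is mounted at the longest matching prefix.

```python
def resolve(mount_table, path):
    """Translate a unified-namespace path to an under-storage path.

    mount_table maps a mount point (e.g. "/s3") to a storage URI.
    The longest mount point that prefixes the path wins.
    All names here are hypothetical and for illustration only.
    """
    best = None
    for mount_point in mount_table:
        base = mount_point.rstrip("/")  # "/" becomes ""
        if path == mount_point or path.startswith(base + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(path)
    # Keep the part of the path below the mount point.
    suffix = path[len(best.rstrip("/")):]
    return mount_table[best].rstrip("/") + suffix


# Hypothetical mounts: two storage systems behind one namespace.
mounts = {
    "/": "hdfs://namenode:9000/alluxio",
    "/s3": "s3n://bucket/data",
}
print(resolve(mounts, "/s3/logs/a.txt"))
print(resolve(mounts, "/users/x"))
```

An application only ever sees paths such as `/s3/logs/a.txt`; which storage system holds the bytes is decided by the lookup.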

Page 13: Open Source Memory Speed Virtual Distributed Storage

New Features and Improvements in Alluxio 1.0 and 1.1

Gene Pang @ Alluxio, Inc.
June 15, 2016 @ Alluxio Meetup (hosted by Intel)

Page 14: Open Source Memory Speed Virtual Distributed Storage

About Me

• Gene Pang - Software Engineer @ Alluxio, Inc.

• One of the core maintainers of Alluxio Open Source Project

• Ph.D. @ AMPLab, UC Berkeley

• Worked at Google before UC Berkeley

• Twitter: @unityxx

Page 15: Open Source Memory Speed Virtual Distributed Storage

Outline

• Alluxio Architecture Overview
• New Developments in Alluxio
• Performance Improvement Results in Alluxio 1.1

Page 16: Open Source Memory Speed Virtual Distributed Storage

Alluxio Architecture Overview

Page 17: Open Source Memory Speed Virtual Distributed Storage

Architecture Overview

[Diagram: one Alluxio Master with a Journal, three Alluxio Workers, two Under File Systems]

• Alluxio Master: manages metadata
• Alluxio Workers: serve data blocks
• Under File Systems: mount multiple storage systems
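
The division of labor in the diagram can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Alluxio's code: the classes `ToyMaster` and `ToyWorker`, the one-block-per-file simplification, and the journal as an in-memory list are all hypothetical. The point is only that the master records where data lives while workers hold the data itself.

```python
class ToyWorker:
    """Hypothetical worker: stores data blocks in memory and serves them."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block_id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyMaster:
    """Hypothetical master: tracks metadata only, never file contents."""

    def __init__(self):
        self.journal = []         # append-only record of metadata changes
        self.block_location = {}  # path -> (worker, block_id)
        self.next_block_id = 0

    def create_file(self, path, worker):
        block_id = self.next_block_id
        self.next_block_id += 1
        # Log the change before applying it to in-memory state.
        self.journal.append(("create", path, worker.name, block_id))
        self.block_location[path] = (worker, block_id)
        return block_id

    def locate(self, path):
        return self.block_location[path]


def write_file(master, worker, path, data):
    """Client write: metadata goes to the master, bytes go to a worker."""
    block_id = master.create_file(path, worker)
    worker.write_block(block_id, data)


def read_file(master, path):
    """Client read: ask the master where the block is, then fetch it."""
    worker, block_id = master.locate(path)
    return worker.read_block(block_id)


master = ToyMaster()
worker = ToyWorker("worker-1")
write_file(master, worker, "/a", b"hello")
print(read_file(master, "/a"))
```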

Page 18: Open Source Memory Speed Virtual Distributed Storage

Alluxio New Developments

Page 19: Open Source Memory Speed Virtual Distributed Storage

Releases

Tachyon 0.8 – Oct 22, 2015

Alluxio 1.0 – Feb 23, 2016

Alluxio 1.1 – Jun 7, 2016

Page 20: Open Source Memory Speed Virtual Distributed Storage

New Developments

• New Integrations
• Usability Improvements
• Performance Improvements
• Access Control (Alpha)

Page 21: Open Source Memory Speed Virtual Distributed Storage

New Integrations

• Native OpenStack Swift Driver: improve performance, reduce complexity
• Alluxio to FUSE Connector: mount Alluxio to local file system
• Google Cloud Storage: manage data on Google Cloud Platform
• Aliyun Object Storage Service: manage data on Alibaba Cloud
• Google Compute Engine: deploy Alluxio on Google Cloud Platform

Page 22: Open Source Memory Speed Virtual Distributed Storage

Access Control (Alpha)

• User/Group Support: similar to POSIX permission model
• File System Permissions: similar to POSIX permission model
• Command-line Permission Tools: chown, chgrp, chmod
• Configuration Parameter: alluxio.security.authorization.permission.enabled
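
For readers unfamiliar with the POSIX permission model that the slide refers to, the check can be sketched as follows. This is a generic illustration of POSIX-style owner/group/other mode bits, not Alluxio's code; the function name `can_access` and its parameters are hypothetical.

```python
READ, WRITE, EXECUTE = 4, 2, 1


def can_access(user, user_groups, owner, group, mode, want):
    """POSIX-style check: pick exactly one permission class, then test bits.

    mode is an octal mode such as 0o640; want is a bitmask built from
    READ, WRITE, and EXECUTE. Names are hypothetical, for illustration.
    """
    if user == owner:
        bits = (mode >> 6) & 7  # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 7  # group class
    else:
        bits = mode & 7         # other class
    return bits & want == want


# A file owned by alice:staff with mode 640 (rw- r-- ---).
print(can_access("alice", {"staff"}, "alice", "staff", 0o640, READ | WRITE))
print(can_access("bob", {"staff"}, "alice", "staff", 0o640, WRITE))
```

In this model, chown and chgrp change `owner` and `group`, and chmod changes `mode`.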

Page 23: Open Source Memory Speed Virtual Distributed Storage

Usability Improvements

• Write Location Policies: configure how to write data to Alluxio
• Simplified Configuration: customize with properties
• Automatic Metadata Loading: load metadata automatically
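
A write location policy decides which worker receives a newly written block. The sketch below shows three plausible strategies (prefer the local worker, pick the worker with the most free space, rotate through workers). It is an illustration under stated assumptions only: `WorkerInfo`, the function names, and the `RoundRobin` class are hypothetical and are not Alluxio's API.

```python
from collections import namedtuple

# Hypothetical view of a worker: its host name and free capacity in bytes.
WorkerInfo = namedtuple("WorkerInfo", ["host", "free_bytes"])


def most_available(workers, block_size):
    """Choose the worker with the most free space that fits the block."""
    candidates = [w for w in workers if w.free_bytes >= block_size]
    if not candidates:
        return None
    return max(candidates, key=lambda w: w.free_bytes).host


def local_first(workers, local_host, block_size):
    """Prefer the worker on the client's host, else fall back."""
    for w in workers:
        if w.host == local_host and w.free_bytes >= block_size:
            return w.host
    return most_available(workers, block_size)


class RoundRobin:
    """Rotate through workers, skipping any that cannot fit the block."""

    def __init__(self):
        self.index = 0

    def choose(self, workers, block_size):
        for _ in range(len(workers)):
            w = workers[self.index % len(workers)]
            self.index += 1
            if w.free_bytes >= block_size:
                return w.host
        return None


workers = [WorkerInfo("a", 10), WorkerInfo("b", 100), WorkerInfo("c", 50)]
print(local_first(workers, "a", 5))
print(most_available(workers, 20))
```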

Page 24: Open Source Memory Speed Virtual Distributed Storage

Performance Improvements

• Improved Alluxio Master Scalability: fine-grained locking, efficient journaling
• Improved Alluxio Worker Scalability: improved data structures, improved locking
• Better Support for Random I/O Workloads: cache blocks during random I/O (e.g., Parquet files)
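
The benefit of caching blocks during random I/O can be sketched with a small positioned-read wrapper. This is an illustration only, and whether Alluxio caches whole blocks in exactly this way is an assumption; the class `CachingReader` and the callable `under_store_read` are hypothetical. On a cache miss the whole block is fetched once, so later reads that land in the same block (common with columnar formats such as Parquet) avoid the under storage.

```python
class CachingReader:
    """Hypothetical reader that caches whole blocks on positioned reads."""

    def __init__(self, under_store_read, block_size):
        # under_store_read(offset, length) -> bytes is a stand-in for a
        # slow read from the under storage system.
        self.under_store_read = under_store_read
        self.block_size = block_size
        self.cache = {}  # block index -> bytes
        self.under_store_reads = 0

    def _block(self, index):
        if index not in self.cache:
            self.under_store_reads += 1
            self.cache[index] = self.under_store_read(
                index * self.block_size, self.block_size)
        return self.cache[index]

    def pread(self, offset, length):
        """Read `length` bytes at `offset`, possibly spanning blocks."""
        out = bytearray()
        end = offset + length
        while offset < end:
            block = self._block(offset // self.block_size)
            start = offset % self.block_size
            chunk = block[start:start + (end - offset)]
            if not chunk:
                break  # reached the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


data = bytes(range(100))
reader = CachingReader(lambda off, n: data[off:off + n], block_size=16)
first = reader.pread(90, 5)   # miss: fetches block 5 from under storage
second = reader.pread(82, 4)  # hit: same block, no under-storage read
print(reader.under_store_reads)
```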

Page 25: Open Source Memory Speed Virtual Distributed Storage

Alluxio 1.1 Performance Improvement Results

Page 26: Open Source Memory Speed Virtual Distributed Storage

Create File Throughput (Local Journal)

[Chart: throughput over test duration; series: 1.0.1]

Page 27: Open Source Memory Speed Virtual Distributed Storage

Create File Throughput (Local Journal)

[Chart: throughput over test duration; series: 1.0.1, 1.1.0]

1.8x improvement

Page 28: Open Source Memory Speed Virtual Distributed Storage

Create File Throughput (Remote Journal)

[Chart: throughput over test duration; series: 1.0.1]

Page 29: Open Source Memory Speed Virtual Distributed Storage

Create File Throughput (Remote Journal)

[Chart: throughput over test duration; series: 1.0.1, 1.1.0]

23x improvement

Page 30: Open Source Memory Speed Virtual Distributed Storage

List Directory Throughput

[Chart: throughput over test duration; series: 1.0.1]

Page 31: Open Source Memory Speed Virtual Distributed Storage

List Directory Throughput

[Chart: throughput over test duration; series: 1.0.1, 1.1.0]

7x improvement

Page 32: Open Source Memory Speed Virtual Distributed Storage

Worker Scalability

[Chart: write latency vs. number of blocks on worker; series: 1.0.1]

Page 33: Open Source Memory Speed Virtual Distributed Storage

Worker Scalability

[Chart: write latency vs. number of blocks on worker; series: 1.0.1, 1.1.0]

Page 34: Open Source Memory Speed Virtual Distributed Storage

Try out Alluxio 1.1.0

http://www.alluxio.org/releases