Open Source Memory Speed Virtual Distributed Storage

Alluxio (formerly Tachyon)

Open Source Memory Speed Virtual Distributed Storage

Haoyuan Li
CEO, Alluxio, Inc.

Rebranded from Tachyon to Alluxio!

Tachyon

Alluxio

http://www.alluxio.com/blog/

About Alluxio

• Team
  – Alluxio creators and top developers/committers (all top 8 committers)

• Investors

Performance Trend: Memory is Fast

• RAM throughput increasing exponentially
• Disk throughput increasing slowly
• Memory locality is key to interactive response times

Price Trend: Memory is Cheaper

Source: jcmit.com

The Big Data Ecosystem Today

What is Alluxio?

• Alluxio: Memory Speed Virtual Distributed Storage
• Enables virtualized data across multiple types of storage

Open Source Community Growth

[Chart: number of contributors (from git commit history), y-axis 0 to 350, x-axis quarterly from Dec-2012 to Mar-2016, with release markers v0.2, v0.3, v0.4, v0.5, v0.6, v0.7, v0.8, v1.0, and v1.1]

Open Source Alluxio System

• The fastest growing open source project in big data

• Over 250 contributors from over 100 organizations

Alluxio Benefits

• Flexibility
  – Enable new workloads across any storage system
  – Unified namespace enables applications to access data in any storage system
• Agility
  – Work with the framework of your choice
  – Work with the storage of your choice
• Performance
  – High-performance data access
• Cost
  – Grow storage and compute independently

• Any application accesses any data from any storage at memory speed.

New Features and Improvements in Alluxio 1.0 and 1.1

Gene Pang @ Alluxio, Inc.
June 15, 2016 @ Alluxio Meetup (hosted by Intel)

About Me

• Gene Pang - Software Engineer @ Alluxio, Inc.

• One of the core maintainers of Alluxio Open Source Project

• Ph.D. @ AMPLab, UC Berkeley

• Worked at Google before UC Berkeley

• Twitter: @unityxx

Outline

• Alluxio Architecture Overview
• New Developments in Alluxio
• Performance Improvement Results in Alluxio 1.1

Alluxio Architecture Overview

Architecture Overview

[Diagram: one Alluxio Master, three Alluxio Workers, two Under File Systems, and a Journal]

• Alluxio Master: manages metadata
• Alluxio Workers: serve data blocks
• Under File Systems: mount multiple storage systems
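The idea of mounting multiple storage systems under one namespace can be illustrated with a small sketch. This is not Alluxio's code: the `MountTable` class, its methods, and the example URIs are hypothetical, and the longest-prefix rule is an assumption about how a path might be mapped to the storage system behind it.

```python
from typing import Dict, Optional


class MountTable:
    """Toy mount table mapping namespace prefixes to under-storage URIs (hypothetical)."""

    def __init__(self) -> None:
        self._mounts: Dict[str, str] = {}

    def mount(self, path: str, ufs_uri: str) -> None:
        # Normalize so that "/data/" and "/data" name the same mount point.
        key = path.rstrip("/") or "/"
        self._mounts[key] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Return the under-storage URI for a path using the longest matching mount point."""
        best: Optional[str] = None
        for prefix in self._mounts:
            # Match on whole path components only, so "/s3x" does not match "/s3".
            covers = path == prefix or path.startswith(prefix.rstrip("/") + "/")
            if covers and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        suffix = path[len(best):].lstrip("/")
        base = self._mounts[best]
        return f"{base}/{suffix}" if suffix else base


table = MountTable()
table.mount("/", "hdfs://namenode:9000/alluxio")   # example URI, not from the talk
table.mount("/s3", "s3a://bucket/data")            # example URI, not from the talk
print(table.resolve("/s3/logs/a.txt"))
print(table.resolve("/users/x"))
```

An application only sees paths such as `/s3/logs/a.txt`; which storage system holds the bytes is decided by the table.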

Alluxio New Developments

Releases

Tachyon 0.8 – Oct 22, 2015

Alluxio 1.0 – Feb 23, 2016

Alluxio 1.1 – Jun 7, 2016

New Developments

• New Integrations
• Usability Improvements
• Performance Improvements
• Access Control (Alpha)

New Integrations

• Native OpenStack Swift Driver
  – improve performance, reduce complexity
• Alluxio to FUSE Connector
  – mount Alluxio to local file system
• Google Cloud Storage
  – manage data on Google Cloud Platform
• Aliyun Object Storage Service
  – manage data on Alibaba Cloud
• Google Compute Engine
  – deploy Alluxio on Google Cloud Platform

Access Control (Alpha)

• User/Group Support
• File System Permissions
  – similar to POSIX permission model
• Command-line Permission Tools
  – chown, chgrp, chmod
• Configuration Parameter
  – alluxio.security.authorization.permission.enabled
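As a rough illustration of what "similar to POSIX permission model" means, the sketch below checks owner, group, and other permission bits in that order. It is a generic POSIX-style check, not Alluxio's implementation; the `Inode` class and `is_allowed` function are hypothetical names, and superuser handling is left out.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Standard POSIX permission bits within one rwx triplet.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class Inode:
    """Hypothetical file metadata record: owner, group, and an octal mode."""
    owner: str
    group: str
    mode: int  # for example 0o640


def is_allowed(inode: Inode, user: str, groups: FrozenSet[str], wanted: int) -> bool:
    """Return True if `user` holds every permission bit in `wanted` on `inode`."""
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7      # owner triplet
    elif inode.group in groups:
        bits = (inode.mode >> 3) & 0o7      # group triplet
    else:
        bits = inode.mode & 0o7             # other triplet
    return (bits & wanted) == wanted


report = Inode(owner="alice", group="analysts", mode=0o640)
print(is_allowed(report, "alice", frozenset({"staff"}), READ | WRITE))
print(is_allowed(report, "bob", frozenset({"analysts"}), WRITE))
```

With mode `0o640`, the owner can read and write, members of the group can only read, and everyone else has no access, which is what `chmod 640` expresses.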

Usability Improvements

• Write Location Policies
  – configure how to write data to Alluxio
• Simplified Configuration
  – customize with properties
• Automatic Metadata Loading
  – load metadata automatically
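A write location policy decides which worker receives a newly written block. The sketch below shows two plausible policies under stated assumptions: the `WorkerInfo`, `MostAvailableFirst`, and `RoundRobin` names are illustrative and are not taken from the talk or presented as Alluxio's classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WorkerInfo:
    """Hypothetical view of a worker: its host name and free capacity in bytes."""
    host: str
    available_bytes: int


class MostAvailableFirst:
    """Pick the worker with the most free space that can hold the block."""

    def pick(self, workers: List[WorkerInfo], block_size: int) -> WorkerInfo:
        candidates = [w for w in workers if w.available_bytes >= block_size]
        if not candidates:
            raise RuntimeError("no worker has room for the block")
        return max(candidates, key=lambda w: w.available_bytes)


class RoundRobin:
    """Cycle through workers, skipping any that cannot hold the block."""

    def __init__(self) -> None:
        self._next = 0

    def pick(self, workers: List[WorkerInfo], block_size: int) -> WorkerInfo:
        for _ in range(len(workers)):
            worker = workers[self._next % len(workers)]
            self._next += 1
            if worker.available_bytes >= block_size:
                return worker
        raise RuntimeError("no worker has room for the block")


workers = [WorkerInfo("a", 100), WorkerInfo("b", 500), WorkerInfo("c", 50)]
print(MostAvailableFirst().pick(workers, 10).host)
rr = RoundRobin()
print([rr.pick(workers, 60).host for _ in range(3)])
```

Keeping the policy behind a single `pick` method is one way to make it configurable: the writer does not need to know which strategy is in use.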

Performance Improvements

• Improved Alluxio Master Scalability
  – fine-grained locking, efficient journaling
• Improved Alluxio Worker Scalability
  – improved data structures, improved locking
• Better Support for Random I/O Workloads
  – cache blocks during random I/O (e.g., Parquet files)
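One possible reading of "cache blocks during random I/O" is that a positioned read pulls in the whole containing block once, so later reads that land in the same block avoid another trip to the underlying storage. The sketch below illustrates that idea only; `BlockCachingReader` and `fetch_block` are hypothetical names, the cache is unbounded for brevity, and none of this is Alluxio's actual read path.

```python
from typing import Callable, Dict


class BlockCachingReader:
    """Serve positioned reads from whole blocks that are fetched once and kept in memory."""

    def __init__(self, fetch_block: Callable[[int], bytes], block_size: int) -> None:
        self._fetch_block = fetch_block      # loads block i from the slow storage
        self._block_size = block_size
        self._cache: Dict[int, bytes] = {}
        self.fetches = 0                     # number of trips to the slow storage

    def _block(self, index: int) -> bytes:
        if index not in self._cache:
            self._cache[index] = self._fetch_block(index)
            self.fetches += 1
        return self._cache[index]

    def read(self, offset: int, length: int) -> bytes:
        """Read up to `length` bytes starting at `offset`, crossing blocks if needed."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, start = divmod(offset, self._block_size)
            chunk = self._block(index)[start:start + (end - offset)]
            if not chunk:
                break                        # reached the end of the file
            out += chunk
            offset += len(chunk)
        return bytes(out)


data = bytes(range(256))
reader = BlockCachingReader(lambda i: data[i * 64:(i + 1) * 64], block_size=64)
reader.read(200, 10)   # fetches block 3
reader.read(210, 5)    # served from the cached block
print(reader.fetches)
```

Columnar formats such as Parquet tend to issue many small reads at scattered offsets (footer, then selected column chunks), which is why caching at block granularity can help this access pattern.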

Alluxio 1.1 Performance Improvement Results

Create File Throughput (Local Journal)

[Chart: throughput vs. test duration, comparing 1.0.1 and 1.1.0]

1.8x improvement

Create File Throughput (Remote Journal)

[Chart: throughput vs. test duration, comparing 1.0.1 and 1.1.0]

23x improvement

List Directory Throughput

[Chart: throughput vs. test duration, comparing 1.0.1 and 1.1.0]

7x improvement

Worker Scalability

[Chart: write latency vs. number of blocks on worker, comparing 1.0.1 and 1.1.0]

Try out Alluxio 1.1.0

http://www.alluxio.org/releases
