Kinetic basho public


Seagate Kinetic Open Storage Platform

James Hughes …and many others

Storage is a Price Elastic Market

Price elasticity of demand
• Alfred Marshall (1890)

As the price of Storage approaches $0
• Demand for storage will approach infinity

If the price of a Cisco router approaches $0
• Demand for routers will not approach infinity - Storage is different

http://en.wikipedia.org/wiki/Alfred_Marshall

Seagate Confidential: Subject to NDA No. 77103, effective Jan. 18, 2009, and all applicable supplements

Areal Density Growth

[Figure: areal density (gigabit/in2, log scale from 0.1 to 100000) versus year, 1989 to 2019.
Technology eras: Inductive Writing & Reading; Inductive Writing/MR reading; Inductive Writing/GMR reading; Perpendicular Writing & GMR; HAMR; HAMR+BPM.
Growth-rate labels: 29%, 100%, 40%.
Marked limits: Single particle superparamagnetic limit (estimated); Charap’s limit (broken).]

• Late 1990s – superparamagnetic limit demonstrated through modeling
• Perpendicular expected to extend to 0.5-1 Tb/in2
• Additional innovations required at that point
  • heat-assisted recording
  • bit patterned media recording
• Areal Density CAGR 40%
• Transfer Rate CAGR 20%
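The two growth rates on this slide compound at different speeds. As a rough worked example, the sketch below computes doubling times and the growth in full-drive read time. It assumes that drive capacity tracks areal density and that reading a whole drive is limited by transfer rate; the slide states neither, and the function names are illustrative only.

```python
import math


def doubling_time_years(cagr: float) -> float:
    """Years needed to double at a given compound annual growth rate."""
    return math.log(2) / math.log(1 + cagr)


def relative_scan_time(years: int, density_cagr: float = 0.40,
                       transfer_cagr: float = 0.20) -> float:
    """Growth factor of full-drive read time after `years`.

    Assumption: capacity grows with areal density, and scan time is
    capacity divided by transfer rate.
    """
    return ((1 + density_cagr) / (1 + transfer_cagr)) ** years


if __name__ == "__main__":
    print(f"density doubles every {doubling_time_years(0.40):.2f} years")
    print(f"transfer rate doubles every {doubling_time_years(0.20):.2f} years")
    print(f"scan time after 10 years: x{relative_scan_time(10):.2f}")
```

Under these assumptions the time to read an entire drive keeps growing, which is one way to read the gap between the two CAGR figures.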

Cloud Computing will increase this trend

• Jevons Paradox
• Cloud Computing increases the efficiency of computing....
• Improved technology doubles the amount of Information produced with a given amount of Storage → Demand for Storage rises

http://en.wikipedia.org/wiki/Jevons_paradox

Technology Trends

Shingled Disks

• Write head larger than read head
  • Turns Disk into a sequentially written medium
• All updates to data and metadata are written sequentially to a continuous stream, called a log
• Disk API of sectors is no longer “natural”

http://www.ssrc.ucsc.edu/Papers/amer-ieeetm11.pdf

Log Structured Storage

How much is erased on a reposition?
• Tape - the remainder of the tape
• Shingled disk - the remainder of the track group
• Flash - the entire page

All persistent Storage systems do/will implement log structure
• e.g. “NoSQL Database of sectors”

Does it make sense to layer a database on top of a database?
• Could we use the log structure of the media to provide a more natural storage system, not mimicking an antique paradigm?
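To make the log-structure idea concrete, here is a minimal sketch of a key/value store that only ever appends to a log and keeps an in-memory index of where each value lives. The class name, record layout, and use of an in-memory buffer in place of real media are all assumptions for illustration; this is not how any Seagate drive is implemented.

```python
import io
import struct


class LogStore:
    """Toy append-only key/value log with an in-memory index."""

    _HEADER = struct.Struct(">II")  # key length, value length

    def __init__(self, log=None):
        # Any seekable binary file object works; BytesIO stands in for media.
        self._log = log if log is not None else io.BytesIO()
        self._index = {}  # key -> (offset of value, length of value)

    def put(self, key: bytes, value: bytes) -> None:
        # Never overwrite in place: append a new record at the tail.
        self._log.seek(0, io.SEEK_END)
        self._log.write(self._HEADER.pack(len(key), len(value)))
        self._log.write(key)
        value_offset = self._log.tell()
        self._log.write(value)
        # The index now points at the newest copy; older copies are garbage.
        self._index[key] = (value_offset, len(value))

    def get(self, key: bytes) -> bytes:
        offset, length = self._index[key]
        self._log.seek(offset)
        return self._log.read(length)

    def log_size(self) -> int:
        return self._log.seek(0, io.SEEK_END)


if __name__ == "__main__":
    store = LogStore()
    store.put(b"k", b"v1")
    store.put(b"k", b"v2")  # an update is just another append
    print(store.get(b"k"), store.log_size())
```

An update leaves the old record in the log, so a real system would also need garbage collection, which this sketch omits.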


Leading to disaggregation of servers

Single System Performance Trend

http://web.eecs.umich.edu/~twenisch/papers/isca09-disaggregate.pdf

Scaling Storage

Distributed Hash Table
• Key/Value Store

RAM: Memcached
Flash: FAWN
Disk: Riak

http://en.wikipedia.org/wiki/Distributed_hash_table
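One common way to spread keys across nodes in a distributed hash table is consistent hashing. The sketch below is a generic illustration of that technique, not the placement algorithm of Memcached, FAWN, or Riak; the class name, the node names, and the choice of SHA-1 are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes, replicas: int = 64):
        # Each node owns several points on the ring to even out the load.
        self._ring = []  # sorted list of (point, node)
        for node in nodes:
            for i in range(replicas):
                self._ring.append((self._hash(f"{node}#{i}".encode()), node))
        self._ring.sort()
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(data: bytes) -> int:
        return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")

    def node_for(self, key: bytes) -> str:
        # First ring point clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect_right(self._points, self._hash(key)) % len(self._points)
        return self._ring[i][1]


if __name__ == "__main__":
    ring = HashRing(["drive-a", "drive-b", "drive-c"])
    print(ring.node_for(b"photo-0001"))
```

The useful property is that adding a node only moves keys onto the new node; keys never shuffle between the existing nodes.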


Metadata and Metadata Servers are Evil

Required by traditional file systems (POSIX) to translate names to sectors
• Hard to scale, heavy HA requirements, expensive

Can we use a name as a key?
• Place the data into a scaled key value store?
• Eliminate costly metadata servers?
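As a small illustration of using a name as a key, the sketch below stores objects under their full names in a sorted key/value map, so no separate name-to-sector lookup is needed and a directory-style listing becomes a range scan. The class and method names are hypothetical and the store is in memory only.

```python
import bisect


class FlatNamespace:
    """Object names stored directly as keys in a sorted key/value map."""

    def __init__(self):
        self._keys = []    # sorted object names
        self._values = {}  # name -> data

    def put(self, name: str, data: bytes) -> None:
        if name not in self._values:
            bisect.insort(self._keys, name)
        self._values[name] = data

    def get(self, name: str) -> bytes:
        # One lookup by name; no metadata server translates it first.
        return self._values[name]

    def list_prefix(self, prefix: str):
        # A "directory listing" becomes a range scan over sorted keys.
        start = bisect.bisect_left(self._keys, prefix)
        out = []
        for name in self._keys[start:]:
            if not name.startswith(prefix):
                break
            out.append(name)
        return out


if __name__ == "__main__":
    ns = FlatNamespace()
    ns.put("/photos/a.jpg", b"A")
    ns.put("/photos/b.jpg", b"B")
    ns.put("/videos/c.mp4", b"C")
    print(ns.list_prefix("/photos/"))
```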


Cumulative operations ordered by length

[Figure: cumulative percentage (0% to 100%) of operations and of data versus Length (KB), log axis from 1.00 to 100000.00. Annotations at 32KB: 0.5% of the data, 92% of the operations.]

Map of Operations

[Figure: map of operations. Axes: Time (minutes), Location (TB), Length (scale marked 0 to 512KB).]

Seagate Kinetic Open Storage Platform

Disintermediates the path from applications to drive
– Goes around file systems, volume managers, drivers

Enable ecosystem of value added software
– Partners (like Basho) can create their own system value

Lower TCO
– Eliminates complexity

[Figure: Kinetic architecture, built up over several slides. Applications (App) sit on LibKinetic, which connects over ProtoBuf TCP/IP/GbE to the storage elements (boxes labelled SA and D).
LibKinetic languages: C++, Java, Python, Erlang, DIY.
Layers: Application, Clustering, Management; Interconnect; Storage.
Ownership labels: Proprietary to System Vendor; GPL; Standard; Proprietary to Seagate.]

System Hardware

Typical JBOD architecture
• Does not require a server, just JBODs to the ToR Switch
• 10 JBODS × 60 drives × 4TB = 2.4PB/Rack
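The rack capacity figure is simple multiplication. A minimal sketch, assuming decimal units (1 PB = 1000 TB) and raw capacity with no redundancy overhead; the function name is illustrative:

```python
def rack_capacity_pb(jbods: int, drives_per_jbod: int, tb_per_drive: float) -> float:
    """Raw rack capacity in PB, using decimal units (1 PB = 1000 TB)."""
    return jbods * drives_per_jbod * tb_per_drive / 1000


if __name__ == "__main__":
    # 10 JBODs x 60 drives x 4 TB = 2400 TB
    print(rack_capacity_pb(10, 60, 4))
```

Replication or erasure coding at the clustering layer would reduce the usable figure; the slide does not say which scheme is assumed.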


Features

Provides RPC to Key/Value database
• Data is pre-indexed
• Compression and other added value are easy and transparent

P2P (Drive to Drive) copy of key ranges
Communicate using existing Data Center Plumbing (TCP/IP)
Multiple masters - Data sharing between machines
Configurable caching per command
• Async, Sync, Flush
Local space management

Kinetic Systems

Clustering (performance, reliability, management)
Compatibility with large scale applications (S3, etc.)
Centralized Management
• Reliability, availability, durability

Lower TCO

Elimination of server layers
Fewer human requirements
Reduced mistakes
Disaggregate storage from servers
Power management

Goals of API

Data movement
• Get/put/delete/getnext/getprevious
• Versioned (== for success), options
Range operations
Multiple masters
• Authentication/Integrity/Authorization
Cluster-able
• Simple cluster configuration version enforcement
3rd party copy
Management
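The data-movement goals can be illustrated with a small in-memory model. It reads "Versioned (== for success)" as a compare-and-swap: a put or delete succeeds only when the caller's expected version equals the stored one. That reading, the class name, and every method signature here are assumptions for illustration; this is not the LibKinetic API.

```python
import bisect


class VersionMismatch(Exception):
    """Raised when the caller's expected version is not the stored one."""


class ToyDrive:
    """In-memory model of a versioned, ordered key/value drive."""

    def __init__(self):
        self._keys = []     # sorted keys, to support getnext/getprevious
        self._entries = {}  # key -> (value, version)

    def _check(self, key, expected_version):
        current = self._entries.get(key, (None, None))[1]
        if current != expected_version:
            raise VersionMismatch(
                f"stored {current!r}, expected {expected_version!r}")

    def put(self, key, value, new_version, expected_version=None):
        # expected_version=None means "the key must not exist yet".
        self._check(key, expected_version)
        if key not in self._entries:
            bisect.insort(self._keys, key)
        self._entries[key] = (value, new_version)

    def get(self, key):
        return self._entries[key]

    def delete(self, key, expected_version):
        self._check(key, expected_version)
        del self._entries[key]
        self._keys.remove(key)

    def getnext(self, key):
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None

    def getprevious(self, key):
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None
```

With multiple masters writing to the same drive, a version check of this kind lets a stale writer fail cleanly instead of silently overwriting newer data.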


Management (System Vendor)

Configures the drive
• Network
• Authorized clients
Monitors
• Health
• Statistics
• Logs
Initiates recovery
• Change cluster version
• 3rdPartyCopy

Data formats

Key Structure
• Variable number of octets (0-4KB)

Data Structure (Serialized to a byte stream)
• KeyOf
• Version
• E2E Data Integrity
  – Algorithm name
• Data Variable length (0-xMB)
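As a sketch of what "serialized to a byte stream" with end-to-end integrity could look like, the example below packs a key, a version, an integrity algorithm name, the data, and an integrity tag into one record. The field order, length prefixes, and SHA-1 default are assumptions made for illustration; the architecture slide indicates that the actual wire protocol uses ProtoBuf, which this sketch does not reproduce.

```python
import hashlib
import struct

# Hypothetical header: lengths of key, version, algorithm name, and data.
_LENGTHS = struct.Struct(">HHHI")


def pack_record(key: bytes, version: bytes, data: bytes,
                algorithm: str = "sha1") -> bytes:
    """Serialize key, version, integrity tag and data to one byte stream."""
    name = algorithm.encode("ascii")
    tag = hashlib.new(algorithm, data).digest()
    header = _LENGTHS.pack(len(key), len(version), len(name), len(data))
    return header + key + version + name + data + tag


def unpack_record(blob: bytes):
    """Parse a record and verify its end-to-end integrity tag."""
    klen, vlen, nlen, dlen = _LENGTHS.unpack_from(blob)
    pos = _LENGTHS.size
    key, pos = blob[pos:pos + klen], pos + klen
    version, pos = blob[pos:pos + vlen], pos + vlen
    name, pos = blob[pos:pos + nlen].decode("ascii"), pos + nlen
    data, pos = blob[pos:pos + dlen], pos + dlen
    tag = blob[pos:]
    # Recompute the tag with the named algorithm and compare.
    if hashlib.new(name, data).digest() != tag:
        raise ValueError("integrity check failed")
    return key, version, name, data
```

Carrying the algorithm name inside the record lets the reader verify the data without out-of-band agreement on which checksum was used.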


Performance Metrics

Same normal performance expectations
• Sequential Write
• Random Write
• Sequential Read
• Random Read

Iometer for key/value
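A key/value analogue of an Iometer run could time the usual sequential/random by write/read matrix, assuming that matrix is what the list intends. The sketch below does this against any mapping-like store; a plain dict stands in for a drive, and the function names, key format, and default sizes are illustrative assumptions.

```python
import random
import time


def run_pattern(store, keys, value=None):
    """Put (value given) or get (value is None) each key; return ops/second."""
    start = time.perf_counter()
    for key in keys:
        if value is None:
            store[key]
        else:
            store[key] = value
    elapsed = time.perf_counter() - start
    return len(keys) / elapsed if elapsed > 0 else float("inf")


def benchmark(store, count=10_000, value_size=1024, seed=0):
    """Measure four access patterns with fixed-size values."""
    ordered = [f"{i:012d}".encode() for i in range(count)]
    shuffled = ordered[:]
    random.Random(seed).shuffle(shuffled)
    value = bytes(value_size)
    return {
        "sequential write": run_pattern(store, ordered, value),
        "random write": run_pattern(store, shuffled, value),
        "sequential read": run_pattern(store, ordered),
        "random read": run_pattern(store, shuffled),
    }


if __name__ == "__main__":
    for name, rate in benchmark({}).items():
        print(f"{name}: {rate:,.0f} ops/s")
```

Multiplying ops/s by the value size gives a throughput figure comparable to MB/s; numbers from an in-memory dict say nothing about real drive performance.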


Demo Time!

Performance Results

[Figure: two charts. 1MB values put rate in MB/s (vertical axis 0 to 120, horizontal axis 0 to 8) and 1KB values put rate in Puts/s (vertical axis 0 to 1000, horizontal axis 0 to 8).]

Conclusion

Deliver more value to Seagate, Partners and Customers
• Disintermediates the path from cloud applications to drive
• Enable innovation in hardware and software ecosystem
• Lower TCO

OpenSource Software
– Basho Riak, Swift, HDFS

More information
• http://seagate.com/www/kinetic
• https://developers.seagate.com
• http://github.com/Seagate