Accel Partners New Data Workshop 7-14-10


New Data Stack Workshop: Building a Scalable Cloud Datacenter

Ping Li, Accel [email protected]

July 14, 2010
Stanford University


Accel Partners Confidential

Delivering Cloud Computing

• Cloud data centers will share infrastructure layers common to mainframes but redelivered for cloud capabilities

• “New Data Stack” will form foundation for cloud computing

• Elasticity

• Multi-app/user

• User-provisioned

• Portability

“Cloud Frame” / Mainframe

• Monitoring: Security (RACF)
• Resource Scheduler (z/VM & OS 370)
• Monitoring: Performance (Mainview)
• Provisioning & Configuration Management
• Virtualization (z/VM)
• Performance Acceleration & dedicated processors (OS 370)
• Clustering, failover, and mirroring (OS 370 & purpose built hw & microcode)
• Backup and DR (Tivoli Storage Manager, Parallel Sysplex)

Private/Public



Data Explosion

Legacy Stack

New Data Stack

• 2,500 exabytes of new information in 2012 with Internet/web as primary driver
• “Digital universe” grew by 62% last year to 800K petabytes and will grow to 1.2 zettabytes this year

Source: An IDC White Paper - sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.

Cloud Application Data

Business Transaction Data



“New Data” Trends

Data is growing faster than processing power – leading to coping strategies like throwing away data or frequent archiving to tape

[Chart: Data, 61% CAGR; Transistors, 42% CAGR]

Circa 1975 – Transaction Data
• Absolute consistency is the primary requirement – ACID transactions
• Highly structured, relatively small data records
• Smaller data sets (bytes)
• 2,000 users = Huge

Circa 2010 – Cloud Data
• Application responsiveness/scale trumps immediate consistency
• Unstructured, complex data blobs (images, voice, logs, video) – doesn’t fit nicely into rows/columns
• Extremely large data sets (petabytes)
• 2,000 users = Tiny

Source: Gartner.
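To make the growth gap concrete, here is a minimal sketch that compounds the two rates from the chart. It assumes the 61% figure is data growth and the 42% figure is transistor growth, that both start from the same normalized baseline of 1.0, and that the rates stay constant; none of that is stated on the slide beyond the two percentages. The function names are illustrative.

```python
def compound(rate: float, years: int, start: float = 1.0) -> float:
    """Value after `years` of constant annual growth (rate 0.61 means 61%)."""
    return start * (1.0 + rate) ** years


DATA_CAGR = 0.61        # assumed to be the data growth rate on the chart
TRANSISTOR_CAGR = 0.42  # assumed to be the transistor growth rate on the chart


def gap_after(years: int) -> float:
    """How many times further data has grown than transistor counts."""
    return compound(DATA_CAGR, years) / compound(TRANSISTOR_CAGR, years)


if __name__ == "__main__":
    for years in (1, 5, 10):
        print(f"after {years:2d} years: data/transistors = {gap_after(years):.2f}x")
```

Under these assumptions the gap widens to roughly 3.5x after ten years, which is the pressure behind the coping strategies named above.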



New Data Stack Technologies

Legacy
• Centralized/monolithic computing layer
• Computer networking limited
• Relational databases
• FC SAN/NAS
• Disks/Tape (memory scarce/expensive)
• Proprietary/closed vendors
• Enterprise-scale

Cloud
• Distributed computing layer (virtual machines, MapReduce, networked commodity servers)
• High speed networking is pervasive
• Non-relational/“no sql” data stores
• Distributed file systems
• Flash/SSD (high performance and abundant)
• Open platforms
• Internet/cloud scale



Agenda

1:15 pm NorthScale
Sharon Barr, Vice President Engineering
James Phillips, Founder, Chief Product Officer
Dustin Sallings, Chief Architect
Bob Wiederhold, President, CEO

2:15 pm Cloudera
Amr Awadallah, CTO/Co-Founder
Jeff Hammerbacher, Chief Scientist/Co-Founder

3:15 pm Facebook
Bobby Johnson, Director, Software Engineering
Mark Rabkin, Software Engineer

4:15 pm Fusion-io
Robert Wipfel, Fellow

5:30 pm Cocktails!

Elastic Data Management Software for web applications and cloud computing environments

The opportunity.

“ Relational database technology has served us well for 40 years, and will likely continue to do so for the foreseeable future to support transactions requiring ACID guarantees. But a large, and increasingly dominant, class of software systems and data do not need those guarantees. Much of the data manipulated by Web applications have less strict transactional requirements but, for lack of a practical alternative, many IT teams continue to use relational technology, needlessly tolerating its cost and scalability limitations. For these applications and data, distributed key-value cache and database technologies such as NorthScale provide a promising alternative. ”

Carl Olofson
Research Vice President
Database Management Software Research
IDC

Modern interactive software architecture


To support more users … simply add more commodity web servers (or virtual machines) behind a load balancer … but you must get a bigger, more complex database server.

Application scales linearly, data hits a wall

Application Scales Out
Just add more commodity web servers

Database Scales Up
Get a bigger, more complex server

What’s driving the curves?

1. Transaction overhead.
Same hardware, over an order of magnitude difference in supportable user base.
• RDBMS: 750 OPS
• NorthScale: 15,000 OPS

2. Expensive hardware.
More costly to start with, and the cost differential widens with growth.
• At 750 OPS: RDBMS $7,500, NorthScale $2,500 (3x)
• At 15,000 OPS: RDBMS $125,000, NorthScale $12,500 (10x)

3. Complex administration.
RDBMS technology is extremely complex and expensive to administer.
• RDBMS: Schema committee; Add new table(s); Re-normalize; Create indices; Shard if needed; Tune performance; Update views; Insert and select.
• NorthScale: Set and get.
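The 3x and 10x multipliers follow directly from the dollar figures on this slide. The short sketch below recomputes them; the pairing of each dollar amount with RDBMS or NorthScale is read from the slide layout and should be treated as an assumption, and the helper names are illustrative.

```python
# Hardware cost in dollars at a given load (operations per second),
# as the figures appear on the slide. The system pairing is an assumption.
COSTS = {
    750: {"RDBMS": 7_500, "NorthScale": 2_500},
    15_000: {"RDBMS": 125_000, "NorthScale": 12_500},
}


def cost_ratio(ops: int) -> float:
    """RDBMS cost divided by NorthScale cost at the same load."""
    row = COSTS[ops]
    return row["RDBMS"] / row["NorthScale"]


def cost_per_op(ops: int, system: str) -> float:
    """Dollars of hardware per operation-per-second of capacity."""
    return COSTS[ops][system] / ops


if __name__ == "__main__":
    for ops in COSTS:
        print(f"{ops:>6} OPS: RDBMS costs {cost_ratio(ops):.0f}x NorthScale")
```

Dividing by load also shows why the differential widens: the per-operation cost falls much faster on the scale-out side than on the scale-up side.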

Billions in data management savings available

RDBMS ideal for intended purpose, will continue to be appropriate for debit-credit data – costly overkill for most new data


Relational database technology ideal

Alternative database technology needed

Relational database technology was $18.8 billion market in 2007 (IDC)

Big leap from relational database to alternatives


Where do I start? What data should I move first? Which alternative database technology will “win”? This looks really complicated.

NorthScale solution.

“ I can’t tell you how many email requests I’ve received from our developers asking for something that is as simple and fast as memcached, but that promises data durability. Cassandra is just far too complex and heavyweight and we won’t be doing any more deployments. NorthScale is definitely on to something here. ”

Director of Engineering
Leading Social Network

Before: Where you are today


Relational database technology powers 99.999% of web applications.

Step 1: Cache relational data in memcached


Memcached is simple, fast and infinitely scalable. It is easy to adopt, and delivers immediate cost, performance and scalability benefits.
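A minimal sketch of what Step 1 usually looks like in application code: check the cache first and fall back to the relational database on a miss. `InMemoryCache` is a hypothetical stand-in that exposes only get and set, not a real memcached client, and `load_from_db` is a placeholder for whatever query the application already runs.

```python
from typing import Callable, Dict, Optional


class InMemoryCache:
    """Hypothetical stand-in for a memcached client: get and set only."""

    def __init__(self) -> None:
        self._items: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._items.get(key)

    def set(self, key: str, value: str) -> None:
        self._items[key] = value


def read_through(cache: InMemoryCache, key: str,
                 load_from_db: Callable[[str], str]) -> str:
    """Return the cached value, loading it from the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # slow path: relational database
        cache.set(key, value)      # later reads skip the database
    return value


if __name__ == "__main__":
    queries = []

    def fake_query(key: str) -> str:
        queries.append(key)
        return f"row-for-{key}"

    cache = InMemoryCache()
    read_through(cache, "user:1", fake_query)
    read_through(cache, "user:1", fake_query)
    print(f"database queries issued: {len(queries)}")
```

The database remains the system of record in this step; the cache only absorbs repeated reads.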

NorthScale Memcached Servers

Relational Database

Step 2: Gradually migrate data to membase


NorthScale Memcached Servers

Relational Database

NorthScale Membase Servers

After: Elastic compute and data layers
Data layer now scales with linear cost and constant performance.

Application Scales Out
Just add more commodity web servers

Database Scales Out
Just add more commodity data servers

Scaling out flattens the cost and performance curves.

An evolutionary path toward elastic data


NorthScale Membase Server

Membase is an elastic key-value database


Membase data servers

In the data center

Web application server

Application user

On the administrator console

Five minutes or less to a working cluster
• Downloads for Linux and Windows
• Start with a single node
• One button press joins nodes to a cluster

Easy to develop against
• Just SET and GET – no schema required
• Drop it in. 10,000+ existing applications already “speak membase” (via memcached)
• Practically every language and application framework is supported, out of the box

Easy to manage
• One-click failover and cluster rebalancing
• Graphical and programmatic interfaces
• Configurable alerting

Membase is Simple, Fast, Elastic


Predictable
• “Never keep an application waiting”
• Quasi-deterministic latency and throughput

Low latency
• Auto-migration of hot data to lowest latency storage technology (RAM, SSD, Disk)
• Selectable write behavior – asynchronous, synchronous (on replication, persistence)
• Back-channel rebalancing [FUTURE]

High throughput
• Multi-threaded
• Low lock contention
• Asynchronous wherever possible
• Automatic write de-duplication



Scale out
• Spread I/O and data across commodity servers (or VMs)
• Consistent performance with linear cost
• Dynamic rebalancing of a live cluster

All nodes are created equal
• No special case nodes
• Clone to grow

Extensible
• Filtered TAP interface provides hook points for external systems (e.g. full-text search, backup, warehouse)
• Data bucket – engine API for specialized container types
• Membase NodeCode [FUTURE]

vBucket mapping

All possible membase keys: Key1, Key2, Key3, … Keym

Key → vBucket (hash function)

vBuckets: vBucket1, vBucket2, vBucket3, … vBucketn

vBucket → Servers (table lookup)

Host Server / Replica Servers: Server1 / Server2, Server3 … Serverp / Serverq, Serverr

vBucket-Server Map - Example

vBuckets    Host Server / Replica Servers
vBucket1    ServerA / ServerB, ServerC
vBucket2    ServerA / ServerB, ServerC
vBucket3    ServerB / ServerA, ServerC
vBucket4    ServerB / ServerA, ServerC
vBucket5    ServerC / ServerA, ServerB
vBucket6    ServerC / ServerA, ServerB
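The two-stage lookup on this slide (hash the key to a vBucket, then look the vBucket up in a table) can be sketched as follows. The map contents are copied from the example above; the use of CRC32 as the hash function and the function names are assumptions made for illustration, since the slide only says “hash function”.

```python
import zlib
from typing import Dict, List, Tuple

NUM_VBUCKETS = 6  # matches the six-entry example map

# vBucket id -> (host server, replica servers), from the example map.
VBUCKET_MAP: Dict[int, Tuple[str, List[str]]] = {
    1: ("ServerA", ["ServerB", "ServerC"]),
    2: ("ServerA", ["ServerB", "ServerC"]),
    3: ("ServerB", ["ServerA", "ServerC"]),
    4: ("ServerB", ["ServerA", "ServerC"]),
    5: ("ServerC", ["ServerA", "ServerB"]),
    6: ("ServerC", ["ServerA", "ServerB"]),
}


def vbucket_for(key: str) -> int:
    """Stage 1: hash the key to a vBucket id in 1..NUM_VBUCKETS.

    CRC32 is an assumed hash choice, not taken from the slide.
    """
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS + 1


def servers_for(key: str) -> Tuple[str, List[str]]:
    """Stage 2: table lookup from vBucket to host and replica servers."""
    return VBUCKET_MAP[vbucket_for(key)]


if __name__ == "__main__":
    for key in ("Key1", "Key2", "Key3"):
        host, replicas = servers_for(key)
        print(f"{key} -> vBucket{vbucket_for(key)} -> {host} / {', '.join(replicas)}")
```

One consequence of the indirection: moving a vBucket to another server only changes a table entry, while the key-to-vBucket hash stays fixed.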

Deployment options

Deployment Option 1: Embedded proxy
Deployment Option 2: Standalone proxy
Deployment Option 3: “vBucket-aware” client

[Diagram: the three options side by side. Labels shown: application logic; OTC memcached client with server list; NEW memcached client with vbucket map; proxy with vbucket map (on the Membase Server or on localhost); OTC Memcached Server; Membase Server; data operations and cluster operations; ports 11211 and 11210.]

Membase “write” data flow – application view

1. User action results in the need to change the VALUE of KEY
2. Application updates key’s VALUE, performs SET operation
3. Membase (memcached) client hashes KEY, identifies KEY’s master server
4. SET request sent over network to master server
5. Membase replicates KEY-VALUE pair, caches it in memory and stores it to disk

[Diagram: servers, each with a Listener-Sender, RAM, the membase storage engine, SSD, and disk.]
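The write steps above can be walked through with a toy model. This is a sketch under stated assumptions, not Membase code: it hashes a key straight to a master node rather than going through vBuckets, uses CRC32 as an arbitrary hash, and replicates and persists synchronously for simplicity. All class and method names are hypothetical.

```python
import zlib
from typing import Dict, List, Optional


class Node:
    """Toy data server: a RAM dict plus a dict standing in for disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.ram: Dict[str, str] = {}
        self.disk: Dict[str, str] = {}

    def store(self, key: str, value: str) -> None:
        self.ram[key] = value   # cache in memory
        self.disk[key] = value  # persist (done inline here for simplicity)


class ToyCluster:
    def __init__(self, names: List[str], replicas: int = 2) -> None:
        if replicas >= len(names):
            raise ValueError("need more nodes than replicas")
        self.nodes = [Node(n) for n in names]
        self.replicas = replicas

    def master_index(self, key: str) -> int:
        # Client-side step: hash the key to find its master server.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def set(self, key: str, value: str) -> str:
        i = self.master_index(key)
        self.nodes[i].store(key, value)  # SET arrives at the master
        for offset in range(1, self.replicas + 1):
            # Master forwards the pair to its replica servers.
            self.nodes[(i + offset) % len(self.nodes)].store(key, value)
        return self.nodes[i].name

    def get(self, key: str) -> Optional[str]:
        return self.nodes[self.master_index(key)].ram.get(key)


if __name__ == "__main__":
    cluster = ToyCluster(["ServerA", "ServerB", "ServerC"])
    master = cluster.set("KEY", "VALUE")
    print(f"master for KEY is {master}; read back: {cluster.get('KEY')}")
```

A real deployment would choose when the acknowledgement is returned relative to replication and persistence; this sketch simply does everything before returning.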

Membase data flow – under the hood

1. SET request arrives at KEY’s master server
5. SET acknowledgement returned to application

[Diagram: master server for KEY, Replica Server 1 for KEY, and Replica Server 2 for KEY, each with a Listener-Sender, RAM, the membase storage engine, SSD, and disk. Steps 2, 3, and 4 are marked on the diagram without captions.]

[Architecture diagram]

Data Manager
• moxi
• memcached protocol listener/sender
• engine interface
• membase storage engine
• memcapable 1.0, memcapable 2.0
• Ports: 11211, 11210

Cluster Manager (Erlang/OTP)
• HTTP REST management API/Web UI
• Heartbeat
• Process monitor
• Global singleton supervisor
• Configuration manager
• Rebalance orchestrator
• Node health monitor
• vBucket state and replication manager
• Scope labels: “on each node”, “one per cluster”
• Ports: 21100 – 21199, 4369, 8080 (labels: HTTP, distributed erlang, erlang port mapper)

Membase Architecture


Data buckets are secure membase “slices”


Membase data servers

In the data center

Web application server

Application user

On the administrator console

Bucket 1

Bucket 2

Aggregate Cluster Memory and Disk Capacity

NorthScale in production

Leading cloud service (PAAS) provider
• Over 65,000 hosted applications
• NorthScale Memcached Server serving over 1,200 Heroku customers (as of June 10, 2010)

Social game leader – FarmVille, Mafia Wars, Café World
• Over 230 million monthly users
• NorthScale Membase Server is the 500,000 ops-per-second database behind FarmVille and Café World

Wednesday, July 14, 2010

Evolving a New Analytical Platform
What Works and What’s Missing

Jeff Hammerbacher
Chief Scientist, Cloudera
July 14, 2010


My Background
Thanks for Asking

▪ [email protected]
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”

Presentation Outline
▪ 1. Defining the Platform
  ▪ BI: Science for Profit
  ▪ Need tools for whole research cycle
  ▪ SQL Server 2008 R2: defining the platform
▪ 2. State of the Platform Ecosystem
▪ 3. Foundations for a New Implementation
  ▪ Hadoop
  ▪ Boiling the Frog
▪ 4. Future Developments
▪ Questions and Discussion


1. Defining the Platform


BI is looking more like science (for profit)


Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”


RDBMS only a small part of this tool set


Example: SQL Server 2008 R2


RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint


What do we call this unified suite?


For today: Analytical Data Platform


LAMP Stack for Analytical Data Management


2. The State of the Platform Ecosystem


Who makes up the platform ecosystem?


Platform Providers
Infrastructure Providers
Application Developers
Content Providers
End Users


What is new about the ecosystem today?


Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly

Infrastructure Providers
1. Cloud
2. Warehouse-Scale Computers

Platform Providers
1. Open source
2. Driven by consumer web properties

Application Developers
1. Data Scientists
2. Diversity of languages

End Users
1. Browser is the client
2. Tell a story about the business


3. Foundations for a New Implementation


New foundations: HDFS and MapReduce


2005: Doug/Mike start project inside Nutch


2006: Doug joins Yahoo!


2007: Make Hadoop scale
▪ Yahoo! makes Pig open source
▪ Jim Gray’s “Fourth Paradigm” lecture
▪ Randy Bryant’s “DISC” lecture
▪ Powerset makes HBase open source

2008: Make Hadoop fast
▪ Yahoo! wins Daytona terabyte sort benchmark
▪ First Hadoop Summit
▪ Yahoo! builds production webmap with Hadoop
▪ Facebook makes Hive open source
▪ “MapReduce: A Major Step Backwards”

2009: Insert Hadoop into the enterprise
▪ Cloudera releases CDH
▪ First Hadoop World NYC
▪ Yahoo! sorts a petabyte with Hadoop
▪ Cloudera adds training, support, services
▪ “The Unreasonable Effectiveness of Data”

2010: Integrate Hadoop into the enterprise
▪ IBM announces InfoSphere BigInsights
▪ Yahoo! completes enterprise-class security
▪ Datameer and Karmasphere funded
▪ Quest, Talend, Netezza, and more integrate
▪ Hive adds JDBC and ODBC


Hadoop will be an Analytical Data Platform


4. Future Developments


Capture: Log collection and CEP


Curate: Workflow and Scheduling


Curate: Secondary and Full-Text Indexing


Curate: Learn Structure from Data


Analyze: Mesos-enabled frameworks


Analyze: Link working set and historical data


All behind a single user interface


HUE
Making Many Computers Feel Like One


Cloudera’s Distribution for Hadoop, Version 3, is the enterprise open source platform for complex data

▪ Integrated: all components & functions interoperate with one another through standard APIs
▪ Simplified: Cloudera manages required component versions & dependencies
▪ Open source: 100% Apache licensed
▪ Reliable: patched with fixes from future releases to improve stability
▪ Supported: Cloudera employs [percentage illegible] of the project founders and at least one committer for [percentage illegible] of these open source components.


(c) 2010 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0


ioMemory for Scale-out

Robert Wipfel, Fellow [email protected]

14th July, 2010, Accel Partners Panel Discussion

Factors impacting Scale-out

Balance • CPU • Disk • Network

Contention • Sharing • Locking

Throughput • IOPS • Bandwidth

Latency • Distributed • Dependencies

Graceful Recovery • No SPOFs • Fast Replay

Energy • Servers • RAM • Disks

Management and Monitoring

What’s *really* Needed…

DRAM
Want • Really fast
Don’t Want • Volatile • Expensive • Limited capacity

Disk
Want • Non-volatile • Cheap • Large capacity
Don’t Want • Really slow

Need
Want • Non-volatile • Really fast • Large capacity • Reasonable price • Low energy

Solution: ioMemory

A disruption called ioMemory

•  High speed like DRAM

•  Persistence and capacity of disks

PCIe based NAND Flash Storage

•  Very high IOPS

•  Micro-second latency

•  Very high data throughput

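[Chart: access delay in time, spanning 6 orders of magnitude from millisecond (10E-3) to nanosecond (10E-9). Storage tiers shown: SAN, NAS, RAIDed DAS; SSDs; ioMemory; DRAM; L3; L2; L1. A marker at 50µs (10E-6) appears next to the SSD and ioMemory labels.]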

Why is it called ioMemory?

Raw Storage Performance
[Chart: H2benchw 3.6, Interface Bandwidth (MB/s). Devices compared: Fusion-io ioDrive Maximum Write; Fusion-io ioDrive Improved Write; Fusion-io ioDrive Maximum Capacity; SSD SATA Vendor A 3.0Gbps 2.5 RAID 0; SSD SATA Vendor B 3.0Gbps 2.5 RAID 0; SSD SATA Vendor C 32 GB, Flash SATA/300. Other spec labels shown: 24 GB, Flash, PCIe x4; 128 GB, Flash SATA/300.]

Application Performance
[Chart: IOMeter Database Benchmark I/O, Average Throughput (MB/s), for the same devices.]

2x Faster Storage I/O

50x Faster Application I/O

ioMemory Performance

• PCI bus protection
• Checksums
• Poison bit
• Strong ECC
• Wear leveling
• Bad block re-mapping
• Data labeling
• Parity-protected pipelines
• Flashback Chip protection
• Power cut protection

ioMemory Reliability

MTBF = 2 Million Hours +

[Diagram: numbered I/O path steps between Application CPU, RAID Controller, and SSD, compared with the numbered steps between Application CPU and ioMemory.]

ioMemory is not a Solid State Disk

[Chart: energy use in kilowatt-hours per year for a 15,000 RPM FC HDD, the Fusion-io ioDrive, and the ZeusIOPS SSD. Values shown: 97 kWh/yr, 3,013 kWh/yr, 133,493 kWh/yr.]

ioMemory is Green

Case Study

One of the world’s fastest growing Webmonsters

•  Over 900% more database queries per second

•  Dramatically improved server replication for most current data

•  Over 800% improvement to disaster recovery back-up time

•  Cut server footprint, power costs, and IT overhead by 75%

•  Full and immediate ROI on repurposed servers with

•  Continued ROI on operational cost saving


Case Study

•  5x improvement to

•  Database replication performance

•  Data intensive query response

•  Analysis routines

•  Eliminating 210 failure points from system

•  Implemented full system redundancy

•  Dramatically lowered power and cooling expenses

Internet security company that protects over 1 billion inboxes

Case Study

Disruption

By deploying ioMemory… Cloudmark eliminated the need for this…

Department of Defense takes NASTRAN from 3-days to 6-hours

Demos Dynamics NAV can get a 4x performance improvement

Other Customer Examples

HMO achieves a 200 HDD to 1 ioDrive reduction for their Data Warehouse

Does a 30 to 1 box reduction for their reliable messaging system

Shows a 35x performance increase of unstructured search at OracleWorld

Stock exchange doubles the performance of their trading systems

ioMemory Products

160 GB •  116,046 (4k read packet size) •  93,199 (75/25 r/w mix 4k packet size)





Confidential Information: Fusion-io

OEM Partners


Questions?

T H A N K Y O U