Above the Clouds: A View From Academia. Armando Fox, UC Berkeley. EDUSERV Symposium, 12 May 2011. Presentation slides licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License. Image: John Curley http://www.flickr.com/photos/jay_que/1834540/

Above the Clouds: A View From Academia

DESCRIPTION

The closing keynote by Armando Fox at the Eduserv Symposium 2011 - Virtualisation and the Cloud.

Page 1: Above the Clouds: A View From Academia

UC Berkeley

Above the Clouds: A View From Academia

Armando Fox, UC Berkeley

EDUSERV Symposium, 12 May 2011

Presentation slides licensed under Creative Commons Attribution-ShareAlike 3.0 Unported License.

Image: John Curley http://www.flickr.com/photos/jay_que/1834540/

Page 2: Above the Clouds: A View From Academia

Who Am I?

• Research: Internet-scale systems; productive parallel programming

• Teaching: software engineering
• Writing: co-author Above the Clouds tech report
• Disclaimer 1: I don’t speak for UC
• Disclaimer 2: Relationship with Amazon

Page 3: Above the Clouds: A View From Academia

How We Got Into the Cloud: RAD Lab’s 5-year Mission

Enable 1 entrepreneur to prototype a great Web app over 3-day weekend, then deploy at scale

• Key technology: Statistical machine learning
• Early critiques: “Demonstrate your ideas at scale!”
• Moved from Sun Blackbox to EC2 in mid-2008

• Feb. 2009: Above the Clouds tech report*
– Over 50K downloads, influenced high-profile IT co.’s

* abovetheclouds.cs.berkeley.edu, or CACM April 2010

Page 4: Above the Clouds: A View From Academia

Outline: Two Themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Assumption: familiar with cloud basics

– Public/pay-as-you-go

– Private/closed/“condo”

• Non-goal: regulatory thickets around cloudifying “sensitive” information

Page 5: Above the Clouds: A View From Academia

Public Cloud: CS Research

• Over $350,000 spent on AWS since 2008
– PhD student ~ US$75k/year => cloud ~ 1/3 student/mo.

• Experiments: 100-300 nodes common, 900 max
– large-scale storage, cloud programming, MapReduce
– results at scale now required for top-tier conferences
– most experiments last 0-4 hours
– “Small” experiments also in cloud for convenience

• Comparison: Sun BlackBox at Berkeley
– $200k acquire & install
– $300k+ in hardware donations
– staff: ≥0.5 FTE

Page 6: Above the Clouds: A View From Academia

Public Cloud: CS Education

• Great Ideas in Computer Architecture (reinvented Fall 2010): 190 students

• Software Engineering for Software-as-a-Service: 70 students

• Operating Systems: 70 students
• Intro. Data Science: 30 students
• Adv. topics in HCI: 20 students
• Natural language processing: 20 students
• Large-scale programming abstractions for the cloud: ~20 students (Fall 2011)

Administration, provisioning, sizing much easier on public cloud than UC instructional computing

Page 7: Above the Clouds: A View From Academia

Cloud Economics

• “Private should be cheaper if you have stable utilization”

[Figure: two panels plotting Demand and Capacity against Time]

Page 8: Above the Clouds: A View From Academia

$Private < $Public?

• Capital: hardware, networking, power 5-7x cheaper at 100K’s scale (Hamilton 2007)

• Operations: heavy automation => 1000’s machines per FTE admin

• R&D: cloud providers had to serve internal business need

• Services: “Scale makes availability affordable”: wide-area disaster recovery facilities

• Hidden/shared costs: power, cooling, staff, ....

Page 9: Above the Clouds: A View From Academia

Hard to Compete on Cost

• Zero-touch metering/billing infrastructure
• Optimized for low margin
– $0.08/hr: virtual CPU on EC2
– $0.02-0.08/hr: as-available “spot instances”
– Reserved (prepay 1-3 years, save if utilization > 25%)
– $0.00/hr: 1 year free usage tier for all services
– Private: ≥ $0.075/hr ($2000 private server amortized over 3 years with no indirect costs)

• “Moving to EC2 would cost about a factor of 2”
– Highly placed colleague at major social site
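
The private-server figure on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the year is taken as 365 days, and the utilization adjustment is an added assumption (the slide only gives the fully amortized figure with no indirect costs).

```python
HOURS_PER_YEAR = 365 * 24  # 8760; leap days ignored (assumption)


def amortized_hourly_cost(purchase_price, years):
    """Spread the purchase price evenly over every hour of the period."""
    return purchase_price / (years * HOURS_PER_YEAR)


def cost_per_used_hour(purchase_price, years, utilization):
    """Owned hardware costs the same busy or idle, so the cost of each
    hour actually used rises as utilization falls."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return amortized_hourly_cost(purchase_price, years) / utilization


if __name__ == "__main__":
    # Figures from the slide: $2000 server, 3-year amortization.
    print(f"always busy: ${amortized_hourly_cost(2000, 3):.4f}/hr")
    for u in (1.0, 0.5, 0.25):
        cost = cost_per_used_hour(2000, 3, u)
        print(f"utilization {u:.0%}: ${cost:.4f} per used hour")
```

With the slide's numbers the always-busy cost comes out at roughly $0.076/hr, close to the $0.075/hr quoted, and it doubles each time utilization halves.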

Page 10: Above the Clouds: A View From Academia

Try to smooth out peaks?

• Not waiting in queues accelerates research!
– Run several experiments simultaneously, each using 100’s of machines for 1-2 hours, without queueing up

– Basic queueing theory: trade utilization vs service time

– Better performance isolation than private cloud (!)

– N.B. for long jobs, some queueing may be OK

• Corollary 1: cost-associative billing encourages research spontaneity

• Corollary 2: incentive to stop using is important!

Effective metering & billing is key to on-demand usage model
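
The utilization versus service time trade-off can be made concrete with the simplest textbook model. The slide does not name a queueing model, so the sketch below assumes an M/M/1 queue (single server, Poisson arrivals, exponential service times), where mean time in system is S / (1 - ρ) for mean service time S and utilization ρ. The function name is hypothetical.

```python
def mm1_mean_response_time(service_time, utilization):
    """Mean time in system (waiting + service) for an M/M/1 queue.

    service_time: mean service time S, in any time unit.
    utilization:  rho = arrival rate * S, must satisfy 0 <= rho < 1.
    """
    if service_time <= 0:
        raise ValueError("service_time must be positive")
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    # A 2-hour experiment on a shared cluster at increasing utilization.
    for rho in (0.1, 0.5, 0.8, 0.9, 0.95):
        t = mm1_mean_response_time(2.0, rho)
        print(f"utilization {rho:.0%}: mean turnaround {t:.1f} hours")
```

Under this assumed model, driving a shared cluster toward high utilization makes turnaround time grow without bound, which is the tension between a well-used private cluster and not waiting in queues.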

Page 11: Above the Clouds: A View From Academia

Example: wait times on UC Berkeley “Mako” cluster

Mako has 272 dual-socket (quad-core per socket) nodes with 24 GB RAM each

Source: ShaRCS—Shared Research Computing Services, presentation by UC Office of the President at the UC Cloud Summit, April 2011

Page 12: Above the Clouds: A View From Academia

On the other hand...Big Data

Application                               Data generated per day
DNA Sequencing (Illumina HiSeq machine)   1 TB
Large Synoptic Survey Telescope           30 TB; 400 Mbps sustained data rate between Chile and NCSA
Large Hadron Collider                     60 TB

Source: Ed Lazowska, eScience 2010, Microsoft Cloud Futures Workshop, lazowska.cs.washington.edu/cloud2010.pdf

* Simson L. Garfinkel, An Evaluation of Amazon’s Grid Computing Services: EC2, S3 and SQS, Technical Report TR-08-07, School of Engineering & Applied Sciences, Harvard University, 2008.

• Challenge: Long-haul networking is most expensive cloud resource, and improving most slowly

• Copy 8 TB to EC2 at ~20 Mbps*: ~35 days, ~$800
• Ship four 2 TB drives to Amazon: 1 day, ~$150
• Can private/shared networking resources be combined with public cloud to get best of both?
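
The copy-versus-ship comparison above is straightforward arithmetic. The sketch below assumes decimal units (1 TB = 10^12 bytes) and a steady link rate; the $0.10/GB transfer price is a hypothetical value chosen because it reproduces the slide's ~$800, not a figure stated in the slides. Function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(num_bytes, bits_per_second):
    """Days needed to push num_bytes over a link at a steady bit rate."""
    return num_bytes * 8 / bits_per_second / SECONDS_PER_DAY


def transfer_cost(num_bytes, dollars_per_gb):
    """Transfer charge at a flat per-gigabyte price (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9 * dollars_per_gb


if __name__ == "__main__":
    eight_tb = 8e12       # 8 TB, decimal units (assumption)
    link = 20e6           # ~20 Mbps, from the slide
    price_per_gb = 0.10   # hypothetical rate, see lead-in
    print(f"network copy: {transfer_days(eight_tb, link):.1f} days, "
          f"${transfer_cost(eight_tb, price_per_gb):.0f}")
```

At a steady 20 Mbps the copy takes about 37 days, in line with the slide's rough ~35 days, against 1 day for shipping drives.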

Page 13: Above the Clouds: A View From Academia

On the other hand...Cloud Provider

• Hard research on public cloud:
– scheduling/provisioning research
– security: honeypots, malware containment, epidemic modeling
– energy efficiency or other physical monitoring
– experimenting with networking fabric, multicast, etc.

• N.B., cloud provider research needs cloud users!
– Example: Microsoft Research Silicon Valley “Sherwood” cluster (~240 nodes)

Demanding customers drove cloud research

Page 14: Above the Clouds: A View From Academia

Nonprofit/Academic clouds

• PlanetLab & Emulab
– highly successful from their customers’ point of view
– lots of great research, some of which might have been impossible on today’s public cloud

• Academic/research clusters
– Yahoo/IBM/M45 cluster, Google/IBM cluster, TeraGrid: primarily application-level research
– OpenCirrus (HP/Intel/Yahoo/UIUC/IDA Singapore/Karlsruhe): bare-metal, federated, 1K+ cores/site

• Access model: write proposal; closed community
• Saving money is non-goal (in fact, a subsidized investment by universities & industrial partners)

Page 15: Above the Clouds: A View From Academia

OpenCirrus

• Infrastructure costs increase with # sites
• Claim: even at ~50% utilization, owning your infrastructure pays for itself in ~3 years

Source: R. Campbell et al., OpenCirrus..., Proc. 2011 Workshop on Hot Topics in Cloud Computing (HotCloud’09), June 2011 (to appear)

Page 16: Above the Clouds: A View From Academia

Public & private clouds don’t see same benefits

Benefit                                     Public   Private
“infinite” resources on-demand              Yes      No
Instantaneous provisioning                  Yes      Varies
Better hardware                             Yes      No
Zero-commitment pay-as-you-go*              Yes      No
Reduced costs from economy of scale         Yes      No
Can do “cloud provider” research            No       Yes
Can trust co-tenants                        No       Yes
Better utilization through virtualization   Yes      Yes
Quickly & inexpensively move big data       No       Yes
Address data-custody regulatory issues      Varies   Yes

* Implies ability to meter, and incentive to release idle resources

Page 17: Above the Clouds: A View From Academia

So You Want to Build a Cloud...

• Single point of failure?

• Zero Touch?

• Hidden costs?

Page 18: Above the Clouds: A View From Academia

Single point of failure

• 30+ hour EBS outage on 21 April 2011
– triggered by human error (network config change)

• Georedundant services (Netflix) largely unaffected
– At least, georedundancy was an available option!

• Non-redundant services had catastrophic outages

• Question: would “more” operational expertise have resolved outage faster?

Page 19: Above the Clouds: A View From Academia

Metering and Billing

• Billing is policy. Metering is mechanism.
– Pay-as-you-go policy allows cost associativity
– Any policy only as flexible as its mechanism
– Amazon’s mechanism: “zero touch” metering
– So, Virtual Private Cloud ≠ your private cloud

• Which of these need human intervention:
– Signing up? Provisioning? Deploying? Billing?
– Academic/nonprofit clouds don’t even try this
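
One way to picture the split between metering (mechanism) and billing (policy) is a small sketch in which usage records are collected once and different billing policies are applied on top of them. Everything here is hypothetical and for illustration: the record fields, function names, and the $0.08 per machine-hour rate (borrowed from the EC2 price quoted earlier) do not describe any provider's actual system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One metered event: mechanism only, no pricing attached."""
    user: str
    machines: int
    hours: float


def machine_hours(records, user):
    """Total machine-hours metered for one user."""
    return sum(r.machines * r.hours for r in records if r.user == user)


def pay_as_you_go(records, user, rate_per_machine_hour):
    """Policy 1: charge strictly in proportion to metered usage."""
    return machine_hours(records, user) * rate_per_machine_hour


def flat_allocation(records, user, monthly_fee):
    """Policy 2: fixed fee; the metered records are ignored entirely."""
    return monthly_fee


if __name__ == "__main__":
    burst = [UsageRecord("alice", machines=1000, hours=1.0)]
    steady = [UsageRecord("bob", machines=1, hours=1000.0)]
    # Cost associativity: 1000 machines for 1 hour costs the same
    # as 1 machine for 1000 hours under pay-as-you-go.
    print(pay_as_you_go(burst, "alice", 0.08))
    print(pay_as_you_go(steady, "bob", 0.08))
```

The pay-as-you-go policy is only possible because the mechanism records usage at machine-hour granularity; a coarser meter would leave only the flat-fee policy available.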

Page 20: Above the Clouds: A View From Academia

Hidden Costs

• Single billing scheme captures all costs, or must some costs be billed/accounted separately?
– shared expenses: power, networking
– general employment benefits/overhead for staff

• Cost of keeping up with innovation
– On average, AWS has deployed 1 new service every 2 months since EC2 beta launch*

• Competition from new providers will exacerbate
– Microsoft Azure, VMware CloudFoundry, ...

* 21 Web service APIs as of April 2011

Page 21: Above the Clouds: A View From Academia

Two themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Capability
– Cloud accelerates and enables new research
– Scale that can’t be achieved any other way

• Cost
– Will private cloud cost less? Is that the main goal?
– Have hidden costs been accounted for?
– Cost-associativity allows bursty use, encourages spontaneity, but needs fine grained metering

Page 22: Above the Clouds: A View From Academia

Two themes

• Academic clouds: public or private?
– Theme 1: save money or improve research?
– Theme 2: cloud user or cloud provider?

• Cloud provider research may require private cloud
– Security, energy, bare-metal, cloud provisioning, ...

– But, still need cloud users (customers) to drive/validate

– Need public-cloud-level APIs, service reliability

• Cloud user
– Big data may impede some public-cloud-ready apps

– Exotic architectures (SSD, in-memory DB, ...)

– Regulatory issues....

Page 23: Above the Clouds: A View From Academia

Summary

• Public cloud shows how to “move slider” between insourcing & outsourcing

• Unlikely to compete on cost with very large scale public clouds

• So, how much can/should you outsource...
...for technical reasons (types of research possible)?
...for regulatory reasons (data privacy, etc.)?

• Remember the non-obvious costs
– Metering & billing, esp. for shared overheads
– Keeping up with the ecosystem

Page 24: Above the Clouds: A View From Academia

Thanks!

• UC Berkeley Reliable Adaptive Distributed Systems Lab & Affiliates

• UC Cloud Computing Task Force
• Andy Powell & Eduserv

RAD Lab Team in 2009