
Indranil Gupta (Indy)
Lecture 4
Cloud Computing: Older Testbeds
January 28, 2010
CS 525 Advanced Distributed Systems, Spring 2010

All Slides © IG


Administrative Announcements

Office hours, changed from today onwards:

• Tuesdays 2-3 pm (same as before)
• Thursdays 3-4 pm (new)
• My office: 3112 SC


Administrative Announcements

Student-led paper presentations (see instructions on website)
• Start from February 11th
• Groups of up to 2 students present each class, responsible for a set of 3 “Main Papers” on a topic
  – 45-minute presentations (total) followed by discussion
  – Set up an appointment with me to show slides by 5 pm the day prior to your presentation
  – Select your topic by Jan 31st
• List of papers is up on the website
• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one-to-two-page review of any two of the main set of papers (summary, comments, criticisms, and possible future directions)
  – Email the review and bring in a hardcopy before class


Announcements (contd.)

Projects
• Groups of 2 (need not be the same as presentation groups)
• We’ll start detailed discussions “soon” (a few classes into the student-led presentations)


(Timeline figure, 1940-2010: timesharing companies & the data processing industry, the first datacenters!, PCs (not distributed!), clusters, peer-to-peer systems, grids, and clouds and datacenters.)

“A Cloudy History of Time” © IG 2010


More Discussion Points

• Can there be a course devoted purely to cloud computing that touches only on results from the last 5 years?
  – No!
• Since cloud computing is not completely new, where do we start learning about its basics?
  – From the beginning: distributed algorithms, peer-to-peer systems, sensor networks


That’s what we do in CS525

• Basics of Peer-to-Peer Systems
  – Read papers on Gnutella and Chord
• Basics of Sensor Networks
  – See links
• Basics of Distributed Algorithms

Yeah! Let’s go to the basics.


Hmm, CCT and OpenCirrus are new. What about classical testbeds?


PlanetLab

• A community resource open to researchers in academia and industry
• http://www.planet-lab.org/
• Currently, 1077 nodes at 494 sites across the world
• Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the 494 sites

• Node: a dedicated server that runs components of PlanetLab services.
• Site: a location, e.g., UIUC, that hosts a number of nodes.
• Sliver: a virtual division of each node. Currently uses VMs, but it could also use other technology. Needed for timesharing across users.
• Slice: a spatial cut-up of the PL nodes, per user. A slice is a way of giving each user (Unix-shell-like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node. (See the sketch below.)

• Thus, PlanetLab allows you to run real world-wide experiments.
• Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CDN services.
• If you need a PlanetLab account and slice for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
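A minimal sketch of what slice access looks like in practice: each sliver appears as a Unix login named after the slice, reachable over SSH. The slice name and node hostnames below are made up for illustration, not real CS525 resources.

# Hypothetical example: run one command on every node in a PlanetLab slice.
import subprocess

SLICE = "uiuc_cs525"  # hypothetical slice name (also the SSH login)
NODES = [
    "planetlab1.example.edu",
    "planetlab2.example.org",
]

for node in NODES:
    # Each sliver on a node looks like a Unix-shell account named after the slice.
    result = subprocess.run(
        ["ssh", f"{SLICE}@{node}", "uptime"],
        capture_output=True, text=True, timeout=30,
    )
    print(node, result.stdout.strip())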

All images © PlanetLab


Emulab

• A community resource open to researchers in academia and industry
• https://www.emulab.net/
• A cluster, with currently 475 nodes
• Founded and owned by the University of Utah (led by Prof. Jay Lepreau)

• As a user, you can:
  – Grab a set of machines for your experiment
  – Get root-level (sudo) access to these machines
  – Specify a network topology for your cluster (ns file format; see the sketch below)
• Thus, you are not limited to only single-cluster experiments; you can emulate any topology

• Is Emulab a cloud? Is PlanetLab a cloud?

• If you need an Emulab account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.
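As a rough illustration of the ns-format topology files Emulab accepts, here is a minimal two-node experiment. Node names and link parameters are arbitrary; real experiments usually also set OS images and other options via Emulab's tb-* commands.

# A minimal, hypothetical Emulab experiment file (ns format).
source tb_compat.tcl          ;# Emulab's extensions to standard ns
set ns [new Simulator]

# Two nodes connected by a 100 Mbps link with 10 ms delay.
set node0 [$ns node]
set node1 [$ns node]
set link0 [$ns duplex-link $node0 $node1 100Mb 10ms DropTail]

$ns rtproto Static            ;# use static routing
$ns run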

All images © Emulab


And then there were… Grids!

What is it?


Example: Rapid Atmospheric Modeling System, ColoState U

• Hurricane Georges, 17 days in Sept 1998
  – “RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data”
  – Used 5 km spacing instead of the usual 10 km
  – Ran on 256+ processors
• Can one run such a program without access to a supercomputer?


(Figure: distributed computing resources spread across three sites: Wisconsin, MIT, and NCSA.)


An Application Coded by a Physicist

(Figure: a four-job workflow, Job 0 through Job 3. Output files of Job 0 are input to Job 2; output files of Job 2 are input to Job 3. Jobs 1 and 2 can be concurrent. One plausible encoding of this DAG appears below.)
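Workflows like this are exactly what Condor's DAGMan tool describes. A purely illustrative DAGMan file for the four jobs, assuming each has a Condor submit file such as job0.submit and assuming the diamond-shaped dependency suggested by the figure, might look like:

# Hypothetical DAGMan description of the four-job workflow.
JOB Job0 job0.submit
JOB Job1 job1.submit
JOB Job2 job2.submit
JOB Job3 job3.submit

# Job 0 must finish before Jobs 1 and 2 (which then run concurrently);
# Job 3 waits for its inputs from upstream.
PARENT Job0 CHILD Job1 Job2
PARENT Job1 Job2 CHILD Job3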


An Application Coded by a Physicist (contd.)

(Figure: Job 2 in detail. Output files of Job 0 are input to Job 2; output files of Job 2, several GBs in size, are input to Job 3. The job is computation intensive, so massively parallel, and may take several hours/days.)

Stages of a job: Init, Stage in, Execute, Stage out, Publish
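To make the lifecycle concrete, here is a hedged Python sketch of those stages; every name below is made up for illustration and corresponds to no real Globus/Condor API.

# Illustrative only: the stages of one grid job, as described above.
def run_grid_job(job, site):
    job.init(site)                    # Init: allocate resources, set up environment
    site.stage_in(job.input_files)    # Stage in: copy several-GB inputs to the site
    job.execute()                     # Execute: the massively parallel compute phase
    site.stage_out(job.output_files)  # Stage out: copy results off the site
    job.publish()                     # Publish: make outputs visible to downstream jobs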


(Figure: the four jobs, Job 0 through Job 3, scheduled across the Wisconsin, MIT, and NCSA sites.)


(Figure: the same jobs across Wisconsin, MIT, and NCSA, with the Condor protocol operating within sites and the Globus protocol operating between them.)


(Figure: the four jobs across Wisconsin, MIT, and NCSA, connected by the Globus protocol.)

Globus Protocol
• External allocation & scheduling
• Stage in & stage out of files
• Internal structure of the different sites is invisible to Globus


(Figure: Jobs 0 and 3 at the Wisconsin site, managed by the Condor protocol.)

Condor Protocol
• Internal allocation & scheduling
• Monitoring
• Distribution and publishing of files


Tiered Architecture (OSI 7-layer-like)

From top to bottom:
• High-energy physics apps
• Resource discovery, replication, brokering
• Globus, Condor
• Workstations, LANs


The Grid Recently

(Figure: the TeraGrid network, “a parallel Internet”. Some of the TeraGrid links are 40 Gbps!)


Globus Alliance

• The Alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, and the Swedish Center for Parallel Computers
• Activities: research, testbeds, software tools, applications
• Globus Toolkit (latest version: GT3): “The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of the new Open Grid Services Architecture (OGSA).”


Some Things Grid Researchers Consider Important

• Single sign-on: a collective job set should require once-only user authentication
• Mapping to local security mechanisms: some sites use Kerberos, others use Unix
• Delegation: credentials to access resources are inherited by subcomputations, e.g., from Job 0 to Job 1
• Community authorization: e.g., third-party authentication

• For clouds, you additionally need to worry about failures, scale, the on-demand nature, and so on.


Discussion Points

• Cloud computing vs. Grid computing: what are the differences?
• National Lambda Rail: hot in the 2000s, funding pulled in 2009
• What has happened to the Grid computing community?
  – See the Open Cloud Consortium
  – See the CCA conference (2008, 2009)


Backups



Sort

(Figure: data transfer rate over time for the sort workload under three executions: normal, with no backup tasks, and with 200 processes killed.)

• Backup tasks reduce job completion time a lot!
• The system deals well with failures

M = 15000, R = 4000
Workload: 10^10 100-byte records (modeled after the TeraSort benchmark)
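As a reminder of the mechanism: a backup (speculative) task is a duplicate copy of a straggling task, and the system keeps the result of whichever copy finishes first. A minimal, purely illustrative Python sketch of the idea (not Google's implementation, which launches backups only near the end of a phase):

# Illustrative sketch of backup tasks: duplicate a straggler and keep
# whichever copy completes first.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def run_with_backup(pool: ThreadPoolExecutor, task, *args):
    primary = pool.submit(task, *args)
    backup = pool.submit(task, *args)   # speculative duplicate
    done, _ = wait([primary, backup], return_when=FIRST_COMPLETED)
    return next(iter(done)).result()    # result of the winning copy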


More

• An entire community, with multiple conferences, get-togethers (GGF), and projects
• Grid projects: http://www-fp.mcs.anl.gov/~foster/grid-projects/
• Grid users:
  – Today: the core is the physics community (since the Grid originates from the GriPhyN project)
  – Tomorrow: biologists, large-scale computations (nug30 already)?


Grid History – 1990’s
• CASA network: linked 4 labs in California and New Mexico
  – Paul Messina: massively parallel and vector supercomputers for computational chemistry, climate modeling, etc.
• Blanca: linked sites in the Midwest
  – Charlie Catlett, NCSA: multimedia digital libraries and remote visualization
• More testbeds in Germany & Europe than in the US
• I-WAY experiment: linked 11 experimental networks
  – Tom DeFanti, U. Illinois at Chicago, and Rick Stevens, ANL: for a week in Nov 1995, a national high-speed network infrastructure. 60 application demonstrations, from distributed computing to virtual reality collaboration.
• I-Soft: secure sign-on, etc.