
1

Indranil Gupta (Indy)
Lecture 4
Cloud Computing: Older Testbeds
January 28, 2010

CS 525 Advanced Distributed Systems
Spring 2010

All Slides © IG

2

Administrative Announcements

Office Hours Changed from Today onwards:

• Tuesdays 2-3 pm (same as before)
• Thursdays 3-4 pm (new)

• My office 3112 SC

3

Administrative Announcements

Student-led paper presentations (see instructions on website)
• Start from February 11th
• Groups of up to 2 students present each class, responsible for a set of 3 “Main Papers” on a topic
– 45 minute presentations (total) followed by discussion
– Set up an appointment with me to show slides by 5 pm the day prior to your presentation
– Select your topic by Jan 31st
• List of papers is up on the website
• Each of the other students (non-presenters) is expected to read the papers before class and turn in a one to two page review of any two of the main set of papers (summary, comments, criticisms, and possible future directions)
– Email the review and bring in a hardcopy before class

4

Announcements (contd.)

Projects
• Groups of 2 (need not be the same as presentation groups)
• We’ll start detailed discussions “soon” (a few classes into the student-led presentations)

5

[Timeline figure: “A Cloudy History of Time” © IG 2010. It spans 1940 to 2010, marking the Timesharing Companies & Data Processing Industry, the first datacenters, PCs (not distributed!), Clusters, Peer-to-peer systems, Grids, and Clouds and datacenters.]

6

More Discussion Points

• Can there be a course devoted purely to cloud computing that touches only on results from the last 5 years?
– No!

• Since cloud computing is not completely new, where do we start learning about its basics?
– From the beginning: distributed algorithms, peer to peer systems, sensor networks

7

That’s what we do in CS525

• Basics of Peer to Peer Systems
– Read papers on Gnutella and Chord

• Basics of Sensor Networks
– See links

• Basics of Distributed Algorithms

Yeah! Let’s go to the basics.

8

Hmm, CCT and OpenCirrus are new. What about classical testbeds?

PlanetLab
• A community resource open to researchers in academia and industry
• http://www.planet-lab.org/
• Currently, 1077 nodes at 494 sites across the world
• Founded at Princeton University (led by Prof. Larry Peterson), but owned in a federated manner by the 494 sites

• Node: Dedicated server that runs components of PlanetLab services.
• Site: A location, e.g., UIUC, that hosts a number of nodes.
• Sliver: Virtual division of each node. Currently uses VMs, but it could also use other technology. Needed for timesharing across users.
• Slice: A spatial cut-up of the PL nodes. Per user. A slice is a way of giving each user (Unix-shell like) access to a subset of PL machines, selected by the user. A slice consists of multiple slivers, one at each component node.

• Thus, PlanetLab allows you to run real world-wide experiments (see the sketch below).
• Many services have been deployed atop it, used by millions (not just researchers): application-level DNS services, monitoring services, CDN services.
• If you need a PlanetLab account and slice for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.

9
All images © PlanetLab
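To make the slice/sliver terminology concrete: a minimal sketch, assuming a hypothetical slice name and node list. Once a slice is instantiated, each sliver is typically reachable over SSH with the slice name as the login, so a world-wide experiment can be driven from one script.

```python
# Hedged sketch: fan a single command out to every sliver in a hypothetical
# PlanetLab slice. Assumes the slice "uiuc_cs525" has been instantiated on
# the (made-up) nodes below and that SSH keys are already set up; the slice
# name serves as the Unix login on each node.
import subprocess

SLICE = "uiuc_cs525"              # hypothetical slice name
NODES = [                         # hypothetical node hostnames
    "planetlab1.example.edu",
    "planetlab2.example.org",
]

def run_on_sliver(node, command):
    """Run `command` inside our sliver on `node` and return its stdout."""
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", f"{SLICE}@{node}", command],
        capture_output=True, text=True, timeout=30,
    )
    return result.stdout.strip()

if __name__ == "__main__":
    for node in NODES:
        print(node, "->", run_on_sliver(node, "uname -a"))
```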

Emulab
• A community resource open to researchers in academia and industry
• https://www.emulab.net/
• A cluster, with currently 475 nodes
• Founded and owned by the University of Utah (led by Prof. Jay Lepreau)

• As a user, you can:
– Grab a set of machines for your experiment
– Get root-level (sudo) access to these machines
– Specify a network topology for your cluster (ns file format; see the sketch after this slide)

• Thus, you are not limited to only single-cluster experiments; you can emulate any topology

• Is Emulab a cloud? Is PlanetLab a cloud?

• If you need an Emulab account for your CS525 experiment, let me know asap! There are a limited number of these available for CS525.

10
All images © Emulab
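A hedged sketch of the “ns file format” step mentioned above: the experiment topology is described in an ns-2-style script, which the code below simply writes out. The two-node topology and the specific directives (tb_compat.tcl, duplex-link, the link parameters) follow commonly documented Emulab conventions and are illustrative assumptions, not text from this lecture.

```python
# Hedged sketch: emit a minimal ns-format topology file for an Emulab-style
# experiment. The directives follow commonly documented ns-2/tb_compat
# conventions; treat the details (link speed, delay, queueing) as
# illustrative assumptions.
NS_TOPOLOGY = """\
source tb_compat.tcl
set ns [new Simulator]

# Two nodes connected by a 100 Mbps, 10 ms link.
set node0 [$ns node]
set node1 [$ns node]
set link0 [$ns duplex-link $node0 $node1 100Mb 10ms DropTail]

$ns run
"""

with open("experiment.ns", "w") as f:
    f.write(NS_TOPOLOGY)
print("Wrote experiment.ns; submit it when creating the Emulab experiment.")
```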

11

And then there were…Grids!

What is it?

12

Example: Rapid Atmospheric Modeling System, ColoState U

• Hurricane Georges, 17 days in Sept 1998
– “RAMS modeled the mesoscale convective complex that dropped so much rain, in good agreement with recorded data”
– Used 5 km spacing instead of the usual 10 km
– Ran on 256+ processors

• Can one run such a program without access to a supercomputer?

13

[Figure: Distributed Computing Resources at Wisconsin, MIT, and NCSA.]

14

An Application Coded by a Physicist

[Figure: a workflow of four jobs, Job 0 through Job 3. The output files of Job 0 are input to Job 2; the output files of Job 2 are input to Job 3. Jobs 1 and 2 can be concurrent.]
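The figure above is a small workflow DAG, so here is a minimal sketch (not the physicist’s actual code, and not any particular grid tool) that encodes those dependencies and runs each job once its inputs are ready; independent jobs such as Jobs 1 and 2 run concurrently.

```python
# Hedged sketch: run the four-job workflow from the figure while respecting
# its dependencies. A real grid workflow would use a workflow manager
# (e.g., Condor's DAGMan); this just illustrates the DAG idea.
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

# job -> set of jobs whose output files it needs as input
DEPENDENCIES = {
    "job0": set(),
    "job1": set(),          # independent: can run alongside job2
    "job2": {"job0"},       # output files of Job 0 are input to Job 2
    "job3": {"job2"},       # output files of Job 2 are input to Job 3
}

def run_job(name):
    print(f"running {name}")  # placeholder for the real computation
    return name

def run_dag(deps):
    done, pending = set(), {}
    with ThreadPoolExecutor() as pool:
        while len(done) < len(deps):
            # submit every job whose prerequisites have all finished
            for job, needs in deps.items():
                if job not in done and job not in pending and needs <= done:
                    pending[job] = pool.submit(run_job, job)
            # block until at least one running job finishes
            finished, _ = wait(pending.values(), return_when=FIRST_COMPLETED)
            for fut in finished:
                job = fut.result()
                done.add(job)
                del pending[job]

if __name__ == "__main__":
    run_dag(DEPENDENCIES)
```

Running it prints job0 and job1 first (in either order, concurrently), then job2, then job3.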

15

An Application Coded by a Physicist

[Figure: zooming in on Job 2. Each job goes through 4 stages: Init, Stage in, Execute, Stage out, Publish. The input (output files of Job 0) and the output (input to Job 3) are several GBs and must be staged in and out; the Execute stage is computation intensive, so massively parallel, and may take several hours/days.]
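The stage sequence named in the figure is essentially a fixed pipeline wrapped around each job’s computation; the sketch below is a hypothetical illustration of that shape, with every stage stubbed out by a print.

```python
# Hedged sketch of the per-job stages from the slide; the bodies are stubs.
# In a real grid job, stage_in/stage_out would move several GBs of files
# to and from the execution site.
def init(job):
    print(f"{job}: init (set up environment, allocate resources)")

def stage_in(job, inputs):
    print(f"{job}: stage in {inputs}")

def execute(job):
    print(f"{job}: execute (computation intensive, massively parallel)")

def stage_out(job, outputs):
    print(f"{job}: stage out {outputs}")

def publish(job):
    print(f"{job}: publish results")

def run_job(job, inputs, outputs):
    init(job)
    stage_in(job, inputs)
    execute(job)            # may take several hours or days
    stage_out(job, outputs)
    publish(job)

run_job("Job 2", inputs=["output of Job 0"], outputs=["input to Job 3"])
```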

16

[Figure: the four jobs, Job 0 through Job 3, scheduled across the Wisconsin, MIT, and NCSA sites.]

17

[Figure: Jobs 0 through 3 across the Wisconsin, MIT, and NCSA sites, annotated with the Condor Protocol (used within a site) and the Globus Protocol (used between sites).]

18

Globus Protocol
– Internal structure of different sites invisible to Globus
– External allocation & scheduling
– Stage in & stage out of files

[Figure: Jobs 0 through 3 across the Wisconsin, MIT, and NCSA sites, connected via the Globus protocol.]

19

Condor Protocol
– Internal allocation & scheduling
– Monitoring
– Distribution and publishing of files

[Figure: Jobs 0 and 3 within the Wisconsin site, managed via the Condor protocol.]
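A hedged, conceptual sketch of the split on these two slides: an external, Globus-like layer that sees only whole sites and handles file staging, and an internal, Condor-like layer that places the job on a concrete machine that only it can see. Site names, machine names, and the random placement policy are illustrative assumptions, not Globus or Condor APIs.

```python
# Hedged sketch of two-level grid scheduling: the external scheduler sees
# only sites (their internal structure is invisible to it), while each
# site's internal scheduler places the job on a concrete machine.
import random

class Site:
    """A site such as Wisconsin, MIT, or NCSA, with its own machines."""
    def __init__(self, name, machines):
        self.name = name
        self.machines = machines        # hidden from the external layer

    def internal_schedule(self, job):
        # Internal allocation & scheduling (Condor-like): pick a machine.
        machine = random.choice(self.machines)
        print(f"  [{self.name}] running {job} on {machine}")

class ExternalScheduler:
    """Globus-like layer: external allocation, scheduling, file staging."""
    def __init__(self, sites):
        self.sites = sites

    def submit(self, job, input_files):
        site = random.choice(self.sites)     # external allocation & scheduling
        print(f"staging in {input_files} to {site.name}")
        site.internal_schedule(job)          # site handles the rest internally
        print(f"staging out results of {job} from {site.name}")

sites = [
    Site("Wisconsin", ["wisc-01", "wisc-02"]),
    Site("MIT", ["mit-01"]),
    Site("NCSA", ["ncsa-01", "ncsa-02", "ncsa-03"]),
]
ExternalScheduler(sites).submit("job2", ["job0.out"])
```

The split mirrors the slide text: the external layer never sees a site’s internal structure, and each site keeps its own scheduler and monitoring.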

20

Tiered Architecture (OSI 7 layer-like)

From top layer to bottom layer:
– High energy physics apps
– Resource discovery, replication, brokering
– Globus, Condor
– Workstations, LANs

21

The Grid Recently

• “A parallel Internet”
• Some are 40 Gbps links! (the TeraGrid links)

22

Globus Alliance

• Alliance involves U. Illinois Chicago, Argonne National Laboratory, USC-ISI, U. Edinburgh, Swedish Center for Parallel Computers

• Activities: research, testbeds, software tools, applications

• Globus Toolkit (latest version: GT3)
“The Globus Toolkit includes software services and libraries for resource monitoring, discovery, and management, plus security and file management. Its latest version, GT3, is the first full-scale implementation of the new Open Grid Services Architecture (OGSA).”

23

Some Things Grid Researchers Consider Important

• Single sign-on: collective job set should require once-only user authentication

• Mapping to local security mechanisms: some sites use Kerberos, others use Unix

• Delegation: credentials to access resources inherited by subcomputations, e.g., job 0 to job 1

• Community authorization: e.g., third-party authentication

• For clouds, you need to additionally worry about failures, scale, on-demand nature, and so on.
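The delegation bullet above can be pictured as a chain of progressively narrower, shorter-lived credentials handed from job 0 to job 1. The sketch below is a conceptual illustration only; real systems (e.g., grid proxy credentials) sign each link cryptographically, which is omitted here, and all names and rights are assumptions.

```python
# Hedged, conceptual sketch of credential delegation: job 0 derives a
# shorter-lived, possibly narrower credential for job 1 from its own.
# Real systems sign each link cryptographically; signatures are omitted.
import time

def make_credential(owner, rights, lifetime_s, parent=None):
    cred = {
        "owner": owner,
        "rights": set(rights),
        "expires": time.time() + lifetime_s,
        "parent": parent,
    }
    if parent is not None:
        # A delegated credential can never exceed its parent's rights.
        assert cred["rights"] <= parent["rights"]
    return cred

def is_valid(cred, needed_right):
    """Walk the delegation chain: every link must be unexpired and hold the right."""
    while cred is not None:
        if time.time() > cred["expires"] or needed_right not in cred["rights"]:
            return False
        cred = cred["parent"]
    return True

user = make_credential("alice", {"read", "write", "submit"}, lifetime_s=3600)
job0 = make_credential("job0", {"read", "submit"}, lifetime_s=600, parent=user)
job1 = make_credential("job1", {"read"}, lifetime_s=300, parent=job0)  # inherited via job0

print(is_valid(job1, "read"))    # True
print(is_valid(job1, "write"))   # False: job1 was never delegated "write"
```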

24

Discussion Points

• Cloud computing vs. Grid computing: what are the differences?

• National Lambda Rail: hot in the 2000s, funding pulled in 2009

• What has happened to the Grid Computing community?
– See the Open Cloud Consortium
– See the CCA conference (2008, 2009)

Backups

25

26

Sort

[Figure: sort-job progress under three conditions: Normal, No backup tasks, 200 processes killed.]

• Backup tasks reduce job completion time a lot!
• System deals well with failures

M = 15000, R = 4000
Workload: 10^10 100-byte records (about 1 TB; modeled after the TeraSort benchmark)

27

More

• Entire community, with multiple conferences, get-togethers (GGF), and projects

• Grid Projects: http://www-fp.mcs.anl.gov/~foster/grid-projects/

• Grid Users:
– Today: Core is the physics community (since the Grid originates from the GriPhyN project)
– Tomorrow: biologists, large-scale computations (nug30 already)?

28

Grid History – 1990s
• CASA network: linked 4 labs in California and New Mexico
– Paul Messina: massively parallel and vector supercomputers for computational chemistry, climate modeling, etc.
• Blanca: linked sites in the Midwest
– Charlie Catlett, NCSA: multimedia digital libraries and remote visualization
• More testbeds in Germany & Europe than in the US
• I-way experiment: linked 11 experimental networks
– Tom DeFanti (U. Illinois at Chicago) and Rick Stevens (ANL): for a week in Nov 1995, a national high-speed network infrastructure; 60 application demonstrations, from distributed computing to virtual reality collaboration.

• I-Soft: secure sign-on, etc.

Recommended