CIS 6930.008: Internet-Scale Networked Systems Adriana Iamnitchi (Anda) [email protected]


Page 1: CIS 6930.008:  Internet-Scale  Networked Systems

CIS 6930.008: Internet-Scale Networked Systems

Adriana Iamnitchi (Anda)

[email protected]

Page 2: CIS 6930.008:  Internet-Scale  Networked Systems

CIS 6930.008: Internet-Scale Networked Systems (Spring 2008)

Contact Info

Email: [email protected]
Office: ENB 334
Office hours: Wed 2-4 and by appointment (email me)
Course page: http://www.csee.usf.edu/~anda/cis6930.008

Page 3: CIS 6930.008:  Internet-Scale  Networked Systems

CIS 6930.008: Course Goals

Primary
  – Gain a deep understanding of the fundamental issues that affect the design of large-scale federated distributed systems
  – Map primary contemporary research themes
  – Gain experience in distributed systems research
Secondary
  – By studying a set of outstanding papers, build knowledge of how to present research
  – Learn how to read papers & evaluate ideas

Page 4: CIS 6930.008:  Internet-Scale  Networked Systems

What I’ll Assume You Know

Basic Internet architecture
  – IP, TCP, DNS, HTTP
Basic principles of distributed computing
  – Asynchrony (cannot distinguish between communication failures and latency)
  – Partial global state knowledge (cannot know everything correctly)
  – Failures happen. In very large systems, even rare failures happen often
If there are things that don’t make sense, ask!
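
The point that rare failures happen often at scale can be checked with a little arithmetic. The sketch below is only an illustration: it assumes components fail independently with the same probability p over some interval (an assumption, not something stated on the slide), and the function name is hypothetical.

```python
def prob_at_least_one_failure(p: float, n: int) -> float:
    """Probability that at least one of n components fails in an interval,
    assuming each fails independently with probability p (an assumption)."""
    return 1.0 - (1.0 - p) ** n


if __name__ == "__main__":
    # The same per-component failure probability at increasing system sizes.
    for n in (1, 100, 10_000, 1_000_000):
        print(n, prob_at_least_one_failure(1e-4, n))
```

Under these assumptions, a failure that is very unlikely for one component becomes close to certain once the system has enough components.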

Page 5: CIS 6930.008:  Internet-Scale  Networked Systems

Examples of Distributed Systems

[Figures: ATT web; Gnutella network; the Internet; a sensor network]

Page 6: CIS 6930.008:  Internet-Scale  Networked Systems

Definition (a version)

A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable.
  – Entity = a process on a device (PC, PDA, mote)
  – Communication medium = wired or wireless network
“Internet-Scale”:
  – Spanning multiple institutional or network (DNS) domains
  – (Much) larger than a “cluster”

Page 7: CIS 6930.008:  Internet-Scale  Networked Systems

This Semester’s Theme (a proposal)

Exploiting Emergent Behavior in Large-Scale Distributed Systems

Page 8: CIS 6930.008:  Internet-Scale  Networked Systems

Filecules and Small Worlds in a Scientific Workload: Characteristics and Significance

Page 9: CIS 6930.008:  Internet-Scale  Networked Systems

Grid: Resource-Sharing Environment

Users:
  – 1000s from 10s of institutions
  – Well-established communities
Resources:
  – Computers, data, instruments, storage, applications
  – Owned/administered by institutions
Applications: data- and compute-intensive processing
Approach: common infrastructure

Page 10: CIS 6930.008:  Internet-Scale  Networked Systems

The Problem

We have now:
  – Mature grid deployments running in production mode
We do not have yet:
  – Quantitative characterization of real workloads
      > How many files, how much input data per process, etc.
  – And thus, benchmarks, workload models, reproducible results
Costs:
  – Local solutions, often replicating work
  – “Temporary” solutions that become permanent
  – Far from optimal solutions
  – Impossible to compare alternatives on relevant workloads

Page 11: CIS 6930.008:  Internet-Scale  Networked Systems

Still, Why Should We Care?

Impossibility results, high costs: tradeoffs are necessary
  – Solution: select tradeoffs based on
      > User requirements (of course)
      > Usage patterns
Patterns exist and can be exploited. Examples:
  – Zipf distribution for request popularity (web caching), Breslau et al., Infocom’99
  – Network topology:
    [Figure panels: Partial Topology; Random 30% die; Targeted 4% die. From Saroiu et al., MMCN 2002]

Page 12: CIS 6930.008:  Internet-Scale  Networked Systems

The DØ Experiment

High-energy physics data grid
72 institutions, 18 countries, 500+ physicists
Detector Data
  – 1,000,000 channels
  – Event rate ~50 Hz
  – So far, 1.9 PB of data
Data Processing
  – Signals: physics events
  – Events about 250 KB, stored in files of ~1 GB
  – Every bit of raw data is accessed for processing/filtering
  – Past year overall: 0.6 PB
DØ:
  – … processes PBs/year
  – … processes 10s of TB/day
  – … uses 25% – 50% remote computing

Page 13: CIS 6930.008:  Internet-Scale  Networked Systems

Filecules and Small Worlds in Scientific Communities: Characteristics and Significance

Joint work with Matei Ripeanu (UBC) and Ian Foster (ANL and UChicago)

Page 14: CIS 6930.008:  Internet-Scale  Networked Systems

New metric: The Data-Sharing Graph G_m^T(V, E):
  – V is the set of users active during interval T
  – An edge in E connects users that asked for at least m common files within T

[Figure labels: “No 24 in B minor, BWV 869”, “Les Bonbons”, “Yellow Submarine”, “Wood Is a Pleasant Thing to Think About”]
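
The definition of the data-sharing graph can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the trace format (a list of `(timestamp, user, file)` tuples), the half-open interval `[t_start, t_end)`, and the function name are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def data_sharing_graph(requests, t_start, t_end, m):
    """Build the data-sharing graph for one interval T = [t_start, t_end).

    requests: iterable of (timestamp, user, file) tuples (hypothetical format).
    Returns (V, E): V is the set of users active in the interval, E the set of
    user pairs that requested at least m common files in the interval.
    """
    files_by_user = defaultdict(set)
    for ts, user, f in requests:
        if t_start <= ts < t_end:
            files_by_user[user].add(f)

    nodes = set(files_by_user)
    edges = set()
    # Compare every pair of active users and count the files they share.
    for u, v in combinations(sorted(nodes), 2):
        if len(files_by_user[u] & files_by_user[v]) >= m:
            edges.add((u, v))
    return nodes, edges
```

Comparing all pairs of users is quadratic in the number of active users, which is fine for a few hundred users per interval; a larger trace would likely call for an inverted index from files to users.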

Page 15: CIS 6930.008:  Internet-Scale  Networked Systems

The DØ Collaboration

Small average path length
Large clustering coefficient
Small World!

[Figure: Clustering coefficient, 7 days, 50 files. y-axis 0 to 1; x-axis dates from 12/15/01 to 07/23/02; series: Random, D0]
[Figure: Average path length, 7 days, 50 files. y-axis 0 to 4; x-axis dates from 12/15/01 to 07/23/02; series: Random, D0]

CCoef = (# existing edges) / (# possible edges)

6 months of traces (January – June 2002)
300+ users, 2 million requests for 200K files
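
Both metrics can be computed directly from an adjacency map. The sketch below assumes the usual Watts-Strogatz reading of the CCoef ratio (for each node, edges that exist among its neighbours over the edges that could exist, averaged over all nodes); the slide gives only the ratio itself. Treating nodes with fewer than two neighbours as 0 and averaging path length over connected pairs only are further assumptions, and the function names are hypothetical.

```python
from collections import deque


def clustering_coefficient(adj):
    """adj: dict mapping node -> set of neighbours (undirected, no self-loops)."""
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # contributes 0 (a convention assumed here)
        # Edges that actually exist among this node's neighbours.
        existing = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        possible = k * (k - 1) / 2
        total += existing / possible
    return total / len(adj) if adj else 0.0


def average_path_length(adj):
    """Mean shortest-path length over ordered pairs of connected nodes (BFS)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

A small-world claim then amounts to comparing these two numbers against those of a random graph with the same number of nodes and edges.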

Page 16: CIS 6930.008:  Internet-Scale  Networked Systems

Small-World Graphs

Small path length, large clustering coefficient
  – Typically compared against random graphs
Think of:
  – “It’s a small world!”
  – “Six degrees of separation”
      > Milgram’s experiments in the 60s
      > Guare’s play “Six Degrees of Separation”

Page 17: CIS 6930.008:  Internet-Scale  Networked Systems

Other Small Worlds

[Figure: average path length ratio (log scale, 0.1 to 10) vs. clustering coefficient ratio (log scale, 1 to 10000). Points: word co-occurrences, film actors, LANL coauthors, Internet, Web, food web, power grid]

D. J. Watts and S. H. Strogatz, Collective dynamics of ‘small-world’ networks. Nature, 393:440-442, 1998.
R. Albert and A.-L. Barabási, Statistical mechanics of complex networks. Reviews of Modern Physics 74, 47 (2002).

Page 18: CIS 6930.008:  Internet-Scale  Networked Systems

Web Data-Sharing Graphs

[Figure: average path length ratio (log scale, 0.1 to 10) vs. clustering coefficient ratio (log scale, 1 to 10000). Series: Web data-sharing graph, other small-world graphs. Labeled points: 7200 s, 50 files; 3600 s, 50 files; 1800 s, 100 files; 1800 s, 10 files; 300 s, 1 file]

Data-Sharing Relationships in the Web, Iamnitchi, Ripeanu, and Foster, WWW’03

Page 19: CIS 6930.008:  Internet-Scale  Networked Systems

DØ Data-Sharing Graphs

[Figure: average path length ratio (log scale, 0.1 to 10) vs. clustering coefficient ratio (log scale, 1 to 10000). Series: Web data-sharing graph, D0 data-sharing graph, other small-world graphs. Labeled points: 7 days, 1 file; 28 days, 1 file]

Page 20: CIS 6930.008:  Internet-Scale  Networked Systems

KaZaA Data-Sharing Graphs

[Figure: average path length ratio (log scale, 0.1 to 10) vs. clustering coefficient ratio (log scale, 1 to 10000). Series: Web data-sharing graph, D0 data-sharing graph, other small-world graphs, Kazaa data-sharing graph. Labeled points: 7 days, 1 file; 28 days, 1 file; 2 hours, 1 file; 1 day, 2 files; 4 h, 2 files; 12 h, 4 files]

Small-World File-Sharing Communities, Iamnitchi, Ripeanu, and Foster, Infocom ’04

Page 21: CIS 6930.008:  Internet-Scale  Networked Systems

Interest-Aware Data Dissemination

[Figures: three panels (D0, Web, Kazaa), y-axis 0 to 100. Series in each panel: total hit rate; except largest cluster. Interval lengths on the x-axes: 3, 7, 10, 14, 21, 28 days; 1, 4, 8 hours; 2, 5, 15, 30 min]

Interest-Aware Information Dissemination in Small-World Communities, Iamnitchi and Foster, HPDC’05

Page 22: CIS 6930.008:  Internet-Scale  Networked Systems

Current Work: Tagging Communities

Tracking User Attention in Collaborative Tagging Communities, Elizeu Santos-Neto, Matei Ripeanu, and Adriana Iamnitchi, Workshop on Contextualized Attention Metadata (CAMA'07), Vancouver, Canada, June 2007.

Page 23: CIS 6930.008:  Internet-Scale  Networked Systems

DØ Workload Characterization

Joint work with

Shyamala Doraimani (USF) and Gabriele Garzoglio (FNAL)

Page 24: CIS 6930.008:  Internet-Scale  Networked Systems

DØ Traces

Traces from January 2003 to May 2005
234,000 jobs, 561 users, 34 domains, 1.13 million files accessed
108 input files per job on average
Detailed data access information about half of these jobs (113,062)

Page 25: CIS 6930.008:  Internet-Scale  Networked Systems

Contradicts Traditional Models

File size distribution
Expected: log-normal. Why not?
  – Deployment decisions
  – Domain specific
  – Data transformation

File popularity distribution
Expected: Zipf. Why not? (speculations):
  – Scientific data is uniformly interesting
  – User community is relatively small

Page 26: CIS 6930.008:  Internet-Scale  Networked Systems


Filecules: Intuition

Page 27: CIS 6930.008:  Internet-Scale  Networked Systems


Filecules: General Characteristics

Filecules in High-Energy Physics: Characteristics and Impact on Resource Management, Adriana Iamnitchi, Shyamala Doraimani, Gabriele Garzoglio, HPDC’06

Page 28: CIS 6930.008:  Internet-Scale  Networked Systems

Filecules: Size

Filecules of different sizes:
  – Largest filecule: 17 TB or 51,841 files
  – 28% mono-file filecules

Page 29: CIS 6930.008:  Internet-Scale  Networked Systems

Consequences for Caching

Use filecule membership for prefetching
  – When a file is missing from the local cache, prefetch the entire filecule
Use time locality in cache replacement
  – Least Recently Used (classic algorithm)
Implemented:
  – LRU with files and LRU with filecules
  – Greedy Request Value: prefetching + job reordering
      > Does not exploit temporal locality
      > Prefetching based on cache content
  – Our variant of LRU with filecules and job reordering

E. Otoo, et al. Optimal file-bundle caching algorithms for data-grids. In SC ’04
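
The combination of LRU replacement with filecule prefetching might look roughly like the sketch below. It is an illustration under stated assumptions, not the implementation evaluated in the study: capacity is counted in files rather than bytes, the file-to-filecule mapping is taken as given, job reordering is left out, and the class and attribute names are hypothetical.

```python
from collections import OrderedDict


class FileculeLRUCache:
    """LRU cache that prefetches the whole filecule of a missing file."""

    def __init__(self, capacity, filecule_of, members):
        self.capacity = capacity        # max number of cached files (sizes ignored, an assumption)
        self.filecule_of = filecule_of  # file -> filecule id
        self.members = members          # filecule id -> list of files
        self.cache = OrderedDict()      # cached files, least recently used first
        self.hits = 0
        self.misses = 0

    def _insert(self, f):
        # Add (or refresh) a file, then evict least recently used files if over capacity.
        self.cache[f] = True
        self.cache.move_to_end(f)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def request(self, f):
        """Return True on a cache hit, False on a miss."""
        if f in self.cache:
            self.hits += 1
            self.cache.move_to_end(f)
            return True
        self.misses += 1
        # Miss: bring in the other members of the filecule first, then the
        # requested file, so the requested file ends up most recently used.
        for g in self.members[self.filecule_of[f]]:
            if g != f:
                self._insert(g)
        self._insert(f)
        return False
```

Replaying a trace through `request` and reading `hits` and `misses` gives a hit rate that can be compared with a plain per-file LRU, for example by running the same class with every file placed in its own one-file filecule.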

Page 30: CIS 6930.008:  Internet-Scale  Networked Systems


Comparison: Caching Algorithms (1)

Page 31: CIS 6930.008:  Internet-Scale  Networked Systems


Comparison: Caching Algorithms (2)

% of cache change is a measure of transfer costs.

Page 32: CIS 6930.008:  Internet-Scale  Networked Systems

Summary Part 1

Revisited traditional workload models
  – Generalized from file systems, the web, etc.
  – Some confirmed (temporal locality), some contradicted (file size distribution and popularity)
Compared caching algorithms on D0 data:
  – Temporal locality is relevant
  – Filecules guide prefetching

Page 33: CIS 6930.008:  Internet-Scale  Networked Systems

Summary

Workload characterization based on a HEP grid
  – Quantify scale (data processed, number of files)
  – Contradict traditional models
Patterns can guide system design
  – Filecules: caching, data replication
  – Small-world data sharing: adaptive information dissemination, replica placement

Page 34: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Paper Reviewing (1)

Goals:
  – Think about what you read
  – Get used to writing paper reviews
Reviews due by noon before class
Be professional in your writing
Keep an eye on the writing style:
  – Clarity
  – Beware of traps: learn to use them in writing and detect them in reading
  – Detect (and stay away from) trivial claims. E.g., 1st sentence in the Introduction: “The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”

Page 35: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Paper Reviewing (2)

Follow the form provided when relevant.
State the main contribution of the paper.
Critique the main contribution:
  – Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
  – Rate how convincing the methodology is. Do the claims and conclusions follow from the experiments? Are the assumptions realistic? Are the experiments well designed? Are there different experiments that would be more convincing? Are there other alternatives the authors should have considered? (And, of course, is the paper free of methodological errors?)

Page 36: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Paper Reviewing (3)

What is the most important limitation of the approach?
What are the three strongest and/or most interesting ideas in the paper?
What are the three most striking weaknesses in the paper?
Name three questions that you would like to ask the authors.
Detail an interesting extension to the work not mentioned in the future work section.
Optional comments on the paper that you’d like to see discussed in class.

Page 37: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Discussion Leading

Come prepared!
  – Prepare discussion outline
  – Prepare questions:
      > “What if”s
      > Unclear aspects of the solution proposed
      > …
  – Similar ideas in different contexts
  – Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews
Main goals:
  – Keep discussion flowing
  – Keep discussion relevant
  – Engage everybody (I’ll have an eye on this, too)

Page 38: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Projects

Combine with your research if relevant to the class
Get approval from all instructors if you overlap final projects:
  – Don’t sell the same piece of work twice
  – You can get more than twice as many results with less than twice as much work
Aim high!
  – Put in one extra month and get a publication out of it
  – It is doable (we have proofs)
Try ideas that you postponed out of fear: it’s just a class, not your PhD.

Page 39: CIS 6930.008:  Internet-Scale  Networked Systems

Administrivia: Project Deadlines (tentative)

January 30: 1-page project proposal
Feb. 26: 3-page literature survey
  – Know relevant work in your problem area
  – If implementation project, list tools, similar projects
March 31: 5-page midterm project due
  – Have a clear image of what’s possible/doable
  – Report preliminary results
Last class: in-class project presentation
  – Demo, if appropriate
May 1:
  – Final report due

Page 40: CIS 6930.008:  Internet-Scale  Networked Systems

Next Classes

Lectures on basics of distributed systems
Will start reading papers in about 2 weeks

Page 41: CIS 6930.008:  Internet-Scale  Networked Systems


Questions?