Large-Scale Network Dynamics: A New Frontier

Page 1: Large-Scale Network Dynamics: A New Frontier

Large-Scale Network Dynamics: A New Frontier

Jie Wang
Dept. of Computer Science
University of Massachusetts Lowell

Presented at:
• Dept. of Computer Science, Boston University, Nov. 6, 2009
• Dept. of Computer Science, University of Texas at Dallas, Oct. 30, 2009
• Dept. of Electrical and Computer Engineering, Michigan State Univ., Sept. 24, 2009

Page 2: Large-Scale Network Dynamics: A New Frontier

“The earth to be spann’d, connected by network,
The races, neighbors, to marry and be given in marriage,
The oceans to be cross’d, the distant brought near,
The lands to be welded together”
Walt Whitman (1819 - 1892), Passage to India

“The network is the computer”
John Gage (1942 - ), Sun Microsystems

“The network is the information and the storage”
Weibo Gong, UMass Amherst

Page 3: Large-Scale Network Dynamics: A New Frontier

Small-World Phenomenon

Two persons are linked if they are coauthors of an article. The Erdős number is the collaboration distance with mathematician Paul Erdős.

Six degrees of separation

What is your Erdős number?

Erdős number  0  ---      1 person
Erdős number  1  ---    504 people
Erdős number  2  ---   6593 people
Erdős number  3  ---  33605 people
Erdős number  4  ---  83642 people
Erdős number  5  ---  87760 people
Erdős number  6  ---  40014 people
Erdős number  7  ---  11591 people
Erdős number  8  ---   3146 people
Erdős number  9  ---    819 people
Erdős number 10  ---    244 people
Erdős number 11  ---     68 people
Erdős number 12  ---     23 people
Erdős number 13  ---      5 people

The median Erdős number is 5; the mean is 4.65, and the standard deviation is 1.21.
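The summary statistics can be checked directly from the listed counts. A minimal Python sketch, assuming the counts above are the complete distribution (variable and function names are illustrative):

```python
import math

# Number of people with each Erdős number, copied from the slide.
counts = {0: 1, 1: 504, 2: 6593, 3: 33605, 4: 83642, 5: 87760, 6: 40014,
          7: 11591, 8: 3146, 9: 819, 10: 244, 11: 68, 12: 23, 13: 5}

total = sum(counts.values())
mean = sum(k * n for k, n in counts.items()) / total
variance = sum(n * (k - mean) ** 2 for k, n in counts.items()) / total
std_dev = math.sqrt(variance)


def weighted_median(distribution):
    """Smallest value whose cumulative count reaches half of the total."""
    half = sum(distribution.values()) / 2
    running = 0
    for value in sorted(distribution):
        running += distribution[value]
        if running >= half:
            return value


median = weighted_median(counts)
print(f"total={total} median={median} mean={mean:.2f} std={std_dev:.2f}")
```

Under that assumption the computed median, mean, and standard deviation agree with the values quoted on the slide to two decimal places.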

Page 4: Large-Scale Network Dynamics: A New Frontier

The Watts-Strogatz Model: Between Order and Randomness

Small-World Networks

- Short mean path length (short characteristic path length)
- Large clustering coefficient
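Both properties can be measured on a small synthetic graph. The sketch below uses the standard Watts-Strogatz construction (a ring lattice whose edges are rewired with probability p); the function names, graph sizes, and seed are illustrative choices, not taken from the talk.

```python
import random
from collections import deque


def watts_strogatz(n, k, p, seed=0):
    """Ring lattice of n nodes, each linked to its k nearest neighbours,
    with every lattice edge rewired to a random endpoint with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for j in range(1, k // 2 + 1):
        for i in range(n):
            old = (i + j) % n
            if rng.random() < p and old in adj[i]:
                candidates = [v for v in range(n) if v != i and v not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(old)
                    adj[old].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering_coefficient(adj):
    """Average, over nodes, of the fraction of neighbour pairs that are linked."""
    total = 0.0
    for nbrs in adj.values():
        d = len(nbrs)
        if d < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        total += 2 * links / (d * (d - 1))
    return total / len(adj)


def mean_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (BFS)."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


for p in (0.0, 0.1, 1.0):
    g = watts_strogatz(200, 6, p)
    print(p, round(clustering_coefficient(g), 3), round(mean_path_length(g), 3))
```

With p = 0 the graph is the ordered lattice (high clustering, long paths); with p = 1 it is close to a random graph (short paths, low clustering). Small-world behaviour is expected at small intermediate p.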

Page 5: Large-Scale Network Dynamics: A New Frontier

What Are Big-World Networks?

Acquaintance Networks over Generations
From “Mathematics Genealogy Project”

Gottfried Leibniz (1646-1716)
Jacob Bernoulli (1654-1705)
Johann Bernoulli (1667-1748)
Leonhard Euler (1707-1783)
Joseph Lagrange (1736-1813)
Simeon Poisson (1781-1840)
Michel Chasles (1793-1880)
H. A. Newton (1830-1896)
E. H. Moore (1862-1932)
Oswald Veblen (1880-1960)
Alonzo Church (1903-1995)
John B. Rosser (1907-1989)
Gerald Sacks (1933 -)
343 academic descendants
Stephen Homer
Jie Wang

Page 6: Large-Scale Network Dynamics: A New Frontier

Scale-Free Phenomenon

Power law distribution: f(x) ~ x^{-α}

Log-log scale: log f(x) ~ -α log x

• Scale-free networks are small-world
• Small-world networks may not be scale-free
• Subnets of scale-free networks may not be scale-free
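The log-log relation suggests a simple way to estimate α: fit a straight line to log f(x) against log x and negate the slope. A minimal sketch on synthetic, noise-free data (the function name and the test values 3.0 and 2.5 are illustrative assumptions). On real degree data this least-squares fit is known to be biased, and maximum-likelihood estimators are usually preferred.

```python
import math


def fit_power_law_exponent(xs, ys):
    """Least-squares slope of log y against log x; returns alpha = -slope."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    return -sxy / sxx


# Synthetic example: f(x) = 3 * x^(-2.5) sampled at x = 1..100.
xs = list(range(1, 101))
ys = [3.0 * x ** -2.5 for x in xs]
alpha = fit_power_law_exponent(xs, ys)
print(round(alpha, 3))
```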

Page 7: Large-Scale Network Dynamics: A New Frontier

Brain Networks

“A mental state M is nothing other than brain state B. The mental state ‘desire for a cup of coffee’ would thus be nothing more than the ‘firing of certain neurons in certain brain regions.’” -- E. G. Boring (1886-1968)

Page 8: Large-Scale Network Dynamics: A New Frontier

Are Brain Networks Small-World?

Brain networks are highly dynamic

Can process 100 trillion instructions per second

Some believe brain networks are small-world

Mathematical challenge: Work out a mathematical model consistent with brain functionalities

There are 100 billion (10^11) neurons in the human brain, and 100 trillion (10^14) connections (synapses)

Page 9: Large-Scale Network Dynamics: A New Frontier

Connecting the Dots

Networks are connected dots

“You can't connect the dots looking forward; you can only connect them looking backwards.”

Steven Jobs (1955 -)

Page 10: Large-Scale Network Dynamics: A New Frontier

Infectious Disease Spreading

How Were Dots Connected?

• Sept 05 – Sept 12, 2009
• Sept 12 – Sept 19, 2009
• Sept 19 – Sept 26, 2009
• Sept 26 – Oct 03, 2009
• Oct 03 – Oct 10, 2009
• Oct 10 – Oct 17, 2009

Page 11: Large-Scale Network Dynamics: A New Frontier

How Will the Dots Be Connected?

Dynamic connections are neither deterministic nor random, but they have patterns and trends.

Statistical analysis is like connecting the dots backward, while predicting disease spread is like connecting the dots forward …

Page 12: Large-Scale Network Dynamics: A New Frontier

A Simple Relational Model: The SIR Dynamics

States: Susceptible → Infectious → Recovered

Structure-biased k-acquaintance model

• Homophily: the tendency to associate with people like yourself
• Symmetry: undirected links
• Triad closure: the tendency of one’s acquaintances to also be acquainted with each other

[Figure: an 8-acquaintance node under SIR; node states Susceptible, Infectious, Recovered]
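To make the SIR dynamics concrete, here is a minimal discrete-time sketch on an undirected contact graph. It is not the structure-biased k-acquaintance model itself: the per-contact infection probability `beta`, the per-step recovery probability `gamma`, and all function names are assumptions introduced for illustration.

```python
import random


def sir_step(adj, state, beta, gamma, rng):
    """Advance one time step: infectious nodes may infect susceptible
    neighbours (probability beta each) and may recover (probability gamma)."""
    new_state = dict(state)
    for node, status in state.items():
        if status != "I":
            continue
        for neighbour in adj[node]:
            if state[neighbour] == "S" and rng.random() < beta:
                new_state[neighbour] = "I"
        if rng.random() < gamma:
            new_state[node] = "R"
    return new_state


def simulate_sir(adj, seeds, beta, gamma, max_steps=1000, seed=0):
    """Run until no infectious node remains; returns the list of states."""
    rng = random.Random(seed)
    state = {node: "S" for node in adj}
    for node in seeds:
        state[node] = "I"
    history = [state]
    for _ in range(max_steps):
        if "I" not in state.values():
            break
        state = sir_step(adj, state, beta, gamma, rng)
        history.append(state)
    return history


# A 4-node path graph 0 - 1 - 2 - 3 with node 0 initially infectious.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
history = simulate_sir(path, seeds=[0], beta=1.0, gamma=1.0)
print(history[-1])
```

With `beta = gamma = 1.0` the run is deterministic: the infection moves one hop per step and every node ends up recovered. Smaller values give stochastic outcomes that depend on the seed.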

Page 13: Large-Scale Network Dynamics: A New Frontier

Structure-Biased Spread

Page 14: Large-Scale Network Dynamics: A New Frontier

A Mathematical Model of Spread Prediction

Page 15: Large-Scale Network Dynamics: A New Frontier

Mathematical Epidemiology

• Most mathematical methods study differential equations based on simplified assumptions of uniform mixing or ad hoc contact processes
• Example:
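A standard instance of a uniform-mixing differential-equation model, not necessarily the one shown on this slide, is the Kermack-McKendrick SIR system: dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI. The sketch below integrates it with a forward Euler step; the parameter values and the function name are illustrative assumptions.

```python
def sir_euler(n, i0, beta, gamma, dt, steps):
    """Forward-Euler integration of the uniform-mixing SIR equations.

    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
    """
    s, i, r = float(n - i0), float(i0), 0.0
    trajectory = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s = s - new_infections
        i = i + new_infections - new_recoveries
        r = r + new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Illustrative parameters: 1000 people, one initial case, beta/gamma = 5.
traj = sir_euler(n=1000, i0=1, beta=0.5, gamma=0.1, dt=0.1, steps=2000)
s_end, i_end, r_end = traj[-1]
peak_infectious = max(i for _, i, _ in traj)
print(round(s_end, 1), round(i_end, 1), round(r_end, 1), round(peak_infectious, 1))
```

Because every individual is assumed to meet every other at the same rate, the model has no notion of who is connected to whom, which is the limitation the network-based slides address.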

Page 16: Large-Scale Network Dynamics: A New Frontier

Percolation and Outbreak

• Large-scale graphs based on scale-free and small-world models are common platforms to study epidemics
• Individuals (sites) are connected by social contacts (bonds)
• Each site is susceptible with probability p and each bond is open with probability q, indicating infectiousness
• A percolation threshold exists for phase transition of disease spread
    – When both p and q are high, a cluster of infectious sites connected by open bonds will permeate the entire population, resulting in an outbreak
    – Otherwise, infectious clusters will be small and isolated
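The threshold behaviour can be explored with a small site-bond percolation sketch. It assumes a square grid with nearest-neighbour bonds and reports the size of the largest cluster of susceptible sites joined by open bonds; the function name, grid layout, and seed are illustrative assumptions rather than the talk's own model.

```python
import random
from collections import deque


def largest_cluster(n, p, q, seed=0):
    """Largest cluster on an n x n grid where each site is susceptible with
    probability p and each nearest-neighbour bond is open with probability q."""
    rng = random.Random(seed)
    site = [[rng.random() < p for _ in range(n)] for _ in range(n)]
    # right[r][c] is the bond (r, c)-(r, c+1); down[r][c] is (r, c)-(r+1, c).
    right = [[rng.random() < q for _ in range(n)] for _ in range(n)]
    down = [[rng.random() < q for _ in range(n)] for _ in range(n)]
    seen = [[False] * n for _ in range(n)]
    best = 0
    for r0 in range(n):
        for c0 in range(n):
            if not site[r0][c0] or seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            size = 0
            while queue:
                r, c = queue.popleft()
                size += 1
                steps = []
                if c + 1 < n and right[r][c]:
                    steps.append((r, c + 1))
                if c > 0 and right[r][c - 1]:
                    steps.append((r, c - 1))
                if r + 1 < n and down[r][c]:
                    steps.append((r + 1, c))
                if r > 0 and down[r - 1][c]:
                    steps.append((r - 1, c))
                for rr, cc in steps:
                    if site[rr][cc] and not seen[rr][cc]:
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            best = max(best, size)
    return best


for q in (0.2, 0.51, 0.578):
    fraction = largest_cluster(65, 1.0, q) / (65 * 65)
    print(q, round(fraction, 3))
```

Sweeping q (or p) and plotting the largest-cluster fraction is one way to see the phase transition: below the threshold the fraction stays near zero, above it a single cluster covers much of the grid.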

Page 17: Large-Scale Network Dynamics: A New Frontier

Percolation Threshold Demo

65 x 65 grid

Panels: q = 0.2, q = 0.51, q = 0.578

Page 18: Large-Scale Network Dynamics: A New Frontier

Modeling Challenges

• Population and demographics
    – urban, suburban, rural, mobility
    – income, age, gender, education, religion, culture, ethnic background, household size
• Social contact pattern
    – household, work, study, shopping, entertainment, travel, medical activities, …
    – dense and frequent local contacts; sparse and occasional long-distance contacts
• Infection process
    – disease characteristics: infectious speed & recovery levels
    – people's general health level and vaccination history
    – frequency and duration of contacts

B. Liu and J. Wang et al.

It seems difficult to address these challenges using mathematical methods alone

Page 19: Large-Scale Network Dynamics: A New Frontier

Computational Methods

• Simulations with contingent parameters
    – Modeling disease outbreaks in realistic urban social networks (S. Eubank et al., Nature, 2004)
    – Understanding the spreading patterns of mobile phone viruses (P. Wang et al., Science, 2009)

All susceptible Bluetooth (BT) phones within range of an infected BT phone will be infected. An MMS virus can infect all susceptible phones whose numbers are in the phonebook of an infected phone.
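The two infection rules differ only in how the set of reachable phones is defined: physical proximity for Bluetooth, phonebook entries for MMS. A minimal sketch of one spreading step under each rule follows; the data layout, the radius, and the function names are hypothetical, and the cited study's actual simulation is more detailed than this.

```python
import math


def bluetooth_targets(infected, positions, radius, susceptible):
    """Susceptible phones within `radius` of at least one infected phone."""
    hit = set()
    for source in infected:
        sx, sy = positions[source]
        for phone in susceptible:
            px, py = positions[phone]
            if math.hypot(sx - px, sy - py) <= radius:
                hit.add(phone)
    return hit


def mms_targets(infected, phonebook, susceptible):
    """Susceptible phones listed in the phonebook of an infected phone."""
    hit = set()
    for source in infected:
        hit.update(n for n in phonebook.get(source, ()) if n in susceptible)
    return hit


# Toy data: phone "b" is nearby, phone "c" is far away but in a's phonebook.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (50.0, 50.0)}
phonebook = {"a": ["c"]}
infected = {"a"}
susceptible = {"b", "c"}

print(bluetooth_targets(infected, positions, 10.0, susceptible))
print(mms_targets(infected, phonebook, susceptible))
```

The toy data shows the qualitative difference: the Bluetooth rule follows spatial contact, while the MMS rule jumps along the social graph regardless of distance.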

Page 20: Large-Scale Network Dynamics: A New Frontier

Mobile Networks and OSes

Location, mobility, and communication pattern dynamics

Page 21: Large-Scale Network Dynamics: A New Frontier

Page 22: Large-Scale Network Dynamics: A New Frontier

Online Social Networks (OSNs)

• Topological dynamics
    – temporal attribute of node and edge arrivals and departures
    – explain why the mean degree and characteristic path length tend to be stable over time, while density and scale do not
• Communication dynamics
    – friendships vs. activities
• Mobility dynamics
    – GPS-enabled smartphones
    – location-based applications

G. Chen, B. Liu, J. Wang et al.

Page 23: Large-Scale Network Dynamics: A New Frontier

The Rise of OSNs

• 1997: SixDegrees allowed users to create profiles, list friends, and surf friend lists
• 1997-2001: a number of community tools supported profiles and friend lists: AsianAvenue, BlackPlanet, MiGente, LiveJournal
• 2001 - present: business and professional social networks emerged: Ryze, LinkedIn
• 2003: MySpace attracted teens and bands, among others, and grew into the largest OSN
• 2004: Facebook was designed for college networking (Harvard), then expanded to other colleges, high schools, and other individuals

Page 24: Large-Scale Network Dynamics: A New Frontier

Common OSNs

Page 25: Large-Scale Network Dynamics: A New Frontier

OSNs Go Mobile

• Location aware
    – GPS-enabled phones, sharing current location, availability, attaching location to user-generated content
• Outlook
    – anticipated $3.3 billion revenue by 2013
• Dodgeball, Loopt, Brightkite, Whrrl, Google Latitude, Foursquare

Page 26: Large-Scale Network Dynamics: A New Frontier

PageRank for Measuring Page Popularity

Biased Random Walks

Just walk at random?
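PageRank can be read as the long-run visit frequency of a walker who usually follows a random outgoing link but occasionally jumps to a random page. A minimal power-iteration sketch follows; the damping value 0.85 is the conventional choice, and the function name and the four-page link graph are illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank.

    `links` maps every page to the list of pages it links to; every page
    must appear as a key, even if it has no outgoing links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Teleportation: with probability 1 - damping, jump to a random page.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links[page]
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank uniformly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy graph page C collects links from three pages and ranks highest, while D, which nobody links to, keeps only the teleportation share.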

Page 27: Large-Scale Network Dynamics: A New Frontier

Association Rank for Friendship Prediction

G. Chen and J. Wang et al.

Page 28: Large-Scale Network Dynamics: A New Frontier

• Founded as a startup in 2005 in Denver, CO; opened to the public in 2008
• User activities
    – Check in, status update, photo upload
    – All attached with current location
    – Updates through SMS, Email, Web, iPhone …
• Social graph with mutual connection
    – See your friends’ or local activity streams

Page 29: Large-Scale Network Dynamics: A New Frontier

Data Trace

• Brightkite Web APIs

• 12/9/08-1/9/09: 18,951 active users

• Back traced to 3/21/08: 1,505,874 updates

• Profile: age, gender, tags, friends list

• Social graph: 41,014 nodes and 46,172 links

• Testing data: next 45 days had 5,098 new links added

G. Chen and N. Li
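The figures above give a sense of how sparse the social graph is and how rare new friendships are among all possible pairs. A quick worked calculation, assuming an undirected simple graph and, as a simplification not stated on the slide, that all new links join nodes already in the graph:

```python
nodes = 41_014      # nodes in the social graph
links = 46_172      # undirected links in the social graph
new_links = 5_098   # links added during the 45-day test period

# Each undirected link contributes to the degree of two nodes.
mean_degree = 2 * links / nodes

# Fraction of all possible node pairs that are actually linked.
density = 2 * links / (nodes * (nodes - 1))

# Node pairs not yet linked, and the share of them that became friends.
unlinked_pairs = nodes * (nodes - 1) // 2 - links
positive_rate = new_links / unlinked_pairs

print(f"mean degree   = {mean_degree:.3f}")
print(f"density       = {density:.2e}")
print(f"positive rate = {positive_rate:.2e}")
```

Under these assumptions only a few pairs per million become friends in the test window, which is why a ranking-based evaluation such as an ROC curve is more informative than raw accuracy.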

Page 30: Large-Scale Network Dynamics: A New Frontier

Snapshots taken from 12/09/08 to 01/09/09

Page 31: Large-Scale Network Dynamics: A New Frontier

Three Attributes to Measure Community Rank

Tags

Social Distance

Location

Page 32: Large-Scale Network Dynamics: A New Frontier

Probability Measure

Page 33: Large-Scale Network Dynamics: A New Frontier

Tag Graph Metric

Page 34: Large-Scale Network Dynamics: A New Frontier

Social Distance

Page 35: Large-Scale Network Dynamics: A New Frontier

Location Metric

Page 36: Large-Scale Network Dynamics: A New Frontier

Community Rank Value

Indicating the likelihood of friendship

Page 37: Large-Scale Network Dynamics: A New Frontier

ROC Curve

Page 38: Large-Scale Network Dynamics: A New Frontier

MySpace

• Launched in Santa Monica, CA, in 2003
• Grew rapidly and attracted Friendster’s users, bands, …
• Teenagers began joining en masse in 2004
• Three distinct populations began to form:
    – musicians/artists
    – teenagers
    – post-college urban social crowd
• Purchased by News Corporation for $580M in 2005
• Arguably the largest online social network site

Page 39: Large-Scale Network Dynamics: A New Frontier

MySpace Profile and Activities

• Each profile: age, gender, location, last login time, etc.; identified by a unique ID
    – Some profiles claim neutral gender, e.g., bands
• Profiles can be set to private (default is public)
• What can users do?
    – search and add friends to their friend lists
    – post messages to friends’ blog spaces
• Only friends have access to a private profile’s friend list and blog space
• Other functions: IM/Call, Block/Rank User, Add to Group favorite

Page 40: Large-Scale Network Dynamics: A New Frontier

Measurement: SnailCrawler

• Generate random IDs uniformly between 1 and max (1,500,000,000)
• Many IDs are not occupied (invalid)
• Retrieve profile information from MySpace (HTTP)
    – name, ID, gender, age, location, public/private/custom
    – other information for public profiles: company, religion, marriage, children, smoke/drink, orientation, zodiac, education, ethnicity, occupation, hometown, body-type, mood, last login, …

W. Gauvin, B. Liu, X. Fu, J. Wang et al.
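The sampling strategy amounts to rejection sampling over the ID space: draw IDs uniformly, look each one up, and keep only those that turn out to be occupied. The sketch below illustrates the idea; it is not SnailCrawler's code. The `fetch_profile` callback is a hypothetical stand-in for the HTTP lookup, and `fake_fetch` simulates it locally so that no real service is contacted.

```python
import random

MAX_ID = 1_500_000_000  # upper bound on profile IDs quoted on the slide


def sample_profiles(fetch_profile, count, max_id=MAX_ID,
                    max_attempts=100_000, seed=None):
    """Draw IDs uniformly from 1..max_id until `count` occupied IDs are found.

    `fetch_profile(profile_id)` must return a profile, or None when the ID
    is not occupied. Returns the sampled profiles and the number of draws.
    """
    rng = random.Random(seed)
    profiles = {}
    attempts = 0
    while len(profiles) < count and attempts < max_attempts:
        attempts += 1
        profile_id = rng.randint(1, max_id)
        if profile_id in profiles:
            continue  # already sampled, draw again
        profile = fetch_profile(profile_id)
        if profile is not None:
            profiles[profile_id] = profile
    return profiles, attempts


def fake_fetch(profile_id):
    """Hypothetical lookup: pretend that only even IDs are occupied."""
    return {"id": profile_id} if profile_id % 2 == 0 else None


sampled, tries = sample_profiles(fake_fetch, count=5, max_id=100, seed=1)
print(sorted(sampled), tries)
```

Because every ID is equally likely to be drawn, the occupied IDs that survive form a uniform random sample of existing profiles, and the ratio of hits to draws estimates the fraction of the ID space that is occupied.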

Page 41: Large-Scale Network Dynamics: A New Frontier

Data Trace

• People 16 years old or younger are protected by law

• Teenagers and people in their twenties post the most blogs

• False ages appear at 98-100 years old

• Among teenagers 16-19, females publish more than males

• After age 20, there are no significant differences; males often publish more than females

• Scanned: 3,090,016
    – Blogs: 67,045

Page 42: Large-Scale Network Dynamics: A New Frontier

Blog publish time (on special days)

[Figure: blog publish counts over the year; x-axis ticks Feb, Sept, Dec; labeled peaks: Valentine’s Day, Christmas]

• females publish more than males, and males more than neutral profiles
• spikes on holidays, e.g., Valentine’s Day, Christmas

Page 43: Large-Scale Network Dynamics: A New Frontier

Blog publish time (month & week)

• females publish more than males
• more blogs posted May to Oct
• slightly more blogs posted during weekdays

[Figure: blog publish counts by month (Jan to Dec) and by day of week (Sun to Sat)]

Page 44: Large-Scale Network Dynamics: A New Frontier

Blog publish time (within a day)

• big jump at 1 pm
• people tend to publish from afternoon well into midnight
• peak around 10 pm, bottom around 5 am

Page 45: Large-Scale Network Dynamics: A New Frontier
