CS 124/LINGUIST 180 From Languages to Information · 2018. 10. 25. · CS 124/LINGUIST 180 From Languages to Information Dan Jurafsky Stanford University Social Networks: Small Worlds, Weak

CS 124/LINGUIST 180: From Languages to Information

Dan Jurafsky, Stanford University

Social Networks: Small Worlds, Weak Ties, and Power Laws

Slides from Jure Leskovec, Lada Adamic, James Moody, Bing Liu,

Networks
• Information in networks, not just text!
• PageRank: the structure of a network tells you something
• What are the properties of networks and what can we learn from them?

Social network analysis

• Social network analysis is the study of entities (people in an organization) and their interactions and relationships.
• The interactions and relationships can be represented with a network or graph:
  • each vertex (or node) represents an actor, and
  • each link represents a relationship. Links may be directed or not.

CS583, Bing Liu, UIC Nov 5, 2009

Various measures of centrality

• A central actor is involved in many ties.
• Degree centrality: the number of direct connections a node has
• Prestige centrality: everyone points to this actor
  • Number of in-links
  • PageRank is based on prestige
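Both measures above are simple counts. A minimal Python sketch, assuming the network is given as a hypothetical list of directed (source, target) pairs: degree centrality counts every tie a node takes part in, ignoring direction, while prestige counts only the links pointing in.

```python
from collections import Counter


def degree_centrality(edges):
    """Number of direct connections per node, ignoring edge direction."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


def prestige(edges):
    """Number of in-links per node: how many actors point to it."""
    return Counter(target for _, target in edges)


# Hypothetical network in which A, B and C all point to D.
edges = [("A", "D"), ("B", "D"), ("C", "D"), ("D", "A")]
print(degree_centrality(edges)["D"])  # 4 ties touch D
print(prestige(edges)["D"])           # 3 of them point to D
```

PageRank refines the prestige count by weighting each in-link by the importance of the node it comes from.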

Modified from Bing Liu

Betweenness Centrality
A node with high betweenness:
• lots of paths have to pass through it
• influences the network; a choke-point for information
• its failure is a problem

Betweenness of node 7 should be high.

[Figure 3.14: (a) A sample network (nodes 1-14); (b) Tightly-knit regions and their nested structure. In many networks, there are tightly-knit regions that are intuitively apparent, and they can even display a nested structure, with smaller regions nesting inside larger ones.]

The Notion of Betweenness. To motivate the design of a divisive method for graph partitioning, let’s think about some general principles that might lead us to remove the 7-8 edge first in Figure 3.14(a).

A first idea, motivated by the discussion earlier in this chapter, is that since bridges and local bridges often connect weakly interacting parts of the network, we should try removing these bridges and local bridges first. This is in fact an idea along the right lines; the problem is simply that it’s not strong enough, for two reasons. First, when there are several bridges, it doesn’t tell us which to remove first. As we see in Figure 3.14(a), where there are five bridges, certain bridges can produce more reasonable splits than others. Second, there can be graphs where no edge is even a local bridge, because every edge belongs to a triangle, and yet there is still a natural division into regions. Figure 3.15 shows a simple example, where we might want to identify nodes 1-5 and nodes 7-11 as tightly-knit regions, despite […]

Betweenness Centrality
• The betweenness of a node A (or an edge A-B) is

$$\text{betweenness}(A) = \frac{\text{number of shortest paths that go through } A \text{ (or A-B)}}{\text{total number of shortest paths that exist between all pairs of nodes}}$$

Betweenness of B? [Figure: a small network on nodes A, B, C, D]

More formally, count the shortest paths for each pair of nodes other than B:
• the pairs AC, AD, CD have 1, 2 and 1 shortest paths respectively, 4 in total
• exactly 1 of these 4 shortest paths passes through B
• so the betweenness centrality of B = 1/4
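A minimal Python sketch of the slide's definition for an undirected graph stored as an adjacency dict (an assumption). Shortest paths are counted with breadth-first search; a shortest s-t path passes through v exactly when d(s,v) + d(v,t) = d(s,t), and there are σ_sv · σ_vt such paths. The 4-cycle below is a hypothetical stand-in for the lost figure that happens to reproduce the 1 + 2 + 1 path counts. Note that this ratio of totals is the slide's definition; the more widely used (Freeman) betweenness instead sums σ_st(v)/σ_st pair by pair, which for B in this square gives 1/2.

```python
from collections import deque
from itertools import combinations


def bfs_counts(graph, source):
    """Hop distance and number of shortest paths from source to each reachable node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:              # first time w is reached
                dist[w], sigma[w] = dist[u] + 1, 0
                queue.append(w)
            if dist[w] == dist[u] + 1:     # u precedes w on a shortest path
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph, v):
    """Slide's ratio: shortest paths through v / all shortest paths,
    over unordered pairs {s, t} that do not include v."""
    info = {n: bfs_counts(graph, n) for n in graph}
    dist_v, sigma_v = info[v]
    through = total = 0
    for s, t in combinations([n for n in graph if n != v], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s:
            continue                       # s and t are disconnected
        total += sigma_s[t]
        if v in dist_s and dist_s[v] + dist_v[t] == dist_s[t]:
            through += sigma_s[v] * sigma_v[t]
    return through / total if total else 0.0


# Hypothetical 4-cycle A-C-D-B-A.
square = {"A": ["C", "B"], "C": ["A", "D"], "D": ["C", "B"], "B": ["A", "D"]}
print(betweenness(square, "B"))  # 1 of 4 shortest paths goes through B: 0.25
```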

An example network
• Network of which students have had sex with each other in a high school
  • important for studying disease spread, etc.
• What do you think its shape is?
  • For example: is it core-periphery (like the web)?

Fig. 1.—The network structure of four models of infection

High school dating

Peter S. Bearman, James Moody and Katherine Stovel. Chains of affection: The structure of adolescent romantic and sexual networks. American Journal of Sociology 110: 44-91 (2004). Image drawn by Mark Newman.

Slide from Drago Radev

Why does the graph have this shape?
• Teens probably don’t say:
  • “By selecting this partner, I maximize the probability of inducing a spanning tree.”
• The “microtaboo” Bearman and Moody propose:
  • don’t date your ex-girlfriend’s boyfriend’s ex-girlfriend
  • (or the reverse)
  • a simulation shows this constraint results in a spanning-tree structure

Fig. 8.—Hypothetical cycle of length 4

Status Dislocation and Closeness

Given the conditions of homophily described previously, figures 9 and 10 show that a simple rule, the prohibition against dating (from a female perspective) one’s old boyfriend’s current girlfriend’s old boyfriend, accounts for the structure of the romantic network at Jefferson. Why might this negative proscription operate in a medium-sized community of essentially homogenous adolescents?

The explanation we offer only makes sense for short cycles. From the perspective of males or females (and independent of the pattern of “rejection”), a relationship that completes a cycle of length 4 can be thought of as a “seconds partnership,” and therefore involves a public loss of status.³⁵ Most adolescents would probably stare blankly at the researcher who asked boys: Is there a prohibition in your school against being in a relationship with your old girlfriend’s current boyfriend’s old girlfriend? It is a mouthful, but it makes intuitive sense. Like adults, adolescents choose partners with purpose from the pool of eligible partners. But beyond preferences for some types of partners over others (for example, preferences for partners interested in athletics, who do not smoke, or who will skip school to have more fun), adolescents prefer partners who will not cause them to lose status in the eyes of their peers. In the same way that high-status students avoid relationships with low-status students, by selecting partners on the basis of the characteristics that have resonance […]

³⁵ The status-loss hypothesis competes with other potential micromechanisms, e.g., “jealousy” or the avoidance of too much “closeness,” a sentiment perhaps best described unscientifically as the “yuck factor.” The status-loss hypothesis involves significant scope limitations: namely, status loss is limited to contexts where actors by virtue of their relational density can watch each other relatively closely. By contrast, the “yuck factor,” which is essentially individualized, could operate in more diffuse contexts.



Small Worlds

Small worlds

Slide from Lada Adamic

Six Degrees of Kevin Bacon
• Popularization of a small-world idea: the Bacon number
  • Create a network of Hollywood actors
  • Connect two actors if they co-appeared in a movie
  • Bacon number: number of steps to Kevin Bacon
• As of 2013, the highest (finite) Bacon number reported is 11
• Only approx. 12% of all actors cannot be linked to Bacon

Slide adapted from Jure Leskovec
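Computing Bacon numbers is a breadth-first search from Kevin Bacon over the co-appearance network. A minimal Python sketch, assuming movie casts arrive as a hypothetical mapping from title to actor list; the titles and actor names are placeholders, not real data.

```python
from collections import deque
from itertools import combinations


def build_costar_graph(casts):
    """casts maps a movie title to its list of actors; co-stars become neighbours."""
    graph = {}
    for actors in casts.values():
        for actor in actors:
            graph.setdefault(actor, set())
        for a, b in combinations(actors, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bacon_numbers(graph, center="Kevin Bacon"):
    """BFS from center; actors absent from the result have no finite number."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        u = queue.popleft()
        for w in graph.get(u, ()):
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


# Hypothetical casts, for illustration only.
casts = {
    "Film 1": ["Kevin Bacon", "Actor A"],
    "Film 2": ["Actor A", "Actor B"],
    "Film 3": ["Actor C", "Actor D"],
}
numbers = bacon_numbers(build_costar_graph(casts))
print(numbers)  # {'Kevin Bacon': 0, 'Actor A': 1, 'Actor B': 2}
```

Actors C and D sit in a separate component, so, like the roughly 12% on the slide, they cannot be linked to Bacon.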

Erdős numbers are small too

• Chose 300 people in Omaha, NE and Wichita, KS
• Asked them to get a letter to a stockbroker in Boston by passing it through friends

• How many steps did it take?

The Small World Experiment


What is the typical shortest path between any two people?

Stanley Milgram (1967)

Milgram’s small world experiment

It took 6.2 steps on average: “Six degrees of separation”

Can we check this computationally?

[Figure (from Sociometry): Lengths of Completed Chains; x-axis: Number of Intermediaries]

[…] Boston business contacts, two distinguishable distributions emerged. The mean of the Sharon distribution is […] links, and that of the Boston distribution is […]. The difference is significant at a level better than […], as assessed by the distribution-free Mann-Whitney U test. Note that more powerful statistical tests of the significance of differences between means cannot be applied to these data, since those tests assume normality of underlying distributions. The shape of the true or theoretical distribution of lengths of acquaintance chains is precisely what we do not know. Qualitatively, what seems to occur is this. Chains which converge on the target principally by using geographic information reach his hometown or the surrounding areas readily, but once there often circulate before entering the target’s circle of acquaintances. There is no available information to narrow the field of potential contacts which an individual might have within the town. Such additional information as a list of local organizations […]


Facebook

• 99.6% of all pairs of users are connected by paths of 5 degrees (6 hops)
• 92% are connected by only four degrees (5 hops)

721 million users, 69 billion friendship links

Backstrom, Boldi, Rosa, Ugander and Vigna, 2012, “Four Degrees of Separation”
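The percentages above describe the distribution of shortest-path lengths over pairs of users. A minimal Python sketch that computes that distribution exactly with one BFS per node, assuming an undirected adjacency dict with integer node ids (so each unordered pair can be counted once); in the slide's convention, k degrees of separation means k + 1 hops. Exact all-pairs BFS only works for small graphs; at Facebook's scale the study relied on approximate counting (HyperANF).

```python
from collections import Counter, deque


def hop_distribution(graph):
    """Count unordered pairs of nodes by shortest-path length; unreachable pairs are skipped."""
    counts = Counter()
    for s in graph:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        for t, d in dist.items():
            if t > s:                      # count each unordered pair once
                counts[d] += 1
    return counts


def fraction_within(graph, max_hops):
    """Fraction of connected pairs whose distance is at most max_hops."""
    counts = hop_distribution(graph)
    return sum(c for d, c in counts.items() if d <= max_hops) / sum(counts.values())


# Hypothetical path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(hop_distribution(path))    # 3 pairs at 1 hop, 2 at 2 hops, 1 at 3 hops
print(fraction_within(path, 2))  # 5 of the 6 pairs are within 2 hops
```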

Fun facts: Origins of the “6 degrees” hypothesis
• Hungarian writer Karinthy’s 1929 short story “Chains” (Láncszemek)
  • https://djjr-courses.wdfiles.com/local--files/soc180%3Akarinthy-chain-links/Karinthy-Chain-Links_1929.pdf

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon

• Why do we see the small world pattern?

• What implications does it have for the dynamical properties of social systems?

Slide from James Moody

Duncan Watts: Networks, Dynamics and the Small-World Phenomenon

Watts says there are 4 conditions that make the small world phenomenon interesting:

1) The network is large: O(billions)
2) The network is sparse: people are connected to a small fraction of the total network
3) The network is decentralized: no single star (or small number of stars)
4) The network is highly clustered: most friendship circles are overlapping


Duncan Watts: Networks, Dynamics and the Small-World Phenomenon
Formally, we can characterize a graph through 2 statistics.

1) The characteristic path length, L: the average length of the shortest paths connecting any two nodes. (Note: this is not quite the same as the diameter of the graph, which is the maximum shortest-path length between any two nodes.)

2) The clustering coefficient, C: the average local density.

A small world graph is any graph with a relatively small L and a relatively large C.
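The difference between L and the diameter is easy to see in code. A minimal Python sketch, assuming an undirected graph stored as an adjacency dict; for an undirected graph, averaging over ordered pairs gives the same L as averaging over unordered pairs, and pairs with no path between them are simply skipped (an assumption, since the definition above presumes a connected graph).

```python
from collections import deque


def bfs_distances(graph, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def characteristic_path_length(graph):
    """L: mean shortest-path length over ordered pairs of distinct, reachable nodes."""
    lengths = [d for s in graph for t, d in bfs_distances(graph, s).items() if t != s]
    return sum(lengths) / len(lengths)


def diameter(graph):
    """Maximum shortest-path length over reachable pairs."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)


# Hypothetical path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(characteristic_path_length(path), diameter(path))  # L = 20/12, diameter = 3
```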


Local clustering coefficient (Watts & Strogatz 1998)

• For a vertex i, C_i = the fraction of pairs of neighbors of the node that are connected: “What percentage of your friends know each other?”
• Let n_i be the number of neighbors of vertex i

$$C_i = \frac{\text{number of connections between } i\text{'s neighbors}}{\text{maximum number of possible connections between } i\text{'s neighbors}}$$

$$C_i^{\text{directed}} = \frac{\#\,\text{directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$$

$$C_i^{\text{undirected}} = \frac{\#\,\text{undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$$

Local clustering coefficient (Watts & Strogatz 1998)

• Average C_i over all n vertices:

$$C = \frac{1}{n} \sum_i C_i$$

Example: n_i = 4, so the maximum number of connections is 4 × 3/2 = 6; 3 connections are present, so C_i = 3/6 = 0.5.

[Figure legend: link absent / link present]
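The undirected formula and the average can be sketched in a few lines of Python, assuming an adjacency dict of sets. Which three of the six neighbor pairs are linked in the slide's example is not recoverable, so the placement below (a-b, b-c, c-d) is a hypothetical choice; any three links give C_i = 0.5. Nodes with fewer than two neighbors are given C_i = 0, a common convention rather than part of the definition.

```python
from itertools import combinations


def local_clustering(graph, i):
    """C_i = links among i's neighbours / (n_i (n_i - 1) / 2), undirected graph."""
    nbrs = graph[i]
    n_i = len(nbrs)
    if n_i < 2:
        return 0.0                 # convention: C_i undefined, taken as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (n_i * (n_i - 1) / 2)


def average_clustering(graph):
    """C = (1/n) * sum of C_i over all n vertices."""
    return sum(local_clustering(graph, i) for i in graph) / len(graph)


def from_edges(edges):
    """Build an undirected adjacency dict of sets from (u, v) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


# i has 4 neighbours with 3 links among them (hypothetical placement).
g = from_edges([("i", "a"), ("i", "b"), ("i", "c"), ("i", "d"),
                ("a", "b"), ("b", "c"), ("c", "d")])
print(local_clustering(g, "i"))  # 3 / 6 = 0.5
print(average_clustering(g))
```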

Watts and Strogatz “Caveman network”


• Everyone in a cave knows each other

• A few people make connections

• Are C and L high or low?

• C high, L high

Watts and Strogatz model [WS98]
• Start with a ring, where every node is connected to the next z nodes (a regular lattice)
• With probability p, rewire every edge (or add a shortcut) to a uniformly chosen destination.

[Figure: from order (p = 0, regular lattice) through small world (0 < p < 1) to randomness (p = 1)]

Why does this work? The key is the fraction of shortcuts in the network. In a highly clustered, ordered network, a single random connection will create a shortcut that lowers L dramatically.

Small world properties can be created by a small number of shortcuts
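A minimal Python sketch of the model as described: build the ring lattice, then visit each lattice edge (i, i+j) and, with probability p, move its far endpoint to a uniformly chosen node. Skipping self-loops and duplicate edges is our own choice, not something the slide specifies. The sketch then measures C and L so you can watch L fall quickly while C stays high for small p; sizes, seed and p values are arbitrary.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, z):
    """Ring where node i is linked to the next z nodes i+1, ..., i+z (degree 2z)."""
    graph = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, z + 1):
            graph[i].add((i + j) % n)
            graph[(i + j) % n].add(i)
    return graph


def watts_strogatz(n, z, p, seed=0):
    """Rewire each lattice edge (i, i+j) with probability p to a new random endpoint."""
    rng = random.Random(seed)
    graph = ring_lattice(n, z)
    for i in range(n):
        for j in range(1, z + 1):
            v = (i + j) % n
            if v not in graph[i] or rng.random() >= p:
                continue
            candidates = [w for w in range(n) if w != i and w not in graph[i]]
            if candidates:             # avoid self-loops and duplicate edges
                w = rng.choice(candidates)
                graph[i].remove(v)
                graph[v].remove(i)
                graph[i].add(w)
                graph[w].add(i)
    return graph


def path_length(graph):
    """L: mean BFS distance over ordered pairs of distinct, reachable nodes."""
    total = pairs = 0
    for s in graph:
        dist, queue = {s: 0}, deque([s])
        while queue:
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def clustering(graph):
    """C: mean over nodes of (links among neighbours) / (possible links)."""
    scores = []
    for nbrs in graph.values():
        k = len(nbrs)
        links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
        scores.append(links / (k * (k - 1) / 2) if k > 1 else 0.0)
    return sum(scores) / len(scores)


for p in (0.0, 0.01, 0.1, 1.0):
    g = watts_strogatz(n=300, z=3, p=p)
    print(f"p={p}: C={clustering(g):.3f}, L={path_length(g):.2f}")
```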


Clustering and Path Length

Slide from Lada Adamic
• Regular graphs have a high clustering coefficient but also a high L
• Random graphs have a low clustering coefficient but a low L

Small World: Summary
• Could a network with high clustering be at the same time a small world?
  • Yes! You don’t need more than a few random links
• The Watts-Strogatz model:
  • Provides insight on the interplay between clustering and the small world
  • Captures the structure of many realistic networks
  • Accounts for the high clustering of real networks

Slide from Jure Leskovec



Weak links

Weak links
• Mark Granovetter (1960s) studied how people find jobs. He found that most job referrals were through personal contacts
  • but more often through acquaintances than through close friends.
• Aside:
  • Accepted by the American Journal of Sociology after 4 years of unsuccessful attempts elsewhere.
  • One of the most cited papers in sociology.
• Mystery: Why didn’t jobs come from close friends?

Adapted from Drago Radev

Triadic Closure
“If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future.” (Anatole Rapoport 1953)

[Figure 3.1: (a) Before B-C edge forms. (b) After B-C edge forms. The formation of the edge between B and C illustrates the effects of triadic closure, since they have a common neighbor A. (Nodes A-G.)]

[…] seeking, and offers a way of thinking about the architecture of social networks more generally. To get at this broader view, we first develop some general principles about social networks and their evolution, and then return to Granovetter’s question.

3.1 Triadic Closure

In Chapter 2, our discussions of networks treated them largely as static structures: we take a snapshot of the nodes and edges at a particular moment in time, and then ask about paths, components, distances, and so forth. While this style of analysis forms the basic foundation for thinking about networks (and indeed, many datasets are inherently static, offering us only a single snapshot of a network), it is also useful to think about how a network evolves over time. In particular, what are the mechanisms by which nodes arrive and depart, and by which edges form and vanish?

The precise answer will of course vary depending on the type of network we’re considering, but one of the most basic principles is the following:

If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future [347].

We refer to this principle as triadic closure, and it is illustrated in Figure 3.1: if nodes B and C have a friend A in common, then the formation of an edge between B and C produces a situation in which all three nodes A, B, and C have edges connecting each other, a structure we refer to as a triangle in the network. The term “triadic closure” comes from […]

Reminder: clustering coefficient C
• C of a node A is the probability that two randomly selected friends of A are friends themselves
• A before new edge = 1/6 (of B-C, B-D, B-E, C-D, C-E, D-E)
• After new edge? 2/6
• Triadic closure leads to higher clustering coefficients

[Figure 3.1 (repeated): the network before and after the B-C edge forms]
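A minimal Python sketch of the before/after numbers for A. It assumes A's friends are B, C, D and E and that D-E is the single existing link among them; which pair is linked is an assumption, since the figure did not survive extraction, but any single link gives 1/6.

```python
from itertools import combinations


def local_clustering(graph, node):
    """Fraction of pairs of node's friends who are friends themselves."""
    pairs = list(combinations(sorted(graph[node]), 2))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b in graph[a]) / len(pairs)


def add_edge(graph, u, v):
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


# A's neighbourhood: friends B, C, D, E; D-E is assumed to be the only link among them.
graph = {}
for friend in "BCDE":
    add_edge(graph, "A", friend)
add_edge(graph, "D", "E")

print(local_clustering(graph, "A"))  # 1/6 before triadic closure
add_edge(graph, "B", "C")            # B and C close a triangle through A
print(local_clustering(graph, "A"))  # 2/6 afterwards
```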

Why Triadic Closure?

1. We meet our friends through other friends
  • B and C have the opportunity to meet through A

2. B and C’s mutual friendship with A gives them a reason to trust each other

3. A has an incentive to bring B and C together to avoid stress:
  • if A is friends with two people who don’t like each other, it causes stress
  • Bearman and Moody: teenage girls with low clustering coefficients in their network of friends are much more likely to consider suicide

Bridges


[Figure 3.3: The A-B edge is a bridge, meaning that its removal would place A and B in distinct connected components. Bridges provide nodes with access to parts of the network that are unreachable by other means. (Nodes A-E.)]

Reasons for Triadic Closure. Triadic closure is intuitively very natural, and essentially everyone can find examples from their own experience. Moreover, experience suggests some of the basic reasons why it operates. One reason why B and C are more likely to become friends, when they have a common friend A, is simply based on the opportunity for B and C to meet: if A spends time with both B and C, then there is an increased chance that they will end up knowing each other and potentially becoming friends. A second, related reason is that in the process of forming a friendship, the fact that each of B and C is friends with A (provided they are mutually aware of this) gives them a basis for trusting each other that an arbitrary pair of unconnected people might lack.

A third reason is based on the incentive A may have to bring B and C together: if A is friends with B and C, then it becomes a source of latent stress in these relationships if B and C are not friends with each other. This premise is based in theories dating back to early work in social psychology [217]; it also has empirical reflections that show up in natural but troubling ways in public-health data. For example, Bearman and Moody have found that teenage girls who have a low clustering coefficient in their network of friends are significantly more likely to contemplate suicide than those whose clustering coefficient is high [48].

3.2 The Strength of Weak Ties

So how does all this relate to Mark Granovetter’s interview subjects, telling him with such regularity that their best job leads came from acquaintances rather than close friends? In fact, triadic closure turns out to be one of the crucial ideas needed to unravel what’s going on.

A bridge is an edge whose removal places A and B in different components

If A is going to get new information (like a job) that she doesn’t already know about, it might come from B

Local Bridge

[Figure 3.4: The A-B edge is a local bridge of span 4, since the removal of this edge would increase the distance between A and B to 4. (Nodes A-H, J, K.)]

Bridges and Local Bridges. Let’s start by positing that information about good jobs is something that is relatively scarce; hearing about a promising job opportunity from someone suggests that they have access to a source of useful information that you don’t. Now consider this observation in the context of the simple social network drawn in Figure 3.3. The person labeled A has four friends in this picture, but one of her friendships is qualitatively different from the others: A’s links to C, D, and E connect her to a tightly-knit group of friends who all know each other, while the link to B seems to reach into a different part of the network. We could speculate, then, that the structural peculiarity of the link to B will translate into differences in the role it plays in A’s everyday life: while the tightly-knit group of nodes A, C, D, and E will all tend to be exposed to similar opinions and similar sources of information, A’s link to B offers her access to things she otherwise wouldn’t necessarily hear about.

To make precise the sense in which the A-B link is unusual, we introduce the following definition. We say that an edge joining two nodes A and B in a graph is a bridge if deleting the edge would cause A and B to lie in two different components. In other words, this edge is literally the only route between its endpoints, the nodes A and B.

Now, if our discussion in Chapter 2 about giant components and small-world properties taught us anything, it’s that bridges are presumably extremely rare in real social networks. You may have a friend from a very different background, and it may seem that your friendship is the only thing that bridges your world and his, but one expects in reality that there will […]

A local bridge is an edge whose endpoints A and B have no friends in common (so a local bridge does not form the side of any triangle)
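A minimal Python sketch of the two definitions: an edge is a local bridge when its endpoints share no neighbor, and its span is the endpoints' distance once the edge is removed (infinite for a true bridge). The graph is hypothetical, built only so that A-B has span 4 as in Figure 3.4 (a longer route through F, G and H); it is not the book's exact figure, and other edges in this toy graph may be local bridges too.

```python
from collections import deque


def local_bridge_span(graph, a, b):
    """Span of edge a-b if it is a local bridge, else None.
    The span is the a-b distance with the edge deleted (inf if a-b is a bridge)."""
    if graph[a] & graph[b]:
        return None                    # common friend: the edge sits in a triangle
    dist, queue = {a: 0}, deque([a])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if {u, w} == {a, b}:
                continue               # pretend the a-b edge is deleted
            if w not in dist:
                dist[w] = dist[u] + 1
                if w == b:
                    return dist[w]
                queue.append(w)
    return float("inf")


def from_edges(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


# Hypothetical graph: A, C, D, E all know each other; A-B plus a long route A-F-G-H-B.
g = from_edges([("A", "C"), ("A", "D"), ("A", "E"), ("C", "D"), ("C", "E"), ("D", "E"),
                ("A", "B"), ("A", "F"), ("F", "G"), ("G", "H"), ("H", "B")])
print(local_bridge_span(g, "A", "B"))  # 4
print(local_bridge_span(g, "C", "D"))  # None: C and D share the friend A
```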

If A is going to get new information (like a job) that she doesn’t already know about, it might come from B

Strong and Weak Ties

• Strength of ties:
  • amount of time spent together
  • emotional intensity
  • intimacy (mutual confiding)
  • reciprocal services

• Simplifying assumption:
  • Ties are either strong (s) or weak (w)

Adapted from James Moody

Strong ties and triadic closure
• The new B-C edge is more likely to form if A-B and A-C are strong ties
• More extreme: if A has strong ties to B and to C, there must be an edge B-C

[Figure 3.1 (repeated), with two edges labeled as strong ties (s)]

Strong triadic closure

[Figure 3.5: Each edge of the social network from Figure 3.4 is labeled here as either a strong tie (S) or a weak tie (W), to indicate the strength of the relationship. The labeling in the figure satisfies the Strong Triadic Closure Property at each node: if the node has strong ties to two neighbors, then these neighbors must have at least a weak tie between them.]

[…] be other, hard-to-discover, multi-step paths that also span these worlds. In other words, if we were to look at Figure 3.3 as it is embedded in a larger, ambient social network, we would likely see a picture that looks like Figure 3.4.

Here, the A-B edge isn’t the only path that connects its two endpoints; though they may not realize it, A and B are also connected by a longer path through F, G, and H. This kind of structure is arguably much more common than a bridge in real social networks, and we use the following definition to capture it. We say that an edge joining two nodes A and B in a graph is a local bridge if its endpoints A and B have no friends in common; in other words, if deleting the edge would increase the distance between A and B to a value strictly more than two. We say that the span of a local bridge is the distance its endpoints would be from each other if the edge were deleted [190, 407]. Thus, in Figure 3.4, the A-B edge is a local bridge with span four; we can also check that no other edge in this graph is a local bridge, since for every other edge in the graph, the endpoints would still be at distance two if the edge were deleted. Notice that the definition of a local bridge already makes an implicit connection with triadic closure, in that the two notions form conceptual opposites: an edge is a local bridge precisely when it does not form a side of any triangle in the graph.

Local bridges, especially those with reasonably large span, still play roughly the same […]

If a node Q has two strong ties to nodes Y and Z, there is an edge between Y and Z
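The property can be checked mechanically. A minimal Python sketch, assuming ties are stored in a hypothetical dict from an unordered node pair (a frozenset) to the label "S" or "W": it reports every node with strong ties to two neighbors who have no tie at all between them.

```python
from itertools import combinations


def tie(u, v):
    """Key for an undirected tie."""
    return frozenset((u, v))


def stc_violations(ties):
    """Return (node, y, z) where node has strong ties to y and z but y-z has no tie."""
    strong = {}
    for edge, label in ties.items():
        if label == "S":
            u, v = tuple(edge)
            strong.setdefault(u, set()).add(v)
            strong.setdefault(v, set()).add(u)
    violations = []
    for node, friends in strong.items():
        for y, z in combinations(sorted(friends), 2):
            if tie(y, z) not in ties:
                violations.append((node, y, z))
    return violations


ties = {tie("A", "B"): "S", tie("A", "C"): "S"}
print(stc_violations(ties))  # [('A', 'B', 'C')]: an edge B-C is required
ties[tie("B", "C")] = "W"    # even a weak tie between B and C satisfies the property
print(stc_violations(ties))  # []
```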

Closure and bridges
• If a node A in a network satisfies the Strong Triadic Closure Property and is involved in at least two strong ties, then any local bridge it is involved in must be a weak tie.

• So local bridges are likely to be weak ties
  • Explaining why jobs came from weak ties


Strength of weak ties

• Weak ties can occur between cohesive groups
  • old college friend
  • former colleague from work


weak ties will tend to have low transitivity

Strength of weak ties: how to get a job
• Granovetter: How often did you see the contact that helped you find the job prior to the job search?
  • 16.7% often (at least twice a week)
  • 55.6% occasionally (more than once a year but less than twice a week)
  • 27.8% rarely (once a year or less)

• Weak ties will tend to have different information than we and our close contacts do

• Long paths are rare
  • 39.1% info came directly from employer
  • 45.3% one intermediary
  • 3.1% more than 2 intermediaries (more frequent with younger, inexperienced job seekers)

• Compatible with the Watts/Strogatz small world model: short average shortest paths thanks to ‘shortcuts’ that are non-transitive


More evidence for strength of weak ties

In the Milgram small world experiments, acquaintanceship ties were more effective than family or close friends at passing information

Summary
• Triangles (triadic closure) lead to higher clustering coefficients
  • Your friends will tend to become friends

• Local bridges will often be weak ties
  • Information comes over weak ties



Power Laws

Degree of nodes
• Many nodes on the internet have low degree
  • One or two connections
• A few (hubs) have very high degree
• The number P(k) of nodes with degree k follows a power law:

$$P(k) \propto k^{-\alpha}$$

• where α for the internet is about 2.1
• i.e., the fraction of web pages with k in-links is proportional to 1/k^2.1

Power-law distributions

• Right skew
  • normal distribution is centered on mean
  • power-law or Zipf distribution is not

• High ratio of max to min
  • human heights (max and min not that different)
  • city sizes

• Power-law distributions have no “scale” (unlike a normal distribution)


[Figure: Normal (Gaussian) distribution of human heights. The average value is close to the most typical value, and the distribution is close to symmetric around the average value.]

Power-law distribution

[Figure: the same power-law distribution on a linear scale and on a log-log scale]

• high skew (asymmetry)
• straight line on a log-log plot

Power laws are seemingly everywhere (note: these are cumulative distributions)

[Figure panels: Moby Dick; scientific papers 1981-1997; AOL users visiting sites ’97; bestsellers 1895-1965; AT&T customers on 1 day; California 1910-1992]

Yet more power laws

[Figure panels: Moon; solar flares; wars (1816-1980); richest individuals 2003; US family names 1990; US cities 2003]

Power law distribution
• Straight line on a log-log plot:

$$\ln p(x) = c - \alpha \ln x$$

• Exponentiate both sides to get that p(x), the probability of observing an item of size x, is given by

$$p(x) = C x^{-\alpha}$$

where C = e^c is the normalization constant (probabilities over all x must sum to 1) and α is the power-law exponent.
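Reading α off a straight-line fit to a log-log histogram is common, but it is known to give biased estimates; a standard alternative is the maximum-likelihood estimator for the continuous power law above a cutoff x_min, α̂ = 1 + n / Σ ln(x_i / x_min) (as given by Clauset, Shalizi and Newman, 2009). The Python sketch below draws a synthetic sample by inverse-transform sampling (in this continuous case C = (α − 1) x_min^(α − 1)) and recovers the exponent; the sample size, seed and α = 2.5 are arbitrary choices.

```python
import math
import random


def sample_power_law(n, alpha, x_min=1.0, seed=0):
    """Inverse-transform sampling from p(x) = C x^(-alpha), x >= x_min, alpha > 1."""
    rng = random.Random(seed)
    # P(X > x) = (x / x_min)^(1 - alpha), so x = x_min * u^(-1 / (alpha - 1)), u in (0, 1].
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def fit_alpha(xs, x_min=1.0):
    """Maximum-likelihood estimate: alpha = 1 + n / sum(ln(x / x_min)) over x >= x_min."""
    tail = [x for x in xs if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)


xs = sample_power_law(20000, alpha=2.5)
print(round(fit_alpha(xs), 2))  # close to 2.5
```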

What does it mean to be scale free?

• A power law looks the same no matter what scale we look at it on (2 to 50 or 200 to 5000)

• Only true of a power-law distribution!
  • p(bx) = g(b) p(x): the shape of the distribution is unchanged except for a multiplicative constant

$$p(bx) = (bx)^{-\alpha} = b^{-\alpha} x^{-\alpha}$$

Slide from Lada Adamic

[Figure: log(p(x)) against log(x), with x → b·x]
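The identity p(bx) = b^(−α) p(x) can be checked numerically: zooming by a factor b multiplies a power law by the same constant wherever we look, while for an exponential the factor depends on x. A minimal Python sketch; α = 2, the exponential rate 0.01 and b = 10 are arbitrary, and normalization constants are dropped because they cancel in the ratio.

```python
import math


def power_law(x, alpha=2.0):
    """Unnormalized p(x) proportional to x^(-alpha)."""
    return x ** -alpha


def exponential(x, rate=0.01):
    """Unnormalized p(x) proportional to exp(-rate * x), for contrast."""
    return math.exp(-rate * x)


def zoom_ratio(p, x, b=10.0):
    """p(bx) / p(x): the same for every x exactly when p is scale free."""
    return p(b * x) / p(x)


for x in (2.0, 50.0, 200.0):
    print(x, zoom_ratio(power_law, x), zoom_ratio(exponential, x))
```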

Many real world networks are power law


| Network | Exponent α (in/out degree) |
|---|---|
| film actors co-appearance | 2.3 |
| telephone call graph | 2.1 |
| email networks | 1.5/2.0 |
| sexual contacts | 3.2 |
| WWW | 2.3/2.7 |
| internet | 2.5 |
| peer-to-peer | 2.1 |
| metabolic network | 2.2 |
| protein interactions | 2.4 |

Hey, not everything is a power law

• number of sightings of 591 bird species in the North American Bird survey in 2003

[Figure: cumulative distribution]

• another example: size of wildfires (in acres)

Zipf’s law is a power-law

• Zipf
  • George Kingsley Zipf

• How frequent is the 3rd or 8th or 100th most common word?

• Intuition: a small number of very frequent words (“the”, “of”)
  • lots and lots of rare words (“expressive”, “Jurafsky”)

• Zipf’s law: the frequency of the r’th most frequent word is inversely proportional to its rank:

$$y \sim r^{-\beta}, \quad \text{with } \beta \text{ close to unity.}$$
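A minimal Python sketch of the rank-frequency view: count words, sort by frequency, and estimate β as minus the least-squares slope of ln(frequency) against ln(rank). Whitespace tokenization and the least-squares fit are simplifying assumptions (the same log-log bias caveat as for power-law exponents applies), and the toy text is made up.

```python
import math
from collections import Counter


def rank_frequency(text):
    """(rank, word, count) triples, most frequent first; ties keep first-seen order."""
    counts = Counter(text.lower().split())
    return [(r, w, c) for r, (w, c) in enumerate(counts.most_common(), start=1)]


def zipf_beta(freqs):
    """Minus the least-squares slope of ln(freq) vs ln(rank), for freqs sorted by rank."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope


print(rank_frequency("the cat and the dog and the bird"))
print(zipf_beta([60, 30, 20, 15, 12]))  # frequencies 60/r give beta = 1 (up to rounding)
```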

Pareto’s law and power-laws

• Pareto
  • The Italian economist Vilfredo Pareto was interested in the distribution of income.

• Pareto’s law is expressed in terms of the cumulative distribution (the probability that a person earns X or more):

$$P[X > x] \sim x^{-k}$$

Income
• The fraction I of the income going to the richest P percent of the population is given by

$$I = \left(\frac{100}{P}\right)^{\eta - 1}, \qquad \eta = \frac{1}{k}$$

• if η = 0.5, the top 1 percent gets 100^(−0.5) = .10
• currently η = 0.6 [Jones, 2015, “Pareto and Piketty”]: the top 1 percent gets 100^(−0.4) ≈ .16
• (higher η = more inequality)
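The income-fraction formula is easy to check numerically. In this minimal Python sketch the exponent is named `eta`, the inequality measure, which equals 1/k for a Pareto tail P[X > x] ~ x^(−k); the function name `top_share` is our own.

```python
def top_share(P, eta):
    """Fraction of total income held by the richest P percent, inequality measure eta."""
    return (100.0 / P) ** (eta - 1.0)


print(round(top_share(1, 0.5), 2))   # 0.1
print(round(top_share(1, 0.6), 2))   # 0.16
print(round(top_share(10, 0.6), 2))  # share of the top 10 percent under the same eta
```

Raising eta toward 1 pushes every top share toward 1, which is why a higher eta means more inequality.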

Where do power laws come from?
• Many different processes can lead to power laws
• There is no one unique mechanism that explains it all


Preferential attachment


• Price (1965)
• Citation networks
• New citations to a paper are proportional to the number it already has

• Each new paper is generated with m citations
• New papers cite previous papers with probability proportional to their in-degree (citations)
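A minimal Python sketch of the growth rule above. A brand-new paper has zero citations, so a rule strictly proportional to in-degree could never cite it; like the usual formulation of Price's model, the sketch adds a constant (here 1) to each paper's count. Each paper sits in an "urn" list once per citation plus once more, so a uniform draw from the urn picks paper j with probability proportional to citations_j + 1. The function name, sizes and seed are our own choices.

```python
import random


def price_model(n_papers, m, seed=0):
    """Grow a citation network: each new paper cites up to m distinct earlier papers,
    picking paper j with probability proportional to (citations_j + 1)."""
    rng = random.Random(seed)
    citations = [0]                # paper 0 cites nothing
    urn = [0]                      # paper j appears citations[j] + 1 times
    for new in range(1, n_papers):
        k = min(m, new)            # early papers cannot cite more papers than exist
        cited = set()
        while len(cited) < k:
            cited.add(rng.choice(urn))
        for j in cited:
            citations[j] += 1
            urn.append(j)          # rich get richer: one more ticket per citation
        citations.append(0)
        urn.append(new)            # the new paper's baseline ticket
    return citations


citations = price_model(5000, 3)
print(sorted(citations, reverse=True)[:5])  # a few heavily cited papers
print(sorted(citations)[len(citations) // 2])  # the median paper is cited rarely
```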

This is a “Rich get Richer” model
Explanation for various power law effects:
1. Citations
2. Assume cities are formed at different times, and that, once formed, a city grows in proportion to its current size simply as a result of people having children
3. Words: people are more likely to use a word that is frequent (perhaps it comes to mind more easily or faster)

Implications: Wealth
• Thomas Piketty’s book, #1 on NY Times best seller list in 2014

• Focuses on rise of inequality in wealth

• That same power law
  • An equation from a Stanford economist: wealth follows a power law governed by η:

Pareto and Piketty: The Macroeconomics of Top Income and Wealth Inequality

[…] at rate r − g − τ − α > 0. This is the basic “exponential growth” part of the requirement for a Pareto distribution.

Next, we obtain heterogeneity in the simplest possible fashion: assume that each person faces a constant probability of death, d, in each period. Because Piketty (2014) emphasizes the role played by changing rates of population growth, we’ll also include population growth, assumed to occur at rate n. Each new person born in this economy inherits the same amount of wealth, and the aggregate inheritance is simply equal to the aggregate wealth of the people who die each period. It is straightforward to show that the steady-state distribution of this birth-death process is an exponential distribution, where the age distribution is Pr[Age > x] = e^{−(n+d)x}. That is, the age distribution is governed by the birth rate, which equals n + d. The intuition behind this formulation is that a fraction n + d of new people are added to the economy each instant.

We now have exponential growth occurring over an exponentially distributed amount of time. The model we presented in the context of the income distribution suggested that the Pareto inequality measure equals the ratio of the “growth rate” to the “exponential distribution parameter” and that logic also holds for this model of the wealth distribution. In particular, wealth has a steady-state distribution that is Pareto with

$$\eta_{\text{wealth}} = \frac{r - g - \tau - \alpha}{n + d}.$$
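A minimal Python sketch that evaluates the formula and feeds the result into the top-share expression (100/P)^(η − 1), which holds for any Pareto distribution with inequality measure η. The parameter values are hypothetical, chosen only to show the comparative static in the text: raising r, and hence r − g, raises η_wealth and the top 1 percent's share.

```python
def eta_wealth(r, g, tau, alpha, n, d):
    """Pareto inequality measure for wealth: (r - g - tau - alpha) / (n + d)."""
    growth = r - g - tau - alpha
    if growth <= 0:
        raise ValueError("the model requires r - g - tau - alpha > 0")
    return growth / (n + d)


def top_share(P, eta):
    """Share of wealth held by the richest P percent under inequality measure eta."""
    return (100.0 / P) ** (eta - 1.0)


# Hypothetical parameter values, for illustration only.
base = dict(r=0.07, g=0.02, tau=0.0, alpha=0.03, n=0.01, d=0.02)
for r in (0.06, 0.07):
    eta = eta_wealth(**{**base, "r": r})
    print(f"r={r}: eta={eta:.2f}, top 1% share={top_share(1, eta):.2f}")
```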

An equation like this is at the heart of many of Piketty’s statements about wealth inequality, for example as measured by the share of wealth going to the top 1 percent. Other things equal, an increase in r − g will increase wealth inequality: people who are lucky enough to live a long time—or are part of a long-lived dynasty—will accumulate greater stocks of wealth. Also, a higher wealth tax will lower wealth inequality. In richer frameworks that include stochastic returns to wealth, the super-rich are also those who benefit from a lucky run of good returns, and a higher variance of returns will increase wealth inequality.

Can this class of models explain why wealth inequality was so high historically in France and the United Kingdom relative to today? Or why wealth inequality was historically much higher in Europe than in the United States? Qualitatively, two of the key channels that Piketty emphasizes are at work in this framework: either a low growth rate of income per person, g, or a low rate of population growth, n—both of which applied in the 19th century—will lead to higher wealth inequality.

Piketty (2014, p. 232) summarizes the logic underlying models like this with characteristic clarity: “[I]n stagnant societies, wealth accumulated in the past takes on considerable importance.” On the role of population growth, for example, Piketty notes that an increase means that inherited wealth gets divided up by more offspring, reducing inequality. Conversely, a decline in population growth will concentrate wealth. A related effect occurs when the economy’s per capita growth rate rises. In this case, inherited wealth fades in value relative to new wealth generated

Power laws
• Many processes are distributed as power laws
  • Word frequencies, citations, web hits
• Power law distributions have interesting properties
  • scale free, skew, high max/min ratios
• Various mechanisms explain their prevalence
  • rich-get-richer, etc.
• Explain lots of phenomena we have been dealing with
  • the use of stop word lists (a small fraction of word types cover most tokens in running text)



Power Laws

What classes should I take to follow up on this class?

Follow-up CS courses

Spring 2016
• CS224U: Natural Language Understanding
• CS276: Information Retrieval and Web Search
• CS224D: Deep Learning for Natural Language Processing

Fall 2016 (probably)
• CS147: Introduction to HCI Design
• CS221: Artificial Intelligence
• CS229: Machine Learning
• CS224W: Social and Information Network Analysis

Winter 2016
• CS224N: Natural Language Processing
• CS246: Mining Massive Datasets

Follow-up Linguistics courses

General:
• Ling 1: Intro to Linguistics
• Ling 140: Language Acquisition

Social meaning:
• (Spring 2016) Ling 65: African-American Vernacular English
• (Spring 2016) Ling 150: Language and Society
• Ling 156: Language and Gender
• Ling 1XX: The Linguistics of Advertising

Meaning/Understanding:
• Ling 130a: Semantics and Pragmatics
• Ling 141: Language and Gesture

Others:
• Ling 105: Phonetics
• LING 121a: The Syntax of English
• LING 121b: Crosslinguistic Syntax
• Ling 192: Language Testing
