School of Information University of Michigan SI 614 Community structure in networks Lecture 17


School of Information, University of Michigan

SI 614: Community structure in networks

Lecture 17

Outline

One-mode networks and cohesive subgroups: measures of cohesion, types of subgroups

Affiliation networks

team assembly

Why care about group cohesion?

opinion formation and uniformity

if each node adopts the opinion of the majority of its neighbors, it is possible to have different opinions in different cohesive subgroups

within a cohesive subgroup – greater uniformity
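A minimal sketch of the majority rule above, assuming a toy network of two 4-cliques joined by one bridge edge, synchronous updates, and ties resolved by keeping the current opinion (all three are illustrative assumptions, not part of the lecture):

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def majority_step(adj, opinion):
    """One synchronous round: each node adopts its neighbors' majority opinion.

    Opinions are 0 or 1; on a tie the node keeps its current opinion (assumption).
    """
    new_opinion = {}
    for node, neighbors in adj.items():
        ones = sum(opinion[n] for n in neighbors)
        zeros = len(neighbors) - ones
        if ones > zeros:
            new_opinion[node] = 1
        elif zeros > ones:
            new_opinion[node] = 0
        else:
            new_opinion[node] = opinion[node]
    return new_opinion

# Two cohesive subgroups (cliques on 0-3 and 4-7) linked by the bridge 3-4.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
adj = build_adjacency(edges)

# Start with one dissenter inside each subgroup (nodes 0 and 7).
opinion = {0: 1, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1, 6: 1, 7: 0}
for _ in range(10):
    opinion = majority_step(adj, opinion)

print(opinion)
```

Each subgroup becomes uniform internally, while the two subgroups keep different opinions despite the bridge.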

Other reasons to care

Discover communities of practice (more on this next time)

Measure isolation of groups

Threshold processes: I will adopt an innovation if some number of my contacts do; I will vote for a measure if a fraction of my contacts do
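A small sketch of a count-based threshold process, assuming a node adopts once at least `threshold` of its contacts have adopted and never un-adopts; the toy network and the names `spread`, `seeds`, and `threshold` are hypothetical:

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def spread(adj, seeds, threshold):
    """Repeat until stable: a node adopts if >= threshold neighbors have adopted."""
    adopted = set(seeds)
    changed = True
    while changed:
        changed = False
        for node, neighbors in adj.items():
            if node not in adopted and len(neighbors & adopted) >= threshold:
                adopted.add(node)
                changed = True
    return adopted

# Two 4-cliques (0-3 and 4-7) connected only by the edge 3-4.
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
adj = build_adjacency(edges)

within_group = spread(adj, seeds={0, 1}, threshold=2)
everywhere = spread(adj, seeds={0, 1}, threshold=1)
print(sorted(within_group), sorted(everywhere))
```

With a threshold of 2 the innovation saturates the seeded subgroup but cannot cross the single bridge; with a threshold of 1 it reaches every node.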

What properties indicate cohesion?

mutuality of ties: everybody in the group knows everybody else

closeness or reachability of subgroup members: individuals are separated by at most n hops

frequency of ties among members: everybody in the group has links to at least k others in the group

relative frequency of ties among subgroup members compared to nonmembers

Cliques

Every member of the group has links to every other member

Cliques can overlap

[Figure: overlapping cliques of size 3; a clique of size 4]
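One way to list maximal cliques is the basic Bron-Kerbosch recursion (without pivoting). This is a sketch on a hypothetical toy network with two overlapping triangles and a separate 4-clique; it is not the Pajek procedure used in class:

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def maximal_cliques(adj):
    """Return all maximal cliques as sorted lists (basic Bron-Kerbosch)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))  # nothing can extend this clique
            return
        for v in list(candidates):
            expand(current | {v}, candidates & adj[v], excluded & adj[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)

# Triangles {0,1,2} and {1,2,3} overlap in nodes 1 and 2; nodes 4-7 form a 4-clique.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)] + list(combinations(range(4, 8), 2))
cliques = maximal_cliques(build_adjacency(edges))
print(cliques)
```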

Considerations in using cliques as subgroups

Not robust: one missing link can disqualify a clique

Not interesting: everybody is connected to everybody else, so there is no core-periphery structure and centrality measures do not distinguish among members

How cliques overlap can be more interesting than that they exist

Pajek (remember from the class on motifs):

construct a network that is a clique of the desired size, then Nets>Fragment (1 in 2)>Find

a less stingy definition of cohesive subgroups: k-cores

Each node within a group is connected to at least k other nodes in the group

[Figure: a 3-core and a 4-core]

Pajek: Net>Partitions>Core>Input,Output,All

Assigns each vertex the largest k for which it belongs to a k-core
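The same assignment can be sketched outside Pajek by repeatedly peeling off low-degree nodes. The function name `core_numbers` and the toy network are assumptions made for illustration:

```python
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def core_numbers(adj):
    """Map each node to the largest k such that it belongs to a k-core."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1  # everything left has degree > k, so it survives into the next core
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via another entry in the list
            remaining.discard(v)
            core[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return core

# A 4-clique on nodes 0-3, node 4 tied to 0 and 1, and a pendant node 5 tied to 4.
edges = list(combinations(range(4), 2)) + [(4, 0), (4, 1), (5, 4)]
cores = core_numbers(build_adjacency(edges))
print(cores)
```

Here the clique members get core number 3, node 4 gets 2, and the pendant node gets 1.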

subgroups based on reachability and diameter

n-cliques: the maximal distance between any two nodes in the subgroup is n

2-cliques

theoretical justification: information flow through intermediaries

frequency of in group ties

Compare # of in-group ties

Given the number of edges incident on nodes in the group, what is the probability that the observed fraction of them fall within the group?

The smaller the probability – the stronger the cohesion

[Figure labels: within-group ties; ties from group to nodes external to the group]

considerations with n-cliques

problem: the diameter may be greater than n, and the n-clique may be disconnected (paths go through nodes not in the subgroup)

[Figure: a 2-clique with diameter = 3, connected by a path outside the 2-clique]

fix: n-club, a maximal subgraph of diameter n (here n = 2)
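The difference between the two definitions can be checked with breadth-first search: an n-clique measures distances in the whole network, an n-club measures them inside the subgroup only. The sketch below uses a hypothetical 6-node network and checks the distance conditions only, not maximality:

```python
from collections import deque
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def bfs_distances(adj, source, allowed=None):
    """Hop distances from source; if allowed is given, paths stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u not in dist and (allowed is None or u in allowed):
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist

def within_n_in_network(adj, group, n):
    """n-clique condition: every pair is within n hops in the whole network."""
    return all(
        bfs_distances(adj, a).get(b, float("inf")) <= n
        for a, b in combinations(group, 2)
    )

def induced_diameter(adj, group):
    """Diameter of the subgraph induced by group (inf if it is disconnected)."""
    group = set(group)
    worst = 0
    for a in group:
        dist = bfs_distances(adj, a, allowed=group)
        if len(dist) < len(group):
            return float("inf")
        worst = max(worst, max(dist.values()))
    return worst

# Nodes 4 and 5 are two hops apart only through node 6, which is outside the group.
adj = build_adjacency([(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)])
group = [1, 2, 3, 4, 5]
print(within_n_in_network(adj, group, 2), induced_diameter(adj, group))
print(induced_diameter(adj, [1, 2, 3, 4]))
```

The group {1, 2, 3, 4, 5} meets the 2-clique distance condition yet has diameter 3 internally, while {1, 2, 3, 4} has internal diameter 2 and so satisfies the 2-club condition.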

cohesion in directed and weighted networks

something we’ve already learned how to do: find strongly connected components

keep only a subset of ties before finding connected components: reciprocal ties, or ties with edge weight above a threshold
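A sketch of this filtering step, assuming directed citation counts stored in a dictionary keyed by (source, target). The node names and counts are made up for illustration, and `min_each_way` is a hypothetical parameter name:

```python
def reciprocal_components(citations, min_each_way):
    """Keep a tie only if both directions have >= min_each_way citations,
    then return the connected components of what is left."""
    adj = {}
    for (source, target), count in citations.items():
        adj.setdefault(source, set())
        adj.setdefault(target, set())
        back = citations.get((target, source), 0)
        if source != target and count >= min_each_way and back >= min_each_way:
            adj[source].add(target)
            adj[target].add(source)

    components, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return sorted(components)

# Hypothetical citation counts between four sites.
citations = {
    ("left_a", "left_b"): 8, ("left_b", "left_a"): 6,
    ("right_a", "right_b"): 9, ("right_b", "right_a"): 7,
    ("left_a", "right_a"): 3, ("right_a", "left_a"): 1,
}
strict = reciprocal_components(citations, min_each_way=5)
loose = reciprocal_components(citations, min_each_way=1)
print(strict)
print(loose)
```

Raising the threshold drops the weak cross-group tie, so the single component splits into two cohesive groups.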


1 DigbysBlog, 2 JamesWalcott, 3 Pandagon, 4 blog.johnkerry.com, 5 OliverWillis, 6 AmericaBlog, 7 CrookedTimber, 8 DailyKos, 9 AmericanProspect, 10 Eschaton, 11 Wonkette, 12 TalkLeft, 13 PoliticalWire, 14 TalkingPointsMemo, 15 Matthew Yglesias, 16 WashingtonMonthly, 17 MyDD, 18 JuanCole, 19 Left Coaster, 20 BradfordDeLong

21 JawaReport, 22 VokaPundit, 23 Roger L Simon, 24 TimBlair, 25 Andrew Sullivan, 26 Instapundit, 27 BlogsforBush, 28 LittleGreenFootballs, 29 BelmontClub, 30 Captain’s Quarters, 31 Powerline, 32 HughHewitt, 33 INDCJournal, 34 RealClearPolitics, 35 Winds of Change, 36 Allahpundit, 37 MichelleMalkin, 38 WizBang, 39 Dean’s World, 40 Volokh

(A) all citations between A-list blogs in the 2 months preceding the 2004 election

(B) citations between A-list blogs with at least 5 citations in both directions

(C) edges further limited to those exceeding 25 combined citations

Example: political blogs (Aug 29th – Nov 15th, 2004)

only 15% of the citations bridge communities

Affiliation networks

otherwise known as membership network

e.g. boards of directors; related terms: hypernetwork or hypergraph, bipartite graphs, interlocks

m-slices

transform to a one-mode network: weights of edges correspond to the number of affiliations in common

m-slice: maximal subnetwork containing the lines with a multiplicity equal to or greater than m

A =

1 1 1 1 0

1 1 1 1 0

1 1 2 2 0

1 1 2 4 1

0 0 0 1 1

[Figure: the one-mode network with line values 1 and 2, with the 2-slice and the 1-slice marked]

Pajek:

Net>Transform>2-Mode to 1-Mode> Include Loops, Multiple Lines

Info>Network>Line Values (to view)

Net>Partitions>Valued Core>First threshold and step
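The projection and the m-slice can also be sketched in a few lines. The affiliation matrix `B` below is a hypothetical one (5 actors by 4 events) chosen because its projection reproduces the matrix A shown above; the original two-mode data are not given in the slides:

```python
def project(b):
    """One-mode projection: entry (i, j) counts affiliations shared by actors i and j."""
    n = len(b)
    return [
        [sum(x * y for x, y in zip(b[i], b[j])) for j in range(n)]
        for i in range(n)
    ]

def m_slice(a, m):
    """Lines (i, j, multiplicity) with multiplicity >= m, using 1-based actor labels."""
    n = len(a)
    return [
        (i + 1, j + 1, a[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if a[i][j] >= m
    ]

# Hypothetical affiliation matrix: rows are actors, columns are events.
B = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
]
A = project(B)
one_slice = m_slice(A, 1)
two_slice = m_slice(A, 2)
print(A)
print(one_slice)
print(two_slice)
```

The diagonal of A counts each actor's own affiliations (the loops Pajek can include), and only the line between actors 3 and 4 survives in the 2-slice.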

Scottish firms interlocking directorates

legend:

2-railways

4-electricity

5-domestic products

6-banks

7-insurance companies

8-investment banks

methods used directly on bipartite graphs are rare

Finding bicliques of users accessing documents: an algorithm by Nina Mishra, HP Labs

[Figure: bipartite graph of documents and users]

Team Assembly Mechanisms Determine Collaboration Network Structure and Team Performance

Roger Guimerà, Brian Uzzi, Jarrett Spiro, Luís A. Nunes Amaral. Science, 2005

astronomy and astrophysics

social psychology

economics

Issues in assembling teams

Why assemble a team? different ideas, different skills, different resources

What spurs innovation? applying proven innovations from one domain to another

Is diversity (working with new people) always good? It spurs creativity and fresh thinking, but it can bring conflict, miscommunication, and loss of the sense of security that comes from working with close collaborators

Parameters in team assembly

1. m, # of team members

2. p, probability of selecting individuals who already belong to the network

3. q, propensity of incumbents to select past collaborators

Two phases: a giant component of interconnected collaborators, or isolated clusters

creation of a new team

incumbents (people who have already collaborated with someone)

newcomers (people available to participate in new teams)

pick incumbent with probability p; if incumbent, pick past collaborator with probability q

Time evolution of a collaboration network

newcomer-newcomer collaborations

newcomer-incumbent collaborations

new incumbent-incumbent collaborations

repeat collaborations

after a time of inactivity, individuals are removed from the network
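A simplified sketch of the assembly rules described above, not the authors' implementation. Assumptions: one team of fixed size m per time step, a "past collaborator" means a past collaborator of someone already on the team, and the removal of inactive individuals is left out. The names `assemble_team` and `simulate` are hypothetical:

```python
import random

def assemble_team(m, p, q, incumbents, collaborators, next_id, rng):
    """Pick m members; returns (team, next unused newcomer id)."""
    team = []
    for _ in range(m):
        candidate = None
        if incumbents and rng.random() < p:
            # Incumbent slot: with probability q prefer a past collaborator
            # of someone already on the team, otherwise any other incumbent.
            past = sorted(
                {c for t in team for c in collaborators.get(t, ())} - set(team)
            )
            if past and rng.random() < q:
                candidate = rng.choice(past)
            else:
                pool = sorted(incumbents - set(team))
                if pool:
                    candidate = rng.choice(pool)
        if candidate is None:
            candidate = next_id  # newcomer
            next_id += 1
        team.append(candidate)
    return team, next_id

def simulate(steps, m, p, q, seed=0):
    """Return the collaboration network as {person: set of collaborators}."""
    rng = random.Random(seed)
    incumbents, collaborators, next_id = set(), {}, 0
    for _ in range(steps):
        team, next_id = assemble_team(m, p, q, incumbents, collaborators, next_id, rng)
        for a in team:
            collaborators.setdefault(a, set()).update(b for b in team if b != a)
        incumbents.update(team)
    return collaborators

only_newcomers = simulate(steps=50, m=3, p=0.0, q=0.5)
mixed = simulate(steps=200, m=4, p=0.6, q=0.7, seed=1)
print(len(only_newcomers), len(mixed))
```

With p = 0 every team consists of newcomers, so the network is a set of isolated teams; raising p reuses incumbents and starts linking teams together.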

BMI data

Broadway musical industry: 2258 productions from 1877 to 1990

musical shows performed at least once on Broadway

team: composers, writers, choreographers, directors, producers, but not actors

Team size increases from 1877 to 1929: the musical as an art form is still evolving

After 1929 team composition stabilizes to include 7 people: choreographer, composer, director, librettist, lyricist, producer

Collaboration networks

4 fields (with the top journals in each field): social psychology (7), economics (9), ecology (10), astronomy (4)

impact factor of each journal: ratio between citations and recent citable items published

A = total cites in 1992

B = 1992 cites to articles published in 1990-91 (this is a subset of A)

C = number of articles published in 1990-91

D = B/C = 1992 impact factor

size of teams grows over time

degree distributions

[Figure legend: data; data generated from a model with the same p and q and sequence of team sizes formed]

Predictions for the size of the giant component

higher p means already published individuals are co-authoring – linking the network together and increasing the giant component

S = fraction of network occupied by the giant component
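S can be computed directly from an adjacency structure by finding the largest connected component. A minimal sketch on a hypothetical 10-node network:

```python
def giant_component_fraction(adj):
    """Fraction of nodes in the largest connected component (0.0 for an empty network)."""
    if not adj:
        return 0.0
    seen, largest = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        largest = max(largest, size)
    return largest / len(adj)

# Components: a path on 5 nodes, a triangle, and a single pair.
adj = {
    0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3},
    5: {6, 7}, 6: {5, 7}, 7: {5, 6},
    8: {9}, 9: {8},
}
S = giant_component_fraction(adj)
print(S)
```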

Predictions for the size of the giant component (cont’d)

increasing q can slow the growth of the giant component – co-authoring with previous collaborators does not create new edges

network statistics

Field              teams    individuals  p     q     fR    S (size of giant component)
BMI                2258     4113         0.52  0.77  0.16  0.70
social psychology  16,526   23,029       0.56  0.78  0.22  0.67
economics          14,870   23,236       0.57  0.73  0.22  0.54
ecology            26,888   38,609       0.59  0.76  0.23  0.75
astronomy          30,552   30,192       0.76  0.82  0.39  0.98

what stands out?

what is similar across the networks?

different network topologies

economics

astronomy

ecology

main findings

all networks except astronomy are close to the “tipping” point where the giant component emerges: sparse and stringy networks

giant component takes up more than 50% of nodes in each network

impact factor (how good the journal is where the work was published):

p positively correlated: going with experienced members is good

q negatively correlated: new combinations are more fruitful

S for individual journals positively correlated: more isolated clusters in lower-impact journals

ecology, economics, social psychology

ecology

social psychology