Communities in Networks
Peter J. Mucha, University of North Carolina at Chapel Hill
[Figures: college football network with teams labeled by school; congressional committee network with committees labeled; multislice communities across Congresses (horizontal axis: Congress #, 1807–1809 through 2007–2009; vertical axis: states), Coupling = 0.2: 13 communities, each annotated with its party composition]
Communities in Networks
1. What is a community and why are they useful?
2. How do you calculate communities?
• Descriptive: e.g., Modularity
• Generative: e.g., Stochastic Block Models
3. Where is community detection going in the future?
… with apologies that this presentation will seriously err on the self-absorbed side. It’s a big field, and I do not promise to know nor present it all.
“Communities in Networks,” Porter, Onnela & Mucha, Notices of the American Mathematical Society 56, 1082-97 & 1164-6 (2009).
“Community Detection in Graphs,” S. Fortunato, Physics Reports 486, 75-174 (2010).
Acknowledgements:
• Shankar Bhamidi, Jean Carlson, Aaron Clauset, Skyler Cranmer, James Fowler, James Gleeson, Scott Grafton, Jim Moody, Mark Newman, Andrew Nobel, Mason Porter
• Dani Bassett, Elizabeth Leicht, Nishant Malik, Sergey Melnik, J.-P. Onnela, Serguei Saavedra
• Dan Fenn, Elizabeth Menninga, Feng “Bill” Shi, Ashton Verdery, Simi Wang, James Wilson, Andrew Waugh
• Thomas Callaghan, A. J. Friend, Christi Frost, Eric Kelsic, Kevin Macon, Sean Myers, Ye Pei, Scott Powers, Stephen Reid, Thomas Richardson, Mandi Traud, Casey Warmbrand, Yan Zhang
• NSF (CAREER/REU & VIGRE), NIGMS (SNAH), JSMF (MAP/JF & PJM), Caltech SURF, UNC (AGEP, CAS, SURF)
Philosophical Disclaimer
• Jim Moody (paraphrased): “I’ve been accused of turning everything into a network.”
• PJM (in response): “I’m accused of turning everything into a network and a graph partitioning problem.”
• “Structure ↔ Function”
How to extend the notion of modularity in networks to multiple networks between the same actors/units, i.e., how to properly use identity in modularity?
Images by Aaron Clauset
Karate Club Example
This partition optimizes modularity, which measures the number of intra-community ties (relative to randomness)
“If your method doesn’t work on this network, then go home.”
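A minimal sketch of running this example yourself, assuming the NetworkX library and its `karate_club_graph`, `greedy_modularity_communities`, and `modularity` functions. Greedy agglomeration (Clauset-Newman-Moore) is a heuristic, so the partition it returns need not be the modularity-optimal one pictured; edge weights are ignored (`weight=None`) because some NetworkX versions attach weights to this graph.

```python
import networkx as nx
from networkx.algorithms import community

# Zachary's karate club: 34 members, each tagged with the faction ("club") joined after the split
G = nx.karate_club_graph()

# Greedy (Clauset-Newman-Moore) modularity maximization, ignoring any edge weights
parts = community.greedy_modularity_communities(G, weight=None)
Q = community.modularity(G, parts, weight=None)

print(f"{len(parts)} communities, Q = {Q:.3f}")
for i, c in enumerate(parts):
    # Compare each detected community with the factions recorded after the split
    factions = sorted({G.nodes[n]["club"] for n in c})
    print(i, sorted(c), factions)
```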
Brought to you by Mason Porter and The Power Law Shop: http://www.cafepress.com/thepowerlawshop (women’s and kids’ sizes also available)
“Cris Moore (left) is the inaugural recipient of the Zachary Karate Club Club prize, awarded on behalf of the community by Aric Hagberg (right). (9 May 2013)”
Facebook Caltech 2005: Colors indicate residential “House” affiliations; Purple = Not provided
Traud et al., “Comparing community structure to characteristics in online collegiate social networks” (2011); Traud et al., “Social structure of Facebook networks” (2012)
Logistic Regression: zRand:
Roll call as a network?
Scientific Coauthorship v. Roll Call Similarities
see Waugh et al., “Party polarization in Congress: a network science approach” (2009)
Moody & Mucha, “Portrait of political party polarization” (2013)
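One way to read “roll call as a network” is sketched below. It assumes votes coded +1 (yea), -1 (nay), and 0 (absent or abstaining), and takes the edge weight between two legislators to be the fraction of bills both voted on where they voted the same way, similar in spirit to the similarity weights in Waugh et al. The vote matrix is made-up toy data and `agreement_network` is a hypothetical helper, not code from those papers.

```python
import numpy as np

def agreement_network(votes):
    """Weighted adjacency from a legislators x bills vote matrix (+1 yea, -1 nay, 0 not voting)."""
    votes = np.asarray(votes)
    present = (votes != 0).astype(float)
    both = present @ present.T                      # bills on which both i and j voted
    same = (votes[:, None, :] == votes[None, :, :]) & (votes[:, None, :] != 0)
    agree = same.sum(axis=2).astype(float)          # bills on which i and j voted identically
    A = np.divide(agree, both, out=np.zeros_like(both), where=both > 0)
    np.fill_diagonal(A, 0.0)                        # no self-loops
    return A

# Hypothetical toy data: 4 legislators, 5 bills
votes = np.array([
    [ 1,  1, -1,  1,  0],
    [ 1,  1, -1,  1,  1],
    [-1, -1,  1, -1, -1],
    [-1, -1,  1,  0, -1],
])
A = agreement_network(votes)
print(np.round(A, 2))
```

The resulting weighted, undirected matrix can then be handed to any of the community detection methods discussed next.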
Parker et al., “Network Analysis Reveals Sex- and Antibiotic Resistance-Associated Antivirulence Targets in Clinical Uropathogens” (2015)
Communities in Networks
2. How do you calculate communities?
• Descriptive: e.g., Modularity
• Generative: e.g., Stochastic Block Models
Community Detection Firehose Overview
• Computational sledgehammer for large data
• “Hard/rigid” v. “soft/overlapping” clusters
• cf. biclustering methods and mathematics of expander graphs
• A community should describe a “cohesive group,” and there are varying formulations and algorithms
  – Linkage clustering (average, single), local clustering coefficients, betweenness (geodesic, random walk), spectral, conductance, …
• Classic approach in CS: Spectral Graph Partitioning
  – Need to specify number of communities sought
• Conductance
• MDL, Infomap, OSLOM, … (many other things I’ve missed) …
• Modularity: a good partition has more intra-community edges than one would expect at random
• Stochastic Block Models: a generative random graph model with different in/out probabilities between labeled groups
“Communities in Networks,” Porter, Onnela & Mucha, Notices of the American Mathematical Society 56, 1082-97 & 1164-6 (2009).
“Community Detection in Graphs,” S. Fortunato, Physics Reports 486, 75-174 (2010).
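The “classic” spectral graph partitioning approach listed above can be sketched as follows: split the graph by the sign of the Fiedler vector, the eigenvector for the second-smallest eigenvalue of the graph Laplacian L = D - A. This minimal version only bisects, which illustrates why the number of communities has to be specified in advance; the two-clique toy graph is a hypothetical example.

```python
import numpy as np

def spectral_bisection(A):
    """Split nodes into two groups by the sign of the Fiedler vector of L = D - A."""
    A = np.asarray(A, dtype=float)
    L = np.diag(A.sum(axis=1)) - A
    eigvals, eigvecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    fiedler = eigvecs[:, 1]
    return fiedler >= 0                     # boolean group label per node (overall sign is arbitrary)

# Toy graph: two 4-cliques (nodes 0-3 and 4-7) joined by the single edge 3-4
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

labels = spectral_bisection(A)
print(labels)
```

More than two groups would require recursive bisection or clustering on several eigenvectors, each of which still needs a target number of communities.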
Images by Aaron Clauset
Structure ↔ Function/Process: “Modularity” Approach
Community Detection: Null Model & Computational Heuristics
• GOAL: Assign nodes to communities in order to maximize quality function Q
• NP-complete [Brandes et al. 2008] ~ enumerate possible partitions
• Numerous packages developed/developing, e.g., igraph library (R, python), NetworkX
  – Need appropriate null model
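Maximizing Modularity (Newman & Girvan, PRE 2004; Newman, PRE 2004, PNAS 2006, PRE 2006)
• Independent edges, constrained to an expected degree sequence the same as observed.
• Requires $P_{ij} = f(k_i) f(k_j)$; the constraint $\sum_j P_{ij} = k_i$ then forces $f(k) = k/\sqrt{2m}$, quickly yielding
  $P_{ij} = \frac{k_i k_j}{2m}, \qquad Q = \frac{1}{2m} \sum_{ij} \left[ A_{ij} - \gamma \frac{k_i k_j}{2m} \right] \delta(c_i, c_j)$
  where $m$ is the number of edges, $k_i$ the degree of node $i$, and $c_i$ its community.
• γ resolution parameter ad hoc (default = 1) (Reichardt & Bornholdt, PRE 2006; Lambiotte et al., arXiv 2008)
• Resolution limit (Fortunato & Barthelemy, PNAS 2007)
• Degenerate landscape (Good, de Montjoye & Clauset, PRE 2010)
• Forces partition (many authors!)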
Fenn et al., Chaos 2009; Macon, PJM & MAP, Physica A 2012
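The Newman-Girvan modularity, including the resolution parameter γ, translates directly into a few lines of NumPy. The sketch assumes an undirected, unweighted (or nonnegatively weighted) adjacency matrix and a vector of community labels, and uses a hypothetical two-clique toy graph. Raising `gamma` above 1 penalizes large communities more heavily.

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Q = (1/2m) * sum_ij [A_ij - gamma * k_i k_j / 2m] * delta(c_i, c_j)."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k = A.sum(axis=1)                  # (weighted) degrees
    two_m = k.sum()                    # 2m = total degree
    P = np.outer(k, k) / two_m         # Newman-Girvan null model P_ij = k_i k_j / 2m
    same = labels[:, None] == labels[None, :]
    return float(((A - gamma * P) * same).sum() / two_m)

# Two 4-cliques (nodes 0-3 and 4-7) joined by the edge 3-4
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

split = [0, 0, 0, 0, 1, 1, 1, 1]
print(modularity(A, split))             # two cliques as communities
print(modularity(A, [0] * 8))           # one community: 0 up to floating-point rounding when gamma = 1
print(modularity(A, split, gamma=2.0))  # larger gamma lowers Q for the same partition
```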
Community Detection: Other Models
• Erdős-Rényi (Bernoulli)
• Newman-Girvan*
• Leicht-Newman* (directed)
• Barber* (bipartite)
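For the directed case, the Leicht-Newman null model replaces k_i k_j / 2m with a product of an out-degree and an in-degree divided by the number of edges m. A minimal sketch, assuming `A[i, j]` encodes an edge from i to j (with the opposite convention the roles of in- and out-degree swap, but Q summed over communities is unchanged); the toy graph of two directed triangles joined by one arc is hypothetical.

```python
import numpy as np

def directed_modularity(A, labels, gamma=1.0):
    """Leicht-Newman style Q = (1/m) * sum_ij [A_ij - gamma * kout_i * kin_j / m] * delta(c_i, c_j),
    assuming A[i, j] is the weight of the arc i -> j."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    k_out = A.sum(axis=1)
    k_in = A.sum(axis=0)
    m = A.sum()                          # number (total weight) of arcs
    P = np.outer(k_out, k_in) / m        # directed configuration-style null model
    same = labels[:, None] == labels[None, :]
    return float(((A - gamma * P) * same).sum() / m)

# Two directed 3-cycles (0->1->2->0 and 3->4->5->3) plus the arc 2->3
arcs = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
A = np.zeros((6, 6))
for i, j in arcs:
    A[i, j] = 1.0

print(directed_modularity(A, [0, 0, 0, 1, 1, 1]))
```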
Political Blogs (Adamic & Glance, WWW-2005)
“On closer inspection, we find that the method [(a)] fails in this case because it does not take into account the wide variation among the degrees of nodes in the network. In this network (and many others) degrees vary over a great range, whereas degrees in the block model are Poisson distributed and narrowly peaked about their mean. This means, in effect, that there is no choice of parameters for the model that gives a good fit to the data. Fitting this block model is similar to fitting a straight line through an inherently curved set of data points – you can do it, but it is unlikely to give you a meaningful answer.” (Newman, Nature Physics 2012)
Similar visualizations from different models in Amini et al., arXiv (2012)
Bottom Right: Partitions v. overlap & extraction (Wilson et al. in prep)
Fortunato & Barthelemy, PNAS 2007; Ball, Karrer & Newman, PRE 2011
Louvain (Blondel et al. J.Stat.Mech. 2008)
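Assuming NetworkX 2.8 or later (which provides `louvain_communities`), a Louvain run on the karate club network can be sketched as below. Louvain is randomized, so fixing `seed` makes runs repeatable; different seeds can return different partitions with slightly different Q, one symptom of the degenerate modularity landscape.

```python
import networkx as nx
from networkx.algorithms import community

G = nx.karate_club_graph()

# Louvain: repeated local node moves, then aggregation of communities into super-nodes
for seed in range(3):
    parts = community.louvain_communities(G, weight=None, resolution=1.0, seed=seed)
    Q = community.modularity(G, parts, weight=None)
    print(seed, len(parts), round(Q, 4))
```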
Other great codes to know: http://www.mapequation.org/ and https://graph-tool.skewed.de/
InfoMap (Rosvall & Bergstrom 2008)
OSLOM (Lancichinetti et al., PLoS One 2011)
• Score: Significance
• “Homeless” vertices
• Overlap
• Cluster hierarchy
• Because of the way the algorithm evolves clusters, it can naturally be used for temporal network data.
Conductance & NCP Plots (Leskovec, Mahoney, …)
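Conductance scores a candidate community S by the weight of edges leaving it relative to its volume (sum of degrees): φ(S) = cut(S, S̄) / min(vol(S), vol(S̄)), with lower values indicating a better-separated set. A minimal sketch on a hypothetical two-clique toy graph; an NCP plot would record the best (lowest) conductance found at each community size.

```python
import numpy as np

def conductance(A, S):
    """phi(S) = cut(S, S_bar) / min(vol(S), vol(S_bar)) for an undirected adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    in_S = np.zeros(A.shape[0], dtype=bool)
    in_S[list(S)] = True
    cut = A[np.ix_(in_S, ~in_S)].sum()      # total weight of edges crossing the boundary
    k = A.sum(axis=1)
    vol_S, vol_rest = k[in_S].sum(), k[~in_S].sum()
    return float(cut / min(vol_S, vol_rest))

# Two 4-cliques (nodes 0-3 and 4-7) joined by the edge 3-4
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

print(conductance(A, {0, 1, 2, 3}))   # one whole clique
print(conductance(A, {0, 1}))         # part of a clique scores worse
```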
Stochastic Block Models (R: Mixer; Python: Graph-Tool)
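To make the generative view concrete, here is a minimal NumPy sampler for a (non-degree-corrected) stochastic block model, assuming undirected edges, no self-loops, and a symmetric block-probability matrix `P`. It is not the Mixer or graph-tool API: those packages fit SBMs to data, whereas this only draws a graph from one. The group sizes and probabilities are arbitrary choices for illustration.

```python
import numpy as np

def sample_sbm(sizes, P, seed=None):
    """Draw an undirected SBM graph: nodes in groups r, s are linked independently with probability P[r, s]."""
    rng = np.random.default_rng(seed)
    P = np.asarray(P, dtype=float)
    labels = np.repeat(np.arange(len(sizes)), sizes)        # planted group of each node
    probs = P[labels[:, None], labels[None, :]]             # n x n matrix of edge probabilities
    upper = np.triu(rng.random(probs.shape) < probs, k=1)   # sample each pair i < j once
    A = (upper | upper.T).astype(int)
    return A, labels

# Two planted groups of 50 nodes: dense inside (0.3), sparse between (0.02)
A, labels = sample_sbm([50, 50], [[0.3, 0.02], [0.02, 0.3]], seed=0)
same = labels[:, None] == labels[None, :]
off_diag = ~np.eye(len(labels), dtype=bool)
print("within-group density:", A[same & off_diag].mean())
print("between-group density:", A[~same].mean())
```

Because every node in a group has the same expected degree, degrees here are narrowly peaked, which is exactly the limitation raised in the Newman quote above.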
At the most general level…
Two related but different issues to keep straight:
1. Theoretical Concept (e.g., “Modularity”, “Map Equation”, “Stochastic Block Models”)
2. Computational Heuristic & Implementation (e.g., “Fast Greedy”, “Louvain”, “Iterative Improvement”, or the specific SBM code [possible initialization issues with some])
And, finally, how do you compare communities?
Comparing Partitions (e.g., Section 15.2 of Fortunato 2010)
R x C Contingency Table:
1. Cluster Matching: requires injection
2. Pair Counting: “Adjusted” v. “Standardized”
3. Information Theory: Variation of Information, Normalized Mutual Information
Information-Theoretic Comparisons (e.g., Section 15.2 of Fortunato 2010)
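Both information-theoretic measures can be computed from the R x C contingency table of two labelings. The sketch assumes natural logarithms and the normalization NMI = 2 I(X;Y) / (H(X) + H(Y)) (other normalizations exist); `contingency` and `info_compare` are hypothetical helper names.

```python
import numpy as np

def contingency(x, y):
    """R x C table: n[r, c] = number of nodes in cluster r of x and cluster c of y."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    n = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(n, (xi, yi), 1)
    return n

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def info_compare(x, y):
    """Return (variation of information, normalized mutual information) for two partitions."""
    p = contingency(x, y) / len(x)            # joint distribution P(X = r, Y = c)
    hx, hy, hxy = entropy(p.sum(axis=1)), entropy(p.sum(axis=0)), entropy(p.ravel())
    mi = hx + hy - hxy                        # mutual information I(X; Y)
    vi = hx + hy - 2 * mi                     # VI(X, Y) = H(X) + H(Y) - 2 I(X; Y)
    nmi = 2 * mi / (hx + hy) if hx + hy > 0 else 1.0
    return vi, nmi

print(info_compare([0, 0, 1, 1], [5, 5, 7, 7]))   # same partition, relabeled
print(info_compare([0, 0, 1, 1], [0, 0, 0, 1]))
```

Note that VI is a distance (0 for identical partitions), while NMI is a similarity (1 for identical partitions).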
Pair Counting & Standardization (see, e.g., Traud et al., SIAM Review 2011)
• $w_{\alpha\beta}$ counts: α & β binary indicators for same/different
• Rand, Jaccard, Minkowski, Fowlkes-Mallows, …
• “Adjusted”: center on mean, with perfect match = 1
• “Standardized” by stdev, expressed as z-score
• Linear in $w_{11}$ → equal z
• Monotonic in $w_{11}$ → equal p
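The pair counts can be computed by brute force over all node pairs, as in this sketch: w11 counts pairs placed together in both partitions, w10 and w01 pairs together in only one, and w00 pairs separated in both. The Rand and Jaccard indices and the Hubert-Arabie adjusted Rand index follow directly; the z-score standardization (zRand) additionally needs the variance of w11 under a null model and is not included here. Helper names are hypothetical.

```python
from itertools import combinations

def pair_counts(x, y):
    """Return (w11, w10, w01, w00): first index = together in x, second = together in y."""
    w = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for i, j in combinations(range(len(x)), 2):
        w[(int(x[i] == x[j]), int(y[i] == y[j]))] += 1
    return w[(1, 1)], w[(1, 0)], w[(0, 1)], w[(0, 0)]

def pair_indices(x, y):
    """Rand, Jaccard, and adjusted Rand indices from pair counts."""
    w11, w10, w01, w00 = pair_counts(x, y)
    M = w11 + w10 + w01 + w00                       # total number of pairs, n(n-1)/2
    rand = (w11 + w00) / M
    jaccard = w11 / (w11 + w10 + w01)
    expected = (w11 + w10) * (w11 + w01) / M        # mean of w11 with cluster sizes held fixed
    # "Adjusted": subtract the expected value and rescale so that a perfect match scores 1
    ari = (w11 - expected) / (0.5 * ((w11 + w10) + (w11 + w01)) - expected)
    return rand, jaccard, ari

print(pair_counts([0, 0, 1, 1], [0, 0, 0, 1]))
print(pair_indices([0, 0, 1, 1], [0, 0, 0, 1]))
print(pair_indices([0, 0, 1, 1], [5, 5, 7, 7]))
```

The O(n²) loop is fine for illustration; for large networks the same counts follow from sums of binomial coefficients over the contingency table.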
Communities in Networks
1. What is a community and why are they useful?
2. How do you calculate communities?
• Modularity, Stochastic Block Models, Infomap
3. Where is community detection going in the future?
Multilayer Networks: Ordered, Categorical
Mucha et al., “Community structure in time-dependent, multiscale, and multiplex networks” (2010)
Kivelä et al., “Multilayer Networks” (2014)
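Multilayer Modularity Derivation
• Generalized the Lambiotte et al. (2008) connection between modularity and autocorrelation under Laplacian dynamics to rederive null models for bipartite (Barber), directed (Leicht-Newman), and signed (Traag et al.) networks, via one-step conditional probabilities
$$Q = \frac{1}{2\mu} \sum_{ijsr} \left\{ \left( A_{ijs} - \gamma_s \frac{k_{is} k_{js}}{2 m_s} \right) \delta_{sr} + \delta_{ij} C_{jsr} \right\} \delta(g_{is}, g_{jr})$$
combining intra-slice adjacency data $A_{ijs}$ and null $\gamma_s k_{is} k_{js} / 2m_s$ with inter-slice identity arcs $C_{jsr}$; here $g_{is}$ is the community of node $i$ in slice $s$, and $2\mu = \sum_{jr} (k_{jr} + c_{jr})$ with $c_{jr} = \sum_s C_{jrs}$.
• Same formalism works for more general multilayer networks, with sum over inter-layer connections within same community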
Mucha et al., “Community structure in time-dependent, multiscale, and multiplex networks” (2010)
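One way to evaluate the multislice quality function of Mucha et al. for a given assignment is to assemble a supra-modularity matrix: diagonal blocks hold each slice's A_ijs - γ k_is k_js / 2m_s, and off-diagonal blocks hold the identity arcs C_jsr linking each node to its own copies in other slices. The sketch assumes ordered slices with a uniform coupling C_jsr = ω between adjacent slices only (categorical coupling would link every pair of slices) and one shared γ; the helper name and toy data are hypothetical, and it scores a fixed partition rather than optimizing one.

```python
import numpy as np

def multislice_modularity(slices, labels, gamma=1.0, omega=1.0):
    """Multislice Q for T ordered slices of the same N nodes.

    slices: list of T symmetric N x N adjacency arrays A_s
    labels: T x N array of community labels g_is
    Coupling (assumed): C_jsr = omega when |s - r| = 1, else 0.
    """
    T, N = len(slices), np.asarray(slices[0]).shape[0]
    B = np.zeros((N * T, N * T))
    two_mu = 0.0
    for s, A_s in enumerate(slices):
        A_s = np.asarray(A_s, dtype=float)
        k = A_s.sum(axis=1)
        B[s*N:(s+1)*N, s*N:(s+1)*N] = A_s - gamma * np.outer(k, k) / k.sum()
        two_mu += k.sum()                               # contributes 2 m_s
    idx = np.arange(N)
    for s in range(T - 1):                              # identity arcs between node j in slices s and s+1
        B[s*N + idx, (s+1)*N + idx] = omega
        B[(s+1)*N + idx, s*N + idx] = omega
        two_mu += 2 * omega * N                         # contributes the c_js terms
    g = np.asarray(labels).reshape(-1)                  # slice-major order matches B's layout
    same = g[:, None] == g[None, :]
    return float((B * same).sum() / two_mu)

# Two identical slices of a two-clique graph (nodes 0-3 and 4-7 joined by edge 3-4)
A = np.zeros((8, 8))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0
split = [0] * 4 + [1] * 4

print(multislice_modularity([A, A], [split, split]))                      # labels persist across slices
print(multislice_modularity([A, A], [split, [1 - c for c in split]]))     # same groups, swapped labels
```

The second call finds the same groups in each slice but swaps their labels, so no identity arc joins a node to itself within one community; the coupling reward disappears and Q drops. This is the sense in which identity enters multilayer modularity.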
110 Senates (two-year Congresses)
PJM & MAP, Chaos 2010
See mapequation.org
“Multilayer Stochastic Block Model”
Strata MLSBM (sMLSBM): Stanley et al., “Clustering network layers with the strata multilayer stochastic block model” (to appear)
[Diagram: sMLSBM procedure]
Initialization: k-means cluster the L layers (layer l) into S strata (stratum s)
Iterative Process (stratum s): k-means cluster 2L layers into S strata; update the number of strata to the number of unique clustering patterns according to (1) and (2)
sMLSBM on SparCC microbial interactions (Stanley et al., “Clustering network layers with the strata multilayer stochastic block model,” to appear)
Summary
• Community detection is an exploratory tool that can provide a simplified high-level view of the organization of a network.
• There are many methods. Don’t tie yourself down to one method: good clusters should be robust, and (hopefully) your story shouldn’t depend on the precise method (or, if it does, you should understand why).
• Many of these methods have parameters, and it is important to know about them for best use.
• Multilayer networks are very general. There are relatively few options currently available for finding communities in multilayer network data, but this area will expand rapidly.