KliqueFinder: Identifying Clusters in Network Data
Kenneth A. Frank
Michigan State University
Based on:
• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.
• Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. (2006). "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. * co-first authors.
• https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation1
Overview
• Clustering and Graphical Representations of Networks
• Running KliqueFinder...
– Step 1) Criteria for Determining Group Membership
– Step 2: Maximizing Criterion
– Step 3) Examine evidence of clusters
– Step 4) Evaluating the Performance of the Algorithm : Did...
• Make Sociogram in Netdraw
• Confidentiality/Ethical issues in Collecting Network Data
• Modifying the Image: Adding Node Data or Relations...
• Two mode
• Software Challenge...
• Batch KliqueFinder
• Prepping and converting data
• A Priori Clusters
Clustering and Graphical Representations of Networks
Video: (26:09-31:41)
Goal: to identify patterns in the network
• Rearrange rows and columns of social network matrix to reveal clustering
• Plot actors and ties in two dimensions to reveal clustering
Theory for defining cluster membership
• Cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.
– Result: ties are concentrated within subgroups.
• Structural equivalence (blocks): an actor should be in a cluster if the actor engages in a pattern of ties similar to that of members of that cluster.
– Result: blocks represent positions, but ties are not necessarily concentrated within blocks.
Crystallized Sociogram: Friendships Among the French Financial Elite
Lines indicate friendships: solid within subgroups, dotted between subgroups.
Numbers represent actors.
Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d'Administration
Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686
Crystallized Sociogram: Clusters in Foodwebs
Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426:282-285
Data Input
Old format: 10 spaces for each entry. New format: flexible columns (about 10 column widths). Both give the same results.
• The file name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.
• IDs should be 6 digits or fewer.
• The data are an edgelist. Example row: actor 1 interacts with actor 2 at a level of 3.
• Extent of relation can be binary or weighted.
• The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
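The old fixed-width layout can be sketched in a few lines of Python. This is only an illustration: the function name format_edgelist is hypothetical, and right-justifying each value in its 10-character field is an assumption (it matches what the SAS conversion step later in these slides produces with a 10. format).

```python
def format_edgelist(ties, width=10):
    """Return edgelist text with one 'nominator nominee weight' row per tie.

    Each value is right-justified in a field of `width` characters
    (assumed layout for the "10 spaces for each entry" format).
    """
    lines = []
    for nominator, nominee, weight in ties:
        for actor_id in (nominator, nominee):
            # The slides ask for IDs of 6 digits or fewer.
            if len(str(actor_id)) > 6:
                raise ValueError(f"ID {actor_id} is longer than 6 digits")
        row = "".join(str(v).rjust(width) for v in (nominator, nominee, weight))
        lines.append(row)
    return "\n".join(lines) + "\n"


# Actor 1 interacts with actor 2 at a level of 3, and so on.
ties = [(1, 2, 3), (1, 4, 3), (2, 26, 3)]
text = format_edgelist(ties)
print(text, end="")
```

Writing text to a file such as stanne.list would then give input in the old format; the newer flexible-column reader should accept the same file.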
Steps for finding clusters
Video: (31:41-43:30)
1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
– commonality of attributes
– focal experiences
– subsequent behavior
Step 1) Criteria for Determining Group Membership
Structural equivalence:
• Factor analyze the sociomatrix (Katz & Kahn)
• Iteratively rearrange and revalue rows and columns (CONCOR -- White et al., 1976)
Cohesion:
• Use fixed criteria (e.g., must be connected to at least k others in the cluster, or must be within a minimal path length of k others, etc.)
• Use a flexible criterion -- preference relative to group sizes and number of ties:
Model Based Cohesion
W_ii' = 1 if there is a tie between actors i and i', 0 otherwise.
samegroup_ii' = 1 if actors i and i' are members of the same subgroup, 0 otherwise.
Then θ1 represents subgroup salience:
So ... maximize θ1 (odds ratio)
Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors
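As a rough illustration of this odds ratio, the sketch below cross-classifies every ordered pair of actors by "same subgroup?" and "tie present?" and then takes half the log of the odds ratio, following the "θ1 = log odds / 2" label shown with the simulation output later in the deck. The cell labels A-D, the function names, and the six-actor network are assumptions made for the example; they are not KliqueFinder's own code or data.

```python
import math


def tie_by_group_table(ties, group):
    """Count ordered pairs (i, j), i != j, in a 2x2 table.

    A: different subgroup, no tie    B: different subgroup, tie
    C: same subgroup, no tie         D: same subgroup, tie
    """
    a = b = c = d = 0
    for i in group:
        for j in group:
            if i == j:
                continue
            same = group[i] == group[j]
            tied = (i, j) in ties
            if same and tied:
                d += 1
            elif same:
                c += 1
            elif tied:
                b += 1
            else:
                a += 1
    return a, b, c, d


def theta1(a, b, c, d):
    """Half the log odds ratio (assumes no cell is zero)."""
    return 0.5 * math.log((a * d) / (b * c))


# Hypothetical network: two subgroups of three actors each.
group = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3),
        (4, 5), (5, 4), (5, 6), (6, 5), (4, 6),
        (3, 4), (6, 1)}
a, b, c, d = tie_by_group_table(ties, group)
print(a, b, c, d, round(theta1(a, b, c, d), 4))
```

Here 10 of the 12 within-subgroup pairs carry a tie but only 2 of the 18 between-subgroup pairs do, so the odds ratio is (16 x 10)/(2 x 2) = 40 and θ1 is half its log.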
Step 2: Maximizing Criterion
• 1) Find a subgroup seed (3 actors who interact with each other, and with similar others)
• 2) Add to the cluster to maximize θ1 until no further addition increases it
• 3) Start a new subgroup with a new seed
• 4) Shuffle actors between existing subgroups
• 5) Make new subgroups as necessary, dissolve existing ones as necessary
KliqueFinder Algorithm: Phase I
• Initialize: assign each actor to its own subgroup
• Find a subgroup seed of 2 or 3
– For finding the best subgroup seed: 1) can only choose from unaffiliated actors; 2) each actor can only be a seed once
• Identify the single move that most increases the objective function θ1
• Does the move increase the function?
– Yes: reassign the actor that makes the best move. If the assignment moves an actor out of a group of 3, reassign the remaining 2 to the next best groups
– No: start a new subgroup with a new seed if one is available
• Computationally intensive; modify for large networks
KliqueFinder Algorithm: Phases II and III
• Phase II: If the best move does not increase the objective function and there are fewer than 3 actors available for subgroups, then
– Attach all isolated (or singleton) actors to the best existing subgroups, even if this reduces the objective function
• Phase III: Shuffle actors between existing subgroups without seeding new ones or disbanding existing ones
– Number of subgroups is fixed
– This is simple hill climbing and can be cast as an EM algorithm
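The Phase III idea (a fixed number of subgroups, with single actors moved whenever the move raises θ1) can be sketched as plain hill climbing. This is a simplified illustration, not the KliqueFinder implementation: the function names are hypothetical, θ1 is taken as half the log odds ratio, and 0.5 is added to every cell of the 2x2 table so the ratio stays defined when a cell is empty.

```python
import math


def theta1(ties, group):
    """Half the log odds ratio linking same-subgroup membership to ties."""
    # Cells keyed by (same subgroup?, tie present?); each starts at 0.5,
    # a continuity correction chosen for this sketch only.
    cells = {(False, False): 0.5, (False, True): 0.5,
             (True, False): 0.5, (True, True): 0.5}
    for i in group:
        for j in group:
            if i != j:
                cells[(group[i] == group[j], (i, j) in ties)] += 1
    return 0.5 * math.log(
        cells[(False, False)] * cells[(True, True)]
        / (cells[(False, True)] * cells[(True, False)])
    )


def shuffle_actors(ties, group):
    """Repeatedly make the single reassignment that most increases theta1."""
    group = dict(group)
    labels = sorted(set(group.values()))
    current = theta1(ties, group)
    while True:
        best_value, best_actor, best_label = current, None, None
        for actor in group:
            home = group[actor]
            # Keep the number of subgroups fixed: never empty a subgroup.
            if list(group.values()).count(home) == 1:
                continue
            for label in labels:
                if label == home:
                    continue
                group[actor] = label
                value = theta1(ties, group)
                if value > best_value + 1e-12:
                    best_value, best_actor, best_label = value, actor, label
            group[actor] = home
        if best_actor is None:
            return group, current  # no move increases theta1
        group[best_actor] = best_label
        current = best_value


# Hypothetical data: two triangles joined by one mutual tie; actor 3 starts
# in the wrong subgroup.
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2),
        (4, 5), (5, 4), (4, 6), (6, 4), (5, 6), (6, 5),
        (3, 4), (4, 3)}
start = {1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "B"}
final, score = shuffle_actors(ties, start)
print(final, round(score, 4))
```

Because every accepted move strictly increases θ1 and there are finitely many assignments, the loop always stops, though possibly at a local rather than global maximum.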
Running KliqueFinder
Video: (43:30-1:01:00)
• Click on “Browse…” button to specify the directory where the data file is located.
• Download KliqueFinder at
– http://hlmsoft.net/wkf/
– Follow instructions to install. Put in c:\kliqfind
– Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/
KliqueFinder
• Choose “Basic setup” and then click “Run setup file” button.
KliqueFinder
• Click on the “Browse” button to choose a data file.
Run Analysis
Data file
New Version of Data Input More Flexible
Old format: 10 spaces for each entry. New format: flexible columns. Same results.
• The file name must be less than 20 characters
• IDs should be 6 digits or fewer
• Example row: actor 1 interacts with actor 2 at a level of 3
• Extent of relation can be binary or weighted
• See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS
View Clusters Output
N Group And Actor Id
          24|AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
    Group ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
   1  A    7|A213|......|........|...1..|
   1  A   24|4A3.|......|.4......|......|
   1  A    4|33A.|......|........|......|
   1  A   15|433A|......|........|......|
------------+----+------+--------+------+
   2  B   26|.2..|B443..|........|......|
   2  B   21|.1..|4B....|...4....|....2.|
   2  B   12|....|4.B...|........|......|
   2  B    2|....|33.B..|........|...1..|
   2  B    1|..3.|3..3B.|........|.3..2.|
   2  B   14|....|....1B|........|......|
------------+----+------+--------+------+
   3  C    9|....|......|C...3.33|.3....|
   3  C    8|.4..|..4...|.C.4..4.|4.....|
   3  C   11|....|......|33C.4.3.|..4...|
   3  C   13|.4..|.4....|444C....|......|
   3  C    3|3...|.4....|4.44C...|......|
   3  C    5|.1..|.....4|3.2.3C..|......|
   3  C    6|....|......|444..4C4|......|
   3  C   20|....|......|3..3.44C|......|
------------+----+------+--------+------+
   4  D   17|.1..|......|.1......|D.1...|
   4  D   19|....|......|4.3.....|3D4...|
   4  D   16|....|......|4..4...4|44D...|
   4  D   10|..3.|...1..|........|...D3.|
   4  D   23|....|.3....|........|.343D.|
   4  D   27|.1..|.1....|........|.3..3D|
θ1 =1.1738
Blocked Network Data
Step 3) Examine evidence of clusters
1) randomly redistribute ties
2) apply algorithm
3) record value of odds ratio and θ1
4) repeat 1000 times to generate distribution
5) use mean of distribution as baseline for comparison
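The five steps above can be sketched as a small Monte Carlo loop. Several things here are assumptions made for illustration: ties are redistributed so that each actor keeps its number of outgoing ties, the names redistribute_ties and simulate_baseline are hypothetical, and count_mutual is only a stand-in statistic. In the real procedure the statistic would be "run the clustering algorithm on the random network and record θ1".

```python
import random
import statistics


def redistribute_ties(ties, actors, rng):
    """Give each actor the same number of outgoing ties, to random targets."""
    out_degree = {actor: 0 for actor in actors}
    for sender, _ in ties:
        out_degree[sender] += 1
    shuffled = set()
    for sender in actors:
        others = [a for a in actors if a != sender]
        for receiver in rng.sample(others, out_degree[sender]):
            shuffled.add((sender, receiver))
    return shuffled


def simulate_baseline(ties, actors, statistic, replications=1000, seed=0):
    """Apply `statistic` to many randomized networks; return mean and values."""
    rng = random.Random(seed)
    values = [statistic(redistribute_ties(ties, actors, rng))
              for _ in range(replications)]
    return statistics.mean(values), values


def count_mutual(ties):
    """Stand-in statistic: number of reciprocated pairs."""
    return sum(1 for i, j in ties if i < j and (j, i) in ties)


actors = [1, 2, 3, 4, 5, 6]
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3),
        (4, 5), (5, 4), (5, 6), (6, 5), (3, 4)}
baseline, values = simulate_baseline(ties, actors, count_mutual, replications=200)
print(baseline)
```

The mean of values plays the role of the baseline in step 5; the observed statistic is then compared against it.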
Randomly Redistributing Ties
Apply Algorithm to Random Data
θ1 = .81822
Monte Carlo Sampling Distribution
Video: (1:06:35-1:18:50)
• Set up sampling. Remember to do the “new data” setup when done, to prepare for the next analysis.
• Indicate that data are to be simulated. Data can include weights.
• Output is in sampdist.dat. Columns: θ1 (= log odds / 2) and odds ratio.
Code for Reading in Sample Distribution Data
SPSS
GET DATA
  /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
     oddsratio 30-59 F30.10
     samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.
DATASET ACTIVATE DataSet9.
GRAPH
  /HISTOGRAM=theta1.
SAS
title "Sampling distribution for theta1";
data one;
  infile "sampdist.dat" missover;
  input theta1 odds1;
proc univariate plot;
  var theta1;
run;
Stata
* This command imports the data file:
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)
* These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize
* This command plots a histogram for theta1:
hist theta1, freq
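For readers without SPSS, SAS or Stata, a rough Python equivalent is sketched below. It assumes sampdist.dat holds whitespace-separated columns in the order theta1, odds ratio, sample size (the order used in the SPSS and Stata code above). The function names are hypothetical, and the three rows in the example are made up purely to show the mechanics; they are not KliqueFinder output.

```python
import io


def read_sampdist(handle):
    """Read rows of (theta1, oddsratio, samplesize) from an open text file."""
    rows = []
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        theta1, oddsratio, samplesize = (float(x) for x in fields[:3])
        rows.append((theta1, oddsratio, samplesize))
    return rows


def share_at_or_above(values, observed):
    """Proportion of simulated values at least as large as the observed one."""
    return sum(v >= observed for v in values) / len(values)


# Made-up rows standing in for the contents of C:\KLIQFIND\sampdist.dat.
sample = io.StringIO("  0.75  4.48  27\n  0.81  5.05  27\n  0.70  4.06  27\n")
rows = read_sampdist(sample)
thetas = [row[0] for row in rows]
print(share_at_or_above(thetas, 1.1738))
```

With a real file, open("C:/KLIQFIND/sampdist.dat") would replace the StringIO object, and the share of simulated θ1 values at or above the observed value gives an empirical counterpart to the histogram comparison.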
Comparison of Sampling Distributions
Distribution of θ1 base from application of the algorithm to data simulated without regard for subgroup membership
Observed value: 1.1738
Sampling Distribution Parameters
Edit simulation parameters. The first element is the number of replications.
Must keep # of reps in first 5 columns
Approximate p-value Based on Previous Simulations
PREDICTED THETA (1 base) BASED ON SIMULATIONS.
VALUE BASED ON UNWEIGHTED DATA.
0.76985
ESTIMATE OF THETA (1 subgroup processes)
0.40397 (total - predicted = evidence of groups: 1.1738 - .76985 ≈ .404)
THE TOTAL THETA1 IS:
1.1738
APPROXIMATE TEST OF CONCENTRATION OF TIES
WITHIN SUBGROUPS BASED ON
SIZE OF THETA1 subgroup processes:
THETA1   |
SUBGROUP | APPROX | APPROX
PROCESSES| LRT    | P-VALUE
  0.40     34.82    0.00
Reject the null hypothesis of no clusters: H0: θ1 subgroup processes = 0
Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?
• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?
Output for Recovery of Subgroups
PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)
1.4989
The log odds applies to the following table:
                    OBSERVED SUBGROUP
                   DIFFERENT    SAME
                  ___________________
                  |        |        |
        DIFFERENT |   A    |   B    |
KNOWN             |        |        |
SUBGROUP          |--------|--------|
                  |        |        |
        SAME      |   C    |   D    |
                  |        |        |
                  -------------------
THE LOGODDS TRANSLATES TO AN ODDS RATIO OF
4.4766
WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.
Specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).
Odds of Recovery (Toy Example)
1 2 3 4 5 6
1 1 1 0 1 0
2 1 0 0 0 0
3 1 1 0 0 1
4 0 1 1 1 1
5 0 0 0 0 1
6 1 0 0 1 1 1
Simulated data with known subgroups
1 2 3 4 5 6
1 1 1 0 1 0
2 1 0 0 0 0
3 1 1 0 0 1
4 0 1 1 1 1
5 0 0 0 0 1
6 1 0 0 1 1
                    OBSERVED SUBGROUP
                   DIFFERENT    SAME
                  ___________________
        DIFFERENT | A (6)  | B (3)  |
KNOWN SUBGROUP    |--------|--------|
        SAME      | C (2)  | D (4)  |
                  -------------------
Observed subgroups identified by KliqueFinder
Misassignment of actor 4 contributes 3 to cell B and 2 to cell C
Cell D: 4 pairs correctly assigned to the same subgroup: (1,2; 1,3; 2,3; 5,6)
Cell A: 6 pairs correctly assigned to different subgroups: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6)
Odds of recovery = (AD)/(BC) = (6x4)/(3x2) = 4.00
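The toy table can be reproduced by classifying every pair of actors. In the sketch below, the known grouping {1,2,3} and {4,5,6} and the observed grouping {1,2,3,4} and {5,6} are inferred from the pairs listed for cells A and D and from the note about actor 4; the labels "X" and "Y" and the function name are arbitrary.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Cross-classify actor pairs by known vs. observed co-membership.

    A: different in both          B: known different, observed same
    C: known same, observed diff  D: same in both
    """
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if same_known and same_observed:
            d += 1
        elif same_known:
            c += 1
        elif same_observed:
            b += 1
        else:
            a += 1
    return a, b, c, d


known = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
observed = {1: "X", 2: "X", 3: "X", 4: "X", 5: "Y", 6: "Y"}  # actor 4 misassigned
a, b, c, d = recovery_table(known, observed)
odds = (a * d) / (b * c)
print(a, b, c, d, odds)  # 6 3 2 4 4.0
```

The four cells sum to 15, the number of distinct pairs among six actors, which is a quick check that no pair was missed.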
Sometimes Netdraw can't find the file; retrieve it manually.
Modifying Image in Netdraw
Blocked network data (the same blocked matrix shown under View Clusters Output)
Density = 4/(4x8) = 1/8. KliqueFinder uses Density = 4/(4x5) = .20 because the maximum number of nominations is 5.
Data used for multidimensional scaling within subgroups: distance = maximum value / cell entry. E.g., if the maximum value is 4, a tie of 2 gives a distance of 4/2 = 2.
In xxxxxx.clusters:
DIRECT ASSOCIATIONS
GROUP       1     2     3     4
LABEL       A     B     C     D
N           4     6     8     6
GROUP
    1    2.42  0.00  0.20  0.05
    2    0.25  1.07  0.13  0.27
    3    0.38  0.40  2.40  0.28
    4    0.21  0.17  0.67  1.17
Distance in multidimensional scaling between subgroups = maximum value / density
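The arithmetic on this slide can be checked with a short sketch. The function names are hypothetical, and capping the column count at the maximum number of nominations is an interpretation of the note that KliqueFinder uses 4/(4x5) rather than 4/(4x8).

```python
def block_density(block_sum, n_rows, n_cols, max_nominations=None):
    """Average tie value in a block of the blocked matrix.

    If max_nominations is given, at most that many columns count as
    available per row (assumed reading of the slide's 4/(4x5) example).
    """
    cols = n_cols if max_nominations is None else min(n_cols, max_nominations)
    return block_sum / (n_rows * cols)


def within_distance(max_value, cell_entry):
    """Distance between two actors in the same subgroup."""
    return max_value / cell_entry


def between_distance(max_value, density):
    """Distance between two subgroups."""
    return max_value / density


plain = block_density(4, n_rows=4, n_cols=8)                      # 4/(4x8)
capped = block_density(4, n_rows=4, n_cols=8, max_nominations=5)  # 4/(4x5)
print(plain, capped)
print(within_distance(4, 2))
print(between_distance(4, capped))
```

The capped value of .20 matches the group 1 to group 3 entry of the DIRECT ASSOCIATIONS table, and the same rule gives 12/(8x4) = .375 for group 3 to group 1, which the table rounds to 0.38.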
Cohesion / Structural similarity
Video: (1:19:15-1:23:40)
Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
Choosing lines: Groups
Confidentiality/Ethical issues in Collecting Network Data
• Need names on survey
• Data can be confidential but not anonymous (especially for longitudinal)
• R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27 / 2 (2005): 89 – 93. Read it online. http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf
– (All issues of social networks available via science direct)
• Who benefits from network analysis? Who bears the cost?
– Kadushin, Charles “Who benefits from network analysis: ethics of social network research” Social Networks 27 / 2 (2005): Pages 139-153.
• Issues to raise when dealing with Human Subjects Board:
– Klovdahl, Alden S. Social network research and human subjects protection: Towards more effective infectious disease control Pages 119-137
• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!
• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm
Video: (1:23:41-1:28)
The SRI/KLiqueFinder Solution to confidentiality: aggregate to subgroups
1) Provide information about who is in which cluster as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.
Benefit: reveals the location of resources relative to social structure.
Protection: does not reveal specific responses because all information is at the cluster level.
2) Provide locations in a sociogram unique to each respondent, indicating where that person is located (“you are here”). The figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.
Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
Protection: specific responses of others are not revealed, so confidentiality is preserved.
Can even include names of actors
Using subgroups for feedback to respondents and in a proposal
Choosing Lines: Actor Level Within
Choosing Lines: Actor Level
Remove group nodes
Choosing Lines: Actor Level Between
Choosing Lines: Group Level
Modifying the Image: Adding Node Data or Relations
Video: (1:49:35-2:07:48)
http://www.analytictech.com/ucinet/download.htm
http://www.analytictech.com/Netdraw/NetdrawGuide.doc
http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data
Files for KliqueFinder
Input data:
• xxxxxx.list: network data
• xxxxxx.ilabel: node data
• xxxxxx.xnet: alternative network data
Parameters:
• Kliqfind.par, Printo, Simulate.par
Output from KliqueFinder:
• xxxxxx.clusters: diagnostics and matrix-formatted data
• xxxxxx.vna: for Netdraw
• xxxxxx.place: data containing actor IDs and subgroup placement
Modifying node data by editing [datafile].vna: the file is read by Netdraw. Copy the relevant data into Excel, edit, and replace.
Add a new node variable to the *node data header (e.g. gender), then add data.
*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE
Adding Node Attributes with Extra File
KliqueFinder will put the attributes into the vna file.
File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (xxxxxx.list). Example: cut and paste into stanne.ilabel for stanne.list.
Format: 10 columns for ID; skip a space; name; node attributes 1-5.
1 Jacob 1 3 5
2 Stan 1 2 5
3 Linton 1 2 5
4 Charles 1 3 3
5 Mark 1 3 3
6 Tom 2 3 3
7 Ronald 2 3 5
8 Nan 2 1 3
9 Elizabeth 2 1 4
10 Barry 2 2 3
11 Martin 2 3 1
12 Steve 2 3 1
13 PeterC 2 1 5
14 Patrick 1 1 1
15 Katy 1 1 3
16 Kathleen 3 3 3
17 Ove 2 2 2
18 JamesC 5 5 5
19 Robert 4 4 4
20 JamesM 1 2 3 4
21 Noah 4 3 2 1
22 Marijtje 1 2 1 2
23 Ronald 2 1 2 1
24 Harrison 3 1 3 1
25 Duncan 4 1 4 1
Interactive: adding node data
Include Node Data in Image
Modifying Links
Hostile Actions
Supportive Actions
[Figure: crystallized sociogram of teachers in subgroups A-E. Within-subgroup scale expanded by a factor of 9. Labels are ID/grade level; solid lines (-) within subgroups, dotted lines (...) between subgroups.]
• Each number is a teacher
• G_ indicates the grade in which the teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues (solid lines within subgroups, dashed between)
• Circles indicate cohesive subgroups
Ripple Plot
• Overlay talk about technology on social geography of crystallized sociogram
• Lines indicate talk about technology
• Size of dot indicates teacher’s use of technology at time 1
• Ripples indicate increase in use from time 1 to time 2
Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage publications.
Modifying Links by Editing [datafile].vna: File is read by netdraw. Copy relevant data into excel, edit, and replace
(Same .vna listing as on the earlier slide about modifying node data.)
Add a new relation to the *Tie data header (e.g. technology), then add data.
Modifying Links with Extra File
KliqueFinder will put the attributes into the vna file.
File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (xxxxxx.list). Example: stanne.xnet for stanne.list.
File containing extra network. Columns: nominator, nominee, strength of tie.
1 2 4
19 15 3
22 26 1
Modifying Links: Interactive – Finicky
Interactive Modifying Links
Two mode
*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28:97-123. * co-first authors.
Data source
Video: (1:39:25-1:49:35)
Copy homact.list from c:\kliqfind\setups to c:\kliqfind
Two-mode Data
• The data are an edgelist. Example row: actor 1 participates in event 19 at a level of 1.
• Extent of relation can be binary or weighted.
• The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
• The new version of KliqueFinder is more flexible: about 10 column widths.
• IDs should be 6 digits or fewer.
• See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.
Two-mode Clusters Output
Blocked Two-Mode Network Data
Two-mode Crystallized Sociogram
Centralization & Centrality in KliqueFinder
• KliqueFinder produces a measure of Warp.
• Starts with distances defined by
– Maximum value in network / observed value
• E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.
– These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)
• Obtains eigenvalues
– Within each cluster, based on raw data within the cluster
– Between clusters, based on 1/density of ties between clusters
• Density = average value in a given block
• Warp = sum of positive eigenvalues / sum of all eigenvalues
– Note it does not use the square root of the eigenvalues (variances are more additive)
• Output into xxxxxx.bcord (9th element) and into Netdraw as a node attribute for groups, called “centrality”
• Centrality for individuals is distance to the center of their subgroup (radius).
Running on a Large Data File (more than 1000 actors)
If you start the program and it just sits there, it is looking for the best seed for the first subgroup. The seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the 1st subgroup does not matter so much). To shortcut: change value from 12, save, and run.
Software Challenge
Video: (2:07:57-2:08:15)
• Analyze nonpr1.list
– Evidence of clusters?
– Performance of algorithm?
• Replace lines with nonpr2
• Describe the KliqueFinder algorithm
KliqueFinder Applications: Adding Individual Attributes in SAS
• Run KliqueFinder
• Data file: collt1.list
• Make graph
• Use ID from other file? Yes:
– SAS file name: c:\kliqfind\indiv [be sure to include full path]
– ID variable: nominator
– String variable: gradelev
• Save
• In SAS, run socgramz in the working directory
KliqueFinder Applications: Adding Individual Attributes
• Select “Yes” for “User ID (character) from other SAS file?”
• Type the following information in the corresponding boxes
• Then click “Save”
Choosing an ID Variable
With ID based on Grade
KliqueFinder Applications: Replacing Lines
• Run KliqueFinder
• Data file: collt1.list
• Make graph
• Save
• Retrieve socgramz.sas in the working directory
• Replace all occurrences of collt1.list with collt2.list
• Run
Opening socgramz.sas
Changing lines
Change lines to different source
New Lines based on Collt2
Batch KliqueFinder
Basics
• Program runs KliqueFinder on multiple files
• Input
– List of filenames
– Files containing data
– BACK UP YOUR DATA FIRST!
• Output
– Clustering output (.place, .clusters, .vna) for each list file
Files
• File containing names of data files: testb.txt
• Data file: stanne.list
• Data file: ffe.list
• BACK UP YOUR DATA FIRST!
KliqueFinder
• Browse to directory you want to work in
• Choose “Basic setup” and then click “Run setup file” button.
Running Batch Mode
• File with names of data files
• Click here to run as batch
• BACK UP DATA FILES BEFORE RUNNING!
Prepping data in Excel
Video: (1:28-1:39)
• Name your file xxxxxx.list, e.g., test01.list
• Right click
• Choose Formatted text (space delimited)
Prepping Data in UCINET
• Navigate to UCINET data
• Navigate to where you want to save: c:\kliqfind
• Must remove “!” from the file. There may be several; the “!” marks are there because of multiple data sets.
Converting data using SAS
Video: (2:10:43-2:19)
data one;
  infile "badform.list";
  input chooser chosen wt;
data two;
  set one;
  file "ready1.list";
  if wt ne . then put (chooser chosen wt) (10.);
run;
A Priori Clusters
A line with 99999 in the data file indicates the a priori cluster in which an actor is placed. For example, actor 1 is in a priori cluster 3.
Run the repeat2 setup, and then proceed as usual.
Remember to do the “new data” setup when done. KliqueFinder will make pictures based on the a priori clusters.
Comparison of A Priori Clusters and Identified Solution
• Data with a priori cluster assignments: run as new data
• Run as usual, then look at the cluster output
SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL   POSS   STANDARDIZED
   52.    88.        9.55565
QAP standardized measure; compare with the normal distribution.
Data Containing Cluster Assignments
File called stanne.place [datafile.place]
Columns: internal ID, user ID, cluster, then two columns to ignore (for simulation only)
1.0 1.0 2.0 1.0 3.0
2.0 2.0 2.0 1.0 3.0
3.0 4.0 1.0 1.0 3.0
4.0 19.0 4.0 1.0 3.0
5.0 23.0 4.0 1.0 3.0
6.0 26.0 2.0 1.0 3.0
17.0 6.0 3.0 1.0 3.0
18.0 8.0 3.0 1.0 3.0
19.0 20.0 3.0 1.0 3.0
20.0 15.0 1.0 1.0 3.0
21.0 12.0 2.0 1.0 3.0
22.0 17.0 4.0 1.0 3.0
23.0 16.0 4.0 1.0 3.0
24.0 27.0 4.0 1.0 3.0
-27.0 28.0 4.0 1.0 3.0
If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other. In this case, the line should be read as the tagee, tagger, and group. So, in the last line, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.
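A small reader for a .place file, following the column description and the tagalong rule above, might look like the sketch below. The function name parse_place and the returned structures are hypothetical, and the three sample rows are copied from the listing on this slide.

```python
def parse_place(text):
    """Return (clusters, tagalongs) from the text of a .place file.

    clusters maps user ID -> cluster number.
    tagalongs maps a tagalong's ID -> the ID of the one actor it is tied to.
    """
    clusters = {}
    tagalongs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        first, second, cluster = (int(float(x)) for x in fields[:3])
        if first < 0:
            # Tagalong row: -(ID of the actor it attaches to), its own ID, group.
            tagalongs[second] = -first
        # In both row types the second field is the actor being placed.
        clusters[second] = cluster
    return clusters, tagalongs


sample = """\
3.0 4.0 1.0 1.0 3.0
24.0 27.0 4.0 1.0 3.0
-27.0 28.0 4.0 1.0 3.0
"""
clusters, tagalongs = parse_place(sample)
print(clusters)
print(tagalongs)
```

Values are read as floats first because the file stores IDs as 1.0, 2.0, and so on; other KliqueFinder versions may write slightly different numeric formats, which float() should still accept.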
There may be slightly different numeric formats, depending on the version of KliqueFinder.
Including Cluster Membership in Influence Model
SPSS
DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
       1.0       1.0       1.0       1.0       3.0
       2.0       2.0       1.0       1.0       3.0
       3.0       3.0       1.0       1.0       3.0
       4.0       4.0       2.0       1.0       3.0
       5.0       5.0       2.0       1.0       3.0
       6.0       6.0       2.0       1.0       3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.
MATCH FILES /FILE=yvar1 /FILE='indeg' /FILE=clusters /BY nominee.
EXECUTE.
SAS
data clusters; *groups from KliqueFinder;
  input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;
proc sort data=clusters;
  by nominator;
data withinfl;
  merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
  by nominator;
  drop nominee _type_ _freq_;
run;
Advanced: run the influence model for technology
• Identify clusters from talkt2
• Include cluster membership in the influence model
Adding Patches
Patch for Two-mode
Patch for one-mode
Alternative community detection algorithms
• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf
• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf
• http://fatweasel.net/analytics/network-analysis/community-detection-in-networks/