
Page 1: KliqueFinder:  Identifying  Clusters in Network Data


KliqueFinder: Identifying Clusters in Network Data

Kenneth A. Frank

Michigan State University

Based on:

• Frank, K.A. 1995. Identifying Cohesive Subgroups. Social Networks 17: 27-56.

• Frank, K.A. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.

• *Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors)

• https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation1

Page 2: KliqueFinder:  Identifying  Clusters in Network Data


Overview

• Clustering and Graphical Representations of Networks

• Running KliqueFinder...

– Step 1) Criteria for Determining Group Membership

– Step 2: Maximizing Criterion

– Step 3) Examine evidence of clusters

– Step 4) Evaluating the Performance of the Algorithm : Did...

• Make Sociogram in Netdraw

• Confidentiality/Ethical issues in Collecting Network Data

• Modifying the Image: Adding Node Data or Relations...

• Two mode

• Software Challenge...

• Batch KliqueFinder

• Prepping and Converting Data

• A Priori Clusters

Page 3: KliqueFinder:  Identifying  Clusters in Network Data

Clustering and Graphical Representations of Networks

Video: 26:09-31:41

Goal: to identify patterns in the network

• Rearrange rows and columns of social network matrix to reveal clustering

• Plot actors and ties in two dimensions to reveal clustering


Page 4: KliqueFinder:  Identifying  Clusters in Network Data


Theory for defining cluster membership

• cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.

– Result: ties are concentrated within subgroups

• structural equivalence (blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.

– Result: blocks represent positions, but ties not necessarily concentrated within blocks.


Page 5: KliqueFinder:  Identifying  Clusters in Network Data

Crystallized Sociogram: Friendships Among the French Financial Elite

Lines indicate friendships: solid within subgroups, dotted between subgroups.

numbers represent actors

Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d’Administration

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686


Page 6: KliqueFinder:  Identifying  Clusters in Network Data

Crystallized Sociogram: Clusters in Foodwebs

Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426:282-285


Page 7: KliqueFinder:  Identifying  Clusters in Network Data

Data Input

Old format: 10 spaces for each entry. New format: flexible columns. Both give the same results.

The file name must be less than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.

Each row is read as: actor 1 interacts with actor 2 at a level of 3. The extent of the relation can be binary or weighted.

IDs should be 6 digits or fewer.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Page 8: KliqueFinder:  Identifying  Clusters in Network Data

Data

Each row is read as: actor 1 interacts with actor 2 at a level of 3. The extent of the relation can be binary or weighted.

Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.

The new version of KliqueFinder is more flexible about the 10-column widths.

IDs should be 6 digits or fewer.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Edgelist

The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.
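The fixed-width layout described on this slide (chooser, chosen, and strength of tie, 10 characters each) can also be produced programmatically. The following is a minimal Python sketch, not part of KliqueFinder: the function name `format_edgelist` and the sample ties are hypothetical, and right-alignment within each 10-character field is an assumption.

```python
def format_edgelist(ties, width=10):
    """Return fixed-width edgelist lines: chooser, chosen, weight."""
    lines = []
    for chooser, chosen, weight in ties:
        for actor_id in (chooser, chosen):
            # The slides recommend IDs of 6 digits or fewer.
            if len(str(actor_id)) > 6:
                raise ValueError(f"ID {actor_id} is longer than 6 digits")
        # ASSUMPTION: each entry is right-aligned in its 10-character field.
        fields = (chooser, chosen, weight)
        lines.append("".join(str(value).rjust(width) for value in fields))
    return lines


# Hypothetical ties: (chooser, chosen, strength of tie)
ties = [(1, 2, 3), (1, 4, 3), (2, 26, 3)]
edgelist_lines = format_edgelist(ties)

# To save under a six-character name, as the slide suggests:
# with open("stanne.list", "w") as handle:
#     handle.write("\n".join(edgelist_lines) + "\n")
```

Each output line is 30 characters wide, which matches the old 10-spaces-per-entry layout; the newer, more flexible reader should accept it as well.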


Page 9: KliqueFinder:  Identifying  Clusters in Network Data

Steps for finding clusters

Video: 31:41-43:30

1) Determine criterion for defining clusters
2) Maximize criterion
3) Examine evidence of clusters
4) Evaluate performance of the algorithm
5) Interpret clusters
– commonality of attributes
– focal experiences
– subsequent behavior


Page 10: KliqueFinder:  Identifying  Clusters in Network Data


Step 1) Criteria for Determining Group Membership

Structural Equivalence:
– Factor analyze sociomatrix (Katz & Kahn)
– Iteratively rearrange and revalue rows and columns (CONCOR -- White et al., 1976)

Cohesion:
– Utilize fixed criteria (e.g., must be connected to at least k others in the cluster, or must be a minimal path length from k others, etc.)
– Use a flexible criterion -- preference relative to group sizes and number of ties:


Page 11: KliqueFinder:  Identifying  Clusters in Network Data

Model Based Cohesion

W_ii’ = 1 if there is a tie between actors i and i’, 0 otherwise.

samegroup_ii’ = 1 if actors i and i’ are members of the same subgroup, 0 otherwise.

Then θ1 represents subgroup salience.

So: maximize θ1 (odds ratio).


Page 12: KliqueFinder:  Identifying  Clusters in Network Data

Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors
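The table on this slide did not survive extraction, but the idea can be sketched: cross-tabulate every ordered pair of actors by whether they share a subgroup and whether a tie exists, then take the odds ratio AD/BC. The Python below is an illustrative sketch, not KliqueFinder's code. The function names and the toy network are hypothetical, and halving the log odds ratio to get θ1 is an assumption taken from the "θ1 = log odds/2" label on the sampling-distribution slide.

```python
import math
from itertools import permutations


def tie_by_samegroup_table(ties, group_of):
    """Count ordered pairs of actors in a 2x2 table.

    a: different subgroup, no tie    b: different subgroup, tie
    c: same subgroup, no tie         d: same subgroup, tie
    """
    a = b = c = d = 0
    for i, j in permutations(group_of, 2):
        same = group_of[i] == group_of[j]
        tie = (i, j) in ties
        if not same and not tie:
            a += 1
        elif not same and tie:
            b += 1
        elif same and not tie:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    # Undefined when b or c is zero; a real implementation needs a rule for that.
    return (a * d) / (b * c)


# Hypothetical toy network: six actors in two subgroups, directed ties.
group_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
ties = {(1, 2), (2, 1), (1, 3), (3, 1), (2, 3),   # within A
        (4, 5), (5, 4), (5, 6), (6, 5),           # within B
        (3, 4), (1, 6)}                           # between subgroups

table = tie_by_samegroup_table(ties, group_of)
ratio = odds_ratio(*table)
# ASSUMPTION: theta1 is half the log odds ratio, per the slide label.
theta1 = math.log(ratio) / 2
```

A larger odds ratio means ties are more concentrated within subgroups, which is the quantity the algorithm tries to increase.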


Page 13: KliqueFinder:  Identifying  Clusters in Network Data


Step 2: Maximizing Criterion

• 1) Find a subgroup seed (3 actors who interact with each other, and with similar others)

• 2) Add to the cluster to maximize θ1 until you cannot do any more

• 3) Start new subgroup with new seed

• 4) Shuffle between existing subgroups

• 5) Make new subgroups as necessary, dissolve existing ones as necessary.

Page 14: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder Algorithm: Phase I

Find subgroup seed of 2 or 3

Identify single move that most increases objective function θ1

Does move increase function?

yes

Reassign actor that makes best move

No

If assignment moves actor out of a group of 3, reassign remaining 2 to next best groups

For finding best subgroup seed:
1) can only choose from unaffiliated actors
2) each actor can only be a seed once

Initialize: assign each actor to own subgroup

Computationally intensive, modify for large networks

Page 15: KliqueFinder:  Identifying  Clusters in Network Data


KliqueFinder Algorithm: Phases II and III

• Phase II: If best move does not increase objective function and there are fewer than 3 actors available for subgroups then
– Attach all isolated (or singleton) actors to best existing subgroups, even if this reduces objective function

• Phase III: shuffle actors between existing subgroups without seeding new ones or disbanding existing ones
– Number of subgroups is fixed
– This is simple hill climbing and can be cast as an EM algorithm
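The Phase III shuffle can be illustrated with a small hill-climbing sketch in Python. This is not KliqueFinder's implementation: the function names are hypothetical, the objective is a log odds ratio with 0.5 added to each cell (an assumption, so the ratio stays defined when a cell is empty), and moves are accepted as soon as they improve the objective rather than by searching for the single best move.

```python
import math
from itertools import permutations


def objective(ties, group_of):
    """Log odds ratio of tie by same-subgroup, over ordered pairs of actors.

    ASSUMPTION: 0.5 is added to every cell so the ratio is defined even when
    a cell is empty; the slides do not say how KliqueFinder handles that case.
    """
    cells = {(False, False): 0.5, (False, True): 0.5,
             (True, False): 0.5, (True, True): 0.5}
    for i, j in permutations(group_of, 2):
        cells[(group_of[i] == group_of[j], (i, j) in ties)] += 1
    return math.log(cells[(False, False)] * cells[(True, True)]
                    / (cells[(False, True)] * cells[(True, False)]))


def shuffle_actors(ties, start):
    """Move one actor at a time between existing subgroups while the objective improves."""
    group_of = dict(start)
    labels = sorted(set(group_of.values()))
    best = objective(ties, group_of)
    improved = True
    while improved:
        improved = False
        for actor in sorted(group_of):
            home = group_of[actor]
            if list(group_of.values()).count(home) == 1:
                continue  # never empty a subgroup: the number of subgroups is fixed
            for label in labels:
                if label == home:
                    continue
                group_of[actor] = label
                value = objective(ties, group_of)
                if value > best + 1e-12:
                    best, home, improved = value, label, True
            group_of[actor] = home  # keep the best placement found for this actor
    return group_of, best


# Toy data (hypothetical): two triads plus one bridging tie; actor 4 starts misplaced.
triads = [(1, 2, 3), (4, 5, 6)]
ties = {(i, j) for t in triads for i in t for j in t if i != j} | {(3, 4)}
start = {1: "A", 2: "A", 3: "A", 4: "A", 5: "B", 6: "B"}
final, theta = shuffle_actors(ties, start)
```

Because every accepted move strictly increases the objective and the number of possible assignments is finite, the loop terminates, though only at a local maximum, as with any hill climber.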

Page 16: KliqueFinder:  Identifying  Clusters in Network Data

Running KliqueFinder

Video: 43:30-1:01:00

• Click on “Browse…” button to specify the directory where the data file is located.

• Download KliqueFinder at
– http://hlmsoft.net/wkf/
– Follow instructions to install. Put in c:\kliqfind
– Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/


Page 17: KliqueFinder:  Identifying  Clusters in Network Data


KliqueFinder

• Choose “Basic setup” and then click “Run setup file” button.


Page 18: KliqueFinder:  Identifying  Clusters in Network Data


KliqueFinder

• Click on the “Browse” button to choose a data file.


Page 19: KliqueFinder:  Identifying  Clusters in Network Data

Run Analysis

Data file

Page 20: KliqueFinder:  Identifying  Clusters in Network Data

New Version of Data Input more Flexible

Old format: 10 spaces for each entry. New format: flexible columns. Both give the same results.

The file name must be less than 20 characters.

IDs should be 6 digits or fewer.

Each row is read as: actor 1 interacts with actor 2 at a level of 3. The extent of the relation can be binary or weighted.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Page 21: KliqueFinder:  Identifying  Clusters in Network Data

View Clusters Output


Page 22: KliqueFinder:  Identifying  Clusters in Network Data

N   Group And Actor Id
24          |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|

θ1 =1.1738

Blocked Network Data


Page 23: KliqueFinder:  Identifying  Clusters in Network Data


Step 3) Examine evidence of clusters

1) randomly redistribute ties

2) apply algorithm

3) record value of odds ratio and θ1

4) repeat 1000 times to generate distribution

5) use mean of distribution as baseline for comparison
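The five steps above can be sketched in Python. This is a sketch under stated assumptions rather than KliqueFinder's procedure: ties are placed uniformly at random among ordered pairs (the slides do not say which constraints the program preserves), the clustering-and-scoring step is passed in as a function, and `reciprocity` below is only a stand-in statistic so that the example runs on its own.

```python
import random


def redistribute_ties(actors, n_ties, rng):
    """Place n_ties at random among all ordered pairs of distinct actors."""
    pairs = [(i, j) for i in actors for j in actors if i != j]
    return set(rng.sample(pairs, n_ties))


def baseline_distribution(actors, n_ties, cluster_and_score, reps=1000, seed=0):
    """Repeat: randomize ties, apply the algorithm, record the statistic."""
    rng = random.Random(seed)
    return [cluster_and_score(redistribute_ties(actors, n_ties, rng))
            for _ in range(reps)]


def reciprocity(ties):
    # Stand-in statistic (hypothetical). In practice this would run the
    # clustering algorithm on the randomized ties and return theta1.
    return sum((j, i) in ties for i, j in ties) / len(ties)


actors = list(range(1, 25))          # 24 actors, as in the example network
baseline = baseline_distribution(actors, n_ties=60,
                                 cluster_and_score=reciprocity, reps=200)
baseline_mean = sum(baseline) / len(baseline)
```

The mean of `baseline` plays the role of the comparison value in step 5; the observed statistic is then judged against the whole simulated distribution.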


Page 24: KliqueFinder:  Identifying  Clusters in Network Data


Randomly Redistributing Ties


Page 25: KliqueFinder:  Identifying  Clusters in Network Data


Apply Algorithm to Random Data

θ1 = .81822

Page 26: KliqueFinder:  Identifying  Clusters in Network Data

Monte Carlo Sampling Distribution

Video: 1:06:35-1:18:50

Output in sampdist.dat: θ1 (= log odds/2); odds ratio

Set up sampling. Remember to do “new data” setup when done, to prepare for the next analysis.

Indicate simulate data. Data can include weights.

Page 27: KliqueFinder:  Identifying  Clusters in Network Data

Code for Reading in Sample Distribution Data

SPSS:

GET DATA
  /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
  oddsratio 30-59 F30.10
  samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.

DATASET ACTIVATE DataSet9.
GRAPH
  /HISTOGRAM=theta1.

SAS:

title "Sampling distribution for theta1";
data one;
  infile "sampdist.dat" missover;
  input theta1 odds1;
proc univariate plot;
  var theta1;
run;

Stata:

* This command imports the data file:
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)

* These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize

* This command plots a histogram for theta1:
hist theta1, freq
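For readers without SPSS, SAS, or Stata, the same file can be read with a few lines of Python. This is a sketch that assumes each line of sampdist.dat holds three whitespace-separated numbers (theta1, odds ratio, sample size, following the variable names in the SPSS code above). The function names and the sample lines are hypothetical, and the numbers in `sample_lines` are made up for illustration.

```python
def parse_sampdist(lines):
    """Parse lines of 'theta1 oddsratio samplesize' into tuples of floats."""
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        theta1, oddsratio, samplesize = (float(x) for x in fields[:3])
        rows.append((theta1, oddsratio, samplesize))
    return rows


def share_at_or_above(values, observed):
    """Share of simulated values at least as large as the observed value."""
    return sum(v >= observed for v in values) / len(values)


# Made-up lines in the assumed layout (not real KliqueFinder output).
sample_lines = [
    "                  0.8182200000                  5.1000000000                 24.0000000000",
    "                  0.7000000000                  4.0000000000                 24.0000000000",
    "",
]
rows = parse_sampdist(sample_lines)
thetas = [row[0] for row in rows]

# With the real file:
# with open(r"C:\KLIQFIND\sampdist.dat") as handle:
#     rows = parse_sampdist(handle)
```

`share_at_or_above(thetas, 1.1738)` then gives a simple empirical comparison of the observed θ1 with the simulated values.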

Page 28: KliqueFinder:  Identifying  Clusters in Network Data

Comparison of Sampling Distributions


Page 29: KliqueFinder:  Identifying  Clusters in Network Data

Distribution of θ1 base From Application of the Algorithm to Data Simulated Without Regard for Subgroup Membership

Observed value: 1.1738

Page 30: KliqueFinder:  Identifying  Clusters in Network Data

Sampling Distribution Parameters

Edit simulation parameters. First element is number of replications.

Must keep # of reps in first 5 columns

Page 31: KliqueFinder:  Identifying  Clusters in Network Data


Approximate p-value Based on Previous Simulations

PREDICTED THETA (1 base) BASED ON SIMULATIONS.

VALUE BASED ON UNWEIGHTED DATA.

0.76985

ESTIMATE OF THETA (1 subgroup processes)

0.40397 (total-predicted=evidence of groups): 1.1738-.76985=.40397

THE TOTAL THETA1 IS:

1.1738

APPROXIMATE TEST OF CONCENTRATION OF TIES

WITHIN SUBGROUPS BASED ON

SIZE OF THETA1 subgroup processes:

THETA1   |
SUBGROUP | APPROX | APPROX
PROCESSES| LRT    | P-VALUE
     0.40    34.82     0.00

Reject the null hypothesis of no clusters: H0: θ1 (subgroup processes) = 0

Page 32: KliqueFinder:  Identifying  Clusters in Network Data

Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?

• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?


Page 33: KliqueFinder:  Identifying  Clusters in Network Data

Output for Recovery of Subgroups

PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)

1.4989

The log odds applies to the following table:

                     OBSERVED SUBGROUP
                     DIFFERENT   SAME
                    ___________________
                    |        |        |
          DIFFERENT |   A    |   B    |
KNOWN               |        |        |
SUBGROUP            |--------|--------|
                    |        |        |
          SAME      |   C    |   D    |
                    |        |        |
                    -------------------

THE LOGODDS TRANSLATES TO AN ODDS RATIO OF

4.4766

WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.

Specific accuracy for a given data set not known, results predicted from thousands of simulations – see next slide

Page 34: KliqueFinder:  Identifying  Clusters in Network Data

Odds of Recovery (Toy Example)

1 2 3 4 5 6

1 1 1 0 1 0

2 1 0 0 0 0

3 1 1 0 0 1

4 0 1 1 1 1

5 0 0 0 0 1

6 1 0 0 1 1 1

Simulated data with known subgroups

1 2 3 4 5 6

1 1 1 0 1 0

2 1 0 0 0 0

3 1 1 0 0 1

4 0 1 1 1 1

5 0 0 0 0 1

6 1 0 0 1 1

                     OBSERVED SUBGROUP
                     DIFFERENT   SAME
                    ___________________
                    |        |        |
          DIFFERENT | A (6)  | B (3)  |
KNOWN               |        |        |
SUBGROUP            |--------|--------|
                    |        |        |
          SAME      | C (2)  | D (4)  |
                    |        |        |
                    -------------------

Observed subgroups identified by KliqueFinder

Misassignment of actor 4 contributes 3 to cell B and 2 to cell C

Cell D: 4 pairs correctly assigned to same subgroup: (1,2; 1,3; 2,3; 5,6)

Cell A: 6 pairs correctly assigned to different subgroups: (1,5; 2,5; 3,5; 1,6; 2,6; 3,6)

Odds of recovery = (AD)/(BC) = (6x4)/(3x2) = 4.00
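The toy calculation can be reproduced by counting pairs of actors. In the Python sketch below, the known subgroups {1,2,3} and {4,5,6} and the observed subgroups {1,2,3,4} and {5,6} are inferred from the pairs listed on the slide; the function name `recovery_table` is hypothetical.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Count unordered pairs by known (same/different) and observed (same/different) subgroup.

    a: known different, observed different   b: known different, observed same
    c: known same, observed different        d: known same, observed same
    """
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if not same_known and not same_observed:
            a += 1
        elif not same_known and same_observed:
            b += 1
        elif same_known and not same_observed:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Subgroup labels inferred from the pairs listed on the slide.
known = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
observed = {1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2}   # actor 4 misassigned

a, b, c, d = recovery_table(known, observed)
odds_of_recovery = (a * d) / (b * c)
```

With these assignments the counts are A = 6, B = 3, C = 2, D = 4, so the odds of recovery come out to 4.0, matching the slide.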

Page 35: KliqueFinder:  Identifying  Clusters in Network Data

Make Sociogram in Netdraw

Video: 1:01:00-1:06:22

Page 36: KliqueFinder:  Identifying  Clusters in Network Data

Sometimes Netdraw can’t find the file; retrieve it manually.

Page 37: KliqueFinder:  Identifying  Clusters in Network Data

Modifying Image in Netdraw


Page 38: KliqueFinder:  Identifying  Clusters in Network Data


Page 39: KliqueFinder:  Identifying  Clusters in Network Data

N   Group And Actor Id
24          |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
1 A        7|A213|......|........|...1..|
1 A       24|4A3.|......|.4......|......|
1 A        4|33A.|......|........|......|
1 A       15|433A|......|........|......|
------------+----+------+--------+------+
2 B       26|.2..|B443..|........|......|
2 B       21|.1..|4B....|...4....|....2.|
2 B       12|....|4.B...|........|......|
2 B        2|....|33.B..|........|...1..|
2 B        1|..3.|3..3B.|........|.3..2.|
2 B       14|....|....1B|........|......|
------------+----+------+--------+------+
3 C        9|....|......|C...3.33|.3....|
3 C        8|.4..|..4...|.C.4..4.|4.....|
3 C       11|....|......|33C.4.3.|..4...|
3 C       13|.4..|.4....|444C....|......|
3 C        3|3...|.4....|4.44C...|......|
3 C        5|.1..|.....4|3.2.3C..|......|
3 C        6|....|......|444..4C4|......|
3 C       20|....|......|3..3.44C|......|
------------+----+------+--------+------+
4 D       17|.1..|......|.1......|D.1...|
4 D       19|....|......|4.3.....|3D4...|
4 D       16|....|......|4..4...4|44D...|
4 D       10|..3.|...1..|........|...D3.|
4 D       23|....|.3....|........|.343D.|
4 D       27|.1..|.1....|........|.3..3D|

Density = 4/(4x8) = 1/8. KliqueFinder uses density = 4/(4x5) = .20 because the maximum number of nominations is 5.

Data used for multidimensional scaling within subgroups. Distance = maximum value / cell entry. E.g., if the maximum value is 4, a tie of 2 gives a distance of 4/2 = 2.

DIRECT ASSOCIATIONS
GROUP      1     2     3     4
LABEL      A     B     C     D
N          4     6     8     6
GROUP
  1     2.42  0.00  0.20  0.05
  2     0.25  1.07  0.13  0.27
  3     0.38  0.40  2.40  0.28
  4     0.21  0.17  0.67  1.17

In xxxxxx.clusters

Distance in multidimensional scaling between subgroups = maximum value / density
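The density and distance arithmetic on this slide can be written out as a short Python sketch. The function names are hypothetical, and capping the number of columns at the maximum number of nominations is an assumption read off the slide's 4/(4x5) example, not a documented rule.

```python
def block_density(block_sum, n_rows, n_cols, max_nominations=None):
    """Average tie value in a block of the blocked matrix.

    ASSUMPTION: when max_nominations is given, the column count is capped at
    that value, which reproduces the slide's 4/(4x5) = .20 example.
    """
    cols = n_cols if max_nominations is None else min(n_cols, max_nominations)
    return block_sum / (n_rows * cols)


def within_distance(tie_value, max_value):
    """Within-subgroup distance: maximum value in the network / cell entry."""
    return max_value / tie_value


def between_distance(density, max_value):
    """Between-subgroup distance: maximum value / block density."""
    return max_value / density


# Block A -> C: a single tie of strength 4 from 4 actors in A to 8 actors in C.
raw_density = block_density(4, 4, 8)                          # 4/(4x8)
capped_density = block_density(4, 4, 8, max_nominations=5)    # 4/(4x5)
example_distance = within_distance(2, 4)                      # tie of 2, maximum of 4
group_distance = between_distance(capped_density, 4)
```

Stronger ties and denser blocks therefore translate into shorter distances in the multidimensional scaling.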

Page 41: KliqueFinder:  Identifying  Clusters in Network Data

Choosing lines: Groups


Page 42: KliqueFinder:  Identifying  Clusters in Network Data


Confidentiality/Ethical issues in Collecting Network Data

• Need names on survey

• Data can be confidential but not anonymous (especially for longitudinal)

• R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27 / 2 (2005): 89 – 93. Read it online. http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf

– (All issues of social networks available via science direct)

• Who benefits from network analysis? Who bears the cost?

– Kadushin, Charles “Who benefits from network analysis: ethics of social network research” Social Networks 27 / 2 (2005): Pages 139-153.

• Issues to raise when dealing with Human Subjects Board:

– Klovdahl, Alden S. Social network research and human subjects protection: Towards more effective infectious disease control Pages 119-137

• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!

• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm

Video: 1:23:41-1:28

Page 43: KliqueFinder:  Identifying  Clusters in Network Data


The SRI/KliqueFinder Solution to confidentiality: aggregate to subgroups

1) Provide information about who is in which cluster as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.

Benefit: reveals location of resources relative to social structure.
Protection: does not reveal specific responses because all information is at the cluster level.

2) Provide locations in a sociogram unique for each respondent, indicating where that person is located (“you are here”). But the figure does not include the lines from a sociogram, so respondents cannot infer others’ responses.

Benefit: respondents then use this as a guide to individual behavior for identifying further resources or information.
Protection: specific responses of others are not revealed, so confidentiality is preserved.

Page 44: KliqueFinder:  Identifying  Clusters in Network Data


Can even include names of actors

Using subgroups for feedback to respondents and in a proposal

Page 45: KliqueFinder:  Identifying  Clusters in Network Data

Choosing Lines: Actor Level Within


Page 46: KliqueFinder:  Identifying  Clusters in Network Data

Choosing Lines: Actor Level

Remove group nodes


Page 47: KliqueFinder:  Identifying  Clusters in Network Data

Choosing Lines: Actor Level Between


Page 48: KliqueFinder:  Identifying  Clusters in Network Data

Choosing Lines: Group Level


Page 49: KliqueFinder:  Identifying  Clusters in Network Data

Modifying the Image: Adding Node Data or Relations

Video: 1:49:35-2:07:48

http://www.analytictech.com/ucinet/download.htm

http://www.analytictech.com/Netdraw/NetdrawGuide.doc

http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data

Page 50: KliqueFinder:  Identifying  Clusters in Network Data

Files for KliqueFinder

Input data:
– xxxxxx.list: network data
– xxxxxx.ilabel: node data
– xxxxxx.xnet: alternative network data

Parameters:
– Kliqfind.par
– Printo
– Simulate.par

KliqueFinder

Output:
– xxxxxx.clusters: diagnostics and matrix-formatted data
– xxxxxx.vna: for Netdraw
– xxxxxx.place: data containing actor IDs and subgroup placement

Page 51: KliqueFinder:  Identifying  Clusters in Network Data

Modifying node data by editing [datafile].vna: the file is read by Netdraw. Copy relevant data into Excel, edit, and replace.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE

Add new node variable in the *node data header (e.g. gender), then add data.

Page 52: KliqueFinder:  Identifying  Clusters in Network Data

Adding Node Attributes with Extra File

KliqueFinder will put attributes into the vna file.

File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (xxxxxx.ilabel goes with xxxxxx.list).

Cut and paste into stanne.ilabel (for stanne.list):

1 Jacob 1 3 5
2 Stan 1 2 5
3 Linton 1 2 5
4 Charles 1 3 3
5 Mark 1 3 3
6 Tom 2 3 3
7 Ronald 2 3 5
8 Nan 2 1 3
9 Elizabeth 2 1 4
10 Barry 2 2 3
11 Martin 2 3 1
12 Steve 2 3 1
13 PeterC 2 1 5
14 Patrick 1 1 1
15 Katy 1 1 3
16 Kathleen 3 3 3
17 Ove 2 2 2
18 JamesC 5 5 5
19 Robert 4 4 4
20 JamesM 1 2 3 4
21 Noah 4 3 2 1
22 Marijtje 1 2 1 2
23 Ronald 2 1 2 1
24 Harrison 3 1 3 1
25 Duncan 4 1 4 1

Format: 10 columns for ID; skip a space; name; node attributes 1-5

Page 53: KliqueFinder:  Identifying  Clusters in Network Data


Page 54: KliqueFinder:  Identifying  Clusters in Network Data


Page 55: KliqueFinder:  Identifying  Clusters in Network Data

Interactive: adding node data

or


Page 56: KliqueFinder:  Identifying  Clusters in Network Data


Page 57: KliqueFinder:  Identifying  Clusters in Network Data

Include Node Data in Image


Page 58: KliqueFinder:  Identifying  Clusters in Network Data

Modifying Links

Lines indicate friendships: solid within subgroups, dotted between subgroups.

numbers represent actors

Rgt, Cen, Soc, Non = political parties; B = Banker; T = Treasury; E = École Nationale d’Administration

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology, Volume 104, No 3, pages 642-686


Page 59: KliqueFinder:  Identifying  Clusters in Network Data

Hostile Actions


Page 60: KliqueFinder:  Identifying  Clusters in Network Data

Supportive Actions


Page 61: KliqueFinder:  Identifying  Clusters in Network Data

[Figure: crystallized sociogram with cohesive subgroups A through E. Within-subgroup scale expanded by a factor of 9. Node labels: ID/grade level. Solid lines within subgroups, dotted lines between subgroups.]

• Each number is a teacher
• G_ indicates grade in which teacher teaches
• Lines connecting two numbers indicate teachers who are close colleagues (solid lines within subgroups, dashed between)
• Circles indicate cohesive subgroups

Page 62: KliqueFinder:  Identifying  Clusters in Network Data


Ripple Plot

• Overlay talk about technology on social geography of crystallized sociogram

• Lines indicate talk about technology

• Size of dot indicates teacher’s use of technology at time 1

• Ripples indicate increase in use from time 1 to time 2


Page 63: KliqueFinder:  Identifying  Clusters in Network Data

Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage publications.


Page 64: KliqueFinder:  Identifying  Clusters in Network Data

Modifying links by editing [datafile].vna: the file is read by Netdraw. Copy relevant data into Excel, edit, and replace.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE

Add new node variable in the *node data header (e.g. gender), then add data.

Add new relation in the *Tie data header (e.g. technology), then add data.

Page 65: KliqueFinder:  Identifying  Clusters in Network Data

Modifying Links with Extra File

KliqueFinder will put attributes into the vna file.

File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (xxxxxx.xnet goes with xxxxxx.list).

stanne.xnet (file containing extra network, for stanne.list). Columns: nominator, nominee, strength of tie.

1 2 4
19 15 3
22 26 1

Page 66: KliqueFinder:  Identifying  Clusters in Network Data


Page 67: KliqueFinder:  Identifying  Clusters in Network Data

Modifying Links: Interactive – Finicky


Page 68: KliqueFinder:  Identifying  Clusters in Network Data

Interactive Modifying Links


Page 69: KliqueFinder:  Identifying  Clusters in Network Data

Two mode

*Field, S., *Frank, K.A., Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28: 97-123. (* co-first authors)

Data source

Video: 1:39:25-1:49:35

Page 70: KliqueFinder:  Identifying  Clusters in Network Data

Copy homact.list from c:\kliqfind\setups to c:\kliqfind

Page 71: KliqueFinder:  Identifying  Clusters in Network Data

Two-mode Data

Each row is read as: actor 1 participates in event 19 at a level of 1. The extent of the relation can be binary or weighted.

Edgelist

The first two rows do not appear in the data; I put them there to show the format: 10 spaces for each entry.

The new version of KliqueFinder is more flexible about the 10-column widths.

IDs should be 6 digits or fewer.

See also: Prepping data in Excel; Prepping Data in UCINET; Converting data using SAS.

Page 72: KliqueFinder:  Identifying  Clusters in Network Data

Two mode Clusters output


Page 73: KliqueFinder:  Identifying  Clusters in Network Data

Blocked Two-Mode Network Data

Page 74: KliqueFinder:  Identifying  Clusters in Network Data

Two-mode Crystallized Sociogram


Page 75: KliqueFinder:  Identifying  Clusters in Network Data

Centralization & Centrality in KliqueFinder

• KliqueFinder produces a measure of Warp.

• Starts with distances defined by
– Maximum value in network / observed value
• E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.
– These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)

• Obtains eigenvalues
– Within each cluster based on raw data within cluster
– Between clusters based on 1/density of ties between clusters
• Density = average value in a given block

• Warp = sum of positive eigenvalues / sum of all eigenvalues
– Note it does not use the square root of the eigenvalues (variances are more additive)

• Output into xxxxxx.bcord (9th element) and into Netdraw as node attribute for groups, called “centrality”

• Centrality for individuals is distance to the center of their subgroup (radius).

Page 76: KliqueFinder:  Identifying  Clusters in Network Data

Running on a Large Data File (more than 1000 actors)


If you start the program and it just sits there, it is looking for the best seed for the first subgroup. Seed is 3 actors, but it looks for all combinations of 3 that share common ties in network. Intensive, and unnecessary for large data (1st subgroup does not matter so much). To shortcut: change value from 12. save & run.

Page 77: KliqueFinder:  Identifying  Clusters in Network Data

Software Challenge

Video: 2:07:57-2:08:15

• Analyze nonpr1.list
– Evidence of clusters?
– Performance of algorithm?

• Replace lines with nonpr2

• Describe the KliqueFinder algorithm

Page 78: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder Applications: Adding Individual Attributes in SAS

• Run KliqueFinder
• Data file: collt1.list
• Make graph
• Use ID from other file? Yes:
– SAS file name: c:\kliqfind\indiv [be sure to include full path]
– ID variable: nominator
– String variable: gradelev
– Save
• In SAS, run socgramz in the working directory

Page 79: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder Applications: Adding Individual Attributes

• Select “Yes” for “User ID (character) from other SAS file?”


Page 80: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder Applications: Adding Individual Attributes

• Type the following information in the corresponding boxes

• Then click “Save”

Page 81: KliqueFinder:  Identifying  Clusters in Network Data

Choosing an ID Variable


Page 82: KliqueFinder:  Identifying  Clusters in Network Data


With ID based on Grade


Page 83: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder Applications: Replacing Lines

run KliqueFinder

data file collt1.list

make graph

save

retrieve socgramz.sas in the working directory

replace all occurrences of collt1.list with collt2.list

run


Page 84: KliqueFinder:  Identifying  Clusters in Network Data

Opening socgramz.sas


Page 85: KliqueFinder:  Identifying  Clusters in Network Data

Changing lines


Page 86: KliqueFinder:  Identifying  Clusters in Network Data

Change lines to different source


Page 87: KliqueFinder:  Identifying  Clusters in Network Data

New Lines based on Collt2


Page 88: KliqueFinder:  Identifying  Clusters in Network Data

Batch KliqueFinder


Page 89: KliqueFinder:  Identifying  Clusters in Network Data

Basics

• Program runs KliqueFinder on multiple files
• Input
– List of filenames
– Files containing data
– BACK UP YOUR DATA FIRST!
• Output
– Clustering output (.place, .clusters, .vna) for each list file

Page 90: KliqueFinder:  Identifying  Clusters in Network Data

File containing names of data files: testb.txt

Data file: stanne.list

Data file: ffe.list

Files


BACK UP YOUR DATA FIRST!

Page 91: KliqueFinder:  Identifying  Clusters in Network Data

KliqueFinder

• Browse to directory you want to work in

• Choose “Basic setup” and then click “Run setup file” button.

91

Page 92: KliqueFinder:  Identifying  Clusters in Network Data

Running Batch Mode

92

File with names of data files

Click here to run as batch

BACK UP DATA FILES BEFORE RUNNING!

Page 93: KliqueFinder:  Identifying  Clusters in Network Data

Name your file xxxxxx.list, e.g., test01.list

Right click

Choose Formatted text (space delimited)

93

Prepping data in Excel

Video: ID: [email protected] PW: kenfrank2014. Time: (1:28-1:39)

Page 94: KliqueFinder:  Identifying  Clusters in Network Data

Prepping Data in UCINET

Navigate to where you want to save: c:\kliqfind

Navigate to UCINET data

94

Page 95: KliqueFinder:  Identifying  Clusters in Network Data

You must remove every “!” from the file. There may be several.

The “!” marks are there because the file contains multiple data sets.
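Removing the “!” marks can be scripted. The following is a minimal Python sketch under two assumptions that are not confirmed by the slides: that deleting the “!” characters is sufficient, and that any line left empty afterwards can be dropped. The function name `remove_bangs` is hypothetical.

```python
def remove_bangs(text):
    """Delete every '!' and drop lines that are empty once the '!' is gone."""
    cleaned = []
    for line in text.splitlines():
        line = line.replace("!", "")
        if line.strip():  # assumption: blank leftovers are not needed
            cleaned.append(line)
    return "\n".join(cleaned) + ("\n" if cleaned else "")


print(remove_bangs("1 2 1\n!\n2 3 1\n"))
```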

95

Page 96: KliqueFinder:  Identifying  Clusters in Network Data

Converting data using SAS

Video: ID: [email protected] PW: kenfrank2014. Time: (2:10:43-2:19)

data one;
  infile "badform.list";
  input chooser chosen wt;

data two;
  set one;
  file "ready1.list";
  if wt ne . then put (chooser chosen wt) (10.);
run;
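For readers without SAS, the same conversion can be approximated in Python. This is a rough analogue rather than an exact port: it treats each input line independently (SAS list input can read on to the next line when a value is missing), skips any line whose first three fields are not all numeric, and writes each value right-aligned in a 10-character field with no decimals, as the `10.` format does. The function name `to_fixed_width` is hypothetical.

```python
def to_fixed_width(lines, width=10):
    """Return chooser/chosen/wt rows as fixed-width text, skipping incomplete rows."""
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # weight missing: skip the row, like `if wt ne .`
        try:
            values = [float(f) for f in fields[:3]]
        except ValueError:
            continue  # non-numeric field (e.g. "."): treat the row as unusable
        out.append("".join(f"{v:>{width}.0f}" for v in values))
    return out


for row in to_fixed_width(["1 2 1", "3 4", "5\t6 2.0"]):
    print(row)
```

Writing the returned rows to a file such as ready1.list, one per line, gives the space-delimited layout that the earlier slides describe.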

96

Page 97: KliqueFinder:  Identifying  Clusters in Network Data

A Priori Clusters

A line with 99999 in the data file indicates the a priori cluster in which an actor is placed. For example, actor 1 is in a priori cluster 3.

Run the repeat2 setup, and then proceed as usual.

Remember to do the “new data” setup when done. KliqueFinder will make pictures based on the a priori clusters.

97

Page 98: KliqueFinder:  Identifying  Clusters in Network Data

Comparison of A Priori Clusters and Identified Solution

Data with a priori cluster assignments

Run as new data

Run as usual then look at cluster output

SIMILARITY BETWEEN THE START AND END GROUPS:

ACTUAL    POSS    STANDARDIZED
52.       88.     9.55565

STANDARDIZED is the QAP standardized measure; compare it with the normal distribution.
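Comparing the standardized value with the normal distribution amounts to looking up a tail probability. The following is a minimal Python sketch of that lookup, assuming a one-sided test against the standard normal; the function name `upper_tail_p` is hypothetical, and the normal curve is only an approximation to the permutation distribution behind QAP.

```python
import math


def upper_tail_p(z):
    """One-sided probability that a standard normal value exceeds z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Standardized similarity reported in the cluster output above.
print(upper_tail_p(9.55565))
```

A value this far into the tail indicates that the identified clusters agree with the a priori clusters far more than chance alone would produce.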

98

Page 99: KliqueFinder:  Identifying  Clusters in Network Data

Data Containing Cluster Assignments

File called stanne.place [datafile.place]

Internal ID   User ID   Cluster   (ignore: for simulation only)
  1.0     1.0    2.0    1.0    3.0
  2.0     2.0    2.0    1.0    3.0
  3.0     4.0    1.0    1.0    3.0
  4.0    19.0    4.0    1.0    3.0
  5.0    23.0    4.0    1.0    3.0
  6.0    26.0    2.0    1.0    3.0
 17.0     6.0    3.0    1.0    3.0
 18.0     8.0    3.0    1.0    3.0
 19.0    20.0    3.0    1.0    3.0
 20.0    15.0    1.0    1.0    3.0
 21.0    12.0    2.0    1.0    3.0
 22.0    17.0    4.0    1.0    3.0
 23.0    16.0    4.0    1.0    3.0
 24.0    27.0    4.0    1.0    3.0
-27.0    28.0    4.0    1.0    3.0

If first number (internal ID) is negative, this indicates a tagalong – an actor connected to only one other. In this case, the last line should be read as the tagee, tagger, and group. So, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.

There may be slightly different numeric formats depending on the version of KliqueFinder.
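The reading rule for the .place file, including the tagalong case, can be sketched in Python. This is an illustration under the layout described above, not KliqueFinder's own code. The function name `parse_place_line` and the dictionary keys are hypothetical, and the tagalong numbers are assumed to be user IDs, which is how the example above appears to use them.

```python
def parse_place_line(line):
    """Parse one .place row into a dict of user ID, cluster, and tagalong info."""
    first, second, cluster = (float(x) for x in line.split()[:3])
    if first < 0:
        # Tagalong row: read as tagee, tagger, group.
        return {
            "internal_id": None,
            "user_id": int(second),        # the tagger (actor being placed)
            "cluster": int(cluster),
            "tagalong_of": int(-first),    # the tagee whose cluster is inherited
        }
    return {
        "internal_id": int(first),
        "user_id": int(second),
        "cluster": int(cluster),
        "tagalong_of": None,
    }


print(parse_place_line("-27.0 28.0 4.0 1.0 3.0"))
```

The two trailing columns are dropped because they are used for simulation only.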

99

Page 100: KliqueFinder:  Identifying  Clusters in Network Data

Including Cluster Membership in Influence Model

100

SPSS

DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
       1.0       1.0       1.0       1.0       3.0
       2.0       2.0       1.0       1.0       3.0
       3.0       3.0       1.0       1.0       3.0
       4.0       4.0       2.0       1.0       3.0
       5.0       5.0       2.0       1.0       3.0
       6.0       6.0       2.0       1.0       3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.

MATCH FILES /FILE=yvar1
  /FILE='indeg'
  /FILE=clusters
  /BY nominee.
EXECUTE.

SAS

data clusters; *groups from KliqueFinder;
  input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;

proc sort data=clusters;
  by nominator;
run;

data withinfl;
  merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
  by nominator;
  drop nominee _type_ _freq_;
run;

Advanced:

• Run the influence model for technology

• Identify clusters from talkt2

• Include cluster membership in the influence model

Page 101: KliqueFinder:  Identifying  Clusters in Network Data

Adding Patches

101

Patch for Two-mode

Patch for one-mode

Page 102: KliqueFinder:  Identifying  Clusters in Network Data

Alternative community detection algorithms

• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf

• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf

• http://fatweasel.net/analytics/network-analysis/community-detection-in-networks/

102