KliqueFinder: Identifying Clusters in Network Data


Kenneth A. Frank

Michigan State University

Based on:

• Frank, K.A. 1995. Identifying cohesive subgroups. Social Networks 17: 27-56.

• Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.

• Field, S.*, Frank, K.A.*, Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. "Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events." Social Networks 28: 97-123. (* co-first authors.)

• https://www.msu.edu/user/k/e/kenfrank/web/research.htm#representation1

https://www.msu.edu/~kenfrank/papers/identifying%20cohesive%20subgroups.pdf

https://www.msu.edu/~kenfrank/papers/mapping%20interactions.pdf



Overview

• Clustering and Graphical Representations of Networks

• Running KliqueFinder...

– Step 1) Criteria for Determining Group Membership

– Step 2: Maximizing Criterion

– Step 3) Examine evidence of clusters

– Step 4) Evaluating the Performance of the Algorithm : Did...

• Make Sociogram in Netdraw

• Confidentiality/Ethical issues in Collecting Network Data

• Modifying the Image: Adding Node Data or Relations...

• Two mode

• Software Challenge...

• Batch KliqueFinder

• Prepping and Converting Data

• A Priori Clusters

Clustering and Graphical Representations of Networks

Video: 26:09-31:41

Goal: to identify patterns in the network

• Rearrange rows and columns of social network matrix to reveal clustering

• Plot actors and ties in two dimensions to reveal clustering


https://ess.cvm.msu.edu/ess/echo/presentation/7ff34abf-ce03-4ca8-9b96-20949e0f05a8


Theory for defining cluster membership

• Cohesion (clusters are called subgroups): an actor should be in a cluster if the actor has demonstrated a preference for engaging in ties with members of the cluster.

– Result: ties are concentrated within subgroups.

• Structural equivalence (clusters are called blocks): an actor should be in a cluster if the actor engages in a similar pattern of ties as members of that cluster.

– Result: blocks represent positions, but ties are not necessarily concentrated within blocks.

Crystallized Sociogram: Friendships Among the French Financial Elite

Lines indicate friendships: solid within subgroups, dotted between subgroups. Numbers represent actors.

Rgt, Cen, Soc, Non = political parties; B = banker; T = treasury; E = École Nationale d'Administration.

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology 104(3): 642-686.

Crystallized Sociogram: Clusters in Foodwebs

Krause, A., Frank, K.A., Mason, D.M., Ulanowicz, R.E. and Taylor, W.M. (2003). "Compartments exposed in food-web structure." Nature 426:282-285


Data Input

The data file is an edgelist. For example, one row records that actor 1 interacts with actor 2 at a level of 3. The extent of the relation can be binary or weighted.

• Old format: 10 spaces for each entry. New format: flexible columns (the new version of KliqueFinder is more flexible about the 10-column widths). Both give the same results.

• The file name must be fewer than 20 characters. Best if the file name is six characters followed by .list (xxxxxx.list), for example stanne.list.

• IDs should be 6 digits or fewer.

• The first two rows shown on the slide do not appear in the data; they are there only to show the format (10 spaces for each entry).

• See also: Prepping data in Excel; Prepping data in UCINET; Converting data using SAS.
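The old fixed-width layout can be sketched in Python. This is only an illustration: the function name and the example ties are made up, and the 10-column width and 6-digit ID limit are taken from the notes above.

```python
def format_edgelist(ties, width=10, max_id_digits=6):
    """Format (nominator, nominee, weight) ties as fixed-width text lines."""
    lines = []
    for nominator, nominee, weight in ties:
        for actor_id in (nominator, nominee):
            # The slides ask for IDs of 6 digits or fewer.
            if len(str(actor_id)) > max_id_digits:
                raise ValueError(f"ID {actor_id} has more than {max_id_digits} digits")
        # Right-justify each entry in its own 10-character field.
        lines.append(f"{nominator:>{width}}{nominee:>{width}}{weight:>{width}}")
    return lines


ties = [(1, 2, 3), (1, 4, 3), (2, 26, 2)]  # made-up ties for illustration
for line in format_edgelist(ties):
    print(line)
```

Each printed line is 30 characters wide: nominator, nominee, and strength of tie.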

Steps for finding clusters

Video: 31:41-43:30

1) Determine criterion for defining clusters

2) Maximize criterion

3) Examine evidence of clusters

4) Evaluate performance of the algorithm

5) Interpret clusters

– commonality of attributes

– focal experiences

– subsequent behavior

Step 1) Criteria for Determining Group Membership

Structural equivalence:

– Factor analyze the sociomatrix (Katz & Kahn)

– Iteratively rearrange and revalue rows and columns (CONCOR; White et al., 1976)

Cohesion:

– Use fixed criteria (e.g., an actor must be connected to at least k others in the cluster, or must be within a minimal path length of k others, etc.)

– Use a flexible criterion: preference relative to group sizes and number of ties

Model Based Cohesion

W_ii’ = 1 if there is a tie between actors i and i’, 0 otherwise.

samegroup_ii’ = 1 if actors i and i’ are members of the same subgroup, 0 otherwise.

Then θ1 represents subgroup salience. So: maximize θ1 (odds ratio).

Odds Ratio for Association Between Common Subgroup Membership and the Occurrence of Ties Between Actors
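A minimal sketch of this odds ratio, assuming a binary, directed sociomatrix and a 2x2 table with cells A (different subgroup, no tie), B (different subgroup, tie), C (same subgroup, no tie), and D (same subgroup, tie). The cell labels, function name, and example network are illustrative assumptions; the relation θ1 = log odds/2 follows the column label on the sampling distribution slide.

```python
import math


def tie_by_subgroup_table(adj, groups):
    """Count ordered pairs of actors by (same subgroup?, tie present?)."""
    a = b = c = d = 0
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same = groups[i] == groups[j]
            tie = adj[i][j] > 0
            if not same and not tie:
                a += 1
            elif not same and tie:
                b += 1
            elif same and not tie:
                c += 1
            else:
                d += 1
    return a, b, c, d


# Made-up network: actors 0 and 1 form one subgroup, actors 2 and 3 another.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
groups = ["A", "A", "B", "B"]

a, b, c, d = tie_by_subgroup_table(adj, groups)
odds_ratio = (a * d) / (b * c)
theta1 = math.log(odds_ratio) / 2  # assumed from the "θ1 = log odds/2" label
print(a, b, c, d, odds_ratio, theta1)
```

The sketch fails with a division by zero when cell B or C is empty; how KliqueFinder handles empty cells is not stated in these slides.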

Step 2: Maximizing Criterion

• 1) find a subgroup seed (3 actors who interact with each other, and with similar others)

• 2) add to the cluster to maximize θ1 until you cannot do any more

• 3) start new subgroup with new seed

• 4) shuffle between existing subgroups

• 5) make new subgroups as necessary, dissolve existing ones as necessary.

KliqueFinder Algorithm: Phase I

Flowchart steps:

• Initialize: assign each actor to own subgroup

• Find subgroup seed of 2 or 3. For finding the best subgroup seed: 1) can only choose from unaffiliated actors; 2) each actor can only be a seed once

• Identify single move that most increases objective function θ1

• Does move increase function?

– Yes: reassign actor that makes best move. If assignment moves actor out of a group of 3, reassign remaining 2 to next best groups

– No

• Computationally intensive; modify for large networks

KliqueFinder Algorithm: Phases II and III

• Phase II: If best move does not increase objective function and there are fewer than 3 actors available for subgroups, then

– Attach all isolated (or singleton) actors to best existing subgroups, even if this reduces objective function

• Phase III: shuffle actors between existing subgroups without seeding new ones or disbanding existing ones

– Number of subgroups is fixed

– This is simple hill climbing and can be cast as EM algorithm
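The Phase III idea (hill climbing with a fixed number of subgroups) can be sketched as follows. This is a simplified illustration, not KliqueFinder's implementation: the objective is a log odds ratio with 0.5 added to each cell (an assumption, to avoid empty cells), and the function names and toy network are hypothetical.

```python
import math


def log_odds_ratio(adj, groups):
    """Log odds ratio of a tie given common subgroup membership."""
    a = b = c = d = 0.5  # 0.5 per cell is an assumption to avoid zero cells
    n = len(adj)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            same = groups[i] == groups[j]
            tie = adj[i][j] > 0
            if same and tie:
                d += 1
            elif same:
                c += 1
            elif tie:
                b += 1
            else:
                a += 1
    return math.log((a * d) / (b * c))


def shuffle_between_subgroups(adj, groups):
    """Repeat the single reassignment that most increases the objective."""
    groups = list(groups)
    labels = sorted(set(groups))
    current = log_odds_ratio(adj, groups)
    while True:
        best_value, best_move = current, None
        for actor in range(len(groups)):
            home = groups[actor]
            if groups.count(home) == 1:
                continue  # never disband a subgroup: their number is fixed
            for label in labels:
                if label == home:
                    continue
                groups[actor] = label
                value = log_odds_ratio(adj, groups)
                if value > best_value + 1e-12:
                    best_value, best_move = value, (actor, label)
            groups[actor] = home
        if best_move is None:
            return groups, current  # no move increases the objective
        groups[best_move[0]] = best_move[1]
        current = best_value


# Two triangles (actors 0-2 and 3-5); actor 2 starts in the wrong subgroup.
adj = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
start = [0, 0, 1, 1, 1, 1]
final, value = shuffle_between_subgroups(adj, start)
print(final, value)
```

Because each accepted move strictly increases the objective, the loop stops at a local maximum, which is why the slides describe the procedure as simple hill climbing.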

Running KliqueFinder

Video: 43:30-1:01:00

• Click on “Browse…” button to specify the directory where the data file is located.

• Download KliqueFinder at

– http://hlmsoft.net/wkf/

– Follow instructions to install. Put in c:\kliqfind

– Mac users: VMware Fusion, Windows 7, 32 bit: http://store.vmware.com/store/vmware/pd/productID.165310200/Currency.USD/

KliqueFinder

• Choose “Basic setup” and then click “Run setup file” button.


• Click on the “Browse” button to choose a data file.

Screenshot labels: Run Analysis; Data file

New Version of Data Input More Flexible

Old format: 10 spaces for each entry. New format: flexible columns. Both give the same results.

The file name must be fewer than 20 characters. IDs should be 6 digits or fewer.

See also: Prepping data in Excel; Prepping data in UCINET; Converting data using SAS.

View Clusters Output


Blocked Network Data

N = 24. Group and actor ID (column IDs are read vertically, tens digit above units digit):

            |AAAA|BBBBBB|CCCCCCCC|DDDDDD|
            |    |      |        |      |
            | 2 1|221  1|  11   2|111122|
Group     ID|7445|612214|98133560|796037|
------------+----+------+--------+------+
 1 A       7|A213|......|........|...1..|
 1 A      24|4A3.|......|.4......|......|
 1 A       4|33A.|......|........|......|
 1 A      15|433A|......|........|......|
------------+----+------+--------+------+
 2 B      26|.2..|B443..|........|......|
 2 B      21|.1..|4B....|...4....|....2.|
 2 B      12|....|4.B...|........|......|
 2 B       2|....|33.B..|........|...1..|
 2 B       1|..3.|3..3B.|........|.3..2.|
 2 B      14|....|....1B|........|......|
------------+----+------+--------+------+
 3 C       9|....|......|C...3.33|.3....|
 3 C       8|.4..|..4...|.C.4..4.|4.....|
 3 C      11|....|......|33C.4.3.|..4...|
 3 C      13|.4..|.4....|444C....|......|
 3 C       3|3...|.4....|4.44C...|......|
 3 C       5|.1..|.....4|3.2.3C..|......|
 3 C       6|....|......|444..4C4|......|
 3 C      20|....|......|3..3.44C|......|
------------+----+------+--------+------+
 4 D      17|.1..|......|.1......|D.1...|
 4 D      19|....|......|4.3.....|3D4...|
 4 D      16|....|......|4..4...4|44D...|
 4 D      10|..3.|...1..|........|...D3.|
 4 D      23|....|.3....|........|.343D.|
 4 D      27|.1..|.1....|........|.3..3D|

θ1 = 1.1738

Step 3) Examine evidence of clusters

1) randomly redistribute ties

2) apply algorithm

3) record value of odds ratio and θ1

4) repeat 1000 times to generate distribution

5) use mean of distribution as baseline for comparison
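The five steps above can be sketched as a small Monte Carlo loop. This is an illustration under stated assumptions: ties are redistributed by shuffling each actor's off-diagonal row (so each actor keeps the same number and strength of nominations), and `statistic` is a placeholder for "apply the algorithm and return θ1". The stand-in statistic used here is hypothetical and only keeps the example short.

```python
import random


def redistribute_ties(adj, rng):
    """Randomly redistribute each actor's ties among the other actors."""
    n = len(adj)
    shuffled = []
    for i, row in enumerate(adj):
        off_diagonal = [row[j] for j in range(n) if j != i]
        rng.shuffle(off_diagonal)
        off_diagonal.insert(i, 0)  # keep the diagonal empty
        shuffled.append(off_diagonal)
    return shuffled


def baseline_distribution(adj, statistic, reps=1000, seed=0):
    """Apply `statistic` to `reps` randomly redistributed networks."""
    rng = random.Random(seed)
    return [statistic(redistribute_ties(adj, rng)) for _ in range(reps)]


def reciprocated_ties(adj):
    """Stand-in statistic (hypothetical); the real step would return theta1."""
    n = len(adj)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if adj[i][j] > 0 and adj[j][i] > 0)


adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
values = baseline_distribution(adj, reciprocated_ties, reps=1000)
baseline = sum(values) / len(values)  # mean of the distribution = baseline
print(baseline)
```

The observed value from the real data is then compared with this baseline.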


Randomly Redistributing Ties

Apply Algorithm to Random Data

θ1 = .81822

Monte Carlo Sampling Distribution

Video: 1:06:35-1:18:50

• Set up sampling. Remember to do “new data” setup when done, to prepare for the next analysis.

• Indicate that data should be simulated. Data can include weights.

• Output is in sampdist.dat. Column labels on the slide: θ1 = log odds/2; odds ratio.




Code for Reading in Sample Distribution Data

SPSS

GET DATA
  /TYPE=TXT
  /FILE="C:\KLIQFIND\sampdist.dat"
  /FIXCASE=1
  /ARRANGEMENT=FIXED
  /FIRSTCASE=1
  /IMPORTCASE=ALL
  /VARIABLES=
  /1 theta1 0-29 F30.10
     oddsratio 30-59 F30.10
     samplesize 60-89 F30.10.
CACHE.
EXECUTE.
DATASET NAME DataSet9 WINDOW=FRONT.

DATASET ACTIVATE DataSet9.
GRAPH /HISTOGRAM=theta1.

SAS

title "Sampling distribution for theta1";
data one;
  infile "sampdist.dat" missover;
  input theta1 odds1;

proc univariate plot;
  var theta1;

Stata

* This command imports the data file:
import delimited C:\KLIQFIND\sampdist.dat, delimiter(" ", asstring)

* These commands perform data management:
drop v1
rename v2 theta1
rename v3 oddsratio
rename v4 samplesize

* This command plots a histogram for theta1:
hist theta1, freq
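For readers without SPSS, SAS, or Stata, the same file can be read with a short Python sketch. It assumes the layout implied by the SPSS code above (three fields of 30 characters each: theta1, odds ratio, sample size). The function names are hypothetical and the sample lines are made-up values, not KliqueFinder output.

```python
from statistics import mean


def parse_sampdist(lines, width=30):
    """Parse fixed-width rows of (theta1, oddsratio, samplesize)."""
    rows = []
    for line in lines:
        if not line.strip():
            continue
        padded = line.rstrip("\n").ljust(3 * width)
        theta1, oddsratio, samplesize = (
            float(padded[k:k + width]) for k in range(0, 3 * width, width)
        )
        rows.append((theta1, oddsratio, samplesize))
    return rows


def compare_to_baseline(observed_theta1, rows):
    """Return baseline mean, observed minus baseline, and share >= observed."""
    thetas = [row[0] for row in rows]
    baseline = mean(thetas)
    share = sum(t >= observed_theta1 for t in thetas) / len(thetas)
    return baseline, observed_theta1 - baseline, share


# Made-up rows in the assumed fixed-width layout (illustration only).
made_up = [(0.75, 4.48, 24.0), (0.80, 4.95, 24.0), (0.70, 4.06, 24.0)]
lines = [f"{t:30.10f}{o:30.10f}{n:30.10f}" for t, o, n in made_up]

rows = parse_sampdist(lines)
print(compare_to_baseline(1.1738, rows))
```

To read the real file, pass the lines of C:\KLIQFIND\sampdist.dat instead of the made-up rows.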

Comparison of Sampling Distributions

Distribution of θ1 base from application of the algorithm to data simulated without regard for subgroup membership

Observed value: 1.1738

Sampling Distribution Parameters

Edit simulation parameters. The first element is the number of replications; the number of replications must stay in the first 5 columns.

Approximate p-value Based on Previous Simulations

PREDICTED THETA (1 base) BASED ON SIMULATIONS.

VALUE BASED ON UNWEIGHTED DATA.

0.76985

ESTIMATE OF THETA (1 subgroup processes)

0.40397 (total-predicted=evidence of groups): 1.1738-.76985=.40397

THE TOTAL THETA1 IS:

1.1738

APPROXIMATE TEST OF CONCENTRATION OF TIES

WITHIN SUBGROUPS BASED ON

SIZE OF THETA1 subgroup processes:

THETA1    |         |
SUBGROUP  | APPROX  | APPROX
PROCESSES | LRT     | P-VALUE
0.40        34.82     0.00

Reject the null hypothesis of no clusters, H0: θ1 (subgroup processes) = 0.

Step 4) Evaluating the Performance of the Algorithm: Did the Algorithm Recover the Correct Subgroups?

• Many algorithms search for optimal subgroups. KliqueFinder does not, but how different are the subgroups it finds from the optimal or known subgroups?

Output for Recovery of Subgroups

PREDICTED ACCURACY: LOG ODDS OF COMMON SUBGROUP MEMBERSHIP, + OR - .5734 (FOR A 95% CI)

1.4989

The log odds applies to the following table:

                      OBSERVED SUBGROUP
                     DIFFERENT    SAME
                    ___________________
                    |        |        |
KNOWN     DIFFERENT |   A    |   B    |
SUBGROUP            |--------|--------|
          SAME      |   C    |   D    |
                    -------------------

THE LOG ODDS TRANSLATES TO AN ODDS RATIO OF

4.4766

WHICH INDICATES THE INCREASE IN THE ODDS THAT KLIQUEFINDER WILL ASSIGN TWO ACTORS TO THE SAME SUBGROUP IF THEY ARE TRULY IN THE SAME SUBGROUP.

The specific accuracy for a given data set is not known; results are predicted from thousands of simulations (see next slide).

Odds of Recovery (Toy Example)

Simulated data with known subgroups {1, 2, 3} and {4, 5, 6}. The subgroups identified by KliqueFinder are {1, 2, 3, 4} and {5, 6}. The same sociomatrix underlies both (diagonal entries omitted):

    1 2 3 4 5 6
1   - 1 1 0 1 0
2   1 - 0 0 0 0
3   1 1 - 0 0 1
4   0 1 1 - 1 1
5   0 0 0 0 - 1
6   1 0 0 1 1 -

                      OBSERVED SUBGROUP
                     DIFFERENT    SAME
                    ___________________
                    |        |        |
KNOWN     DIFFERENT | A (6)  | B (3)  |
SUBGROUP            |--------|--------|
          SAME      | C (2)  | D (4)  |
                    -------------------

Cell A: 6 pairs correctly assigned to different subgroups (1,5; 2,5; 3,5; 1,6; 2,6; 3,6).

Cell D: 4 pairs correctly assigned to the same subgroup (1,2; 1,3; 2,3; 5,6).

Misassignment of actor 4 contributes 3 to cell B (pairs 1,4; 2,4; 3,4) and 2 to cell C (pairs 4,5; 4,6).

Odds of recovery = (AD)/(BC) = (6 x 4)/(3 x 2) = 4.00
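The toy example can be checked with a few lines of Python. The subgroup labels ("X", "Y", "P", "Q") and the function name are arbitrary choices for this sketch; the memberships are the ones implied by the pairs listed for cells A through D.

```python
from itertools import combinations


def recovery_table(known, observed):
    """Cross-classify every pair of actors by known and observed co-membership."""
    a = b = c = d = 0
    for i, j in combinations(sorted(known), 2):
        same_known = known[i] == known[j]
        same_observed = observed[i] == observed[j]
        if not same_known and not same_observed:
            a += 1  # correctly assigned to different subgroups
        elif not same_known and same_observed:
            b += 1  # wrongly placed together
        elif same_known and not same_observed:
            c += 1  # wrongly separated
        else:
            d += 1  # correctly assigned to the same subgroup
    return a, b, c, d


known = {1: "X", 2: "X", 3: "X", 4: "Y", 5: "Y", 6: "Y"}
observed = {1: "P", 2: "P", 3: "P", 4: "P", 5: "Q", 6: "Q"}

a, b, c, d = recovery_table(known, observed)
print(a, b, c, d, (a * d) / (b * c))  # prints: 6 3 2 4 4.0
```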

Make Sociogram in Netdraw

Video: 1:01:00-1:06:22

Sometimes Netdraw can’t find the file; retrieve it manually.

Modifying Image in Netdraw


Blocked network data: the same matrix as under “View Clusters Output” above (N = 24; subgroups A, B, C, D).

Density for ties from subgroup A to subgroup C = 4/(4x8) = 1/8: the entries in that block sum to 4, and the block has 4 rows and 8 columns. KliqueFinder uses density = 4/(4x5) = .20, because the maximum number of nominations is 5.

The data are used for multidimensional scaling within subgroups. Distance = maximum value / cell entry. For example, if the maximum value is 4, a tie of 2 has a distance of 4/2 = 2.

DIRECT ASSOCIATIONS (in xxxxxx.clusters)

GROUP       1     2     3     4
LABEL       A     B     C     D
N           4     6     8     6
GROUP
  1      2.42  0.00  0.20  0.05
  2      0.25  1.07  0.13  0.27
  3      0.38  0.40  2.40  0.28
  4      0.21  0.17  0.67  1.17

Distance in multidimensional scaling between subgroups = maximum value / density

Slide labels: cohesion; structural similarity

Video: 1:19:15-1:23:40

Frank, K. 1996. Mapping interactions within and between cohesive subgroups. Social Networks 18: 93-119.
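The density and distance arithmetic above can be reproduced with a short sketch. The function names are hypothetical. The block holds the ties from subgroup A (actors 7, 24, 4, 15) to subgroup C in the blocked matrix, where the only nonzero entry is a tie of 4.

```python
def block_density(block, max_nominations=None):
    """Sum of the block's entries divided by rows x columns,
    or by rows x max_nominations when a nomination limit is given."""
    total = sum(sum(row) for row in block)
    rows = len(block)
    per_row = len(block[0]) if max_nominations is None else max_nominations
    return total / (rows * per_row)


def tie_distance(max_value, cell_entry):
    """Distance used for scaling within subgroups: maximum value / cell entry."""
    return max_value / cell_entry


# Ties from subgroup A (4 actors) to subgroup C (8 actors).
block_a_to_c = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 4, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

print(block_density(block_a_to_c))                     # 4/(4x8)
print(block_density(block_a_to_c, max_nominations=5))  # 4/(4x5)
print(tie_distance(4, 2))                              # 4/2
```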






Choosing lines: Groups


Confidentiality/Ethical issues in Collecting Network Data

• Need names on survey

• Data can be confidential but not anonymous (especially for longitudinal)

• R.L. Breiger, “Ethical Dilemmas in Social Network Research: Introduction to Special Issue.” Social Networks 27 / 2 (2005): 89 – 93. Read it online. http://www.u.arizona.edu/~breiger/2005BreigerIntroEthics.pdf

– (All issues of social networks available via science direct)

• Who benefits from network analysis? Who bears the cost?

– Kadushin, Charles “Who benefits from network analysis: ethics of social network research” Social Networks 27 / 2 (2005): Pages 139-153.

• Issues to raise when dealing with Human Subjects Board:

– Klovdahl, Alden S. “Social network research and human subjects protection: Towards more effective infectious disease control.” Social Networks 27 / 2 (2005): 119-137.

• Hint on Human Subjects boards: they like precedents. Once you have one network study accepted, refer to it when submitting others!

• https://www.msu.edu/~kenfrank/social%20network/irb%20with%20network%20data.htm

Video: 1:23:41-1:28

The SRI/KLiqueFinder Solution to confidentiality: aggregate to subgroups

1) Provide information about who is in which cluster, as well as information regarding the resources embedded in each cluster. Resources could be information, expertise, material resources, etc.

– Benefit: reveals the location of resources relative to social structure.

– Protection: does not reveal specific responses, because all information is at the cluster level.

2) Provide each respondent with a unique figure showing locations from a sociogram and indicating where that person is located (“you are here”). The figure does not include the lines from the sociogram, so respondents cannot infer others’ responses.

– Benefit: respondents can use this as a guide to individual behavior for identifying further resources or information.

– Protection: specific responses of others are not revealed, so confidentiality is preserved.

Can even include names of actors

Using subgroups for feedback to respondents and in a proposal

https://www.msu.edu/~kenfrank/Network%20Mapping%20intervention.docx

Choosing Lines: Actor Level Within


Choosing Lines: Actor Level

Remove group nodes


Choosing Lines: Actor Level Between


Choosing Lines: Group Level


Modifying the Image: Adding Node Data or Relations

Video: 1:49:35-2:07:48

http://www.analytictech.com/ucinet/download.htm

http://www.analytictech.com/Netdraw/NetdrawGuide.doc

http://faculty.ucr.edu/~hanneman/nettext/C4_netdraw.html#data



Files for KliqueFinder

Input data:

– xxxxxx.list: network data

– xxxxxx.ilabel: node data

– xxxxxx.xnet: alternative network data

Parameters:

– Kliqfind.par

– Printo

– Simulate.par

Output (from KliqueFinder):

– xxxxxx.clusters: diagnostics and matrix-formatted data

– xxxxxx.vna: for Netdraw

– xxxxxx.place: data containing actor IDs and subgroup placement

Modifying node data by editing [datafile].vna: The file is read by Netdraw. Copy relevant data into Excel, edit, and replace.

*node data
id type group gender
"0A " 2 1 0
"0B " 2 2 0
"0C " 2 3 0
"0D " 2 4 0
1 1 2 1
2 1 2 2
*Node properties
ID x y color shape size shortlabel active
"0A " -2.01889 -15.04530 16777215 1 30 A TRUE
"0B " -9.41864 15.75047 16777215 1 85 B TRUE
"0C " 2.06574 2.09162 16777215 1 52 C TRUE
"0D " 8.54812 10.10988 16777215 1 79 D TRUE
1 -10.52314 14.16442 16711680 1 10 1 TRUE
2 -8.29999 13.27802 16711680 1 10 2 TRUE
*Tie data
from to any strength actor group between within technology
1 2 1 3 1 0 0
1 4 1 3 1 0 1
1 19 1 3 1 0 1
1 23 1 2 1 0 1
1 26 1 3 1 0 0
2 26 1 3 1 0 0
2 10 1 1 1 0 1
*Tie properties
FROM TO color size headcolor headsize active
"0A " "0B " 12632256 1 12632256 0 TRUE
"0A " "0C " 12632256 9 12632256 0 TRUE
1 2 0 3 0 8 TRUE
1 4 12632256 3 0 8 TRUE

Add a new node variable (e.g., gender) to the header line under *node data, then add the data.
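Instead of editing in Excel, a node variable could be appended with a short script. The sketch below is an assumption-laden illustration, not part of KliqueFinder or Netdraw: the function name, the new variable ("grade"), and its values are hypothetical, and only the *node data section is touched.

```python
import shlex


def add_node_attribute(vna_text, name, values, default="0"):
    """Append a column to the *node data section of a .vna file's text."""
    out = []
    in_node_data = False
    header_seen = False
    for line in vna_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("*"):
            # A new section starts; only "*node data" is modified.
            in_node_data = stripped.lower() == "*node data"
            header_seen = False
            out.append(line)
        elif in_node_data and stripped:
            if not header_seen:
                out.append(f"{stripped} {name}")  # extend the header line
                header_seen = True
            else:
                node_id = shlex.split(stripped)[0].strip()  # handles quoted IDs
                out.append(f"{stripped} {values.get(node_id, default)}")
        else:
            out.append(line)
    return "\n".join(out)


vna = """*node data
id type group gender
"0A " 2 1 0
1 1 2 1
2 1 2 2
*Node properties
ID x y
1 -10.52314 14.16442"""

print(add_node_attribute(vna, "grade", {"1": "3", "2": "5"}))
```

Nodes without a supplied value (here the group node "0A ") receive the default.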

Adding Node Attributes with Extra File

KliqueFinder will put attributes into the vna file.

File = xxxxxx.ilabel, where xxxxxx is the first 6 characters of your data file (xxxxxx.list). For example, for stanne.list, cut and paste into stanne.ilabel:

1 Jacob 1 3 5
2 Stan 1 2 5
3 Linton 1 2 5
4 Charles 1 3 3
5 Mark 1 3 3
6 Tom 2 3 3
7 Ronald 2 3 5
8 Nan 2 1 3
9 Elizabeth 2 1 4
10 Barry 2 2 3
11 Martin 2 3 1
12 Steve 2 3 1
13 PeterC 2 1 5
14 Patrick 1 1 1
15 Katy 1 1 3
16 Kathleen 3 3 3
17 Ove 2 2 2
18 JamesC 5 5 5
19 Robert 4 4 4
20 JamesM 1 2 3 4
21 Noah 4 3 2 1
22 Marijtje 1 2 1 2
23 Ronald 2 1 2 1
24 Harrison 3 1 3 1
25 Duncan 4 1 4 1

Format: 10 columns for ID; skip a space; name; node attributes 1-5.

Interactive: adding node data


Include Node Data in Image


Modifying Links

Lines indicate friendships: solid within subgroups, dotted between subgroups. Numbers represent actors.

Rgt, Cen, Soc, Non = political parties; B = banker; T = treasury; E = École Nationale d'Administration.

Frank, K.A. & Yasumoto, J. (1998). "Linking Action to Social Structure within a System: Social Capital Within and Between Subgroups." American Journal of Sociology 104(3): 642-686.

Hostile Actions


Supportive Actions


[Figure: crystallized sociogram of teachers, with subgroups labeled A-E. Within-subgroup scale expanded by a factor of 9. Labels show ID/grade level; solid lines within subgroups, dotted lines between subgroups.]

• Each number is a teacher

• G_ indicates grade in which teacher teaches

• Lines connecting two numbers indicate teachers who are close colleagues: solid lines within subgroups, dashed between

• Circles indicate cohesive subgroups

Ripple Plot

• Overlay talk about technology on social geography of crystallized sociogram

• Lines indicate talk about technology

• Size of dot indicates teacher’s use of technology at time 1

• Ripples indicate increase in use from time 1 to time 2


Frank, K. A. and Zhao, Y. (2005). "Subgroups as a Meso-Level Entity in the Social Organization of Schools." Chapter 10, pages 279-318. Book honoring Charles Bidwell's retirement, edited by Larry Hedges and Barbara Schneider. New York: Sage publications.


Modifying links by editing [datafile].vna: The file is read by Netdraw. Copy relevant data into Excel, edit, and replace.

The example .vna file is the same as the one shown under “Modifying node data” above, with two annotations:

• Add a new node variable (e.g., gender) to the header line under *node data, then add the data.

• Add a new relation (e.g., technology) to the header line under *Tie data, then add the data.

Modifying Links with Extra File

KliqueFinder will put attributes into the vna file.

File = xxxxxx.xnet, where xxxxxx is the first 6 characters of your data file (xxxxxx.list). For example, stanne.xnet accompanies stanne.list.

File containing the extra network (columns: nominator, nominee, strength of tie):

1 2 4
19 15 3
22 26 1

Modifying Links: Interactive – Finicky


Interactive Modifying Links


Two mode

Video: 1:39:25-1:49:35

Field, S.*, Frank, K.A.*, Schiller, K., Riegle-Crumb, C., and Muller, C. 2006. “Identifying Social Contexts in Affiliation Networks: Preserving the Duality of People and Events.” Social Networks 28: 97-123. (* co-first authors.)

Data source: copy homact.list from c:\kliqfind\setups to c:\kliqfind

Two-mode Data

Edgelist: for example, actor 1 participates in event 19 at a level of 1. The extent of the relation can be binary or weighted.

The first two rows shown on the slide do not appear in the data; they are there only to show the format (10 spaces for each entry). The new version of KliqueFinder is more flexible about the 10-column widths.

IDs should be 6 digits or fewer. See also: Prepping data in Excel; Prepping data in UCINET; Converting data using SAS.

Two mode Clusters output

Blocked Two-Mode Network Data

Two-mode Crystallized Sociogram

Centralization & Centrality in KliqueFinder

• KliqueFinder produces a measure of Warp.

• Starts with distances defined by

– Maximum value in network / observed value

• E.g., if the maximum is 4 and a particular tie is 1, then the distance is 4/1 = 4.

– These are the distances used in the MDS to produce the sociograms (see “running KliqueFinder ppt”)

• Obtains eigenvalues

– Within each cluster, based on raw data within the cluster

– Between clusters, based on 1/density of ties between clusters

• Density = average value in a given block

• Warp = sum of positive eigenvalues / sum of all eigenvalues

– Note it does not use the square root of the eigenvalues (variances are more additive)

• Output into xxxxxx.bcord (9th element) and into Netdraw as a node attribute for groups, called “centrality”

• Centrality for individuals is distance to the center of their subgroup (radius).

Running on a Large Data File (more than 1000 actors)

If you start the program and it just sits there, it is looking for the best seed for the first subgroup. The seed is 3 actors, but the program looks at all combinations of 3 that share common ties in the network. This is intensive, and unnecessary for large data (the first subgroup does not matter so much). To shortcut: change the value from 12, then save and run.
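A quick count shows why the seed search is slow for large networks. The number of possible triples grows roughly with the cube of the number of actors. The figures below are an upper bound, since the program only considers triples that share common ties; the function name is hypothetical.

```python
import math


def possible_seed_triples(n_actors):
    """Number of distinct sets of 3 actors that could serve as a seed."""
    return math.comb(n_actors, 3)


for n_actors in (30, 100, 1000, 5000):
    print(n_actors, possible_seed_triples(n_actors))
```

For 1000 actors there are already 166,167,000 possible triples.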

Software Challenge

Video: 2:07:57-2:08:15

• Analyze nonpr1.list

– Evidence of clusters?

– Performance of algorithm?

• Replace lines with nonpr2

• Describe the KliqueFinder algorithm

KliqueFinder Applications: Adding Individual Attributes in SAS

• Run KliqueFinder

• Data file: collt1.list

• Make graph

• Use ID from other file? Yes

• SAS file name: c:\kliqfind\indiv [be sure to include full path]

• ID variable: nominator

• String variable: gradelev

• Save

• In SAS, run socgramz in the working directory

KliqueFinder Applications: Adding Individual Attributes

• Select “Yes” for “User ID (character) from other SAS file?”

• Type the following information in the corresponding boxes

• Then click “Save”

Choosing an ID Variable


With ID based on Grade


KliqueFinder Applications: Replacing Lines

run KliqueFinder

data file collt1.list

make graph

save

retrieve socgramz.sas in the working directory

replace all occurrences of collt1.list with collt2.list

run
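The find-and-replace step can also be scripted. This is a minimal sketch: the function name is hypothetical, and the two-line program text is a made-up stand-in, not the real contents of socgramz.sas.

```python
def switch_data_file(program_text, old_name, new_name):
    """Replace every occurrence of old_name and report how many were changed."""
    count = program_text.count(old_name)
    return program_text.replace(old_name, new_name), count


# Made-up stand-in for the text of socgramz.sas (illustration only).
program = 'infile "collt1.list";\n* lines come from collt1.list;\n'

updated, changed = switch_data_file(program, "collt1.list", "collt2.list")
print(changed)
print(updated)
```

Checking the count before running the program is a cheap way to confirm that the file name was actually found.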


Opening socgramz.sas


Changing lines


Change lines to different source


New Lines based on Collt2


Batch KliqueFinder


Basics

• Program runs KliqueFinder on multiple files

• Input

– List of filenames

– Files containing data

– BACK UP YOUR DATA FIRST!

• Output

– Clustering output (.place, .clusters, .vna) for each list file

Files

File containing names of data files: testb.txt

Data file: stanne.list

Data file: ffe.list
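Backing up the data files named in the list file can be scripted. The sketch assumes one file name per line (or whitespace-separated names) and that the data files sit in the same directory as the list file; neither assumption is confirmed by the slides, and the function name and backup directory are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_batch_files(list_file, backup_dir):
    """Copy every data file named in list_file into backup_dir."""
    list_path = Path(list_file)
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in list_path.read_text().split():
        source = list_path.parent / name
        if source.is_file():
            shutil.copy2(source, backup / source.name)
            copied.append(source.name)
        else:
            missing.append(name)
    return copied, missing


# Self-contained demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "stanne.list").write_text("1 2 3\n")
    (work / "testb.txt").write_text("stanne.list\nffe.list\n")
    print(back_up_batch_files(work / "testb.txt", work / "backup"))
```

In the demonstration, ffe.list is reported as missing because only stanne.list was created.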

BACK UP YOUR DATA FIRST!

KliqueFinder

• Browse to directory you want to work in

• Choose “Basic setup” and then click “Run setup file” button.


Running Batch Mode

File with names of data files

Click here to run as batch

BACK UP DATA FILES BEFORE RUNNING!

Prepping data in Excel

Video: 1:28-1:39

• Name your file xxxxxx.list, e.g., test01.list

• Right click

• Choose Formatted text (space delimited)

Prepping Data in UCINET

• Navigate to where you want to save: c:\kliqfind

• Navigate to UCINET data

• Must remove “!” from the file. There may be several; the !’s are there because of multiple data sets.

Converting data using SAS

Video: 2:10:43-2:19

data one;
  infile "badform.list";
  input chooser chosen wt;

data two;
  set one;
  file "ready1.list";
  if wt ne . then put (chooser chosen wt) (10.);
run;

A Priori Clusters

A line with 99999 in the data file indicates in which a priori cluster an actor is placed. For example, actor 1 is in a priori cluster 3.

Run the repeat2 setup, and then proceed as usual. KliqueFinder will make pictures based on the a priori clusters.

Remember to do “new data” setup when done.

Comparison of A Priori Clusters and Identified Solution

• Data with a priori cluster assignments: run as new data.

• Run as usual, then look at the cluster output:

SIMILARITY BETWEEN THE START AND END GROUPS:
ACTUAL   POSS   STANDARDIZED
   52.    88.        9.55565

The standardized value is a QAP standardized measure; compare it with the normal distribution.

Data Containing Cluster Assignments

File called stanne.place [datafile.place]. Columns: internal ID, user ID, cluster; the last two columns can be ignored (for simulation only).

  1.0   1.0  2.0  1.0  3.0
  2.0   2.0  2.0  1.0  3.0
  3.0   4.0  1.0  1.0  3.0
  4.0  19.0  4.0  1.0  3.0
  5.0  23.0  4.0  1.0  3.0
  6.0  26.0  2.0  1.0  3.0
 17.0   6.0  3.0  1.0  3.0
 18.0   8.0  3.0  1.0  3.0
 19.0  20.0  3.0  1.0  3.0
 20.0  15.0  1.0  1.0  3.0
 21.0  12.0  2.0  1.0  3.0
 22.0  17.0  4.0  1.0  3.0
 23.0  16.0  4.0  1.0  3.0
 24.0  27.0  4.0  1.0  3.0
-27.0  28.0  4.0  1.0  3.0

If the first number (internal ID) is negative, this indicates a tagalong: an actor connected to only one other. In this case, the line should be read as the tagee, tagger, and group. So, in the last line, actor 28 is connected to only one other actor (27) and is therefore assigned to actor 27’s cluster, which is cluster 4.

There may be slightly different numeric formats, depending on the version of KliqueFinder.
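Reading a .place file, including the tagalong convention, can be sketched as follows. The function name is hypothetical, and the sketch assumes whitespace-separated numeric columns as in the listing above; as noted, formats may differ by version.

```python
def parse_place(lines):
    """Return ({user ID: cluster}, {tagalong ID: actor it is attached to})."""
    placements = {}
    tagalongs = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        first, second, cluster = (int(float(value)) for value in fields[:3])
        if first < 0:
            # Tagalong row: read as (actor tagged onto, tagalong, cluster).
            tagalongs[second] = -first
        placements[second] = cluster
    return placements, tagalongs


rows = [
    "1.0 1.0 2.0 1.0 3.0",
    "24.0 27.0 4.0 1.0 3.0",
    "-27.0 28.0 4.0 1.0 3.0",
]
placements, tagalongs = parse_place(rows)
print(placements)
print(tagalongs)
```

For the three example rows, actor 28 is placed in cluster 4 and recorded as a tagalong of actor 27.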

Including Cluster Membership in Influence Model

Advanced:

– Run influence model for technology

– Identify clusters from talkt2

– Include cluster membership in the influence model

SPSS

DATA LIST / intid 1-10 nominee 11-20 cluster 21-30 simx 31-40 extra 41-50.
BEGIN DATA
       1.0       1.0       1.0       1.0       3.0
       2.0       2.0       1.0       1.0       3.0
       3.0       3.0       1.0       1.0       3.0
       4.0       4.0       2.0       1.0       3.0
       5.0       5.0       2.0       1.0       3.0
       6.0       6.0       2.0       1.0       3.0
END DATA.
DATASET NAME clusters WINDOW=FRONT.
SORT CASES BY nominee(A).
EXECUTE.

MATCH FILES /FILE=yvar1 /FILE='indeg' /FILE=clusters /BY nominee.
EXECUTE.

SAS

data clusters; *groups from KliqueFinder;
  input intid nominator cluster simx extra;
cards;
1.0 1.0 1.0 1.0 3.0
2.0 2.0 1.0 1.0 3.0
3.0 3.0 1.0 1.0 3.0
4.0 4.0 2.0 1.0 3.0
5.0 5.0 2.0 1.0 3.0
6.0 6.0 2.0 1.0 3.0
;

proc sort data=clusters;
  by nominator;

data withinfl;
  merge yvar2 yvar1 infl expanse clusters attract(rename=(nominee=nominator));
  by nominator;
  drop nominee _type_ _freq_;
run;

Adding Patches


Patch for Two-mode

Patch for one-mode


Alternative community detection algorithms

• http://cs.stanford.edu/people/jure/pubs/communities-www10.pdf

• http://www.uvm.edu/~pdodds/files/papers/others/2009/lancichinetti2009a.pdf

• http://fatweasel.net/analytics/network-analysis/community-detection-in-networks/
