Advanced Quantitative Research Methodology, Lecture Notes: Text Analysis II: Unsupervised Learning via Cluster Analysis [1]
Gary King, http://GKing.Harvard.Edu
December 23, 2011
[1] Copyright 2010 Gary King, All Rights Reserved.
Reading
Justin Grimmer and Gary King. 2010. Quantitative Discovery of Qualitative Information: A General Purpose Document Clustering Methodology. http://gking.harvard.edu/files/abs/discov-abs.shtml
Gary King (Harvard, IQSS) Quantitative Discovery from Text 2 / 23
The Problem: Discovery from Unstructured Text
Examples: scholarly literature, news stories, medical information, blog posts, comments, product reviews, emails, social media updates, audio-to-text summaries, speeches, press releases, legal decisions, etc.
10 minutes of worldwide email = 1 LOC equivalent
An essential part of discovery is classification: "one of the most central and generic of all our conceptual exercises. . . . the foundation not only for conceptualization, language, and speech, but also for mathematics, statistics, and data analysis. . . . Without classification, there could be no advanced conceptualization, reasoning, language, data analysis or, for that matter, social science research." (Bailey, 1994)
We focus on cluster analysis: discovery through (1) classification and (2) simultaneously inventing a classification scheme
(We analyze text; our methods apply more generally)
Why Johnny Can't Classify (Optimally)
Bell(n) = number of ways of partitioning n objects
Bell(2) = 2 (AB, A B)
Bell(3) = 5 (ABC, AB C, A BC, AC B, A B C)
Bell(5) = 52
Bell(100) ≈ 10^28 × Number of elementary particles in the universe
Now imagine choosing the optimal classification scheme by hand!
That we think of all this as astonishing . . . is astonishing
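The growth of Bell(n) can be checked directly. The sketch below is not from the slides: it computes Bell numbers with the Bell triangle recurrence, and the function name `bell_numbers` is an illustrative choice.

```python
def bell_numbers(n_max):
    """Return [Bell(0), ..., Bell(n_max)] using the Bell triangle."""
    bells = [1]   # Bell(0) = 1: the empty set has exactly one partition
    row = [1]     # first row of the Bell triangle
    for _ in range(n_max):
        # A new row starts with the last entry of the previous row; each
        # later entry is the entry to its left plus the entry above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        bells.append(new_row[0])
        row = new_row
    return bells


bells = bell_numbers(100)
print(bells[2], bells[3], bells[5])   # the small cases listed on the slide
print(len(str(bells[100])))           # number of decimal digits in Bell(100)
```

Python integers have arbitrary precision, so Bell(100) is computed exactly rather than approximated.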
Why HAL Can't Classify Either
The Goal, an optimal application-independent cluster analysis method, is mathematically impossible:
  No free lunch theorem: every possible clustering method performs equally well on average over all possible substantive applications
Existing methods:
  Many choices: model-based, subspace, spectral, grid-based, graph-based, fuzzy k-modes, affinity propagation, self-organizing maps, ...
  Well-defined statistical, data analytic, or machine learning foundations
  How to add substantive knowledge: With few exceptions, who knows?!
  The literature: little guidance on when methods apply
Deep problem in cluster analysis literature: no way to know which method will work ex ante
If Ex Ante doesn't work, try Ex Post
Methods and substance must be connected (no free lunch theorem)
The usual approach fails: hard to do it by understanding the model
We do it ex post (by qualitative choice). For example:
  Create long list of clusterings; choose the best
  Too hard for mere humans!
  An organized list will make the search possible
  E.g.: consider two clusterings that differ only because one document (of many) moves from category 5 to 6
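To make the "one document moves from category 5 to 6" example concrete, here is a minimal sketch that measures how similar two such clusterings are. It uses the fraction of document pairs on which the clusterings agree (the Rand index), chosen here only for illustration. The function name and the 12-document data are hypothetical.

```python
from itertools import combinations


def pair_agreement(labels_a, labels_b):
    """Fraction of document pairs that both clusterings treat the same way
    (together in both, or apart in both)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same documents")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += int(same_a == same_b)
        total += 1
    return agree / total


# Hypothetical data: 12 documents in categories 5 and 6.
clustering_1 = [5] * 6 + [6] * 6
# The same clustering after one document moves from category 5 to 6.
clustering_2 = [5] * 5 + [6] * 7

print(pair_agreement(clustering_1, clustering_2))
```

Only the pairs involving the moved document change status, so the two clusterings remain close. That closeness is what lets an organized list put them next to each other.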
Our Idea: Meaning Through Geography
We develop a (conceptual) geography of clusterings
A New Strategy
Make it easy to choose best clustering from millions of choices
1. Code text as numbers (in one or more of several ways)
2. Apply all clustering methods we can find to the data, each representing different (unstated) substantive assumptions (
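Step 1 can be illustrated with one common coding, a document-term count matrix. This is a sketch under assumptions, not the specific representation used in the slides: the tokenizer (lowercased alphabetic runs), the function name, and the three example documents are all hypothetical.

```python
import re
from collections import Counter


def document_term_matrix(documents):
    """Code each document as a row of word counts over a shared vocabulary."""
    # Assumed tokenization: lowercase the text, keep runs of letters only.
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


docs = [
    "Tax cuts help the economy",
    "The economy needs tax reform",
    "Veterans deserve better care",
]
vocabulary, matrix = document_term_matrix(docs)
for row in matrix:
    print(row)
```

Each row can then be passed to any clustering method in step 2. Other codings, such as weighted or binary counts, would change the rows but not the overall shape of the pipeline.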
Space of Cluster Solutions
[Figure: map of the space of cluster solutions. Each labeled point is one clustering produced by a method and distance or linkage choice, including biclust_spectral, clust_convex, mult_dirproc, dismea, rock, som, mixvmf, spectral (spec_*, mspec_*), affprop, divisive, kmedoids, kmeans, and hclust variants; two highlighted points are marked Cluster Solution 1 and Cluster Solution 2.]
You choose one (or more), based on insight, discovery, useful information, ...
Application-Independent Distance Metric: Axioms
Metric based on 3 assumptions
1. Distance between clusterings: a function of the pairwise document agreements (pairwise agreements imply triples, quadruples, etc.)
2. Invariance: Distance is invariant to the number of documents (for any fixed number of clusters)
3. Scale: the maximum distance is set to log(num clusters)
Only one measure satisfies all three (the "variation of information")
Meila (2007): derives same metric using different axioms (lattice theory)
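For reference, a minimal sketch of the variation of information in its standard unnormalized form, VI(C, C') = H(C) + H(C') - 2 I(C; C'), using natural logarithms. The function name is illustrative, and any rescaling implied by the scale axiom above is not applied here.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """VI(A, B) = H(A) + H(B) - 2 * I(A; B), computed from label counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("clusterings must label the same documents")
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    # Mutual information between the two cluster assignments.
    mutual_info = sum(
        (c / n) * log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in count_ab.items()
    )
    return entropy(count_a) + entropy(count_b) - 2 * mutual_info


print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))  # relabeling only
print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent splits
```

The distance depends only on which documents are grouped together, not on the cluster labels, so relabeling a clustering leaves it at distance zero from the original.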
The Future of Political Science: 100 Perspectives
Edited by Gary King (Harvard University), Kay Lehman Schlozman (Boston College), and Norman H. Nie (Stanford University)
Available March 2009: 304pp
Pb: 978-0-415-99701-0: $24.95
www.routledge.com/politics
"The list of authors in The Future of Political Science is a 'who's who' of political science. As I was reading it, I came to think of it as a platter of tasty hors d'oeuvres. It hooked me thoroughly."
Peter Kingstone, University of Connecticut
"In this one-of-a-kind collection, an eclectic set of contributors offer short but forceful forecasts about the future of the discipline. The resulting assortment is captivating, consistently thought-provoking, often intriguing, and sure to spur discussion and debate."
Wendy K. Tam Cho, University of Illinois at Urbana-Champaign
"King, Schlozman, and Nie have created a visionary and stimulating volume. The organization of the essays strikes me as nothing less than brilliant. . . It is truly a joy to read."
Lawrence C. Dodd, Manning J. Dauer Eminent Scholar in Political Science, University of Florida
Evaluators Rate Machine Choices Better Than Their Own
Scale: (1) unrelated, (2) loosely related, or (3) closely related
Table reports: mean(scale)

Pairs from            Overall Mean   Evaluator 1   Evaluator 2
Random Selection      1.38           1.16          1.60
Hand-Coded Clusters   1.58           1.48          1.68
Hand-Coding           2.06           1.88          2.24
Machine               2.24           2.08          2.40

p.s. The hand-coders did the evaluation!
Evaluating Performance
Goals:
  Validate Claim: computer-assisted conceptualization outperforms human conceptualization
  Demonstrate: new experimental designs for cluster evaluation
  Inject human judgement: relying on insights from survey research
We now present three evaluations
  Cluster Quality: RA coders
  Informative discoveries: Experienced scholars analyzing texts
  Discovery: You're the judge
Evaluation 1: Cluster Quality
What Are Humans Good For?
  They can't: keep many documents & clusters in their head
  They can: compare two documents at a time
  => Cluster quality evaluation: human judgement of document pairs
Experimental Design to Assess Cluster Quality
  automated visualization to choose one clustering
  many pairs of documents
  for coders: (1) unrelated, (2) loosely related, (3) closely related
  Quality = mean(within cluster) - mean(between clusters)
  Bias results against ourselves by not letting evaluators choose clustering
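The quality measure above is simple to compute once coders have rated document pairs. A minimal sketch, assuming ratings arrive as (document, document, rating) triples on the 1 to 3 scale; the function name, document IDs, clusters, and ratings below are all hypothetical.

```python
from statistics import mean


def cluster_quality(ratings, cluster_of):
    """Quality = mean rating of within-cluster pairs
               - mean rating of between-cluster pairs."""
    within = [r for a, b, r in ratings if cluster_of[a] == cluster_of[b]]
    between = [r for a, b, r in ratings if cluster_of[a] != cluster_of[b]]
    # statistics.mean raises StatisticsError if either list is empty,
    # so the sampled pairs must include both kinds.
    return mean(within) - mean(between)


# Hypothetical clustering of four documents into clusters A and B.
cluster_of = {"d1": "A", "d2": "A", "d3": "B", "d4": "B"}
# Hypothetical coder ratings: 1 = unrelated, 2 = loosely, 3 = closely related.
ratings = [
    ("d1", "d2", 3),  # within A
    ("d3", "d4", 2),  # within B
    ("d1", "d3", 1),  # between A and B
    ("d2", "d4", 2),  # between A and B
]
print(cluster_quality(ratings, cluster_of))
```

A larger value means pairs the clustering places together were judged more closely related than pairs it separates. Scoring the same ratings against two different clusterings gives the kind of difference plotted on the next slide.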
Evaluation 1: Cluster Quality
[Figure: dot plot comparing cluster quality for (Our Method) and (Human Coders), with axis ticks at 0.1, 0.2, and 0.3 on either side of the center; one point each for Lautenberg Press Releases, Policy Agendas Project, and Reuters Gold Standard.]
Lautenberg: 200 Senate Press Releases (appropriations, economy, education, tax, veterans, ...)
Policy Agendas: 213 quasi-sentences from Bush's State of the Union (agriculture, banking & commerce, civil rights/liberties, defense, ...)
Reuters: financial news (trade, earnings, copper, gold, coffee, ...); gold standard for supervised learning studies
Evaluation 2: More Informative Discoveries
Found 2 scholars analyzing lots of textual data for their work
Created 6 clusterings:
http://find/7/28/2019 Discov
97/142
Created 6 clusterings:
2 clusterings selected with our method (biased against us)2 clusterings from each of 2 other methods (varying tuning parameters)
Created info packet on each clustering (for each cluster: exemplardocument, automated content summary)
Asked for6
2
=15 pairwise comparisons
User chooses only care about the one clustering that wins
Both cases a Condorcet winner:
G Ki (H d IQSS) Q tit ti Dis f T t 16 / 23
Evaluation 2: More Informative Discoveries
Found 2 scholars analyzing lots of textual data for their work
Created 6 clusterings:
http://find/7/28/2019 Discov
98/142
C eated 6 c uste gs
2 clusterings selected with our method (biased against us)2 clusterings from each of 2 other methods (varying tuning parameters)
Created info packet on each clustering (for each cluster: exemplardocument, automated content summary)
Asked for6
2
=15 pairwise comparisons
User chooses only care about the one clustering that wins
Both cases a Condorcet winner:
Immigration:
Our Method 1 vMF 1 vMF 2 Our Method 2 K-Means 1 K-Means 2
G Ki (H d IQSS) Q tit ti Di f T t 16 / 23
Evaluation 2: More Informative Discoveries
Found 2 scholars analyzing lots of textual data for their work
Created 6 clusterings:
- 2 clusterings selected with our method (biased against us)
- 2 clusterings from each of 2 other methods (varying tuning parameters)
Created info packet on each clustering (for each cluster: exemplar document, automated content summary)
Asked for $\binom{6}{2} = 15$ pairwise comparisons
User chooses, so we only care about the one clustering that wins
Both cases have a Condorcet winner:
Immigration:
Our Method 1 → vMF 1 → vMF 2 → Our Method 2 → K-Means 1 → K-Means 2
Genetic testing:
Our Method 1 → {Our Method 2, K-Means 1, K-Means 2} → Dir Proc. 1 → Dir Proc. 2
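The pairwise design above can be sketched in a few lines of Python. This is only an illustration: the function name `condorcet_winner` and the `prefers` dictionary are hypothetical, and the individual pairwise judgments are not given on the slide, so the example constructs judgments that are consistent with the Immigration ordering.

```python
from itertools import combinations


def condorcet_winner(candidates, prefers):
    """Return the candidate that wins every pairwise comparison, or None.

    `prefers` maps each pair (a, b), in the order produced by
    itertools.combinations, to whichever of the two was preferred.
    """
    wins = {c: 0 for c in candidates}
    for pair in combinations(candidates, 2):
        wins[prefers[pair]] += 1
    for c in candidates:
        # A Condorcet winner beats all of the other candidates head to head.
        if wins[c] == len(candidates) - 1:
            return c
    return None


# The six clusterings shown to one evaluator.
clusterings = ["Our Method 1", "Our Method 2", "vMF 1", "vMF 2",
               "K-Means 1", "K-Means 2"]
pairs = list(combinations(clusterings, 2))  # 6 choose 2 = 15 comparisons

# Assumed judgments: every comparison follows the Immigration ordering.
ranking = ["Our Method 1", "vMF 1", "vMF 2", "Our Method 2",
           "K-Means 1", "K-Means 2"]
prefers = {(a, b): min(a, b, key=ranking.index) for a, b in pairs}

winner = condorcet_winner(clusterings, prefers)
print(len(pairs), winner)
```

A Condorcet winner need not exist: if the judgments cycle (A over B, B over C, C over A), the function returns `None`.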
Evaluation 3: What Do Members of Congress Do?
- David Mayhew's (1974) famous typology
  - Advertising
  - Credit Claiming
  - Position Taking
- Data: 200 press releases from Frank Lautenberg's office (D-NJ)
- Apply our method
Example Discovery
[Figure: a space of clusterings in which each labeled point is the clustering produced by one method. Labels: biclust_spectral; clust_convex; dismea; dist_{cos, fbinary, ebinary, minkowski, max, canb, binary}; mec; rock; som; sot_{euc, cor}; spec_{cos, euc, man, mink, max, canb}; mspec_{cos, euc, man, mink, max, canb}; mult_dirproc; mixvmf; mixvmfVA; affprop {cosine, euclidean, manhattan, info.co, maximum}; divisive {stand.euc, euclidean, manhattan}; kmedoids {stand.euc, euclidean, manhattan}; kmeans {euclidean, maximum, manhattan, canberra, binary, pearson, spearman, kendall, correlation}; hclust {euclidean, maximum, manhattan, canberra, binary, pearson, correlation, spearman, kendall} x {ward, single, complete, average, mcquitty, median, centroid}.]
Red point: a clustering by Affinity Propagation-Cosine (Dueck and Frey 2007)
Close to: Mixture of von Mises-Fisher distributions (Banerjee et al. 2005)
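Saying that one clustering is "close to" another presupposes a distance between clusterings. The slide does not name the metric, so the sketch below assumes a simple one: the fraction of document pairs that one clustering groups together and the other separates. The function name and the toy label vectors are hypothetical.

```python
from itertools import combinations


def pair_disagreement(labels_a, labels_b):
    """Fraction of document pairs grouped together by exactly one clustering."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same documents")
    pairs = list(combinations(range(len(labels_a)), 2))
    disagree = sum(
        (labels_a[i] == labels_a[j]) != (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return disagree / len(pairs)


# Hypothetical cluster labels for six documents under three methods.
affprop_cosine = [0, 0, 1, 1, 2, 2]
mixvmf = [5, 5, 7, 7, 7, 9]      # label values differ; only the grouping matters
kmeans_euclidean = [0, 1, 0, 1, 0, 1]

d_close = pair_disagreement(affprop_cosine, mixvmf)
d_far = pair_disagreement(affprop_cosine, kmeans_euclidean)
print(d_close < d_far)
```

Because the measure depends only on which documents share a cluster, relabeling the clusters leaves the distance unchanged.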
Space between methods: local cluster ensemble
Found a region with particularly insightful clusterings
Example Discovery
mult_dirproc
mec
sot_cor
affprop cosine
divisive stand.euc
mixvmf mixvmfVA
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
Mixture:
0.39 Hclust-Canberra-McQuitty
0.30 Spectral clustering, Random Walk (Metrics 1-6)
0.13 Hclust-Correlation-Ward
Example Discovery
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
Mixture:
0.39 Hclust-Canberra-McQuitty
0.30 Spectral clustering, Random Walk (Metrics 1-6)
0.13 Hclust-Correlation-Ward
0.09 Hclust-Pearson-Ward
Example Discovery
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
Mixture:
0.39 Hclust-Canberra-McQuitty
0.30 Spectral clustering, Random Walk (Metrics 1-6)
0.13 Hclust-Correlation-Ward
0.09 Hclust-Pearson-Ward
0.05 Kmedoids-Cosine
Example Discovery
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
Mixture:
0.39 Hclust-Canberra-McQuitty
0.30 Spectral clustering, Random Walk (Metrics 1-6)
0.13 Hclust-Correlation-Ward
0.09 Hclust-Pearson-Ward
0.05 Kmedoids-Cosine
0.04 Spectral clustering, Symmetric (Metrics 1-6)
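One way to read the mixture weights above is as a weighted average of existing clusterings. The sketch below is only an illustration under that assumption, not necessarily the procedure used in the lecture: it turns each clustering into a pairwise co-membership matrix and averages those matrices with the slide's weights. The four-document label vectors in `EXAMPLE_LABELS` are hypothetical, as are the function names.

```python
# Weights as listed on the slide (method name -> mixture weight).
WEIGHTS = {
    "Hclust-Canberra-McQuitty": 0.39,
    "Spectral clustering, Random Walk": 0.30,
    "Hclust-Correlation-Ward": 0.13,
    "Hclust-Pearson-Ward": 0.09,
    "Kmedoids-Cosine": 0.05,
    "Spectral clustering, Symmetric": 0.04,
}

# Hypothetical cluster labels for four documents under each method
# (made up for illustration; not data from the lecture).
EXAMPLE_LABELS = {
    "Hclust-Canberra-McQuitty": [0, 0, 1, 1],
    "Spectral clustering, Random Walk": [0, 0, 1, 1],
    "Hclust-Correlation-Ward": [0, 1, 1, 1],
    "Hclust-Pearson-Ward": [0, 0, 0, 1],
    "Kmedoids-Cosine": [0, 1, 0, 1],
    "Spectral clustering, Symmetric": [0, 0, 1, 1],
}


def comembership(labels):
    """Return an n x n matrix with 1.0 where two documents share a cluster."""
    n = len(labels)
    return [[1.0 if labels[i] == labels[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mixture_similarity(clusterings, weights):
    """Weighted average of co-membership matrices (weights are normalized)."""
    total = sum(weights[name] for name in clusterings)
    n = len(next(iter(clusterings.values())))
    mixed = [[0.0] * n for _ in range(n)]
    for name, labels in clusterings.items():
        w = weights[name] / total
        c = comembership(labels)
        for i in range(n):
            for j in range(n):
                mixed[i][j] += w * c[i][j]
    return mixed


if __name__ == "__main__":
    for row in mixture_similarity(EXAMPLE_LABELS, WEIGHTS):
        print(" ".join(f"{value:.2f}" for value in row))
```

Each entry of the resulting matrix is the total weight of the methods that place the two documents in the same cluster, so a pair that only low-weight methods group together stays close to zero.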
Example Discovery
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
Clusters in this Clustering
Mayhew
Example Discovery
[Figure: plot of clusterings, one labeled point per clustering method and distance/linkage choice (biclust_spectral, clust_convex, dismea, dist_*, mec, rock, som, sot_*, spec_*, mspec_*, affprop, divisive, kmedoids, kmeans, hclust, mixvmf, mult_dirproc).]
hclust maximum mc