Replication of Uzzi (2013) Science study on atypical combinations, with additional work to show that journal and disciplinary effects are not insignificant.
Better Maps ● Better Solutions
SCITECH STRATEGIES
Physics Chemistry Engineering Biology Disease Medicine Brain Health Social Computer Humanities Earth
Atypical combinations are confounded by disciplinary effects
STI 2014, Leiden, The Netherlands, Sept. 3-5, 2014
Kevin W. Boyack & Richard Klavans, SciTech Strategies, Inc., www.mapofscience.com
BACKGROUND
We have long been interested in indicators of innovative research
Uzzi et al. (UMSJ) recently published an article correlating high impact papers (innovation) with “atypical combinations” (novelty) of reference journals
Intriguing results; we decided to investigate further – to replicate the study and then further explore this idea of novelty
UZZI STUDY
Hypothesis: “The highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations”
Data: Used 17.9M articles (1950-2000) from WOS, containing 302M references to 15,613 cited journals
Method:
» Journals are used as proxy for “areas of knowledge”
» Determine which co-cited journal combinations are “conventional” and which are “unusual” or “novel”
» Develop indicators of “convention” and “novelty” from co-citation statistics
» Calculate “convention” and “novelty” for each paper using indicators
» Test indicators to see how they correlate with highly cited papers
Finding: Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper
UMSJ METHOD (1)
To determine which co-cited journal combinations are “conventional” and which are “novel”, UMSJ calculated Z-scores for each co-cited journal pair, where Z is defined:
Z = (Nact – Nexp) / Nvar
» Nact is the actual number of journal co-citation counts
» Nexp is an expected number of journal co-citation counts
» Nvar is the variance of Nexp
Nexp and Nvar were estimated from 10 randomized citation networks, in which all citation links were switched using a Monte Carlo technique while keeping citing/cited distributions constant at the paper level
A negative Z-score indicates that a journal pair is co-cited less often than expected; such a pair is an “atypical combination” of journals
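The Z-score step can be sketched as follows, assuming the randomized networks have already been generated. The helpers `cocitation_counts` and `pair_z_score` and the toy journal labels are hypothetical. Nexp is taken as the mean count across the randomized networks and Nvar as their population variance, following the definition on this slide; a conventional Z-score divides by the standard deviation instead, so the normalization is left as a parameter to be checked against the UMSJ supplementary materials.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev, pvariance


def cocitation_counts(reference_lists):
    """Count how often each unordered journal pair is co-cited.

    reference_lists: one list of cited-journal names per citing paper
    (hypothetical input format).
    """
    counts = Counter()
    for journals in reference_lists:
        # Every pair of references within one paper is one co-citation.
        for pair in combinations(sorted(journals), 2):
            counts[pair] += 1
    return counts


def pair_z_score(n_act, randomized_counts, normalizer="variance"):
    """Z = (Nact - Nexp) / normalization for a single journal pair."""
    n_exp = mean(randomized_counts)
    if normalizer == "variance":
        norm = pvariance(randomized_counts)  # as written on the slide
    else:
        norm = pstdev(randomized_counts)  # assumption: standard-deviation form
    if norm == 0:
        return None  # undefined when the randomized counts never vary
    return (n_act - n_exp) / norm


# Toy data only; J1..J3 are placeholder journal labels.
papers = [["J1", "J2", "J2"], ["J1", "J3"]]
actual = cocitation_counts(papers)
z = pair_z_score(12, [8, 10, 12, 10])  # Nexp = 10, Nvar = 2
```

A pair whose actual count falls below the randomized mean gets a negative value, which is the "atypical" case described above.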
UMSJ METHOD (2)
Using the computed Z-scores for each co-cited journal pair, the set of Z-scores for the journal pairs co-cited in each paper can then be assembled
Two summary statistics were calculated for each paper from its Z-score distribution:
» Median Z-score – to characterize central tendency or “convention”
» 10th percentile (left tail) Z-score – to characterize “novelty”
Distributions of these summary statistics were analyzed
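The two paper-level statistics can be sketched as below. This is an illustration under assumptions: `percentile` and `paper_summary` are hypothetical helpers, the percentile uses linear interpolation between ranks (the slides do not say which percentile rule was used), and journal pairs without a Z-score are skipped.

```python
from itertools import combinations
from statistics import median


def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between ranks."""
    xs = sorted(values)
    if not xs:
        raise ValueError("no values")
    pos = (len(xs) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def paper_summary(journals, pair_z, tail_q=10):
    """Return (median Z, left-tail Z) for one paper's cited journals."""
    zs = [
        pair_z[pair]
        for pair in combinations(sorted(journals), 2)
        if pair in pair_z  # assumption: pairs without a Z-score are ignored
    ]
    if not zs:
        return None
    return median(zs), percentile(zs, tail_q)


# Toy Z-scores for three placeholder journals A, B, C.
pair_z = {("A", "B"): 6.0, ("A", "C"): -2.0, ("B", "C"): 2.0}
convention, novelty = paper_summary(["A", "B", "C"], pair_z)
```

Passing `tail_q=5` gives the 5th-percentile variant of the left-tail statistic.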
UMSJ METHOD (3)
Distributions of these paper-level summary statistics were analyzed
Indicators based on these summary statistics were created
» Novelty: HIGH – 10th Pctl Z-score < 0; LOW – 10th Pctl Z-score > 0
» Conventionality: HIGH – median Z-score > Avg; LOW – median Z-score < Avg
Each paper classified in terms of convention and novelty
[Figure: summary-statistic distributions, labeled Low Convention / High Convention and High Novelty / Low Novelty]
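The indicator rules on this slide amount to a two-way threshold test, sketched below. `classify` is a hypothetical helper. The slide does not define how "Avg" is computed or what happens exactly at a threshold, so the example assumes the mean of the paper-level medians and assigns ties to the LOW side.

```python
from statistics import mean


def classify(median_z, tail_z, avg_median_z):
    """Assign a paper to one of the four novelty/convention bins."""
    # Assumption: a tail Z-score of exactly 0 counts as low novelty.
    novelty = "N+" if tail_z < 0 else "N-"
    # Assumption: a median equal to Avg counts as low convention.
    convention = "C+" if median_z > avg_median_z else "C-"
    return novelty + convention


# Toy papers: id -> (median Z, left-tail Z).
papers = {
    "p1": (5.0, -1.0),
    "p2": (1.0, 2.0),
    "p3": (9.0, 3.0),
    "p4": (1.0, -4.0),
}
avg = mean(m for m, _ in papers.values())  # assumed meaning of "Avg"
bins = {pid: classify(m, t, avg) for pid, (m, t) in papers.items()}
```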
UMSJ RESULTS
“Hit” papers defined as the top-5% highly cited papers
Using indicators:
» Probability of a (N+C+) HIGH NOVELTY, HIGH CONVENTION paper being a hit paper is 0.0911
» Probability of a (N-C-) LOW NOVELTY, LOW CONVENTION paper being a hit paper is 0.0205
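A per-bin hit probability of this kind can be computed as sketched below. `hit_probabilities` is a hypothetical helper and the input is toy data; the handling of ties at the top-5% cutoff is an assumption (they are broken by sort order). The last line relates the reported N+C+ probability to the 5% base rate implied by the hit definition.

```python
from collections import Counter


def hit_probabilities(papers, top_fraction=0.05):
    """papers: list of (bin_label, citation_count) tuples.

    Returns P(hit | bin), where hits are the top `top_fraction`
    of all papers by citation count.
    """
    n_hits = max(1, round(len(papers) * top_fraction))
    ranked = sorted(papers, key=lambda p: p[1], reverse=True)
    totals = Counter(label for label, _ in papers)
    hits = Counter(label for label, _ in ranked[:n_hits])
    return {label: hits[label] / totals[label] for label in totals}


# Toy sample of 40 papers, so the top 5% is 2 papers.
sample = [("N+C+", c) for c in (100, 90, 5, 4, 3, 2, 1, 1, 1, 1)]
sample += [("N-C-", 1)] * 30
probs = hit_probabilities(sample)

# Reported N+C+ probability relative to the 5% base rate.
lift = 0.0911 / 0.05
```

The ratio comes out at roughly 1.8, which is the basis for the "twice as likely" summary of the UMSJ finding.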
UMSJ ISSUES
“Analyses in the supplementary materials (fig. S6) show that these empirical regularities for the WOS taken as a whole are largely replicated on a field-by-field basis and across time”
» Across time – YES
» Across fields or disciplines – NOT REALLY! – UMSJ supplemental results show that the N+C+ bin has the highest probability (of the 4 bins) of containing a hit paper for only 64% of the 243 subject categories
The fact that the N+C+ bin is not ranked first in 36% of subject categories is troubling, suggesting potentially large field effects, or even individual journal effects
Top-5% highly cited not sampled by field
Journals may not be the right proxy for “areas of knowledge”
REPLICATION
We used a different, but parallel, methodology to replicate the UMSJ distributions and results
Scopus data (2001-2010) – 12M articles, 226M references
Included conference papers along with articles
K50 statistics for co-cited journal pairs rather than Z-scores and Monte Carlo simulations
» K50 has the same conceptual formulation as the Z-score: (Nact – Nexp) / Normalization
» Expected values and normalization are based on row and column sums
UMSJ procedures for calculating distributions, etc. were all followed
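The slides give only the general form of K50, so the sketch below fills in one plausible choice and should be read as an assumption, not as the authors' exact definition. For a symmetric co-citation matrix F it uses row sums S_i (diagonal excluded), an expected value S_i * S_j / (SS - S_i) with SS the sum of all row sums, a normalization of sqrt(S_i * S_j), and keeps the larger of the two directional values. The function name `k50_like` is hypothetical.

```python
from math import sqrt


def k50_like(F):
    """K50-style score for each journal pair (i, j) with i < j.

    F: square, symmetric list-of-lists of co-citation counts.
    All formulas here are assumptions; see the note above the code.
    """
    n = len(F)
    # Row sums excluding the diagonal.
    S = [sum(F[i][j] for j in range(n) if j != i) for i in range(n)]
    SS = sum(S)
    scores = {}
    for i in range(n):
        for j in range(i + 1, n):
            if S[i] == 0 or S[j] == 0:
                continue  # no co-citations to normalize by
            norm = sqrt(S[i] * S[j])
            e_ij = S[i] * S[j] / (SS - S[i])
            e_ji = S[j] * S[i] / (SS - S[j])
            scores[(i, j)] = max(
                (F[i][j] - e_ij) / norm,
                (F[j][i] - e_ji) / norm,
            )
    return scores


# Toy 3-journal matrix: journals 0 and 1 are co-cited heavily.
F = [
    [0, 4, 1],
    [4, 0, 1],
    [1, 1, 0],
]
k = k50_like(F)
```

Unlike the Monte Carlo approach, this kind of statistic needs only the observed matrix, which is presumably why it scales easily to 226M references.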
REPLICATION
For the left tail, we used the 5th percentile rather than the 10th percentile to more closely match UMSJ distributions
Indicator distributions for the median and left tail percentile values are very similar to the UMSJ distributions
» Differences in the tail percentile curves have no effect on indicators since the fractions of articles at the zero point of all curves are the same
[Figure: summary-statistic distributions, labeled Low Convention / High Convention and High Novelty / Low Novelty]
REPLICATION
Probabilities of hit papers 2001-2005 (top-5% highly cited) as of 2011
Our results are similar to the UMSJ results
» Higher probability for N+C+ (0.0959 vs. 0.0911) coupled with a higher fraction within that bin (9.5% vs. 6.7%) suggests that our method does even a bit better at locating highly cited papers.
» High novelty is accentuated overall using our method (N+C- is 0.0659 rather than 0.0533)
Replication was successful, and reproduces the major features of the UMSJ study
                                        UMSJ (1990-2000)       This study (2001-2005)
                                        % sample    Prob       % sample    Prob
High Novelty, High Convention (N+C+)    6.7%        0.0911     9.5%        0.0959
High Novelty, Low Convention (N+C-)     26%         0.0533     30.6%       0.0659
Low Novelty, High Convention (N-C+)     44%         0.0582     40.5%       0.0433
Low Novelty, Low Convention (N-C-)      23%         0.0205     19.4%       0.0205
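As a consistency check on the table, the sample-weighted average of the bin probabilities should land near the 5% base rate implied by the top-5% hit definition. The sketch below uses only the values in the table; the helper names are hypothetical, and the shares are rounded in the source, so the results are approximate.

```python
# (share of sample, probability of a hit) per bin, copied from the table.
tables = {
    "UMSJ": {
        "N+C+": (0.067, 0.0911),
        "N+C-": (0.26, 0.0533),
        "N-C+": (0.44, 0.0582),
        "N-C-": (0.23, 0.0205),
    },
    "this study": {
        "N+C+": (0.095, 0.0959),
        "N+C-": (0.306, 0.0659),
        "N-C+": (0.405, 0.0433),
        "N-C-": (0.194, 0.0205),
    },
}


def overall_hit_rate(table):
    """Sample-weighted average of the per-bin hit probabilities."""
    return sum(share * prob for share, prob in table.values())


def share_of_hits(table, label):
    """Fraction of all hit papers that fall in the given bin."""
    share, prob = table[label]
    return share * prob / overall_hit_rate(table)


rates = {name: overall_hit_rate(t) for name, t in tables.items()}
npcp_hits = {name: share_of_hits(t, "N+C+") for name, t in tables.items()}
```

Both weighted averages come out close to 0.05. Under the same arithmetic, the N+C+ bin holds roughly 12% of the hit papers in the UMSJ column and roughly 18% in this study, which is one way to read the claim that the replication locates highly cited papers slightly better.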
FIELD EFFECTS?
2x2 matrix probabilities for the top-5% sampled by field were compared to the 2x2 matrix probabilities using the top-5% overall
The bins are in the same order using top-5% by field, but the differences between bins are smaller
» N+C+ (0.0834 vs. 0.0959)
» N-C- (0.0335 vs. 0.0205)
This suggests that “atypical combinations” are influenced by field effects
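The difference between the two sampling schemes can be sketched as follows. Both helpers and the field labels are hypothetical, and the data are toy values chosen so that one field is cited far more heavily than the other. Ties at the cutoff are broken by sort order, which is an assumption.

```python
from collections import defaultdict


def _top(papers, top_fraction):
    """IDs of the most-cited `top_fraction` of the given papers."""
    n_hits = max(1, round(len(papers) * top_fraction))
    ranked = sorted(papers, key=lambda p: p[2], reverse=True)
    return {paper_id for paper_id, _, _ in ranked[:n_hits]}


def hits_overall(papers, top_fraction=0.05):
    """Top 5% of the whole sample, regardless of field."""
    return _top(papers, top_fraction)


def hits_by_field(papers, top_fraction=0.05):
    """Top 5% within each field, then pooled."""
    by_field = defaultdict(list)
    for paper in papers:
        by_field[paper[1]].append(paper)
    hits = set()
    for field_papers in by_field.values():
        hits |= _top(field_papers, top_fraction)
    return hits


# (paper id, field, citations): one high-citation and one low-citation field.
papers = [(f"a{i}", "field_a", 100 - i) for i in range(20)]
papers += [(f"b{i}", "field_b", 20 - i) for i in range(20)]

overall = hits_overall(papers)
per_field = hits_by_field(papers)
```

In the toy data every overall hit comes from the heavily cited field, while sampling by field picks the top paper from each. Any indicator that correlates with field will therefore look stronger under the overall definition.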
FIELD EFFECTS?
Top 20 largest journals (by numbers of co-citations) are plotted in terms of convention and novelty
» These 20 journals account for 15.9% of all co-citations
Reminder note: Journals are plotted here based on how they are co-cited, not what is published in them!
[Figure: top 20 journals; x-axis: % co-citations below zero; y-axis: % co-citations above overall median]
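The two plot coordinates can be computed per journal roughly as sketched below. This is an interpretation of the axis labels, not a documented procedure: the percentages are assumed to be weighted by co-citation count, "below zero" is read as a pair score below 0, and the overall median is passed in as a parameter because the slides do not say how it was computed. `journal_coordinates` and the journal labels are hypothetical.

```python
from collections import defaultdict
from statistics import median


def journal_coordinates(pair_counts, pair_score, overall_median):
    """Per journal: (% co-citations below zero, % above overall median)."""
    total = defaultdict(int)
    below_zero = defaultdict(int)
    above_median = defaultdict(int)
    for pair, count in pair_counts.items():
        score = pair_score[pair]
        # set(pair) so a self-pair (J, J) is counted once for J.
        for journal in set(pair):
            total[journal] += count
            if score < 0:
                below_zero[journal] += count
            if score > overall_median:
                above_median[journal] += count
    return {
        j: (100 * below_zero[j] / total[j], 100 * above_median[j] / total[j])
        for j in total
    }


# Toy pairs among four placeholder journals.
pair_counts = {("J1", "J2"): 10, ("J1", "J3"): 30, ("J2", "J4"): 60}
pair_score = {("J1", "J2"): -5.0, ("J1", "J3"): 2.0, ("J2", "J4"): 50.0}
overall = median(pair_score.values())  # assumption: unweighted over pairs
coords = journal_coordinates(pair_counts, pair_score, overall)
```

A journal far to the right on this plot is mostly involved in atypical pairs, and one high on the plot is mostly involved in conventional pairs.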
FIELD EFFECTS?
Three groups appear
» PHYSICS (6 journals) – cited as conventional, but not novel
» BIOMED (9 journals) – cited as both conventional and novel
» MULTI (5 journals) – cited as novel and not conventional
Nature, Science, and PNAS account for 9.4% of ALL atypical co-citation pairs
» Multidisciplinary journals are obviously not good proxies for “areas of knowledge”
» They contribute the most to the notion of “atypical”, suggesting that journals are a poor basis for this study
[Figure: top 20 journals; x-axis: % co-citations below zero; y-axis: % co-citations above overall median]
SUMMARY
We have replicated the UMSJ study and primary finding that
» Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper
This is a real finding! There seems to be something to the notion of “atypical combinations” that is meaningful and could be predictive
However …
Field and journal effects are not insignificant, and given that these studies were based on journal co-citation, journals and fields may be driving “atypical combinations”
Journals are the wrong proxy for “areas of knowledge”; we need an alternative proxy for “areas of knowledge”
Other potential measurements of “atypical-ness” or “novelty” that are relatively independent of field or journal effects should be proposed and tested
QUESTIONS
Thank you for your attention!