
Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)


Replication of the Uzzi et al. (2013) Science study on atypical combinations, with additional work showing that journal and disciplinary effects are not insignificant.

Page 1: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

Better Maps ● Better Solutions

SCITECH STRATEGIES

Physics Chemistry Engineering Biology Disease Medicine Brain Health Social Computer Humanities Earth

Atypical combinations are confounded by disciplinary effects

STI 2014
Leiden, The Netherlands
Sept. 3-5, 2014

Kevin W. Boyack & Richard Klavans
SciTech Strategies, Inc.
www.mapofscience.com

Page 2: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

BACKGROUND

We have long been interested in indicators of innovative research

Uzzi et al. (UMSJ) recently published an article correlating high-impact papers (innovation) with “atypical combinations” (novelty) of reference journals

Intriguing results; we decided to investigate further – to replicate the study and then further explore this idea of novelty

Page 3: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UZZI STUDY

Hypothesis: “The highest-impact science is primarily grounded in exceptionally conventional combinations of prior work yet simultaneously features an intrusion of unusual combinations”

Data: Used 17.9M articles (1950-2000) from WOS, containing 302M references to 15,613 cited journals

Method:
» Journals are used as proxy for “areas of knowledge”
» Determine which co-cited journal combinations are “conventional” and which are “unusual” or “novel”
» Develop indicators of “convention” and “novelty” from co-citation statistics
» Calculate “convention” and “novelty” for each paper using indicators
» Test indicators to see how they correlate with highly cited papers

Finding: Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper

Page 4: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UMSJ METHOD (1)

To determine which co-cited journal combinations are “conventional” and which are “novel”, UMSJ calculated Z-scores for each co-cited journal pair, where Z is defined:

Z = (Nact – Nexp) / Nvar

Nact is the actual number of journal co-citation counts
Nexp is an expected number of journal co-citation counts
Nvar is the variance of Nexp

Nexp and Nvar were estimated from 10 randomized citation networks in which all citation links were switched using a Monte Carlo technique, keeping citing/cited distributions constant at the paper level

A negative Z-score indicates that a journal pair is co-cited less often than expected; such a pair is an “atypical combination” of journals
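
The Z-score calculation for one journal pair can be sketched as follows. This is a minimal illustration, not the UMSJ code: the function name `pair_z_score` and the toy counts are hypothetical. The slide labels the denominator Nvar; the sketch divides by the sample standard deviation of the randomized counts, which is the usual z-score convention and should be read as an assumption here.

```python
import statistics

def pair_z_score(n_act, randomized_counts):
    """Z-score of one journal pair against counts from randomized networks.

    n_act: observed co-citation count for the pair (Nact).
    randomized_counts: the pair's co-citation count in each randomized network.
    """
    n_exp = statistics.mean(randomized_counts)    # expected count (Nexp)
    spread = statistics.stdev(randomized_counts)  # assumed normalization term
    if spread == 0:
        raise ValueError("randomized counts have zero spread")
    return (n_act - n_exp) / spread

# Toy data: counts for one pair in 10 randomized networks (hypothetical values).
random_counts = [40, 42, 38, 41, 39, 40, 43, 37, 40, 40]
z_typical = pair_z_score(60, random_counts)   # co-cited more often than expected
z_atypical = pair_z_score(25, random_counts)  # co-cited less often than expected
print(z_typical > 0, z_atypical < 0)
```

With only 10 randomized networks the spread estimate is noisy, so Z-scores for rarely co-cited pairs should be interpreted with care.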

Page 5: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UMSJ METHOD (2)

Using the computed Z-scores for each co-cited journal pair, the set of Z-scores for the journal pairs co-cited by each paper can then be looked up

Two summary statistics were calculated for each paper from its Z-score distribution:
» Median Z-score – to characterize central tendency or “convention”
» 10th percentile (left tail) Z-score – to characterize “novelty”

Distributions of these summary statistics were analyzed
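
The two paper-level summary statistics can be sketched as below. The function name `paper_summary` and the example Z-scores are hypothetical, and the percentile interpolation method is an assumption, since the slides do not specify one.

```python
import statistics

def paper_summary(pair_z_scores):
    """Return (median, 10th percentile) of a paper's journal-pair Z-scores."""
    median_z = statistics.median(pair_z_scores)
    # quantiles(n=10) returns the 9 cut points between deciles; the first
    # one is the 10th percentile. The "inclusive" method is an assumption.
    tenth_pctl_z = statistics.quantiles(pair_z_scores, n=10, method="inclusive")[0]
    return median_z, tenth_pctl_z

# Hypothetical Z-scores for the journal pairs co-cited by one paper.
z_scores = [-12.0, -1.5, 3.0, 8.0, 15.0, 22.0, 40.0, 55.0, 90.0, 120.0, 300.0]
median_z, tenth_pctl_z = paper_summary(z_scores)
print(median_z, tenth_pctl_z)
```

In this toy paper most journal pairs are conventional (large positive median), while the left tail dips below zero, which is the pattern the novelty statistic is meant to pick up.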

Page 6: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UMSJ METHOD (3)

Distributions of these paper-level summary statistics were analyzed

Indicators based on these summary statistics were created
» Novelty
  HIGH – 10th Pctl Z-score < 0
  LOW – 10th Pctl Z-score > 0
» Conventionality
  HIGH – median Z-score > Avg
  LOW – median Z-score < Avg

Each paper classified in terms of convention and novelty

[Figure: 2x2 matrix with quadrant labels Low Convention, High Convention, High Novelty, Low Novelty]
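
The classification of a paper into one of the four bins can be sketched as follows. The function name `classify_paper` is hypothetical; the thresholds follow the slide (zero for novelty, "Avg" for conventionality). How "Avg" is computed over the data set, and how ties at a threshold are handled, are not stated in the slides, so both are assumptions here.

```python
def classify_paper(median_z, tail_z, avg_median_z):
    """Assign a paper to one bin of the 2x2 convention/novelty matrix.

    median_z: the paper's median Z-score (convention).
    tail_z: the paper's left-tail percentile Z-score (novelty).
    avg_median_z: the "Avg" threshold; its computation is left to the caller.
    """
    # Ties (tail_z == 0, median_z == avg_median_z) fall into the LOW class
    # here; that choice is an assumption.
    novelty = "N+" if tail_z < 0 else "N-"
    convention = "C+" if median_z > avg_median_z else "C-"
    return novelty + convention

print(classify_paper(median_z=22.0, tail_z=-1.5, avg_median_z=10.0))
print(classify_paper(median_z=5.0, tail_z=2.0, avg_median_z=10.0))
```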

Page 7: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UMSJ RESULTS

“Hit” papers defined as the top-5% highly cited papers

Using indicators:
» Probability of a HIGH NOVELTY, HIGH CONVENTION (N+C+) paper being a hit paper is 0.0911
» Probability of a LOW NOVELTY, LOW CONVENTION (N-C-) paper being a hit paper is 0.0205
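
The per-bin hit probability can be sketched as the share of each bin's papers that fall in the top 5% by citations. The function name `hit_probability_by_bin` and the toy data are hypothetical, and tie handling at the cutoff is simplified.

```python
from collections import Counter

def hit_probability_by_bin(papers, top_fraction=0.05):
    """papers: list of (bin_label, citation_count) tuples.

    Returns {bin_label: share of that bin's papers that are hits}, where a
    hit is a paper in the top `top_fraction` by citation count. Ties at the
    cutoff are broken by sort order, which is a simplification.
    """
    ranked = sorted(papers, key=lambda p: p[1], reverse=True)
    n_hits = max(1, round(len(ranked) * top_fraction))
    hits_per_bin = Counter(label for label, _ in ranked[:n_hits])
    papers_per_bin = Counter(label for label, _ in papers)
    return {b: hits_per_bin[b] / n for b, n in papers_per_bin.items()}

# Toy data: 40 papers, so the top 5% is 2 papers.
toy = [("N+C+", 500), ("N+C+", 12), ("N+C-", 300), ("N+C-", 8)]
toy += [("N-C+", c) for c in range(20)] + [("N-C-", c) for c in range(16)]
probs = hit_probability_by_bin(toy)
print(probs)
```

If papers were spread evenly, every bin would show a probability near 0.05, so values such as 0.0911 and 0.0205 are read relative to that baseline.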

Page 8: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

UMSJ ISSUES

“Analyses in the supplementary materials (fig. S6) show that these empirical regularities for the WOS taken as a whole are largely replicated on a field-by-field basis and across time”
» Across time – YES
» Across fields or disciplines – NOT REALLY! – UMSJ supplemental results show that the N+C+ bin has the highest probability (of the 4 bins) of containing a hit paper for only 64% of the 243 subject categories

The fact that the N+C+ bin is not ranked first in 36% of subject categories is troubling, suggesting potentially large field effects, or even individual journal effects

Top-5% highly cited not sampled by field

Journals may not be the right proxy for “areas of knowledge”

Page 9: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

REPLICATION

We used a different, but parallel, methodology to replicate the UMSJ distributions and results

Scopus data (2001-2010) – 12M articles, 226M references

Included conference papers along with articles

K50 statistics for co-cited journal pairs rather than Z-scores and Monte Carlo simulations
» K50 has the same conceptual formulation as the Z-score: (Nact – Nexp) / Normalization
» Expected values and normalization are based on row and column sums

UMSJ procedures for calculating distributions, etc. were all followed
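
A statistic of the form (Nact – Nexp) / Normalization, built only from row and column sums, can be sketched as below. The slides do not give the K50 formula, so the expectation (row sum × column sum / total) and the normalization (square root of row sum × column sum) are assumptions chosen for illustration, not the published K50 definition. The function name and toy matrix are hypothetical.

```python
import math

def relatedness_from_marginals(counts):
    """counts: square matrix (list of lists) of journal co-citation counts.

    Returns a matrix of (Nact - Nexp) / normalization, with both terms
    derived from row and column sums. The specific choices below are
    assumptions for illustration, not the published K50 definition.
    """
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    total = sum(row_sums)
    scores = []
    for i, row in enumerate(counts):
        out = []
        for j, n_act in enumerate(row):
            n_exp = row_sums[i] * col_sums[j] / total    # assumed expectation
            norm = math.sqrt(row_sums[i] * col_sums[j])  # assumed normalization
            out.append((n_act - n_exp) / norm if norm else 0.0)
        scores.append(out)
    return scores

# Toy symmetric co-citation matrix for three journals (diagonal left at 0).
toy_counts = [[0, 30, 2],
              [30, 0, 4],
              [2, 4, 0]]
scores = relatedness_from_marginals(toy_counts)
print(scores[0][1] > 0, scores[0][2] < 0)
```

The appeal of a closed-form statistic like this is that it needs no randomized networks, which matters at the scale of hundreds of millions of references.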

Page 10: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

REPLICATION

For the left tail, we used the 5th percentile rather than the 10th percentile to more closely match UMSJ distributions

Indicator distributions for the median and left tail percentile values are very similar to the UMSJ distributions
» Differences in the tail percentile curves have no effect on indicators since the fractions of articles at the zero point of all curves are the same

[Figure: 2x2 matrix with quadrant labels Low Convention, High Convention, High Novelty, Low Novelty]

Page 11: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

REPLICATION

Probabilities of hit papers 2001-2005 (top-5% highly cited) as of 2011

Our results are similar to the UMSJ results
» Higher probability for N+C+ (0.0959 vs. 0.0911) coupled with a higher fraction within that bin (9.5% vs. 6.7%) suggests that our method does even a bit better at locating highly cited papers.
» High novelty is accentuated overall using our method (N+C- is 0.0659 rather than 0.0533)

Replication was successful, and reproduces the major features of the UMSJ study

                                        UMSJ (1990-2000)     This study (2001-2005)
                                        % sample   Prob      % sample   Prob
High Novelty, High Convention (N+C+)    6.7%       0.0911    9.5%       0.0959
High Novelty, Low Convention (N+C-)     26%        0.0533    30.6%      0.0659
Low Novelty, High Convention (N-C+)     44%        0.0582    40.5%      0.0433
Low Novelty, Low Convention (N-C-)      23%        0.0205    19.4%      0.0205
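
As a quick consistency check on the table, the sample-weighted average of the four bin probabilities should land near the 5% hit rate, and the N+C+ probability can be expressed relative to that baseline. The sketch below uses only the numbers in the table; the helper names are hypothetical.

```python
# (share of sample, hit probability) per bin, copied from the table above.
umsj = {"N+C+": (0.067, 0.0911), "N+C-": (0.26, 0.0533),
        "N-C+": (0.44, 0.0582), "N-C-": (0.23, 0.0205)}
this_study = {"N+C+": (0.095, 0.0959), "N+C-": (0.306, 0.0659),
              "N-C+": (0.405, 0.0433), "N-C-": (0.194, 0.0205)}

def overall_hit_rate(bins):
    """Sample-weighted mean hit probability across the four bins."""
    return sum(share * prob for share, prob in bins.values())

def lift(bins, label, baseline=0.05):
    """Hit probability of one bin relative to the 5% baseline."""
    return bins[label][1] / baseline

for name, bins in (("UMSJ", umsj), ("This study", this_study)):
    print(name, round(overall_hit_rate(bins), 4), round(lift(bins, "N+C+"), 2))
```

Both weighted averages come out close to 0.05, and the N+C+ bin sits at roughly 1.8 to 1.9 times the baseline in both studies, which is the basis for the "twice as likely" summary.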

Page 12: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

FIELD EFFECTS?

2x2 matrix probabilities for the top-5% sampled by field were compared to the 2x2 matrix probabilities using the top-5% overall

The bins are in the same order using top-5% by field, but the differences between bins are smaller
» N+C+ (0.0834 vs. 0.0959)
» N-C- (0.0335 vs. 0.0205)

This suggests that “atypical combinations” are influenced by field effects
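
The difference between taking the top 5% overall and taking it within each field can be sketched as follows. The function name `top_hits`, the field labels, and the toy citation counts are hypothetical; the slides do not say how fields were assigned.

```python
from collections import defaultdict

def top_hits(papers, top_fraction=0.05, by_field=False):
    """papers: list of dicts with 'id', 'field', 'citations'.

    Returns the set of ids flagged as hits, taking the top fraction either
    over the whole set or separately within each field.
    """
    groups = defaultdict(list)
    for p in papers:
        groups[p["field"] if by_field else "ALL"].append(p)
    hits = set()
    for members in groups.values():
        ranked = sorted(members, key=lambda p: p["citations"], reverse=True)
        n_hits = max(1, round(len(ranked) * top_fraction))
        hits.update(p["id"] for p in ranked[:n_hits])
    return hits

# Toy data: a high-citation field and a low-citation field, 20 papers each.
papers = [{"id": f"bio{i}", "field": "bio", "citations": 100 + i} for i in range(20)]
papers += [{"id": f"math{i}", "field": "math", "citations": i} for i in range(20)]
print(sorted(top_hits(papers)), sorted(top_hits(papers, by_field=True)))
```

In the toy data the overall cutoff selects only papers from the high-citation field, while the by-field cutoff selects one from each, which is the kind of shift that narrows the gap between bins.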

Page 13: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

FIELD EFFECTS?

Top 20 largest journals (by numbers of co-citations) are plotted in terms of convention and novelty
» These 20 journals account for 15.9% of all co-citations

Reminder note: Journals are plotted here based on how they are co-cited, not what is published in them!

[Figure: top 20 journals plotted by % co-citations above overall median and % co-citations below zero]
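
The two axis quantities for one journal can be sketched as below. The function name `journal_coordinates` and the example scores are hypothetical. The sketch counts each co-cited journal pair once; whether the original plot weights pairs by their co-citation counts is not stated, so that is an assumption.

```python
def journal_coordinates(pair_scores, overall_median):
    """pair_scores: Z (or K50) scores of all co-cited pairs involving one journal.

    Returns (% of scores below zero, % of scores above the overall median),
    i.e. the two quantities named on the plot axes.
    """
    n = len(pair_scores)
    pct_below_zero = 100.0 * sum(s < 0 for s in pair_scores) / n
    pct_above_median = 100.0 * sum(s > overall_median for s in pair_scores) / n
    return pct_below_zero, pct_above_median

# Hypothetical scores for one journal and a hypothetical overall median of 5.0.
coords = journal_coordinates([-3.0, -1.0, 2.0, 6.0, 9.0, 12.0, 20.0, 30.0], 5.0)
print(coords)
```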

Page 14: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

FIELD EFFECTS?

Three groups appear
» PHYSICS (6 journals) – cited as conventional, but not novel
» BIOMED (9 journals) – cited as both conventional and novel
» MULTI (5 journals) – cited as novel and not conventional

Nature, Science, and PNAS account for 9.4% of ALL atypical co-citation pairs
» Multidisciplinary journals are obviously not good proxies for “areas of knowledge”
» They contribute the most to the notion of “atypical”, suggesting that journals are a poor basis for this study

[Figure: top 20 journals plotted by % co-citations above overall median and % co-citations below zero]

Page 15: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

SUMMARY

We have replicated the UMSJ study and primary finding that
» Papers with high convention AND high novelty are twice as likely to be highly cited as the average paper

This is a real finding! There seems to be something to the notion of “atypical combinations” that is meaningful and could be predictive

However …

Field and journal effects are not insignificant, and given that these studies were based on journal co-citation, journals and fields may be driving “atypical combinations”

Journals are the wrong proxy for “areas of knowledge”; we need an alternative proxy for “areas of knowledge”

Other potential measurements of “atypical-ness” or “novelty” that are relatively independent of field or journal effects should be proposed and tested

Page 16: Atypical combinations are confounded by disciplinary effects (Boyack & Klavans)

QUESTIONS

Thank you for your attention!