A systematic empirical comparison
of different approaches for
normalizing citation impact
indicators
Ludo Waltman and Nees Jan van Eck
Centre for Science and Technology Studies (CWTS), Leiden University
14th ISSI conference, Vienna, Austria
July 16, 2013
Introduction
• Citation-based indicators need to be normalized for
differences in citation practices between fields
• Traditional normalization based on WoS subject
categories is problematic because many subject
categories are heterogeneous in terms of citation
practices
Clinical Neurology: Citation density
Visualization produced using VOSviewer
(Van Eck et al., PLoS ONE, 2012)
Clinical Neurology: Reference density
Density of references instead of citations
Notice the similar patterns in the two visualizations!
Normalization approaches
• Normalization based on a classification system (‘cited-side normalization’)
• Source normalization (‘citing-side normalization’)
Normalization based on classification system
• Requires a field classification system in which
publications are assigned to fields
• Following common practice, we use the WoS journal
subject categories
• Citations are compared to the field average
NCS = c / e
where c = number of citations of the publication, and e = average number of citations of all publications in the same field
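As a minimal sketch of the NCS computation (a publication's citations divided by its field's average; the citation counts and field labels below are hypothetical, not WoS data):

```python
from statistics import mean

def ncs(citations, field, field_citation_counts):
    """NCS = c / e: a publication's citation count divided by the average
    number of citations of all publications in the same field."""
    return citations / mean(field_citation_counts[field])

# Hypothetical data: citation counts of every publication, grouped by field
field_citation_counts = {"A": [1, 3], "B": [2, 6]}

print(ncs(3, "A", field_citation_counts))  # 3 / 2 = 1.5
print(ncs(2, "B", field_citation_counts))  # 2 / 4 = 0.5
```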
Source normalization (1)
• No field classification system is needed
• Citations are weighted differently depending on the
number of references in the citing publication or the
citing journal
• Instead of giving equal weight to each citation, equal
weight is given to each citing publication
Source normalization (2)
• Three source normalization variants:
– Audience factor (Zitt & Small, JASIST, 2008):
SNCS(1) = Σ_{i=1}^{c} 1 / a_i
– Fractional citation counting (Leydesdorff and colleagues):
SNCS(2) = Σ_{i=1}^{c} 1 / r_i
– Revised SNIP (Waltman et al., JOI, 2013):
SNCS(3) = Σ_{i=1}^{c} 1 / (p_i r_i)
• Only ‘active references’ should be considered!
where c = number of citations of the publication, a_i = average number of references per publication in the citing journal, r_i = number of references in the citing publication, and p_i = proportion of publications in the citing journal with at least one reference
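A minimal sketch of the three weighting schemes, with one weight term per citation received (the reference counts and proportions below are hypothetical):

```python
def sncs1(avg_refs_citing_journals):
    """Audience factor: each citation i weighted by 1 / a_i, where a_i is the
    average number of active references per publication in the citing journal."""
    return sum(1 / a for a in avg_refs_citing_journals)

def sncs2(refs_citing_pubs):
    """Fractional citation counting: each citation i weighted by 1 / r_i, where
    r_i is the number of active references in the citing publication."""
    return sum(1 / r for r in refs_citing_pubs)

def sncs3(refs_citing_pubs, props_with_refs):
    """Revised SNIP: each citation i weighted by 1 / (p_i * r_i), where p_i is
    the proportion of publications in the citing journal with references."""
    return sum(1 / (p * r) for r, p in zip(refs_citing_pubs, props_with_refs))

# A publication cited twice; the citing publications have 10 and 25 references
print(sncs2([10, 25]))  # 0.1 + 0.04 = 0.14
```

Note how each citing publication contributes a total weight of one, spread over its references, instead of each citation counting as one.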
Evaluation
• To what degree does each normalization approach:
– correct for field differences?
– correct for differences in the age of publications?
• Using the same classification system both in the
implementation and in the evaluation of a normalization
approach gives biased results (Sirtes, JOI, 2012)
• For evaluation purposes, we use four classification
systems:
– WoS journal subject categories
– Algorithmically constructed classification systems A, B, and C
Algorithmically constructed classification systems
• 3.8 million WoS publications from the period 2007–2010
• Classification systems constructed using a large-scale clustering approach (Waltman & Van Eck, JASIST, 2012)
• Clusters defined at the level of individual publications rather than at the journal level
• Number of clusters (research areas) per classification system:
– Classification system A: 21
– Classification system B: 161
– Classification system C: 1334
Average score per normalization approach and per publication year

[Bar chart: average score (0–12) for CS, NCS, SNCS(1), SNCS(2), and SNCS(3), with separate bars for publication years 2007, 2008, 2009, and 2010]
Similarity of normalized citation
distributions
• Let’s now look beyond averages
• To what degree do fields have identical normalized
citation distributions?
Results based on WoS subject categories (1)
• Similarity of normalized citation distributions in different
fields
235 WoS subject categories; publication year 2007
Inequality index
• How to summarize the degree to which citation
distributions coincide?
• We use the methodological framework of Crespo et al.
(PLoS ONE, 2013)
• Citation distributions are compared percentile-by-percentile using the Theil inequality index
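A sketch of the idea, simplified from the Crespo et al. framework (nearest-rank percentiles, hypothetical field distributions): at each percentile, compute the Theil index of the normalized scores across fields, so identical distributions give an index of zero.

```python
import math

def theil(values):
    """Theil index T = (1/n) * sum((x/mu) * ln(x/mu)) over positive values;
    T = 0 when all values are identical."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)

def percentile_value(scores, q):
    """Nearest-rank q-th percentile of a list of scores."""
    s = sorted(scores)
    return s[min(len(s) - 1, int(q / 100 * len(s)))]

def inequality_at_percentile(field_scores, q):
    """Theil index of the q-th percentile of normalized scores across fields."""
    return theil([percentile_value(s, q) for s in field_scores.values()])

# Two fields with identical normalized distributions -> zero inequality
fields = {"A": [0.5, 1.0, 1.5], "B": [0.5, 1.0, 1.5]}
print(inequality_at_percentile(fields, 50))  # 0.0
```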
Results based on WoS subject categories (2)
Results based on classification system A
Results based on classification system B
Results based on classification system C
Citation density before normalization
Citation density after normalization using SNCS(3)
Conclusions
• Using the same classification system both in the
implementation and in the evaluation of a normalization
approach should be avoided
• NCS (subject-category-based normalization) and SNCS(2) (‘fractional citation counting’) do not perform well
• SNCS(1) (‘audience factor’) and SNCS(3) (‘revised SNIP’) perform well
• Need for more practical experience with SNCS(1),
SNCS(3), and alternative normalization approaches, in
particular percentile-rank normalization
Thank you for your attention!
Problem of using the same classification
system both in the implementation and in the
evaluation of the NCS approach
• Results based on the correct assignment of publications to fields:

Publication   Field   No. of citations   NCS
1             A       1                  0.50
2             A       3                  1.50
3             B       2                  0.50
4             B       6                  1.50

• Results based on an incorrect assignment of publications to fields:

Publication   Field   No. of citations   NCS
1             X       1                  0.67
2             Y       3                  0.67
3             X       2                  1.33
4             Y       6                  1.33
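The two toy tables can be reproduced with a short script; the field averages are computed from the assignment itself, which is why evaluating with the same (possibly wrong) classification used for normalization yields biased results:

```python
from statistics import mean

def ncs_scores(citations, fields):
    """NCS of each publication: its citation count divided by the mean
    citation count of all publications assigned to the same field."""
    avg = {f: mean(c for c, g in zip(citations, fields) if g == f)
           for f in set(fields)}
    return [round(c / avg[f], 2) for c, f in zip(citations, fields)]

citations = [1, 3, 2, 6]

# Correct assignment: publications 1-2 in field A, 3-4 in field B
print(ncs_scores(citations, ["A", "A", "B", "B"]))  # [0.5, 1.5, 0.5, 1.5]

# Incorrect assignment: fields X and Y each mix the two true fields
print(ncs_scores(citations, ["X", "Y", "X", "Y"]))  # [0.67, 0.67, 1.33, 1.33]
```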
Practical issues
• World average
• Document types
• Citation windows
• PPtop indicators
• ‘Trade journal problem’
A related problem (1)
A related problem (2)
298 articles in 2008
40 citations
A related problem (3)
Resp. 266 and 486 articles in 2008
Resp. 6% and 15% of all articles and reviews in WoS categories Business and Business, finance
Resp. 29 and 8 citations
National journals
Effect of excluding national journals on MNCS
See Waltman & Van Eck (2012)
National journals
Effect of excluding national journals on MSNCS(3)