
The Effect on Citation Inequality of Differences in Citation Practices at the Web of Science Subject Category Level

Juan A. Crespo
Departamento de Economía Cuantitativa, Universidad Autónoma de Madrid, Tomas y Valiente, 5, 28049 Cantoblanco, Madrid, Spain. E-mail: [email protected]

Neus Herranz
Department of Economics, University of Illinois at Urbana-Champaign, Urbana-Champaign, 214 David Kinley Hall, 1407 W. Gregory, Urbana, IL 61801. E-mail: [email protected]

Yunrong Li and Javier Ruiz-Castillo
Departamento de Economía, Universidad Carlos III, 125 Getafe, Madrid, Spain 28903. E-mail: {yli, jrc}@eco.uc3m.es

This article studies the impact of differences in citation practices at the subfield, or Web of Science subject category, level, using the model introduced in Crespo, Li, and Ruiz-Castillo (2013a), according to which the number of citations received by an article depends on its underlying scientific influence and the field to which it belongs. We use the same Thomson Reuters data set of about 4.4 million articles that Crespo et al. (2013a) used to analyze 22 broad fields. The main results are the following. First, when the classification system goes from 22 fields to 219 subfields, the effect on citation inequality of differences in citation practices increases from ∼14% at the field level to 18% at the subfield level. Second, we estimate a set of exchange rates (ERs) over a wide [660, 978] citation quantile interval to express the citation counts of articles in terms of the equivalent counts in the all-sciences case. In the fractional case, for example, we find that in 187 of 219 subfields the ERs are reliable in the sense that the coefficient of variation is smaller than or equal to 0.10. Third, in the fractional case, normalizing the raw data with the ERs (or subfield mean citations) as normalization factors reduces the importance of the differences in citation practices from 18% to 3.8% (3.4%) of overall citation inequality. Fourth, the results in the fractional case are essentially replicated when we adopt a multiplicative approach.

Introduction

From the beginning of scientometrics as a field of study, scholars have been aware of the field dependence of reference and citation counts in scientific articles (see, inter alia, Garfield, 1979; Murugesan & Moravcsik, 1978; Pinski & Narin, 1976). In Crespo, Li, and Ruiz-Castillo (2013a), three of us introduced a measurement framework in which, given a classification system (namely, a classification of science into scientific disciplines), it is possible to quantify the importance of differences in publication and citation practices. The framework is based on a simple model in which the number of citations received by an article is a function of two variables: the article's underlying scientific influence and the discipline to which it belongs. Consequently, the citation inequality of the distribution consisting of all articles in all disciplines (the all-sciences case) is the result of two forces: differences in scientific influence within homogeneous disciplines, and differences in citation practices across them. In the implementation of this model using an additively decomposable inequality index, the citation inequality attributed to the second force is captured by a between-group inequality term in a certain partition by discipline and citation quantile. We denote it as the IDCP (inequality attributable to differences in citation practices) term. For expository reasons, Crespo et al. (2013a) chose a very simple classification system consisting of the 22 broad categories distinguished by Thomson Reuters, which will be referred to as fields. This classification system has the important property that every publication in the periodical literature is assigned to only one field.

Received February 26, 2013; revised April 18, 2013; accepted April 18, 2013

© 2014 ASIS&T • Published online 22 February 2014 in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/asi.23006

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 65(6):1244–1256, 2014

It should be noted that one of the assumptions of the model requires citation impact to vary monotonically with scientific influence. Thus, if one article has greater scientific influence than another in the same field, we expect the former to also have a greater citation impact than the latter. As pointed out in Crespo et al. (2013a), given the heterogeneity of at least some of the 22 broad fields, this assumption is not realistic. Consider two publications i and j in the same field that belong to two research areas with different citation densities. Contrary to the assumption, publication i may well have greater influence but receive fewer citations than publication j.

Consequently, this article extends the analysis to the lowest aggregation level permitted by our data, namely, the 219 Web of Science categories, or subfields, distinguished by Thomson Reuters. The conjecture is that the lower the aggregation level of the classification system, the greater the relative effect on overall citation inequality of differences in citation practices should be. As is well known, a practical problem is that in the Thomson Reuters (and Scopus) databases, publications in the periodical literature are assigned to subfields via the journal in which they have been published. Many journals are assigned to a single subfield, but many others are assigned to two, three, or more subfields. As a result, only about 58% of all articles in our data set are assigned to a single subfield. To solve this problem, we follow two different approaches in this article: a fractional strategy, according to which each publication is split into as many equal pieces as necessary, with each piece assigned to a corresponding subfield, and a multiplicative strategy, in which each paper is wholly counted as many times as necessary in the several subfields to which it is assigned.

Since the inception of the field, practitioners of scientometrics have recognized that differences in citation practices, regardless of how their impact is measured and independently of the aggregation level, pose fundamental difficulties for direct comparisons of the absolute number of citations received by articles in different scientific disciplines. However, Crespo et al. (2013a) show that the striking similarity between citation distributions at the field level, documented in Albarrán and Ruiz-Castillo (2011), causes the citation inequality attributable to differences in citation practices to be approximately constant over a wide range of citation quantiles. This makes it possible to estimate a set of average-based indicators, called exchange rates (ERs hereafter), that serve to answer the following two questions. First, how many citations received by an article in a given field are equivalent to, say, 10 citations in the all-sciences case? Second, how much can we reduce the effect of differences in citation practices by normalizing the raw citation data with the ERs? Based on the similarity between citation distributions at the subfield level, recently documented in Albarrán, Crespo, Ortuño, and Ruiz-Castillo (2011), Radicchi, Fortunato, and Castellano (2008), and Radicchi and Castellano (2012) in the multiplicative case, and in Herranz and Ruiz-Castillo (2012) in the fractional case, this article extends this empirical strategy to the subfield level.

Naturally, the difficulty of comparing citation counts across scientific disciplines is a well-known issue. Differences in citation practices are usually taken into account by choosing the world mean citation rates as normalization factors (see, inter alia, Braun, Glänzel, & Schubert, 1985; Moed, De Bruin, & van Leeuwen, 1995; Moed, Burger, Frankfort, & van Raan, 1985; Moed & van Raan, 1988; Schubert & Braun, 1986, 1996; Schubert, Glänzel, & Braun, 1983, 1987, 1988; Vinkler, 1986, 2003). More recently, other contributions support this traditional procedure on different grounds (Radicchi & Castellano, 2012a; Radicchi et al., 2008). Crespo et al. (2013a) find that, for the 22-field classification system, this procedure leads to a slightly greater reduction of the IDCP term than the reduction generated by the ERs. Thus, this article also investigates the relative performance of ERs and mean citation rates as normalization factors for the classification system consisting of 219 subfields.

To place this article in context, it is useful to distinguish between two types of normalization procedures. First, there are target or “cited-side” procedures, including the use of ERs and mean citation rates as normalization factors, as well as the recent proposals by Glänzel (2011) and Radicchi and Castellano (2012a). Beyond the two cases studied here, a wide set of target normalization procedures at the subfield level is extensively analyzed in Li, Radicchi, Castellano, and Ruiz-Castillo (2013). Second, there are source or “citing-side” procedures (see, inter alia, Glänzel, Schubert, Thijs, & Debackere, 2011; Leydesdorff & Opthof, 2010; Moed, 2010; Waltman & Van Eck, 2012; Zitt & Small, 2008). Because our data set lacks citing-side information, applying the latter is beyond the scope of this article. In any case, given a classification system, the performance of the two types of procedures is compared in Radicchi and Castellano (2012b), Leydesdorff, Radicchi, Bornmann, Castellano, and de Nooye (2012), and Waltman and Van Eck (2013).

The rest of the article consists of four sections. The next section summarizes the model for the measurement of the effect on overall citation inequality of differences in citation practices and presents the corresponding empirical evidence for both the fractional and the multiplicative strategies at the subfield level. The Normalization Procedures: The Fractional Case section presents the estimation of average-based ERs and their standard deviations (SDs hereafter) over a large citation quantile interval in the fractional case and explores the consequences of using them versus subfield mean citations as normalization factors. The Normalization Procedures: The Multiplicative Case section studies the same issues under the multiplicative approach, and the final section contains some concluding comments.

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY—June 2014 1245DOI: 10.1002/asi

Measurement of the Effect on Citation Inequality of Differences in Citation Practices at the Subfield Level

The Fractional Case

Suppose we have an initial citation distribution $Q = \{c_l\}$ consisting of N distinct articles, indexed by $l = 1, \ldots, N$, where $c_l$ is the number of citations received by article l. The total number of citations is denoted by $\gamma = \sum_l c_l$. Assume that there are S subfields, indexed by $s = 1, \ldots, S$. As indicated in the Introduction, the problem is that about 42% of all articles in our data set are assigned to two or more subfields. For later reference, let $N_s$ be the number of distinct articles in subfield s under the multiplicative approach, indexed by $i = 1, \ldots, N_s$.

For any l, let $X_l$ be the nonempty set of subfields to which article l is assigned, and denote by $x_l$ the cardinality of this set, that is, $x_l = |X_l|$. Since an article is assigned to at most six subfields, we have $x_l \in \{1, \ldots, 6\}$. In the fractional strategy, subfield s's citation distribution can be described by $\mathbf{c}_s = \{w_{si}, c_{si}\}$, where $c_{si} = c_l$ for some article l in the initial distribution Q, $w_{si} = 1/x_l$ for all $s \in X_l$, and $i = 1, \ldots, N_s$. Therefore, $\sum_{s \in X_l} w_{si} = 1$. The fractional number of articles in subfield s is $n_s = \sum_i w_{si}$, the citations received by each fractional article are $w_{si} c_{si}$, and the fractional number of citations in subfield s is $\sum_i w_{si} c_{si}$. It should be noted that

$$\sum_s n_s = \sum_s \sum_i w_{si} = \sum_l \sum_{s \in X_l} w_{si} = N \quad \text{and} \quad \sum_s \sum_i w_{si} c_{si} = \gamma,$$

that is, in the fractional strategy the total number of articles and citations in the original data set, and hence the citation mean, are preserved.
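The bookkeeping of the fractional strategy can be checked on a toy example. The sketch below is a minimal illustration only: the article identifiers, citation counts, and subfield names are hypothetical, and `fractional_distributions` is an illustrative helper, not the authors' code. It splits each article into $x_l$ pieces of weight $1/x_l$ and confirms that the fractional article and citation counts add up to N and γ.

```python
from collections import defaultdict

# Hypothetical toy data: article id -> (citations c_l, set of subfields X_l).
articles = {
    "a1": (10, {"biology"}),
    "a2": (4, {"biology", "ecology"}),
    "a3": (0, {"ecology", "zoology", "biology"}),
    "a4": (7, {"zoology"}),
}


def fractional_distributions(articles):
    """Return {s: [(w_si, c_si), ...]}: each subfield in X_l gets a piece of weight 1/x_l."""
    dist = defaultdict(list)
    for citations, subfields in articles.values():
        weight = 1.0 / len(subfields)  # w_si = 1 / x_l
        for s in subfields:
            dist[s].append((weight, citations))
    return dict(dist)


dist = fractional_distributions(articles)
n_s = {s: sum(w for w, _ in pieces) for s, pieces in dist.items()}  # fractional n_s
N = len(articles)
gamma = sum(c for c, _ in articles.values())
frac_articles = sum(n_s.values())  # sum_s n_s, should equal N
frac_citations = sum(w * c for pieces in dist.values() for w, c in pieces)  # should equal gamma
print(N, gamma, round(frac_articles, 10), round(frac_citations, 10))  # 4 21 4.0 21.0
```

The rounding only hides floating-point noise from summing thirds; the identities themselves are exact.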

Any distinct article i in subfield s with $c_{si} = c_l$ for some l in the initial distribution Q is assumed to have a scientific influence $q_{si}$ that, for simplicity, is taken to be a single-dimensional variable. We assume that the citations received, $c_{si}$, are a function of two variables: the subfield s to which the article belongs, and the scientific influence of the article in question, $q_{si}$. Thus, for every s we write:

$$c_{si} = \phi(s, q_{si}), \quad i = 1, \ldots, N_s. \qquad (1)$$

Let $\mathbf{q}_s = \{w_{si}, q_{si}\}$ with $q_{s1} \le q_{s2} \le \ldots \le q_{sN_s}$ be the ordered distribution of scientific influence in every subfield in the fractional case. Each distribution $\mathbf{q}_s$ is assumed to be a characteristic of subfield s. No restriction is imposed a priori on the distributions $\mathbf{q}_s$, $s = 1, \ldots, S$. Consequently, for any two articles i and j in two different subfields s and t, the values $w_{si} q_{si}$ and $w_{tj} q_{tj}$ cannot be directly compared. To overcome this difficulty, we adopt the following key assumption.

Assumption 1 (A1). Articles at the same quantile π of any subfield scientific influence distribution have the same degree of scientific influence in their respective subfield.

Typically, scientific influence is an unobservable variable. However, although the form of $\phi$ in Equation 1 is unknown, we adopt the following assumption about it:

Assumption 2 (A2). The function $\phi$ in expression (1) is assumed to be monotonic in scientific influence; that is, for every pair of articles i and j in subfield s, if $q_{si} \le q_{sj}$, then $c_{si} \le c_{sj}$.
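A2 is what allows quantiles of the unobservable influence distribution to be read off the observable citation distribution. The sketch below illustrates this under explicit assumptions: the lognormal influence values and the citation function `phi` (a floored power of q) are hypothetical choices, not estimates from the data; any weakly increasing φ would behave the same way.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical latent scientific influence q_si for 1,000 articles in one subfield.
q = rng.lognormal(mean=0.0, sigma=1.0, size=1_000)


def phi(q, scale=3.0):
    """Hypothetical citation function for a single subfield: weakly increasing in q (A2)."""
    return np.floor(scale * q**1.5)


c = phi(q)
c_in_influence_order = c[np.argsort(q)]  # citations listed by increasing influence
Pi = 10
# Under A2 this sequence is non-decreasing, so it coincides with the sorted citation
# distribution: cutting the influence ranking into Pi blocks cuts the citations the same way.
monotone = bool(np.all(np.diff(c_in_influence_order) >= 0))
same_quantiles = all(
    np.array_equal(a, b)
    for a, b in zip(np.split(c_in_influence_order, Pi), np.split(np.sort(c), Pi))
)
print(monotone, same_quantiles)  # True True
```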

Under A2, the degree of scientific influence uniquely determines the location of an article in its subfield citation distribution. Consequently, for every s, the partition of distribution $\mathbf{q}_s$ into Π quantiles $\mathbf{q}_s^\pi$ of size $n_s/\Pi$ induces a corresponding partition of the citation distribution $\mathbf{c}_s$ into Π quantiles $\mathbf{c}_s^\pi$, formed by the citations received by the $n_s/\Pi$ articles in the π-th quantile $\mathbf{q}_s^\pi$. Note that $\mathbf{c}_s^\pi = \{w_{sk}^\pi, c_{sk}^\pi\}$, with $c_{sk}^\pi = c_{si} = c_l$ and $w_{sk}^\pi = 1/x_l$ for some $k = 1, \ldots, N_s$ and some l in Q. Assume for a moment that we disregard the citation inequality within every vector $\mathbf{c}_s^\pi$ by assigning to every article in that vector the (fractional) mean citation of the vector itself, $\mu_s^\pi$, defined by

$$\mu_s^\pi = \frac{\sum_{i \in \pi} w_{si} c_{si}}{\sum_{i \in \pi} w_{si}}.$$

Since the quantiles of citation impact correspond (as we have already seen) to quantiles of the underlying scientific influence distribution, holding constant the degree of scientific influence at any π as in A1 is equivalent to holding constant the degree of citation impact at that quantile. Thus, for any π, the difference between $\mu_s^\pi$ and $\mu_t^\pi$ for articles with the same degree of scientific influence is entirely attributable to differences in citation practices between the two subfields. For any s and π, let $\boldsymbol{\mu}_s^\pi = \{w_{sk}^\pi, \mu_s^\pi\}$ be the $(n_s/\Pi)$-vector where every $c_{sk}^\pi$ in $\mathbf{c}_s^\pi = \{w_{sk}^\pi, c_{sk}^\pi\}$ has been replaced by the mean citation $\mu_s^\pi$. As before, the citation inequality of the distribution $(\boldsymbol{\mu}_1^\pi, \ldots, \boldsymbol{\mu}_s^\pi, \ldots, \boldsymbol{\mu}_S^\pi)$ is entirely due to differences in citation practices between the S subfields.

To implement our measurement framework, it is convenient to work with additively decomposable citation inequality indices. For reasons explained in Crespo et al. (2013a), we choose a member of the so-called Generalized Entropy family of inequality indices, which are the only measures of relative inequality that satisfy the usual properties required of any inequality index and, in addition, are decomposable by population subgroup. This is the first Theil index, denoted by $I_1$ and defined by:

$$I_1(Q) = \frac{1}{N} \sum_l \frac{c_l}{\mu} \log\left(\frac{c_l}{\mu}\right), \qquad (2)$$

where μ is the mean of distribution Q. Let $\mathbf{c}$ be the union of all subfield distributions $\mathbf{c}_s$, that is, let $\mathbf{c} = \cup_s \mathbf{c}_s$. As we have seen, the number of articles and the mean citation of distributions Q and $\mathbf{c}$ coincide. Clearly, citation inequality is also the same, that is, $I_1(\mathbf{c}) = I_1(Q)$. Therefore, in the sequel we will work with distribution $\mathbf{c}$.
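Equation 2 and the fractional weights can be combined in a few lines. The following is a sketch assuming that, in the fractional case, each piece enters the index with its weight $w_{si}$ (our reading of how $I_1(\mathbf{c})$ is evaluated, not code from the authors); `theil_i1` is a hypothetical helper name. The usage lines check the claim $I_1(\mathbf{c}) = I_1(Q)$ on a two-article example.

```python
import numpy as np


def theil_i1(c, w=None):
    """First Theil index of citations c with optional article weights w.

    With w=None this is Equation 2; with w_si = 1/x_l it is an assumed weighted
    version for the fractional case. Uses the convention 0 * log(0) = 0.
    """
    c = np.asarray(c, dtype=float)
    w = np.ones_like(c) if w is None else np.asarray(w, dtype=float)
    mu = np.sum(w * c) / np.sum(w)  # (weighted) citation mean
    if mu == 0:
        return 0.0
    r = c / mu
    safe = np.where(r > 0, r, 1.0)  # log(1) = 0, so uncited articles contribute 0
    return float(np.sum(w * r * np.log(safe)) / np.sum(w))


# An article with 2 citations split into two half-weight pieces (x_l = 2):
# the weighted index of the split distribution equals the index of Q.
original = theil_i1([5, 2])
fractional = theil_i1([5, 2, 2], w=[1.0, 0.5, 0.5])
print(abs(original - fractional) < 1e-12)  # True
```

Splitting never changes a piece's citation count, only its weight, so the weighted mean and every $(c/\mu)\log(c/\mu)$ term are unchanged.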

For each π, let $\mathbf{c}^\pi = (\mathbf{c}_1^\pi, \ldots, \mathbf{c}_s^\pi, \ldots, \mathbf{c}_S^\pi)$. Note that the vector $\mathbf{c}^\pi$ has dimension $\sum_s (n_s/\Pi) = N/\Pi$, and that the set $\mathbf{c}^\pi$, π = 1, …, Π, forms a partition of distribution $\mathbf{c}$. For any π, let $\boldsymbol{\mu}^\pi$ be the $(N/\Pi)$-vector where every element in $\mathbf{c}^\pi$ has been replaced by the mean citation $\mu^\pi = \sum_s (n_s/N)\,\mu_s^\pi$. As in Crespo et al. (2013a), applying the decomposability property of citation inequality index $I_1$ first to the partition $\mathbf{c} = (\mathbf{c}^1, \ldots, \mathbf{c}^\pi, \ldots, \mathbf{c}^\Pi)$, and then to the partition $\mathbf{c}^\pi = (\mathbf{c}_1^\pi, \ldots, \mathbf{c}_s^\pi, \ldots, \mathbf{c}_S^\pi)$ for each π, the overall citation inequality $I_1(\mathbf{c})$ can be seen to be decomposable into the following three terms:

$$I_1(\mathbf{c}) = W + S + IDCP, \qquad (3)$$


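with:

$$W = \sum_s \sum_\pi v_{\pi,s}\, I_1(\mathbf{c}_s^\pi),$$

$$S = I_1(\boldsymbol{\mu}^1, \ldots, \boldsymbol{\mu}^\Pi),$$

$$IDCP = \sum_\pi v_\pi\, I_1(\boldsymbol{\mu}_1^\pi, \ldots, \boldsymbol{\mu}_S^\pi),$$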

where $v_{\pi,s}$ is the share of total citations in quantile π of subfield s, and $v_\pi = \sum_s v_{\pi,s}$ is the share of total citations in vector $\mathbf{c}^\pi$. The term W is a within-group term that captures the weighted citation inequality within each quantile in every subfield. For large Π, W is expected to be small. The term S is the citation inequality of the distribution $(\boldsymbol{\mu}^1, \ldots, \boldsymbol{\mu}^\Pi)$, and therefore it is a measure of citation inequality at different degrees of citation impact in the all-sciences case. Due to the skewness of science, S is expected to be large. Finally, for any π, the expression $I_1(\boldsymbol{\mu}_1^\pi, \ldots, \boldsymbol{\mu}_S^\pi)$, abbreviated I(π), is the citation inequality attributable to differences in citation practices according to $I_1$. Thus, the weighted average that constitutes the third term in expression (3), denoted by IDCP, provides a good measure of the citation inequality due to such differences at the subfield level. The question of interest, of course, is how large the IDCP term is in relation to overall citation inequality $I_1(\mathbf{c})$.
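The three-term decomposition in expression (3) is an exact identity, which can be checked numerically. The sketch below assumes the simplest setting: all article weights equal to 1 (as in the multiplicative case, or a data set without multi-assigned articles) and subfield sizes divisible by Π. The data are synthetic lognormal draws with hypothetical scale factors, and `decompose` is an illustrative name.

```python
import numpy as np


def theil_i1(c):
    """First Theil index (Equation 2) with the convention 0 * log(0) = 0."""
    c = np.asarray(c, dtype=float)
    mu = c.mean()
    if mu == 0:
        return 0.0
    r = c / mu
    return float(np.mean(r * np.log(np.where(r > 0, r, 1.0))))


def decompose(subfields, Pi):
    """Return (W, S, IDCP, I1(c)) for equally weighted articles.

    Assumes no article is split (all weights 1) and every subfield size is
    divisible by Pi, so each quantile cell c_s^pi holds N_s / Pi articles.
    """
    cells = [np.sort(np.asarray(c, dtype=float)).reshape(Pi, -1) for c in subfields]
    all_c = np.concatenate([b.ravel() for b in cells])
    total = all_c.sum()
    W = IDCP = 0.0
    mu_vectors = []  # the vectors mu^pi, pi = 1..Pi
    for p in range(Pi):
        row = [b[p] for b in cells]  # (c_1^pi, ..., c_S^pi)
        c_pi = np.concatenate(row)  # c^pi
        v_pi = c_pi.sum() / total  # share of total citations in quantile pi
        W += sum(cell.sum() / total * theil_i1(cell) for cell in row)  # v_{pi,s} I1(c_s^pi)
        mu_s_pi = np.concatenate([np.full(cell.size, cell.mean()) for cell in row])
        IDCP += v_pi * theil_i1(mu_s_pi)  # v_pi * I(pi)
        mu_vectors.append(np.full(c_pi.size, c_pi.mean()))
    S = theil_i1(np.concatenate(mu_vectors))
    return W, S, IDCP, theil_i1(all_c)


# Hypothetical data: three subfields whose citations differ mainly by a scale factor.
rng = np.random.default_rng(42)
Pi = 100
subfields = [np.floor(k * rng.lognormal(0.0, 1.1, size=Pi * 20)) for k in (1.0, 2.5, 6.0)]
W, S, IDCP, I1 = decompose(subfields, Pi)
print(f"W={W:.4f}  S={S:.4f}  IDCP={IDCP:.4f}  W+S+IDCP={W + S + IDCP:.4f}  I1={I1:.4f}")
```

The identity holds for any Π; a larger Π moves inequality from W into S and IDCP, which is why W is expected to be small at Π = 1,000.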

Multiplicative Approach

In the multiplicative approach, each article is wholly counted as many times as necessary in the several subfields to which it is assigned. In this way, the space of articles is expanded as much as necessary beyond the initial size in what we call the subfield extended count, say distribution $\mathbf{C}$. In this approach, subfield s's citation distribution can be described by $\mathbf{C}_s = \{c_{si}\}$ with $i = 1, \ldots, N_s$, where $c_{si}$ is the number of citations of article i in subfield s, and $c_{si} = c_l$ for some article l in the initial distribution Q. Of course, $\mathbf{C} = \cup_s \mathbf{C}_s$, and the total number of articles in the subfield extended count is $M = \sum_s N_s > N$.
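For contrast with the fractional strategy, the multiplicative bookkeeping can be sketched on similar hypothetical toy data (names and counts are illustrative only; `extended_count` is a hypothetical helper). Each article is copied whole into every subfield it is assigned to, so M exceeds N whenever some article has $x_l > 1$.

```python
# Hypothetical toy data: (citations c_l, subfields X_l) for N = 4 articles.
articles = [
    (10, ["biology"]),
    (4, ["biology", "ecology"]),
    (0, ["ecology", "zoology", "biology"]),
    (7, ["zoology"]),
]


def extended_count(articles):
    """Multiplicative strategy: copy each article, whole, into every subfield in X_l."""
    C = {}
    for citations, subfields in articles:
        for s in subfields:
            C.setdefault(s, []).append(citations)
    return C


C = extended_count(articles)
N = len(articles)
M = sum(len(C_s) for C_s in C.values())  # M = sum_s N_s
extended_citations = sum(sum(C_s) for C_s in C.values())
print(N, M, extended_citations)  # 4 7 25
```

Unlike the fractional split, the extended count does not preserve the total number of citations (25 here against 21 in Q), so its mean and overall inequality can differ from those of the original distribution.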

In what follows, let us order subfield citation distributions, so that for any s we have $\mathbf{C}_s = (c_{s1}, \ldots, c_{si}, \ldots, c_{sN_s})$ with $c_{s1} \le c_{s2} \le \ldots \le c_{sN_s}$. Consider the partition of distribution $\mathbf{C}_s$ into Π quantiles, $\mathbf{C}_s = (\mathbf{C}_s^1, \ldots, \mathbf{C}_s^\pi, \ldots, \mathbf{C}_s^\Pi)$, where each vector $\mathbf{C}_s^\pi = \{c_{sj}^\pi\}$ with $j = 1, \ldots, N_s/\Pi$. For each π, define the citation distribution $\mathbf{C}^\pi = (\mathbf{C}_1^\pi, \ldots, \mathbf{C}_s^\pi, \ldots, \mathbf{C}_S^\pi)$. Clearly, the number of articles in $\mathbf{C}^\pi$ is $\sum_s N_s/\Pi = M/\Pi$, and the set of vectors $(\mathbf{C}^1, \ldots, \mathbf{C}^\pi, \ldots, \mathbf{C}^\Pi)$ forms a partition of distribution $\mathbf{C}$. For any s and π, let $\mathbf{m}_s^\pi$ be the $(N_s/\Pi)$-vector where every $c_{sj}^\pi$ in $\mathbf{C}_s^\pi = \{c_{sj}^\pi\}$ has been replaced by the mean citation $m_s^\pi = (\sum_j c_{sj}^\pi)/(N_s/\Pi)$. Similarly, for any π, let $\mathbf{m}^\pi$ be the $(M/\Pi)$-vector where every element in $\mathbf{C}^\pi$ has been replaced by the mean citation $m^\pi = \sum_s (N_s/M)\, m_s^\pi$. Applying the decomposability property of citation inequality index $I_1$ first to the partition $\mathbf{C} = (\mathbf{C}^1, \ldots, \mathbf{C}^\pi, \ldots, \mathbf{C}^\Pi)$, and then to the partition $\mathbf{C}^\pi = (\mathbf{C}_1^\pi, \ldots, \mathbf{C}_s^\pi, \ldots, \mathbf{C}_S^\pi)$ for each π, the overall citation inequality $I_1(\mathbf{C})$ can be seen to be decomposable into the following three terms, analogous to those in expression (3):

$$I_1(\mathbf{C}) = W' + S' + IDCP', \qquad (4)$$

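with:

$$W' = \sum_s \sum_\pi V_{\pi,s}\, I_1(\mathbf{C}_s^\pi),$$

$$S' = I_1(\mathbf{m}^1, \ldots, \mathbf{m}^\Pi),$$

$$IDCP' = \sum_\pi V_\pi\, I_1(\mathbf{m}_1^\pi, \ldots, \mathbf{m}_S^\pi),$$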

where $V_{\pi,s}$ is the share of total citations in quantile π of subfield s, and $V_\pi = \sum_s V_{\pi,s}$ is the share of total citations in vector $\mathbf{C}^\pi$. As before, the term W′ is a within-group citation inequality term, S′ captures the skewness of science, and IDCP′ is the citation inequality that can be attributed to differences in citation practices in the multiplicative case.

Empirical Results

In this article, only research articles, or simply articles, are studied. Our data set consists of 4.4 million articles published in 1998–2003 and the 35 million citations they receive within a common 5-year citation window for every year.1 The extended count is 7,027,037, or 57.4% larger than the total number of articles in the fractional approach. Table A in the Appendix of the Working Paper version of this article (Crespo et al., 2013b) presents the number of articles and mean citation rates in the fractional case. For convenience, subfields are classified in terms of 19 fields and four large groups (Life Sciences, Physical Sciences, Other Natural Sciences, and Social Sciences), which represent, respectively, 40.1%, 30.2%, 25.8%, and 3.9% of all articles (the same information for the multiplicative case is available on request).

Table 1, which includes the decompositions of $I_1(\mathbf{c})$ and $I_1(\mathbf{C})$ presented in expressions (3) and (4) for Π = 1,000, deserves the following three comments.2 First, as in Crespo et al. (2013a), the terms W and W′ are small, whereas the terms S and S′ are large. Second, the effect on overall citation inequality of differences in citation practices is larger when working with 219 subfields than with 22 broad fields. In particular, the IDCP term, which represents about 14% of overall citation inequality in Crespo et al. (2013a), increases by 4 percentage points, up to 17.95%, in the fractional case. Third, interestingly enough, the IDCP′ term in the multiplicative case represents 18.1% of overall citation inequality, a figure remarkably close to the corresponding one in the fractional case.
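The percentages in Table 1 follow directly from its printed components. The short arithmetic check below uses only the numbers in Table 1 (rounded to four decimals there), so it can only confirm internal consistency, not the underlying estimates.

```python
# Components of Table 1 as printed (rounded to four decimals).
table_1 = {
    "fractional": {"W": 0.0030, "S": 0.7062, "IDCP": 0.1552, "overall": 0.8644},
    "multiplicative": {"W": 0.0030, "S": 0.6950, "IDCP": 0.1544, "overall": 0.8524},
}

results = {}
for case, t in table_1.items():
    components_sum = t["W"] + t["S"] + t["IDCP"]  # should reproduce the overall value
    shares = {k: round(100 * t[k] / t["overall"], 2) for k in ("W", "S", "IDCP")}
    results[case] = (round(components_sum, 4), shares)
    print(case, results[case])
```

The components add up exactly to the overall values in both panels. Recomputed from the rounded entries, the S′ share in the multiplicative case comes out as 81.53% rather than the printed 81.54%, a gap consistent with rounding of the underlying values.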

1 It should be noted that, due to some missing variables, this data set has only 4,465,348 articles, or 6,984 articles fewer than the data set in Crespo et al. (2013a). Because of this slight change, overall citation inequality is 0.8644 rather than 0.8755 as in Crespo et al. (2013a).

2 As in Crespo et al. (2013a), in the definition of the inequality index $I_1$ in expressions (3) and (4), we have followed the convention 0 log(0) = 0 for articles without citations.


Normalization Procedures: The Fractional Case

This section analyzes two empirical problems in the fractional case: (a) how to compare the citations received by two articles in any pair of the 219 subfields in our data set by using ERs that are approximately constant over a large quantile interval, and (b) how much the IDCP term is reduced when these ERs, or the subfield mean citations, are used as normalization factors. The robustness of these results under the multiplicative approach is studied in the following main section.

Comparison of Citation Counts Across Different Fields

For any s, what we call the exchange rates at quantile π, $e_s(\pi)$, defined by:

$$e_s(\pi) = \mu_s^\pi / \mu^\pi,$$

allow us to answer the following question: how many citations for an article at degree π of scientific influence in subfield s are equivalent, on average, to one citation in the all-sciences case? In the metaphor according to which a subfield's citation distribution is like an income distribution in a given currency, the exchange rates $e_s(\pi)$ permit expressing all citations in the same reference currency for that π.
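Computing $e_s(\pi)$ requires only the quantile means. The sketch below is a minimal illustration assuming equally weighted articles (the multiplicative setting, or no multi-assigned articles) and subfield sizes divisible by Π; `exchange_rates` and the synthetic data are hypothetical. When subfields differ only by a scale factor, the rates are identical at every quantile, which is the situation the average-based ERs are designed to exploit.

```python
import numpy as np


def exchange_rates(subfields, Pi):
    """e_s(pi) = mu_s^pi / mu^pi for equally weighted articles; returns an (S, Pi) array.

    Assumes each subfield size is divisible by Pi and that mu^pi > 0 for every pi
    (a quantile in which all articles are uncited would give 0 / 0).
    """
    cells = [np.sort(np.asarray(c, dtype=float)).reshape(Pi, -1) for c in subfields]
    mu_s = np.array([b.mean(axis=1) for b in cells])  # mu_s^pi, shape (S, Pi)
    sizes = np.array([b.shape[1] for b in cells], dtype=float)  # articles per cell, N_s / Pi
    mu_pi = sizes @ mu_s / sizes.sum()  # mu^pi: size-weighted mean over subfields
    return mu_s / mu_pi


# Hypothetical case: three subfields that differ only by a scale factor k_s.
rng = np.random.default_rng(1)
base = rng.lognormal(0.0, 1.0, size=5_000)
scales = np.array([1.0, 2.0, 4.0])
e = exchange_rates([k * base for k in scales], Pi=100)
print(np.round(e[:, [9, 49, 89]], 4))  # each row is constant: k_s / mean(k)
```

In real data the rates drift with π, which is why attention is restricted to an interval of intermediate quantiles.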

Naturally, if for many subfields $e_s(\pi)$ were to vary drastically with π, we might not be able to claim that differences in citation practices have a common element that can be precisely estimated. However, it has been established that the shapes of subfield citation distributions are highly skewed and, more importantly for our purposes, very similar indeed.3 As we will presently see, the similarity between subfield citation distributions implies that exchange rates are sufficiently constant over a wide range of quantiles.

3 In particular, in the fractional case, on average over the 219 subfields, 68.3% of all articles (with an SD of 3.4) receive citations below the mean and account for 21.5% (4.2) of all citations, while articles with a remarkable or outstanding number of citations represent 10.2% (1.6) of the total and account for 44.7% (3.9) of all citations (see Herranz & Ruiz-Castillo, 2012).

TABLE 1. Citation inequality decomposition at the subfield level.

A. Fractional case
Within-group term, W (1) | Skewness of science term, S (2) | IDCP term (3) | Overall inequality (4) | (1)/(4), % | (2)/(4), % | (3)/(4), %
0.0030 | 0.7062 | 0.1552 | 0.8644 | 0.35 | 81.70 | 17.95

B. Multiplicative case
W′ (1) | S′ (2) | IDCP′ (3) | Overall inequality (4) | (1)/(4), % | (2)/(4), % | (3)/(4), %
0.0030 | 0.6950 | 0.1544 | 0.8524 | 0.35 | 81.54 | 18.11

FIG. 1. Citation inequality due to differences in citation practices, I(π), as a function of π.

Figure 1 represents how the effect of differences in citation practices, measured by I(π), changes with π when Π = 1,000 (since I(π) is very high for π < 260, these quantiles are omitted from Figure 1 for clarity). I(π) is particularly high until π ≈ 600, as well as for a few quantiles at the very upper tail of citation distributions. However, as in Crespo et al. (2013a), I(π) is rather similar over a wide range of intermediate values, indicating that, over that interval, subfield citation distributions behave as if they essentially differ by a scale factor. In this situation, for each s it is reasonable to define an average-based ER over some interval $[\pi_m, \pi_M]$ in that range as:

$$ER_s = \left[\frac{1}{\pi_M - \pi_m}\right] \sum_{\pi = \pi_m}^{\pi_M} e_s(\pi), \qquad (5)$$

where, for each π,

$$e_s(\pi) = \mu_s^\pi / \mu^\pi.$$
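Equation 5 and the accompanying SD and CV can be sketched as follows. Two assumptions are made explicit: the code takes the arithmetic mean over the $\pi_M - \pi_m + 1$ quantiles in the window (Equation 5 as printed divides by $\pi_M - \pi_m$; for [661, 978] the two differ by a factor of 318/317, about 0.3%), and the SD is the population SD. The $e_s(\pi)$ profile is hypothetical, and `average_er` is an illustrative name.

```python
import numpy as np


def average_er(e_s, pi_m, pi_M):
    """Average-based ER_s over quantiles pi_m..pi_M (1-based, inclusive), with SD and CV.

    e_s[pi - 1] holds e_s(pi). The mean divides by the number of quantiles in the
    window (pi_M - pi_m + 1); the SD is the population SD (ddof=0), an assumption.
    """
    window = np.asarray(e_s, dtype=float)[pi_m - 1 : pi_M]
    er = window.mean()
    sd = window.std()
    return er, sd, sd / er


# Hypothetical e_s(pi) profile for Pi = 1,000: flat at 1.03 with a small wobble
# inside the [661, 978] window (indices 660..977 hold pi = 661..978).
e_s = np.full(1_000, 1.03)
e_s[660:978] += 0.02 * np.sin(np.arange(318))
er, sd, cv = average_er(e_s, 661, 978)
print(f"10 x ER = {10 * er:.2f}, 10 x SD = {10 * sd:.2f}, CV = {cv:.3f}")
```

Printing 10 × ER mirrors the convention, adopted for Table 2, of reporting ERs multiplied by 10.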

We find that the choice $[\pi_m, \pi_M] = [661, 978]$, where I(π) for most π lies between $I(\pi_m) = 0.1356$ and $I(\pi_M) = 0.1392$, is a good one. The ERs, as well as their SDs and coefficients of variation (CVs), are in columns 1 to 3 in Table 2. For convenience, ERs are multiplied by 10. Thus, for example, the first row indicates that 10.3 citations (with an SD of 0.3) for an article in biology between, approximately, the 66th and the 98th percentiles of its citation distribution are equivalent to 10 citations for an article in that interval in the all-sciences case. We find it useful to divide subfields into four groups according to the CV. Group I (colored in dark green in Table 2), consisting of 69 subfields, has a CV smaller than or equal to 0.05. This means that the SD of the ER is less than or equal to 5% of the ER itself. Hence, we consider ERs in this group highly reliable. Group II (pale green), consisting of 118 subfields, has a CV between 0.05 and 0.10. We consider ERs in this group fairly reliable. Group III (orange), consisting of 23 subfields, has a CV between 0.10 and 0.15. This group includes some important subfields, such as physics, particles and fields; information and library science; and political science (subfields 97, 210, and 189), as well as seven out of eight subfields within the broad field of computer science (the exception is mathematical and computational biology), which is known to behave as an outlier (Crespo et al., 2013a; Herranz & Ruiz-Castillo, 2012). Some would regard ERs in this group as minimally reliable, whereas others would find them quite unreliable. Finally, Group IV (red), consisting of nine subfields, has a CV greater than 0.15. This group includes multidisciplinary sciences and physics, multidisciplinary, hybrid subfields some of which also behave badly in Radicchi and Castellano (2012a). ERs in this group can be considered unreliable.
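The grouping by CV and the reading of the ×10 ERs can be written down directly. A minimal sketch, assuming the boundary conventions stated in the docstring; `reliability_group` and `to_all_sciences` are hypothetical helpers, and the CV values in the usage line are made up. The abstract's criterion (CV ≤ 0.10 counts as reliable) places 0.10 itself in Group II; where exactly 0.15 falls is an assumption.

```python
def reliability_group(cv):
    """Map a coefficient of variation to the groups described in the text.

    I: CV <= 0.05, II: 0.05 < CV <= 0.10, III: 0.10 < CV <= 0.15, IV: CV > 0.15.
    Assigning exactly 0.15 to Group III is an assumption; the text only says "between".
    """
    if cv <= 0.05:
        return "I"
    if cv <= 0.10:
        return "II"
    if cv <= 0.15:
        return "III"
    return "IV"


def to_all_sciences(citations, er_times_10):
    """Express a subfield citation count in all-sciences units, given 10 x ER_s as in Table 2."""
    return citations / (er_times_10 / 10)


groups = [reliability_group(cv) for cv in (0.03, 0.05, 0.08, 0.12, 0.20)]
print(groups)  # ['I', 'I', 'II', 'III', 'IV']
# Reading the biology example: with 10 x ER = 10.3, 10.3 citations map to 10 all-sciences citations.
print(round(to_all_sciences(10.3, 10.3), 6))  # 10.0
```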

As observed in column 4 in Table 2, on average the [661, 978] interval includes 62.2% of all citations (with an SD of 3.0). Although this is a relatively large percentage, expanding the interval in either direction would cover a larger percentage of citations. It turns out that, when we do this, the ERs do not change much. However, they exhibit greater variability (see the details in Crespo et al., 2013b). Therefore, we retain the interval [661, 978] in the sequel.

Normalization Results

First, we assess the normalization procedure based on ERs, whereby the citations received by any article i in subfield s, $c_{si}$, are converted into normalized citations $c^{*}_{si}$ as follows: $c^{*}_{si} = c_{si}/e_s$, where $e_s$ denotes the ER of subfield s (Table 2 reports $10\,e_s$). The numerical results before and after this normalization are in Panels A and B in Table 3. As in Crespo et al. (2013a), the terms W and S remain essentially constant after normalization by the ERs. In absolute terms, the IDCP term is reduced from 0.1552 to 0.0293, a reduction of 81.1%. Of course, total citation inequality after normalization is also reduced. On balance, the IDCP term after normalization represents only 3.85% of total citation inequality, an important reduction from the 17.95% obtained with the raw data.
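The percentages quoted above follow from the Table 3 entries by simple arithmetic, which the short Python check below reproduces. It relies only on the additive decomposition total = W + S + IDCP reflected in Table 3; because the table entries are rounded to four decimals, the recomputed share after normalization comes out at 3.84% rather than the reported 3.85%.

```python
# Panels A and B of Table 3 (all 1,000 quantiles).
raw = {"W": 0.0030, "S": 0.7062, "IDCP": 0.1552, "total": 0.8644}
er_normalized = {"W": 0.0032, "S": 0.7301, "IDCP": 0.0293, "total": 0.7627}


def idcp_share(panel):
    """IDCP term as a fraction of total citation inequality."""
    return panel["IDCP"] / panel["total"]


def additivity_gap(panel):
    """W + S + IDCP should equal the total, up to table rounding."""
    return abs(panel["W"] + panel["S"] + panel["IDCP"] - panel["total"])


reduction = 1 - er_normalized["IDCP"] / raw["IDCP"]
print(f"absolute reduction of IDCP: {100 * reduction:.1f}%")
print(f"IDCP share, raw data:       {100 * idcp_share(raw):.2f}%")
print(f"IDCP share, ER-normalized:  {100 * idcp_share(er_normalized):.2f}%")
```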

However, it should be recognized that in the last 22 quantiles and, above all, in the [1, 660] interval, normalization results quickly deteriorate. Figure 2, which focuses on the product $v_{\pi} I(\pi)$ as a function of π, illustrates the situation. Recall that the IDCP term introduced in expression (3) is equal to the integral of this product (for clarity, quantiles π < 600 and π > 994 are omitted from Figure 2). Relative to the blue curve (raw data), the red curve illustrates the correction achieved by normalization with the 219 ERs: the size of the IDCP term is very much reduced, particularly in the [661, 978] interval.
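Because the IDCP term aggregates $v_{\pi} I(\pi)$ over all quantiles, it can be split across the three quantile intervals reported in Table 3. The sketch below uses those rounded interval subtotals (the per-quantile values are not available here) to show that normalization removes most of the IDCP contribution inside [661, 978] but much less of it in the two tails. Recomputed shares may differ from Table 3 in the last digit because of rounding; the helper name `reduction` is hypothetical.

```python
# IDCP contributions by quantile interval (column 3 of Table 3).
total_inequality = {"raw": 0.8644, "er": 0.7627}
contributions = {
    "raw": {"[1, 660]": 0.0463, "[661, 978]": 0.0750, "[979, 1000]": 0.0338},
    "er": {"[1, 660]": 0.0162, "[661, 978]": 0.0027, "[979, 1000]": 0.0104},
}


def reduction(interval):
    """Fraction of the raw IDCP contribution removed by ER normalization."""
    return 1 - contributions["er"][interval] / contributions["raw"][interval]


for interval in contributions["raw"]:
    share_raw = 100 * contributions["raw"][interval] / total_inequality["raw"]
    share_er = 100 * contributions["er"][interval] / total_inequality["er"]
    print(f"{interval:12s} share before {share_raw:5.2f}%  "
          f"after {share_er:5.2f}%  reduction {100 * reduction(interval):5.1f}%")

# Summing the interval contributions recovers the overall IDCP term (up to rounding).
print(round(sum(contributions["raw"].values()), 4),
      round(sum(contributions["er"].values()), 4))
```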

Finally, as in Crespo et al. (2013a), it is interesting to examine the consequences of the traditional procedure in which subfield mean citations are taken as normalization factors. The ERs based on mean citations, $e_s(\mu_s) = \mu_s/\mu$, where μ is mean citations in the all-sciences case (see column 5 in Table 2), are close to our own ERs (for an illustration, see Figure 3 in Crespo et al., 2013b). In fact, they lie within 1 SD of the ERs for 50 out of 69 subfields in Group I, 102 out of 118 in Group II, 22 out of 23 in Group III, and all nine subfields in Group IV. When subfield mean citations are used as normalization factors, the IDCP term represents only 3.45% of total citation inequality (see Panel C in Table 3). The two solutions are so close that we refrain from illustrating the latter in Figure 2, because it would be indistinguishable from the red curve obtained after normalization by our ERs.⁴
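The 1-SD comparison can be expressed as a simple check. The sketch below applies it to four rows of Table 2, all on the table's ×10 scale. Because the table values are rounded, a check on them can disagree with the unrounded comparison behind the counts reported above, so this is an illustration only; the function names are hypothetical.

```python
def mean_based_er(subfield_mean: float, all_sciences_mean: float) -> float:
    """e_s(mu_s) = mu_s / mu."""
    return subfield_mean / all_sciences_mean


def within_one_sd(er: float, sd: float, er_mean: float) -> bool:
    """True if the mean-based ER lies within one SD of the quantile-based ER."""
    return abs(er_mean - er) <= sd


# (subfield, ER x 10, SD, mean-based ER x 10) from columns 1, 2, and 5 of Table 2.
rows = [
    ("BIOLOGY", 10.3, 0.3, 9.8),
    ("CELL BIOLOGY", 26.9, 0.9, 27.3),
    ("MEDICINE, GENERAL & INTERNAL", 12.0, 4.9, 16.7),
    ("MATHEMATICAL & COMPUTATIONAL BIOLOGY", 9.8, 0.4, 11.4),
]

for name, er, sd, er_mean in rows:
    print(f"{name:38s} within 1 SD: {within_one_sd(er, sd, er_mean)}")
```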

The similarity between the results of the two normalization procedures stems from the fact that, as we saw in Figure 1, subfield citation distributions appear to differ by a set of scale factors only in the [661, 978] interval. These scale factors are well captured by any average-based measure of what takes place in that interval, such as our ERs. Moreover, as indicated in footnote 3, subfield mean citations in the fractional approach, μs, are reached, on average, at the 68.3rd percentile (with an SD of 3.4), that is, within the [661, 978] interval. This is why the ERs based on mean citations also work so well.
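The argument can be illustrated with synthetic data. The sketch assumes, following our reading of Crespo et al. (2013a), that a subfield's ER over [πm, πM] is the average across quantiles of the ratio between the subfield's and the all-sciences mean citations in each quantile, with the SD taken over the same ratios. If a subfield distribution is an exact rescaling of the all-sciences one, that average recovers the scale factor with zero SD, and so does the ratio of overall means. The quantile means below are invented for illustration and are not data from the paper.

```python
from statistics import mean, pstdev


def exchange_rate(mu_sub, mu_all, lo=661, hi=978):
    """Average ratio of subfield to all-sciences quantile means over [lo, hi].

    mu_sub, mu_all: dicts mapping quantile (1..1000) to mean citations.
    Returns (ER, SD, CV). This is an assumed definition, for illustration.
    """
    ratios = [mu_sub[q] / mu_all[q] for q in range(lo, hi + 1)]
    er = mean(ratios)
    sd = pstdev(ratios)
    return er, sd, sd / er


# Synthetic all-sciences quantile means, increasing with the quantile.
mu_all = {q: 0.02 * q for q in range(1, 1001)}
# A hypothetical subfield whose distribution is the all-sciences one scaled by 2.5.
scale = 2.5
mu_sub = {q: scale * m for q, m in mu_all.items()}

er, sd, cv = exchange_rate(mu_sub, mu_all)
# With equal-size quantiles, the overall mean is the mean of the quantile means.
mean_based = mean(list(mu_sub.values())) / mean(list(mu_all.values()))
print(round(er, 6), round(sd, 6), round(mean_based, 6))
```

In real data the ratios vary across quantiles, which is exactly what the SD and CV in Table 2 measure.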

Normalization Procedures: The Multiplicative Case

The information about the evolution of I(π) as a function of π (available on request), as well as the aim of facilitating the comparison with the fractional case, justifies the same

⁴ This confirms the good results obtained in Crespo et al. (2013a), Radicchi and Castellano (2012a), and Li et al. (2013) when using subfield mean citations as normalization factors.

JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY—June 2014 1249DOI: 10.1002/asi

TABLE 2. Exchange rates, standard deviations, and coefficients of variation for the [661, 978] interval in the fractional approach. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

Subfield | (1) Exchange rates | (2) Standard deviation | (3) Coefficient of variation | (4) % of citations | (5) Exch. rates based on mean citations

A. LIFE SCIENCES
I. BIOSCIENCES
1 BIOLOGY | 10.3 | 0.3 | 0.032 | 64.1 | 9.8
2 BIOLOGY, MISCELLANEOUS | 5.0 | 0.3 | 0.063 | 65.4 | 4.6
3 EVOLUTIONARY BIOLOGY | 16.1 | 1.8 | 0.109 | 56.3 | 16.4
4 BIOCHEMICAL RESEARCH METHODS | 11.5 | 0.7 | 0.060 | 52.9 | 12.8
5 BIOCHEMISTRY & MOLECULAR BIOLOGY | 20.6 | 0.5 | 0.023 | 58.2 | 21.2
6 BIOPHYSICS | 14.0 | 0.7 | 0.053 | 58.7 | 14.1
7 CELL BIOLOGY | 26.9 | 0.9 | 0.032 | 60.3 | 27.3
8 GENETICS & HEREDITY | 19.4 | 0.4 | 0.022 | 57.7 | 20.5
9 DEVELOPMENTAL BIOLOGY | 23.4 | 0.4 | 0.016 | 59.0 | 24.0

II. BIOMEDICAL RESEARCH
10 PATHOLOGY | 11.8 | 0.3 | 0.023 | 62.3 | 11.5
11 ANATOMY & MORPHOLOGY | 7.7 | 0.5 | 0.066 | 60.9 | 7.4
12 ENGINEERING, BIOMEDICAL | 9.5 | 0.5 | 0.053 | 61.3 | 9.1
13 BIOTECHNOLOGY & APPLIED MICROBIOLOGY | 11.5 | 0.3 | 0.024 | 58.0 | 11.9
14 MEDICAL LABORATORY TECHNOLOGY | 8.1 | 0.3 | 0.031 | 62.0 | 7.9
15 MICROSCOPY | 8.6 | 0.7 | 0.077 | 60.8 | 8.3
16 PHARMACOLOGY & PHARMACY | 10.6 | 0.5 | 0.046 | 60.0 | 10.5
17 TOXICOLOGY | 9.7 | 0.7 | 0.071 | 58.9 | 9.6
18 PHYSIOLOGY | 14.0 | 1.4 | 0.102 | 59.4 | 13.5
19 MEDICINE, RESEARCH & EXPERIMENTAL | 15.4 | 2.6 | 0.171 | 61.2 | 16.5

III. CLINICAL MEDICINE I (INTERNAL)
20 CARDIAC & CARDIOVASCULAR SYSTEMS | 14.9 | 1.0 | 0.070 | 61.6 | 15.1
21 RESPIRATORY SYSTEM | 13.7 | 0.7 | 0.051 | 60.6 | 13.4
22 ENDOCRINOLOGY & METABOLISM | 16.9 | 1.1 | 0.066 | 58.3 | 16.9
23 ANESTHESIOLOGY | 9.2 | 0.3 | 0.037 | 62.8 | 8.8
24 CRITICAL CARE MEDICINE | 14.8 | 0.5 | 0.036 | 61.9 | 14.2
25 EMERGENCY MEDICINE | 5.8 | 0.3 | 0.050 | 62.8 | 5.5
26 GASTROENTEROLOGY & HEPATOLOGY | 13.5 | 0.3 | 0.022 | 60.1 | 13.6
27 MEDICINE, GENERAL & INTERNAL | 12.0 | 4.9 | 0.405 | 52.1 | 16.7
28 TROPICAL MEDICINE | 7.2 | 0.5 | 0.074 | 62.1 | 6.8
29 HEMATOLOGY | 22.2 | 0.3 | 0.014 | 60.2 | 22.3
30 ONCOLOGY | 18.0 | 0.6 | 0.031 | 58.6 | 18.3
31 ALLERGY | 12.2 | 0.5 | 0.038 | 63.1 | 11.5
32 IMMUNOLOGY | 17.8 | 0.3 | 0.017 | 59.0 | 18.3
33 INFECTIOUS DISEASES | 15.4 | 1.0 | 0.068 | 59.6 | 15.1

IV. CLINICAL MEDICINE II (NON-INTERNAL)
34 GERIATRICS & GERONTOLOGY | 11.2 | 0.6 | 0.051 | 60.9 | 10.9
35 OBSTETRICS & GYNECOLOGY | 9.2 | 0.4 | 0.044 | 62.3 | 8.8
36 ANDROLOGY | 7.3 | 0.5 | 0.068 | 60.3 | 7.1
37 REPRODUCTIVE BIOLOGY | 12.5 | 1.1 | 0.089 | 59.0 | 12.3
38 GERONTOLOGY | 10.2 | 0.5 | 0.049 | 62.7 | 9.6
39 DENTISTRY & ORAL SURGERY | 7.2 | 0.6 | 0.077 | 60.6 | 6.9
40 DERMATOLOGY | 8.2 | 0.3 | 0.038 | 62.1 | 7.9
41 UROLOGY & NEPHROLOGY | 12.3 | 0.3 | 0.025 | 61.6 | 12.0
42 OTORHINOLARYNGOLOGY | 6.0 | 0.4 | 0.069 | 62.5 | 5.6
43 OPHTHALMOLOGY | 9.5 | 0.3 | 0.034 | 61.7 | 9.2
44 INTEGRATIVE & COMPLEMENTARY MEDICINE | 6.3 | 0.6 | 0.097 | 61.4 | 5.9
45 CLINICAL NEUROLOGY | 12.4 | 0.3 | 0.023 | 61.3 | 12.1
46 PSYCHIATRY | 13.1 | 0.3 | 0.019 | 62.0 | 12.7
47 RADIOLOGY, NUCLEAR MED. & MED. IMAGING | 10.1 | 0.3 | 0.026 | 61.5 | 9.9
48 ORTHOPEDICS | 7.9 | 0.3 | 0.043 | 61.6 | 7.6
49 RHEUMATOLOGY | 14.6 | 0.6 | 0.041 | 59.7 | 14.5
50 SPORT SCIENCES | 8.1 | 0.5 | 0.064 | 62.2 | 7.7
51 SURGERY | 8.5 | 0.2 | 0.028 | 61.9 | 8.3
52 TRANSPLANTATION | 9.5 | 0.2 | 0.026 | 61.9 | 9.2
53 PERIPHERAL VASCULAR DISEASE | 20.2 | 0.3 | 0.013 | 59.8 | 20.4
54 PEDIATRICS | 7.7 | 0.3 | 0.035 | 62.1 | 7.5


V. CLINICAL MEDICINE III
55 HEALTH CARE SCIENCES & SERVICES | 7.9 | 0.5 | 0.061 | 60.3 | 7.7
56 HEALTH POLICY & SERVICES | 8.4 | 0.4 | 0.042 | 59.3 | 8.5
57 MEDICINE, LEGAL | 5.8 | 0.4 | 0.072 | 60.5 | 5.6
58 NURSING | 4.3 | 0.4 | 0.090 | 61.9 | 4.1
59 PUBLIC, ENV. & OCCUPATIONAL HEALTH | 9.7 | 0.3 | 0.034 | 60.8 | 9.5
60 REHABILITATION | 5.9 | 0.4 | 0.065 | 62.2 | 5.6
61 SUBSTANCE ABUSE | 9.8 | 0.9 | 0.096 | 59.2 | 9.6
62 EDUCATION, SCIENTIFIC DISCIPLINES | 4.0 | 0.3 | 0.068 | 64.9 | 3.7
63 MEDICAL INFORMATICS | 5.7 | 0.3 | 0.045 | 62.9 | 5.5

VI. NEUROSCIENCES & BEHAVIORAL
64 NEUROIMAGING | 14.6 | 0.4 | 0.025 | 63.1 | 14.0
65 NEUROSCIENCES | 16.9 | 0.5 | 0.031 | 59.6 | 16.9
66 BEHAVIORAL SCIENCES | 11.5 | 1.4 | 0.119 | 56.0 | 11.7
67 PSYCHOLOGY, BIOLOGICAL | 9.9 | 0.9 | 0.086 | 56.9 | 10.1
68 PSYCHOLOGY | 10.3 | 0.7 | 0.068 | 60.6 | 9.9
69 PSYCHOLOGY, APPLIED | 6.4 | 0.4 | 0.070 | 62.4 | 6.0
70 PSYCHOLOGY, CLINICAL | 9.9 | 0.4 | 0.042 | 60.6 | 9.7
71 PSYCHOLOGY, DEVELOPMENTAL | 10.6 | 0.5 | 0.051 | 60.8 | 10.2
72 PSYCHOLOGY, EDUCATIONAL | 6.8 | 0.3 | 0.040 | 64.2 | 6.5
73 PSYCHOLOGY, EXPERIMENTAL | 10.2 | 0.5 | 0.046 | 61.2 | 9.9
74 PSYCHOLOGY, MATHEMATICAL | 6.9 | 0.3 | 0.038 | 61.3 | 6.8
75 PSYCHOLOGY, MULTIDISCIPLINARY | 6.2 | 0.5 | 0.087 | 63.3 | 6.2
76 PSYCHOLOGY, PSYCHOANALYSIS | 3.7 | 0.4 | 0.106 | 67.8 | 3.4
77 PSYCHOLOGY, SOCIAL | 8.3 | 0.3 | 0.032 | 61.5 | 8.2
78 SOCIAL SCIENCES, BIOMEDICAL | 7.2 | 0.3 | 0.047 | 61.2 | 7.0

B. PHYSICAL SCIENCES
VII. CHEMISTRY
79 CHEMISTRY, MULTIDISCIPLINARY | 11.9 | 1.2 | 0.103 | 65.4 | 11.5
80 CHEMISTRY, INORGANIC & NUCLEAR | 9.2 | 0.7 | 0.074 | 61.4 | 8.8
81 CHEMISTRY, ANALYTICAL | 9.9 | 0.4 | 0.044 | 60.5 | 9.7
82 CHEMISTRY, APPLIED | 7.6 | 0.5 | 0.070 | 62.3 | 7.2
83 ENGINEERING, CHEMICAL | 6.0 | 0.3 | 0.044 | 63.7 | 5.7
84 CHEMISTRY, MEDICINAL | 9.8 | 0.8 | 0.083 | 59.4 | 9.6
85 CHEMISTRY, ORGANIC | 10.7 | 1.0 | 0.096 | 59.3 | 10.4
86 CHEMISTRY, PHYSICAL | 10.5 | 0.5 | 0.047 | 60.5 | 10.3
87 ELECTROCHEMISTRY | 10.2 | 0.8 | 0.076 | 60.4 | 9.9
88 POLYMER SCIENCE | 8.2 | 0.3 | 0.031 | 61.4 | 8.1

VIII. PHYSICS
89 PHYSICS, MULTIDISCIPLINARY | 10.0 | 1.7 | 0.169 | 61.8 | 10.5
90 SPECTROSCOPY | 7.6 | 0.4 | 0.050 | 62.1 | 7.3
91 ACOUSTICS | 5.5 | 0.3 | 0.055 | 63.3 | 5.2
92 OPTICS | 7.3 | 0.3 | 0.036 | 62.7 | 7.0
93 PHYSICS, APPLIED | 7.5 | 0.4 | 0.048 | 60.7 | 7.6
94 PHYSICS, ATOMIC, MOLECULAR & CHEMICAL | 11.0 | 0.8 | 0.074 | 59.8 | 10.7
95 THERMODYNAMICS | 4.8 | 0.4 | 0.080 | 61.6 | 4.6
96 PHYSICS, MATHEMATICAL | 7.3 | 0.3 | 0.035 | 61.7 | 7.2
97 PHYSICS, NUCLEAR | 6.2 | 0.4 | 0.065 | 62.0 | 6.2
98 PHYSICS, PARTICLES & FIELDS | 10.8 | 1.1 | 0.102 | 59.8 | 11.4
99 PHYSICS, CONDENSED MATTER | 7.4 | 0.3 | 0.045 | 61.4 | 7.4
100 PHYSICS OF SOLIDS, FLUIDS & PLASMAS | 9.3 | 0.6 | 0.063 | 59.8 | 9.1
101 CRYSTALLOGRAPHY | 5.1 | 0.3 | 0.053 | 58.8 | 5.2

IX. SPACE SCIENCES
102 ASTRONOMY & ASTROPHYSICS | 14.8 | 0.3 | 0.018 | 60.6 | 14.8

X. MATHEMATICS
103 MATHEMATICS, APPLIED | 3.9 | 0.2 | 0.062 | 65.7 | 3.6
104 STATISTICS & PROBABILITY | 5.2 | 0.5 | 0.098 | 52.5 | 6.2
105 MATH., INTERDISCIPLINARY APPLICATIONS | 5.6 | 0.3 | 0.045 | 60.8 | 5.6


106 SOCIAL SCIENCES, MATHEMATICAL METHODS | 5.5 | 0.3 | 0.045 | 61.4 | 5.5
107 PURE MATHEMATICS | 2.8 | 0.2 | 0.087 | 66.4 | 2.6

XI. COMPUTER SCIENCE
108 COMP. SCIENCE, ARTIFICIAL INTELLIGENCE | 5.4 | 0.6 | 0.118 | 63.3 | 5.4
109 COMPUTER SCIENCE, CYBERNETICS | 3.6 | 0.4 | 0.108 | 66.7 | 3.4
110 COMP. SCIENCE, HARDWARE & ARCHITECTURE | 4.0 | 0.5 | 0.124 | 61.4 | 4.1
111 COMPUTER SCIENCE, INFORMATION SYSTEMS | 4.4 | 0.6 | 0.143 | 62.4 | 4.5
112 COMP. SC., INTERDISCIPLINARY APPLICATIONS | 5.5 | 0.6 | 0.102 | 58.1 | 6.0
113 COMP. SCIENCE, SOFTWARE ENGINEERING | 3.6 | 0.4 | 0.107 | 65.5 | 3.4
114 COMPUTER SCIENCE, THEORY & METHODS | 3.1 | 0.4 | 0.115 | 65.5 | 3.0
115 MATHEMATICAL & COMPUTATIONAL BIOLOGY | 9.8 | 0.4 | 0.044 | 52.9 | 11.4

C. OTHER NATURAL SCIENCES
XII. ENGINEERING
116 ENGINEERING, ELECTRICAL & ELECTRONIC | 4.7 | 0.4 | 0.077 | 63.1 | 4.6
117 TELECOMMUNICATIONS | 3.8 | 0.5 | 0.144 | 62.2 | 3.9
118 CONSTRUCTION & BUILDING TECHNOLOGY | 3.5 | 0.3 | 0.090 | 65.4 | 3.1
119 ENGINEERING, CIVIL | 3.4 | 0.3 | 0.086 | 67.0 | 3.1
120 ENGINEERING, ENVIRONMENTAL | 9.1 | 0.3 | 0.035 | 62.4 | 8.7
121 ENGINEERING, MARINE | 1.6 | 0.3 | 0.212 | 71.5 | 1.4
122 TRANSPORTATION SCIENCE & TECHNOLOGY | 2.1 | 0.5 | 0.227 | 69.9 | 2.0
123 ENGINEERING, INDUSTRIAL | 3.3 | 0.3 | 0.091 | 66.6 | 2.9
124 ENGINEERING, MANUFACTURING | 3.6 | 0.3 | 0.089 | 64.8 | 3.2
125 ENGINEERING, MECHANICAL | 3.9 | 0.2 | 0.060 | 63.7 | 3.7
126 MECHANICS | 5.2 | 0.3 | 0.050 | 63.8 | 4.9
127 ROBOTICS | 3.8 | 0.2 | 0.065 | 65.0 | 3.6
128 INSTRUMENTS & INSTRUMENTATION | 5.1 | 0.3 | 0.051 | 65.0 | 4.7
129 IMAGING SCIENCE & PHOTOGR. TECHNOLOGY | 7.4 | 0.4 | 0.061 | 64.6 | 7.0
130 ENERGY & FUELS | 5.0 | 0.3 | 0.064 | 64.9 | 4.7
131 NUCLEAR SCIENCE & TECHNOLOGY | 4.4 | 0.3 | 0.061 | 64.0 | 4.1
132 ENGINEERING, PETROLEUM | 1.7 | 0.4 | 0.255 | 73.5 | 1.5
133 AUTOMATION & CONTROL SYSTEMS | 4.1 | 0.2 | 0.059 | 63.8 | 3.9
134 ENGINEERING, MULTIDISCIPLINARY | 3.9 | 0.4 | 0.089 | 66.0 | 3.7
135 ERGONOMICS | 4.8 | 0.4 | 0.088 | 63.0 | 4.4
136 OPERATIONS RES. & MANAGEMENT SCIENCE | 4.1 | 0.2 | 0.060 | 63.6 | 3.8

XIII. MATERIALS SCIENCE
137 MATERIALS SCIENCE, MULTIDISCIPLINARY | 6.4 | 0.4 | 0.056 | 60.7 | 6.4
138 MATERIALS SCIENCE, BIOMATERIALS | 13.0 | 1.1 | 0.085 | 59.3 | 12.7
139 MATERIALS SCIENCE, CERAMICS | 4.7 | 0.3 | 0.074 | 68.3 | 4.2
140 MAT. SC., CHARACTERIZATION & TESTING | 2.2 | 0.4 | 0.167 | 70.6 | 2.0
141 MATERIALS SCIENCE, COATINGS & FILMS | 7.5 | 0.4 | 0.057 | 61.0 | 7.3
142 MATERIALS SCIENCE, COMPOSITES | 3.4 | 0.3 | 0.087 | 65.9 | 3.1
143 MATERIALS SCIENCE, PAPER & WOOD | 2.9 | 0.3 | 0.092 | 68.1 | 2.6
144 MATERIALS SCIENCE, TEXTILES | 2.9 | 0.3 | 0.095 | 65.5 | 2.7
145 METALL. & METALLURGICAL ENGINEERING | 4.7 | 0.4 | 0.089 | 63.5 | 4.7
146 NANOSCIENCE & NANOTECHNOLOGY | 8.0 | 0.3 | 0.036 | 60.0 | 8.1

XIV. GEOSCIENCES
147 GEOCHEMISTRY & GEOPHYSICS | 9.7 | 0.6 | 0.066 | 61.5 | 9.3
148 GEOGRAPHY, PHYSICAL | 9.1 | 0.9 | 0.097 | 59.8 | 8.8
149 GEOLOGY | 8.0 | 0.5 | 0.061 | 62.4 | 7.5
150 ENGINEERING, GEOLOGICAL | 3.8 | 0.3 | 0.093 | 62.1 | 3.6
151 PALEONTOLOGY | 6.5 | 0.4 | 0.057 | 63.7 | 6.1
152 REMOTE SENSING | 7.8 | 0.3 | 0.037 | 60.8 | 7.8
153 OCEANOGRAPHY | 10.1 | 1.0 | 0.101 | 61.6 | 9.5
154 ENGINEERING, OCEAN | 3.6 | 0.4 | 0.106 | 66.7 | 3.4
155 METEOROLOGY & ATMOSPHERIC SCIENCES | 10.9 | 0.5 | 0.047 | 61.3 | 10.5
156 ENGINEERING, AEROSPACE | 2.5 | 0.2 | 0.095 | 68.4 | 2.2
157 MINERALOGY | 6.9 | 0.4 | 0.060 | 61.4 | 6.6
158 MINING & MINERAL PROCESSING | 4.0 | 0.3 | 0.069 | 65.5 | 3.7
159 GEOSCIENCES, MULTIDISCIPLINARY | 7.3 | 0.4 | 0.055 | 62.7 | 6.9


XV. AGRICULTURAL & ENVIRONMENT
160 AGRICULTURAL ENGINEERING | 5.0 | 0.4 | 0.073 | 61.6 | 4.7
161 AGRICULTURE, MULTIDISCIPLINARY | 6.8 | 0.3 | 0.045 | 63.8 | 6.6
162 AGRONOMY | 5.8 | 0.3 | 0.050 | 62.9 | 5.5
163 LIMNOLOGY | 9.7 | 0.8 | 0.078 | 60.8 | 9.3
164 SOIL SCIENCE | 6.9 | 0.5 | 0.072 | 62.5 | 6.5
165 BIODIVERSITY CONSERVATION | 8.8 | 0.4 | 0.046 | 62.1 | 8.5
166 ENVIRONMENTAL SCIENCES | 8.9 | 0.5 | 0.056 | 60.1 | 8.8
167 ENVIRONMENTAL STUDIES | 5.0 | 0.4 | 0.072 | 61.4 | 4.8
168 FOOD SCIENCE & TECHNOLOGY | 7.1 | 0.5 | 0.075 | 61.9 | 6.7
169 NUTRITION & DIETETICS | 11.4 | 0.4 | 0.037 | 61.3 | 11.1
170 AGRICULTURE, DAIRY & ANIMAL SCIENCE | 5.4 | 0.3 | 0.051 | 66.5 | 4.9
171 HORTICULTURE | 6.0 | 0.3 | 0.045 | 62.9 | 5.8

XVI. BIOLOGY (ORGANISMIC AND SUPRAORGANISMIC LEVEL)
172 ORNITHOLOGY | 5.5 | 0.5 | 0.082 | 59.7 | 5.4
173 ZOOLOGY | 7.5 | 0.5 | 0.068 | 61.8 | 7.1
174 ENTOMOLOGY | 5.5 | 0.4 | 0.071 | 62.9 | 5.1
175 WATER RESOURCES | 6.3 | 0.5 | 0.075 | 61.7 | 5.9
176 FISHERIES | 7.1 | 0.8 | 0.115 | 59.3 | 6.9
177 MARINE & FRESHWATER BIOLOGY | 8.2 | 0.9 | 0.115 | 59.2 | 7.9
178 MICROBIOLOGY | 14.3 | 1.1 | 0.077 | 59.3 | 14.0
179 PARASITOLOGY | 8.1 | 0.6 | 0.070 | 59.6 | 8.0
180 VIROLOGY | 18.8 | 1.6 | 0.083 | 57.7 | 18.9
181 FORESTRY | 7.2 | 0.6 | 0.089 | 60.0 | 7.0
182 MYCOLOGY | 6.8 | 0.3 | 0.046 | 62.1 | 6.5
183 PLANT SCIENCES | 9.6 | 0.3 | 0.029 | 60.1 | 9.8
184 ECOLOGY | 11.4 | 1.0 | 0.087 | 59.7 | 11.0
185 VETERINARY SCIENCES | 5.2 | 0.3 | 0.056 | 65.9 | 4.8

XVII. MULTIDISCIPLINARY
186 MULTIDISCIPLINARY SCIENCES | 4.0 | 0.6 | 0.158 | 64.3 | 4.0

D. SOCIAL SCIENCES
XVIII. SOCIAL SCIENCES, GENERAL
187 CRIMINOLOGY & PENOLOGY | 4.8 | 0.3 | 0.058 | 66.5 | 4.4
188 LAW | 4.3 | 0.3 | 0.076 | 65.1 | 4.1
189 POLITICAL SCIENCE | 3.3 | 0.4 | 0.119 | 65.5 | 3.2
190 PUBLIC ADMINISTRATION | 3.6 | 0.3 | 0.075 | 66.2 | 3.3
191 ETHNIC STUDIES | 2.5 | 0.3 | 0.115 | 65.7 | 2.4
192 FAMILY STUDIES | 5.7 | 0.3 | 0.057 | 62.1 | 5.5
193 SOCIAL ISSUES | 3.4 | 0.3 | 0.091 | 64.4 | 3.3
194 SOCIAL WORK | 3.9 | 0.3 | 0.078 | 63.2 | 3.7
195 SOCIOLOGY | 4.2 | 0.3 | 0.065 | 65.6 | 3.9
196 WOMEN’S STUDIES | 4.1 | 0.2 | 0.061 | 63.8 | 3.8
197 EDUCATION & EDUCATIONAL RESEARCH | 3.3 | 0.3 | 0.085 | 64.6 | 3.1
198 EDUCATION, SPECIAL | 5.0 | 0.3 | 0.065 | 62.7 | 4.7
199 AREA STUDIES | 1.9 | 0.3 | 0.157 | 67.0 | 1.8
200 GEOGRAPHY | 5.8 | 0.3 | 0.057 | 60.5 | 5.7
201 PLANNING & DEVELOPMENT | 4.4 | 0.3 | 0.059 | 61.3 | 4.4
202 TRANSPORTATION | 5.3 | 0.4 | 0.079 | 61.8 | 5.0
203 URBAN STUDIES | 4.4 | 0.3 | 0.068 | 61.7 | 4.2
204 ETHICS | 3.3 | 0.3 | 0.092 | 65.6 | 3.0
205 MEDICAL ETHICS | 5.2 | 0.4 | 0.075 | 62.1 | 4.9
206 ANTHROPOLOGY | 4.4 | 0.3 | 0.074 | 66.3 | 4.1
207 COMMUNICATION | 4.6 | 0.3 | 0.060 | 64.1 | 4.3
208 DEMOGRAPHY | 5.5 | 0.3 | 0.053 | 61.8 | 5.3
209 HISTORY OF SOCIAL SCIENCES | 2.1 | 0.3 | 0.140 | 69.2 | 1.8
210 INFORMATION SCIENCE & LIBRARY SCIENCE | 4.1 | 0.4 | 0.103 | 65.2 | 3.9
211 INTERNATIONAL RELATIONS | 2.9 | 0.4 | 0.134 | 65.4 | 2.8
212 LINGUISTICS | 6.1 | 0.3 | 0.049 | 63.0 | 5.8
213 SOCIAL SCIENCES, INTERDISCIPLINARY | 3.6 | 0.4 | 0.100 | 66.7 | 3.3


TABLE 3. Citation inequality decomposition at the subfield level in the fractional case.

Quantiles | (1) Within-group term, W | (2) Skewness of science term, S | (3) IDCP term | (4) Total citation inequality | (5) (1)/(4) in % | (6) (2)/(4) in % | (7) (3)/(4) in %

A. Raw data
1,000 (all) | 0.0030 | 0.7062 | 0.1552 | 0.8644 | 0.35 | 81.70 | 17.95
[1, 660] | | | 0.0463 | | | | 5.36
[661, 978] | | | 0.0750 | | | | 8.68
[979, 1000] | | | 0.0338 | | | | 3.91

B. Subfield ER normalization
1,000 (all) | 0.0032 | 0.7301 | 0.0293 | 0.7627 | 0.42 | 95.73 | 3.85
[1, 660] | | | 0.0162 | | | | 2.13
[661, 978] | | | 0.0027 | | | | 0.35
[979, 1000] | | | 0.0104 | | | | 1.37

C. Subfield mean normalization
1,000 (all) | 0.0030 | 0.7240 | 0.0260 | 0.7531 | 0.40 | 96.14 | 3.45
[1, 660] | | | 0.0168 | | | | 2.23
[661, 978] | | | 0.0026 | | | | 0.35
[979, 1000] | | | 0.0066 | | | | 0.87

FIG. 2. Weighted citation inequality due to differences in citation practices, $v_{\pi} I(\pi)$, as a function of π. Raw (blue line) versus normalized (red line) data. [Color figure can be viewed in the online issue, which is available at wileyonlinelibrary.com.]

TABLE 2. (Continued)

Subfield | (1) Exchange rates | (2) Standard deviation | (3) Coefficient of variation | (4) % of citations | (5) Exch. rates based on mean citations

XIX. ECONOMICS & BUSINESS
214 AGRICULTURAL ECONOMICS & POLICY | 3.8 | 0.3 | 0.082 | 63.9 | 3.5
215 ECONOMICS | 4.6 | 0.3 | 0.074 | 61.9 | 4.6
216 INDUSTRIAL RELATIONS & LABOR | 4.6 | 0.4 | 0.086 | 63.3 | 4.2
217 BUSINESS | 6.7 | 0.3 | 0.047 | 64.0 | 6.4
218 BUSINESS, FINANCE | 6.3 | 0.5 | 0.087 | 63.6 | 6.2
219 MANAGEMENT | 6.4 | 0.4 | 0.055 | 63.5 | 6.2

Mean | | | 0.071 | 62.2 |
SD | | | 0.043 | 3.0 |


choice as before: [πm, πM] = [661, 978]. On average, the percentage of citations covered by this interval is 62.3% (with an SD of 3.0). As for the remaining results, four points should be noted.

First, Groups I, II, III, and IV now consist of 77, 113, 19, and 10 subfields, respectively (see columns 1 to 3 in Table 4 in Crespo et al., 2013b), figures that slightly improve on those obtained in the fractional case. Second, normalization using our own ERs or those based on subfield mean citations reduces the IDCP term to 3.57% and 3.27% of total citation inequality, respectively (see Table 5 in Crespo et al., 2013b). Thus, in both cases normalization results slightly improve on those obtained under the fractional approach. Third, it should be emphasized that the success of our empirical strategy in the multiplicative case again rests on the similarity of the shapes of subfield citation distributions.⁵ Fourth, the results in the fractional and the multiplicative cases are extremely similar: except for two subfields, the multiplicative ERs are always within 1 SD of the fractional ones (for an illustration, see Figure 4 in Crespo et al., 2013b). As indicated in Herranz and Ruiz-Castillo (2012), the similarity of the citation characteristics of articles published in journals assigned to one or to several subfields ensures that choosing one of the two strategies is unlikely to lead to a radically different picture in practical applications.
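For readers unfamiliar with the two strategies, the sketch below shows how an article's weight is distributed across subfields under each, as we understand them from Herranz and Ruiz-Castillo (2012): the multiplicative strategy counts an article in full in every subfield its journal is assigned to, whereas the fractional strategy splits it into equal parts, one per subfield. The article identifiers, subfield labels, and the function name `subfield_weights` are hypothetical.

```python
from collections import defaultdict


def subfield_weights(assignments, strategy):
    """Total article weight per subfield.

    assignments: dict mapping article id -> list of subfields.
    strategy: "multiplicative" (weight 1 in each subfield) or
              "fractional" (weight 1/k in each of k subfields).
    """
    if strategy not in ("multiplicative", "fractional"):
        raise ValueError(f"unknown strategy: {strategy}")
    weights = defaultdict(float)
    for subfields in assignments.values():
        w = 1.0 if strategy == "multiplicative" else 1.0 / len(subfields)
        for s in subfields:
            weights[s] += w
    return dict(weights)


articles = {"a1": ["A"], "a2": ["A", "B"], "a3": ["A", "B", "C"]}
print(subfield_weights(articles, "multiplicative"))  # each article counted in full
print(subfield_weights(articles, "fractional"))      # weights sum to 3 articles
```

Under the multiplicative strategy the subfield totals exceed the number of distinct articles; under the fractional strategy they add up to it exactly.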

Conclusions

The lessons that can be drawn from this article can be summarized in the following four points:

1. As expected, the relative importance of the citation inequality attributable to differences in citation practices is greater at lower aggregation levels. In particular, the IDCP term, which represents about 14% of overall citation inequality in the case of 22 broad fields (Crespo et al., 2013a), represents about 18% with the 219 subfields identified with the Web of Science subject categories distinguished by Thomson Reuters.

2. The regularities found in Crespo et al. (2013a) for 22 fields also characterize the subfield level studied in this paper. The citation inequality attributable to differences in citation practices is very high and variable both for a long lower tail, consisting of uncited and poorly cited articles below the mean, and for a small number of quantiles at the very upper tail of citation distributions, where citation excellence possibly resides. However, the IDCP term remains relatively constant over a wide range of intermediate quantiles. This constancy reflects the fact that, approximately, citation distributions over that range behave as if they differ only by a scale factor. This allows us to estimate a set of ERs to express the citation counts of articles in that interval as equivalent counts in the all-sciences case. For example, in the fractional case we find that in 187 out of 219 subfields, or 85% of the total, the ERs have a tolerably low CV, that is, a CV smaller than or equal to 0.10. The ERs are estimated over the [661, 978] interval, which on average covers about 62% of all citations in each subfield.

3. The normalization of the raw data using the ERs as normalization factors is rather successful: in the fractional case, the IDCP term at the subfield level is reduced from 18% to 3.8%, whereas the procedure using mean citations as normalization factors achieves slightly better results still. The reason for this coincidence is that mean citations are located at approximately the 69th percentile of citation distributions, inside the quantile interval where citation distributions appear to differ only by a scale factor.

4. Interestingly enough, our results at the lowest aggregation level regarding the ERs and their role as normalization factors in the fractional case are essentially replicated when we adopt the multiplicative approach.

One limitation of this study is that we cannot take into account possible differences in citation practices within subfields. For example, Van Eck, Waltman, Van Raan, Klautz, and Peul (2012) have recently revealed large differences between basic and clinical research areas within medical Web of Science subject categories. Naturally, our methods can be applied to future classification systems consisting of more homogeneous subfields than the Web of Science categories available to us in this paper.

Among the possible extensions of our work, we comment on three. First, as pointed out in Crespo et al. (2013a), because the citation process evolves at different speeds in different scientific domains, using variable citation windows to ensure that the process has reached a similar stage in all domains should improve the comparability of citation distributions at the lower tail. Second, we should test our results on the estimation of ERs and on normalization in a statistical framework, using, for example, a bootstrap approach. Third, as indicated in the Introduction, in a companion paper Li et al. (2013) study by how much the IDCP term is reduced when using a number of alternative normalization procedures, including the two-parameter scheme advocated by Radicchi and Castellano (2012a).

We conclude that the striking similarity of citation distributions at different aggregation levels seems to provide a firm basis for solving two crucial practical problems: the comparison of citation counts across different scientific disciplines, and the normalization of the raw citation data before aggregating closely related but heterogeneous subfields into larger categories, or before aggregating all subfields in the all-sciences case.

Acknowledgments

The authors acknowledge financial support by Santander Universities Global Division of Banco Santander. Crespo and Ruiz-Castillo also acknowledge financial help from the

⁵ On average, over the 219 subfields, 68.6% of all articles (with an SD of 3.7) receive citations below the mean and account for 21.1% (5.0) of all citations, while articles with a remarkable or outstanding number of citations represent 10.2% (1.6) of the total and account for 44.9% (4.6) of all citations (see Albarrán et al., 2011).


Spanish MEC through grants SEJ2007-67436 and ECO2011-29762. Conversations with Pedro Albarrán are deeply appreciated. All shortcomings are the authors’ responsibility.

References

Albarrán, P., & Ruiz-Castillo, J. (2011). References made and citations received by scientific articles. Journal of the American Society for Information Science and Technology, 62, 40–49.

Albarrán, P., Crespo, J.A., Ortuño, I., & Ruiz-Castillo, J. (2011). The skewness of science in 219 sub-fields and a number of aggregates. Scientometrics, 88, 385–397.

Braun, T., Glänzel, W., & Schubert, A. (1985). Scientometric indicators: A 32 country comparison of publication productivity and citation impact. Singapore, Philadelphia: World Scientific Publishing.

Crespo, J.A., Li, Y., & Ruiz-Castillo, J. (2013a). The measurement of the effect on citation inequality of differences in citation practices across scientific fields. PLoS ONE, 8(3), e58727.

Crespo, J.A., Li, Y., Herranz, N., & Ruiz-Castillo, J. (2013b). The effect on citation inequality of differences in citation practices at the Web of Science subject category level. Working Paper 13-03, Universidad Carlos III (http://hdl.handle.net/10016/16327).

Garfield, E. (1979). Citation indexing: Its theory and applications in science, technology, and humanities. New York: Wiley.

Glänzel, W. (2011). The application of characteristic scores and scales to the evaluation and ranking of scientific journals. Journal of Information Science, 37, 40–48.

Glänzel, W., Schubert, A., Thijs, B., & Debackere, K. (2011). A priori vs. a posteriori normalization of citation indicators. The case of journal ranking. Scientometrics, 87, 415–424.

Herranz, N., & Ruiz-Castillo, J. (2012). Multiplicative and fractional strategies when journals are assigned to several sub-fields. Journal of the American Society for Information Science and Technology, 63, 2195–2205.

Leydesdorff, L., & Opthof, T. (2010). Normalization at the field level: Fractional counting of citations. Journal of Informetrics, 4, 644–646.

Leydesdorff, L., Radicchi, F., Bornmann, L., Castellano, C., & de Nooy, W. (2012). Field-normalized impact factors: A comparison of rescaling versus fractionally counted IFs. Journal of the American Society for Information Science and Technology, [Epub ahead of print].

Li, Y., Radicchi, F., Castellano, C., & Ruiz-Castillo, J. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics, 7, 746–755.

Moed, H.F. (2010). Measuring contextual citation impact of scientific journals. Journal of Informetrics, 4, 265–277.

Moed, H.F., Burger, W.J., Frankfort, J.G., & van Raan, A.F.J. (1985). The use of bibliometric data for the measurement of university research performance. Research Policy, 14, 131–149.

Moed, H.F., De Bruin, R.E., & van Leeuwen, Th.N. (1995). New bibliometric tools for the assessment of national research performance: Database description, overview of indicators, and first applications. Scientometrics, 33, 381–422.

Moed, H.F., & van Raan, A.F.J. (1988). Indicators of research performance. In A.F.J. van Raan (Ed.), Handbook of quantitative studies of science and technology (pp. 177–192). Amsterdam: North Holland.

Murugesan, P., & Moravcsik, M.J. (1978). Variation of the nature of citation measures with journals and scientific specialties. Journal of the American Society for Information Science, 29, 141–147.

Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory, with applications to the literature of physics. Information Processing and Management, 12, 297–312.

Radicchi, F., & Castellano, C. (2012a). A reverse engineering approach to the suppression of citation biases reveals universal properties of citation distributions. PLoS ONE, 7(3), e33833.

Radicchi, F., & Castellano, C. (2012b). Testing the fairness of citation indicators for comparisons across scientific domains: The case of fractional citation counts. Journal of Informetrics, 6, 121–130.

Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences, 105, 17268–17272.

Schubert, A., & Braun, T. (1986). Relative indicators and relational charts for comparative assessment of publication output and citation impact. Scientometrics, 9, 281–291.

Schubert, A., & Braun, T. (1996). Cross-field normalization of scientometric indicators. Scientometrics, 36, 311–324.

Schubert, A., Glänzel, W., & Braun, T. (1983). Relative citation rate: A new indicator for measuring the impact of publications. In D. Tomov & L. Dimitrova (Eds.), Proceedings of the First National Conference with International Participation in Scientometrics and Linguistics of Scientific Text, Varna, Bulgaria.

Schubert, A., Glänzel, W., & Braun, T. (1987). A new methodology for ranking scientific institutions. Scientometrics, 12, 267–292.

Schubert, A., Glänzel, W., & Braun, T. (1988). Against absolute methods: Relative scientometric indicators and relational charts as evaluation tools. In A.F.J. van Raan (Ed.), Handbook of quantitative studies of science and technology (pp. 137–176). Amsterdam: North Holland.

Van Eck, N.J., Waltman, L., Van Raan, A.F.J., Klautz, R.J.M., & Peul, W.C. (2012). Citation analysis may severely underestimate the impact of clinical research as compared to basic research. Centre for Science and Technology Studies, Leiden University (arXiv:1210.0442).

Vinkler, P. (1986). Evaluation of some methods for the relative assessment of scientific publications. Scientometrics, 10, 157–177.

Vinkler, P. (2003). Relations of relative scientometric indicators. Scientometrics, 58, 687–694.

Waltman, L., & Van Eck, N.J. (2012). Source normalized indicators of citation impact: An overview of different approaches and an empirical comparison. Scientometrics. arXiv:1208.6122.

Waltman, L., & van Eck, N.J. (2013). A systematic empirical comparison of different approaches for normalizing citation impact indicators. Mimeo, Centre for Science and Technology Studies, Leiden University (arXiv:1301.4941).

Zitt, M., & Small, H. (2008). Modifying the journal impact factor by fractional citation weighting: The audience factor. Journal of the American Society for Information Science and Technology, 59, 1856–1860.
