
1

Statistical techniques for the construction of composite indicators

CONFERENCIA INTERNACIONAL Los indicadores como Herramienta Estratégica en la Universidad

07-09/03/2013, Valencia, Spain

Michaela Saisana

European Commission, Joint Research Centre,

Econometrics and Applied Statistics Unit

2

Outline

Global rankings at the forefront of the policy debate

Overview of two global university rankings (ARWU, THES)

Statistical Coherence Tests

Uncertainty analysis

Sensitivity analysis

Policy Implications

Conclusions

Extra: compensability, tradeoffs


4

• Definition of the university is broad:

A university, as the name suggests, tends to encompass a broad range of purposes, dimensions, focuses and missions that are difficult to condense into a compact measure.

• Still, for reasons of governance, accountability and transparency, there is an increasing interest among policymakers as well as among practitioners in measuring and benchmarking "excellence" across universities.

• The growing mobility of students and researchers has also created a market for these measures among the prospective students and their families.


5

• Global rankings have raised debates and policy responses (EU, national level):

to improve the positioning of a country within the existing measures,

to create new measures,

to discuss regional performance (e.g. show that the USA is well ahead of Europe in terms of cutting-edge university research)


6

[Figure: Google Scholar hits on “university rankings” by year, x-axis 1940 to 2020, y-axis 0 to 9,000; legible data labels include 727, 1,310, 4,590 and 8,090.]

10-fold increase in the last 10 years

Guess how many contain the term “THES ranking” or “ARWU ranking”? 20%


7

(p.7): “the role of statistical indicators has increased over the last two decades”

(i) more literacy, (ii) more complexity, (iii) more information society


8

University rankings are used to judge the performance of university systems, whether or not their proponents intended this.


9

France: creation of 10 centres of HE excellence. The Minister of Education set a target to put at least 10 French universities among the top 100 in ARWU by 2012. The President has put French standing in these international rankings at the forefront of the policy debate (Le Monde, 2008).

Italy: 0 universities in the top 100 of the ARWU ranking, seen as a failure of the national educational system.

Spain: 1 university in the top 200 of the ARWU, hailed as a great national achievement.


10

A recent OECD study (Hazelkorn, 2007) shows that university leaders worldwide are concerned about ranking systems, with consequences for the strategic and operational decisions they take to improve their research performance.


11

• These rankings are relevant to today’s discourse on Higher Education reform in the EU

• Academics also use ARWU: P. Aghion, M. Dewatripont, C. Hoxby, A. Sapir, “Higher aspirations: An agenda for reforming European universities” (Bruegel Blueprint Series N.5, 2008).


http://www.bruegel.org/Public/

12

An extreme impact of Global Rankings

What: the 2005 THES created a major controversy in Malaysia, with the country’s top two universities slipping by almost 100 places compared to 2004.

Why: a change in the ranking methodology (a little-known fact, and of limited comfort).

Impact: a Royal Commission of Inquiry was set up to investigate the matter. A few weeks later, the Vice-Chancellor of the University of Malaysia stepped down.



14

Criteria | Indicator | Weight
Quality of Education | Alumni of an institution winning Nobel Prizes and Fields Medals | 10%
Quality of Faculty | Staff of an institution winning Nobel Prizes and Fields Medals | 20%
Quality of Faculty | Highly cited researchers in 21 broad subject categories | 20%
Research Output | Articles published in Nature and Science | 20%
Research Output | Articles in Science Citation Index-expanded, Social Science Citation Index | 20%
Academic performance | Academic performance with respect to the size of an institution | 10%

PROS and CONS

6 « objective » indicators

Focus on research performance, overlooks other university missions

Biased towards hard-science institutions

Favours large institutions

METHODOLOGY

6 indicators

Best performing institution = 100; the score of each other institution is calculated as a percentage of the best

Weighting scheme chosen by rankers

Linear aggregation of the 6 indicators

Overview - ARWU ranking
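As a rough illustration of the methodology listed above (scale each indicator so that the best performer scores 100, then aggregate linearly with the nominal weights), a minimal Python sketch follows. The raw values are hypothetical and the function names are illustrative assumptions; ARWU's actual data treatment is not reproduced here.

```python
# Nominal weights of the six indicators, in the order of the table above.
WEIGHTS = [0.10, 0.20, 0.20, 0.20, 0.20, 0.10]

def normalise_to_best(column):
    """Scale a column so the best performer scores 100 and the rest a percentage of it."""
    best = max(column)
    return [100.0 * value / best for value in column]

def composite_scores(raw_rows, weights):
    """Weighted linear aggregation of indicators normalised to best = 100."""
    columns = list(zip(*raw_rows))                      # one tuple per indicator
    norm_columns = [normalise_to_best(col) for col in columns]
    norm_rows = list(zip(*norm_columns))                # back to one tuple per institution
    return [sum(w * v for w, v in zip(weights, row)) for row in norm_rows]

# Hypothetical raw indicator values for three institutions (not real data).
raw = [
    [30, 40, 60, 50, 500, 80],
    [15, 20, 30, 25, 250, 40],
    [0, 10, 45, 40, 450, 60],
]
scores = composite_scores(raw, WEIGHTS)
for name, score in zip(["Univ A", "Univ B", "Univ C"], scores):
    print(name, round(score, 1))
```

Because every indicator is expressed relative to the best performer, any change in the leader's raw value shifts the normalised scores of all other institutions.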

15

PROS and CONS

Attempt to take into account teaching quality

Two expert-based indicators: 50% of total (subjective indicators, lack of transparency)

Yearly changes in methodology

Measures research quantity

METHODOLOGY

6 indicators

z-score calculated for each indicator; best performing institution = 100; the score of each other institution is calculated as a percentage of the best

Weighting scheme chosen by rankers

Linear aggregation of the 6 indicators

Criteria | Indicator | Weight
Research Quality | Academic Opinion: Peer review, 6,354 academics | 40%
Research Quality | Citations per Faculty: Total citations / Full Time Equivalent faculty | 20%
Graduate Employability | Recruiter Review: Employers’ opinion, 2,339 recruiters | 10%
International Outlook | International Faculty: Percentage of international staff | 5%
International Outlook | International Students: Percentage of international students | 5%
Teaching Quality | Student Faculty: Full Time Equivalent faculty/student ratio | 20%

Overview - THES ranking

16

Overview - Comparison (2007)

1 – Common to both top 10s: Harvard, Cambridge, Princeton, Caltech, MIT and Columbia

2 – Greater variations in the middle to lower end of the rankings

3 – Europe is lagging behind in both the ARWU (also known as SJTU) and THES rankings

4 – THES favours UK universities: all UK universities lie below the line (in red)

17

University rankings (published yearly)

+ Very appealing for capturing a university’s multiple missions in a single number

+ Allow one to situate a given university in the worldwide context

- Can lead to misleading and/or simplistic policy conclusions

18

Question:

Can we say something about the quality of the university rankings and the reliability of the results?

19

Decalogue for composite indicators: consecutive steps, but with an iterative nature

Step 1. Development of a conceptual framework

Step 2. Selection of indicators

Step 3. Data treatment (missing, outliers)

Step 4. Multivariate analysis

Step 5. Normalisation of data

Step 6. Weighting & aggregation

Step 7. Robustness & sensitivity

Step 8. Back to the indicators

Step 9. Association with other variables

Step 10. Presentation & dissemination

2 rounds of consultation with the OECD high-level statistical committee

Finally endorsed in March 2008

20

Upon request of their developers, almost 60 international composite indicators were assessed by the JRC along the lines of the OECD/JRC Handbook on constructing composite indicators + recent JRC research

(topics ranging from lifelong learning to sustainability, environmental performance, corruption, innovation, poverty, drug consequences)

More information at:

http://composite-indicators.jrc.ec.europa.eu


22

The Stiglitz report (p.65):

[…] a general criticism that is frequently addressed at composite indicators, i.e. the arbitrary character of the procedures used to weight their various components.

[…] The problem is not that these weighting procedures are hidden, non-transparent or non-replicable – they are often very explicitly presented by the authors of the indices, and this is one of the strengths of this literature. The problem is rather that their normative implications are seldom made explicit or justified.

Statistical coherence

23

Statistical coherence - Dean’s example

$x_1$: hours of teaching; $x_2$: number of publications

$$y = 0.5\,x_1 + 0.5\,x_2$$

Estimated $R_1^2 = 0.0759$, $R_2^2 = 0.826$, $\mathrm{corr}(x_1, x_2) = -0.151$, $V(x_1) = 116$, $V(x_2) = 614$, $V(y) = 162$

24

Statistical coherence - Dean’s example

$x_1$: hours of teaching; $x_2$: number of publications

Although the two weights are equal, teaching hours explain far less of the variance of $y$ than publications do. To obviate this, the dean substitutes the model

$$y = 0.5\,x_1 + 0.5\,x_2$$

with

$$y = 0.7\,x_1 + 0.3\,x_2$$

A professor comes by, looks at the last formula, and complains that publishing is disregarded in the department …
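The figures in the Dean's example can be checked from the variances and the correlation alone. The sketch below assumes only the standard identities for a linear combination (the function name is illustrative): it computes $V(y)$ and the squared correlations $R_i^2$ between $y$ and each $x_i$, first for the equal weights and then for the 0.7/0.3 weights.

```python
import math

def squared_correlations(w1, w2, var1, var2, corr12):
    """Return (R1^2, R2^2, V(y)) for y = w1*x1 + w2*x2."""
    cov12 = corr12 * math.sqrt(var1 * var2)
    var_y = w1 ** 2 * var1 + w2 ** 2 * var2 + 2 * w1 * w2 * cov12
    cov1y = w1 * var1 + w2 * cov12      # cov(x1, y)
    cov2y = w1 * cov12 + w2 * var2      # cov(x2, y)
    r1_sq = cov1y ** 2 / (var1 * var_y)
    r2_sq = cov2y ** 2 / (var2 * var_y)
    return r1_sq, r2_sq, var_y

# Values quoted in the example: V(x1) = 116, V(x2) = 614, corr(x1, x2) = -0.151.
equal = squared_correlations(0.5, 0.5, 116, 614, -0.151)
revised = squared_correlations(0.7, 0.3, 116, 614, -0.151)
print("equal weights:  ", [round(v, 3) for v in equal])
print("0.7/0.3 weights:", [round(v, 3) for v in revised])
```

With equal nominal weights the two squared correlations differ by roughly a factor of ten, whereas the 0.7/0.3 weights bring them close together. This is the gap between nominal weight and measured importance that the example is about.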

25




26


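Our suggestion: assess the quality of a composite indicator using, instead of $R_i^2$ (the squared Pearson product-moment correlation coefficient of the regression of $y$ on $x_i$), its non-parametric equivalent:

$$S_i = \frac{V_{x_i}\left(E_{\mathbf{x}_{\sim i}}(y \mid x_i)\right)}{V(y)}$$

$S_i$ is the first order sensitivity index, also known as Pearson’s correlation ratio. The conditional mean $E_{\mathbf{x}_{\sim i}}(y \mid x_i)$ is estimated by a smoothed curve, and $V(y)$ is the unconditional variance.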

27

Features:

• it offers a precise definition of importance, that is ‘the expected reduction in variance of the CI that would be obtained if a variable could be fixed’;

• it can be used regardless of the degree of correlation between variables;

• it is model-free, in that it can be applied also in non-linear aggregations;

• it is not invasive, in that no changes are made to the CI or to the correlation structure of the indicators (unlike what we will see next on uncertainty analysis).

Pearson’s correlation ratio is also known as: first order effect, top marginal variance, main effect, …

Source: Paruolo, Saisana, Saltelli, 2013, J. Royal Stat. Society A
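To make the model-free property concrete, here is a minimal sketch that estimates the first order index as the variance of the conditional mean of $y$ given $x_i$, divided by the variance of $y$. It assumes a crude equal-count binning estimator in place of a smoothed curve; the function name, the bin count and the toy data are illustrative assumptions, not the estimator of the cited paper.

```python
def first_order_index(x, y, n_bins=10):
    """Estimate S_i = V(E(y|x)) / V(y) by averaging y within equal-count bins of x."""
    n = len(x)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    if var_y == 0:
        raise ValueError("y has zero variance")
    # Sort observations by x so that consecutive slices form bins of similar x.
    order = sorted(range(n), key=lambda k: x[k])
    between = 0.0
    for b in range(n_bins):
        idx = order[b * n // n_bins:(b + 1) * n // n_bins]
        if not idx:
            continue
        bin_mean = sum(y[k] for k in idx) / len(idx)
        between += len(idx) * (bin_mean - mean_y) ** 2
    return (between / n) / var_y

# Toy data: a linear and a purely quadratic dependence on x.
xs = list(range(-50, 50))
s_linear = first_order_index(xs, [2 * v + 1 for v in xs])
s_quadratic = first_order_index(xs, [v * v for v in xs])
# A variable unrelated to x (alternating 0/1) should give an index of zero.
s_unrelated = first_order_index(xs, [k % 2 for k in range(len(xs))])
print(round(s_linear, 3), round(s_quadratic, 3), round(s_unrelated, 3))
```

The quadratic case is the instructive one: the linear correlation between x and x² is close to zero on this symmetric range, yet the binned index is high, because fixing x removes most of the variance of y.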

28

Using these points we can compute a statistic that tells us:

How much (on average) would the variance of the ARWU scores be reduced if I could fix the variable ‘Papers in Nature & Science’?

This measure $S_i$ shall be our ruler for ‘importance’.

Example: $S_i = 0.6$ means I could reduce the variation of the ARWU scores by 60% by fixing ‘Papers in Nature & Science’.

[Figure: scatterplot; axis label: ARWU score]

29

One can hence compare the importance of an indicator as given by the nominal weight (assigned by developers) with the importance as measured by the first order effect ($S_i$) to test the index for coherence.


30

Statistical coherence - ARWU

The $S_i$’s are more similar to each other than the nominal weights are: they range between 0.14 and 0.19 ($S_i$’s normalized to unit sum; CV estimates), whereas the nominal weights are either 0.10 or 0.20.


31

Statistical coherence - THES

In THES, the combined importance of the peer-review variables (recruiters and academia) appears larger than stipulated by the developers, indirectly supporting the hypothesis of linguistic bias that is at times levelled at this measure. Further, the teacher/student ratio, a key variable aimed at capturing the teaching dimension, is much less important than it should be (normalized $S_i$ is 0.09, nominal weight is 0.20).



33

• Notwithstanding recent attempts to establish good practice in composite indicator construction (OECD, 2008), “there is no recipe for building composite indicators that is at the same time universally applicable and sufficiently detailed” (Cherchye et al., 2007).

• Booysen (2002, p.131) summarises the debate on composite indicators by noting that “not one single element of the methodology of composite indexing is above criticism”.

• Andrews et al. (2004) argue that “many indices rarely have adequate scientific foundations to support precise rankings: […] typical practice is to acknowledge uncertainty in the text of the report and then to present a table with unambiguous rankings”.

Uncertainty analysis - Why?

34

[Figure: the space of alternatives (including/excluding variables, normalisation, missing data, weights, aggregation) mapped onto the resulting score ranges for Country 1, Country 2 and Country 3; axis from 10 to 60.]

Model averaging: whenever a choice in setting up the composite is not strongly supported, or if you do not trust one single model, we recommend using several models.

Uncertainty analysis - How?

35

As a result, an uncertainty analysis should naturally include a careful mapping of all these uncertainties onto the space of the output.

Two things can happen:

The space of the inference is still narrow enough to be meaningful: GREAT!!!

The space of the inference is too wide to be meaningful: revise the CI, or collect further indicators.


36

How to shake coupled stairs

How coupled stairs are shaken in most of the available literature


37

Uncertainty analysis – ARWU & THES

Objective of UA:

NOT to verify whether the two global university rankings are legitimate models to measure university performance,

but to test whether the rankings and/or their associated inferences are robust or volatile with respect to changes in the methodological assumptions within a plausible and legitimate range.



Source: Saisana, D’Hombres, Saltelli, 2011, Research Policy 40, 165–177

38

Activate simultaneously different sources of uncertainty that cover a wide spectrum of methodological assumptions (imputation, weighting, normalization, number of indicators, aggregation).

Estimate the FREQUENCY of the university ranks obtained in the different simulations.

Assumption | Alternatives
Number of indicators | all six indicators included, or one-at-a-time excluded (6 options)
Weighting method | original set of weights; factor analysis; equal weighting; data envelopment analysis
Aggregation rule | additive; multiplicative; Borda multi-criterion

70 scenarios

Uncertainty analysis – ARWU & THES
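The scenario-based procedure can be sketched in a few lines: enumerate combinations of assumptions, recompute the ranking under each, and count how often each university lands on each rank. The sketch below is a simplified assumption, not the study's code: it uses hypothetical scores for four universities on three indicators, two weight sets and two aggregation rules, and omits factor analysis, data envelopment analysis and the Borda rule.

```python
from collections import Counter
from itertools import product

# Hypothetical normalised scores (0-100) on three indicators; not real data.
DATA = {
    "U1": [90, 80, 85],
    "U2": [70, 95, 60],
    "U3": [60, 65, 90],
    "U4": [40, 50, 45],
}
WEIGHT_SETS = {"original": [0.5, 0.3, 0.2], "equal": [1 / 3, 1 / 3, 1 / 3]}

def aggregate(values, weights, rule):
    """Additive (weighted mean) or multiplicative (weighted geometric mean) aggregation."""
    total = sum(weights)
    w = [wi / total for wi in weights]  # rescale weights after dropping an indicator
    if rule == "additive":
        return sum(wi * v for wi, v in zip(w, values))
    result = 1.0
    for wi, v in zip(w, values):
        result *= v ** wi
    return result

def rank_frequencies(data, weight_sets):
    """Count, per university, how often each rank occurs across all scenarios."""
    n_ind = len(next(iter(data.values())))
    exclusions = [None] + list(range(n_ind))  # all indicators, or one dropped
    rules = ["additive", "multiplicative"]
    counts = {name: Counter() for name in data}
    n_scenarios = 0
    for drop, wname, rule in product(exclusions, weight_sets, rules):
        keep = [j for j in range(n_ind) if j != drop]
        scores = {
            name: aggregate([vals[j] for j in keep],
                            [weight_sets[wname][j] for j in keep], rule)
            for name, vals in data.items()
        }
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, name in enumerate(ordered, start=1):
            counts[name][rank] += 1
        n_scenarios += 1
    return counts, n_scenarios

counts, n_scenarios = rank_frequencies(DATA, WEIGHT_SETS)
for name, c in counts.items():
    print(name, {rank: f"{100 * k / n_scenarios:.0f}%" for rank, k in sorted(c.items())})
```

A university whose rank is the same in every scenario is robust to the assumptions; one whose frequencies are spread over many ranks is not.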

39

Harvard, Stanford, Berkeley, Cambridge, MIT: top 5 in more than 75% of our simulations.

Univ California - San Francisco: original rank 18th, but could be ranked anywhere between the 6th and 100th position.

Impact of assumptions: much stronger for the middle-ranked universities.

Simulated rank range - SJTU 2008

Legend (cell shading): frequency lower than 15%; between 15 and 30%; between 30 and 50%; greater than 50%. Note: frequencies lower than 4% are not shown.

Rank bins (columns): 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100. Each row lists its frequencies (%) in left-to-right order; the bins they belong to were lost in extraction.

University | Frequencies (%) across rank bins | Original rank | Country
Harvard Univ | 100 | 1 | USA
Stanford Univ | 89, 11 | 2 | USA
Univ California - Berkeley | 97 | 3 | USA
Univ Cambridge | 90, 10 | 4 | UK
Massachusetts Inst Tech (MIT) | 74, 26 | 5 | USA
California Inst Tech | 27, 53, 19 | 6 | USA
Columbia Univ | 23, 77 | 7 | USA
Princeton Univ | 71, 9, 11, 7 | 8 | USA
Univ Chicago | 51, 34, 13 | 9 | USA
Univ Oxford | 99 | 10 | UK
Yale Univ | 47, 53 | 11 | USA
Cornell Univ | 27, 73 | 12 | USA
Univ California - Los Angeles | 9, 84, 7 | 13 | USA
Univ California - San Diego | 41, 46, 9 | 14 | USA
Univ Pennsylvania | 6, 71, 23 | 15 | USA
Univ Washington - Seattle | 7, 71, 21 | 16 | USA
Univ Wisconsin - Madison | 27, 70 | 17 | USA
Univ California - San Francisco | 14, 9, 14, 11, 7, 10, 6, 6 | 18 | USA
Tokyo Univ | 16, 16, 49, 20 | 19 | Japan
Johns Hopkins Univ | 7, 54, 21, 17 | 20 | USA

Uncertainty analysis – ARWU

40

The impact of uncertainties on the university ranks is even more apparent.

M.I.T.: ranked 9th, but confirmed in only 13% of simulations (plausible range [4, 35]).

Very high volatility also for universities ranked in the 10th-20th positions, e.g. Duke Univ, Johns Hopkins Univ, Cornell Univ.

Simulated rank range - THES 2008

Legend (cell shading): frequency lower than 15%; between 15 and 30%; between 30 and 50%; greater than 50%. Note: frequencies lower than 4% are not shown.

Rank bins (columns): 1-5, 6-10, 11-15, 16-20, 21-25, 26-30, 31-35, 36-40, 41-45, 46-50, 51-55, 56-60, 61-65, 66-70, 71-75, 76-80, 81-85, 86-90, 91-95, 96-100. Each row lists its frequencies (%) in left-to-right order; the bins they belong to were lost in extraction.

University | Frequencies (%) across rank bins | Original rank | Country
HARVARD University | 44, 56 | 1 | USA
YALE University | 40, 49, 11 | 2 | USA
University of CAMBRIDGE | 99 | 3 | UK
University of OXFORD | 93, 7 | 4 | UK
CALIFORNIA Institute of Technology | 46, 50 | 5 | USA
IMPERIAL College London | 74, 24 | 6 | UK
UCL (University College London) | 73, 23 | 7 | UK
University of CHICAGO | 80, 19 | 8 | USA
MASSACHUSETTS Institute of Technology | 14, 13, 17, 16, 11, 11, 7 | 9 | USA
COLUMBIA University | 6, 13, 17, 11, 10, 7, 10, 14 | 10 | USA
University of PENNSYLVANIA | 37, 56, 6 | 11 | USA
PRINCETON University | 6, 59, 27, 9 | 12 | USA
DUKE University | 27, 11, 9, 7, 10, 6, 9, 6 | 13 | USA
JOHNS HOPKINS University | 20, 10, 9, 9, 7, 10, 6, 6, 7, 6 | 13 | USA
CORNELL University | 6, 24, 11, 7, 6, 7, 9, 9, 7 | 15 | USA
AUSTRALIAN National University | 10, 30, 29, 31 | 16 | Australia
STANFORD University | 10, 14, 7, 10, 9, 10, 6, 6, 7 | 17 | USA
University of MICHIGAN | 6, 27, 17, 9, 10, 7, 14, 6 | 18 | USA
University of TOKYO | 16, 7, 13, 7, 6, 6 | 19 | Japan
MCGILL University | 7, 19, 41, 13, 9, 7 | 20 | Canada

Uncertainty analysis – THES

41

[Figure: median rank (and 99% confidence interval) accounting for methodological uncertainties, axis from 1 to 501. Labelled universities: Seoul National University, University of Frankfurt, University of Hamburg, University of California-Davis, University of Alaska-Fairbanks, Hanyang University.]

54 universities outside the interval (total of 503)

[43 universities in the Top 100]

Uncertainty analysis – ARWU results

42

[Figure: median rank (and 99% confidence interval) accounting for methodological uncertainties, axis from 1 to 401. Labelled universities: University of California, Santa Barbara; Stockholm School of Economics; University of St. Gallen; University of Tokyo; University of Leicester; University La Sapienza, Roma.]

250 universities outside the interval (total of 400)

[61 universities in the Top 100]

Uncertainty analysis – THES results


44

Complementary to the uncertainty analysis, a sensitivity analysis makes it possible to assess the impact of each assumption/scenario on the Index ranking.

HOW?

• Variance-based sensitivity measures (for those more familiar with statistics)

• Non-variance based sensitivity measures (for those less familiar with statistics)


45

free software

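Our recommended practice is based on two fractional variance measures.

One is the first order effect, i.e. the influence of one factor by itself:

$$S_i = \frac{V_{X_i}\left(E_{\mathbf{X}_{\sim i}}(Y \mid X_i)\right)}{V(Y)}$$

The other is a factor’s total influence, inclusive of all interactions with other factors:

$$S_{Ti} = \frac{E_{\mathbf{X}_{\sim i}}\left(V_{X_i}(Y \mid \mathbf{X}_{\sim i})\right)}{V(Y)}$$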

Variance-based sensitivity measures
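As an illustration of how the first order and total effect indices can be estimated numerically, the sketch below uses a plain Monte Carlo scheme with two independent sample matrices and a "swap one column" design. The estimator formulas, the uniform [0, 1] inputs and the toy model are assumptions chosen for illustration; the slides do not say which estimator or software was used.

```python
import random

def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo estimates of first order (S_i) and total (S_Ti) indices.

    Assumes independent inputs, each uniform on [0, 1].
    """
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    fA = [model(row) for row in A]
    fB = [model(row) for row in B]
    mean = sum(fA + fB) / (2 * n_samples)
    var = sum((v - mean) ** 2 for v in fA + fB) / (2 * n_samples)
    first, total = [], []
    for i in range(n_inputs):
        # Rows of A with column i replaced by the corresponding entry of B.
        fABi = [model(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # First order: covariance-type estimator (outputs centred to reduce noise).
        v_i = sum((fb - mean) * (fab - fa)
                  for fa, fb, fab in zip(fA, fB, fABi)) / n_samples
        # Total effect: half the mean squared change when only X_i is resampled.
        vt_i = sum((fa - fab) ** 2 for fa, fab in zip(fA, fABi)) / (2 * n_samples)
        first.append(v_i / var)
        total.append(vt_i / var)
    return first, total

# Toy additive model Y = X1 + 2*X2: analytically S = ST = (0.2, 0.8).
first, total = sobol_indices(lambda x: x[0] + 2 * x[1], n_inputs=2)
print([round(s, 2) for s in first], [round(s, 2) for s in total])
```

For an additive model the two indices coincide; a total effect noticeably larger than the first order effect signals that the factor acts mainly through interactions.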

46

Variance-based sensitivity measures

Uncertainty and Sensitivity analysis techniques as tools for the quality assessment of composite indicators. Saisana, Saltelli, Tarantola (2005), Journal of the Royal Statistical Society - A, 168(2), 307-323.

In cases where partial overlapping between two countries occurs, the difference in the Index values for that pair of countries can be further analyzed via sensitivity analysis.

Expert Panel Opinion and Global Sensitivity Analysis for Composite Indicators. Saisana and Saltelli, Chapter 11 in: Computational Methods in Transport: Verification and Validation, Vol. 62, ISSN 1439-7358, Ed. Frank Graziani, Springer Berlin Heidelberg, 2008, pp. 251-275.

47


- which uncertain input affects the

difference Netherlands - Singapore?

48

Non-variance based sensitivity measures

Or maybe the RMSE

Results for the 2007 THES (88 universities, 70 models/scenarios)

49

Robustness can also be used in the process of building an index … not only to criticize an existing one!

Robustness analysis

http://wallpapers.free-review.net/r?12

50

1. HEI provide an array of services and positive externalities to society (universal education, innovation and growth, active citizens, capable entrepreneurs and administrators, etc.) which call for multi-dimensional measures of effectiveness and/or efficiency.

2. A clear statement of the purpose of any such measure is also needed, as measuring scientific excellence is not the same as measuring e.g. employability or innovation potential, or where to study, or how to reform the university system so as to increase the visibility of national universities.

Policy implications

51

3. Indicators and league tables are enough to start a discussion on higher education issues BUT not sufficient to conclude it.

4. The assigned university rank largely depends on the methodological assumptions made in compiling the two rankings.

• 9 in 10 universities shift over 10 positions in the 2008 SJTU.

• 92 positions (Univ Autonoma Madrid) and 277 positions (Univ Zaragoza) in Spain,

• 71 positions (Univ Milan) and 321 positions (Polytechnic Inst Milan) in Italy,

• 22 positions (Univ Paris 06) and 386 positions (Univ Nancy 1) in France.

Policy implications

52

5. THES ranking: less robust and less coherent than the SJTU ranking.

6. A multi-modeling approach can offer a representative picture of the classification of universities by ranking institutions in a range bracket, as opposed to assigning a specific rank which is not representative of the plurality of opinions on how to assess university performance.

7. The compilation of university rankings should always be accompanied by coherence tests & robustness analysis.

Policy implications

53

• “rankings are here to stay, and it is therefore worth the time and effort to get them right” (Alan Gilbert, Nature News, 2007)

• “because they define what “world-class” is to the broadest audience, these measures cannot be ignored by anyone interested in measuring the performance of tertiary education institutions” (Jamil Salmi, 2009)

Conclusions

54

1. Compensability among the dimensions of an index

2. Tradeoffs between the dimensions of an index

Extra

55

17.1) During the last 12 months, for how many months was your household’s main source of water sufficient to meet your household’s drinking, cooking, bathing and cleaning needs?

Months: Don’t remember (-1)

17.2) How often do you worry there will not be enough water from your household’s main water source to satisfy your household’s drinking, cooking, bathing and cleaning needs?

Never (1) Rarely (2) Sometimes (3) Often (4) Always (5)

Example: Multidimensional Poverty Assessment

Component: Domestic Water Supply, Subcomponent: Availability

104 HHs

8

3 HHs

4

Too worried? Careless?

Suggestion: Given some unavoidable inconsistencies (in part due to the way the human mind works), use a (weighted) arithmetic average of the indicators (rule of thumb: 5-10 could suffice) within a subcomponent to reduce this “measurement error”.

Compensability

56

Advantages of the geometric mean versus the arithmetic mean for the HDI:

1) it implies only partial compensability, i.e. poor performance in one HD dimension cannot be fully compensated by good performance in another,

2) it rewards balance by penalizing uneven performance between dimensions,

3) it encourages improvements in the weak dimensions, i.e. the lower the performance in a particular HD dimension, the more urgent it becomes to improve in that dimension.

Country | Life | Edu | GNI | stdev | HDI (arithmetic) | HDI 2011 (geometric) | Liberia’s improvement
Mali | .496 | .270 | .346 | .115 | .371 (176) | .359 (175) |
Liberia | .580 | .439 | .140 | .225 | .386 (175) | .329 (182) |
Option A | .680 | .439 | .140 | | .419 | .347 | 5.5%
Option B | .580 | .439 | .240 | | .419 | .394 | 19.8%

More on the geometric mean in the case of the HDI…

Compensability
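The HDI figures above can be reproduced with a few lines of Python. The sketch assumes equal weights across the three dimensions, which is consistent with the tabulated values; Option A raises Liberia's Life dimension by 0.1 and Option B raises its GNI dimension by 0.1, as in the table. Function names are illustrative.

```python
def arithmetic_mean(dims):
    return sum(dims) / len(dims)

def geometric_mean(dims):
    product = 1.0
    for d in dims:
        product *= d
    return product ** (1.0 / len(dims))

# Dimension scores (Life, Edu, GNI) taken from the table.
mali = [0.496, 0.270, 0.346]
liberia = [0.580, 0.439, 0.140]
option_a = [0.680, 0.439, 0.140]   # +0.1 on the strong dimension (Life)
option_b = [0.580, 0.439, 0.240]   # +0.1 on the weak dimension (GNI)

base = geometric_mean(liberia)
gain_a = geometric_mean(option_a) / base - 1
gain_b = geometric_mean(option_b) / base - 1

print("arithmetic:", round(arithmetic_mean(option_a), 3), round(arithmetic_mean(option_b), 3))
print("geometric: ", round(geometric_mean(option_a), 3), round(geometric_mean(option_b), 3))
print("relative gain A vs B:", round(gain_a, 3), round(gain_b, 3))
```

The arithmetic mean cannot tell the two options apart, while the geometric mean rewards the improvement in the weak dimension several times more. The slide's percentages appear to be computed from the rounded index values, so the unrounded gains differ slightly in the last digit.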

57

A paradox…

Country | X1 | X2 | I1 | I2 | Y | Rank
H | 2000 | 500 | 100 | 100 | 100 | 1
A | 160 | 435 | 8 | 87 | 47.5 | 2
B | 400 | 370 | 20 | 74 | 47.0 | 3
C | 640 | 305 | 32 | 61 | 46.5 | 4
D | 880 | 240 | 44 | 48 | 46.0 | 5
E | 1120 | 175 | 56 | 35 | 45.5 | 6
F | 1360 | 110 | 68 | 22 | 45.0 | 7
G | 1600 | 45 | 80 | 9 | 44.5 | 8

$$Y = 0.5\,I_1 + 0.5\,I_2$$

Tradeoffs

58

Country | X1 | X2 | I1 | I2 | Y | Rank | Previous rank
H | 2000 | 700 | 100 | 100 | 100 | 1 | 1
A | 160 | 435 | 8 | 62.14 | 35.07 | 8 | 2
B | 400 | 370 | 20 | 52.86 | 36.43 | 7 | 3
C | 640 | 305 | 32 | 43.57 | 37.79 | 6 | 4
D | 880 | 240 | 44 | 34.29 | 39.14 | 5 | 5
E | 1120 | 175 | 56 | 25 | 40.5 | 4 | 6
F | 1360 | 110 | 68 | 15.71 | 41.86 | 3 | 7
G | 1600 | 45 | 80 | 6.43 | 43.21 | 2 | 8

$$Y = 0.5\,I_1 + 0.5\,I_2$$

Only the best performer (H) improves, BUT the ranking gets completely reversed and country A is last as opposed to 2nd!

Tradeoffs
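The rank reversal can be replayed directly from the two tables. The sketch below assumes, as the tabulated values imply, that each indicator is the raw value expressed as a percentage of the best performer and that the index is the equal-weight average of the two indicators; function names are illustrative.

```python
def index_scores(x1, x2):
    """Normalise each variable to best = 100, then average with equal weights."""
    best1, best2 = max(x1), max(x2)
    return [0.5 * (100 * a / best1) + 0.5 * (100 * b / best2) for a, b in zip(x1, x2)]

def ranks(scores):
    """Rank 1 goes to the highest score (no ties in this example)."""
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    result = [0] * len(scores)
    for position, k in enumerate(order, start=1):
        result[k] = position
    return result

countries = ["H", "A", "B", "C", "D", "E", "F", "G"]
x1 = [2000, 160, 400, 640, 880, 1120, 1360, 1600]
x2_before = [500, 435, 370, 305, 240, 175, 110, 45]
x2_after = [700] + x2_before[1:]      # only H improves on X2

ranks_before = ranks(index_scores(x1, x2_before))
ranks_after = ranks(index_scores(x1, x2_after))
for c, before, after in zip(countries, ranks_before, ranks_after):
    print(c, before, "->", after)
```

Raising H's value of X2 shrinks every other country's normalised I2, which cuts the scores of the countries that relied on X2 the most. Countries A to G therefore swap order even though none of their raw values changed.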

59

How did that happen?

[Figure: scatterplot of X2 (axis 0 to 600) against X1 (axis 0 to 2500); r(X1, X2) = -0.26]

Tradeoffs

60

Be careful when aggregating dimensions of an index that are negatively associated with each other!

Tradeoffs

61

Prerequisites for any Index

A sound framework, carefully selected indicators and a sound model can help to depict reality reasonably well.

62

But even then…

A sound framework, carefully selected indicators and a sound model can only offer an imperfect mirror of reality.

Disclaimer: The example of “La Gioconda” has been adapted from a presentation of M. Carroll, Audit Director, AUQA for AIR, Tampa, 18-21 May, 2003, given in another context.

63

More at:

http://composite-indicators.jrc.ec.europa.eu

(first Google hit on “composite indicators” over the last 10 years!)

64

1. Paruolo P., Saisana M., Saltelli A., 2013, Ratings and Rankings: voodoo or science? J Royal Statistical Society A 176(2).

2. Saisana M., Saltelli A., 2012, JRC audit on the 2012 WJP Rule of Law Index. In Agrast M., Botero J., Martinez J., Ponce A., Pratt C., WJP Rule of Law Index® 2012. Washington, D.C.: The World Justice Project.

3. Saisana M., Philippas D., 2012, Sustainable Society Index (SSI): Taking societies’ pulse along social, environmental and economic issues, EUR 25578, Joint Research Centre, Publications Office of the European Union, Italy.

4. Saisana M., D’Hombres B., Saltelli A., 2011, Rickety Numbers: Volatility of university rankings and policy implications. Research Policy 40, 165–177.

5. Saisana M., Saltelli A., Tarantola S., 2005, Uncertainty and sensitivity analysis techniques as tools for the quality assessment of composite indicators. J Royal Statistical Society A 168(2), 307-323.

6. OECD/JRC, 2008, Handbook on Constructing Composite Indicators. Methodology and User Guide, OECD Publishing, ISBN 978-92-64-04345-9.

References and Related Reading
