81
The Google Scholar Revolution: a big data bibliometric tool Google Scholar Day: Changing current evaluation paradigms Cybermetrics Lab (IPP–CSIC) Madrid, 20 February 2017 Enrique Orduña-Malea, Alberto Martín-Martín, Juan M. Ayllón, Emilio Delgado López-Cózar EC3 Research Group-Scholar Division

The Google Scholar Revolution: a big data bibliometric tool

Embed Size (px)

Citation preview

Page 1: The Google Scholar Revolution:  a big data bibliometric tool

The Google Scholar Revolution:

a big data bibliometric tool

Google Scholar Day: Changing current evaluation paradigms

Cybermetrics Lab (IPP–CSIC) Madrid, 20 February 2017

Enrique Orduña-Malea, Alberto Martín-Martín, Juan M. Ayllón,

Emilio Delgado López-Cózar

EC3 Research Group-Scholar Division

Page 2: The Google Scholar Revolution:  a big data bibliometric tool

EC3 Research Group – Google Scholar Division

Page 3: The Google Scholar Revolution:  a big data bibliometric tool

se

arc

h e

ng

ine

Bib

lio

gra

ph

ic s

ea

rch

to

ol

Bib

liom

etric

too

l The two faces of Google Scholar

Page 4: The Google Scholar Revolution:  a big data bibliometric tool

The google search familly. Its goal: finding information

Scholar Search

Scholar Citations

Scholar Metrics

Page 5: The Google Scholar Revolution:  a big data bibliometric tool

Simple

Easy

Fast

Easy to understand and use

Universal, international, global

Multilingual

Free

Why is it successful?

Page 6: The Google Scholar Revolution:  a big data bibliometric tool

2012 2011

2004

Google’s incursion in Bibliometrics

Page 7: The Google Scholar Revolution:  a big data bibliometric tool

7

Studying it from the bibliometric perspective:

EC3-Scholar Division

Page 8: The Google Scholar Revolution:  a big data bibliometric tool

Opening the academic Pandora’s Box 2008-

Page 9: The Google Scholar Revolution:  a big data bibliometric tool

9

Journals Authors Publishers

Multifaceted model

Library & Information Sciences (Spain)

http://www.biblioteconomia-documentacion-española.infoec3.es

Bibliometrics & Scientometrics (International)

http://www.scholar-mirrors.infoec3.es

What have we analyzed? Intensive, extensive, and obsessive work

Conferences

Page 10: The Google Scholar Revolution:  a big data bibliometric tool
Page 11: The Google Scholar Revolution:  a big data bibliometric tool

11

Page 12: The Google Scholar Revolution:  a big data bibliometric tool

12

• Publication data about 49,930 A&H and SS professors working in public

Spanish universities was extracted from Google Scholar in 2012

• Only authors in the first tercile are displayed

• 68 discipline rankings (49 in Social Sciences and Law, 39 in Arts and

Humanities)

Page 13: The Google Scholar Revolution:  a big data bibliometric tool

13

Indicators

Page 14: The Google Scholar Revolution:  a big data bibliometric tool

14

Collecting data Max. number of hits per page

February 2013: from 100 to 20

Page 15: The Google Scholar Revolution:  a big data bibliometric tool

15

Page 16: The Google Scholar Revolution:  a big data bibliometric tool

16

Page 17: The Google Scholar Revolution:  a big data bibliometric tool

17

Sample of highly cited books (top 3%)

published by ~49k A&H and SS

professors working

in public Spanish universities

Data collected from Google Scholar in

2012 (n ~ 7203)

68 discipline rankings

(49 in Social Sciences and Law,

39 in Arts and Humanities)

Page 18: The Google Scholar Revolution:  a big data bibliometric tool

18

Indicators: Nº of books, and sum

of citations (relative to highest

element in the ranking)

Page 19: The Google Scholar Revolution:  a big data bibliometric tool

19

Page 20: The Google Scholar Revolution:  a big data bibliometric tool
Page 21: The Google Scholar Revolution:  a big data bibliometric tool

Indicators

H5 Index

H5 Median

Page 22: The Google Scholar Revolution:  a big data bibliometric tool

22

Spanish Journals ± 2.500

SJR-Scopus

506

JCR

114 Google Scholar Metrics

1299

Page 23: The Google Scholar Revolution:  a big data bibliometric tool

23

Page 24: The Google Scholar Revolution:  a big data bibliometric tool

24

Page 25: The Google Scholar Revolution:  a big data bibliometric tool

25

Page 26: The Google Scholar Revolution:  a big data bibliometric tool

26

Page 27: The Google Scholar Revolution:  a big data bibliometric tool

27

Indicators

Extracted

directly from

Google Scholar

Metrics

Computed using the article and

citation data available in Google

Scholar Metrics

H Index of

documents

published in the

last 5 years

Median of

citation counts

for articles

published in last

5 years

Sum of

citations for

articles above

h5-index

threshold

Page 28: The Google Scholar Revolution:  a big data bibliometric tool

28

Classification

Core Related

Page 29: The Google Scholar Revolution:  a big data bibliometric tool

29

Coverage

IMPORTANT: Google Scholar Metrics

only covers journals that are indexed in

Google Scholar, have published at

least 100 articles in the last 5-year

period, and have received at least 1

citation

Page 30: The Google Scholar Revolution:  a big data bibliometric tool

30

PROCEEDINGS

Page 31: The Google Scholar Revolution:  a big data bibliometric tool

31

Page 32: The Google Scholar Revolution:  a big data bibliometric tool

32 Data sources

Citations

Multifaceted model

Page 33: The Google Scholar Revolution:  a big data bibliometric tool

33

LIS researchers

in Spain 336 authors in GSC

68 not in GSC

Other sources ResearcherID (WoS)

ResearchGate

Indicators Sum of citations

H Index

Nº of documents

RG Score

Impact Points

Aggregating data Highly cited docs (HCD),

% of HCD by journal, book

publisher, and institution

Page 34: The Google Scholar Revolution:  a big data bibliometric tool

34

The «Mirrors» approach

There are many platforms that reflect (mirror) scientific activity

on the Web. An inclusive study of the impact of scientific

activity must contemplate as many of them as possible.

Page 35: The Google Scholar Revolution:  a big data bibliometric tool

35

Page 36: The Google Scholar Revolution:  a big data bibliometric tool

Bibliometric potential of Google Scholar

We have proved

Yes, we can

What do we know about Google Scholar?

Its strengths and weaknesses

Page 37: The Google Scholar Revolution:  a big data bibliometric tool
Page 38: The Google Scholar Revolution:  a big data bibliometric tool

38

It’s the most used academic

search engine

Page 39: The Google Scholar Revolution:  a big data bibliometric tool

Why do we call it a big data bibliometric tool?

Source

COVERAGE

All documents

Geographic

COVERAGE

All countries

Linguistic

COVERAGE

All languages

Discipline

COVERAGE

All of them

GROWTH

Faster than WoS and

Scopus

FAST TRACK

CITATIONS

Newly published

documents indexed in

a matter of days

Big Data

Big Data

SIZE Largest bibliographic database in the world

Page 40: The Google Scholar Revolution:  a big data bibliometric tool

The search engine with the largest coverage

Size matters

Orduña-Malea, E., Ayllón, J. M., Martín-Martín, A., Delgado López-Cózar, E.. (2014). About the size of Google

Scholar: playing the numbers. arXiv preprint arXiv:1407.6239. EC3 Working Papers 18

Orduna-Malea, E., Ayllón, J. M., Martín-Martín, A., Delgado López-Cózar, E. Methods for estimating the size of Google

Scholar. Scientometrics 104 (3), 931-949

2015

Page 41: The Google Scholar Revolution:  a big data bibliometric tool

The search engine with the largest coverage

Size matters

2017

Nº documents

Page 42: The Google Scholar Revolution:  a big data bibliometric tool

2017

Page 43: The Google Scholar Revolution:  a big data bibliometric tool
Page 44: The Google Scholar Revolution:  a big data bibliometric tool

Larger coverage, larger citation graph

0

0,5

1

1,5

2

2,5

3

Art & Humanities Social Sciences Sciences TOTAL

Ra

tio

of

GS

Cit

ati

on

s t

o W

oS

Cit

ati

on

s (

avg

)

Documents published in 2009 Documents published in 2014

Analysis of most documents with a DOI published in 2009 and 2014 covered by

Web of Science (~2.5 million documents)

Page 45: The Google Scholar Revolution:  a big data bibliometric tool

45

International

Page 46: The Google Scholar Revolution:  a big data bibliometric tool

46

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

18

00

18

05

18

10

18

15

18

20

18

25

18

30

18

35

18

40

18

45

18

50

18

55

18

60

18

65

18

70

18

75

18

80

18

85

18

90

18

95

19

00

19

05

19

10

19

15

19

20

19

25

19

30

19

35

19

40

19

45

19

50

19

55

19

60

19

65

19

70

19

75

19

80

19

85

19

90

19

95

20

00

20

05

20

10

20

15

Proportion of documents covered by Google Scholar by language over the years

Turkish

Dutch

Chinese

Polish

Korean

Italian

Japanese

Portuguese

Spanish

French

Simplified

Chinese

German

English

Page 47: The Google Scholar Revolution:  a big data bibliometric tool

47

Multilingual Documents covered by Google Scholar, Web of Science & Scopus by 13 languages

(1800-2016)

Page 48: The Google Scholar Revolution:  a big data bibliometric tool

48

Multilingual

Page 49: The Google Scholar Revolution:  a big data bibliometric tool

49

Covers all

document

typologies

Page 50: The Google Scholar Revolution:  a big data bibliometric tool

Th

e a

lte

rna

tive

Page 51: The Google Scholar Revolution:  a big data bibliometric tool

51

Google Scholar offers a different vision of scientific production

Page 52: The Google Scholar Revolution:  a big data bibliometric tool

Web of Science documents (2009/2014) found in Google Scholar

found in GS

96%

not found in

GS

4%

96% of the searched WoS documents were found in GS. 98% if we only consider

journal articles. The rest might have been found as well if alternative search strategies

had been used.

Page 53: The Google Scholar Revolution:  a big data bibliometric tool

Martín-Martín, A., Orduña-Malea, E., Ayllón, J. M., Delgado López-Cózar, E. (2014). Does Google Scholar contain all highly cited

documents (1950-2013)?. arXiv preprint arXiv:1410.8464.

Highly cited documents in Google Scholar (1950-2013)

Half of them are not in WoS

The ones who are in WoS: very high citation correlation

Page 54: The Google Scholar Revolution:  a big data bibliometric tool

54

Confirmation

PRELIMINARY RESULTS:

Analysis of most documents with a DOI published in 2009 covered by Web

of Science (~1 million documents)

Citation Index N spearman.cor p.value prop.cited.gs prop.cited.wos ratio of gs_cit to wos_cit (avg)

Sciences 863801 0,94 0,00 0,97 0,95 1,68

Social Sciences 109232 0,90 0,00 0,97 0,94 2,58

Art & Humanities 13487 0,83 0,00 0,84 0,69 2,52

Sciences Social Sciences Art & Humanities

Page 55: The Google Scholar Revolution:  a big data bibliometric tool

Logical when you see their sources

elsevier.com 7.200.000

wiley.com 4,590,000

springer 3.290.000

taylor and francis 1.440.000

lww.com 1.030.000

sagepub.com 781.000

nature.com 428.000

bmj.com 370.000

Routledge 293.000

karger.com 188.000

degruyter.com 196.000

biomedcentral.com 121.000

liebertpub.com 90.900

emerald 167.000

books.google.com 14.000.000

ieee.org 2.990.000

jstor.org 2.210.000

acs.org 987.000

cambridge.org 893.000

oxfordjournals.org 872.000

acm.org 472.000

iop.org 462000

aip.org 451.000

arxiv.org 355.000

pnas.org 101.000

ams.org 98.000

sciencemag.org 62.600

nber.org 26.900

Scopus 2009

Web of Science

Google Scholar

Page 56: The Google Scholar Revolution:  a big data bibliometric tool

56

It has a better coverage of areas like Humanities, Social Sciences, Engineering…

Page 57: The Google Scholar Revolution:  a big data bibliometric tool

GS measures scientific impact and more

Scientific Professional

Educational Social

Page 58: The Google Scholar Revolution:  a big data bibliometric tool

What impact does it measure?

Page 59: The Google Scholar Revolution:  a big data bibliometric tool

A professional journal in Google Scholar

Page 60: The Google Scholar Revolution:  a big data bibliometric tool

We question ourselves

Drawbacks Google Scholar

Page 61: The Google Scholar Revolution:  a big data bibliometric tool

61

Lack of transparency

There isn’t a public master list of the sources

Google Scholar indexes (publishers,

repositories, catalogues, bibliographic

databases and repertoires, aggregators,

journals…)

There is no accurate method to estimate the

size of Google Scholar

Page 62: The Google Scholar Revolution:  a big data bibliometric tool

62

Similarly, even if a document has received more than 1000 citations, only the first 1000 can be displayed

when clicking the “Cited by” link

We have no control over the results we get

Are the relevant results for my needs among those 1000 results?

Usually yes… thanks to the ranking algorithm they use

Only(!) returns 1000 results for any given query

Is this really a bibliographic problem? Who is interesed in bibliographic searches of that size?

Weaknesses

Page 63: The Google Scholar Revolution:  a big data bibliometric tool

63

How does Google Scholar rank results?

Page 64: The Google Scholar Revolution:  a big data bibliometric tool

64

There is no native method to easily extract bibliographic data massively. Only one by one.

Weaknesses

There is no API. What can we use for large downloads?

A scraper

Page 65: The Google Scholar Revolution:  a big data bibliometric tool

65

Weaknesses

Limited filtering and sorting options (year and relevance)

Page 66: The Google Scholar Revolution:  a big data bibliometric tool

66

Weaknesses

The advanced search form is limited to four search dimensions:

keywords, authors, source of publication (journal, conference…),

and year of publication

Page 67: The Google Scholar Revolution:  a big data bibliometric tool

67

No quality control of sources indexed. Peer-reviewed documents coexist with documents that haven’t gone through that process.

Is that really

a problem?

But… Google Scholar also shows which documents are covered by the Web of Science, and which of them are available from your library. YOUR CHOICE…

Weaknesses

It shows

richness rather

than a flaw

Page 68: The Google Scholar Revolution:  a big data bibliometric tool

68

Weaknesses

• Institutional affiliation of the authors of the documents is

available (institution, country)

• The language in which documents are written

• The typology of each document is not clear (book, journal

article, conference communication, thesis, report…). Only

books are marked as such, usually when they have been

found on Google Books

• Not all documents have an abstract

• The author-supplied keywords are not available

• The list of cited references in each article is not available

either

It doesn’t offer information regarding

Page 69: The Google Scholar Revolution:  a big data bibliometric tool

69

Greatest danger: manipulation

Delgado López‐Cózar, E., Robinson‐García, N., Torres‐Salinas, D. (2014). The Google

Scholar experiment: How to index false papers and manipulate bibliometric indicators.

Journal of the Association for Information Science and Technology, 65(3), 446-454.

Page 70: The Google Scholar Revolution:  a big data bibliometric tool

70

Errors in the data

Enough quality?

Even with «dirty» data,

it measures more and

better

Large units of analysis: no problem

Individuals: check data first

Page 71: The Google Scholar Revolution:  a big data bibliometric tool

71

Page 72: The Google Scholar Revolution:  a big data bibliometric tool

The Googledependency

Page 73: The Google Scholar Revolution:  a big data bibliometric tool

Drawbacks:

Google Scholar Metrics

Page 74: The Google Scholar Revolution:  a big data bibliometric tool

74

Page 75: The Google Scholar Revolution:  a big data bibliometric tool

75

Page 76: The Google Scholar Revolution:  a big data bibliometric tool

Drawbacks:

Google Scholar Citations

Page 77: The Google Scholar Revolution:  a big data bibliometric tool

77

Page 78: The Google Scholar Revolution:  a big data bibliometric tool

78

Page 79: The Google Scholar Revolution:  a big data bibliometric tool

79

Page 80: The Google Scholar Revolution:  a big data bibliometric tool

80

Google Scholar Citations:

Laissez faire laissez passer

Be wary of CUT / COPY – BUTTON / COMAND

bibliometric products

Don’t mix apples and oranges

Page 81: The Google Scholar Revolution:  a big data bibliometric tool

A final consideration… To what end are we measuring scientific

activity?

Spreading light where there was

darkness

Thank you very

much!