Thed van Leeuwen, Tung Tung Chan, Ingeborg Meijer January – 2019 EN Expanding data sources for the measurement of Open Science Open Science Monitor Case Study


Thed van Leeuwen, Tung Tung Chan, Ingeborg Meijer January – 2019

EN

Expanding data sources for the measurement of Open Science

Open Science Monitor Case Study


European Commission Directorate-General for Research and Innovation Directorate A — Policy Development and Coordination Unit A.2 — Open Data Policy and Science Cloud E-mail [email protected] [email protected] European Commission B-1049 Brussels

Manuscript completed in January 2019.

This document has been prepared for the European Commission however it reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein.

More information on the European Union is available on the internet (http://europa.eu).

Luxembourg: Publications Office of the European Union, 2019

EN PDF ISBN 978-92-76-00901-6 doi: 10.2777/339230 KI-03-19-167-EN-N

© European Union, 2019. Reuse is authorised provided the source is acknowledged. The reuse policy of European Commission documents is regulated by Decision 2011/833/EU (OJ L 330, 14.12.2011, p. 39).

For any use or reproduction of photos or other material that is not under the EU copyright, permission must be sought directly from the copyright holders.



Table of Contents

1 Introduction
2 Methodology
2.1 Our approach
2.2 Sources of Open Access evidence
3 Results of the analysis
4 Conclusions
References
Appendix: Full data table underlying the figures in the analysis


ACKNOWLEDGEMENTS

Disclaimer: The information and views set out in this study report are those of the author(s) and do not necessarily reflect the official opinion of the Commission. The Commission does not guarantee the accuracy of the data included in this case study. Neither the Commission nor any person acting on the Commission’s behalf may be held responsible for the use which may be made of the information contained therein.

This case study is part of the Open Science Monitor, led by the Lisbon Council together with CWTS, ESADE and Elsevier.

Authors

Thed van Leeuwen – CWTS

Tung Tung Chan – CWTS

Ingeborg Meijer – CWTS


STUDY ON OPEN SCIENCE: MONITORING TRENDS AND DRIVERS (Reference: PP-05622-2017)


EXPANDING DATA SOURCES FOR THE MEASUREMENT OF OPEN SCIENCE

Comparing the Dutch case using Scopus and WoS

1 Introduction

On July 26, 2018, Elsevier's Scopus database joined Clarivate Analytics' Web of Science (WoS) database as a paying subscriber to the Unpaywall Data Feed. Unpaywall is a free service created by Impactstory that locates open-access articles. The integration of Unpaywall data means that both scientific search engines can now provide users with up-to-date access to legal versions of Gold and Green content, which substantially increases the availability of scientific knowledge. Scopus and WoS have since integrated document-level OA data from the Unpaywall database to identify and tag OA peer-reviewed journal articles. In this case study, we disclose our methodology and approach to tracking the open access status of research output in the Open Science Monitor.

Recently, debates have arisen about the data underlying the Open Science Monitor. One of the issues raised concerned the selection of data, as provided by Elsevier Science to the Consortium conducting the studies underlying the Open Science Monitor. In the Open Science Monitor, the Elsevier Scopus database was selected for the analysis of Open Access versus non-Open Access publishing behaviour among the countries selected in the study. In this first quantitative case study we present a comparison between the outcomes of the first analysis of OA publishing as performed for the Open Science Monitor and the results of a study CWTS has performed for another European project. An important function of this comparison is to show the effect of having a more pluralist selection of data sources for monitoring OA publishing. This other European project is the KTD project, which stands for Key Technology Domains, and involves CWTS delivering bibliometric statistics at country level. For this particular case study, we select only the Netherlands, partly because this aligns with the more qualitative case study focusing on the Dutch National Plan for Open Science.

2 Methodology

2.1 Our approach

The methodological approach that we propose mainly focuses on adding different OA labels to the complete in-house version of the Web of Science database (period 2009-2016), using various data sources to establish this OA status (see also van Leeuwen et al, 2017). The basic principles for this OA label are sustainability and legality. By sustainability we mean that it should, in principle, be possible to reproduce the OA labelling from the various sources used, again and again, in an open fashion, with a relatively limited risk of the source disappearing behind a paywall. The second principle, legality, relates to the use of data sources that represent legal OA evidence for publications, excluding rogue or illegal OA publications and copyright breaches by authors who post papers in places where they do not belong. As such, our method does not include articles viewed by users of Unpaywall, an open-source browser extension that allows users to find OA articles using the oaDOI.

As main data sources we used:

• the DOAJ list (Directory of Open Access Journals) [https://doaj.org/],

• the ROAD list (Directory of Open Access scholarly Resources) [http://road.issn.org/],

• PMC (PubMed Central) [https://www.ncbi.nlm.nih.gov/pmc/],


• CrossRef [https://www.crossref.org/], and

• OpenAIRE [https://www.openaire.eu/]

All these sources fulfil the above-mentioned requirements, while other popular ‘apparent’ OA sources such as ResearchGate and SciHub fail to meet these two principal requirements. It is therefore important to highlight that our approach takes a policy and strategy perspective rather than a utilitarian one. In other words, our approach aims to report the number and share of sustainable and legal OA publications (i.e. publications that have been published in OA journals or archived in official and legal repositories), instead of merely identifying publications whose full text can be retrieved online (regardless of the source or the legal status of the access to the publication).

2.2 Sources of Open Access evidence

The sources that were mentioned above were fully downloaded (as provided by the original sources) using their public Application Programming Interface (API). The obtained metadata has been parsed and incorporated into an SQL environment in the form of relational databases.
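As a minimal sketch of the parse-and-load step described above, the snippet below reads a small JSON payload and stores it in a relational table. Everything concrete here is an assumption made for illustration: the field names (`source`, `issn`, `title`), the table name `oa_journal`, the sample ISSNs and the use of an in-memory SQLite database. It does not reproduce the actual API responses of the sources or the SQL environment used at CWTS.

```python
import json
import sqlite3

# Hypothetical payload: the field names and values are illustrative only and
# do not reflect the real response format of DOAJ, ROAD or any other source.
payload = json.dumps([
    {"source": "DOAJ", "issn": "1234-5678", "title": "Journal A"},
    {"source": "ROAD", "issn": "8765-4321", "title": "Journal B"},
    {"source": "ROAD", "issn": "1234-5678", "title": "Journal A"},
])

# An in-memory database stands in for the SQL environment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oa_journal ("
    " source TEXT NOT NULL,"
    " issn TEXT NOT NULL,"
    " title TEXT,"
    " PRIMARY KEY (source, issn))"
)

# Parse the metadata and store ISSNs without the hyphen so that later
# joins do not depend on formatting differences between sources.
rows = [
    (rec["source"], rec["issn"].replace("-", ""), rec["title"])
    for rec in json.loads(payload)
]
conn.executemany("INSERT INTO oa_journal VALUES (?, ?, ?)", rows)
conn.commit()

# Count how many source lists mention each ISSN.
sources_per_issn = dict(
    conn.execute(
        "SELECT issn, COUNT(*) FROM oa_journal GROUP BY issn ORDER BY issn"
    ).fetchall()
)
print(sources_per_issn)
```

Keeping one row per source and identifier, rather than one merged row per journal, preserves which source supplied the OA evidence.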

• DOAJ A first source we used is the DOAJ list of OA journals. This list was linked to the WoS database on the basis of the ISSN code available in both the DOAJ list and the WoS database.

• ROAD A next source used to add labels to the WoS database is the ROAD list. ROAD has been developed with the support of UNESCO and is related to the ISSN International Centre. The list provides access to a subset of the ISSN Register. This subset comprises bibliographic records which describe scholarly resources in OA identified by an ISSN: journals, monographic series, conference proceedings and academic repositories. The linking of the ROAD list is based upon the ISSN code available in both WoS and the ROAD list.

• CrossRef A third source used to establish an Open Access status is CrossRef. The linking was based upon the DOIs available in both systems.

• PubMed Central A fourth source used is the PubMed Central (PMC) database. The linking is done in two ways: the first is based upon the DOIs available in both the PMC database and the WoS database, while the second is based upon the PMID code (where PMID stands for PubMed ID) available in both databases.

• OpenAIRE A fifth and final data source used to add OA labels to the WoS database is the OpenAIRE database. OpenAIRE is a European database that aggregates metadata on OA publications from multiple institutional repositories (mostly in Europe), including thematic repositories such as arxiv.org. The matching is done in two different ways: first, by using the DOIs or PMIDs available in both OpenAIRE and WoS; and second, by fuzzy matching of diverse bibliographic metadata in both WoS and OpenAIRE (including articles’ titles, publication years and other bibliographic characteristics). This methodology is similar to the methodology for citation matching employed at CWTS (Olensky et al. 2016).
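The linking steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used at CWTS: the records, ISSNs and DOIs are invented, the mapping of journal lists to a "Gold" label and repository matches to a "Green" label (with Gold taking precedence) is our assumption, and the title comparison with `difflib.SequenceMatcher` and a 0.9 threshold is only a stand-in for the fuzzy matching of Olensky et al. (2016). CrossRef is left out because the text does not state which label a CrossRef match yields.

```python
from difflib import SequenceMatcher


def norm_issn(issn):
    """Normalise an ISSN by dropping the hyphen and upper-casing."""
    return issn.replace("-", "").strip().upper()


def norm_doi(doi):
    """Lower-case a DOI and strip a resolver or 'doi:' prefix if present."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def titles_match(a, b, threshold=0.9):
    """Stand-in fuzzy comparison of two titles (threshold is an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def oa_label(pub, oa_issns, repo_records):
    """Assumed rule: journal-list match gives Gold, repository match gives Green."""
    if pub.get("issn") and norm_issn(pub["issn"]) in oa_issns:
        return "Gold"
    for rec in repo_records:
        # Exact identifier matches first (DOI, then PMID).
        if pub.get("doi") and rec.get("doi") and norm_doi(pub["doi"]) == norm_doi(rec["doi"]):
            return "Green"
        if pub.get("pmid") and pub.get("pmid") == rec.get("pmid"):
            return "Green"
        # Fallback: same publication year and near-identical title.
        if pub["year"] == rec["year"] and titles_match(pub["title"], rec["title"]):
            return "Green"
    return "not OA"


# Invented example data; none of these identifiers refer to real records.
oa_issns = {norm_issn("1234-5678")}
repo_records = [
    {"doi": "10.1234/example.2", "pmid": None, "title": "A study of repositories", "year": 2014},
    {"doi": None, "pmid": "100001", "title": "Another deposited paper", "year": 2015},
    {"doi": None, "pmid": None, "title": "Open Access in the Netherlands.", "year": 2016},
]
pubs = [
    {"issn": "1234-5678", "doi": "10.1234/example.1", "pmid": None, "title": "Paper in an OA journal", "year": 2013},
    {"issn": "1111-2222", "doi": "https://doi.org/10.1234/EXAMPLE.2", "pmid": None, "title": "A study of repositories", "year": 2014},
    {"issn": "1111-2222", "doi": None, "pmid": None, "title": "Open access in the Netherlands", "year": 2016},
    {"issn": "1111-2222", "doi": None, "pmid": None, "title": "A paywalled paper", "year": 2016},
]

labels = [oa_label(pub, oa_issns, repo_records) for pub in pubs]
print(labels)
```

Trying exact identifiers before the fuzzy comparison keeps the cheaper and less error-prone matches first, so the title comparison only decides cases where no shared DOI or PMID is available.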


3 Results of the analysis

Below we will show the outcomes of the comparison in four figures. These figures present both the absolute numbers of publications for the Netherlands in total, the overall OA tagged number of publications, as well as the Green and Gold numbers of publications. This is shown for the Scopus based analysis (in Figure 1a) as well as for the WoS based analysis (Figure 2a). Next, we show the percentages of the OA, the Gold and the Green as shares of the total. So, each section is computed as compared to the total numbers of publications. This is again done for the Scopus based analysis (Figure 1b), as well as the WoS based analysis (in Figure 2b). The analysis is limited to the period 2009-2016, as the KTD study does not provide data for the year 2017 yet.
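The share computation described above (each category divided by the total number of publications) can be sketched with the Scopus counts from the appendix. The function and variable names are ours, and rounding to whole percentages is an assumption about how the published shares were derived.

```python
# Scopus counts for the Netherlands, 2009-2016, taken from the appendix table.
years = list(range(2009, 2017))
total = [47087, 49403, 51115, 55622, 55933, 57081, 56674, 55579]
counts = {
    "All OA": [11639, 14053, 15008, 16753, 16395, 17546, 18516, 18309],
    "Gold": [2015, 2482, 3056, 3681, 4050, 4959, 5532, 6129],
    "Green": [9624, 11571, 11952, 13072, 12345, 12587, 12984, 12180],
}


def shares(part, whole):
    """Return each count as a whole-number percentage of the matching total."""
    return [round(100 * p / w) for p, w in zip(part, whole)]


share_table = {label: shares(values, total) for label, values in counts.items()}

# Print one row per category, one column per year.
print(" " * 7 + "".join(f"{year:>6}" for year in years))
for label, row in share_table.items():
    print(f"{label:<7}" + "".join(f"{value:>5}%" for value in row))
```

With these inputs the rounded values coincide with the Scopus percentage rows in the appendix.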

Figure 1a: Output development of OA tagged publications (Art-Rev) in Scopus for the Netherlands, 2009-2016

Figure 1b: Shares of the output of OA tagged publications (Art-Rev) in Scopus for the Netherlands, 2009-2016

In Figure 1a, the absolute numbers are shown, which form the basis for Figure 1b. In Figure 1b, the overall share of OA publications for the Netherlands increases from 25% to 33%. We observe a slight fluctuation around 2013, after which the increase continues. The increase of the OA output share is mainly due to the increase of Gold OA publishing, while the fluctuation described above is due to the decrease of Green OA publishing from 2012 onwards.

In Figure 2a, the absolute numbers of publications are shown, based on WoS data. These form the basis for Figure 2b. In Figure 2b, we observe an increase in the share of Dutch publications published in OA format from 27% in 2009 to 32% in 2016. This development shows some relapses, so it is not a continuous increase over the period 2009-2016. The decrease in the share of overall OA tagged publications is caused by the Green OA share of the Dutch output, as measured through WoS data, while the Gold share of the Dutch OA output shows continuous growth, similar to what is depicted in Figure 1b. The decrease in the share of Green OA output in WoS calls for further research; the fact that this decrease is not visible in Figure 1b, based on Scopus data, also requires further scrutiny of these data.

Figure 2a: Output development of OA tagged publications (Art-Rev) in WoS for the Netherlands, 2009-2016

Figure 2b: Shares of the output of OA tagged publications (Art-Rev) in WoS for the Netherlands, 2009-2016


4 Conclusions

The results of the two studies, which use two different datasets, show differences in the absolute numbers, as Scopus has somewhat wider coverage than WoS. This has been widely reported (see for example Archambault et al, 2009, Lopez-Illescas et al, 2009, Ball & Tunger, 2006) and is in itself not a topic of debate at this moment. In this case study we followed the exact same method of tagging publications in both Scopus and WoS, so the starting point is similar. The patterns of the shares of OA publishing shown in the two studies lead to the conclusion that it makes little difference which data set is used to monitor OA uptake and publishing. If one focuses on the final point of measurement, that is, 2016, we observe nearly identical shares of overall OA output (33% versus 32% in Scopus and WoS, respectively), of Green OA (22% in both datasets), and of Gold OA (11% versus 10% in Scopus and WoS, respectively). Further comparative analyses like this will be made, comparing Scopus with WoS as well as data sets that contain social media metrics (PlumX and Altmetric.com).
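The end-point comparison can be checked directly against the 2016 counts in the appendix. The sketch below computes the unrounded shares for both datasets and the gap between them in percentage points; the names `counts_2016`, `share` and `gaps` are ours and serve only as illustration.

```python
# 2016 publication counts for the Netherlands, taken from the appendix table.
counts_2016 = {
    "Scopus": {"All": 55579, "All OA": 18309, "Gold": 6129, "Green": 12180},
    "WoS": {"All": 50184, "All OA": 16218, "Gold": 5148, "Green": 11070},
}


def share(dataset, label):
    """Share of one category in all publications of a dataset, in percent."""
    row = counts_2016[dataset]
    return 100 * row[label] / row["All"]


# Gap between the two datasets in percentage points (Scopus minus WoS).
gaps = {
    label: share("Scopus", label) - share("WoS", label)
    for label in ("All OA", "Gold", "Green")
}

for label, gap in gaps.items():
    print(
        f"{label:<7} Scopus {share('Scopus', label):5.1f}%  "
        f"WoS {share('WoS', label):5.1f}%  gap {gap:+.2f} pp"
    )
```

On these counts every gap stays below one percentage point, which is what the "nearly identical" reading of the 2016 shares rests on.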


References

Archambault, E., Campbell, D., Gingras, Y., & Lariviere, V. (2009). Comparing bibliometric statistics obtained from the Web of Science and Scopus. Journal of the American Society for Information Science and Technology, 60(7), 1320-1326. doi: 10.1002/asi.21062

Ball, R., & Tunger, D. (2006). Science indicators revisited - Science Citation Index versus Scopus: A bibliometric comparison of both citation databases. Information Services & Use, 26(4), 293-301.

Lopez-Illescas, C., Anegon, F.D., & Moed, H.F. (2009). Comparing bibliometric country-by-country rankings derived from the Web of Science and Scopus: the effect of poorly cited journals in oncology. Journal of Information Science, 35(2), 244-256. doi: 10.1177/0165551508098603

Olensky, M., Schmidt, M., & Van Eck, N.J. (2016). Evaluation of the Citation Matching Algorithms of CWTS and iFQ in Comparison to the Web of Science. Journal of the Association for Information Science and Technology, 67(10), 2550-2564. doi: 10.1002/asi.23590

Van Leeuwen, T.N., Meijer, I., Yegros-Yegros, A., & Costas, R. (2017). Developing indicators on Open Access by combining evidence from diverse data sources. Paper presented at the 2017 Science & Technology Indicators Conference, 6-8 September, Paris, France (arXiv:1802.02827).


Appendix: Full data table underlying the figures in the analysis

Scopus: number of publications

         2009   2010   2011   2012   2013   2014   2015   2016
All     47087  49403  51115  55622  55933  57081  56674  55579
All OA  11639  14053  15008  16753  16395  17546  18516  18309
Gold     2015   2482   3056   3681   4050   4959   5532   6129
Green    9624  11571  11952  13072  12345  12587  12984  12180

Scopus: share of all publications

         2009   2010   2011   2012   2013   2014   2015   2016
All OA    25%    28%    29%    30%    29%    31%    33%    33%
Gold       4%     5%     6%     7%     7%     9%    10%    11%
Green     20%    23%    23%    24%    22%    22%    23%    22%

WoS: number of publications

         2009   2010   2011   2012   2013   2014   2015   2016
All     39698  42069  44444  47582  50092  50697  50689  50184
All OA  10650  12708  13807  15223  14717  14779  15877  16218
Gold     1518   1963   2394   2836   3395   4124   4498   5148
Green    9132  10745  11413  12387  11322  10655  11379  11070

WoS: share of all publications

         2009   2010   2011   2012   2013   2014   2015   2016
All OA    27%    30%    31%    32%    29%    29%    31%    32%
Gold       4%     5%     5%     6%     7%     8%     9%    10%
Green     23%    26%    26%    26%    23%    21%    22%    22%


Debates have arisen about the data underlying the Open Science Monitor (OSM), as the Elsevier Scopus database was used for the analysis of Open Access versus non-Open Access publishing behaviour for the countries selected in the study. This quantitative case study compares the outcomes of the first analysis of OA publishing as performed for the OSM with the results of a study CWTS has performed for another European project, Key Technology Domains (KTD). The results show that while the two datasets differ in absolute numbers, the patterns of the shares of OA publishing shown in the two studies are more or less the same. Both datasets can be used to monitor OA uptake and publishing. Further comparative analyses will be conducted to show the effect of having a more pluralist selection of data sources for monitoring OA publishing.
