Maryann E. Martone, Ph. D. Executive Director Professor of Neuroscience, University of California, San Diego Future of Research Communications and E-Scholarship Creating a data and tools ecosystem

FORCE11: Creating a data and tools ecosystem


DESCRIPTION

Describes FORCE11 and its recent successes through two working groups: the Data Citation Synthesis Group and the Resource Identification Initiative


Page 1: FORCE11:  Creating a data and tools ecosystem

Maryann E. Martone, Ph.D.

Executive Director

Professor of Neuroscience, University of California, San Diego

Future of Research Communications and E-Scholarship

Creating a data and tools ecosystem

Page 2: FORCE11:  Creating a data and tools ecosystem

What is FORCE11?

Future of Research Communications and E-Scholarship: A grassroots effort to accelerate the pace and nature of scholarly communications and e-scholarship through technology, education and community

Why 11? We were born in 2011 in Dagstuhl, Germany

Principles laid out in the FORCE11 Manifesto

FORCE11 launched in July 2012

Page 3: FORCE11:  Creating a data and tools ecosystem

Who is FORCE11?

Anyone who has a stake in moving scholarly communication into the 21st century

Publishers

Library and information scientists

Policy makers

Tool builders

Funders

Scholars

Science

Humanities

Social Sciences

Page 4: FORCE11:  Creating a data and tools ecosystem

FORCE11 Vision

• Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear

• We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge

• To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts

• To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute

• To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable

Beyond the PDF Visual Notes by De Jongens van de Tekeningen is licensed under a Creative Commons Attribution 3.0 Unported License.

Page 5: FORCE11:  Creating a data and tools ecosystem

Old Model: Single type of content; single mode of distribution

Scholar

Library

Scholar

Publisher

Page 6: FORCE11:  Creating a data and tools ecosystem

The future is now...

Scholar

Consumer

Libraries

Data Repositories

Code Repositories

Community databases/platforms

OA

Curators

Social Networks

Peer Reviewers

Workflows

Data

Blogs/Wikis

Multimedia

Nanopublications

Narrative

Code

Page 7: FORCE11:  Creating a data and tools ecosystem

The duality of modern scholarship

Observation: Those who build information systems from the machine side don’t understand the requirements of the human very well

Those who build information systems from the human side don’t understand the requirements of machines very well

Scholarship requires the ability to cite and track usage of scholarly artifacts. In our current mode of working, there is no way to easily track artifacts as they move through the ecosystem; no way to incrementally add human expertise; no way to alert everyone when things go wrong

Page 8: FORCE11:  Creating a data and tools ecosystem

Digital objects are a new beast

New modes of representation and verification will be necessary

Trust: Not just who produced it but what produced it

Page 9: FORCE11:  Creating a data and tools ecosystem

Impetus for change: Is our current method serving science?

47 of 53 landmark published preclinical cancer studies could not be replicated

“The scientific community assumes that the claims in a preclinical study can be taken at face value-that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”

Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531

Page 10: FORCE11:  Creating a data and tools ecosystem

The scientific corpus is fragmented

• ~25 million articles total, each covering a fragment of the biomedical space

• Each publisher owns a fragment of a particular field

• The current process is inefficient and slow

Wiley

Elsevier

Macmillan

Oxford

Spinal Muscular Atrophy

Machine-based access requires that we take a global view of the body scholarly and allow mining across content

Page 11: FORCE11:  Creating a data and tools ecosystem

A new platform for scholarly communications

Components

• Authoring tools
– Optimized for mark up and linked content
• Containers
– Expand the objects that are considered “publications”
– Optimize the container for the content
• Processes
– Scholarship is code
• Mark up
– Data, claims, content suitable for the web
– Suitable identifier systems
• Reward systems
– Incentives to change
– Reward for new objects

Scholarship must move from a “single currency system”; platforms must recognize diversity of output and representation

Page 12: FORCE11:  Creating a data and tools ecosystem

FORCE11.org

• Community platform
– Meetings
– Discussions
– Tools and resources
– Blogs
– Event calendar
– Community projects
• Promote interoperability
– Data Citation
– Resource Identification Initiative

500 members from diverse stakeholder groups700

Page 13: FORCE11:  Creating a data and tools ecosystem

Beyond the PDF

• Conference/unconference where all stakeholders come together as equals to discuss issues
– Publishers
– Technologists
– Scholars
– Library scientists
• Incubator for change
• What would you do to change scholarly communication?

San Diego, Jan 2011 ... Amsterdam, March 2013 ... ? 2015

http://www.force11.org/beyondthepdf2

YES!!!

FORCE

Page 14: FORCE11:  Creating a data and tools ecosystem

Promote community, cross-fertilization and interoperability

• FORCE11 helps facilitate communications across disciplines and communities

• Issues are not identical but we can learn from each other
– Enhanced publications
• Digital humanities +
– Dealing with data
• Science +
– Open Access
• Science +

“What is an ORCID id?” - computer scientist

Page 15: FORCE11:  Creating a data and tools ecosystem

ORCID

Data journals

Research Data Alliance

PeerJ, eLife

Workflows 4Ever

Dataverse

Impact Story, Rubriq

Sadie

Scalar

Resource for scholarly communications: People, organizations, publications, tools

Page 16: FORCE11:  Creating a data and tools ecosystem

FORCE11 Working Groups

• FORCE11 provides a neutral convening place for individuals to come together around issues in scholarly communication
– FORCE11 provides web working space and facilitation where possible
– 1K Challenge: Beyond the PDF
– Short term working groups with clear focus
• Deliverable specified
• Time line determined

Page 17: FORCE11:  Creating a data and tools ecosystem

Data: Whose problem is it?

Scholar

Library

Scholar

Publisher

Domain-specific

Repository

Web site/Personal

data management

Computing

Scholars, Data Repositories, Institutional Repositories taking ownership of data. Where should it go? Sometimes it can’t go anywhere.

Page 18: FORCE11:  Creating a data and tools ecosystem

Is data like a bibliographic record?

• Not uniform in size
• Not uniform in type
• Curation requires deep understanding of domain
• Data is dynamic
• Data is fluid

Geoff Bilder, CrossRef

Page 19: FORCE11:  Creating a data and tools ecosystem

Surveying the resource landscape

Neuroscience Information Framework http://neuinfo.org

Page 20: FORCE11:  Creating a data and tools ecosystem

Deep metadata

http://neuinfo.org

With the thousands of databases and other information sources available, simple descriptive metadata will not suffice

Page 21: FORCE11:  Creating a data and tools ecosystem

A place to come together: Data citation principles

• FORCE11 provides a neutral space for bringing groups together
• 35 individuals representing > 20 organizations concerned with data citation
• Conducted a review of current data citation recommendations from 4 different organizations
• Arrived at a set of consensus principles

Data citation synthesis group: http://www.force11.org/node/4381

Page 22: FORCE11:  Creating a data and tools ecosystem

Process

Synthesis: July-Sept 2013
Community feedback: Nov-Dec 2013
Revision: Jan 2014
Dissemination: Now

Data Citation Principles: Open for Endorsement

Page 23: FORCE11:  Creating a data and tools ecosystem

Joint Declaration of Data Citation Principles

• Designed to be high level and easy to understand

• Supplemented with a glossary, references and examples

http://www.force11.org/datacitation

1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and flexibility

Page 24: FORCE11:  Creating a data and tools ecosystem

Significance & Scope

• Sound, reproducible scholarship rests upon a foundation of robust, accessible data.
• Data should be considered legitimate, citable products of research.
• Data citation, like the citation of other evidence and sources, is good research practice.
• The Joint Principles cover purpose, function and attributes of citations.
• Specific practices vary across communities and technologies; we recommend communities develop practices for machine and human citations consistent with these general principles.

Page 25: FORCE11:  Creating a data and tools ecosystem

Purpose

1. Importance. Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications [1].

2. Credit and attribution. Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data [2].

3. Evidence. In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited [3].

Page 26: FORCE11:  Creating a data and tools ecosystem

Function

4. Unique Identification. A data citation should include a persistent method for identification that is machine-actionable, globally unique, and widely used by a community [4].

5. Access. Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data [5].

Joint Declaration of Data Citation Principles (Overview)
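One way to read "machine-actionable" in Principle 4 is that software can turn the identifier into a resolvable address without human help. The sketch below assumes the identifier is a DOI and uses the public https://doi.org/ resolver; the function name `doi_to_url` and the sample DOI are hypothetical, chosen only for illustration.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Common ways a DOI is written in reference lists (assumed, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Normalize a DOI string into a URL that a resolver can act on."""
    doi = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the "10." directory indicator and has a suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the prefix/suffix slash.
    return DOI_RESOLVER + quote(doi, safe="/")


# Hypothetical DOI, used only as a placeholder.
print(doi_to_url("doi:10.1234/example.5678"))
```

Other identifier schemes named in the principles (for example Handles) would need their own resolver rule; the point is only that the mapping is mechanical.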

Page 27: FORCE11:  Creating a data and tools ecosystem

Attributes

6. Persistence. Unique identifiers, and metadata describing the data and its disposition, should persist -- even beyond the lifespan of the data they describe [6].

7. Specificity and verifiability. Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited [7].

8. Interoperability and flexibility. Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities [8].

Page 28: FORCE11:  Creating a data and tools ecosystem

Generic Data Citation (as it appears in printed reference list)

Note:
● Neither the format nor specific required elements are intended to be defined with this example. Formats, optional elements, and required elements will vary across publishers and communities. [Principle 8: Interoperability and flexibility]
● As illustrated in the previous examples, intra-work citations may be accompanied with information including the specific portion used. [Principles 7, 8]
● As illustrated in the next example, printed citations should be accompanied by metadata that support credit, attribution, specificity, and verification. [Principles 2, 5 and 7]

Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier

Principle 2: Credit and Attribution (e.g. authors, repositories or other distributors and contributors)

Principle 4: Unique Identifier (e.g. DOI, Handle).

Principles 5, 6: Access, Persistence: A persistent identifier that provides access and metadata

Principle 7: Specificity and verification (e.g. the specific version used). Versioning or timeslice information should be supplied with any updated or dynamic dataset.
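The generic citation pattern above can be sketched as a small record type. This is only an illustration under stated assumptions: the class name `DataCitation`, its field names, the separator choices, and all sample values are hypothetical, and the slide itself stresses that formats and required elements vary across publishers and communities.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataCitation:
    """Elements of the generic data citation shown on the slide."""
    authors: List[str]   # Principle 2: credit and attribution
    year: int
    title: str
    repository: str      # Principle 2: distributors are credited too
    version: str         # Principle 7: the specific version used
    identifier: str      # Principles 4-6: global persistent identifier

    def format(self) -> str:
        """Render the elements in the order used on the slide."""
        author_list = "; ".join(self.authors)
        parts = [author_list, str(self.year), self.title,
                 self.repository, self.version, self.identifier]
        return ", ".join(parts) + "."


# Placeholder values only; none of these refer to a real dataset.
citation = DataCitation(
    authors=["Doe, J.", "Roe, R."],
    year=2014,
    title="Example Dataset",
    repository="Example Repository",
    version="V2",
    identifier="doi:10.1234/example",
)
print(citation.format())
```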

Page 29: FORCE11:  Creating a data and tools ecosystem

Placement of Citations

Intra-work:
● Should provide sufficient information to identify the cited data reference within the included reference list.
● Citation to data should be in close proximity to claims relying on data. [Principle 3]
● May include additional information identifying the specific portion of data supporting that claim. [Principle 7]

Example: The plots shown in Figure X show the distribution of selected measures from the main data [Author(s), Year, portion or subset used].

Full Citation:
Citation may vary in style, but should be included in the full reference list along with citations to other types of works.

Example:
References Section
Author(s), Year, Article Title, Journal, Publisher, DOI.
Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.
Author(s), Year, Book Title, Publisher, ISBN.

Page 30: FORCE11:  Creating a data and tools ecosystem

Citation Metadata

Author(s), Year, Dataset Title, Data Repository or Archive, Version, Global Persistent Identifier.

Metadata retrieval

<!-- CONTRIBUTOR METADATA -->
<contributor role="" ORCIDid="">Name</contributor>

<!-- FIXITY and PROVENANCE -->
<fixity type="MD5">XXXX</fixity>
<fixity type="UNF">UNF:XXXX</fixity>

<!-- MACHINE UNDERSTANDABILITY -->
<content type>data</content type>
<format>HDF5</format>

Note:
● Metadata location, formats, and elements will vary across publishers and communities. [Principle 8]
● Citation metadata is needed in addition to the information in the printed citation.
● Metadata describing the data and its disposition should persist beyond the lifespan of the data. [Principle 6]
● Citation metadata should support attribution and credit [Principle 2]; machine use [Principle 5]; specificity and verification [Principle 7]

● For example, additional citation metadata may be embedded in the citing document; attached to the persistent identifier for the citation, through its resolution service; stored in a separate community indexing service (e.g. DataCite, CrossRef); or provided in a machine-readable way through the surrogate (“landing page”) presented by the repository to which the identifier is resolved.

For more detail, see the References section. http://www.force11.org/node/4772

EXAMPLE METADATA
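The MD5 fixity element in the example metadata supports Principle 7: a reader who retrieves the data later can recompute the checksum and compare it with the value recorded at citation time. A minimal sketch using Python's standard `hashlib` follows; the function names and sample bytes are hypothetical, and the UNF fingerprint from the example is not implemented here.

```python
import hashlib


def md5_fixity(data: bytes) -> str:
    """Return the MD5 checksum of the data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def verify_fixity(data: bytes, recorded: str) -> bool:
    """Check retrieved data against the checksum stored with the citation."""
    return md5_fixity(data) == recorded.strip().lower()


# Placeholder bytes standing in for a dataset file.
original = b"subject,measure\n1,0.42\n"
recorded_checksum = md5_fixity(original)   # stored in the citation metadata

# Later: the same bytes verify, an altered copy does not.
print(verify_fixity(original, recorded_checksum))
print(verify_fixity(original + b"2,0.57\n", recorded_checksum))
```

A byte-level checksum changes whenever the file encoding changes, even if the values are the same, which is presumably one reason the example also lists a second fixity type.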

Page 31: FORCE11:  Creating a data and tools ecosystem

Growing Adoption

https://www.force11.org/datacitation/endorsements

Page 32: FORCE11:  Creating a data and tools ecosystem

Endorse the Principles!

• http://www.force11.org/datacitation/endorsements

148 individuals; 60 organizations

Page 33: FORCE11:  Creating a data and tools ecosystem

Unique IDs for all! Resource Identification Initiative

• It is currently impossible to query the biomedical literature to find out what research resources have been used to produce the results of a study

• Impossible to find all studies that used a resource

• Critical for reproducibility and data mining

• Critical for trouble-shooting

http://www.force11.org/resource_identification_initiative

Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher-Genome Web Daily, October 11, 2013

Page 34: FORCE11:  Creating a data and tools ecosystem

Resource Identification Initiative

• Have authors supply appropriate identifiers for key resources used within a study such that they are:
– Machine-processable (i.e., a unique identifier that resolves to a single resource)
– Outside of the paywall
– Uniform across journals and publishers

Launched February 2014: > 30 journals participating

Page 35: FORCE11:  Creating a data and tools ecosystem

Pilot Project

• Have authors identify 3 different types of research resources:
– Software tools and databases
– Antibodies
– Genetically modified animals
• Include RRID in methods section
• RRID = RRID:Accession number
– Just a string at this point
• Voluntary for authors
• Journals did not have to modify their submission system
• Journals have flexibility in implementation. Send request to author at:
– Submission
– During review
– After acceptance

http://scicrunch.com/resources

Resource Identification Portal: Aggregates accession numbers from >10 different databases that are the authorities for registering research resources
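Because an RRID is "just a string" of the form RRID:Accession number, it can be pulled out of a methods section with plain text matching. The sketch below is one possible reading of that pattern; the regular expression, the function name `extract_rrids`, and the sample sentence are assumptions, and `AB_0000000` is a made-up accession used only as a placeholder.

```python
import re
from typing import List

# Assumed shape: "RRID:" followed by letters, digits, underscores or hyphens,
# ending on a letter or digit so trailing punctuation is not captured.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z0-9][A-Za-z0-9_-]*[A-Za-z0-9])")


def extract_rrids(text: str) -> List[str]:
    """Return the accession numbers cited as RRIDs, in order, without repeats."""
    found: List[str] = []
    for match in RRID_PATTERN.finditer(text):
        accession = match.group(1)
        if accession not in found:
            found.append(accession)
    return found


# Hypothetical methods text; the second accession is a placeholder.
methods = (
    "Analysis used a registered resource (RRID:nif-0000-30467). "
    "Sections were stained with a primary antibody (RRID:AB_0000000)."
)
print(extract_rrids(methods))
```

Matching the string only finds candidate identifiers; confirming that each one resolves to a single resource would still require a lookup against the registering authority.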

Page 36: FORCE11:  Creating a data and tools ecosystem

First results are in the literature

Google Scholar: Search RRID; select since 2014

Page 37: FORCE11:  Creating a data and tools ecosystem

What studies used X?

To date:
• 30 articles have appeared
• 2 articles have disappeared, i.e., the RRIDs were removed at copyediting
• 195 RRIDs were reported
• 14 were in error (about 7%)
• > 200 antibodies were added
• > 75 software tools/databases were added
• A resolver service has been created
• 3rd party tools are being created to provide linkage between resources and papers

RRID:nif-0000-30467
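Once RRIDs are reported per article, the question "What studies used X?" becomes a lookup in an index keyed by identifier. The sketch below is a minimal illustration with hypothetical paper labels and a placeholder accession (`AB_0000000`); it also works out the error rate from the counts reported on the slide (14 of 195).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_usage_index(reports: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each RRID accession to the set of papers that reported it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for paper_id, rrid in reports:
        index[rrid].add(paper_id)
    return dict(index)


# Hypothetical (paper, RRID) pairs; paper labels are placeholders.
reports = [
    ("paper-A", "nif-0000-30467"),
    ("paper-B", "nif-0000-30467"),
    ("paper-B", "AB_0000000"),
]
index = build_usage_index(reports)
print(sorted(index["nif-0000-30467"]))

# Error rate from the slide's counts: 14 erroneous RRIDs out of 195 reported.
error_rate_percent = round(14 / 195 * 100, 1)
print(error_rate_percent)
```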

Page 38: FORCE11:  Creating a data and tools ecosystem

What have we learned?

Utopia plug-in: Steve Pettifer

• Authors are willing to adopt new types of citations
• RRID = usage of research resource
• Ideal: resolved by search engines without requiring specialized citation services
• Citation drives registration
• Clear role for repositories as authorities
• Should RRIDs be DOIs?

Will the system work for data citation and more complicated research objects?

Page 39: FORCE11:  Creating a data and tools ecosystem

Data Citation Implementation Group

Page 40: FORCE11:  Creating a data and tools ecosystem

FORCE11 Vision

• Modern technologies enable vastly improved knowledge transfer and far wider impact; freed from the restrictions of paper, numerous advantages appear

• We see a future in which scientific information and scholarly communication more generally become part of a global, universal and explicit network of knowledge

• To enable this vision, we need to create and use new forms of scholarly publication that work with reusable scholarly artifacts

• To obtain the benefits that networked knowledge promises, we have to put in place reward systems that encourage scholars and researchers to participate and contribute

• To ensure that this exciting future can develop and be sustained, we have to support the rich, variegated, integrated and disparate knowledge offerings that new technologies enable

No single infrastructure serves everything; cooperation in defining a global system of scholarly communication

Page 41: FORCE11:  Creating a data and tools ecosystem

Notes & References for Data Citation Principles

Notes
[1] CODATA 2013, Sec 3.2.1; Uhlir (ed.) 2012, ch. 14; Altman & King 2007
[2] CODATA 2013, Sec 3.2; 7.2.3; Uhlir (ed.) 2012, ch. 14
[3] CODATA 2013, Sec 3.1; 7.2.3; Uhlir (ed.) 2012, ch. 14
[4] Altman & King 2007; CODATA 2013, Sec 3.2.3, Ch. 5; Ball & Duke 2012
[5] CODATA 2013, Sec 3.2.4, 3.2.5, 3.2.8
[6] Altman & King 2007; Ball & Duke 2012; CODATA 2013, Sec 3.2.2
[7] Altman & King 2007; CODATA 2013, Sec 3.2.7, 3.2.8
[8] CODATA 2013, Sec 3.2.10

References
• Altman, M., King, G. (2007). ‘A Proposed Standard for the Scholarly Citation of Quantitative Data’. D-Lib Magazine.
• Ball, A., Duke, M. (2012). ‘Data Citation and Linking’. DCC Briefing Papers. Edinburgh: Digital Curation Centre.
• CODATA-ICSTI Task Group on Data Citation (2013). ‘Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data’. Data Science Journal.
• Uhlir, P. (ed.) (2012). For Attribution -- Developing Data Attribution and Citation Practices and Standards. National Academies of Sciences.