July 10, 2014 JISC-CNI 2014 ©UC Regents, 2014 1 Opening Up Data MacKenzie Smith University Librarian University of California, Davis

Opening up data – Jisc and CNI conference 10 July 2014

DESCRIPTION

MacKenzie Smith, university librarian, University of California, Davis

Page 1: Opening up data – Jisc and CNI conference 10 July 2014

Opening Up Data
MacKenzie Smith
University Librarian
University of California, Davis

Page 3: Opening up data – Jisc and CNI conference 10 July 2014

At Creative Commons, we believe scientific data should be freely available to everyone. We call this idea Open Data. Creative Commons legal tools can be used to make data and databases freely available. We’ve already had successful implementations in taxonomic, energy, genomics, disease research, geospatial, polar, and bibliometric disciplines, and are providing guidance to funders, institutions, private foundations, governments, the corporate sector, and other stakeholders. Read more about Creative Commons and data.

Page 4: Opening up data – Jisc and CNI conference 10 July 2014

U.S. Funding Agency Policy

NIH (2003): “The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” (>$500,000, include data sharing plan)

NSF grant guidelines: “NSF ... expects investigators to share with other researchers, at no more than incremental cost and within a reasonable time, the data, samples, physical collections and other supporting materials created or gathered in the course of the work. It also encourages grantees to share software and inventions or otherwise act to make the innovations they embody widely useful and usable.” (2005 and earlier)

NSF peer-reviewed Data Management Plan (DMP), January 2011

Page 5: Opening up data – Jisc and CNI conference 10 July 2014

Credibility Crisis?

Page 6: Opening up data – Jisc and CNI conference 10 July 2014

Page 7: Opening up data – Jisc and CNI conference 10 July 2014

Journal Data Sharing Policy

Policy                                                    2011    2012
Required as condition of publication, barring exceptions  10.6%   11.2%
Required but may not affect editorial decisions           1.7%    5.9%
Encouraged/addressed, may be reviewed and/or hosted       20.6%   17.6%
Implied                                                   0%      2.9%
No mention                                                67.1%   62.4%

Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)

Page 8: Opening up data – Jisc and CNI conference 10 July 2014

Journal Code Sharing Policy

Policy                                                    2011    2012
Required as condition of publication, barring exceptions  3.5%    3.5%
Required but may not affect editorial decisions           3.5%    3.5%
Encouraged/addressed, may be reviewed and/or hosted       10%     12.4%
Implied                                                   0%      1.8%
No mention                                                82.9%   78.8%

Source: Stodden, Guo, Ma (2013) PLoS ONE, 8(6)

Page 9: Opening up data – Jisc and CNI conference 10 July 2014

Software in Scientific Discovery

JASA June    Computational Articles    Code Publicly Available
1996         9 of 20                   0%
2006         33 of 35                  9%
2009         32 of 32                  16%
2011         29 of 29                  21%

Page 10: Opening up data – Jisc and CNI conference 10 July 2014

Open Science reaches the White House

Executive Memorandum directing federal funding agencies to develop plans for public access to data and publications (Feb 2013)

“data is defined... as the digital recorded factual material commonly accepted in the scientific community as necessary to validate research findings including data sets used to support scholarly publications...”

Executive Order directing federal agencies to make their own data publicly available (May 9)

Page 11: Opening up data – Jisc and CNI conference 10 July 2014

Current NIH View: Components of an Academic Digital Enterprise

• Consists of digital assets
    • Datasets, papers, software, lab notes
• Each asset is uniquely identified and has provenance, including access control
    • e.g., publishing simply involves changing the access control
• Digital assets are interoperable across the enterprise
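
One way to picture the model on this slide is as a small data structure. The sketch below is illustrative only: the class name `DigitalAsset`, the `Access` enum, and the use of a random UUID as the unique identifier are assumptions made for the example, not part of any NIH specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from uuid import uuid4


class Access(Enum):
    """Hypothetical access-control states for an asset."""
    PRIVATE = "private"
    PUBLIC = "public"


@dataclass
class DigitalAsset:
    """A dataset, paper, piece of software, or set of lab notes."""
    kind: str
    title: str
    # Assumption: a random UUID stands in for a real persistent identifier.
    identifier: str = field(default_factory=lambda: uuid4().hex)
    access: Access = Access.PRIVATE
    provenance: list[str] = field(default_factory=list)

    def publish(self) -> None:
        """Model publishing as nothing more than an access-control change."""
        self.provenance.append(f"access changed: {self.access.value} -> public")
        self.access = Access.PUBLIC


asset = DigitalAsset(kind="dataset", title="Example field measurements")
asset.publish()
```

In this sketch the asset keeps the same identifier before and after `publish()`, and the change is recorded in its provenance, which is the point the slide makes about publishing.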

Page 12: Opening up data – Jisc and CNI conference 10 July 2014

Barriers to Open Data Sharing

Code    Barrier                                  Data
77%     Time to document and clean up            54%
52%     Dealing with questions from users        34%
44%     Not receiving attribution                42%
40%     Possibility of patents                   -
34%     Legal Barriers (e.g. copyright)          41%
-       Time to verify release with admin        38%
30%     Potential loss of future publications    35%
30%     Competitors may get an advantage         33%
20%     Web/disk space limitations               29%

Survey of the Machine Learning Community, NIPS (Stodden 2010)

Page 13: Opening up data – Jisc and CNI conference 10 July 2014

2014 White House Big Data and Privacy Review

Pass National Data Breach Legislation that provides for a single national data breach standard, along the lines of the Administration's 2011 Cybersecurity legislative proposal.

Page 14: Opening up data – Jisc and CNI conference 10 July 2014

Page 15: Opening up data – Jisc and CNI conference 10 July 2014

Higher Education responses

• Infrastructure
    • Developing new tools across the research life cycle
    • Mostly individual institutions or disciplines
    • National initiatives emerging (e.g. ARL/AAU/APLU SHARE initiative)
• Policy
    • Institutional Open Access policies
    • SHARE copyright group
• Training
    • ARL e-science institute
    • ARL spec kit on RDM activities
    • Current events

Page 16: Opening up data – Jisc and CNI conference 10 July 2014

New Tools for Computational Reproducibility

Dissemination Platforms, e.g. DataONE, DataVerse, RunMyCode.org

Workflow Tracking and Research Environments, e.g. VisTrails, Kepler, Taverna

Embedded Publishing, e.g. Sweave, Knitr, VCR (Verifiable Computational Research)

Page 17: Opening up data – Jisc and CNI conference 10 July 2014

Data Repositories

• Disciplinary
    • ICPSR, Genbank
    • Dryad, ONEShare
    • Sage Commons (Sage Bionetworks)
• Disciplinary/Institutional
    • DataVerse, Nesstar
• Institutional
    • IRs galore: e.g., UC’s Dash and Chronopolis, Purdue’s PURR, JHU’s Data Conservancy, Stanford Digital Repository, many local DSpace/Fedora/Hydra/Islandora instances, locally run and cloud hosted
    • Data Centers on every campus
• Generic/cloud
    • Figshare
    • DuraCloud

Page 18: Opening up data – Jisc and CNI conference 10 July 2014

Dryad

We continued to refine the infrastructure for linking between articles and data. The web service for returning the corresponding Dryad data DOI when queried with an article DOI is now being used by Elsevier to provide a link to the data from ScienceDirect for 40 different Elsevier journals that have at least one data package in Dryad. Dryad is an international collaborator in the EU-funded ORCID DataCite interoperability Network Project (odinproject.eu), which this past year introduced a tool enabling researchers to add research outputs with DataCite DOIs (such as Dryad data packages) to their ORCID profiles. We also introduced regular updating of linkages between related records in PubMed, Genbank, and EuropePMC to data packages in Dryad. To further promote discoverability and accessibility, Dryad officially became a DataONE Tier 1 member node. Improvements to the curation interface have led to an increase in curation efficiency of greater than 25% in the past year.

Dryad Annual Report, 2013

Page 19: Opening up data – Jisc and CNI conference 10 July 2014

Dryad: Embargo Usage

Embargo selections of Dryad data authors for the 10,108 files in Dryad deposited by September 20, 2013. Data include only datasets related to articles published in journals for which the authors had the option of selecting an embargo. (B) Longer term embargoes (>1 year) by journal that granted them.

Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779

Page 20: Opening up data – Jisc and CNI conference 10 July 2014

Page 22: Opening up data – Jisc and CNI conference 10 July 2014

Page 23: Opening up data – Jisc and CNI conference 10 July 2014

Data Sharing: Discoverability

COMING

NIH data catalog (part of the BD2K initiative)

SHARE registry

HERE NOW

Thomson Reuters Data Citation Index

OCLC WorldShare (includes OAIster)

Google/Google Scholar

Page 24: Opening up data – Jisc and CNI conference 10 July 2014

Data Sharing: Identifiers

• DOIs for Data (DataCite, CrossRef, EZID)

• ORCIDs for Researchers

• FundRef for funding agencies

• Still missing good institutional identifiers
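
Identifiers like these are designed to be checked by machines. As a minimal sketch, the functions below validate the final check character of an ORCID iD using the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers. The function names are hypothetical, and the sample iD is the fictitious test record commonly used in ORCID's documentation. The check only catches mistyped iDs; it does not confirm that a record exists.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if any(ch not in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-1825-0097"))
```

A similar format check could sit in a deposit form, so that a typo in a researcher's iD is caught before a dataset record is created.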

Page 25: Opening up data – Jisc and CNI conference 10 July 2014

Page 26: Opening up data – Jisc and CNI conference 10 July 2014

Page 27: Opening up data – Jisc and CNI conference 10 July 2014

Dealing with Data Rights

• An IP rights strategy, including the promotion of university-based Open Access policies and favorable licensing terms, will be part of the scaffolding that will enable the layers of SHARE to develop.

• Rights subgroup formed to deal with this

• A broad collective action by AAU and APLU – to be discussed with AAU Presidents in April

Page 28: Opening up data – Jisc and CNI conference 10 July 2014

Key finding: RDM Service Offering

[Bar chart, axis 0 to 60: RDM services grouped into data management planning, data management support, and data sharing & archiving. Bar labels recoverable from the slide: data archiving by library, data citation support, other data management training, DMP consulting. Bar values as extracted, not matched to labels: 40, 22, 38, 42, 23, 33, 48, 47.]

ARL SPEC Kit 334: Research Data Management Services (July 2013)
http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/

Page 29: Opening up data – Jisc and CNI conference 10 July 2014

Data management planning

DMP training: 89% (N = 48)
DMP consulting: 61% (N = 33)

ARL SPEC Kit 334: Research Data Management Services (July 2013)
http://publications.arl.org/Research-Data-Management-Services-SPEC-Kit-334/

Page 30: Opening up data – Jisc and CNI conference 10 July 2014

Page 31: Opening up data – Jisc and CNI conference 10 July 2014

Page 32: Opening up data – Jisc and CNI conference 10 July 2014

What You Need to Know about Writing Data Management Plans

An ACRL e-Learning Online Course, July 14-August 1, 2014

Description: Demand for data management plan consultants is growing as more granting agencies add this requirement. Most presentations concerning data management do not provide practical advice on how to consult with researchers writing a data management plan for grant submission. This course teaches participants about the elements of a successful data management plan, and provides practice critiquing data management plans in a supportive learning environment where no grant funding is at stake. Join two experienced data management plan consultants with experience in liaison librarianship and information technology as they demonstrate how all librarians have the ability to successfully consult on data management plans. Each week will include assigned readings, a written lecture, discussion questions, weekly assignments, and live chats with the instructors.

Participants will examine how data and metadata are defined, open data formats, dark archives, and secure repositories as well as addressing specialty concerns such as how to securely preserve information related to at-risk populations, etc. Selection of effective long-term data preservation and sharing strategies will also be examined. Lastly, participants will evaluate sample data management plans from the sciences, social sciences, and the arts and humanities as a final project for the course. Critiques of each plan will be presented to the class during the final chat session at the end of the course.

Learning Outcomes:
• List specific data depository resources in order to formulate recommendations for researchers to securely deposit and share their data.
• Learn about how different funding agencies, and departments within those agencies, have different requirements for data management plans in order to determine how to effectively advise each researcher according to the requirements for their specific plan.
• Analyze sample data management plans in order to develop an understanding of what constitutes a thorough data management plan.

Presenters: Dee Ann Allison, Professor, University of Nebraska-Lincoln; Kiyomi Deards, Assistant Professor, University of Nebraska-Lincoln

Course Requirements: Your participation will require approximately three to five hours per week of primarily asynchronous activities to:
• Read the online seminar material
• Post to online discussion boards
• Join synchronous chat sessions (optional)
• Complete online exercises
• Complete a seminar evaluation form

Page 33: Opening up data – Jisc and CNI conference 10 July 2014

New England Collaborative Data Management Curriculum

Page 34: Opening up data – Jisc and CNI conference 10 July 2014

Page 35: Opening up data – Jisc and CNI conference 10 July 2014

CLIR Postdoc Fellowship Program

“CLIR Postdoctoral Fellows work on projects that forge and strengthen connections among library collections, educational technologies, and current research. The program offers recent PhD graduates the chance to help develop research tools, resources, and services while exploring new career opportunities. Host institutions benefit from fellows' field-specific expertise by gaining insights into their collections' potential uses and users, scholarly information behaviors, and current teaching and learning practices within particular disciplines.”

• >110 fellows so far

• UC Davis postdoc in neuroscience: Jonathan Cachat

Page 36: Opening up data – Jisc and CNI conference 10 July 2014

We get it already…

“Painstakingly detailed surveys have been performed across several research organizations, particularly in North America (CLIR; ARL; CDL), Europe (DCC; RIN; NESTA) and Australia (ANDS). The same overall picture emerges:

• Research data is found in a dizzying number of file formats (some proprietary)
• Research data can be digital or non-digital
• Lack of metadata & documentation
• Data storage is desperate, unorganized, unsecured and researchers need more space
• Researchers welcome help with federal funding mandates (Data Management Plans)
• PIs are not concerned with data sharing preparation – a time consuming, thankless job in the current publish-or-perish merit system”

Page 37: Opening up data – Jisc and CNI conference 10 July 2014

Do No Harm

“There is ample evidence of a need for research data management services as provided by reports published from libraries and organizations (cited above). However, it is critical to recognize that sloppy record keeping and the constant, fast-paced strive for bigger, faster, stronger technological infrastructure are inherent to the scientific enterprise. Any services that sterilize or mandate rigid process control may provide solutions to specific data concerns, but will do so at a detriment to science – not an ideal solution”

Amari, Beltrame, Bjaalie, & Dalkara, 2002; Gardner et al., 2003; Kubilius, 2014; Landreth & Silva, 2013; Wallis et al., 2013; White, Baldridge, Brym, Locey, & McGlinn, 2013.

Page 38: Opening up data – Jisc and CNI conference 10 July 2014

“Mandated changes that are detrimental to the flow rate of a daily research enterprise will not be successful. This challenges the core of research data management, curation and service efforts. It highlights the fact that sometimes efforts to help an external group (e.g., neuroscientists) with internal expertise (e.g., library skill sets), even with the best intentions and solid rationale, can be unhelpful and unsustainable.”

The problem we are trying to solve is advancing the environmental support and training provided by the university to researchers and students in order to fulfill its mission. Researchers and students are aware of the growing popularity and potential of big data, open data, interdisciplinary data. They desire opportunities, skills and support.

Advancing the environmental support will improve their research, it will improve their education – it gives them an edge, and for that a university is recognized.”

Page 39: Opening up data – Jisc and CNI conference 10 July 2014

Requires

• Less emphasis on infrastructure
• More emphasis on policy
    • Citation practices in different research disciplines for data, software
    • Legal tools for data and software sharing in different contexts
• Lots more emphasis on training and culture change
    • Not of librarians, but researchers themselves

Page 40: Opening up data – Jisc and CNI conference 10 July 2014

Questions?
