What have Scientists Planned for Data Sharing and Reuse? A Content Analysis of NSF Awardees’ Data Management Plans Renata Curty, Youngseek Kim & Dr. Jian Qin Baltimore, 4-5 April 2013

RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…


Page 1: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

What have Scientists Planned for Data Sharing and Reuse? A Content Analysis of NSF Awardees’ Data Management Plans

Renata Curty, Youngseek Kim & Dr. Jian Qin

Baltimore, 4-5 April 2013

Page 2: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Motivation

While the NSF mandate gives researchers plenty of flexibility to define their own DMPs, and many academic institutions provide DMP writing support, little is known about how scientists describe their data strategies in their DMPs.

Page 3: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Study Design

Online Survey: 20 questions

Target Population: NSF Awardees from January 18, 2011 to November 5, 2012 - Standard Grants - Total 16065

Random Sample: 1606 cases

Pilot Study: 100 Awardees (Survey Reformulation)

Final Deployment: 966 awardees, 169 responses (17.5%), and 68 DMPs
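The sampling figures above can be checked with a few lines of arithmetic. This is a minimal sketch, not part of the original study materials; the variable names are illustrative, and the last ratio assumes that the 68 DMPs were all supplied by survey respondents, which the slide does not state explicitly.

```python
# Figures taken from the Study Design slide.
population = 16065   # NSF standard-grant awardees in the target period
random_sample = 1606 # randomly sampled cases
deployed = 966       # awardees in the final deployment
responses = 169      # survey responses received
dmps = 68            # DMPs collected


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


sampling_fraction = percent(random_sample, population)
response_rate = percent(responses, deployed)
# Assumption: every DMP came from one of the 169 respondents.
dmp_share = percent(dmps, responses)

print(f"Sampling fraction: {sampling_fraction:.1f}%")
print(f"Response rate: {response_rate:.1f}%")
print(f"Respondents who shared a DMP (assumed): {dmp_share:.1f}%")
```

The response rate computed this way agrees with the 17.5% reported on the slide.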

Page 4: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

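Awards Info

NSF Directorate and Amount Awarded

N = 166 and 166

<Charts: NSF Directorate; Amount Awarded. Directorates: BIO, CISE, EHR, ENG, GEO, MPS, SBE. Percentages as extracted: 10%, 16%, 12%, 18%, 16%, 14%, 13%>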

Page 5: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Awardees Info: Age and Organization Type

N = 150 and 151

Age
25-34: 7%
35-44: 41%
45-54: 26%
55-64: 19%
65+: 7%

Organization Type
Government: 1%
Commercial: 3%
Non-profit: 3%
Academia: 93%

Page 6: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Awardees Info: Position in Academia

N = 143 and 138

Assistant Professor: 23%
Associate Professor: 28%
Full Professor: 40%
Researcher: 6.77%

Tenured: 62%
On Tenure Track: 25%
Non-Tenure Track: 11%
Retired: 2%

Others: Dean (3), Professor Emeritus (1), Professor of Practice (1), Lecturer/Instructor (1), Post-Doctoral Fellow (1), Emeritus Senior Scientist, Director, Expert Consultant, Administrative Faculty Position, Chair.

Page 7: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Geographical Distribution

<Map created with Google Fusion Tables. N = 109>

Page 8: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

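Scale: Strongly disagree, Disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, Agree, Strongly agree

DMP is difficult to execute (N = 167, m = 3.79, SD = 1.51)
Strongly disagree 4.79%, Disagree 22.75%, Somewhat disagree 11.38%, Neither agree nor disagree 25.75%, Somewhat agree 23.35%, Agree 8.98%, Strongly agree 2.99%

Writing a DMP for NSF proposal is a challenging task (N = 167, m = 3.89, SD = 1.45)
Strongly disagree 2.40%, Disagree 21.56%, Somewhat disagree 13.77%, Neither agree nor disagree 25.75%, Somewhat agree 23.35%, Agree 10.18%, Strongly agree 2.99%

DMP is important to formalize data sharing practices in science (N = 166, m = 4.93, SD = 1.62)
Strongly disagree 3.01%, Disagree 10.24%, Somewhat disagree 6.63%, Neither agree nor disagree 10.84%, Somewhat agree 22.89%, Agree 33.13%, Strongly agree 13.25%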
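The reported means and standard deviations can be cross-checked against the response distributions. The sketch below is only an illustration and rests on several assumptions: the scale is scored 1 (Strongly disagree) to 7 (Strongly agree); the percentages are assigned to statements as listed in the dictionary; the first value for the "challenging task" statement is taken as 2.40% (the value that makes that row sum to 100%); and the standard deviation is computed with the population formula. None of these details is stated on the slide.

```python
from math import sqrt

# Assumed scoring: 1 = Strongly disagree ... 7 = Strongly agree.
SCORES = range(1, 8)

# Percentages per scale point, assumed mapping of chart values to statements.
distributions = {
    "difficult to execute": [4.79, 22.75, 11.38, 25.75, 23.35, 8.98, 2.99],
    # First value assumed to be 2.40 so that the row sums to 100%.
    "challenging to write": [2.40, 21.56, 13.77, 25.75, 23.35, 10.18, 2.99],
    "important to formalize sharing": [3.01, 10.24, 6.63, 10.84, 22.89, 33.13, 13.25],
}

# Mean and standard deviation as reported on the slide.
reported = {
    "difficult to execute": (3.79, 1.51),
    "challenging to write": (3.89, 1.45),
    "important to formalize sharing": (4.93, 1.62),
}


def mean_and_sd(percentages):
    """Weighted mean and population SD of a 7-point distribution."""
    total = sum(percentages)
    mean = sum(s * p for s, p in zip(SCORES, percentages)) / total
    variance = sum(p * (s - mean) ** 2 for s, p in zip(SCORES, percentages)) / total
    return mean, sqrt(variance)


for statement, pcts in distributions.items():
    mean, sd = mean_and_sd(pcts)
    print(f"{statement}: m = {mean:.2f}, SD = {sd:.2f}")
```

Under these assumptions the recomputed values round to the figures shown on the slide, which supports reading the unlabeled statistics (N = 167, m = 3.79) as belonging to "DMP is difficult to execute".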

Page 9: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Types of Data

3D Models 13.01% - 19
Audio Files 12.33% - 18
Curriculum Materials 21.23% - 31
Data Models 27.40% - 40
Field Notes 26.03% - 38
Experimental Data 63.70% - 93
Images 36.99% - 54
Interview Transcripts 17.12% - 25
Patient Records 0.68% - 1
Samples 20.55% - 30
Software 35.62% - 52
Spreadsheets 40.41% - 59
Video Files 21.23% - 31

Others: Computational Models, Surveys, DNA Sequences, Computer Codes, Crowdsourcing Data (Reviews)

Documentation of Data

Will follow:

46% - Disciplinary practices

37% - Research project’s needs

17% - Institutional recommendations/guidelines

N = 158
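Because the Types of Data figures give both a percentage and a count for each item, the number of respondents behind the percentages can be inferred. The sketch below is an illustration only; it assumes the percentages were rounded to two decimals in the usual way, and the function name `consistent_denominators` is hypothetical.

```python
# (percentage, count) pairs as listed under Types of Data.
types_of_data = {
    "3D Models": (13.01, 19),
    "Audio Files": (12.33, 18),
    "Curriculum Materials": (21.23, 31),
    "Data Models": (27.40, 40),
    "Field Notes": (26.03, 38),
    "Experimental Data": (63.70, 93),
    "Images": (36.99, 54),
    "Interview Transcripts": (17.12, 25),
    "Patient Records": (0.68, 1),
    "Samples": (20.55, 30),
    "Software": (35.62, 52),
    "Spreadsheets": (40.41, 59),
    "Video Files": (21.23, 31),
}


def consistent_denominators(items, max_n=1000):
    """Return every N for which all counts reproduce their rounded percentages."""
    candidates = []
    for n in range(1, max_n + 1):
        # Allow half a unit in the second decimal place, plus float slack.
        if all(abs(100.0 * count / n - pct) <= 0.005 + 1e-9
               for pct, count in items.values()):
            candidates.append(n)
    return candidates


candidates = consistent_denominators(types_of_data)
print("Denominators consistent with every item:", candidates)
```

A single candidate in the result would suggest that the percentages for this question were all computed over the same number of respondents; since respondents could select several data types, the percentages are not expected to sum to 100%.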

Page 10: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Challenges Encountered

N = 169

None: 26%
Lack of guidance from my institution: 29%
Lack of guidance from NSF: 36%
Appropriate infrastructure to archive/preserve data: 41%
Level of granularity of data: 25%
Data Description & Documentation: 30%
Which stage(s) of research to share the data: 25%

Others:

Some projects do not generate data

Conflict between DMP requirements and IRB requirements regarding social and behavioral research data

Conflicts involving intellectual property and data protection

Long-term preservation issues

Conflicts between individual/group and institutional strategies

Page 11: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Data Access & Availability

N = 167

Others: “Publications”, “Available to NSF only”

Open: 45%
Available with some restrictions: 51%
Restricted: 5%

By email request 45.52% - 61

Personal website 17.91% - 24

Research Group/Project Website 51.49% - 69

Institutional Repository 20.15% - 27

Disciplinary Repository 32.84% - 44

Page 12: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Barriers for Data Reuse

N = 164

Page 13: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Reuse Issues - Privacy, Anonymity & Confidentiality

“IRB restrictions on ability to share even deidentified data. Concern that sharing even deidentified data will discourage participation in the study.”

“For myself, no. But for others to use my data, yes: for qualitative data, under IRB requirements for the protection of human subjects around confidentiality and anonymity, DMPs are nearly impossible to implement without perhaps some kind of temporal restriction on them (like, ‘This archive can only be opened in 20 - 30 - 40 years’ or something like that)”

“The project involves human subject; so protections have to be put in place that may limit reuse applications in the future.”

“HIPAA [Health Insurance Portability and Accountability Act] issues - obtaining self reporting data on human subjects.”

Page 14: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Reuse Issues - Context, Time Factor & Documentation

“My past data was collected on a unique system built specifically for the research project. Need lots of context to reuse the data.”

“The only problems I see is that data can be taken out of context in a way that produces results that might not be correct.”

“Data is specific to testing scenarios. The insight gleaned from our experimental data is of more importance than the data itself.”

“My data is for specific purposes and it is hard to conceive of how someone would use it for something else/different. Even with a significant amount of metadata it would be difficult for someone to know all the circumstances under which the data was collected and why it was collected.”

“All scientific data is collected in particular context. Mechanisms that facilitate the description of that context are lacking. The creation of metadata that provides this information is a cumbersome, boring task and there are few resources available to ease the burden.”

Page 15: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Reuse Issues - Format, Tools, Infrastructure Interoperability & Standards

“Systems are always changing...It would be best if we could upload data to NSF so that it will be publicly available in the same way NIST [National Institute of Standards and Technology] publishes data.”

“Our raw data formats are extremely large, and need to be compressed into reduced, on-line archives for sharing. It is not possible for me as an individual PI to archive the raw data for others to examine.”

“My data is generally related to large software artifacts, so using it could involve quite a bit of work to get those artifacts running. This is something that I explicitly try to come up with solutions for in my DMPs.”

“Until NSF provides a free national repository for data archiving, we will not make progress in this area. If such an archive was available, it would be sensible to require researchers to place data there at the end of a grant and would allow other researchers to take advantage of it in a practical way.”

Page 16: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

DMPs – Preliminary Content Analysis

  • Coding Scheme
    Used both deductive and inductive approaches: 35 codes
    Deductive: NSF DMP Policy and University of Virginia's Guideline
    Inductive: emerged from DMP statements

  • Data Analysis Procedure
    A total of 766 utterances were identified
    642 unique utterances

Page 17: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

DMPs’ Content

<Wordle Cloud Generated Based on Numbers of Each Code across the 68 DMPs>

Page 18: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Coding Scheme

Types of Data
  • What to Generate
  • What Data Types
  • How to Create
  • Where to Get Existing Data

Metadata Standards
  • Data Format
  • Metadata Form
  • How to Create
  • Which Metadata Standard
  • Contextual Details Needed
  • Discoverability of the Data

Data Access & Sharing Process
  • When Available
  • How Available
  • What Available
  • Process for Gaining Access
  • How Long Retain the Right
  • Embargo Period
  • Ethical/Privacy Issues
  • Compliance with IRB Protocol
  • Whose Intellectual Property

Data Archiving Plan
  • Strategy for Archiving Data
  • Which Repository
  • Procedures for Long-Term Storage
  • Data Preservation Period
  • What Data Preserved for Long-Term
  • Transformation Required
  • Data Documentation
  • Related Information

Data Reuse Plan
  • Reusability of the Data
  • Restrictions to Access
  • Groups Interested In
  • Foreseeable Uses/Users

Others
  • Data Lifecycle
  • Data Curation
  • Budget

Page 19: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Types of Data

Codes | Freq. | Examples

What to Generate | 58 | Geochemical Data, Physical Samples, Mathematica (programming) Code, Course Materials

What Data Types | 37 | Gene Sequences, Experimental Data, Interview Transcript, Video Recordings

How to Create Data | 25 | Experimental Setup, Field Observation, Simulation, Survey, Interviews

Where to Get Existing Data | 13 | Moore Laboratory of Zoology, ArcView/GIS Inventories, Prior Study’s Database

Metadata Standard

Codes | Freq. | Examples

Data Format | 38 | CSV file, TEMPO data file, XML format, SPSS file, plain text

Metadata Form | 31 | ArcGIS Metadata file, XML-based standard file, GIS database file

How to Create Metadata | 14 | Use existing metadata standards, or develop their own metadata standards

Which Metadata Standard | 15 | Dublin Core, DNA Sequence Metadata, EML (Ecological Metadata Language)

Contextual Details Needed | 10 | All aspects of the development project documented, experimental procedure record

Data Discoverability | 7 | Searches Built into Library, Searchable through Project Website

Page 20: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Data Access & Sharing Process

Codes | Freq. | Examples

When Available | 28 | Post-Publication, Post-Project, After Data Collection

How Available | 37 | Upon Request, Project Website, GMOD CHADO databases, Institutional Repository

What Available | 33 | Original research data (genome assemblies), survey data, educational materials

Process for Gaining Access | 25 | Email Request, Material Transfer Agreement, Direct Access from Web or Repository

How Long Retain the Right | 18 | Withhold until Publication, Years after Project Ends, Years after Data Production

Embargo Period | 5 | Years after data collection, Period for commercialization

Ethical/Privacy Issues | 21 | Private information is not made available to the public

Compliance with IRB Protocol | 13 | IRB application submission for human subject research

Whose Intellectual Property | 17 | Property of the PI and Co-PIs, Institutions, Open-Access

Page 21: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Data Archiving

Codes | Freq. | Examples

Strategy for Archiving Data | 31 | Hosted on the Web Servers at (university), ICPSR, disciplinary data repository

Which Repository | 55 | Organization website, institutional or discipline data repository

Procedures for Long-Term Storage | 33 | Submitted to databanks including NCBI GEO, Genbank, DataONE, Dryad

Data Preservation Period | 11 | Minimum of five years post-grant funding, Long-term preservation through disciplinary data repositories

What Data Preserved for Long-Term | 7 | All data and materials generated by this award, Genome Sequencing Data

Transformation Required | 4 | Keeping raw image data in its uncompressed form, transferred to IRI format

Data Documentation Submitted | 11 | Contextual details about experimental procedures, all aspects of the development project

Related Information Submitted | 3 | Metadata files, proposed study information, companion web page

Page 22: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Data Reuse Plan

Codes | Freq. | Examples

Reusability of the Data | 6 | Descriptions about reusable methods (Used by a research community to follow-up)

Restrictions to Access | 6 | Access allowed for a certain group of researchers

Groups Interested In | 8 | Wider research community studying the Great Lakes, academic geography organizations, and geography teacher associations

Foreseeable Uses/Users | 10 | Available to engineers, clinicians, and medical researchers, sociologists and psychologists working in relevant sub-fields.

Others

Codes | Freq. | Examples

Data Lifecycle | 1 | Application of the Life Cycle Inventory databases

Data Curation | 4 | Curation (Consortiums and Partnerships)

Budget | 9 | Institution will absorb costs, no incremental costs, marginal costs
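To compare how much attention each part of the coding scheme received, the code frequencies reported in the tables can be summed per category. This is a minimal sketch using only the frequencies shown on the slides; the dictionary layout and variable names are illustrative, and the slides do not say how these frequencies relate to the 766 identified utterances or the 642 unique ones.

```python
# Code frequencies as reported in the content-analysis tables.
code_frequencies = {
    "Types of Data": {
        "What to Generate": 58, "What Data Types": 37,
        "How to Create Data": 25, "Where to Get Existing Data": 13,
    },
    "Metadata Standard": {
        "Data Format": 38, "Metadata Form": 31, "How to Create Metadata": 14,
        "Which Metadata Standard": 15, "Contextual Details Needed": 10,
        "Data Discoverability": 7,
    },
    "Data Access & Sharing Process": {
        "When Available": 28, "How Available": 37, "What Available": 33,
        "Process for Gaining Access": 25, "How Long Retain the Right": 18,
        "Embargo Period": 5, "Ethical/Privacy Issues": 21,
        "Compliance with IRB Protocol": 13, "Whose Intellectual Property": 17,
    },
    "Data Archiving": {
        "Strategy for Archiving Data": 31, "Which Repository": 55,
        "Procedures for Long-Term Storage": 33, "Data Preservation Period": 11,
        "What Data Preserved for Long-Term": 7, "Transformation Required": 4,
        "Data Documentation Submitted": 11, "Related Information Submitted": 3,
    },
    "Data Reuse Plan": {
        "Reusability of the Data": 6, "Restrictions to Access": 6,
        "Groups Interested In": 8, "Foreseeable Uses/Users": 10,
    },
    "Others": {"Data Lifecycle": 1, "Data Curation": 4, "Budget": 9},
}

# Sum the code frequencies within each category.
category_totals = {
    category: sum(codes.values())
    for category, codes in code_frequencies.items()
}
grand_total = sum(category_totals.values())

# Print categories from most to least frequently coded.
for category, total in sorted(category_totals.items(),
                              key=lambda item: item[1], reverse=True):
    print(f"{category}: {total} ({100.0 * total / grand_total:.1f}%)")
```

Summed this way, statements coded under Data Reuse Plan are far fewer than those under Data Access & Sharing Process or Data Archiving.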

Page 23: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Data Available -

<Bar chart; category labels not recoverable; bar values as extracted: 3, 3, 10, 3, 8, 1, 27, 13>

Page 24: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Types of Data Repositories for Long-Term Archiving

Disciplinary Repository: 11
External/Commercial Storage: 4
Institutional Repository: 14
Internal/Institutional Storage: 11
Journal Repository/Supplement: 2
Lab/Organization Website: 13
Not mentioned/Specified: 13

Page 25: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Some insights – DMPs’ Preliminary Analysis

More informal/personal data sharing procedures rather than formal/institutionalized data sharing and management plans

Most DMPs lack content on “Metadata Standard” and “Data Reuse Plan”

Few have plans for long-term archiving. Very vague plans and ideas about long-term use of their data

Many DMPs addressed data archiving in institutional repositories that are not in existence yet, but expected to be created

A few DMPs mentioned interview transcripts will be available, but without addressing IRB issues

Page 26: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Future Directions

Survey a larger number of Awardees

More exhaustive coding analysis and in-depth exploration of the DMPs’ content

Analysis of DMPs to identify patterns, common challenges and best practices across and within different disciplinary communities

Page 27: RDAP13 Renata Curty: What Have Scientists Planned for Data Sharing and Reuse? A C…

Thank you!

[email protected]

Let’s Go Orange!