Scientists’ Data and Information Practices and Needs Carol Tenopir, University of Tennessee and Mike Frame, USGS June 15, 2011 UC3 Summer Webinar Series


Scientists’ Data and Information Practices and Needs:

A Baseline Assessment & Implications for Libraries

Carol Tenopir, University of Tennessee, and Mike Frame, USGS

Co-Leaders of the DataONE Usability & Assessment Working Group

Provide universal access to data about life on earth and the environment that sustains it

1. Build on existing cyberinfrastructure

2. Create new cyberinfrastructure

3. Support new communities of practice

Assessment: Stakeholders

• Scientists
• Data Managers
• Public Officials
• Citizen-scientists
• Libraries & Librarians
• Students & Teachers
• Publishers

Data Life Cycle

Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

Assessment

Baseline Assessment of Scientists (2010)

n=1329, n=1317

Primary Discipline

• social sciences: 15%
• computer science/engineering: 9%
• physical sciences: 12%
• environmental sciences & ecology: 36%
• atmospheric science: 4%
• biology: 14%
• medicine: 2%
• other: 7%

Primary Work Sector

• academic: 80%
• government: 13%
• others: 8%

Meet the Scientists: Joe & Mabel


Joe is a biodiversity scientist employed by a government agency. He acts as a program manager and consultant. Joe oversees collection of new data in the field and also manages historical data from other providers. Joe has data from a variety of different projects conducted over the years.

Mabel is an academic environmental scientist. She collects and records data in the field on a variety of specimen variables and environmental impacts. Mabel has a data set related to her personal research interests, as well as data collected for a university museum collection.

Lessons Learned


1. Scientists need a variety of data types and many scientists are interested in sharing data.


Data Types

• experiment: 54%
• observational: 48%
• data models: 38%
• biotic survey: 34%
• abiotic survey: 33%
• remote-sensed abiotic: 27%
• remote-sensed biotic: 20%
• social science survey: 19%
• interviews: 15%
• Other: 6%

Current Sharing Practices

• share my data with others: 75%
• place at least some of my data into a central data repository: 78%
• place all of my data into a central data repository: 41%

Many are interested in sharing data (percent agree)

• Willing to place all of my data into a central data repository with no restrictions: 41%
• Appropriate to create new datasets from shared data: 76%
• Willing to place at least some of my data into a central data repository with no restrictions: 78%
• Willing to share data across a broad group of researchers: 81%

Joe & Mabel: About Sharing Data


“If NBII required anyone who extracted data through the portal to also share data with the portal, then a resounding yes.”

“I’m interested in having data available to researchers interested in larger questions, particularly climate change questions.”

“We are torn between putting it out there for everyone and worry about suffering the risk of something bad happening with it. Saddest thing would be if the data loses its use, where it isn’t shared.”

“I don’t think I would be opposed to it. It would not be a decision I would make personally; we would have to have permission to share.”

2. There are many barriers to sharing data and conditions that must be met.


Gap Between Willingness to Share and Accessibility

• place at least some of my data into a central data repository: 78%
• place all of my data into a central data repository: 41%
• others can access my data easily: 36%

Interest in Data Sharing

• use other researchers' datasets if their datasets were easily accessible: 84%
• willing to share data across a broad group of researchers: 81%
• it is appropriate to create new datasets from shared data: 76%

Conditions on data sharing (percent agree)

• Reprints of articles: 70%
• Reciprocal sharing agreement: 72%
• Opportunity to collaborate: 81%
• Acknowledge provider/funder: 93%
• Formally cite provider/funder: 95%

More challenges .. (percent agree)

• Lack of funding: 40%
• Insufficient time to make data available: 54%
• No place to put data: 24%
• Don't have the rights to make the data public: 24%

Second series on the repeated chart (legend not preserved): 43%, 62%, 24%, 18%

Joe & Mabel: About Restrictions & Conditions to Sharing Data


“We want to make sure that those of us who have been involved in gathering the data get appropriate recognition for it.”

“If someone were to ask about rare or endangered plants, I would limit that information to appropriate people: natural heritage, universities and federal agencies.”

“We will share it with people who want to use the data for restoration or research. If a consultant wants data to make money, then we are hesitant to hand it out.”

“Is there a mechanism by which we can know when our data is being used? Knowing how valuable we are to the general public comes from the use of our data.”

3. There are different needs, attitudes, and practices between scientists who work in government agencies and those who work in academia.


“I am satisfied with …” (percent agree/strongly agree; Government vs. Academic)

• the process for cataloging/describing data
• the tools for preparing my documentation
• tools and technical support for data management during the life of the project
• formal established process to store data beyond the project

Chart values as extracted (pairing with items and series not preserved): 62%, 46%, 40%, 35%, 48%, 34%, 52%, 53%

Responsibilities for Data

• Academic respondents are more likely to have sole responsibility for approving access to some or all of their datasets.
– Academic 83%, Government 63%

Organizational Involvement

• Government respondents are more likely to agree that their organization was involved in:
– “managing data during the life of the project”: Government 52%, Academic 39%
– “storing data beyond the life of the project”: Government 53%, Academic 46%

Joe & Mabel: The View from Government & Academic Organizations

“If other people are using my data then I somehow need to report that. I need to know how it’s being used and if any publications result.”

“I don’t have anything I’m keeping private. I’m willing to put it all out there.”

“I don’t have the authority to make decisions about data sharing.”

“Our data sharing policy makes it difficult for us to withhold parts of the datasets we receive. As a result, some data contributors only share sub-sets of their data.”

4. Scientists' skill levels, and their use of and access to appropriate tools, vary across the data life cycle.

What metadata standard do you currently use?

• DIF: 12
• DwC: 21
• DC: 26
• EML: 95
• FGDC: 95
• Open GIS: 96
• ISO: 97
• My Lab: 266
• none: 676

Joe & Mabel: About Metadata

“We are currently redoing all of our collection databases at the museum. We are building an in-house system. We looked at available standards and decided to write our own.”

“For my research, very little metadata has been created. For metadata associated with the museum collection, Darwin Core has been used.”

“For contemporary sets, the person who submits the data also submits a metadata record. We create another record representing what we think it is. We have one version of the data, submitter may have a version they keep on their website. We want to be able to show that these are two different things.”

“We write FGDC records.”

5. Scientists need assistance across the data life cycle.


My organization provides…

                                                              % Government   % Academic
Training on best practices                                    23             21
Funds for data management long-term                           27             20
Funds for data management short-term                          34             29
Tools and technical support for data management long-term     39             34
Tools and technical support for data management short-term    48             43

More challenges .. (percent agree)

• Lack of funding: 40%
• Insufficient time to make data available: 54%
• No place to put data: 24%
• Don't have the rights to make the data public: 24%

Joe & Mabel: Looking for Assistance


“It is cumbersome to put those data sets together, but only because it is important. If there were ways to automate some of that information collection out of the data sets, it would help.”

“Maximum utility of the data would require geo-referencing of the data. We would need help geo-referencing the part of the collection that isn’t geo-referenced.”

“Ideally, we would like for our research results to be disseminated in a way that’s accessible and digestible to not just academics but to everybody.”

“Manpower. We need more people to handle these sorts of things.”

Data Life Cycle Scientist Challenges

Data life cycle: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

• Are there standards?
• How do I preserve my data?
• What tools do I use?
• Will I get credit for my work?
• How much will it cost?
• What is a data management plan?
• Who can help me?
• What is metadata?
• Where do I preserve my data?

Future Assessments

Timeline: Year 1 through Year 5 (BL = baseline, FU = follow-up)

• Scientists: BL, FU
• Librarians: BL, FU
• Policy Makers: BL, FU
• Educators: BL, FU
• Library Policies: BL, FU

Library and Librarian Surveys

• Library (1 per library): current practices
• Librarian (individuals): attitudes and perceptions
• Started with ARL libraries (spring and summer 2011; 38 library responses and 223 librarians so far)
• Will expand to other North American academic libraries and librarians

Librarian & Library Assessment

Data life cycle: Collect, Assure, Describe, Deposit, Preserve, Discover, Integrate, Analyze

• Stewardship role (select & deselect)?
• Are RDS a priority?
• Role in partnering with researcher?
• Level of knowledge & skills?
• Is there an agency repository that accepts data?
• Level of participation with data?
• Role of librarian discovering data?
• Level of involvement with metadata?
• Role of the librarian to help preservation?

Library Survey: Research Data Services (RDS)

- Research data reference/consultation services to researchers are provided by individual discipline librarians (33%), dedicated data librarians (17%), or a combination of both (50%).

- Almost half of the libraries (45%) do not have policies and/or procedures associated with research data services.

Library Survey: Collaboration for RDS (n=18)

Library Survey: Staffing issues (n=28)

Library Survey: Opportunities for Staff for RDS (n=25)

Librarian Survey

– Distributed to 950 librarians
– Science, data, metadata, scholarly communication, digital collection, electronic resources librarians
– 223 people replied to at least one question

Librarian Survey

• Interact with faculty, students, or staff in support of RDS: 28% Yes-integral part, 41% Yes-occasionally, 32% No (n=221)

• With faculty or staff consultation on

(n=192, n=194, n=193)

Frequency of research data services performed by the librarian

(n=167, n=167, n=167, n=166)

Librarian Survey

• Outreach and collaboration w/ other RDS
– Off campus: 61% Never, 34% few times a year (n=157)
– On campus: 51% Never, 34% few times a year (n=157)

• Participation in … about RDS

                                      daily   once a week   once a month   few times a year   never   n
informal discussion groups            2%      6%            20%            49%                24%     158
working groups/professional groups    3%      8%            12%            40%                39%     158
policy development                    4%      4%            9%             34%                50%     158
strategic planning                    3%      4%            11%            40%                42%     156

Librarian Survey: Skills & Expertise

Chart values as extracted (category labels not preserved): 48%, 57%, 31%, 51%

n=157, n=156, n=156, n=157

Librarian Survey

Most important motivation to be involved in RDS

• RDS are important to subject discipline I support: 25%
• RDS is primary responsibility: 23%
• personal interest in RDS: 16%
• My job includes facilitating data contributions to our institutional repository: 14%
• My job includes metadata creation, training, and/or management: 13%
• Other: 9%
• My research includes RDS: 2%

Next steps

• Follow-up to ARL libraries and librarians
• Expand scope to other academic libraries
• Federal libraries/librarians
• Data Managers
• Other Working Groups looking at citizen scientists and UG educators