Research Data Management in the Humanities and Social Sciences

Research Data Management: Humanities and Social Sciences Edition

CC BY-NC

Celia Emmelhainz and Suzi Cole

August 11, 2015

Modified from presentation by Leslie Barnes, Dylanne Dearborn, Andrew Nicholson

at http://guides.library.utoronto.ca/RDM-intro

• All liaison librarians need a basic knowledge of research data management (RDM).

• RDM is part of the librarian’s toolkit for serving faculty research needs.

• We don’t all need to be data experts, just as we aren’t experts in many areas that we cover.

• RDM is one of many topics we discuss with faculty over time, like collections, instruction, course guides, and student research.

• Our faculty may not know RDM terms or may not understand what our institutional repository or other archives can do with data.

• Humanists may react negatively to the term “data.”

• (Optional): we can help faculty by reading drafts of their data management plans: if we don’t understand, reviewers won’t either.

• Knowing data concepts enhances our role & expands our visibility.

• Data collection and the data lifecycle are part of where we help with curation in the library.

• This is a new knowledge area for all academic librarians.

Our assumptions

Why do academic libraries help with data management?

• Library culture is to acquire, organize, and preserve information

• Logical extension of services we’ve traditionally been involved with

• Libraries bring people together across disciplinary differences & campuses

Reading: Coates (2014) Ensuring research integrity: the role of data management in current crises. C&RL News 75(11): 598-601.

After these sessions, you should…

● Know the concepts in data management

● Feel less anxious when talking about data

● Begin listening to faculty talk about their research process and outputs

● Know where to get more help with research data for faculty in your disciplines

But why liaisons?

Info: eScience Team presentation on liaison roles, Image: CC0 from pixabay.com

A logical extension of our role as connections between the library and teaching faculty

A great way to show faculty that we care about their research as well as teaching

Liaisons as natural point of “triage”

Liaisons – Learning Over Time

First Steps: Get comfortable with the idea of research data management.

Next Steps: Start a conversation with faculty about their needs, share resources, and direct them to data librarians for complex questions.

Moving Ahead: Take self-paced courses for librarians on the web. And try it out! Try managing data for one of your own projects.

Source: eScience Team presentation on liaison roles for data management

Our path…

Today

…introduction to data management

…types of research data you’ll encounter

…data formats and organization

Thursday

…intro to data storage

…intro to data sharing

…advising on data management plans

Q1: What is DATA?

Prompt: what materials do your faculty use to make sense of their research?

“Research data is collected, observed, or created for purposes of analysis to produce original research results.”

- U Edinburgh

Q2: What are DATA in the humanities?

Textual data in the humanities could include:

- Scholarly editions

- Text corpora

- Text with markup

- Thematic collections

- Annotations

- Accompanying analysis

- Finding aids

Cf: guides.library.ucla.edu/c.php?g=180580&p=1187629, guide.dhcuration.org/intro/, image source: slideshare.net/ULCCEvents/the-humanities-and-data-management

Data in the qualitative social sciences could include:

• microfilms

• copies of old documents

• oral interviews

• video tapes

• hand-written records

from: www.nsf.gov/sbe/ses/common/archive.jsp

Humanities and arts data:

● Texts used for research

● Annotations

● Images and illustrations

● Citations

● Bibliographic information

● Contextual information

● Audio or video files

Health and Life Sciences data:

● Health indicators, vital signs

● Protein or genetic sequences

● Spectra and images

● Artifacts and samples

● Slides and specimens

Social Sciences data:

● Survey responses

● Focus groups and interviews

● Administrative records

● Demographic information

● Opinion polling

● Maps and geospatial data

● Websites, primary sources

Physical Sciences data:

● Sensor or lab measurements

● Computer modeling and simulations

● Observations and/or field notes

● Numerical measurements

Cf: Best Practices for Arts/Humanities Data Management Plans, CU-Boulder http://bit.ly/1MkKCIa

DigitalThoreau.org: On the left, the Princeton edition of Walden; right, original 1847 draft with changes marked up.

Text Encoding Initiative (TEI) is a markup language that records the structure of text (author, chapters, pages, quotes) for digital humanities/curation purposes.
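To make the idea of marked-up text concrete, here is a minimal sketch in Python. It reads a small TEI-style fragment and lists the words an editor marked as deleted and added. The sentence is invented for illustration (it is not from Walden), and the fragment is simplified: a complete TEI file would also carry a teiHeader with descriptive metadata.

```python
import xml.etree.ElementTree as ET

# The TEI namespace; "tei" is just a local prefix chosen for this sketch.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, simplified fragment (assumption: not a complete, valid TEI file).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div type="chapter" n="1">
        <p>The weather was <del>cold</del><add>mild</add> that morning.</p>
      </div>
    </body>
  </text>
</TEI>"""


def summarize_revisions(tei_xml):
    """Return the text of every <del> and <add> element in the fragment."""
    root = ET.fromstring(tei_xml)
    deleted = [el.text for el in root.iterfind(".//tei:del", TEI_NS)]
    added = [el.text for el in root.iterfind(".//tei:add", TEI_NS)]
    return {"deleted": deleted, "added": added}


revisions = summarize_revisions(SAMPLE)
print(revisions)
```

Because the structure is recorded in the markup, the revisions become something a researcher can count, search, or compare: in other words, data.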

Ask Yourself (#1):

Using a project summary, ask yourself:

- what is this research project about?

- what types of data are being collected?

- what types of data are being created?

Data (the stuff we do research with) are vital at every point in the research lifecycle.

Image: www.lib.uci.edu/dss/images/lifecycle.jpg

Example: data across the lifecycle

Example: temperature data from a lake

Raw → Processed → Analyzed → Finalized/Published

WHY manage data?

① for the researchers’ own current/future benefit

② for transparency and integrity

③ for sharing knowledge and how it was constructed

④ to meet grant requirements (NEH, NSF)

⑤ to comply with ethics requirements

⑥ to increase exposure of faculty research

2: Data Formats and Organization

CC image from pixabay.com/en/filing-cabinet-office-furniture-146160/

File Naming video

● Use meaningful names

● Avoid special characters

● Use caps or underscores, not spaces

● Choose a standard date format: YYYYMMDD or YYYY-MM-DD

● Label versions (v2, v15)
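The naming rules above can be sketched as a small Python helper. The function name, the order of the name parts, and the sample values are assumptions made for illustration; any consistent pattern that follows the same rules would serve.

```python
import re
from datetime import date


def make_filename(project, description, when, version, extension):
    """Build a file name: meaningful parts, no spaces or special
    characters, a YYYYMMDD date, and a version label."""

    def clean(part):
        # Replace every run of spaces or special characters with one underscore.
        part = re.sub(r"[^A-Za-z0-9]+", "_", part.strip())
        return part.strip("_")

    stamp = when.strftime("%Y%m%d")        # standard date format
    ext = extension.lstrip(".").lower()
    return f"{clean(project)}_{clean(description)}_{stamp}_v{version}.{ext}"


# Hypothetical project and description, for illustration only.
example = make_filename(
    "Walden Drafts", "interview notes (final!)", date(2014, 12, 11), 2, ".txt"
)
print(example)
```

Dates written as YYYYMMDD sort in chronological order when files are listed alphabetically, which is one reason the format is recommended.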

Data Structures video

Could organize by:

● Type of information

● Date and time

● Research project

● Theme or subject

frontispieces/20141211/images

images/frontispieces/20141211
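The two folder paths above hold the same information in a different order. A minimal Python sketch of that choice follows; the facet names ("type", "subject", "date") are labels invented for this example.

```python
from pathlib import PurePosixPath


def build_path(facets, order):
    """Join the chosen facets into a folder path, in the given order."""
    return PurePosixPath(*[facets[name] for name in order])


# The same three pieces of information used in the slide's example paths.
facets = {"type": "images", "subject": "frontispieces", "date": "20141211"}

by_subject = build_path(facets, ["subject", "date", "type"])
by_type = build_path(facets, ["type", "subject", "date"])

print(by_subject)
print(by_type)
```

Neither order is the right one in general. What matters is that a project picks one hierarchy and applies it consistently.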

Data Dictionaries and Codebooks

A data dictionary or codebook explains what a dataset contains:

● Contents or organization of a file

● Glossary of key concepts or terms

● Definitions for each variable name

● Relationships between tables/files

● Codes that have been used to sort data

● Sampling or other methods used
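As an illustration, a very small codebook can be kept as a CSV file next to the data. The sketch below uses only Python's standard library; the variable names, definitions, and codes are invented, and a real codebook would cover every column plus the sampling methods.

```python
import csv
import io

# Hypothetical codebook entries for an imagined survey dataset.
CODEBOOK = [
    {"variable": "resp_id", "definition": "Anonymous respondent number",
     "type": "integer", "codes": ""},
    {"variable": "age_group", "definition": "Age bracket of respondent",
     "type": "category", "codes": "1=18-29; 2=30-49; 3=50+"},
]

FIELDS = ["variable", "definition", "type", "codes"]


def write_codebook(rows, handle):
    """Write the codebook as CSV, an open format, to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def undocumented(data_columns, rows):
    """List dataset columns that have no codebook entry yet."""
    documented = {row["variable"] for row in rows}
    return [column for column in data_columns if column not in documented]


buffer = io.StringIO()
write_codebook(CODEBOOK, buffer)
codebook_csv = buffer.getvalue()

missing = undocumented(["resp_id", "age_group", "region"], CODEBOOK)
print(codebook_csv)
print("Not yet documented:", missing)
```

The second function is the useful habit: checking that every variable in the data has a definition before the dataset is shared.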

Use open formats when possible:

“Open source” formats keep files accessible over time; proprietary formats may be lost if a company goes out of business. Open formats let future researchers access your data!

Video: .mov, .mpeg

Audio: .wav, .mp3

Data: .csv, .sas

Images: .tiff, JPEG 2000

Text: PDF/A, ASCII
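A quick way to start the conversation about formats is to list which files in a project fall outside the formats named on the slide. This Python sketch is a rough first pass only: the extensions .jp2, .pdf, and .txt are assumed stand-ins for JPEG 2000, PDF/A, and ASCII, and an extension cannot confirm that a PDF is really PDF/A. The file names are invented.

```python
from pathlib import PurePath

# Extensions listed on the slide.
SLIDE_FORMATS = {".mov", ".mpeg", ".wav", ".mp3", ".csv", ".sas", ".tiff"}

# Assumed extensions for "JPEG 2000", "PDF/A", and "ASCII" (not on the slide).
ASSUMED_EXTENSIONS = {".jp2", ".pdf", ".txt"}


def needs_review(filenames, accepted=SLIDE_FORMATS | ASSUMED_EXTENSIONS):
    """Return the files whose extension is not in the accepted set."""
    return [
        name for name in filenames
        if PurePath(name).suffix.lower() not in accepted
    ]


# Hypothetical project files.
flagged = needs_review(
    ["interview01.WAV", "survey.xlsx", "notes.docx", "scan.tiff"]
)
print(flagged)
```

Flagged files are candidates for conversion, not errors; the researcher decides whether an open equivalent preserves what they need.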

Ask Yourself (#2):

Using the project summary, ask yourself:

- what file formats are the data now in?

- do they need conversion to open formats?

- are they well documented with metadata?

Intersession exercise:

Read the NEH guidelines for data management.

View any two data management libguides: Who is the audience? What services are offered? How does it connect to users?

Briefly review your chosen project summary, in preparation for the final class.

Research Data Management: Session Two!

CC BY-NC

Celia Emmelhainz and Suzi Cole

August 13, 2015

Modified from presentation by Leslie Barnes, Dylanne Dearborn, Andrew Nicholson

at http://guides.library.utoronto.ca/RDM-intro

3: Data Security and Sensitive Data

CC image: pixabay.com/en/computer-security-business-767784/

Don’t let this be you! (or your faculty, or your students…)

Image www.neatorama.com/2013/04/24/Backup-Your-Data/

Common options for data storage:

● Local hard drives (weak)
Ex: personal or office desktop, laptop computer

● External storage devices (weak)
Ex: USB drives, external hard drives

● Networked storage (okay)
Ex: university servers, but see Colby**

● Cloud storage services (okay)
Ex: Microsoft, RackSpace, Amazon, Google

Data Storage: Best Practices

● Back up all data frequently, especially after major changes

● Automate the backup process

● Use ‘versioning software’ (see ITS) or file names to track changes in team projects

The “Rule of 3”: Keep three copies of key data (original file, local backup, remote backup)…

… in at least two different locations

… in at least one offline/offsite location
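The “Rule of 3” can be automated in a few lines. The sketch below, using only Python's standard library, copies a file to two backup folders and checks each copy against the original with a checksum. The folder names are placeholders: here both sit in a temporary directory so the example is safe to run, whereas a real “remote” copy would live on a different device or service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path):
    """Checksum of a file, read in chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(original, backup_dirs):
    """Copy the original into each folder and verify every copy."""
    original = Path(original)
    expected = sha256(original)
    copies = []
    for folder in backup_dirs:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / original.name
        shutil.copy2(original, target)   # copy2 also keeps timestamps
        if sha256(target) != expected:
            raise IOError(f"Backup at {target} does not match the original")
        copies.append(target)
    return copies


# Demonstration in a throwaway folder (all names are hypothetical).
with tempfile.TemporaryDirectory() as workspace:
    root = Path(workspace)
    source = root / "fieldnotes_20141211_v2.txt"
    source.write_text("sample field notes\n", encoding="utf-8")

    copies = back_up(source, [root / "local_backup", root / "remote_backup"])
    total_copies = 1 + len(copies)   # the original plus two backups
    all_match = all(sha256(copy) == sha256(source) for copy in copies)

print(total_copies, all_match)
```

Verifying the checksum matters as much as making the copy: a backup that was silently corrupted is not a backup.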

Sensitive Data:

…is any data that, if released, could harm the people who participated in the research:

● Address, birth date, name, location

● Sensitive political opinions

● Sexual practices

● GPS data locating endangered species

● Coordinates for burial sites or sacred places

Treat such data with caution; few archiving options exist for it at present.

Concepts in Sensitive Data

● Research ethics: protect identities of people interviewed; minimize risk of any leaks

● Confidentiality: how participants’ identifiable private information will be managed and disseminated

● Disclosure risk: increased with online accessibility of data or storage of documents

Sensitive Data: Best Practices

● Collect data without identifying information, if possible

● Strip sensitive or identifying information before archiving or sharing research data

● Encrypt your computer, and use secure connections and secure servers

● Place sensitive data in a restricted archive with an embargo (time delay) or ethics approval required for access
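Stripping identifying information before sharing can be sketched as follows. The field names and the sample records are invented, and the list of identifying fields is an assumption that each project would need to set for itself. Note that free-text fields such as interview transcripts can still contain names or places, so a script like this is a first step and not a substitute for careful review.

```python
# Assumed set of directly identifying fields for this illustration.
IDENTIFYING_FIELDS = {"name", "address", "birth_date", "location"}


def strip_identifiers(records, identifying=IDENTIFYING_FIELDS):
    """Drop identifying fields and add a neutral participant ID."""
    cleaned = []
    for number, record in enumerate(records, start=1):
        kept = {
            key: value for key, value in record.items()
            if key not in identifying
        }
        kept["participant_id"] = f"P{number:03d}"
        cleaned.append(kept)
    return cleaned


# Invented sample records.
interviews = [
    {"name": "Example Person", "birth_date": "1970-01-01",
     "location": "Example Town", "theme": "migration"},
    {"name": "Another Person", "birth_date": "1985-06-15",
     "location": "Sample City", "theme": "language"},
]

shareable = strip_identifiers(interviews)
print(shareable)
```

The key linking each participant ID back to a real person, if one is kept at all, should be stored separately and securely from the shareable file.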

Ask Yourself (#3):

Using the project summary, ask yourself:

- where will data be stored?

- who is responsible for storage and backup?

- how will you manage access to sensitive data?

4: Data Retention & Preservation

image from datasupport.researchdata.nl/

“What data do I keep?”

It all depends on:

…whether data is irreplaceable

e.g. are there other copies of this book, document, version, image, interview?

…how much data is needed to verify or reanalyze a research project

…policies of funders, IRB, discipline

Best Practices: Data Preservation

● Use open, non-proprietary file formats

● Include all software needed, if possible

● Note all files and their relationship/structure

● Identify who is responsible for preservation

● Determine how long data should be held

● Budget time and money before starting a project to properly preserve and archive data at the end!

Ask Yourself (#4):

Using the project summary, ask yourself:

- Which data should be kept? Why?

- How long should data be kept?

- Who is responsible for preserving the data?

5: Data Sharing and Publication

Fears in sharing data…

Often, researchers want to hide their data:

● Fear criticism of their methods/results

● Fear exposure of confidential data

● Fear political/legal ramifications

● Fear getting “scooped” on analysis

● Believe benefits are low, and the cost is high

CC image: pixabay.com/en/hands-holding-embracing-loving-718562/

But, sharing data…

● Is often required by journals and funders

● Reduces the costs of research by reducing project duplication

● Is a valuable check on methods and ethics

● Helps promote faculty discoveries

● Increases the impact of faculty work

● May support faculty tenure or salary increases!

Relevant data repositories:

and of course…

Data Papers:

Dataset Description

Reuse Potential

Methods

Overview/Context

Data as a Publication

● Data which has been shared can be cited:

Data citations involve: author, title, year, publisher / archive, version, URL or DOI for access.

● Data citations are a metric that can support tenure and promotion for our faculty!

● ORCiDs can help people find and cite data by a given researcher.
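The citation elements listed above can be assembled with a short Python helper. The ordering and punctuation below are one plausible arrangement, not an official citation style, and every value in the example (author, title, archive, DOI) is a placeholder.

```python
def format_data_citation(author, year, title, publisher, version, locator):
    """Combine the listed elements into a single citation string.

    The layout is an assumption for illustration; check the style guide
    or the repository's recommended citation before using it.
    """
    return (
        f"{author} ({year}). {title} (Version {version}) [Data set]. "
        f"{publisher}. {locator}"
    )


# Placeholder values only; this dataset does not exist.
citation = format_data_citation(
    author="Researcher, A.",
    year=2015,
    title="Example Interview Transcripts",
    publisher="Example Data Archive",
    version="2",
    locator="https://doi.org/10.0000/example",
)
print(citation)
```

Many repositories display a suggested citation on each dataset's landing page, which is usually the safest form to copy.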

Best Practices in Data Sharing

● Find out who owns the data (researcher? university? funding organization?)

● Review legal issues such as copyright or publishers’ embargoes

● Consider ethical issues related to sensitive data or communities

● See publisher/funder requirements for sharing

Data Management Plans

CC image: pixabay.com/en/whiteboard-man-presentation-write-849812/

What’s in a Data Management Plan?

All the things we’ve discussed!

What’s in a Data Management Plan?

● What types of data will be created?

● Who will own, have access to, and be responsible for managing these data?

● What equipment or methods will capture, process and document the data?

● Where will data be stored during and after active research?

● How will the data be shared with current or future researchers?
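When reading a faculty member's draft plan, the five questions above work as a checklist. A minimal Python sketch of that check follows; the short keys and the sample draft answers are invented for illustration.

```python
# The five questions from the slide, with short keys chosen for this sketch.
DMP_QUESTIONS = {
    "data_types": "What types of data will be created?",
    "responsibility": "Who will own, have access to, and be responsible "
                      "for managing these data?",
    "methods": "What equipment or methods will capture, process and "
               "document the data?",
    "storage": "Where will data be stored during and after active research?",
    "sharing": "How will the data be shared with current or future "
               "researchers?",
}


def open_questions(draft):
    """Return the questions a draft plan has not answered yet."""
    return [
        question for key, question in DMP_QUESTIONS.items()
        if not draft.get(key, "").strip()
    ]


# Hypothetical draft with two answers filled in and one left blank.
draft = {
    "data_types": "Interview audio (.wav) and transcripts (.txt)",
    "storage": "",
    "sharing": "Deposit in a repository after an embargo",
}

todo = open_questions(draft)
for question in todo:
    print("Still to answer:", question)
```

A blank answer is a prompt for conversation with the researcher, which is where a liaison can help most.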

Data Management Plans (DMPs) are a great way to…

● plan how you’ll handle research materials

● describe how you’ll document, store, and share data so that others can use it

● remain accountable for how you use and share research materials

● get funded on major research projects!

All research proposals sent to the National Science Foundation (NSF) must include a data management plan of no more than two pages, showing how the data will be cared for and shared.

The NSF is a common source of research money in: anthropology, geography, psychology, economics, government, STS, and many interdisciplinary projects.

The NSF expects that all researchers:

“should be prepared to place their data in fully cleaned and documented form in a data archive or library within one year after the expiration of an award.

Before an award is made, investigators will be asked to specify in writing where they plan to deposit their data set”

- National Science Foundation guide for social and economic sciences at nsf.gov/sbe/ses/common/archive.jsp

For the NEH, data are “materials generated or collected during the course of conducting research.”

Humanities data such as “citations, software code, algorithms, digital tools, documentation... geospatial coordinates… reports, and articles” should be archived. Sensitive information can be excluded.

So, humanities faculty should also have a plan for how they’ll archive and share their research data!

Source: neh.gov/files/grants/data_management_plans_2015.pdf

How do we actually make DMPs?

● Templates are a starting point:

● However, researchers still need to carefully think through data issues with grants officers, peers, or librarians

● http://libguides.colby.edu/data_mgmt

Sample DMPs

Image: asphalttexas.com/wp-content/uploads/2014/06/Screen-Shot-2014-06-18-at-4.33.29-PM.png

Data management at Colby:

• Liaisons are first point of contact

• Suzi and Celia advise on further issues

• We are an ICPSR member; quantitative researchers can deposit data there.

• Images and data may be archived in Digital Commons/Shared Shelf; check with Marty.

cf. libguides.colby.edu/data_mgmt.

Question: What 3 things can you do this year with data management?

Image: http://www.dailymail.co.uk/news/article-2728736/Otter-aerobics-Large-group-spotted-going-paces-synchronised-exercise.html

More questions? Contact us!

Celia Emmelhainz
celia.emmelhainz@colby.edu

Suzi Cole
swcole@colby.edu

Thanks to New England Collaborative Data Management Curriculum for sharing their slides.

Many thanks to Leslie Barnes, Dylanne Dearborn, and Andrew Nicholson at University of Toronto for sharing their abbreviated slides (http://guides.library.utoronto.ca/RDM-intro), from which this presentation was adapted for the humanities.
