Data Management LIS 653 Starr Hoffman

LIS 653, Session 11: Data Management & Curation

DESCRIPTION

An introduction to data management, data curation, and librarian roles related to data.

Page 1: LIS 653, Session 11: Data Management & Curation

Data Management

LIS 653

Starr Hoffman

Page 2: LIS 653, Session 11: Data Management & Curation

Data

Page 3: LIS 653, Session 11: Data Management & Curation

What is (are) Data?

Observations: sensor data, telemetry, survey data, sample data

Experiments: gene sequences, chromatograms

Simulations: economic models

Derivations/Compilations: text mining, data from public documents

Documents & texts themselves = data

Research Process: observational conditions, experimental procedure, instrumentation, label descriptions, units, metadata

Page 4: LIS 653, Session 11: Data Management & Curation

Librarian Roles & Data

Advisory

Original data:
Consult on creating DMP
Consult on data organization, methodology, etc.
Consult on metadata practices
Consult on archiving
Help disseminate research: journal publication, OA resources, blogs, etc.
Deposit into repository (IR, 3rd party, etc.)

Secondary data:
Consult on methodology / analysis
Discovery…

Curatorial

Manage IR (institutional repository)
Create metadata for datasets
Purchase / catalog / discovery for secondary data

Page 5: LIS 653, Session 11: Data Management & Curation

What is Data Management?

Planning for the short-term and long-term care of, and access to, your data.

Or: What are you going to do with that data?

How will you describe it?
How are you organizing it?
After you’re done, where will you put it?
How will you/others be able to access it?
For how long?

Page 6: LIS 653, Session 11: Data Management & Curation

Data Management: Why Does it Matter?

Grant requirements
Public access to funded research
Validation
Replication
Re-use, continue research
Teaching
Natural disasters
Computer failure or theft
USB/hard drive failure or loss
Files corrupted

Page 7: LIS 653, Session 11: Data Management & Curation

Funding Requirements

NSF: Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan” …describe how the proposal will conform to NSF policy on the dissemination and sharing of research results.

NIH: The NIH expects and supports the timely release and sharing of final research data… for use by other researchers. …expected to include a plan for data sharing or state why data sharing is not possible.

NEH Office of Digital Humanities, NOAA, IMLS, NIJ

Page 8: LIS 653, Session 11: Data Management & Curation

DMP Considerations

What data types, from what sources, in what formats will this project produce? How much of it will there be?

How will you describe or document your data? Are there standards you will be using for this?

Will you be sharing your data? Do you have the rights to share the data? What did you tell the IRB?

How often do you need to back up your files? How do you need to be able to access your files? How many backups will you have?

How much storage space do you need? What is your budget for your storage?

Where are you going to archive or store the data, and how will it be accessed?

What are the roles and responsibilities around all of these things? i.e., Who's going to be doing all this?
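The storage and backup questions above come down to simple arithmetic. The following is a minimal sketch, assuming each backup is a full copy of everything kept; the function names, the dataset size, and the price per GB are all hypothetical values chosen for illustration, not figures from the slides.

```python
def storage_needed_gb(dataset_gb: float, versions_kept: int, backup_copies: int) -> float:
    """Estimate total storage: every kept version, plus full backup copies of all of them.

    Assumption: backups are complete copies (no deduplication or compression).
    """
    working_set = dataset_gb * versions_kept
    return working_set * (1 + backup_copies)


def annual_cost(total_gb: float, price_per_gb_year: float) -> float:
    """Turn a storage estimate into a yearly budget line."""
    return total_gb * price_per_gb_year


if __name__ == "__main__":
    # Hypothetical project: 50 GB of data, 3 versions kept, 2 backup copies.
    total = storage_needed_gb(dataset_gb=50, versions_kept=3, backup_copies=2)
    print(total)                     # 450
    print(annual_cost(total, 0.25))  # 112.5 (hypothetical price per GB per year)
```

Numbers like these give the budget question in a DMP a concrete starting point, even if the real figures change as the project grows.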

Page 9: LIS 653, Session 11: Data Management & Curation

DMP Examples

Page 10: LIS 653, Session 11: Data Management & Curation

Planning the Data Life-Cycle

Consider…

Files: size, format, organization
Security
Storage/Backup system
Retention
Access/Transparency

Page 11: LIS 653, Session 11: Data Management & Curation

Data Lifecycle: Create / Analyze / Edit

File Management
Consistency, brevity, description
Versioning (v01, v02, FINAL)
Avoid spaces

Directory structure: /[Project]/[Grant Number]/[Event]/[Date]

File naming: [description]_[instrument]_[location]_[YYYYMMDD].[ext]

Transparency/Sharing
Document data: codebook, metadata

Page 12: LIS 653, Session 11: Data Management & Curation

File Structure & Naming Examples

Directory structure: /[Project]/[Grant Number]/[Event]/[Date]

/NYCPhysicalActivity/NOT-MH-14-033/Interview/20141109
/Dissertation/LitReview/LibraryLeadership/

File naming: [description]_[instrument]_[location]_[YYYYMMDD].[ext]

PhysicalActivity_InterviewQs_PS193_20141109.doc
PhysicalActivity_InterviewResponses_20141022.xls
LibraryLeadershipHenson_Article_2011.pdf
Leadership_Survey_20130917.doc
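The naming and directory patterns above are easy to apply consistently with a small helper. This is a sketch only: the function names are hypothetical, and placing the version tag after the date is an assumption, since the slides list versioning (v01, v02, FINAL) without saying where it goes in the name.

```python
from datetime import date
from pathlib import PurePosixPath


def clean(part: str) -> str:
    """Strip spaces and punctuation so '_' is used only between fields (hyphens are kept)."""
    return "".join(ch for ch in part if ch.isalnum() or ch == "-")


def build_filename(description, instrument, location, when, ext, version=None):
    """Return [description]_[instrument]_[location]_[YYYYMMDD].[ext]; empty fields are skipped."""
    parts = [c for c in map(clean, (description, instrument, location)) if c]
    parts.append(when.strftime("%Y%m%d"))
    if version is not None:
        # Assumption: the version tag goes last, zero-padded as on the slide (v01, v02).
        parts.append(f"v{version:02d}")
    return "_".join(parts) + "." + ext.lstrip(".")


def build_directory(project, grant_number, event, when):
    """Return /[Project]/[Grant Number]/[Event]/[Date] as a POSIX-style path."""
    return PurePosixPath("/", clean(project), clean(grant_number),
                         clean(event), when.strftime("%Y%m%d"))


if __name__ == "__main__":
    day = date(2014, 11, 9)
    print(build_directory("NYC Physical Activity", "NOT-MH-14-033", "Interview", day))
    # /NYCPhysicalActivity/NOT-MH-14-033/Interview/20141109
    print(build_filename("PhysicalActivity", "InterviewQs", "PS193", day, "doc"))
    # PhysicalActivity_InterviewQs_PS193_20141109.doc
```

Generating names from fields rather than typing them by hand keeps the date format and field order identical across a project, which is what makes the files sortable and searchable later.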

Page 13: LIS 653, Session 11: Data Management & Curation

Metadata & Description

Variables: labels, meaning, how they were measured, units, codes

Survey questions
Experimental procedures
Research methodology
Statistical analyses performed
Preferred data citation:

Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
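The variable-level description listed above (labels, meaning, how measured, units, codes) can be kept in a simple machine-readable codebook alongside the data. The sketch below is one possible layout, not a standard: the `Variable` class, the column names, and the example variables are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Variable:
    """One codebook row: what a column means and how it was measured."""
    name: str
    label: str
    measured_by: str
    units: str = ""
    codes: dict[str, str] = field(default_factory=dict)  # code value -> meaning


def codebook_csv(variables: list[Variable]) -> str:
    """Render the codebook as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "label", "measured_by", "units", "codes"])
    for var in variables:
        codes = "; ".join(f"{code}={meaning}" for code, meaning in var.codes.items())
        writer.writerow([var.name, var.label, var.measured_by, var.units, codes])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical variables for illustration only.
    variables = [
        Variable("active_min", "Minutes of physical activity per day",
                 "Self-reported in interview", units="minutes"),
        Variable("borough", "Borough of residence", "Survey question 2",
                 codes={"1": "Bronx", "2": "Brooklyn"}),
    ]
    print(codebook_csv(variables))
```

A plain CSV codebook like this can be deposited with the dataset in an open format, so the documentation stays readable without any particular software.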

Page 14: LIS 653, Session 11: Data Management & Curation

Codebook Examples

Page 15: LIS 653, Session 11: Data Management & Curation

Codebook Examples

Page 16: LIS 653, Session 11: Data Management & Curation

Data Lifecycle: Publish, Store, Access, Reuse

File size & format
Open vs. proprietary

Security
Anonymize or encrypt?
Levels may vary by access (org. vs. 3rd party)

Data Citation

Sharing
Upload data & metadata
Institutional repository, data center, etc.
Persistent identifier

Page 17: LIS 653, Session 11: Data Management & Curation

Institutional Repositories

Page 18: LIS 653, Session 11: Data Management & Curation

Institutional Repositories

Page 19: LIS 653, Session 11: Data Management & Curation

Institutional Repositories

Page 20: LIS 653, Session 11: Data Management & Curation

Institutional Repositories

Page 21: LIS 653, Session 11: Data Management & Curation

Dataset Record in IR

Page 22: LIS 653, Session 11: Data Management & Curation

Other places data can live…

Figshare
ICPSR
GitHub
DataUp
Dropbox (or other cloud storage) IF you use proper encryption

Lists of data repositories: DataCite, DataBib

Page 23: LIS 653, Session 11: Data Management & Curation

Data Discovery

Data Depositories (previous slide): ICPSR, Figshare

Institutional Repositories: OpenDOAR (directory), specific institutions

Data Catalogs: Numeric Data Catalog (Columbia), GeoData (Columbia, others)

Gov & Public Sources (data producers): NYC OpenData, Data.gov, Census Bureau, Bureau of Labor Statistics, IMLS (Institute of Museum & Library Services)

Page 24: LIS 653, Session 11: Data Management & Curation

Replicated Data

And Finally… Geeky puns.