An introduction to data management, data curation, and librarian roles related to data.
Data Management
LIS 653
Starr Hoffman
Data
What is (are) Data?
Observations: sensor data, telemetry, survey data, sample data
Experiments: gene sequences, chromatograms
Simulations: economic models
Derivations/Compilations: text mining, data from public documents
Documents & texts themselves = data
Research process: observational conditions, experimental procedure, instrumentation, label descriptions, units, metadata
Librarian Roles & Data
Advisory
Original data: consult on creating a DMP; consult on data organization, methodology, etc.; consult on metadata practices; consult on archiving; help disseminate research (journal publication, OA resources, blogs, etc.); deposit into a repository (IR, third party, etc.)
Secondary data: consult on methodology / analysis; discovery…
Curatorial
Manage the IR (institutional repository); create metadata for datasets; purchase / catalog / discovery for secondary data
What is Data Management?
Planning for the short term and the long term: care of and access to… your data.
Or: What are you going to do with that data?
How will you describe it? How are you organizing it?
After you’re done, where will you put it?
How will you (and others) be able to access it? For how long?
Data Management: Why Does It Matter?
Grant requirements, public access to funded research
Validation, replication, re-use and continued research, teaching
Natural disasters, computer failure or theft, USB/hard drive failure or loss, corrupted files
Funding Requirements
NSF: Proposals must include a supplementary document of no more than two pages labeled “Data Management Plan” …describe how the proposal will conform to NSF policy on the dissemination and sharing of research results.
NIH: The NIH expects and supports the timely release and sharing of final research data… for use by other researchers. …expected to include a plan for data sharing or state why data sharing is not possible.
Other funders: NEH Office of Digital Humanities, NOAA, IMLS, NIJ
DMP Considerations
What data types, from what sources, and in what formats will this project produce? How much of it will there be?
How will you describe or document your data? Are there standards you will use for this?
Will you be sharing your data? Do you have the rights to share the data? What did you tell the IRB?
How often do you need to back up your files? How do you need to be able to access your files? How many backups will you have?
How much storage space do you need? What is your storage budget?
Where will you archive or store the data, and how will it be accessed?
What are the roles and responsibilities for all of these things? In other words, who is going to do all of this?
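The storage and backup questions above reduce to simple arithmetic. A minimal sketch, assuming (these assumptions are not from the slides) that every backup is a full copy of the data and that the data grows by a fixed percentage each year; the function name and parameters are hypothetical:

```python
def estimate_storage_gb(raw_gb: float, backups: int, annual_growth: float, years: int) -> float:
    """Estimate total storage in GB for one working copy plus `backups` full copies.

    Assumption: compound growth, so size after `years` = raw_gb * (1 + annual_growth) ** years.
    `annual_growth` is a fraction (0.10 means 10% per year).
    """
    if raw_gb < 0 or backups < 0 or years < 0 or annual_growth < -1:
        raise ValueError("inputs out of range")
    projected_gb = raw_gb * (1 + annual_growth) ** years
    copies = 1 + backups  # the working copy plus each full backup
    return projected_gb * copies


# Hypothetical project: 50 GB today, two backups, 10% growth per year, 3-year grant.
total = estimate_storage_gb(raw_gb=50, backups=2, annual_growth=0.10, years=3)
print(f"Plan for roughly {total:.0f} GB")
```

Multiplying the result by the per-GB cost of the chosen storage service gives a first answer to the budget question.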
DMP Examples
Planning the Data Life-Cycle
Consider…
Files: Size, format, organization
Security, storage/backup system, retention, access/transparency
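One way to check that a storage/backup system is actually preserving files is to record a checksum for every file and compare the originals against the backup copies; a mismatch signals a corrupted or altered file. A minimal sketch using Python's standard `hashlib`; the function names and demo files are hypothetical:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root, POSIX-style) to its SHA-256 checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(original, backup):
    """Report files missing from the backup, changed (checksum mismatch), or extra."""
    return {
        "missing": sorted(set(original) - set(backup)),
        "changed": sorted(k for k in original.keys() & backup.keys() if original[k] != backup[k]),
        "extra": sorted(set(backup) - set(original)),
    }


# Demo with throwaway directories standing in for a project folder and its backup.
with tempfile.TemporaryDirectory() as orig_dir, tempfile.TemporaryDirectory() as backup_dir:
    orig, backup = Path(orig_dir), Path(backup_dir)
    (orig / "survey.csv").write_text("id,score\n1,5\n")
    (orig / "notes.txt").write_text("field notes")
    (backup / "survey.csv").write_text("id,score\n1,4\n")  # simulated corruption
    report = compare_manifests(build_manifest(orig), build_manifest(backup))

print(report)
```

Saving the manifest alongside the data (for example as a text file) lets the same comparison be rerun after every backup or migration.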
Data Lifecycle: Create / Analyze / Edit
File management: consistency, brevity, description; versioning (v01, v02, FINAL); avoid spaces
Directory structure: /[Project]/[Grant Number]/[Event]/[Date]
File naming: [description]_[instrument]_[location]_[YYYYMMDD].[ext]
Transparency/sharing: document data (codebook, metadata)
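The directory and file-naming patterns above can be generated by code, so every name comes out consistent. A minimal sketch; the function names are hypothetical, and placing the version tag (v01, v02, …) after the date is an assumption, since the slides do not say where it goes:

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Optional


def data_directory(project: str, grant: str, event: str, day: date) -> PurePosixPath:
    """Build /[Project]/[Grant Number]/[Event]/[Date] with the date as YYYYMMDD."""
    return PurePosixPath("/", project, grant, event, day.strftime("%Y%m%d"))


def data_filename(description: str, instrument: str, location: str, day: date,
                  ext: str, version: Optional[int] = None) -> str:
    """Build [description]_[instrument]_[location]_[YYYYMMDD].[ext].

    Assumption: an optional version tag (v01, v02, ...) is appended after the date.
    """
    parts = [description, instrument, location, day.strftime("%Y%m%d")]
    if version is not None:
        parts.append(f"v{version:02d}")  # zero-padded so v02 sorts before v10
    name = "_".join(parts) + "." + ext.lstrip(".")
    if any(ch.isspace() for ch in name):
        raise ValueError(f"avoid spaces in file names: {name!r}")
    return name


interview_day = date(2014, 11, 9)
folder = data_directory("NYCPhysicalActivity", "NOT-MH-14-033", "Interview", interview_day)
fname = data_filename("PhysicalActivity", "InterviewQs", "PS193", interview_day, "doc", version=2)
print(folder / fname)
```

The values are modeled on this deck's own example names; raising an error on whitespace enforces the "avoid spaces" rule rather than silently replacing characters.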
File Structure & Naming Examples
Directory structure: /[Project]/[Grant Number]/[Event]/[Date]
/NYCPhysicalActivity/NOT-MH-14-033/Interview/20141109
/Dissertation/LitReview/LibraryLeadership/
File naming: [description]_[instrument]_[location]_[YYYYMMDD].[ext]
PhysicalActivity_InterviewQs_PS193_20141109.doc
PhysicalActivity_InterviewResponses_20141022.xls
LibraryLeadershipHenson_Article_2011.pdf
Leadership_Survey_20130917.doc
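A naming convention is easier to keep when names can be checked automatically. A minimal sketch using a regular expression, assuming (not stated in the slides) that each component contains only letters, digits, and hyphens. Several of the example names above shorten the pattern (no location, or a year instead of a full date), so a strict check flags them; that may well be intended flexibility:

```python
import re
from datetime import datetime

# Pattern: [description]_[instrument]_[location]_[YYYYMMDD].[ext]
# Assumption: components use letters, digits, and hyphens only (no spaces or underscores).
PATTERN = re.compile(
    r"^(?P<description>[A-Za-z0-9-]+)_(?P<instrument>[A-Za-z0-9-]+)_"
    r"(?P<location>[A-Za-z0-9-]+)_(?P<date>\d{8})\.(?P<ext>[A-Za-z0-9]+)$"
)


def check_name(name: str):
    """Return the parsed components if `name` follows the full convention, else None."""
    m = PATTERN.match(name)
    if m is None:
        return None
    try:
        datetime.strptime(m["date"], "%Y%m%d")  # reject impossible dates such as 20141399
    except ValueError:
        return None
    return m.groupdict()


examples = [
    "PhysicalActivity_InterviewQs_PS193_20141109.doc",
    "PhysicalActivity_InterviewResponses_20141022.xls",
    "LibraryLeadershipHenson_Article_2011.pdf",
    "Leadership_Survey_20130917.doc",
]
for name in examples:
    print(name, "->", check_name(name))
```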
Metadata & Description
Variables: labels, meaning, how they were measured, units, codes
Survey questions, experimental procedures, research methodology, statistical analyses performed, preferred data citation
Pew Hispanic Center. (2008). 2007 Hispanic Healthcare Survey [Data file and code book]. Retrieved from http://pewhispanic.org/datasets/
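The variable-level metadata listed above (labels, meaning, how each variable was measured, units, codes) maps naturally onto one record per variable. A minimal sketch of a codebook as Python dataclasses exported to CSV; the class, field names, and example variables are hypothetical illustrations, not a codebook standard:

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CodebookEntry:
    name: str           # variable name as it appears in the data file
    label: str          # short human-readable label
    meaning: str        # what the variable represents
    measured_by: str    # how it was measured (question, instrument, procedure)
    units: str = ""     # units, if the variable is a measurement
    codes: Dict[str, str] = field(default_factory=dict)  # coded value -> meaning


def codebook_to_csv(entries: List[CodebookEntry]) -> str:
    """Render codebook entries as CSV text, one row per variable."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["name", "label", "meaning", "measured_by", "units", "codes"])
    for e in entries:
        codes = "; ".join(f"{value}={meaning}" for value, meaning in e.codes.items())
        writer.writerow([e.name, e.label, e.meaning, e.measured_by, e.units, codes])
    return buf.getvalue()


# Hypothetical variables for an interview study (illustration only).
entries = [
    CodebookEntry("active_min", "Active minutes", "Self-reported minutes of activity per day",
                  "Interview question 4", units="minutes"),
    CodebookEntry("has_gym", "Gym access", "Whether the respondent can use a gym",
                  "Interview question 7", codes={"1": "Yes", "2": "No", "9": "Refused"}),
]
print(codebook_to_csv(entries))
```

A plain CSV codebook travels well with the data file and opens in any spreadsheet program.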
Codebook Examples
Data Lifecycle: Publish, Store, Access, Reuse
File size & format: open vs. proprietary
Security: anonymize or encrypt? Levels may vary by access (within the organization vs. third parties)
Data citation
Sharing: upload data & metadata to an institutional repository, data center, etc., and obtain a persistent identifier
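For the "anonymize or encrypt?" question, one common technique is pseudonymization: replacing direct identifiers with keyed hashes before sharing. This is weaker than full anonymization, since indirect identifiers (age, location, etc.) can still allow re-identification. A minimal sketch using the standard `hmac` module; the function name, the 16-character truncation, and the sample records are assumptions for illustration:

```python
import hashlib
import hmac
import secrets


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex chars.

    The same identifier and key always produce the same token, so records can still be
    linked across files; without the key, tokens cannot feasibly be recomputed or reversed.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


# Keep the key with the project's restricted files, never alongside the shared dataset.
key = secrets.token_bytes(32)

records = [{"participant": "Jane Doe", "score": 5}, {"participant": "John Roe", "score": 3}]
shared = [{**r, "participant": pseudonymize(r["participant"], key)} for r in records]
print(shared)
```

Truncating to 16 hex characters (64 bits) keeps tokens short; collisions are very unlikely at typical study sizes, but the full digest can be kept if that is a concern.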
Institutional Repositories
Dataset Record in IR
Other places data can live…
Figshare, ICPSR, GitHub, DataUp, Dropbox (or other cloud storage), but only if you use proper encryption
Lists of data repositories: DataCite, DataBib
Data Discovery
Data repositories (previous slide): ICPSR, Figshare
Institutional repositories: OpenDOAR (directory), specific institutions
Data catalogs: Numeric Data Catalog (Columbia), GeoData (Columbia, others)
Government & public sources (data producers): NYC OpenData, Data.gov, Census Bureau, Bureau of Labor Statistics, IMLS (Institute of Museum & Library Services)
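Several of these sources can also be searched programmatically. A minimal sketch for Data.gov, assuming its catalog exposes the standard CKAN Action API `package_search` endpoint at the URL below (check the current Data.gov developer documentation before relying on it); the helper name and query are hypothetical:

```python
from urllib.parse import urlencode

# Assumption: catalog.data.gov serves the standard CKAN Action API.
CKAN_SEARCH = "https://catalog.data.gov/api/3/action/package_search"


def datagov_search_url(query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL; urlencode handles spaces and special characters."""
    return f"{CKAN_SEARCH}?{urlencode({'q': query, 'rows': rows})}"


url = datagov_search_url("library circulation", rows=5)
print(url)

# To run the query (requires network access):
# import json, urllib.request
# with urllib.request.urlopen(url) as resp:
#     for dataset in json.load(resp)["result"]["results"]:
#         print(dataset["title"])
```

Only the URL is built unconditionally, so the sketch runs offline; the commented lines show how the JSON response would be read.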
Replicated Data
And Finally… Geeky puns.