Documentation and Metadata - VA DM Bootcamp

Page 1: Documentation and Metadata - VA DM Bootcamp

Documentation and Metadata

Sherry Lake

[Figure: Data Life Cycle diagram. Labels: Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Deposit, Data Archive, Data Discovery, Re-Use, Re-Purpose]

Andrea Denton

Page 2: Documentation and Metadata - VA DM Bootcamp

We’ll Explore

• Why is documenting your research important?

• What do you document: files, datasets, projects? (Hands-on)

• What are the common types of documentation?

• Metadata: What is it? Why is it important? (Hands-on)

• Q & A

Page 3: Documentation and Metadata - VA DM Bootcamp

You’re already documenting your data

• Notebook

– Paper

– Digital

– Lab

• Folders with notes, text files

• Sources, experiments or surveys, procedures, etc.

Page 4: Documentation and Metadata - VA DM Bootcamp

Critical roles of data documentation

• Data Use

– To know enough details about how the data were collected and stored

• Data Discovery

– To be able to identify important data sets

• Data Retrieval

– To know how and where to access data

• Data Archiving

– Data can grow more valuable with time, but only if the critical information required to retrieve and interpret the data remains available

Page 5: Documentation and Metadata - VA DM Bootcamp

Information Entropy

[Figure: information content of data and metadata (vertical axis) plotted against time (horizontal axis), starting at the time of data development. Annotations on the figure:]

• Specific details about problems with individual items or specific dates are lost relatively rapidly

• General details about datasets are lost through time

• Accident or technology change may make data unusable

• Retirement or career change makes access to “mental storage” difficult or unlikely

• Loss of investigator leads to loss of remaining information

From Michener et al. 1997

http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2

Page 6: Documentation and Metadata - VA DM Bootcamp

Elements of Documentation

Good data documentation answers these basic questions:

• Why were the data created?

• What are the data about?

• What is the content of the data? The structure?

• Who created the data?

• Who maintains the data?

Page 7: Documentation and Metadata - VA DM Bootcamp

Elements of Documentation, continued

• How were the data created?

• How were the data produced/analyzed?

• Where were the data collected (geographic location)?

• When were the data collected? When were they published?

• How should the data be cited?
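
The questions on these two slides can be treated as a checklist. The sketch below is one possible way to do that in Python. The field names (`purpose`, `creator`, and so on) and the example record are assumptions made for illustration and do not come from any standard.

```python
# Hypothetical field names, one per documentation question from the slides.
DOCUMENTATION_QUESTIONS = {
    "purpose": "Why were the data created?",
    "subject": "What are the data about?",
    "content_structure": "What is the content of the data? The structure?",
    "creator": "Who created the data?",
    "maintainer": "Who maintains the data?",
    "creation_method": "How were the data created?",
    "processing": "How were the data produced/analyzed?",
    "location": "Where were the data collected (geographic location)?",
    "dates": "When were the data collected? When were they published?",
    "citation": "How should the data be cited?",
}


def unanswered(record):
    """Return the questions whose field is missing or blank in `record`."""
    return [
        question
        for field, question in DOCUMENTATION_QUESTIONS.items()
        if not str(record.get(field, "")).strip()
    ]


# Illustrative, partially filled record (all values are made up).
example = {
    "purpose": "Teaching example for a documentation exercise",
    "creator": "A. Researcher",
    "location": "   ",  # a blank value counts as unanswered
}

for question in unanswered(example):
    print("TODO:", question)
```

Running the checklist at project start and again before deposit shows which questions still lack an answer.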

Page 8: Documentation and Metadata - VA DM Bootcamp

Documentation throughout your research

Variable or Item Level

• Labels, codes, classifications

• Missing values (and how they are represented)

File or Dataset Level

• Inventory of data files

• Relationship between those files

• Records, cases, etc.

Project or Study Level

• What the study set out to do; research questions

• How it contributes new knowledge to the field

• Methodologies used, instruments and measures

UK Data Service: http://ukdataservice.ac.uk/media/440277/documentingdata.pdf/

Page 9: Documentation and Metadata - VA DM Bootcamp

Exercise 1: Exploring Documentation

• Refer to the files on the Data Management Bootcamp site, either

– http://guides.lib.odu.edu/VADMBC/materials

• In the section Documentation and Metadata: Exercise_1_Data_Documentation Worksheet

– Or, you may have a handout “Exercise 1”

Page 10: Documentation and Metadata - VA DM Bootcamp

Exercise 1: Exploring Documentation

• For Column 1, take 2-3 minutes and, for each row, write down which general concept (who, what, when, where, how, or why, or a combination of these) that field describes about data, if applicable.

• Now take 2-3 minutes to complete Column 2. Considering your research data, what information would you provide for each field?

• Don’t have research data? Use the file DailyWeather to fill in Column 2.

Page 11: Documentation and Metadata - VA DM Bootcamp

Exercise 1 continued

• Take 2 minutes

• There is a blank row under each category for any information specific to your field, e.g. latitude and longitude, species, etc.

• Please share an example with the class in the Google doc “Questions: Ask them here”

Page 12: Documentation and Metadata - VA DM Bootcamp

Wrapping up: elements of documentation

• We’ve looked at commonly used fields

• What does your discipline say about what you should document?

• The answers you’ve provided could be used to create a data dictionary

– which we’ll examine next

Page 13: Documentation and Metadata - VA DM Bootcamp

Types of Documentation

• ReadMe File

• Data Dictionary

• Codebook

Page 14: Documentation and Metadata - VA DM Bootcamp

ReadMe

• Describes the core documentation about an investigation and its data files

• Typically a simple text file

• Can describe the individual file(s) and/or data package as a whole
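
As a rough illustration of the points above, a ReadMe can be assembled as plain text from a few pieces of information. This is a minimal sketch only. The function `build_readme`, the file name `weather_example.csv`, and the contact address are hypothetical, and real ReadMe files usually carry more detail.

```python
def build_readme(title, description, files, contact):
    """Return README text: a title, a description, a file inventory, a contact."""
    lines = [title, "=" * len(title), "", description, "", "Files:"]
    # One line per file, so the ReadMe covers the data package as a whole.
    for name, note in files.items():
        lines.append(f"  {name}: {note}")
    lines += ["", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


# All values below are made up for illustration.
readme_text = build_readme(
    title="Example weather dataset",
    description="Daily observations used for a documentation exercise.",
    files={
        "weather_example.csv": "one row per station per day",
        "README.txt": "this file",
    },
    contact="name@example.org",
)
print(readme_text)
```

Keeping the output as plain text means it can be opened without special software.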

Page 15: Documentation and Metadata - VA DM Bootcamp

ReadMe Example - File

Page 16: Documentation and Metadata - VA DM Bootcamp

ReadMe Example - File

Page 17: Documentation and Metadata - VA DM Bootcamp

ReadMe Example - Dataset

Page 18: Documentation and Metadata - VA DM Bootcamp

Data Dictionary

• Provides definitions of the data fields in a data file

• More details on the variables, observations of a file

Page 19: Documentation and Metadata - VA DM Bootcamp

Data Dictionary

• Used to understand the data and the databases that contain it

• Identifies data elements and their attributes, including names, definitions, units of measure, and other information

• Often organized as a table

http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf
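
Because a data dictionary is often organized as a table, it can be kept as a small CSV file next to the data. The sketch below assumes five columns (`name`, `definition`, `units`, `type`, `missing_code`) and three invented variables. Neither the columns nor the variables are taken from the bootcamp materials.

```python
import csv
import io

# Assumed column set for the dictionary table; adjust to your discipline.
COLUMNS = ["name", "definition", "units", "type", "missing_code"]

# Invented entries describing a hypothetical weather table.
entries = [
    {"name": "station_id", "definition": "Identifier of the reporting station",
     "units": "", "type": "text", "missing_code": ""},
    {"name": "obs_date", "definition": "Date of observation",
     "units": "YYYY-MM-DD", "type": "date", "missing_code": ""},
    {"name": "max_temp", "definition": "Daily maximum air temperature",
     "units": "degrees Celsius", "type": "number", "missing_code": "-9999"},
]


def to_csv(rows):
    """Serialize dictionary entries as CSV text, one row per variable."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


dictionary_csv = to_csv(entries)
print(dictionary_csv)
```

One row per variable keeps the definition, the units, and the missing-value code together in one place.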

Page 20: Documentation and Metadata - VA DM Bootcamp

Data Dictionary Example: the dataset

http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf

Page 21: Documentation and Metadata - VA DM Bootcamp

Data Dictionary Example: the dictionary

Page 22: Documentation and Metadata - VA DM Bootcamp

Exercise 2: Data Dictionary

• Refer to the files on the Data Management Bootcamp site, either

– http://guides.lib.odu.edu/VADMBC/materials

• In the section Documentation and Metadata: Exercise_2_DataDictionaryTemplate

– Or, you may have a handout “Exercise 2”

• Open the file DailyWeather

Weather data source: http://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND

Page 23: Documentation and Metadata - VA DM Bootcamp

Exercise 2: Data Dictionary Creation

• Use the Daily Weather dataset

– Two worksheets (tabs)

• Data

• Definitions

• Start by answering the questions

• Fill out a data dictionary for this dataset

Page 24: Documentation and Metadata - VA DM Bootcamp

Exercise 2 Discussion

Page 25: Documentation and Metadata - VA DM Bootcamp

What is a Codebook?

• Typical in social sciences research

• Includes elements similar to readme and dictionary

– Project level information (e.g. survey design and methodology)

– Response codes for each variable

– Codes used to indicate nonresponse and missing data

http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook

Page 26: Documentation and Metadata - VA DM Bootcamp

What is a Codebook?

• Codebooks may also contain:

– A copy of the survey questionnaire (if applicable)

– Exact questions and skip patterns used in a survey

– Frequencies of response

• Quite long!

http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
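
To make the codebook elements concrete, the sketch below shows response codes, missing-data codes, and response frequencies for a single survey variable. The codes (1, 2, 8, 9), their labels, and the raw responses are invented for illustration and are not drawn from any ICPSR codebook.

```python
from collections import Counter

# Hypothetical codes for one yes/no survey question.
RESPONSE_CODES = {1: "Yes", 2: "No"}
# Hypothetical codes marking nonresponse and missing data.
MISSING_CODES = {8: "Don't know", 9: "No response"}


def label(value):
    """Translate a stored code into its documented label."""
    if value in RESPONSE_CODES:
        return RESPONSE_CODES[value]
    return MISSING_CODES.get(value, "Undocumented code")


# Made-up raw responses as they might be stored in a data file.
raw = [1, 2, 2, 9, 1, 8, 2]

# Frequencies of response, as a codebook might report them.
frequencies = Counter(label(value) for value in raw)

# Substantive answers only, with missing-data codes excluded.
valid = [RESPONSE_CODES[value] for value in raw if value in RESPONSE_CODES]

for name, count in frequencies.items():
    print(f"{name}: {count}")
```

Without the documented codes, a reader could not tell that 8 and 9 are not substantive answers.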

Page 27: Documentation and Metadata - VA DM Bootcamp

Codebook Example

http://www.icpsr.umich.edu/icpsrweb/ICPSR/help/cb9721.jsp

Page 28: Documentation and Metadata - VA DM Bootcamp

Codebook Example

http://dataarchives.ss.ucla.edu/archive%20tutorial/aboutcodebooks.html

Page 29: Documentation and Metadata - VA DM Bootcamp

Other Examples of Data Documentation

• Lab notebooks

• Software syntax

• Programming code

• Instrument settings and/or calibration

• Provenance of sources of data

• Embedded metadata (e.g. EXIF, FITS)

Page 30: Documentation and Metadata - VA DM Bootcamp

Metadata

• What is it?

– Information that describes a resource

– NISO: “metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource”

• Why is it important?

– Enables a resource or data to be easily discovered

– Good metadata will help others understand and use your data

Page 31: Documentation and Metadata - VA DM Bootcamp

Metadata in Everyday Life

DataONE Education Module: Metadata. DataONE. Retrieved Nov 12, 2012. From http://www.dataone.org/sites/all/documents/L07_Metadata.pptx

Author(s): Boullosa, Carmen.

Title(s): They're cows, we're pigs / by Carmen Boullosa

Place: New York : Grove Press, 1997.

Physical Descr: viii, 180 p ; 22 cm.

Subject(s): Pirates Caribbean Area Fiction.

Format: Fiction

Page 32: Documentation and Metadata - VA DM Bootcamp

Metadata Formats

• Documentation for understanding & re-use

– Readme File

– Data Dictionary

– Codebook

• Structured documentation in XML format for use in programs (a few examples)

– DDI

– FGDC

– EML
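
Structured XML metadata can be read by programs as well as by people. The sketch below builds a small record with Python's standard `xml.etree.ElementTree` module and reads it back. The element names (`codeBook`, `stdyDscr`, `titl`, `dataDscr`, `var`, `labl`) are loosely modeled on DDI Codebook and are an assumption here. The output is not validated against any schema, and the title and variable are invented.

```python
import xml.etree.ElementTree as ET


def build_metadata(title, variables):
    """Build a simplified, DDI-like XML tree (illustrative, not schema-valid)."""
    root = ET.Element("codeBook")
    # Study-level description: here only a title.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title
    # Variable-level description: one element per variable, with a label.
    data = ET.SubElement(root, "dataDscr")
    for name, label_text in variables.items():
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label_text
    return root


# Made-up study title and variable.
root = build_metadata(
    "Daily weather observations (example)",
    {"max_temp": "Daily maximum air temperature"},
)
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# A program can read the same file back and pull out variable labels.
parsed = ET.fromstring(xml_text)
labels = {var.get("name"): var.findtext("labl") for var in parsed.iter("var")}
print(labels)
```

The same information could sit in a data dictionary table. The XML form mainly makes it easier for repositories and tools to process.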

Page 33: Documentation and Metadata - VA DM Bootcamp

Exercise 3: XML File Creation

• Refer to the files on the Data Management Bootcamp site, either

– http://guides.lib.odu.edu/VADMBC/materials

• In the section Documentation and Metadata: Exercise_3_Weather-DDI-XML-FillinBlanks

– Or, you may have a handout “Exercise 3”

Page 34: Documentation and Metadata - VA DM Bootcamp

Exercise 3: XML File Creation

• Take the file Weather-DDI-XML and fill in the blanks (as best you can) using:

• the file DailyWeather

• and/or Exercise 2 Data Dictionary

Page 35: Documentation and Metadata - VA DM Bootcamp

Exercise 3 Discussion

Page 36: Documentation and Metadata - VA DM Bootcamp

Exercise 3 Discussion

Page 37: Documentation and Metadata - VA DM Bootcamp

Exercise 3 Discussion

Page 38: Documentation and Metadata - VA DM Bootcamp

Structured XML

A Few Standard Schemes (XML)

– DDI: Data Documentation Initiative, http://www.ddialliance.org/

– FGDC: Geospatial Metadata Standard, http://www.fgdc.gov/metadata/geospatial-metadata-standards

– EML: Ecological Metadata Language, http://knb.ecoinformatics.org/software/eml/

Page 39: Documentation and Metadata - VA DM Bootcamp

FGDC Example

Page 40: Documentation and Metadata - VA DM Bootcamp

Structured Metadata Tools

Tools

– Colectica add-on for Excel (DDI)

– Nesstar (DDI)

– Metavist (FGDC)

– ArcGIS (FGDC) *

– Morpho (EML)

http://data.library.virginia.edu/data-management/plan/metadata/metadata-workshop/

Page 41: Documentation and Metadata - VA DM Bootcamp

Example 1: Nesstar DDI Tool

Page 42: Documentation and Metadata - VA DM Bootcamp

Example 2: Metavist FGDC Tool

Page 43: Documentation and Metadata - VA DM Bootcamp

Metadata Standards

Metadata Concept Map by Amanda Tarbet is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Page 44: Documentation and Metadata - VA DM Bootcamp

Metadata Wrap-up

How to choose a metadata standard or documentation format?

• What does your discipline use?

• Look at what the repository where you will deposit requires

Page 45: Documentation and Metadata - VA DM Bootcamp

[Figure: Research Life Cycle and Data Life Cycle diagram. Labels: Proposal Planning Writing, Project Start Up, Data Collection, Data Analysis, Data Sharing, End of Project, Deposit, Data Archive, Data Discovery, Re-Use, Re-Purpose]

Page 46: Documentation and Metadata - VA DM Bootcamp

QUESTIONS?