23
Jane Fry & Chantal Ripp CASRAI ReConnect14 November 19, 2014 An Environmental Scan of Best Practices and DDI

RDC Jane Fry, Chantal Ripp - Data Interoperability I

  • Upload
    casrai

  • View
    96

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Jane Fry & Chantal RippCASRAI ReConnect14

November 19, 2014

An Environmental Scan of Best Practices and DDI

Page 2: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Agenda

BackgroundDLI<odesi>

Our partnershipDDIBest PracticesThe current situationSuggestions for the future

2

Page 3: RDC Jane Fry, Chantal Ripp - Data Interoperability I

DLI backgroundIn 1996, the Data Liberation Initiative (DLI) became a 5 year pilot project

Today 78 academic institutions from across Canada are partners in the DLI

Provides access to all Public Use Microdata files (PUMFs) and to a special collection of aggregate data tables under the DLI License Agreement

3

Page 4: RDC Jane Fry, Chantal Ripp - Data Interoperability I

DLI background (cont’d)

Services include:dedicated research assistance;support on how to use the Microdata files; andtools to improve student and faculty access to the data

In response to demands from our partners, the program adopted the DDI standard for its collection of PUMF files

In March 2004 acquired the NESSTAR system

4

Page 5: RDC Jane Fry, Chantal Ripp - Data Interoperability I

<odesi> background

Ontario Data Documentation, Extraction Service and InfrastructureEstablished in 2007

In response to the need for greater data access in Ontario universities

Digital repository for social science dataIncludes both microdata and aggregate data files

Part of OCUL (Ontario Council of University Libraries)Membership includes 21 university librariesWill also provide access to other libraries for a fee

5

Page 6: RDC Jane Fry, Chantal Ripp - Data Interoperability I

<odesi> background (cont’d)

Nesstar interface usedDDI standard used

Data Documentation InitiativeMetadata specification for the social sciences

Services include:Major access point for data in Ontario universities

No need of knowledge of a statistical packageAccess to all associated documentation for the different surveys

Why <odesi>Role of libraries

6

Page 7: RDC Jane Fry, Chantal Ripp - Data Interoperability I

DLI and <odesi>: the partnershipHow it came about and how it grew

Didn’t want to re-invent the wheel

Initial collaboration was between <odesi> and StatCanCarleton, Queen’s, Guelph and University of Ottawa also involved

The partnership has always existed informally

7

Page 8: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Brief Background of DDI What is DDI?

Data Documentation Initiativehttp://www.ddi-alliance.org/

An international specificationNot yet a formal ISO standardGoal

Formats documentation for a social science data fileMore useful than a word or text file

Supports the entire research data lifecycle

8

Page 9: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Brief DDI Background (cont’d)

Facilitates the creation of metadataExpressed in XML

XML SchemaA way of tagging text for meaning, not appearance

DefinesWhich tags are availableThe order the tags will appear in a documentWhether the tags are required or optionalWhether the tags are repeatable or not

9

Page 10: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Brief DDI Background (cont’d)

Creates a standard formatUsed to mark up codebooksMeaningful and consistentMetadata is both human and machine readable

Gives codebook level details such asdataset contents, variable labels, summary Statistics and frequenciesAlso question text for each variable

*as long as it was entered when marking up the document

10

Page 11: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Example of DDI Compliant Codebook<?xml version='1.0' encoding='UTF-8'?>

<codeBook version="1.2.2" ID="ctums_82M0020XCB_E_2004_ann_person-file">

<docDscr>

<citation>

<titlStmt>

<titl>

Canadian Tobacco Use Monitoring Survey, 2004: Annual, Person File

</titl>

<subTitl>

Annual, Person File

</subTitl>

<altTitl>

CTUMS 2004: Annual, Person File

</altTitl>

<parTitl>

Enquete de surveillance de l'usage du tabac au Canada, 2004: Annuel, Personne

</parTitl>

<IDNo>

ctums_82M0020XCB_E_2004_ann_person-file

</IDNo>

</titlStmt>

<rspStmt>

<AuthEnty affiliation="Queen's University. Social Science Data Centre">

Cooper, Alex

</AuthEnty>

</rspStmt>

11

Page 12: RDC Jane Fry, Chantal Ripp - Data Interoperability I

DDI at DLILabour intensive

Marking-up

Staff turnoverDDI and Nesstar not always intuitive

Proper training needed

Need experts in the field to contact

Development of network of DDI experts

12

Page 13: RDC Jane Fry, Chantal Ripp - Data Interoperability I

DDI in AcademiaDifferent universities were doing mark-up at their institutions

Did not really know what was happening elsewhereResources

Supervisor/trainerMoney to hire students

Many of the same issues as DLILabour intensiveStaff turnoverWho to turn to for help

Bilingual issues

13

Page 14: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Best practicesHistorical Essential

When exchanging filesMarkup is done in geographically diverse locations

Consistency needed

Training documentsBest Practices Document: based on DDI 2.x, published in 2008How to do Mark up

Another reasonSharing

14

Page 15: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Best practices (cont’d)

Quality ControlMark-up is done at different universitiesHow to ensure that BPD is being followedWho does Quality Control (QC)Needed to come up with a list of files being worked on, by whom, …Scholars Portal maintains the list

We edit it as necessaryMain pages

QC Activity - microdata files (one for English, one for French)New MarkUp activity (one for English, one for French)QC Activity – Aggregate data files

15

Page 16: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Best practices (cont’d)

An example: 1.1.1.5 <IDNo> Identification Number Description: Unique string or number (producer's or archive's number) for the marked-up document. Note 1: This ID number is the same for the document description and the study description, ie. 1.1.1.5 is the same as

2.1.1.5. Note 2: For Statistics Canada surveys, the catalogue number refers to the microdata file. Formatting Note 1: Language: E = English, F = French

Year: yyyy or yyyy-mm-dd Lowercase: should be used for everything except the catalogue number and the language abbreviation Surveys with cycle numbers and sub-numbers: use a dash between the numbers and not a period, e.g.. for cycle 2.1 the format

would be c2-1 ICPSR surveys: use their ID# and add the short form if there is a subset, e.g.. icpsr9721im

Formatting Note 2: This is the format to be used: acronym_CatalogueNumber_language_year_subset

16

Page 17: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Best practices (cont’d)

Example 1: <titl>Sun Exposure Survey, 1996 [Canada]</titl>

<IDNo>ses_82M0019_E_1996</IDNo> Example 2: <titl>General Social Survey, 2005 [Canada]: Cycle 19, Time Use, Main File</titl>

<IDNo>gss_12M0019_E_2005_c19_main-file</IDNo> Example 3: <titl>Canadian Tobacco Use Monitoring Survey, 2004: Cycle 1, Household File</titl>

<IDNo>ctums_82M0020_E_2004_c1_household-file</IDNo> Example 4:<titl>Canadian Tobacco Use Monitoring Survey, 2004: Cycle 1, Household File</titl>

<IDNo>cchs_82M0013_E_2005_c3-1_health-sample</IDNo> Example 5: <titl>Canadian Gallup Poll, May 1949, #186</titl>

<IDNo>cipo_186_E_1949-05</IDNo> Example 6: <titl>Descriptors and Measurements of the Height of Runaway Slaves and Indentured Servants in the

United States: Runaway Indentured Servant Height Measurements, 1700-1850</titl>

<IDNo>9721im</IDNo>

17

Page 18: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Currently …Formal review of BDP

Ad hoc updatese.g. old ID Number: CatalogueNumber_language_year_subsete.g. new ID Number: CatalogueNumber-language-year-subset

Creating and sharing of new best practicesAggregate data markupCreating Variable GroupsCreation of training videos

MeetingsCarleton and Queen’s Scholar’s Portal MarkUp teamRecent meeting of partners (DLI and <odesi>)

18

Page 19: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Suggestions for the futureContinue with the collaboration

Distributing tagging responsibilitiesSet up a working group

Working Group on File EnhancementExpand collaboration with internal and external stakeholders

InternalNationalInternational

QuestionWho will be the overseer?Is this a role for the libraries?

19

Page 20: RDC Jane Fry, Chantal Ripp - Data Interoperability I

In sumWe learned

The background of DLI and <odesi>How our partnership has worked over the yearsOur background with DDIThe current and evolving situation with Best PracticesSuggestions for the future

DLI and <odesi>A valuable partnership for sustainability

20

Page 21: RDC Jane Fry, Chantal Ripp - Data Interoperability I

References• Best Practices Documents:

• <odesi> : http://www.library.carleton.ca/help/nesstar-how-mark-nesstar• Statistics Canada - for a copy, contact: [email protected]• Carleton Data Centre You Tube: https://www.youtube.com/user/MacOdrumLibrary/playlists• Marking-up in Nesstar videos: http://www.library.carleton.ca/help/nesstar-how-mark-nesstar

• Presentations:• Edwards, M. “DDI in Ontario: Putting the Pieces Together”. Presented at IASSIST, May 2007. http://cudo.carleton.ca/dli-

training/2400• Edwards, M., Fry, J., Bourgeois, MJ. & Wong, I. “DDI Tags”. Presented at DINO, December 2005.

http://cudo.carleton.ca/dli-training/2290• Edwards, M., Fry, J. & Cooper, A. “Best Practices Documents: Are They Really Necessary?”. Presented at IASSIST, May 2008.

http://cudo.carleton.ca/dli-training/2392• Fry, J. “Discover the power of DDI metadata”. Workshop at NADDI April 2014. http://summit.sfu.ca/item/13941• Humphrey, C. “Research Data Management Infrastructure: What each Library can be doing”. Presented at ACCOLEDS,

November 2012. http://cudo.carleton.ca/dli-training/2741• Watkins, W. & Bourgeois, MJ. “DDI and the DLI: A Short and Long-term Strategy for the DLI Community”. Presented at Ontario

DLI Training, April 2005. http://cudo.carleton.ca/dli-training/2116

21

Page 22: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Contact Information

Jane Fry Chantal Ripp

Data SpecialistMacOdrum LibraryCarleton University613.520.2600 [email protected]

Reference Services CoordinatorData Liberation Initiative (DLI)Statistics [email protected]

22

Page 23: RDC Jane Fry, Chantal Ripp - Data Interoperability I

Questions?

23