Looking into the future…
Providing Social Science Data Services
Jim Jacobs


First principles

Metadata are data about data: information about information. It's all about having complete, accurate, re-usable metadata; software to process the metadata is secondary. We should be able to create metadata today that we know will remain usable in computing environments (operating systems, software, hardware) that we cannot yet foresee.


First principles

Metadata should be:
  • Comprehensive
  • Complete
  • Uncompromised
  • Consistent
  • Flexible
  • Sharable
  • Usable and re-usable
  • Preservable
  • Parseable by computer
  • Documented
  • Non-proprietary


How XML fits in…

XML is designed to make it easy to find and use just the elements you need from a large document.

“Cherry picking”


How XML fits in…

XML is designed to be parseable with generic tools.

XML can encode meaning and can be self-documenting.

XML is non-proprietary, open, and flexible.
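To show what "parseable with generic tools" means in practice, here is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. The walker contains no DDI-specific logic, yet it recovers every element, value, and attribute. The sample record and the `walk` helper are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; the walker below knows nothing about DDI.
SAMPLE = """<stdyDscr>
  <citation>
    <titlStmt><titl>Example Study</titl><IDNo>0001</IDNo></titlStmt>
    <distStmt><distDate date="2000-01-01">2000-01-01</distDate></distStmt>
  </citation>
</stdyDscr>"""


def walk(elem, prefix=""):
    """Yield (path, text, attributes) for an element and all of its descendants."""
    path = f"{prefix}/{elem.tag}"
    yield path, (elem.text or "").strip(), dict(elem.attrib)
    for child in elem:
        yield from walk(child, path)


for path, text, attrs in walk(ET.fromstring(SAMPLE)):
    print(path, repr(text), attrs)
```

The element names (`titl`, `IDNo`, `distDate`) are self-describing, so even this schema-blind output can be read by a person.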


How XML fits in…

<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
    <prodStmt>
      <fundAg>National Science Foundation.</fundAg>
      <grantNo>SES86-10567</grantNo>
    </prodStmt>
    <distStmt>
      <distrbtr abbr="ICPSR" affiliation="Institute for Social Research, University of Michigan" URI="http://www.icpsr.umich.edu">Inter-university Consortium for Political and Social Research</distrbtr>
      <distDate date="1994-05-20">1994-05-20</distDate>
    </distStmt>
    <serStmt> </serStmt>
    <verStmt>
      <dateAdded>1994-05-20</dateAdded>
      <dateUpdated>1994-05-20</dateUpdated>
    </verStmt>
    <biblCit>Levy, Jack S. GREAT POWER WARS, 1495-1815 [Computer file]. New Brunswick, NJ and Houston, TX: Jack S. Levy and T. Clifton Morgan …

<titl>Great Power Wars, 1495-1815</titl>

You can cherry-pick just what you need from a large XML document…
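Here is the same cherry-picking expressed in code. This is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It pulls out the `titl` element, or any other element, by path without touching the rest of the record. The fragment is a trimmed, well-formed copy of the study description above, and the `cherry_pick` helper name is hypothetical. The sketch also assumes no XML namespace. Real DDI files usually declare one, and in that case a namespace map would have to be passed to `findall`.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of part of the study description above.
# Assumption: no XML namespace is declared.
DDI_FRAGMENT = """<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
    <distStmt>
      <distrbtr abbr="ICPSR" URI="http://www.icpsr.umich.edu">Inter-university Consortium for Political and Social Research</distrbtr>
      <distDate date="1994-05-20">1994-05-20</distDate>
    </distStmt>
  </citation>
</stdyDscr>"""


def cherry_pick(xml_text, path):
    """Return the text of every element matching an ElementPath expression."""
    root = ET.fromstring(xml_text)
    return [el.text or "" for el in root.findall(path)]


print(cherry_pick(DDI_FRAGMENT, ".//titl"))  # ['Great Power Wars, 1495-1815']
```

Attributes can be picked the same way. For example, `ET.fromstring(DDI_FRAGMENT).find(".//distrbtr").get("abbr")` returns the distributor abbreviation.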


From legacies to the future

SAS, SPSS, OSIRIS, PDF, paper, data dictionaries, etc.
→ DDI →
HTML, PDF, any stat package, Nesstar, SDA, Dataverse, library OPAC, Google, OAI, METS, etc., RSS, RDF, GIS, DDI 3, 4…
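The arrow from one DDI record to many outputs can be sketched as one parse followed by several renderers. This sketch assumes DDI Codebook-style variable descriptions (`<var name="…">` with a `<labl>` child). The variable names and labels are invented, and HTML and Stata `label variable` commands stand in for the many possible targets.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical variable-level metadata; names and labels are invented.
DATA_DSCR = """<dataDscr>
  <var name="startyr"><labl>Year war began</labl></var>
  <var name="endyr"><labl>Year war ended</labl></var>
  <var name="powers"><labl>Number of great powers involved</labl></var>
</dataDscr>"""


def read_variables(xml_text):
    """Return (name, label) pairs from the <var> elements."""
    root = ET.fromstring(xml_text)
    return [(v.get("name"), v.findtext("labl", default="")) for v in root.iter("var")]


def to_html(variables):
    """Render the variables as an HTML definition list for the web."""
    items = "".join(
        f"<dt>{html.escape(name)}</dt><dd>{html.escape(label)}</dd>"
        for name, label in variables
    )
    return f"<dl>{items}</dl>"


def to_stata(variables):
    """Render the same metadata as Stata `label variable` commands.

    Assumes labels contain no double quotes.
    """
    return "\n".join(f'label variable {name} "{label}"' for name, label in variables)


variables = read_variables(DATA_DSCR)
print(to_html(variables))
print(to_stata(variables))
```

Adding another target, such as SPSS syntax or an RDF serialization, means writing one more renderer. The metadata itself is entered only once.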


From many contributors to many uses

Contributors: researcher, data collector, analyst, data producer, distributor, data archivist, data librarian, users of statistics, government agency
→ DDI →
Uses: the web, live documents, databases, publications, data archives, data libraries, institutional repositories, secondary analysis, new research, new knowledge


OAIS Functional Model

Ingest → Archival Storage → Access


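Information Packages (OAIS Information Model)

Several Submission Information Packages (SIPs) → Archival Information Package (AIP) → several Dissemination Information Packages (DIPs)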
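As a rough illustration of how the three package types relate, here is a toy sketch. It is not an OAIS implementation, and the class and function names are hypothetical. Ingest turns a SIP into an AIP by adding preservation information, here an assumed SHA-256 fixity value and a provenance note. Access checks fixity and packages only what a consumer asks for as a DIP.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SIP:
    """Submission Information Package: what a producer hands to the archive."""
    content: bytes
    metadata: dict


@dataclass
class AIP:
    """Archival Information Package: content plus preservation information."""
    content: bytes
    metadata: dict
    fixity_sha256: str
    provenance: list = field(default_factory=list)


@dataclass
class DIP:
    """Dissemination Information Package: what a consumer receives."""
    content: bytes
    metadata: dict


def ingest(sip, producer):
    """Ingest: record a checksum and provenance, then store as an AIP."""
    digest = hashlib.sha256(sip.content).hexdigest()
    return AIP(sip.content, dict(sip.metadata), digest, [f"submitted by {producer}"])


def access(aip, fields):
    """Access: verify fixity, then package only the requested metadata fields."""
    if hashlib.sha256(aip.content).hexdigest() != aip.fixity_sha256:
        raise ValueError("fixity check failed")
    return DIP(aip.content, {k: aip.metadata[k] for k in fields if k in aip.metadata})


aip = ingest(SIP(b"id,age\n1,34\n", {"title": "Example Study", "idno": "0001"}), "example producer")
dip = access(aip, ["title"])
print(dip.metadata)  # {'title': 'Example Study'}
```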


Data stewardship life cycle

Data Production → Data Repository → Data Dissemination → Data Discovery → Data Repurposing → Data Production


DDI Production

Data Production → Data Repository → Data Dissemination → Data Discovery → Data Repurposing → Data Production


DDI Use

Data Production → Data Repository → Data Dissemination → Data Discovery → Data Repurposing → Data Production


DDI will enable transformation:
  • New kinds of data discovery (beyond “indexing”)
  • Metadata as a primary resource (metadata as data)


Metadata for data discovery

ICPSR uses DDI metadata to create its Variables database.

Nesstar and Dataverse software use metadata to produce searchable indexes of data repositories.
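The systems named above are not reproduced here. As a generic illustration of indexing variable-level metadata, this is a minimal inverted-index sketch. It maps each word in a variable label to the study and variable that use it. The study IDs, variable names, and labels are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical variable-level records: (study id, variable name, label).
VARIABLES = [
    ("S1", "v1", "Respondent age"),
    ("S1", "v2", "Highest level of education"),
    ("S2", "q7", "Age at first marriage"),
    ("S2", "q9", "Education of father"),
]


def tokenize(text):
    """Lower-case word tokens; a production index would also stem and drop stop words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each label word to the set of (study, variable) pairs that use it."""
    index = defaultdict(set)
    for study, name, label in records:
        for word in tokenize(label):
            index[word].add((study, name))
    return index


def search(index, query):
    """Return variables whose labels contain every word in the query."""
    hits = [index.get(word, set()) for word in tokenize(query)]
    return set.intersection(*hits) if hits else set()


index = build_index(VARIABLES)
print(sorted(search(index, "age")))        # [('S1', 'v1'), ('S2', 'q7')]
print(sorted(search(index, "education")))  # [('S1', 'v2'), ('S2', 'q9')]
```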


Metadata for data discovery

Harvesting DDI from many repositories to create indexes across collections should become common (oclc.org/oaister/).
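Cross-collection harvesting of this kind commonly uses OAI-PMH. The sketch below builds `ListRecords` requests and reads one response page, following the resumption-token paging that OAI-PMH 2.0 defines. It makes no network call. Three things are assumptions here: the repository URL, the `ddi` metadata prefix (prefixes vary by repository), and the canned response, which is heavily trimmed.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Trimmed, hypothetical response page (real pages also carry responseDate,
# request, and each record's metadata).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:study-1</identifier></header></record>
    <record><header><identifier>oai:example.org:study-2</identifier></header></record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix=None, token=None):
    """Build a ListRecords request; a resumptionToken replaces the other arguments."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_page(xml_text):
    """Return (record identifiers, resumption token or None) from one response page."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", namespaces=OAI_NS)
    return ids, (token or None)  # an empty token marks the last page


print(list_records_url("https://repo.example.org/oai", "ddi"))
print(parse_page(SAMPLE_PAGE))
```

A full harvester would fetch each URL with `urllib.request.urlopen` and pass the body to `parse_page`. It would then request the next page with the returned token until the token is `None`.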

Data discovery by concept, methodology, geography, and time period, not just by keyword, can become the new norm.
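Here is a minimal sketch of what discovery "beyond keyword" could look like once those dimensions exist as structured metadata. Each study record carries concepts, a method, a geography, and a coverage period, and a query can filter on any combination of them. The `Study` fields and the sample records are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical discovery record built from study-level metadata."""
    title: str
    concepts: frozenset
    method: str
    geography: str
    start_year: int
    end_year: int


def find(studies, keyword=None, concept=None, method=None, geography=None, years=None):
    """Filter studies by keyword and by concept, methodology, geography, and time period."""
    results = []
    for s in studies:
        if keyword and keyword.lower() not in s.title.lower():
            continue
        if concept and concept not in s.concepts:
            continue
        if method and method != s.method:
            continue
        if geography and geography != s.geography:
            continue
        # Keep only studies whose coverage period overlaps the requested (start, end).
        if years and not (s.start_year <= years[1] and years[0] <= s.end_year):
            continue
        results.append(s)
    return results


STUDIES = [
    Study("Example Panel A", frozenset({"education", "income"}), "panel survey", "Denmark", 1990, 2000),
    Study("Example Census Extract B", frozenset({"ethnicity"}), "census", "United States", 1850, 1900),
]

print([s.title for s in find(STUDIES, concept="ethnicity", years=(1880, 1890))])
```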


Metadata as data

When we use DDI 3, we are creating digital information that is structured according to processes and functions:

DDI 3: Study Concept, Collection, Processing, Distribution, Archiving, Discovery, Analysis, Repurposing.


Metadata as data

When we use DDI 3, we are creating digital information that is structured according to processes and functions.

By doing this, we are creating data! We can treat “metadata” as data.

Researchers will be able to analyze metadata the way they would analyze any data file.
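Treating metadata as a data file can be as simple as flattening it into rows. This is a minimal sketch, assuming DDI Codebook-style `<var>` elements with `<labl>` and `<catgry>` children. It writes one CSV row per variable, which any statistics package can then load. The variables and categories are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical variable descriptions with category codes.
CODEBOOK_XML = """<dataDscr>
  <var name="sex">
    <labl>Sex of respondent</labl>
    <catgry><catValu>1</catValu><labl>Male</labl></catgry>
    <catgry><catValu>2</catValu><labl>Female</labl></catgry>
  </var>
  <var name="region">
    <labl>Region</labl>
    <catgry><catValu>1</catValu><labl>North</labl></catgry>
    <catgry><catValu>2</catValu><labl>South</labl></catgry>
    <catgry><catValu>3</catValu><labl>West</labl></catgry>
  </var>
</dataDscr>"""


def metadata_rows(xml_text):
    """Flatten each <var> into one row: name, label, and number of categories."""
    rows = []
    for var in ET.fromstring(xml_text).iter("var"):
        rows.append({
            "name": var.get("name"),
            "label": var.findtext("labl", default=""),  # direct child only
            "n_categories": len(var.findall("catgry")),
        })
    return rows


def to_csv(rows):
    """Write the rows as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "label", "n_categories"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(metadata_rows(CODEBOOK_XML)))
```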


Metadata as data

As we create and preserve more metadata of this kind, we are creating a new body of knowledge.

We are accumulating a body of information that makes it possible to study trends across time and geography.


Metadata as data: an example

The technical documentation for the Army's Korean conflict casualty electronic records file has casualty codes that were never used in the data files.

The presence in the metadata of codes for injury by lethal gas and by radiation exposure suggests that the Army personnel who designed this record-keeping system anticipated that gas and radiation might be used as weapons. Examining the data alone would have missed this clue.
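The finding above comes from comparing what the codebook defines with what the data actually contain. Here is a minimal sketch of that comparison as a set difference. The code letters, the generic labels, and the recorded values are all hypothetical. Only the two injury labels echo the example above.

```python
# Hypothetical codebook (code -> label); code letters are invented for illustration.
CODEBOOK = {
    "A": "Category A",
    "B": "Category B",
    "G": "Injury by lethal gas",
    "R": "Injury by radiation exposure",
}

# Hypothetical data column holding the code recorded for each case.
RECORDED = ["A", "B", "A", "A", "B"]


def unused_codes(codebook, values):
    """Codes documented in the metadata that never occur in the data."""
    used = set(values)
    return {code: label for code, label in codebook.items() if code not in used}


print(unused_codes(CODEBOOK, RECORDED))
# {'G': 'Injury by lethal gas', 'R': 'Injury by radiation exposure'}
```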


Metadata as data: an example

The codes for 'place of casualty' included, in addition to South Korea Sector and North Korea Sector, the Indo-China Sector, Tibet Sector, Mongolia Sector, Honan Sector (sic), Manchuria Sector, North Japan Sector, South Japan Sector, South China Sector, and Formosa Sector.


Metadata as data: another example

A researcher at the Danish Data Archive is doing a qualitative analysis of the questionnaires used in seven surveys about ethnic minorities in Danish society, "with the purpose of showing how surveys ... mirror and project societal understandings of the subjects under investigation."


Metadata as data: yet another example

Wendy Thomas of the Minnesota Population Center examined U.S. Census metadata from 1790 through 2000 and traced how the concepts of race and ethnicity changed over time, as embodied in the categories used in Census Bureau questions. Those concepts are documented only in the metadata, not in the Census data files themselves.
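The core of such a comparison can be sketched as set differences between the category lists of consecutive collection rounds. The rounds and category names below are placeholders invented for illustration. They are not actual Census categories, and the sketch makes no claim about how the study cited above was carried out.

```python
# Hypothetical category lists per collection round; NOT actual Census categories.
ROUNDS = {
    1: ["Category A", "Category B"],
    2: ["Category A", "Category B", "Category C"],
    3: ["Category A", "Category C", "Category D"],
}


def category_changes(rounds):
    """For each consecutive pair of rounds, report categories added and dropped."""
    keys = sorted(rounds)
    changes = []
    for prev, curr in zip(keys, keys[1:]):
        before, after = set(rounds[prev]), set(rounds[curr])
        changes.append((curr, sorted(after - before), sorted(before - after)))
    return changes


for round_no, added, dropped in category_changes(ROUNDS):
    print(round_no, "added:", added, "dropped:", dropped)
```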


Census Question Coverage

[Table: Census question coverage by census year, 1790 to 1990, for Race, Slave Status, Nativity, Parent POB (parents' place of birth), Ancestry/Ethnicity, Citizenship, Language, and Previous Residence.]


Patterns of Census reporting of “race”

[Table: race categories in Census reporting by census year: White, Colored, Black, Mulatto, American Indian, Eskimo, Aleut, Chinese, Japanese, Filipino, Asian Indian, Hawaiian, Part Hawaiian, Samoan, Korean, Guamanian, Vietnamese, Other Asian / Pacific Islander.]


Metadata as data: yet another example

Politics in Race / Ethnicity / Ancestry: Mining the Metadata for Answers

Wendy L. Thomas, Minnesota Population Center. Presented at APDU 2004, 18 October 2004.