Looking into the future…
Providing Social Science Data Services
Jim Jacobs
First principles
Metadata are data about data -- information about information.
It’s all about having complete, accurate, re-usable metadata.
Software to process the metadata is secondary.
We should be able to have metadata today that we know will be usable in unforeseeable computing environments (operating systems, software, hardware).
First principles
Metadata should be…
Comprehensive
Complete
Uncompromised
Consistent
Flexible
Sharable
Usable and re-usable
Preservable
Parseable by computer
Documented
Non-proprietary
How XML fits in…
XML is designed to make it easy to find and use just the elements you need from a large document.
“Cherry picking”
How XML fits in…
XML is designed to be parseable with generic tools.
XML can encode meaning and can be self-documenting
XML is non-proprietary, open, flexible.
How XML fits in…
<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
    <prodStmt>
      <fundAg>National Science Foundation.</fundAg>
      <grantNo>SES86-10567</grantNo>
    </prodStmt>
    <distStmt>
      <distrbtr abbr="ICPSR" affiliation="Institute for Social Research, University of Michigan" URI="http://www.icpsr.umich.edu">Inter-university Consortium for Political and Social Research</distrbtr>
      <distDate date="1994-05-20">1994-05-20</distDate>
    </distStmt>
    <serStmt> </serStmt>
    <verStmt>
      <dateAdded>1994-05-20</dateAdded>
      <dateUpdated>1994-05-20</dateUpdated>
    </verStmt>
    <biblCit>Levy, Jack S. GREAT POWER WARS, 1495-1815 [Computer file]. New Brunswick, NJ and Houston, TX: Jack S. Levy and T. Clifton Morgan …
<titl>Great Power Wars, 1495-1815</titl>
You can cherry-pick just what you need from a large XML document…
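As a rough sketch of what cherry-picking can look like in practice, the Python below uses the standard library's xml.etree.ElementTree to pull only the title, ID number, and author out of a trimmed, well-formed copy of the citation fragment. The trimmed document and the variable names are assumptions made for illustration; a real DDI codebook is much larger and may declare an XML namespace, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the slide's fragment (illustrative only).
DDI_FRAGMENT = """
<stdyDscr>
  <citation>
    <titlStmt>
      <titl>Great Power Wars, 1495-1815</titl>
      <IDNo>9955</IDNo>
    </titlStmt>
    <rspStmt>
      <AuthEnty>Levy, Jack S.</AuthEnty>
    </rspStmt>
  </citation>
</stdyDscr>
"""

root = ET.fromstring(DDI_FRAGMENT)

# findtext() follows a simple path and returns the text of the first match.
title = root.findtext("citation/titlStmt/titl")
study_id = root.findtext("citation/titlStmt/IDNo")

# ".//" searches at any depth, so the caller need not know the full path.
author = root.findtext(".//AuthEnty")

print(title)
print(study_id, author)
```

Everything else in the document is ignored; only the three requested elements are read.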
From legacies to the future
From (legacies): SAS, SPSS, OSIRIS, PDF, paper, data dictionary, etc.
DDI
To (the future): HTML, PDF, any stat package, Nesstar, SDA, Dataverse, library OPAC, Google, OAI, METS, etc., RSS, RDF, GIS, DDI 3, 4…
From many contributors to many uses
Contributors: researcher, data collector, analyst, data producer, distributor, data archivist, data librarian, users of statistics, government agency
DDI
Uses: the web, live documents, databases, publications, data archives, data libraries, institutional repositories, secondary analysis, new research, new knowledge
OAIS Functional Model
Ingest, Archival Storage, Access
OAIS Information Model
Information Packages: SIP, AIP, DIP
Data stewardship life cycle
Diagram stages: Data Repurposing, Data Production, Data Repository, Data Dissemination, Data Discovery
DDI Production
(same life-cycle diagram)
DDI Use
(same life-cycle diagram)
DDI will enable transformation
New kinds of data discovery (beyond “indexing”)
Metadata as a primary resource (metadata as data)
Metadata for data discovery
ICPSR uses DDI metadata to create its Variables database.
Nesstar and Dataverse software use metadata to produce searchable indexes of data repositories.
Harvesting of DDI from many repositories to create indexes across collections should become common. (oclc.org/oaister/)
Data discovery by concept and methodology and geography and time period, not just keyword, can become the new norm.
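To make the contrast with keyword search concrete, here is a minimal sketch of fielded discovery in Python. It assumes harvested study-level metadata has already been reduced to a few structured fields; the StudyRecord class, the discover function, and the sample catalog are all hypothetical and invented for illustration, not part of DDI or of any tool named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudyRecord:
    """Hypothetical, simplified view of harvested study-level metadata."""
    title: str
    concepts: frozenset
    geography: str
    start_year: int
    end_year: int


def discover(records, concept=None, geography=None, year=None):
    """Return the records that satisfy every criterion that was supplied."""
    hits = []
    for rec in records:
        if concept is not None and concept not in rec.concepts:
            continue
        if geography is not None and rec.geography != geography:
            continue
        if year is not None and not (rec.start_year <= year <= rec.end_year):
            continue
        hits.append(rec)
    return hits


# Invented sample records, for illustration only.
CATALOG = [
    StudyRecord("Study A", frozenset({"war", "alliances"}), "Region X", 1500, 1800),
    StudyRecord("Study B", frozenset({"ethnicity", "census"}), "Region Y", 1790, 2000),
    StudyRecord("Study C", frozenset({"ethnicity", "attitudes"}), "Region Z", 1980, 2005),
]

# Search by concept and time period together, with no keyword matching.
matches = discover(CATALOG, concept="ethnicity", year=1850)
print([rec.title for rec in matches])
```

The point of the design is that each criterion is matched against its own structured field, which is only possible when the metadata records concept, geography, and time coverage separately.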
Metadata as data
When we use DDI 3, we are creating digital information that is structured according to processes and functions:
DDI 3: Study Concept, Collection, Processing, Distribution, Archiving, Discovery, Analysis, Repurposing.
By doing this, we are creating data! We can treat “metadata” as data.
Researchers will analyze metadata the way we would analyze any data file.
Metadata as data
As we create and preserve more metadata of this kind, we are creating a new body of knowledge.
We are accumulating a body of information that makes it possible to study trends across time and geography.
Metadata as data: an example
The technical documentation for the Army's Korean conflict casualty electronic records file has casualty codes that were never used in the data files.
The presence of codes in the metadata for injury by lethal gas and by radiation exposure suggests that Army personnel who designed this record-keeping system expected the possible use of those as weapons. Examination of the data alone would have missed this suggestion.
Metadata as data: an example
The codes for 'place of casualty' included, in addition to South Korea Sector and North Korea Sector, the Indo-China Sector, Tibet Sector, Mongolia Sector, Honan Sector (sic), Manchuria Sector, North Japan Sector, South Japan Sector, South China Sector, and Formosa Sector.
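The kind of comparison behind this example can be sketched in a few lines of Python: list the codes the documentation defines, list the codes that actually occur in the data, and take the difference. The code values, labels, and observed values below are invented for illustration and are not taken from the Army documentation; the function names are likewise hypothetical.

```python
# Hypothetical codebook: code -> label (invented values, illustration only).
CODEBOOK = {
    "01": "Gunshot",
    "02": "Artillery",
    "03": "Lethal gas",
    "04": "Radiation exposure",
}

# Hypothetical code values observed in the data file.
OBSERVED = ["01", "02", "01", "01", "02"]


def unused_codes(codebook, observed_values):
    """Codes defined in the metadata that never occur in the data."""
    return sorted(set(codebook) - set(observed_values))


def undocumented_codes(codebook, observed_values):
    """Codes that occur in the data but are missing from the metadata."""
    return sorted(set(observed_values) - set(codebook))


for code in unused_codes(CODEBOOK, OBSERVED):
    print(code, CODEBOOK[code])
```

The unused codes are visible only because the codebook is available in a machine-readable form; the data file alone contains no trace of them.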
Metadata as data: another example
A researcher at the Danish Data Archive is doing a qualitative analysis of the questionnaires used in seven surveys about ethnic minorities in Danish society, "with the purpose of showing how surveys ... mirror and project societal understandings of the subjects under investigation."
Metadata as data: yet another example
Wendy Thomas of the Minnesota Population Center examined U.S. Census metadata from 1790 through 2000 and compared the changing concept of race and ethnicity as embodied in the categories used by the Census Bureau questions over time. Those concepts are only documented in the metadata, not the Census data files themselves.
Census Question Coverage
[Chart: census years 1790 to 1990, by decade, against question topics: Race, Slave Status, Nativity, Parent POB, Ancestry/Ethnicity, Citizenship, Language, Previous Residence]
Patterns of Census reporting of “race”
White
Colored
Black
Mulatto
American Indian
Eskimo
Aleut
Chinese
Japanese
Filipino
Asian Indian
Hawaiian
Part Hawaiian
Samoan
Korean
Guamanian
Vietnamese
Other Asian / Pacific Islander
Metadata as data: yet another example
Politics in Race / Ethnicity / Ancestry: Mining the Metadata for Answers
Wendy L. Thomas
Minnesota Population Center
Presented at: APDU 2004, 18 October 2004