Looking into the future
Providing Social Science Data Services
Jim Jacobs
First principles
Metadata are data about data: information about information.
It's all about having complete, accurate, re-usable metadata.
Software to process the metadata is secondary. We should be able to have metadata today that we know will be usable in unforeseeable computing environments (operating systems, software, hardware).

First principles
Metadata should be comprehensive, complete, uncompromised, consistent, flexible, sharable, usable and re-usable, preservable, parseable by computer, documented, and non-proprietary.

How XML fits in
XML is designed to make it easy to find and use just the elements you need from a large document.

How XML fits in
XML is designed to be parseable with generic tools.
XML can encode meaning and can be self-documenting.
XML is non-proprietary, open, flexible.
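The points above about generic tools and picking out only the elements you need can be sketched in a few lines of Python, using nothing beyond the standard library's XML parser. This is only an illustration under stated assumptions: the text values come from the Great Power Wars record shown in this presentation, but the element names and nesting are a simplified guess modeled loosely on DDI Codebook conventions, not a schema-valid DDI instance, and the `pick` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed, heavily simplified record. Element names and nesting are
# illustrative guesses, not a schema-valid DDI document.
RECORD = """\
<codeBook>
  <stdyDscr>
    <citation>
      <titlStmt>
        <titl>Great Power Wars, 1495-1815</titl>
        <IDNo>9955</IDNo>
      </titlStmt>
      <rspStmt>
        <AuthEnty>Levy, Jack S.</AuthEnty>
      </rspStmt>
      <distStmt>
        <distrbtr>Inter-university Consortium for Political and Social Research</distrbtr>
        <distDate>1994-05-20</distDate>
      </distStmt>
    </citation>
  </stdyDscr>
</codeBook>
"""


def pick(root, wanted):
    """Return the text of the first element matching each path in `wanted`.

    Hypothetical helper: maps an application's own field names to
    element paths, ignoring everything else in the document.
    """
    return {label: root.findtext(path) for label, path in wanted.items()}


root = ET.fromstring(RECORD)

# A catalog application might only care about three fields,
# however large the full metadata file is.
catalog_fields = pick(root, {
    "title": ".//titl",
    "author": ".//AuthEnty",
    "date": ".//distDate",
})
print(catalog_fields)
```

The application names its own fields (`title`, `author`, `date`) and maps them onto whatever the metadata file contains, so the same rich file can feed many different tools.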
How XML fits in
[XML example; element tags not preserved, values only:]
Great Power Wars, 1495-1815
9955
Levy, Jack S.
National Science Foundation. SES86-10567
Inter-university Consortium for Political and Social Research
1994-05-20 (appears three times)
Levy, Jack S. GREAT POWER WARS, 1495-1815 [Computer file]. New Brunswick, NJ and Houston, TX: Jack S. Levy and T. Clifton Morgan
Great Power Wars, 1495-1815

You can cherry-pick just what you need from a large XML document
This means you can have a very large, very complex, very rich metadata file, pick out just the elements you need for a particular application, and make those elements fit the requirements of the software.

From legacies to the future
Legacies: SAS; SPSS; OSIRIS; PDF; paper; data dictionary; etc.
DDI
The future: HTML; PDF; any stat package; Nesstar, SDA, Dataverse; library OPAC; Google; OAI, METS, etc.; RSS, RDF; GIS; DDI 3, 4

From many contributors to many uses
[Diagram labels, centered on DDI:] researcher; data collector; analyst; data producer, distributor; data archivist; data librarian; users of statistics; government agency; the web; live documents; databases; publications; data archives; data libraries; institutional repositories; secondary analysis; new research; new knowledge

OAIS Functional Model
Ingest, Archival Storage, Access
So the functional model has ingest, storage, and access functions.

OAIS Information Model
Information Packages: SIP (Submission Information Package), AIP (Archival Information Package), DIP (Dissemination Information Package)

Data stewardship life cycle
[Life-cycle diagram; surviving label: Data Discovery]

DDI Production
[Life-cycle diagram; surviving label: Data Discovery]
DDI will most likely be produced at the data production stage and the data repository stage of the life cycle, but contributions to DDI may be made at other stages as well.

DDI Use
[Life-cycle diagram; surviving label: Data Discovery]
But we will make use of DDI at every stage in the life cycle.

DDI will enable transformation
New kinds of data discovery (beyond indexing)
Metadata as a primary resource (metadata as data)

Metadata for data discovery
ICPSR uses DDI metadata to create its Variables database.
Nesstar and Dataverse software use metadata to produce searchable indexes of data repositories.

Metadata for data discovery
Harvesting of DDI from many repositories to create indexes across collections should become common (oclc.org/oaister/).
Data discovery by concept, methodology, geography, and time period, not just keyword, can become the new norm.

Metadata as data
When we use DDI 3, we are creating digital information that is structured according to processes and functions:
DDI 3: Study Concept, Collection, Processing, Distribution, Archiving, Discovery, Analysis, Repurposing.
By doing this, we are creating data! We can treat metadata as data.
Researchers will analyze metadata the way we would analyze any data file.
Metadata as data
As we create and preserve more metadata of this kind, we are creating a new body of knowledge.
We are accumulating a body of information that makes it possible to study trends across time and geography.
Metadata as data: an example
The technical documentation for the Army's Korean conflict casualty electronic records file has casualty codes that were never used in the data files. The presence of codes in the metadata for injury by lethal gas and by radiation exposure suggests that the Army personnel who designed this record-keeping system expected that those weapons might be used. Examination of the data alone would have missed this suggestion.
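The comparison behind this example, codes defined in the documentation versus codes that actually occur in the data, can be sketched as a simple set difference. Everything in the snippet is hypothetical: the codes, labels, and data values are invented placeholders and are not taken from the Army casualty file, and `compare_codes` is an illustrative helper, not part of any DDI tool.

```python
# Hypothetical codebook: codes and labels are invented placeholders,
# NOT the real casualty codes from the Army file.
codebook = {
    "A": "category A",
    "B": "category B",
    "C": "category C",
    "D": "category D",
}

# Hypothetical data values for the variable the codebook describes.
observed = ["A", "B", "A", "A", "B", "X"]


def compare_codes(codebook, observed):
    """Compare documented codes with codes that occur in the data.

    Returns (unused, undocumented):
      unused       -- codes defined in the metadata but never seen in the data
      undocumented -- codes seen in the data but missing from the metadata
    """
    documented = set(codebook)
    seen = set(observed)
    return sorted(documented - seen), sorted(seen - documented)


unused, undocumented = compare_codes(codebook, observed)
print("Defined but never used:", unused)
print("Used but not documented:", undocumented)
```

The first list is the interesting one here: it can only be produced from the metadata, since unused codes leave no trace in the data file itself.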
Metadata as data: an example
The codes for 'place of casualty' included, in addition to South Korea Sector and North Korea Sector, the Indo-China Sector, Tibet Sector, Mongolia Sector, Honan Sector (sic), Manchuria Sector, North Japan Sector, South Japan Sector, South China Sector, and Formosa Sector.

Metadata as data: another example
A researcher at the Danish Data Archive is doing a qualitative analysis of the questionnaires used in seven surveys about ethnic minorities in Danish society, "with the purpose of showing how surveys ... mirror and project societal understandings of the subjects under investigation."

Metadata as data: yet another example
Wendy Thomas of the Minnesota Population Center examined U.S. Census metadata from 1790 through 2000 and compared the changing concept of race and ethnicity as embodied in the categories used by the Census Bureau questions over time. Those concepts are documented only in the metadata, not in the Census data files themselves.

Census Question Coverage
[Chart: census years 1790 through 1990, by decade, against question topics: Race; Slave Status; Nativity; Parent POB; Ancestry/Ethnicity; Citizenship; Language; Previous Residence]

Patterns of Census reporting of race
[Chart categories: White; Colored, Black, Mulatto; American Indian; Eskimo; Aleut; Chinese; Japanese; Filipino; Asian Indian; Hawaiian, Part Hawaiian; Samoan; Korean; Guamanian; Vietnamese; Other Asian / Pacific Islander]

Metadata as data: yet another example
Politics in Race / Ethnicity / Ancestry: Mining the Metadata for Answers
Wendy L. Thomas, Minnesota Population Center
Presented at: APDU 2004, 18 October 2004