Making the Case for Metadata at SRS-NSF
National Science Foundation
Division of Science Resources Statistics
Jeri Mulrow, Geetha Srinivasarao, and John Gawalt
FedCASIC Workshops, BLS
March 17, 2010
www.nsf.gov/statistics/
1984
1,984
1
1 9
1 9 8
1 9 8 4
Today’s Talk
• A bit about SRS
• Historical perspective of data and metadata dissemination
• Metadata users and their metadata needs
• Standardization efforts
• Challenges and future vision
A bit about the Division of Science Resources Statistics (SRS)
• Federal Statistical agency within NSF
• 11 periodic data collections on the U.S. Science and Engineering enterprise
• Data dating back to the 1950s
Historical Perspective of SRS data and metadata dissemination
• 1950s – early 1990s paper only
• Detailed statistical tables with minimum metadata as footnotes
• Publications included: Highlights about the survey, Scope and method of survey, Questionnaire, Cover letters
Example -- 1950s publication
1990s through 2000s
• 1992 – electronic format
• Detailed statistical tables in spreadsheets with minimum metadata as footnotes
• Kept paper, added electronic text: Survey Methodology, Limitations to the data, Definitions, Historical revisions, List of tables
• PDF added: Questionnaire, Cover letters, Instructions
Example -- 1993 PDF
Example – 1991 Electronic spreadsheet
Example – 1991 text
Today
• Source data tables in Excel with footnotes
• HTML / PDF: Highlights of the survey, Links to references, Survey description
• PDF: Survey Questionnaire, Instructions, Definitions
Example – 2007 Excel spreadsheet
Example -- 2007 SIRD1
Example – 2007 HTML
Example – 2007 PDF
BUT THAT’S NOT ALL
• Electronic databases: Create and download your own customized aggregate tables
• Public use files: Access to some microdata series
Metadata in WebCASPAR ….
Metadata in WebCASPAR
• Variable specific metadata available under Info link
• Metadata not tightly integrated with the data itself – does not get downloaded with the data
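One way to close the gap named in the second bullet is to ship variable-level metadata in the same download as the data. The sketch below is illustrative only: the `build_download` helper, the file names, and the example variables and values are hypothetical and are not taken from WebCASPAR.

```python
import csv
import io
import json
import zipfile


def build_download(rows, variable_metadata):
    """Return a ZIP archive (as bytes) holding data.csv and metadata.json."""
    # Column order follows the order of the metadata entries.
    fieldnames = list(variable_metadata)
    csv_buffer = io.StringIO()
    writer = csv.DictWriter(csv_buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("data.csv", csv_buffer.getvalue())
        # The metadata travels with the data instead of living behind a separate link.
        zf.writestr("metadata.json", json.dumps(variable_metadata, indent=2))
    return archive.getvalue()


# Hypothetical variables and values, for illustration only.
variable_metadata = {
    "year": {"description": "Survey reference year", "type": "integer"},
    "doctorates": {"description": "Number of doctorates awarded", "type": "integer"},
}
rows = [{"year": 1984, "doctorates": 100}]
package = build_download(rows, variable_metadata)
```

With this layout, anyone who opens the archive finds the variable descriptions next to the table, so a bare value such as 1984 keeps its meaning after download.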
WebCASPAR Taxonomy
• Survey specific taxonomies
• NCES IPEDS Classification of Instructional Programs (CIP) codes
• Integrated taxonomy for querying across surveys
http://webcaspar.nsf.gov/
Metadata in SESTAT
• Metadata Explorer is separate from the data
• Individual variable information: Description, Question, Domain/Availability – history, Valid response categories, Keywords
• Metadata is not tightly integrated with the data itself – it does not get downloaded with the data
https://sestat.nsf.gov/sestat/sestat.html
Example -- Public Use file
Summary – Where are we?
• Different surveys have evolved differently: varying levels of detail/metadata
• Not in a standardized structure
Hodge-podge
Metadata Users & Their Metadata Needs
• Not a one-to-one relationship, but many-to-many
• They occur at all stages of the survey process
Survey Process
Source: Survey Methodology (2009) Groves, Fowler, Couper, Lepkowski, Singer & Tourangeau.
• Steps: Define research objectives; Choose mode of collection; Choose sampling frame; Construct and pretest questionnaire; Design and select sample; Recruit and measure sample; Code and edit data; Make postsurvey adjustments; Perform analysis
• Stages: Define Scope; Develop Survey Instrument; Develop Sample Design; Collect Data; Process Data; Disseminate Data
Define Scope
Users: Data User, Survey Manager, Subject Matter Expert, Statistician, Survey Methodologist, Respondent
Metadata:
• General: Topic, Population of interest, Other data sources
• Specific: Frame options, Sample design options, Historical info/data, User needs, Federal Register notices
Develop Survey Instrument
Users: Data User, Survey Manager, Subject Matter Expert, Statistician, Survey Methodologist, Respondent
Metadata: Questions; Answer choices; Definition of terms; Instructions; Logic flow of questions; Cognitive work; Validity assessments; Reliability assessments; Functionality testing; Alternative questions; Instrument design specs (paper, web, CATI)
Develop Sample Design
Users: Data User, Survey Manager, Subject Matter Expert, Statistician
Metadata: Population of interest; Sampling frame / Universe specs; Update schedule; Sample design specs; Desired criteria; Sample selection techniques; Historical information on performance of designs; Estimation methods
Collect Data
Users: Data User, Survey Manager, Subject Matter Expert, Statistician, Database Administrators, Software Developers
Metadata: Variable names and formats; Variable data types; Physical storage; Tables and relationships; Mapping of questions to variables and definitions; Logic flow of questions; Response rates over time; Paradata; Cover letter
Process Data
Users: Data User, Survey Manager, Subject Matter Expert, Statistician, Database Administrators, Software Developers
Metadata: Item response rates; Zero vs. null vs. missing; Edit specifications; Imputation specifications; Recode specifications; Data table specifications; Changes across survey cycles
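The "zero vs. null vs. missing" distinction is easy to lose when values are stored as bare numbers. A minimal sketch, assuming hypothetical status codes and one possible reading of the three terms (null as "not applicable", missing as "applicable but unanswered"); none of these names or definitions come from an SRS specification:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Status(Enum):
    REPORTED = "reported"        # respondent gave a value (possibly zero)
    NOT_APPLICABLE = "null"      # assumption: the question did not apply
    MISSING = "missing"          # assumption: applicable, but no answer given


@dataclass(frozen=True)
class Cell:
    value: Optional[float]
    status: Status


def total_reported(cells):
    """Sum only reported values; a reported zero counts, null and missing do not."""
    return sum(c.value for c in cells if c.status is Status.REPORTED)


def item_response_rate(cells):
    """Share of applicable cells that were reported, or None if none apply."""
    applicable = [c for c in cells if c.status is not Status.NOT_APPLICABLE]
    if not applicable:
        return None
    reported = [c for c in applicable if c.status is Status.REPORTED]
    return len(reported) / len(applicable)


# Hypothetical values, for illustration only.
cells = [
    Cell(0.0, Status.REPORTED),
    Cell(12.5, Status.REPORTED),
    Cell(None, Status.NOT_APPLICABLE),
    Cell(None, Status.MISSING),
]
```

Keeping the status next to the value lets an item response rate exclude not-applicable cells while still counting a reported zero as a response.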
Data Dissemination and Publication
Users: Data User, Survey Manager, Subject Matter Expert, Statistician, Database Administrators, Software Developers, Archivist
Metadata: History of changes; Methodology report; Public use files with documentation; Author/contact source; Who can access what; Type of product; Content format; URL; Keywords; Relationships; Metadata schema
Who are the Metadata Users?
• Data users: Basic & advanced, Analysts, General public
• Respondent
• Survey Manager
• Survey Methodologist
• Statistician
• Subject Matter Expert
• Software Developer
• Database Administrator
• Archivist
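The many-to-many relationship between users and survey stages can be made concrete by inverting the per-stage user lists. A minimal sketch: the stage names and user lists are copied from the stage slides, while the dictionary layout and the `stages_by_user` helper are illustrative assumptions.

```python
from collections import defaultdict

CORE_USERS = ["Data User", "Survey Manager", "Subject Matter Expert", "Statistician"]
TECH_USERS = ["Database Administrator", "Software Developer"]

# Users listed on each stage slide.
USERS_BY_STAGE = {
    "Define Scope": CORE_USERS + ["Survey Methodologist", "Respondent"],
    "Develop Survey Instrument": CORE_USERS + ["Survey Methodologist", "Respondent"],
    "Develop Sample Design": CORE_USERS,
    "Collect Data": CORE_USERS + TECH_USERS,
    "Process Data": CORE_USERS + TECH_USERS,
    "Data Dissemination and Publication": CORE_USERS + TECH_USERS + ["Archivist"],
}


def stages_by_user(users_by_stage):
    """Invert stage -> users into user -> stages, preserving stage order."""
    inverted = defaultdict(list)
    for stage, users in users_by_stage.items():
        for user in users:
            inverted[user].append(stage)
    return dict(inverted)


STAGES_BY_USER = stages_by_user(USERS_BY_STAGE)
```

Looking up `STAGES_BY_USER["Data User"]` returns all six stages, while `STAGES_BY_USER["Archivist"]` returns only the dissemination stage, which shows why a single one-to-one mapping of user to metadata does not fit.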
Need for Standardization of Metadata
is Apparent
is Critical
Standardization Efforts
• Dublin Core
• SDMX (aggregate level)
• DDI 3.0 (record level)
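As a small illustration of the simplest of these standards, the sketch below describes this presentation with a few of the 15 Dublin Core elements. The values come from the title slide; treating NSF as the publisher, the ISO date format, and the `validate_record` helper are assumptions made for the example.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def validate_record(record):
    """Reject keys that are not Dublin Core elements (hypothetical helper)."""
    unknown = set(record) - DC_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return record


record = validate_record({
    "title": "Making the Case for Metadata at SRS-NSF",
    "creator": ["Jeri Mulrow", "Geetha Srinivasarao", "John Gawalt"],
    # Assumption: the authors' agency is treated as the publisher.
    "publisher": "National Science Foundation, Division of Science Resources Statistics",
    "date": "2010-03-17",
    "language": "en",
})
```

Dublin Core describes a resource as a whole; the variable-level and table-level detail listed in the stage slides is the territory of DDI and SDMX.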
Recent SRS Efforts
• Data Repository (Oracle)
• Inclusion of some metadata
• SAS/ACCESS User Interface for internal users
• Evaluating external user interfaces
SRS Efforts -- Working with Commercial Contractors
• Requirements for Data / Metadata delivery
• Examples document
• Standard contracting language
• Checklist
SRS Adopted Basic Operating Procedures
• Using Oracle to store microdata and metadata
• Collecting metadata in whatever format
• Keeping it all organized
Challenges
• Getting all the players on the same page: many different users, many different uses, many different providers, many different products, many different formats
• Cost
• Keeping it all straight
Near Future Vision
SRS Data Repository
Data and Metadata
Taxonomy Efforts
Data & Metadata
Dissemination
Analytic tools
DDI 3.0, SDMX…
Paradata
1984
Thank you!