The GovStat Project
ils.unc.edu/govstat
Integration of Data and Interfaces to Enhance Human Understanding of Government Statistics: Toward the
National Statistical Knowledge Network
Carol A. Hert
Syracuse University
NSF Grants EIA 0131824 and EIA 0129978
Principal Investigators: Gary Marchionini, Stephanie Haas, Ben Shneiderman, Catherine Plaisant, and Carol Hert
Gov Stat
Project Partners
• Bureau of Labor Statistics
• Census Bureau
• National Center for Health Statistics
• Social Security Administration
• National Agricultural Statistics Service
• Energy Information Administration
Project Goals
• To create an integrated model of user access to and use of US government statistical information (The Statistical Knowledge Network)
• To design and test prototype interface tools to support finding and using statistics
• To support integration (technical and intellectual) of statistical data
Statistical Knowledge Network Architecture
[Figure: architecture diagram. Labels: Agencies; SKN Registry; SKN Consortium (Ontology, Rules & Constraints); Objects (Reports metadata, Tables metadata, People metadata, Glossary, Annotations); Actions (Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate); Private Work Spaces, each with Objects and Actions]
Statistical Knowledge Network Architecture
• Enable statistical agencies to:
– Reach wider audiences
– Standardize strategies for transmission, retrieval & use
– Reduce costs
– Facilitate cooperation among agencies & organizations
Goal: Increase find-ability, understand-ability & use of government statistics
Metadata as a Linchpin of Integration of Diverse Statistical Information
• Metadata during statistical information seeking
• User studies of statistical information use
• Building a schema to support these activities
• A hierarchy of integration (and the metadata to support it)
• With a few closing words on technology transfer!
Metadata for Statistical Information Seeking
• The user challenges:
– Who has the relevant data? (The statistical system is decentralized.)
– Finding data that map to the set of topical, time period, geographic and other requirements
• Interface tool relying on metadata (currently harvested automatically from webpages)
– Supports exploration prior to access
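
The kind of exploration described above can be sketched roughly as follows. This is a minimal illustration, not the project's interface tool: the `PageRecord` fields, the agency names, and the URLs are all hypothetical stand-ins for metadata harvested from webpages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageRecord:
    """Hypothetical harvested metadata for one agency web page."""
    url: str
    agency: str
    topics: frozenset
    years: frozenset
    geographies: frozenset


def matches(record, topic=None, year=None, geography=None):
    """True if the record satisfies every requirement that was supplied."""
    return ((topic is None or topic in record.topics)
            and (year is None or year in record.years)
            and (geography is None or geography in record.geographies))


def explore(records, **requirements):
    """Count matching pages per agency, without opening any page."""
    counts = {}
    for record in records:
        if matches(record, **requirements):
            counts[record.agency] = counts.get(record.agency, 0) + 1
    return counts


# Made-up records, for illustration only.
RECORDS = [
    PageRecord("https://example.org/a1", "Agency A",
               frozenset({"income"}), frozenset({2001, 2002}), frozenset({"US"})),
    PageRecord("https://example.org/b1", "Agency B",
               frozenset({"income", "retirement"}), frozenset({2002}), frozenset({"US"})),
    PageRecord("https://example.org/b2", "Agency B",
               frozenset({"energy"}), frozenset({2002}), frozenset({"US"})),
]

overview = explore(RECORDS, topic="income", year=2002)
```

Here `overview` tells the user which agencies hold pages on income for 2002 before any page is accessed, which addresses the "who has the relevant data?" question in a decentralized system.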
[Figure: Relation Browser with all EIA pages]
User Studies of Metadata and Statistical Information Use
1. Metadata requirements for understanding tables (Hert & Hernández, 1999).
2. Metadata requirements in a variety of integration tasks (Denn, Haas, & Hert, 2003).
3. Statistical comparisons, in particular the types of comparisons made and the rules experts employ when comparing (Hert, 2004).
Some insights from the studies
• Some types of metadata needed:
– Definitions
– Survey methodology
– Rationales and information on differences (what is the difference between concept 1 and concept 2)
– Currency of information (what’s the latest data I can get, when will more data be available, etc.)
– Table structure
– Interface design
• Supporting use requires significant amounts of metadata, including some not easily generated (automatically or otherwise)
Some insights from the studies
• Comparing is a key activity in integrating statistics
• Business rules for operating on the metadata are necessary to support user tasks
• Metadata supports help tools; help tools will in turn be necessary to support metadata usage
Metadata Schema Philosophy
• To provide sub-document level access and integration across documents and agencies.
• To provide a minimal set of metadata elements necessary while allowing for extensibility.
• To achieve these goals in a manner that enables efficient transfer to agencies.
Our Schema in Action: An Example
• Scenario: The fact that the percentage of older people in the population of the US is increasing raises a question about the overall economic status of this group. In particular, we are interested in people who are retired or no longer in the work force and over a certain age (65 or older). We want to know the following things to understand the economic status of this particular group of people:
– Income level (in terms of median income) compared to the general (whole) population
– Sources of income
– Employment status
Examples from the Markup
• Table markup:
– For each table, the schema encodes the table title, each row or column heading, and the data values in the table.
• Each data value element references the row and column heading elements associated with it.
• Footnotes are encoded at the highest level to which they apply – the table level, the row/column level, or the individual data value level.
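
To illustrate how these references might be followed in practice, here is a minimal Python sketch. It reuses the element names that appear in the markup examples in these slides (`tableInfo`, `rowInfo`, `colInfo`, `cellValue`, and so on), but the assumption that every element sits directly under a closing `</tableInfo>` is ours, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Sample built from the Table 3 excerpt in these slides; wrapping it in a
# closed <tableInfo> element is an assumption made so the sample parses.
SAMPLE = """<tableInfo>
<tableTitle>Table 3. Comparison of Summary Measures of Money Income and Earnings by Selected Characteristics: 2001 and 2002</tableTitle>
<tableFootnote>Households and people as of March of the following year</tableFootnote>
<rowInfo>
<rowTitle>Age of Householder - 65 years and over</rowTitle>
<rowID>r015</rowID>
</rowInfo>
<colInfo>
<colTitle>2002 - Median money income - value</colTitle>
<colFootnote>dollars</colFootnote>
<colID>c005</colID>
</colInfo>
<cellInfo>
<cellValue rowID="r015" colID="c005">23,152</cellValue>
</cellInfo>
</tableInfo>"""


def index_headings(table, info_tag, id_tag, title_tag, note_tag):
    """Map each heading ID to its title and its own footnotes."""
    headings = {}
    for info in table.findall(info_tag):
        headings[info.findtext(id_tag)] = {
            "title": info.findtext(title_tag),
            "footnotes": [note.text for note in info.findall(note_tag)],
        }
    return headings


def resolve_cells(xml_text):
    """Attach table, row, and column metadata to every data value."""
    table = ET.fromstring(xml_text)
    rows = index_headings(table, "rowInfo", "rowID", "rowTitle", "rowFootnote")
    cols = index_headings(table, "colInfo", "colID", "colTitle", "colFootnote")
    table_notes = [note.text for note in table.findall("tableFootnote")]
    resolved = []
    for cell in table.iter("cellValue"):
        row = rows[cell.get("rowID")]
        col = cols[cell.get("colID")]
        resolved.append({
            "table": table.findtext("tableTitle"),
            "row": row["title"],
            "column": col["title"],
            "value": cell.text,
            # Footnotes from every level that applies to this value.
            "footnotes": table_notes + row["footnotes"] + col["footnotes"],
        })
    return resolved


cells = resolve_cells(SAMPLE)
```

Each entry in `cells` carries the title, headings, and footnotes that apply to one data value, so the value can be interpreted without the original table layout.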
Examples from the Markup
<tableInfo>
<tableTitle>Table 1.1 Percentage with income from specified source, by age, marital status, and sex of nonmarried persons</tableTitle>
<rowInfo>
<rowTitle>Source of Income - Earnings</rowTitle>
<rowID>r001</rowID>
</rowInfo>
<rowInfo>
<rowTitle>Source of Income - Earnings - Wages and salaries</rowTitle>
<rowID>r002</rowID>
</rowInfo>
<rowInfo>
<rowTitle>Source of Income - Earnings - Self-employment</rowTitle>
<rowID>r003</rowID>
</rowInfo>
<rowInfo>
<rowTitle>Source of Income - Retirement benefits</rowTitle>
<rowID>r004</rowID>
</rowInfo>
<rowInfo>
<rowTitle>Source of Income - Retirement benefits - Social Security</rowTitle>
<rowFootnote>Social Security includes retired-worker benefits, dependents' or survivors' benefits, disability benefits, transitionally insured benefits, or special age-72 benefits</rowFootnote>
<rowID>r005</rowID>
</rowInfo>
...
In order to preserve category information, individual row and column headings include the category labelling.
Including the category labelling within the row/column headings improves access to data embedded within tables by making the category information searchable.
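
A small sketch of why this matters for search, using the row headings from the Table 1.1 excerpt. The " - " separator is taken from those headings; the function names are hypothetical, and plain substring matching stands in for whatever search an agency would actually use.

```python
# Row headings as encoded in the Table 1.1 markup excerpt.
ROWS = {
    "r001": "Source of Income - Earnings",
    "r002": "Source of Income - Earnings - Wages and salaries",
    "r003": "Source of Income - Earnings - Self-employment",
    "r004": "Source of Income - Retirement benefits",
    "r005": "Source of Income - Retirement benefits - Social Security",
}


def leaf_label(heading):
    """The heading as it would read without its category labels."""
    return heading.split(" - ")[-1].strip()


def search(headings, keyword):
    """IDs of headings whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [hid for hid, text in headings.items() if needle in text.lower()]


# Searching the full headings finds every row in the Earnings category.
with_categories = search(ROWS, "earnings")

# Searching bare labels misses the rows nested under Earnings.
bare_labels = {hid: leaf_label(text) for hid, text in ROWS.items()}
without_categories = search(bare_labels, "earnings")
```

With the category labels kept, a search for "earnings" reaches the wages and self-employment rows; with bare labels it reaches only the category row itself.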
Examples from the Markup (cont.)
<tableTitle>Table 3. Comparison of Summary Measures of Money Income and Earnings by Selected Characteristics: 2001 and 2002</tableTitle>
<tableFootnote>Source: US Census Bureau, Current Population Survey, 2002 and 2003 Annual Social and Economic Supplements</tableFootnote>
<tableFootnote>Households and people as of March of the following year</tableFootnote>
<rowInfo>
<rowTitle>Age of Householder - 65 years and over</rowTitle>
<rowID>r015</rowID>
</rowInfo>
<colInfo>
<colTitle>2002 - Median money income - value</colTitle>
<colFootnote>dollars</colFootnote>
<colID>c005</colID>
</colInfo>
<cellInfo>
<cellValue rowID="r015" colID="c005">23,152</cellValue>
</cellInfo>
Examples from the Markup (cont.)
<rowInfo>
<rowTitle>Age of Householder - 65 years and over</rowTitle>
<rowID>r015</rowID>
</rowInfo>
<colInfo>
<colTitle>2002 - Median money income - value</colTitle>
<colFootnote>dollars</colFootnote>
<colID>c005</colID>
</colInfo>
<cellInfo>
<cellValue rowID="r015" colID="c005">23,152</cellValue>
</cellInfo>
<colInfo>
<colTitle>Aged 65 or older Total All units</colTitle>
<colID>c003</colID>
</colInfo>
<rowInfo>
<rowTitle>Source of Income - Earnings - Wages and salaries</rowTitle>
<rowID>r002</rowID>
</rowInfo>
<cellInfo>
<cellValue rowID="r002" colID="c003">19</cellValue>
</cellInfo>
Note that because both headings ("Age of Householder - 65 years and over" and "Aged 65 or older Total All units") contain keywords for age 65 or older, we can begin to think about ways to integrate these data.
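
One naive way to surface such candidate links is to look for shared keywords between headings from different tables. The sketch below is an illustration under our own assumptions (the tokenizer, the stopword list, and the function names are hypothetical), not the project's integration method.

```python
import re

# Small illustrative stopword list; a real one would be longer.
STOPWORDS = {"of", "and", "or", "the", "a", "by"}


def keywords(heading):
    """Lowercased word and number tokens, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", heading.lower())) - STOPWORDS


def shared_keywords(heading_a, heading_b):
    """Keywords that two headings have in common."""
    return keywords(heading_a) & keywords(heading_b)


def candidate_links(headings_a, headings_b):
    """Pairs of heading IDs from two tables that share a keyword."""
    links = []
    for id_a, text_a in headings_a.items():
        for id_b, text_b in headings_b.items():
            shared = shared_keywords(text_a, text_b)
            if shared:
                links.append((id_a, id_b, sorted(shared)))
    return links


# Headings taken from the two markup excerpts above.
TABLE_3 = {
    "r015": "Age of Householder - 65 years and over",
    "c005": "2002 - Median money income - value",
}
TABLE_1_1 = {
    "c003": "Aged 65 or older Total All units",
    "r002": "Source of Income - Earnings - Wages and salaries",
}

links = candidate_links(TABLE_3, TABLE_1_1)
```

Exact token matching links the two age headings only through "65": "age" and "aged", or "over" and "older", do not match. Recognizing those as related would need stemming or the kind of ontology and business rules discussed earlier, and any link found this way is only a candidate for a person to review.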
What the Example Demonstrates
• Access: preserving data from table titles, row/column headings, and footnotes allows metadata essential for understanding to travel with the data values, and aids in search and retrieval
• Integration: once this essential metadata is tagged, similarities between tags make it easier to investigate options for displaying data from different tables in an integrated manner.
A Hierarchy of Integration
Low level of integration
High level of integration
• Searchable table titles
• Searchable row and column headings
• Linking of data values to row and column headings
• Linking of row and column headings to underlying survey variables
• Linking of analysis units, universe statements, concept definitions, across documents and agencies
• Linking of contextual information (such as footnotes) to tables, row/column headings, or data values
Our schema can provide the items beneath this dotted line.
Limited amount of metadata
Increasing amounts of metadata
Using the Hierarchy of Integration
Low level of integration
High level of integration
• Searchable table titles
• Searchable row and column headings
• Linking of data values to row and column headings
• Linking of row and column headings to underlying survey variables
• Linking of analysis units, universe statements, concept definitions, across documents and agencies
• Linking of contextual information (such as footnotes) to tables, row/column headings, or data values
Limited amount of metadata
Increasing amounts of metadata
• An organization can determine where to “sit” on this hierarchy, in terms of effort and level of integration desired
What have we learned about technology transfer?
• Must demonstrate utility of research with working prototypes
– Relationship Browser (and other interface tools)
– Metadata workstation in development
• Agencies need simplicity, or need to understand the value of complexity before they readjust resources
– Hierarchy of integration used as a conceptual tool
– Provide training
Further information
• Cahert@syr.edu
• Project website (including demos of Relationship Browser, an interactive glossary tool, etc.) at http://ils.unc.edu/govstat