
The GovStat Project
ils.unc.edu/govstat

Integration of Data and Interfaces to Enhance Human Understanding of Government Statistics: Toward the National Statistical Knowledge Network

Carol A. Hert

Syracuse University

NSF Grants EIA 0131824 and EIA 0129978

Principal Investigators: Gary Marchionini, Stephanie Haas, Ben Shneiderman, Catherine Plaisant, and Carol Hert

Gov Stat


Project Partners

• Bureau of Labor Statistics

• Census Bureau

• National Center for Health Statistics

• Social Security Administration

• National Agricultural Statistics Service

• Energy Information Administration


Project Goals

• To create an integrated model of user access to and use of US government statistical information (The Statistical Knowledge Network)

• To design and test prototype interface tools to support finding and using statistics

• To support integration (technical and intellectual) of statistical data


Statistical Knowledge Network Architecture

Figure: Statistical Knowledge Network architecture. Components: Agencies, SKN Registry, SKN Consortium, Ontology Rules & Constraints, and several Private Work Spaces (each with Objects and Actions). Actions: Contribute, Find, Display, Annotate, Understand, Manipulate, Collaborate. Objects: Reports metadata, Tables metadata, People metadata, Glossary, Annotations.


Statistical Knowledge Network Architecture

• Enable statistical agencies to:
  – Reach wider audiences
  – Standardize strategies for transmission, retrieval & use
  – Reduce costs
  – Facilitate cooperation among agencies & organizations

Goal: Increase the findability, understandability & use of government statistics


Metadata as a Linchpin of Integration of Diverse Statistical Information

• Metadata during statistical information seeking

• User studies of statistical information use

• Building a schema to support these activities

• A hierarchy of integration (and the metadata to support it)

• With a few closing words on technology transfer!


Metadata for Statistical Information Seeking

• The user challenges:
  – Who has the relevant data?
    • Decentralized statistical system
  – Finding data that map to the set of topical, time period, geographic, and other requirements

• Interface tool relying on metadata (currently harvested automatically from webpages)
  – Supports exploration prior to access
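To show how metadata can support exploration before access, here is a minimal Python sketch of faceted filtering with per-facet counts, in the spirit of the browsing interface described above. The record fields (`topic`, `period`, `geography`), the sample page records, and the helper names are hypothetical; they are not the format of the automatically harvested metadata.

```python
from collections import Counter

# Hypothetical page-level metadata records, as if harvested from agency webpages.
PAGES = [
    {"url": "page-a", "topic": "Income", "period": "2002", "geography": "National"},
    {"url": "page-b", "topic": "Income", "period": "2001", "geography": "State"},
    {"url": "page-c", "topic": "Employment", "period": "2002", "geography": "National"},
    {"url": "page-d", "topic": "Income", "period": "2002", "geography": "State"},
]

FACETS = ("topic", "period", "geography")


def filter_pages(pages, **selected):
    """Keep only pages whose metadata matches every selected facet value."""
    return [p for p in pages if all(p.get(k) == v for k, v in selected.items())]


def facet_counts(pages):
    """Count how many pages fall under each value of each facet.

    Displaying these counts lets a user see whether data exist for a
    topic/period/geography combination before opening any page.
    """
    return {f: Counter(p[f] for p in pages) for f in FACETS}


hits = filter_pages(PAGES, topic="Income", period="2002")
print([p["url"] for p in hits])  # ['page-a', 'page-d']
print(facet_counts(hits)["geography"])
```

The design choice worth noting is that counts are computed from metadata alone, so exploration never requires fetching the underlying tables.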


Figure: Relation Browser with all EIA pages


User Studies of Metadata and Statistical Information Use

1. Metadata requirements for understanding tables (Hert & Hernández, 1999).

2. Metadata requirements in a variety of integration tasks (Denn, Haas, & Hert, 2003).

3. Statistical comparisons, particularly the types of comparisons made and the rules experts employ when making them (Hert, 2004).


Some insights from the studies

• Some types of metadata needed:
  – Definitions
  – Survey methodology
  – Rationales and information on differences (what is the difference between concept 1 and concept 2?)
  – Currency of information (what's the latest data I can get, when will more data be available, etc.)
  – Table structure
  – Interface design

• Supporting use requires significant amounts of metadata, including some that is not easily generated (automatically or otherwise)


Some insights from the studies

• Comparing is a key activity in integrating statistics

• Business rules for operating on the metadata are necessary to support user tasks

• Metadata supports help tools, and help tools will be necessary to support metadata usage
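To make the idea of business rules concrete, here is a Python sketch in which rules operate on the metadata attached to each value rather than on the numbers themselves. The two rules (units must match; measures must match), the field names, and the `comparability_problems` helper are hypothetical illustrations, not rules elicited in the studies. The sample values are the Table 3 median income (23,152 dollars) and the Table 1.1 wages-and-salaries percentage (19) from the markup examples in this presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataValue:
    """A single data value plus the metadata needed to compare it."""
    value: float
    unit: str       # e.g. "dollars", "percent"
    measure: str    # what the number summarizes
    table: str      # where the value came from


def comparability_problems(a, b):
    """Apply two hypothetical comparison rules.

    Returns a list of reasons the values should not be compared directly;
    an empty list means no rule objected.
    """
    problems = []
    if a.unit != b.unit:
        problems.append(f"different units: {a.unit} vs {b.unit}")
    if a.measure != b.measure:
        problems.append(f"different measures: {a.measure} vs {b.measure}")
    return problems


# Values taken from the Table 3 and Table 1.1 markup examples.
median_income = DataValue(23152, "dollars", "median money income", "Table 3")
wage_share = DataValue(19, "percent", "percentage with income from wages and salaries", "Table 1.1")

print(comparability_problems(median_income, wage_share))
```

A help tool could surface the returned reasons to the user, which is one way metadata and help tools support each other.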


Metadata Schema Philosophy

• To provide sub-document level access and integration across documents and agencies.

• To provide the minimal set of necessary metadata elements while allowing for extensibility.

• To achieve these goals in a manner that enables efficient transfer to agencies.


Our Schema in Action: An Example

• Scenario: The percentage of older people in the US population is increasing, which raises a question about the overall economic status of this group. In particular, we are interested in people who are retired or no longer in the work force and over a certain age (65 or older). To understand the economic status of this group, we want to know:
  – Income level (in terms of median income) compared to the general (whole) population
  – Sources of income
  – Employment status


Examples from the Markup

• Table markup:
  – For each table, the schema encodes the table title, each row or column heading, and the data values in the table.

• Each data value element references the row and column heading elements associated with it.

• Footnotes are encoded at the highest level to which they apply – the table level, the row/column level, or the individual data value level.
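The encoding rules above can be sketched in Python with the standard-library `xml.etree.ElementTree`: given a grid of values, emit the title, one heading element per row and column, and one `cellValue` per value that points back to its headings by ID. The sequential `r001`/`c001` ID scheme imitates the IDs in the examples but is an assumption, `encode_table` is a hypothetical helper, and only table-level footnotes are handled in this sketch.

```python
import xml.etree.ElementTree as ET


def encode_table(title, row_titles, col_titles, values, table_footnotes=()):
    """Encode a grid of values as <tableInfo> markup.

    values[i][j] is the value at row i, column j. Each <cellValue> refers
    back to its row and column headings through rowID/colID attributes.
    """
    root = ET.Element("tableInfo")
    ET.SubElement(root, "tableTitle").text = title
    for note in table_footnotes:
        # Footnotes that apply to the whole table are stored once, at table level.
        ET.SubElement(root, "tableFootnote").text = note
    row_ids = [f"r{i:03d}" for i in range(1, len(row_titles) + 1)]
    col_ids = [f"c{j:03d}" for j in range(1, len(col_titles) + 1)]
    for rid, rtitle in zip(row_ids, row_titles):
        info = ET.SubElement(root, "rowInfo")
        ET.SubElement(info, "rowTitle").text = rtitle
        ET.SubElement(info, "rowID").text = rid
    for cid, ctitle in zip(col_ids, col_titles):
        info = ET.SubElement(root, "colInfo")
        ET.SubElement(info, "colTitle").text = ctitle
        ET.SubElement(info, "colID").text = cid
    for rid, row in zip(row_ids, values):
        for cid, value in zip(col_ids, row):
            cell = ET.SubElement(root, "cellInfo")
            ET.SubElement(cell, "cellValue", rowID=rid, colID=cid).text = value
    return root


root = encode_table(
    "Table 3. Comparison of Summary Measures of Money Income and Earnings "
    "by Selected Characteristics: 2001 and 2002",
    ["Age of Householder - 65 years and over"],
    ["2002 - Median money income - value"],
    [["23,152"]],
    ["Households and people as of March of the following year"],
)
print(ET.tostring(root, encoding="unicode"))
```

Because every value carries references rather than copies of its headings, a heading or footnote is stored once and shared by all the values it describes.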


Examples from the Markup

<tableInfo>
  <tableTitle>Table 1.1 Percentage with income from specified source, by age, marital status, and sex of nonmarried persons</tableTitle>
  <rowInfo>
    <rowTitle>Source of Income - Earnings</rowTitle>
    <rowID>r001</rowID>
  </rowInfo>
  <rowInfo>
    <rowTitle>Source of Income - Earnings - Wages and salaries</rowTitle>
    <rowID>r002</rowID>
  </rowInfo>
  <rowInfo>
    <rowTitle>Source of Income - Earnings - Self-employment</rowTitle>
    <rowID>r003</rowID>
  </rowInfo>
  <rowInfo>
    <rowTitle>Source of Income - Retirement benefits</rowTitle>
    <rowID>r004</rowID>
  </rowInfo>
  <rowInfo>
    <rowTitle>Source of Income - Retirement benefits - Social Security</rowTitle>
    <rowFootnote>Social Security includes retired-worker benefits, dependents' or survivors' benefits, disability benefits, transitionally insured benefits, or special age-72 benefits</rowFootnote>
    <rowID>r005</rowID>
  </rowInfo>
  ...

To preserve category information, each row and column heading includes its full category labelling (for example, "Source of Income - Earnings - Wages and salaries" rather than just "Wages and salaries").

Including the category labelling within the row/column headings makes the category information searchable, which improves access to data embedded within tables.
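Because each heading carries its full category path, a plain keyword search over headings also matches the category labels. A minimal Python sketch using the standard-library `xml.etree.ElementTree` on the Table 1.1 rows above; treating " - " as the category separator is inferred from the example headings, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Row headings from the Table 1.1 excerpt (table title omitted for brevity).
XML = """
<tableInfo>
  <rowInfo><rowTitle>Source of Income - Earnings</rowTitle><rowID>r001</rowID></rowInfo>
  <rowInfo><rowTitle>Source of Income - Earnings - Wages and salaries</rowTitle><rowID>r002</rowID></rowInfo>
  <rowInfo><rowTitle>Source of Income - Earnings - Self-employment</rowTitle><rowID>r003</rowID></rowInfo>
  <rowInfo><rowTitle>Source of Income - Retirement benefits</rowTitle><rowID>r004</rowID></rowInfo>
  <rowInfo><rowTitle>Source of Income - Retirement benefits - Social Security</rowTitle><rowID>r005</rowID></rowInfo>
</tableInfo>
"""


def row_paths(root):
    """Map each rowID to its category path, e.g. ['Source of Income', 'Earnings']."""
    return {
        ri.findtext("rowID"): ri.findtext("rowTitle").split(" - ")
        for ri in root.iter("rowInfo")
    }


def search_rows(paths, keyword):
    """Return rowIDs whose heading mentions the keyword at any category level."""
    kw = keyword.lower()
    return sorted(rid for rid, path in paths.items()
                  if any(kw in part.lower() for part in path))


paths = row_paths(ET.fromstring(XML.strip()))
print(search_rows(paths, "earnings"))  # ['r001', 'r002', 'r003']
```

Without the category labelling, the heading for r002 would read only "Wages and salaries", and a search for "earnings" would miss it.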


Examples from the Markup (cont.)

<tableTitle>Table 3. Comparison of Summary Measures of Money Income and Earnings by Selected Characteristics: 2001 and 2002</tableTitle>
<tableFootnote>Source: US Census Bureau, Current Population Survey, 2002 and 2003 Annual Social and Economic Supplements</tableFootnote>
<tableFootnote>Households and people as of March of the following year</tableFootnote>
<rowInfo>
  <rowTitle>Age of Householder - 65 years and over</rowTitle>
  <rowID>r015</rowID>
</rowInfo>
<colInfo>
  <colTitle>2002 - Median money income - value</colTitle>
  <colFootnote>dollars</colFootnote>
  <colID>c005</colID>
</colInfo>
<cellInfo>
  <cellValue rowID="r015" colID="c005">23,152</cellValue>
</cellInfo>
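The `rowID`/`colID` attributes on `cellValue` are what let a bare number such as 23,152 be reunited with its headings and footnotes. The sketch below parses the Table 3 excerpt with Python's `xml.etree.ElementTree`; the wrapping `<tableInfo>` root (added so the fragment parses on its own) and the function names are assumptions.

```python
import xml.etree.ElementTree as ET

# The Table 3 excerpt, wrapped in a <tableInfo> root so it is well-formed.
XML = """<tableInfo>
  <tableTitle>Table 3. Comparison of Summary Measures of Money Income and Earnings by Selected Characteristics: 2001 and 2002</tableTitle>
  <tableFootnote>Source: US Census Bureau, Current Population Survey, 2002 and 2003 Annual Social and Economic Supplements</tableFootnote>
  <tableFootnote>Households and people as of March of the following year</tableFootnote>
  <rowInfo><rowTitle>Age of Householder - 65 years and over</rowTitle><rowID>r015</rowID></rowInfo>
  <colInfo><colTitle>2002 - Median money income - value</colTitle><colFootnote>dollars</colFootnote><colID>c005</colID></colInfo>
  <cellInfo><cellValue rowID="r015" colID="c005">23,152</cellValue></cellInfo>
</tableInfo>"""


def index_by_id(root, tag, id_tag):
    """Map each heading ID (e.g. 'r015') to its <rowInfo> or <colInfo> element."""
    return {el.findtext(id_tag): el for el in root.iter(tag)}


def resolve_cells(root):
    """Yield every data value with its headings and all footnotes that apply."""
    rows = index_by_id(root, "rowInfo", "rowID")
    cols = index_by_id(root, "colInfo", "colID")
    table_notes = [fn.text for fn in root.findall("tableFootnote")]
    for cv in root.iter("cellValue"):
        row = rows.get(cv.get("rowID"))
        col = cols.get(cv.get("colID"))
        if row is None or col is None:
            # Every cell must point at headings that actually exist.
            raise ValueError(f"cell refers to a missing heading: {cv.attrib}")
        # Footnotes are inherited downward: table level, then row and column level.
        footnotes = (table_notes
                     + [fn.text for fn in row.findall("rowFootnote")]
                     + [fn.text for fn in col.findall("colFootnote")])
        yield {
            "value": cv.text,
            "row": row.findtext("rowTitle"),
            "column": col.findtext("colTitle"),
            "footnotes": footnotes,
        }


for cell in resolve_cells(ET.fromstring(XML)):
    print(cell["value"], "|", cell["row"], "|", cell["column"], "|", cell["footnotes"])
```

The resolved record is what lets essential metadata, including the unit "dollars", travel with the value when it is retrieved or displayed elsewhere.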


Examples from the Markup (cont.)

<rowInfo>
  <rowTitle>Age of Householder - 65 years and over</rowTitle>
  <rowID>r015</rowID>
</rowInfo>
<colInfo>
  <colTitle>2002 - Median money income - value</colTitle>
  <colFootnote>dollars</colFootnote>
  <colID>c005</colID>
</colInfo>
<cellInfo>
  <cellValue rowID="r015" colID="c005">23,152</cellValue>
</cellInfo>

<colInfo>
  <colTitle>Aged 65 or older Total All units</colTitle>
  <colID>c003</colID>
</colInfo>
<rowInfo>
  <rowTitle>Source of Income - Earnings - Wages and salaries</rowTitle>
  <rowID>r002</rowID>
</rowInfo>
<cellInfo>
  <cellValue rowID="r002" colID="c003">19</cellValue>
</cellInfo>

Because these headings both contain keywords for age 65 or older, we can begin to think about ways to integrate these data.
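One simple way to act on this observation is to normalize age expressions in headings and group data values whose headings share the same normalized age group. The regular expression below recognizes only the two phrasings in these examples ("65 years and over" and "65 or older"); it and the helper names are illustrative assumptions, not a general-purpose parser.

```python
import re

# Headings and values from the two excerpts above (Table 3 and Table 1.1).
CELLS = [
    {"table": "Table 3", "row": "Age of Householder - 65 years and over",
     "col": "2002 - Median money income - value", "value": "23,152"},
    {"table": "Table 1.1", "row": "Source of Income - Earnings - Wages and salaries",
     "col": "Aged 65 or older Total All units", "value": "19"},
]

# Matches "<n> years and over" or "<n> or older" and captures the lower age bound.
AGE_PATTERN = re.compile(r"\b(\d+)\s+(?:years\s+and\s+over|or\s+older)\b", re.IGNORECASE)


def age_group(cell):
    """Return a normalized age key such as '65+', or None if no age is found."""
    for heading in (cell["row"], cell["col"]):
        match = AGE_PATTERN.search(heading)
        if match:
            return f"{match.group(1)}+"
    return None


def group_by_age(cells):
    """Collect cells from any table under their normalized age group."""
    groups = {}
    for cell in cells:
        key = age_group(cell)
        if key is not None:
            groups.setdefault(key, []).append(cell)
    return groups


for key, members in group_by_age(CELLS).items():
    print(key, [(c["table"], c["value"]) for c in members])
```

Grouping only identifies candidates for integrated display: the two values here still differ in unit (dollars versus percent), so they belong side by side rather than in one calculation.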


What the Example Demonstrates

• Access: preserving data from table titles, row/column headings, and footnotes allows metadata essential for understanding to travel with the data values, and aids in search and retrieval

• Integration: once this essential metadata is tagged, similarities among tags make it easier to investigate options for displaying data from different tables in an integrated manner.


A Hierarchy of Integration

Levels range from a low level of integration, supported by a limited amount of metadata, to a high level of integration, requiring increasing amounts of metadata:

• Searchable table titles

• Searchable row and column headings

• Linking of data values to row and column headings

• Linking of row and column headings to underlying survey variables

• Linking of analysis units, universe statements, and concept definitions across documents and agencies

• Linking of contextual information (such as footnotes) to tables, row/column headings, or data values

Our schema can provide the items beneath this dotted line.


Using the Hierarchy of Integration


An organization can determine where to "sit" on this hierarchy, based on the effort it can invest and the level of integration it wants.


What we have learned about technology transfer

• Must demonstrate the utility of research with working prototypes
  – Relationship Browser (and other interface tools)
  – Metadata workstation in development

• Agencies need simplicity, or need to understand the value of complexity in order to readjust resources
  – Hierarchy of integration used as a conceptual tool
  – Provide training


Further information

• [email protected]
• Project website (including demos of the Relationship Browser, an interactive glossary tool, etc.): http://ils.unc.edu/govstat
