
September 23, 2007 NSF TANGO BYU/RPI 1

TANGO

Table Analysis for Generating Ontologies

David W. Embley (BYU) & George Nagy (RPI)

under NSF Awards 0414644 and 0414854 INFORMATION & KNOWLEDGE MANAGEMENT

Dr. Maria Zemankova

(a) Table Interpretation

(b) Query by Table

TANGO STEPS

[Diagram. Stages: TABLE, INTERPRETED TABLE, MINI ONTOLOGY, GROWING ONTOLOGY. Other labels: Wang Notation & XML; Wang Notation Tool; Ontology Editor; Annotated Semantic Web Pages; Standard Ontology Language (OWL); Ontology Based Web Services; Form Based Specification; Extraction Ontologies; Relational Databases; Query By Table.]

[TANGO STEPS diagram repeated, with one part labeled "This presentation".]

(a) Table Interpretation

[Flow diagram labels: HTML web pages; Construct Wang notation; Matlab table; Extract table; Wang Notation; XML table; Confirm or correct; Confirm or correct; Mini Ontology.]

Median Income table

http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income

Median Income table from Statistics Canada displayed in the TANGO Wang Notation Tool

Wang Notation

• An abstract table is specified by an ordered pair (C, δ): (category, delta)

• C is a finite set of labeled domains (headers, subheaders of tables, etc.)

• δ represents each individual value within a table, indexed by labels from the categories in C.


Categories

• Two categories in previous table.

• CATEGORY 1: (Region_Virtual, {(Canada,phi), (Newfoundland and Labrador,phi), (Prince Edward Island,phi), (Nova Scotia,phi), (New Brunswick,phi), (Quebec,phi), (Ontario,phi), (Manitoba,phi), (Saskatchewan,phi), (Alberta,phi), (British Columbia,phi), (Yukon Territory,phi), (Northwest Territories,phi), (Nunavut,phi)})

• CATEGORY 2: (Year_Virtual, {(2001,phi), (2002,phi), (2003,phi), (2004,phi), (2005,phi)})

Content (leaf) cells

• Delta Notation for two (of 15) rows:

delta({Year_Virtual.2001,Region_Virtual.Canada})=53,500

delta({Year_Virtual.2002,Region_Virtual.Canada})=55,000

delta({Year_Virtual.2003,Region_Virtual.Canada})=56,000

delta({Year_Virtual.2004,Region_Virtual.Canada})=58,100

delta({Year_Virtual.2005,Region_Virtual.Canada})=60,600

delta({Year_Virtual.2001,Region_Virtual.Newfoundland and Labrador})=41,400

delta({Year_Virtual.2002,Region_Virtual.Newfoundland and Labrador})=43,200

delta({Year_Virtual.2003,Region_Virtual.Newfoundland and Labrador})=44,800

delta({Year_Virtual.2004,Region_Virtual.Newfoundland and Labrador})=46,100

delta({Year_Virtual.2005,Region_Virtual.Newfoundland and Labrador})=47,600
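The (C, delta) pair above can be sketched in a few lines of Python. This is only an illustration, not the TANGO implementation: the dictionary layout, the `key` helper and the `is_complete` check are assumptions introduced here, and only the two regions whose values appear on the slide are included.

```python
from itertools import product

# Categories C: each labeled domain maps to its labels.
# Names and values come from the Median Income slides; only the two
# regions listed in the delta notation are included.
categories = {
    "Region_Virtual": ["Canada", "Newfoundland and Labrador"],
    "Year_Virtual": ["2001", "2002", "2003", "2004", "2005"],
}

def key(**labels):
    """Build the unordered set {Category.label, ...} used as the argument of delta."""
    return frozenset(f"{cat}.{lab}" for cat, lab in labels.items())

values = {
    "Canada": [53500, 55000, 56000, 58100, 60600],
    "Newfoundland and Labrador": [41400, 43200, 44800, 46100, 47600],
}

# delta: one label per category -> content (leaf) cell.
delta = {
    key(Region_Virtual=region, Year_Virtual=year): value
    for region, row in values.items()
    for year, value in zip(categories["Year_Virtual"], row)
}

def is_complete(categories, delta):
    """Check that delta has exactly one entry per combination of labels."""
    expected = {
        frozenset(f"{c}.{l}" for c, l in zip(categories, combo))
        for combo in product(*categories.values())
    }
    return expected == set(delta)

print(delta[key(Year_Virtual="2003", Region_Virtual="Canada")])  # 56000
print(is_complete(categories, delta))  # True
```

Using a frozenset as the key mirrors the slide's set braces: the order in which the category labels are written does not matter.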

XML Representation: Schema for (1) table (2) categories (3) data cells (4) augmentation

<InterpretedTable xsi:noNamespaceSchemaLocation="G:\RPI\XML\02_TableInterface.XS.070803.xml" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Table TableOID="Table2" Number="2" DocumentCitation="Wang's Thesis" Title="Wang table" Caption="Grades in 1991 and 1992">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C12" Label="Newfoundland and Labrador"></CategoryNode>
      <CategoryNode CategoryNodeOID="C13" Label="Prince Edward Island"></CategoryNode>
      <CategoryNode CategoryNodeOID="C14" Label="Nova Scotia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C15" Label="New Brunswick"></CategoryNode>
      <CategoryNode CategoryNodeOID="C16" Label="Quebec"></CategoryNode>
      <CategoryNode CategoryNodeOID="C17" Label="Ontario"></CategoryNode>
      <CategoryNode CategoryNodeOID="C18" Label="Manitoba"></CategoryNode>
      <CategoryNode CategoryNodeOID="C19" Label="Saskatchewan"></CategoryNode>
      <CategoryNode CategoryNodeOID="C110" Label="Alberta"></CategoryNode>
      <CategoryNode CategoryNodeOID="C111" Label="British Columbia"></CategoryNode>
      <CategoryNode CategoryNodeOID="C112" Label="Yukon Territory"></CategoryNode>
      <CategoryNode CategoryNodeOID="C113" Label="Northwest Territories"></CategoryNode>
      <CategoryNode CategoryNodeOID="C114" Label="Nunavut"></CategoryNode>
      <CategoryNode CategoryNodeOID="C2" Label="Year (Virtual)"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
      <CategoryNode CategoryNodeOID="C22" Label="2002"></CategoryNode>
      <CategoryNode CategoryNodeOID="C23" Label="2003"></CategoryNode>
      <CategoryNode CategoryNodeOID="C24" Label="2004"></CategoryNode>
      <CategoryNode CategoryNodeOID="C25" Label="2005"></CategoryNode>
    </CategoryNodes>
  </Table>
  <CategoryParentNodes>
    <CategoryParentNode CategoryParentNodeOID="C1">
      <CategoryNodes>
……

XML file for this table has ~350 lines of Object Identifier tags
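As a rough illustration of how such a file could be read, the sketch below pulls the category labels out of a small fragment with Python's standard `xml.etree.ElementTree`. The element and attribute names are those shown on the slide. The fragment itself is abbreviated and closed by hand so that it parses, and the `category_labels` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated fragment: three of the CategoryNode elements from the slide,
# with the enclosing elements closed by hand so that the text is well-formed.
FRAGMENT = """
<InterpretedTable>
  <Table TableOID="Table2" Number="2">
    <CategoryNodes>
      <CategoryNode CategoryNodeOID="C1" Label="Median Total Income"></CategoryNode>
      <CategoryNode CategoryNodeOID="C11" Label="Canada"></CategoryNode>
      <CategoryNode CategoryNodeOID="C21" Label="2001"></CategoryNode>
    </CategoryNodes>
  </Table>
</InterpretedTable>
"""

def category_labels(xml_text):
    """Map each CategoryNodeOID to its Label (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return {
        node.get("CategoryNodeOID"): node.get("Label")
        for node in root.iter("CategoryNode")
    }

labels = category_labels(FRAGMENT)
print(labels["C11"])  # Canada
```

The parent/child structure is carried by the CategoryParentNodes element, which is truncated on the slide, so the sketch does not try to reconstruct the hierarchy.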


Verification tool: category headers for a selected content cell

Verification tool: content cells for a selected header

Verification tool: hierarchical category structure for a selected content cell

(b) Query by Table

[Diagram labels: table "Income" with empty rows 2002, 2003, 2004, 2005; Ontology from many tables; Database; table "Income" with rows 2002 $4500, 2003 $3300, 2004 $1240, 2005 $3400; QBT; Interpret Query Table.]

Query Table composed in MS-Excel by a person seeking information from an ontology compiled from many web tables


Display of automatically processed Query Table for human verification


Wang notation for Query Table


QBT identifies requested data

URLs of tables in the Example Database

• Median Total Income: http://www40.statcan.ca/l01/cst01/famil108a.htm?sdi=median%20income*
• Number of Induced Abortions: http://www40.statcan.ca/l01/cst01/health40a.htm?sdi=abortions
• Number of Divorces: http://www40.statcan.ca/l01/cst01/famil02.htm?sdi=number%20divorces
• Infant Mortality Rate: http://www40.statcan.ca/l01/cst01/health21a.htm?sdi=infant%20mortality%20rate*
• Trips By Canadians in Canada: http://www40.statcan.ca/l01/cst01/arts26a.htm
• Number of Homicides: http://www40.statcan.ca/l01/cst01/legal12a.htm?sdi=homicide
• Population: http://www40.statcan.ca/l01/cst01/demo02a.htm?sdi=population
• Number of Persons with Diabetes: http://www40.statcan.ca/l01/cst01/health54a.htm?sdi=diabetes
• Number of Persons with Asthma: http://www40.statcan.ca/l01/cst01/health50a.htm?sdi=asthma
• University Degrees Awarded to Males: http://www40.statcan.ca/l01/cst01/educ51b.htm
• University Degrees Awarded to Females: http://www40.statcan.ca/l01/cst01/educ51c.htm
• Food services and drinking places (13 tables): http://www40.statcan.ca/l01/cst01/serv24j

Fields in the Example Database

• IDENTIFIER
• REGION
• YEAR
• NUMBER_OF_ABORTIONS
• ABORTION_RATE
• NUMBER_OF_DIVORCES
• INFANT_MORTALITY_RATE
• NUMBER_OF_TRIPS
• MEDIAN_TOTAL_INCOME
• POPULATION
• NUMBER_OF_HOMICIDES
• GENDER
• INCIDENCE_OF_DIABETES
• UNIVERSITY_DEGREES_AWARDED
• INCIDENCE_OF_ASTHMA
• RESTAURANT_OPERATING_REVENUE
• RESTAURANT_OPERATING_EXPENSES
• RESTAURANT_OPERATING_PROFIT_MARGIN
• RESTAURANT_OPERATING_WAGES


QBT fills in requested data from Example Database
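The fill-in step might look roughly like the following sketch, which uses an in-memory SQLite database. The field names REGION, YEAR and MEDIAN_TOTAL_INCOME are from the field list of the Example Database, and the income values are the Canada figures from the delta notation. The table name `example`, the `fill_query_table` function and the choice of SQLite are assumptions made for illustration; the slides do not say how QBT queries the database.

```python
import sqlite3

# Tiny stand-in for the Example Database (table name "example" is hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (REGION TEXT, YEAR INTEGER, MEDIAN_TOTAL_INCOME INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?)",
    [
        ("Canada", 2002, 55000),
        ("Canada", 2003, 56000),
        ("Canada", 2004, 58100),
        ("Canada", 2005, 60600),
    ],
)

def fill_query_table(conn, field, region, years):
    """Return {year: value} for the requested field, None where no row matches."""
    # Column names cannot be bound as SQL parameters, so check against a whitelist.
    allowed = {"MEDIAN_TOTAL_INCOME"}
    if field not in allowed:
        raise ValueError(f"unknown field: {field}")
    filled = {}
    for year in years:
        row = conn.execute(
            f"SELECT {field} FROM example WHERE REGION = ? AND YEAR = ?",
            (region, year),
        ).fetchone()
        filled[year] = row[0] if row else None
    return filled

result = fill_query_table(conn, "MEDIAN_TOTAL_INCOME", "Canada", [2002, 2003, 2004, 2005])
print(result)  # {2002: 55000, 2003: 56000, 2004: 58100, 2005: 60600}
```

Each empty cell of the query table corresponds to one lookup keyed by the labels on its row and column headers, which is why the interpreted (Wang notation) form of the query table is needed first.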


A current puzzle

How can QBT tell that these two query tables represent the same request? NB: Although plausible, both of these tables exemplify poor layout.

Query table 1:

Year  Region   Gender  Diabetics
2002  Alberta  Male    XX
               Female  XX
      Ontario  Male    XX
               Female  XX

Query table 2:

Year  Region   Diabetics
               Male  Female
2002  Alberta  XX    XX
      Ontario  XX    XX
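One possible way to frame the puzzle, offered only as a sketch and not as the authors' solution: if each requested cell is described by the header labels that lead to it, the two layouts differ only in whether Male/Female sit on the row path or the column path. The header paths below are written by hand (extracting them from a poorly laid out table is the hard part and is not attempted), and the `requests` helper is hypothetical.

```python
# Each requested cell: (labels on its row path, labels on its column path).
table_1 = [
    (("2002", "Alberta", "Male"), ("Diabetics",)),
    (("2002", "Alberta", "Female"), ("Diabetics",)),
    (("2002", "Ontario", "Male"), ("Diabetics",)),
    (("2002", "Ontario", "Female"), ("Diabetics",)),
]
table_2 = [
    (("2002", "Alberta"), ("Diabetics", "Male")),
    (("2002", "Alberta"), ("Diabetics", "Female")),
    (("2002", "Ontario"), ("Diabetics", "Male")),
    (("2002", "Ontario"), ("Diabetics", "Female")),
]

def requests(table):
    """Forget which labels were row headers and which were column headers."""
    return {frozenset(rows) | frozenset(cols) for rows, cols in table}

print(requests(table_1) == requests(table_2))  # True
```

The comparison works on labels alone and ignores category names such as Gender, which appears as a header in the first table but not in the second.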

Next steps

• Complete the conversion of Wang/XML table descriptions to mini ontologies

• Improve the interface for generating cumulative ontology from mini ontologies

• Implement database generation from ontology

• Embed logging routines for statistical evaluation of time/error trade-offs