Learning the Semantic Meaning of a Concept from the Web
Yang Yu and Yun Peng
May 30
2
The Problem
Manually preparing training data for each concept in text classification based ontology mapping is expensive.
LIVING_THINGS
  ANIMAL
    HUMAN
      MAN
      WOMAN
    CAT
  PLANT
    TREE
      ARBOR
      FRUTEX
    GRASS
Exemplars
3
Our Approach
Automatically collecting training data.
Benefit: reduces the amount of human work
http://www.google.com/
4
Overview
Background
  The Semantic Web and ontology
  Ontology mapping
Approach
Prototype System
Experimental Results
  WEAPONS ontology
  LIVING_THINGS ontology
Limitations and Conclusions
5
Semantic Web and Ontology Mapping
The Semantic Web
  “an extension of the current web”
  ontology files and programs that use them
Ontology Mapping
  Interoperability problem
  Mapping
    r = f(Ci, Cj), where i = 1, …, n and j = 1, …, m;
    r ∈ {equivalent, subClassOf, superClassOf, complement, overlapped, other}
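To make the range of r concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each concept is given as an explicit set of instances drawn from a known universe; the slides do not define f this way, and the names `Relation` and `relation` are hypothetical.

```python
from enum import Enum


class Relation(Enum):
    """The six possible values of r listed on the slide."""
    EQUIVALENT = "equivalent"
    SUBCLASS_OF = "subClassOf"
    SUPERCLASS_OF = "superClassOf"
    COMPLEMENT = "complement"
    OVERLAPPED = "overlapped"
    OTHER = "other"


def relation(a, b, universe):
    """Toy f(Ci, Cj): compare two concepts given as sets of instances.

    Assumption: instance sets and the universe are known exactly,
    which is not the case in the text-classification approach.
    """
    a, b, universe = frozenset(a), frozenset(b), frozenset(universe)
    if a == b:
        return Relation.EQUIVALENT
    if a < b:
        return Relation.SUBCLASS_OF
    if a > b:
        return Relation.SUPERCLASS_OF
    if not (a & b):
        # Disjoint sets are complements only if together they cover everything.
        return Relation.COMPLEMENT if (a | b) == universe else Relation.OTHER
    return Relation.OVERLAPPED
```

In the approach presented in these slides the relation has to be estimated from exemplars rather than read off instance sets, so this sketch only illustrates what each value of r means.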
6
Approaches to Ontology Mapping
Manual mapping
String matching
Text classification
  The semantic meaning of a concept can be reflected in the training data (exemplars) that use the concept
  Probabilistic feature model
  Classification
  Results are highly dependent on the quality of the exemplars
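The text classification idea can be sketched with a small multinomial naïve Bayes classifier. This is an illustrative stand-in written in plain Python, not the Rainbow toolkit used by the prototype; the function names `train` and `classify`, the whitespace tokenizer, and the add-one smoothing are all assumptions.

```python
import math
from collections import Counter, defaultdict


def train(exemplars):
    """Build a per-concept word-count model from (concept, text) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for concept, text in exemplars:
        doc_counts[concept] += 1
        word_counts[concept].update(text.lower().split())
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the concept with the highest naive Bayes log score."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for concept in doc_counts:
        counts = word_counts[concept]
        denom = sum(counts.values()) + len(vocab)  # add-one smoothing
        score = math.log(doc_counts[concept] / total_docs)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((counts[word] + 1) / denom)
        if score > best_score:
            best, best_score = concept, score
    return best
```

Because the model is built entirely from the exemplar texts, noisy or off-topic exemplars shift the word counts directly, which is why the results depend so heavily on exemplar quality.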
7
Motivation and Proposal
Preparing exemplars manually is costly
Billions of documents are available on the web
Search engines
8
The Proposal
Use the concept defined in an ontology, together with its semantic information, to form a query, then process the search results to obtain exemplars
Verification
  Build a prototype system
  Check ontology mapping results
9
System overview – Part I
[Figure: components Ontology A, Parser, Queries, Search Engine, Links to Web Pages, Retriever, WWW, HTML Docs, Processor, Text Files]
1. Whole file
2. Only sentences containing search keywords
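The second option can be sketched as a simple sentence filter. The regular-expression sentence splitter and the function name `keyword_sentences` are assumptions made for illustration; the slides do not describe how the prototype splits or matches sentences.

```python
import re


def keyword_sentences(text, keywords):
    """Keep only the sentences that contain at least one search keyword."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    wanted = {k.lower() for k in keywords}
    kept = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        if words & wanted:
            kept.append(sentence)
    return kept
```

For example, filtering "The cat sat on the mat. Stocks fell today." with the keyword `cat` keeps only the first sentence, while option 1 would keep the whole text.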
10
System overview – Part II
[Figure: components Ontology A, Ontology B, Text Files (A), Text Files (B), Model Builder, Rainbow, Feature Model, Calculator, Mapping Results]
11
The model builder
[Figure: two copies of the LIVING_THINGS ontology tree]
Mutually exclusive and exhaustive Leaf classes C+ and C-
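One possible reading of C+ and C- is that, for a concept C, C+ is the set of leaf classes under C and C- is the set of all remaining leaf classes. The sketch below follows that reading, which is an assumption, and uses a hierarchy for LIVING_THINGS inferred from the rest of the slides; `TREE`, `leaves`, and `split_leaves` are hypothetical names.

```python
# Assumed child lists for the LIVING_THINGS ontology.
TREE = {
    "LIVING_THINGS": ["ANIMAL", "PLANT"],
    "ANIMAL": ["HUMAN", "CAT"],
    "HUMAN": ["MAN", "WOMAN"],
    "PLANT": ["TREE", "GRASS"],
    "TREE": ["ARBOR", "FRUTEX"],
}


def leaves(node, tree):
    """Return the leaf classes at or below node."""
    children = tree.get(node, [])
    if not children:
        return [node]
    found = []
    for child in children:
        found.extend(leaves(child, tree))
    return found


def split_leaves(concept, tree, root):
    """Partition all leaf classes into those under concept (C+) and the rest (C-)."""
    positive = set(leaves(concept, tree))
    negative = set(leaves(root, tree)) - positive
    return positive, negative
```

Under this reading, `split_leaves("HUMAN", TREE, "LIVING_THINGS")` puts MAN and WOMAN in C+ and the other four leaves in C-, so the two sets are mutually exclusive and together exhaustive.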
12
The calculator
The naïve Bayes text classifier tends to give extreme values (1/0) for individual exemplars
Conditional probabilities are therefore calculated from the raw classification data by averaging over all exemplars
13
An Example of the Calculator
Ontology for Weapons
200 exemplars of APC are passed to the classifier

Categories in WeaponsA.n3      Num. of exemplars
TANK-VEHICLE                   170
AIR-DEFENSE-GUN                20
SAUDI-NAVAL-MISSILE-CRAFT      10

P(TANK-VEHICLE | APC) = 170 / 200 = 0.85
P(AIR-DEFENSE-GUN | APC) = 20 / 200 = 0.10
P(SAUDI-NAVAL-MISSILE-CRAFT | APC) = 10 / 200 = 0.05
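The arithmetic of this example can be reproduced in a few lines of Python. The function name `conditional_probabilities` is hypothetical, and the sketch assumes the classifier returns one category label per exemplar.

```python
from collections import Counter


def conditional_probabilities(predicted_labels):
    """Fraction of exemplars assigned to each category."""
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# The 200 APC exemplars from the slide, as classified into WeaponsA.n3 categories.
labels = (
    ["TANK-VEHICLE"] * 170
    + ["AIR-DEFENSE-GUN"] * 20
    + ["SAUDI-NAVAL-MISSILE-CRAFT"] * 10
)
probs = conditional_probabilities(labels)
```

Here `probs["TANK-VEHICLE"]` comes out at 170 / 200, matching the 0.85 on the slide.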
14
Experiments with WEAPONS ontology
WeaponsA.n3 and WeaponsB.n3
  From the Information Interoperation and Integration Conference (http://www.atl.lmco.com/projects/ontology/i3con.html)
  Both have over 80 classes defined
  More than 60 classes are leaf classes
15
Part of WeaponsA.n3
[Figure: class hierarchy containing WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, MODERN-NAVAL-SHIP, PATROL-CRAFT, AIRCRAFT-CARRIER, WARPLANE, SUPER-ETENDARD]
16
Part of WeaponsB.n3
[Figure: class hierarchy containing WEAPON, CONVENTIONAL-WEAPON, ARMORED-COMBAT-VEHICLE, TANK-VEHICLE, LIGHT-TANK, APC, MODERN-NAVAL-SHIP, PATROL-WATERCRAFT, PATROL-BOAT-RIVER, PATROL-BOAT, AIRCRAFT-CARRIER, LIGHT-AIRCRAFT-CARRIER, WARPLANE, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER]
17
Expected Results
[Figure: expected mappings between classes of the two ontologies]
WeaponsA.n3: TANK-VEHICLE, PATROL-CRAFT, AIRCRAFT-CARRIER, SUPER-ETENDARD
WeaponsB.n3: LIGHT-TANK, APC, PATROL-WATERCRAFT, PATROL-BOAT-RIVER, PATROL-BOAT, LIGHT-AIRCRAFT-CARRIER, FIGHTER-PLANE, FIGHTER-ATTACK-PLANE, SUPER-ETENDARD-FIGHTER
18
A Typical Report
Report for APC: P(APC | Ci), where i = 1 … 63

SELF-PROPELLED-ARTILLERY   0.357180681
TANK-VEHICLE               0.277139274
ICBM                       0.10423636
MRBM                       0.080615147
TOWED-ARTILLERY            0.054724102
SUPPORT-VESSEL             0.023265054
PATROL-CRAFT               0.019570325
MOLOTOV-COCKTAIL           0.015032411
TORPEDO-CRAFT              0.013677696
SUPER-ETENDARD             0.009856519
MORTAR                     0.00772997
AIR-DEFENSE-GUN            0.002997109
MACHINE-GUN                0.000211772
MOLOTOV-COCKTAIL           0.000187578
TRUCK-BOMB                 0.000171675
AS-9-KYLE-ALCM             0.000156403
ARABIL-100-MISSILE         0.000111953
AL-HIJARAH-MISSILE         7.65E-05
OGHAB-MISSILE              7.12E-05
BADAR-2000                 4.28E-05
……
19
Classes with highest conditional probability

New Classes              Whole file             Prob   Sentences with Keywords    Prob
FIGHTER-PLANE            AIRCRAFT-CARRIER       0.49   MRBM                       0.38
LIGHT-TANK               SILKWORM-MISSILE-MOD   0.56   TANK-VEHICLE               0.3
PATROL-BOAT              SILKWORM-MISSILE-MOD   0.51   PATROL-CRAFT               0.66
PATROL-BOAT-RIVER        SILKWORM-MISSILE-MOD   0.65   PATROL-CRAFT               0.54
PATROL-WATERCRAFT        SILKWORM-MISSILE-MOD   0.28   PATROL-CRAFT               0.52
FIGHTER-ATTACK-PLANE     SILKWORM-MISSILE-MOD   0.83   MRBM                       0.38
SUPER-ETENDARD-FIGHTER   SILKWORM-MISSILE-MOD   0.66   MRBM                       0.51
APC                      SILKWORM-MISSILE-MOD   0.46   SELF-PROPELLED-ARTILLERY   0.36
LIGHT-AIRCRAFT-CARRIER   AIRCRAFT-CARRIER       0.65   AIRCRAFT-CARRIER           0.57
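Selecting the class with the highest conditional probability from a report is a simple maximum over the report entries. The sketch below uses the top four rows of the APC report; the dictionary representation and the name `best_match` are assumptions.

```python
def best_match(report):
    """Return the (class, probability) pair with the highest probability."""
    return max(report.items(), key=lambda item: item[1])


# Top rows of the APC report, keyword-sentence exemplars.
apc_report = {
    "SELF-PROPELLED-ARTILLERY": 0.357180681,
    "TANK-VEHICLE": 0.277139274,
    "ICBM": 0.10423636,
    "MRBM": 0.080615147,
}
top_class, top_prob = best_match(apc_report)
```

For this report `top_class` is SELF-PROPELLED-ARTILLERY, which rounds to the 0.36 shown for APC in the keyword-sentences column.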
20
Experiment with LIVING_THINGS ontology
(1) P(MAN | HUMAN) and P(WOMAN | HUMAN)
(2) Find a mapping for GIRL
[Figure: LIVING_THINGS ontology tree with Level1, Level2, and Level3 marked, the new concept GIRL, and the HUMAN subtree with MAN and WOMAN]
21
Experiment Results (1)
[Figure: HUMAN with subclasses MAN and WOMAN]
Results of experiment (1)
P (MAN | HUMAN) = 0.62
P (WOMAN | HUMAN) = 0.38
22
Experiment Results (2)
[Figure: LIVING_THINGS ontology tree with Level1, Level2, and Level3 marked and the new concept GIRL]

P(ANIMAL | GIRL) = 0.76
P(PLANT | GIRL) = 0.23
P(HUMAN | GIRL) = 0.70
P(CAT | GIRL) = 0.30
P(MAN | GIRL) = 0
P(WOMAN | GIRL) = 1

With additional classes:
P(DOG | GIRL) = 0.56
P(CAT | GIRL) = 0.01
P(HUMAN | GIRL) = 0.43
P(PYCNOGONID | GIRL) = 0

P(ANIMAL | GIRL) = 0.83
P(PLANT | GIRL) = 0.17
P(HUMAN | GIRL) = 0.92
P(CAT | GIRL) = 0.08
P(WOMAN | GIRL) = 0.63
P(MAN | GIRL) = 0.37

Table labels: With clustering on exemplars, Without clustering on exemplars
clusty.com
23
Additional Experiments: Different Queries
Queries augmented with class properties

Concepts        Queries
living things   Living+things
animal          Living+things+animal+Animalia
plant           Living+things+plant+Plantae
cat             Living+things+animal+Animalia+cat+Felidae
human           Living+things+animal+Animalia+human+intelligent
man             Living+things+animal+Animalia+human+intelligent+man+male
woman           Living+things+animal+Animalia+human+intelligent+woman+female
tree            Living+things+plant+Plantae+tree
grass           Living+things+plant+Plantae+grass
frutex          Living+things+plant+Plantae+tree+Frutex
arbor           Living+things+plant+Plantae+tree+arbor
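The queries in this table follow a visible pattern: the concepts on the path from the root, each followed by its properties, joined with `+`. A minimal sketch of that pattern follows; the function name `build_query` and the dictionary layout are assumptions, not the prototype's actual code.

```python
def build_query(path, properties):
    """Join the concepts on a root-to-concept path, each followed by its properties."""
    terms = []
    for concept in path:
        terms.extend(concept.split())  # "Living things" becomes two terms
        terms.extend(properties.get(concept, []))
    return "+".join(terms)


# Class properties as they appear in the queries on the slide.
PROPERTIES = {
    "animal": ["Animalia"],
    "plant": ["Plantae"],
    "human": ["intelligent"],
    "man": ["male"],
    "woman": ["female"],
    "cat": ["Felidae"],
}

man_query = build_query(["Living things", "animal", "human", "man"], PROPERTIES)
```

With these inputs `man_query` reproduces the query listed for the concept man.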
24
Experiment Results (3)
[Figure: LIVING_THINGS ontology tree with Level1, Level2, and Level3 marked, the new concept GIRL, and the HUMAN subtree with MAN and WOMAN]

Results of experiment (1) with new queries

Conditional Probability   Whole   Keyword Sentences
P(MAN | HUMAN)            0.91    0.93
P(WOMAN | HUMAN)          0.09    0.07

Results of experiment (2) with new queries

Conditional Probability   Whole   Keyword Sentences
P(ANIMAL | GIRL)          0.9     0.83
P(PLANT | GIRL)           0.1     0.17
P(HUMAN | GIRL)           0.78    0.83
P(CAT | GIRL)             0.22    0.17
P(MAN | GIRL)             0.14    0.16
P(WOMAN | GIRL)           0.86    0.84
25
Limitation 1: Relevancy != similarity
[Figure labels: Search results for concept A; Text related to concept A; Text for concept A (i.e. desired exemplars); Text against concept A; Text for related concept B]
26
Limitation 2: “Conditional Probability”
An exemplar is a combination of strings that represents some usage of a concept
An exemplar is not an instance of a concept
The conditional probability we calculate is therefore only an estimate
[Figure: HUMAN with subclasses MAN and WOMAN]
27
Limitation 3: Popularity != relevancy
Limited by a search engine’s algorithm
  PageRank™
  Popularity does not equal relevancy
Weights cannot be specified for words in a search query
28
Related Research
UMBC OntoMapper
  Prasad, S., Peng, Y., and Finin, T., A Tool for Mapping between Two Ontologies Using Explicit Information, AAMAS 2002 Workshop on Ontologies and Agent Systems, 2002.
CAIMAN
  Lacher, M. S. and Groh, G., Facilitating the Exchange of Explicit Knowledge through Ontology Mappings, Proc. of the Fourteenth International FLAIRS Conference, 2001.
GLUE
  Doan, A., Madhavan, J., Dhamankar, R., Domingos, P., and Halevy, A., Learning to Match Ontologies on the Semantic Web, WWW2002, May 2002.
Google Conditional Probability
  P(HUMAN | MAN) = 1.77 billion / 2.29 billion = 0.77
  P(HUMAN | WOMAN) = 0.6 billion / 2.29 billion = 0.26
  Wyatt, D., Philipose, M., and Choudhury, T., Unsupervised Activity Recognition Using Automatically Mined Common Sense, Proceedings of AAAI-05, pp. 21-27.
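The Google conditional probability figures are ratios of search hit counts. The sketch below reproduces the arithmetic only; the function name `hit_ratio` is hypothetical, and how the hit counts were obtained is not stated in the slides.

```python
def hit_ratio(numerator_hits, denominator_hits):
    """Estimate a conditional probability as a ratio of search hit counts."""
    if denominator_hits <= 0:
        raise ValueError("denominator hit count must be positive")
    return numerator_hits / denominator_hits


# Hit counts quoted on the slide.
p_first = hit_ratio(1.77e9, 2.29e9)
p_second = hit_ratio(0.6e9, 2.29e9)
```

Rounded to two decimals, these give the 0.77 and 0.26 quoted above.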
29
Conclusion and Future Work
Text retrieved from the web can be used as exemplars for text-classification-based ontology mapping
Many parameters affect the quality of the exemplars
The processed documents contain noise
Future work
  Clustering
  Restrict search to highly relevant sites and web resources