© Paul Buitelaar: Lexical Semantics and Ontologies
Tutorial at ACL/HCSnet, July 2006, Melbourne, Australia
Lexical Semantics and Ontologies
Tutorial at the ACL/HCSnet 2006 Advanced Program in Natural Language Processing
Paul Buitelaar
Language Technology Lab &
Competence Center Semantic Web
DFKI GmbH
Saarbrücken, Germany
Overview
Day 1: Words and Meanings
Human language as a system
How do words relate to each other
Day 2: Words and Object Descriptions
Human language as a means of representation
How do words represent objects in the/a world
Day 1 - Introduction
Words and Meanings
Synsets and Senses
Lexical Semantics in WordNet
Related Senses
Generative Lexicon and CoreLex
Domains and Senses
Tuning WordNet to a Domain
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
WordNet
Lexical Semantic Resource
  Semantic Lexicon: maps words to meanings (senses)
  Lexical Database: machine readable (has a formal structure)
  Freely available: http://wordnet.princeton.edu/
WordNet - Origins
In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database … The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically … WordNet … instantiates hypotheses based on results of psycholinguistic research … expose such hypotheses to the full range of the common vocabulary
In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter "apple," even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. "Introduction to WordNet: an on-line lexical database." In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.
Synsets
WordNet is organized around word meaning (not word forms, as in traditional lexicons)
Word meaning is represented by "synsets"
A synset is a "set of synonyms"
Example
  {board, plank}: piece of lumber
  {board, committee}: group of people
Synset Hierarchy
Synsets are organized in hierarchies, which define:
  generalization (hypernymy)
  specialization (hyponymy)
Example (most general synset first; each synset is a hyponym of the one above it)
  {entity}
  …
  {whole, unit}
  {building material}
  {lumber, timber}
  {board, plank}
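The hierarchy on this slide can be sketched as a mapping from each synset to its direct hypernym. This is a toy illustration only: the dictionary `HYPERNYM` and the function `hypernym_chain` are hypothetical names, not part of WordNet or any WordNet API, and the levels the slide elides with "…" are simply skipped.

```python
# Each synset (a tuple of synonyms) maps to its direct hypernym.
# Toy data copied from the slide; the levels elided by "…" are skipped.
HYPERNYM = {
    ("board", "plank"): ("lumber", "timber"),
    ("lumber", "timber"): ("building material",),
    ("building material",): ("whole", "unit"),
    ("whole", "unit"): ("entity",),  # intermediate levels omitted
}


def hypernym_chain(synset):
    """Return the path from a synset up to the most general synset."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


for level in hypernym_chain(("board", "plank")):
    print("{" + ", ".join(level) + "}")
```

Reading the chain bottom-up follows hypernymy (generalization); reading it top-down follows hyponymy (specialization).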
Hierarchies (WordNet 1.7)
Hierarchy Example (WordNet 2.1)
Synsets and Senses
Synsets represent word meaning
Words that occur in several synsets have a corresponding number of meanings (senses)
Example
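A minimal sketch of how senses follow from synset membership, using only the two "board" synsets shown earlier in the tutorial. The names `SYNSETS` and `senses_of` are hypothetical and chosen for illustration.

```python
# Two synsets from the tutorial's "board" example.
SYNSETS = [
    frozenset({"board", "plank"}),      # piece of lumber
    frozenset({"board", "committee"}),  # group of people
]


def senses_of(word, synsets=SYNSETS):
    """Return the synsets containing `word`; each one is a sense of the word."""
    return [s for s in synsets if word in s]


print(len(senses_of("board")))  # number of senses of "board" in this toy lexicon
print(len(senses_of("plank")))  # number of senses of "plank" in this toy lexicon
```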
WordNet 2.1
(Other) WordNet Relations
Synonymy: similar in meaning
Hypernymy/Hyponymy: generalization and specialization
Meronymy: part-of
  e.g. study, bathroom, ... are meronyms of house
Antonymy: opposite in meaning
  e.g. warm is an antonym of cold
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
Systematic Polysemy
Homonymy: bank
  embankment: We walked along the bank of the Charles river.
  institution: Did he have an account at the HBU bank?
Systematic Polysemy: school
  group (of people): The school went for an outing.
  (learning) process: School starts at 8.30.
  organization: The school was founded in 1910.
  building: The school has a new roof.
Semantic or Pragmatic?
[Figure: lexical items of the language ("school") linked to objects in the world (Obj1 to Obj4), contrasting semantic analysis with pragmatic analysis]
Underspecified Discourse Referents
Anaphora Resolution
  [A long book heavily weighted with military technicalities]NP:event-physical_object-content, in this edition it is neither so long (event) nor so technical (content) as it was originally.
Metonymy
  The Boston office called.
  office > person
  person part-of office
Bridging
  Peter bought a car. The engine runs well.
  engine part-of car
  The Boston office called. They asked for a new price.
  office > person
Generative Lexicon Theory
Type Coercion
  I began the book
  book > event
  event 'has-relation-with' book
  read is-a event
Multifaceted representation of lexical semantics, reflecting systematic / regular / logical polysemy
Generative Lexicon Theory
Qualia Structure (Pustejovsky 1995)
  Formal: inheritance (is-a / hyponymy)
    book formal: artifact, communication, …
  Constitutive: modification (part-of / meronymy)
    book constitutive: section, …
  Telic: purpose ("what is the object used for")
    book telic: read, …
  Agentive: causality ("how did the object come about")
    book agentive: write, …
CoreLex (Buitelaar 1998)
Automatic Qualia Structure Acquisition
  CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
  These representations can be viewed as shallow Qualia Structures
Sense Distribution in WordNet
  Systematic polysemy can be empirically studied in WordNet by observing sense distributions
  >> If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)
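The "more than two words share the same sense distribution" heuristic can be sketched as a grouping step. This is an illustration under assumptions, not the CoreLex implementation: the input format (noun mapped to a set of basic types), the name `systematic_classes`, and the tiny data set are hypothetical, with the example nouns taken from the class listings later in the tutorial.

```python
from collections import defaultdict

# Hypothetical input: noun -> set of basic types that cover its senses.
BASIC_TYPES = {
    "book": {"artifact", "communication"},
    "journal": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "alligator": {"animal", "natural_object"},
    "ermine": {"animal", "natural_object"},
}


def systematic_classes(basic_types, min_words=3):
    """Group polysemous nouns by their set of basic types.

    A group is kept only if at least `min_words` nouns share the
    same set ("more than two words" corresponds to min_words=3).
    """
    groups = defaultdict(list)
    for noun, types in basic_types.items():
        if len(types) > 1:  # only nouns with several basic types
            groups[" ".join(sorted(types))].append(noun)
    return {
        name: sorted(nouns)
        for name, nouns in groups.items()
        if len(nouns) >= min_words
    }


print(systematic_classes(BASIC_TYPES))
```

With this toy input only "artifact communication" passes the threshold; "animal natural_object" has just two members here and is dropped.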
Systematic Polysemous Classes
book
  1. {publication} => artifact
  2. {product, production} => artifact
  3. {fact} => communication
  4. {dramatic_composition, dramatic_work} => communication
  5. {record} => communication
  6. {section, subdivision} => communication
  7. {journal} => artifact
Systematic Polysemous Class "artifact communication"
  amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick
From WordNet to CoreLex
[Figure: nouns (Noun 1 … Noun n) are mapped to basic types and grouped into systematic polysemous classes (Class 1 … Class n)]
Other Examples
"animal natural_object": alligator broadtail chamois ermine lapin leopard muskrat ...
"natural_object plant": algarroba almond anise baneberry butternut candlenut cardamon ...
"action artifact group_social": artillery assembly band church concourse dance gathering institution ...
"action attribute event psychological": appearance concentration decision deviation difference impulse outrage …
"possession quantity_definite": cent centime dividend gross penny real shilling
CoreLex vs. WordNet
Representation and Interpretation
"Dotted Types" (Pustejovsky)
  Lexical types are either simple (human, artifact, ...) or complex (information AND physical_object)
  Complex types can be represented with a "dotted type", e.g. information • physical_object
  In (Cooper 2005) interpreted as a record type (a delicious lunch can take forever):
Related Work
Apresjan 1973: Regular Polysemy.
Nunberg & Zaenen 1992: Systematic polysemy in lexicology and lexicography.
Bill Dolan 1994: Word Sense Ambiguation: Clustering Related Senses.
Copestake & Briscoe 1996: Semi-productive polysemy and sense extension.
Peters, Peters & Vossen 1998: Automatic Sense Clustering in EuroWordNet.
Tomuro 1998: Semi-Automatic Induction of Systematic Polysemy from WordNet.
Words and Meanings
Lexical Semantics in WordNet
Generative Lexicon and CoreLex
Tuning WordNet to a Domain
Reducing Ambiguity
WordNet has too many senses …
Reduce Ambiguity
Cluster related senses (CoreLex)
Tune WordNet to an application domain
Domains and Senses
Domains determine Sense Selection, e.g.
English: cell
prison cell in the Politics/Law domain
living cell in the Biomedical domain
English: tissue
living tissue in the Biomedical domain
cloth in the Fashion domain
German: Probe
test in the Biomedical domain
rehearsal in the Theater domain
>> Compute Domain-Specific Sense
Approaches
Subject Codes
  Domain codes are in the dictionary
Topic Signatures
  Compute (domain-specific) context models from dictionary definitions, domain corpora, web resources
Tuning of WordNet to a domain
  Top Down: Cucchiarelli & Velardi, 1998
  Bottom Up: Buitelaar & Sacaleanu, 2001
  Related recent work: McCarthy et al., 2004; Chan & Ng, 2005; Mohammad & Hirst, 2006
Subject Codes
Subject Codes (as used in LDOCE) indicate a domain in which a word is used in a particular sense
Examples (2600 codes)
  Sub-Field Codes
    MDZP (Medicine:Physiology)
  Code Combinations
    MLCO (Meteorology+Building), e.g. lightning conductor
    MLUF (Meteorology+Europe+France), e.g. Mistral
  Codes for the word "high"
    SN (sounds)
    DG (drugs)
    ML (meteorology)
Adding Subject Codes to WordNet
Grouping Synsets together across POS
  MEDICINE
    Nouns: doctor#1, hospital#1
    Verbs: operate#7
Grouping Synsets together across Sub-Hierarchies
  SPORT
    life_form#1: athlete#1
    physical_object#1: game_equipment#1
    act#2: sport#1
    location#1: playing_field#1
Magnini B. & Cavaglià G. Integrating Subject Field Codes into WordNet. In: Proceedings of LREC 2000.
WordNet DOMAINS
Sense WordNet synset and gloss Domains
1 Depository, financial institution, bank, banking concern, banking company (a financial institution) Economy
2 Bank (sloping land) Geography, Geology
3 Bank (a supply or stock held in reserve) Economy
4 Bank, bank building (a building) Architecture, Economy
5 Bank (an arrangement of similar objects) Factotum
6 Savings bank, coin bank, money box, bank (a container) Economy
7 Bank (a long ridge or pile) Geography, Geology
8 Bank (the funds held by a gambling house ) Economy, Play
9 Bank, cant, camber (a slope in the turn of a road) Architecture
10 Bank (a flight maneuver.) Transport
Bernardo Magnini, Carlo Strapparava, Giovanni Pezzuli, and Alfio Gliozzo. Using domain information for word sense disambiguation. In: Proceedings of the SENSEVAL2 workshop 2001.
WSD with Subject Codes
Match the set of words in the context of the ambiguous word against the sets of words ("neighborhoods") taken from the definitions and sample sentences of all senses that share a Subject Code
Guthrie J. A., Guthrie L., Wilks Y. & Aidinejad H. Subject Dependent Co-Occurrence and Word Sense Disambiguation. In: Proceedings of ACL 1991.
Neighborhood for bank: Economics
  write safe sum account person put take money order keep pay supply paper draw cheque
Neighborhood for bank: Medicine and Biology
  medicine product hold origin place human treatment blood hospital use store organ comb
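The matching step can be sketched as a set overlap between the context words and each neighborhood. This is a simplified illustration, not the procedure of Guthrie et al.: the plain overlap count, the name `best_subject_code`, and the example sentences are assumptions; the two neighborhoods are copied from the slide.

```python
# Neighborhoods for "bank", copied from the slide.
NEIGHBORHOODS = {
    "Economics": {
        "write", "safe", "sum", "account", "person", "put", "take", "money",
        "order", "keep", "pay", "supply", "paper", "draw", "cheque",
    },
    "Medicine and Biology": {
        "medicine", "product", "hold", "origin", "place", "human",
        "treatment", "blood", "hospital", "use", "store", "organ", "comb",
    },
}


def best_subject_code(context_words, neighborhoods=NEIGHBORHOODS):
    """Pick the subject code whose neighborhood overlaps most with the context.

    Ties are resolved in favour of the first code in the dictionary.
    """
    context = set(context_words)
    return max(neighborhoods, key=lambda code: len(context & neighborhoods[code]))


print(best_subject_code("he put money in his account at the bank".split()))
print(best_subject_code("the blood bank at the hospital".split()))
```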
Topic Signatures from the Web
Construct Topic Signatures for WordNet synsets/senses
Retrieve document collections from the web, using queries constructed for each WordNet sense, e.g.
  (boy AND (altar boy OR ball boy OR … OR male person) AND NOT (man OR … OR broth of a boy OR son OR … OR mama's boy OR black))
Agirre E., Ansa O., Hovy E. & Martinez D. Enriching very large ontologies using the WWW. In: Proc. of the Ontology Learning Workshop, ECAI 2000.
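The shape of such a sense-specific query can be sketched as a small string builder. The function `sense_query` and its parameters are hypothetical; the sketch assumes the positive cue words come from the target sense and the negative cue words from the other senses of the same word, as the "boy" example suggests.

```python
def sense_query(word, own_cues, other_cues):
    """Build a boolean query for one sense of `word`.

    own_cues:   cue words/phrases for the target sense
    other_cues: cue words/phrases for the remaining senses
    """
    positive = " OR ".join(own_cues)
    negative = " OR ".join(other_cues)
    return f"({word} AND ({positive}) AND NOT ({negative}))"


print(sense_query("boy", ["altar boy", "ball boy", "male person"], ["man", "son"]))
```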
Top Down Tuning – Cucchiarelli & Velardi
Automatically find the best set of (WordNet) senses that:
  "… represent at best the semantics of the domain"
  "[has the] … 'right' level of abstraction, so as to mediate between over-ambiguity and generality"
  "… [is] balanced …, i.e. words should be evenly distributed among categories"
Alessandro Cucchiarelli, Paola Velardi. Finding a domain-appropriate sense inventory for semantically tagging a corpus. Natural Language Engineering 4/4, pp. 325-344, Dec. 1998.
Methods Used
Create alternative sets of balanced categories by use of an adapted version of the Hearst/Schütze algorithm
Apply a scoring function to find the best set, with parameters:
  Generality: the highest possible level of generalization with a small number of categories is preferred
  Discrimination Power: different senses lead to different categories
  (Domain) Coverage: words in the domain corpus that are represented by the selected categories
  Average Ambiguity: ambiguity reduction is measured by the inverse of the average ambiguity of all words
Balanced Categories - Hearst/Schütze
Reduce WordNet noun hierarchy to a set of 726 disjoint categories, each consisting of a relatively large number of synsets and of an average size, with as small a variance as possible
Group categories together into a set of 106 super-categories according to mutual co-occurrence in a training corpus
Measure the frequency of categories on domain corpora
Hearst M. & Schütze H. Customizing a Lexicon to Better Suit a Computational Task. In: Proceedings of the ACL SIGLEX Workshop, 1993.
Top-ranked categories for two texts:
  United States Constitution
    12.200 legal_system, ...
    11.782 government, ...
    7.859 politics, ...
  Genesis
    26.459 religion, ...
    25.062 breads, ...
    24.356 mythology, ...
Generality
Generality of Category Set Ci: 1/DM(Ci)
DM(Ci) is the average distance between the categories of Ci and the topmost synsets:
$$DM(C_i) = \frac{1}{n} \sum_{j=1}^{n} dm(c_{ij})$$
Example: Ci = {Ci1, Ci2}
  dm(Ci1) = (4 + 3) / 2 = 3.5  (Ci1 lies 4 and 3 steps below two topmost synsets)
  dm(Ci2) = 3 / 1 = 3  (Ci2 lies 3 steps below one topmost synset)
  DM(Ci) = (3.5 + 3) / 2 = 3.25
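The worked example on this slide can be reproduced with a few lines of code. This is a sketch under the assumption that each category is described by the list of its distances to the topmost synsets it reaches; the function names `dm` and `generality` are hypothetical.

```python
def dm(category_set):
    """Average distance between the categories and the topmost synsets.

    `category_set` holds, for each category, the list of its distances
    to the topmost synsets above it.
    """
    per_category = [sum(dists) / len(dists) for dists in category_set]
    return sum(per_category) / len(per_category)


def generality(category_set):
    """Generality is the inverse of the average distance DM."""
    return 1 / dm(category_set)


# Slide example: Ci1 has distances 4 and 3, Ci2 has distance 3.
example = [[4, 3], [3]]
print(dm(example))
print(generality(example))
```

A category set closer to the top of the hierarchy has a smaller DM and therefore a higher generality score.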
Discrimination Power
Discrimination Power of Category Set Ci:
  (Nc(Ci) - Npc(Ci)) / Nc(Ci)
where Nc(Ci) is the number of words that reach at least one category of Ci, and Npc(Ci) is the number of words that have at least two senses that reach the same category cij of Ci
[Figure: domain words w1, w2, w3, whose senses reach the general synsets (categories) of Ci = {Ci1, Ci2, Ci3, Ci4}]
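A minimal sketch of the discrimination power score, (Nc - Npc) / Nc, where Nc counts words reaching at least one category and Npc counts words with two or more senses reaching the same category. The input format (word mapped to one category label per sense, `None` when a sense reaches no category), the function name, and the toy data are assumptions made for illustration.

```python
def discrimination_power(sense_categories):
    """Compute (Nc - Npc) / Nc for a category set.

    sense_categories: word -> list with, for each sense of the word,
    the category it reaches, or None if it reaches no category.
    """
    reached = {
        word: [c for c in cats if c is not None]
        for word, cats in sense_categories.items()
    }
    # Nc: words that reach at least one category.
    n_c = sum(1 for cats in reached.values() if cats)
    # Npc: words with at least two senses reaching the same category.
    n_pc = sum(1 for cats in reached.values() if len(cats) != len(set(cats)))
    return (n_c - n_pc) / n_c if n_c else 0.0


toy = {
    "w1": ["Ci1", "Ci2"],  # two senses, two different categories
    "w2": ["Ci2", "Ci2"],  # two senses collapse into one category
    "w3": ["Ci3", None],   # one sense reaches no category
    "w4": [None],          # word not covered at all
}
print(discrimination_power(toy))
```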
Coverage & Average Ambiguity
Coverage of Category Set Ci: Nc(Ci)/W
where Nc(Ci) is the number of words that reach at least one category in Ci
Inverse of Average Ambiguity of Category Set Ci: 1/A(Ci)
$$A(C_i) = \frac{1}{N_c(C_i)} \sum_{j=1}^{N_c(C_i)} Cw_j(C_i)$$
where Nc(Ci) is the number of words that reach at least one category in Ci and, for each word wj in this set, Cwj(Ci) is the number of categories in Ci reached
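Both scores can be sketched from a mapping of words to the categories they reach. The slide does not define W; the sketch assumes it is the total number of words considered in the domain corpus. The function names and toy data are hypothetical.

```python
def coverage(categories_by_word, total_words):
    """Nc / W: share of words that reach at least one category.

    `total_words` stands for W, assumed here to be the number of
    words considered in the domain corpus.
    """
    n_c = sum(1 for cats in categories_by_word.values() if cats)
    return n_c / total_words


def average_ambiguity(categories_by_word):
    """A: mean number of categories reached, over the covered words."""
    counts = [len(cats) for cats in categories_by_word.values() if cats]
    return sum(counts) / len(counts)


toy = {
    "stock": {"C1", "C2", "C3", "C11"},  # reaches four categories
    "bank": {"C11"},                     # reaches one category
    "broth": set(),                      # reaches none
}
print(coverage(toy, total_words=4))
print(1 / average_ambiguity(toy))  # inverse of average ambiguity
```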
Best Category Set (WSJ)
Category Higher-level synset
C1 person, individual, someone, mortal, human, soul
C2 instrumentality, instrumentation
C3 written communication, written language
C4 message, content, subject matter, substance
C5 measure, quantity, amount, quantum
C6 action
C7 activity
C8 group action
C9 organization
C10 psychological feature
C11 possession
C12 state
C13 location
Top Down categories for the financial domain, based on the Wall Street Journal
Sense Selection with WSJ Set
Senses for stock - kept by domain tuning on the Wall Street Journal
  Sense   Synset hierarchy for sense   Top synset for sense
  1       capital > asset              possession (C11)
  2       support > device             instrumentality (C2)
  4       document > writing           written communication (C3)
  5       accumulation > asset         possession (C11)
  6       ancestor > relative          person (C1)
Senses for stock - discarded by domain tuning on the Wall Street Journal
  Sense   Synset hierarchy for sense
  3       stock, inventory > merchandise, wares > …
  7       broth, stock > soup > …
  8       stock, caudex > stalk, stem > …
  9       stock > plant part > …
  10      stock, gillyflower > flower > …
  11      Malcolm stock, stock > flower > …
  12      lineage, line of descent > … > genealogy > …
  14      lumber, timber > …
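Sense selection against a tuned category set can be sketched as a walk up each sense's hypernym chain. This is an illustration, not the authors' code: the chains below are the truncated ones printed on the slide (the parts elided with "…" are left out), only four of the thirteen categories are included, and the name `select_senses` is hypothetical.

```python
# A few categories of the WSJ-tuned set, keyed by their top synset.
CATEGORIES = {
    "person": "C1",
    "instrumentality": "C2",
    "written communication": "C3",
    "possession": "C11",
}

# Hypernym chains for some senses of "stock", truncated as on the slide.
STOCK_SENSES = {
    1: ["capital", "asset", "possession"],
    2: ["support", "device", "instrumentality"],
    3: ["stock, inventory", "merchandise, wares"],
    7: ["broth, stock", "soup"],
}


def select_senses(sense_chains, categories):
    """Keep the senses whose hypernym chain reaches a selected category."""
    kept = {}
    for sense, chain in sense_chains.items():
        for synset in chain:
            if synset in categories:
                kept[sense] = categories[synset]
                break
    return kept


print(select_senses(STOCK_SENSES, CATEGORIES))
```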
Bottom Up Tuning – Buitelaar & Sacaleanu
Ranking of WordNet synsets according to a domain-specific corpus
Compute term relevance against reference corpus
Compute synset relevance according to term relevance (where term = synonym in synset)
Ranking can be used in WSD (similar to usage of ‘most frequent heuristic’)
Paul Buitelaar, Bogdan Sacaleanu Ranking and Selecting Synsets by Domain Relevance In: Proceedings of WordNet and Other Lexical Resources: Applications, Extensions and Customizations, NAACL 2001 Workshop, June 3/4 2001
TFIDF
$$\mathit{tfidf}(w) = \mathit{tf}(w) \cdot \log\left(\frac{N}{\mathit{df}(w)}\right)$$
tf(w): term frequency (number of word occurrences in a document)
df(w): document frequency (number of documents containing the word)
N: number of all documents
tfidf(w): relative importance of the word in the document
The word is more important if it appears several times in a target document
The word is more important if it appears in fewer documents
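A minimal sketch of the TF-IDF score, tf(w) multiplied by log(N / df(w)). The slide does not fix the logarithm base; the natural logarithm is assumed here, and the function name `tfidf` is chosen for illustration.

```python
import math


def tfidf(tf, df, n_docs):
    """tf * log(N / df), using the natural logarithm (an assumption).

    tf:     occurrences of the word in the target document
    df:     number of documents containing the word
    n_docs: total number of documents (N)
    """
    return tf * math.log(n_docs / df)


print(tfidf(tf=3, df=5, n_docs=100))    # frequent in the document, rare overall
print(tfidf(tf=3, df=100, n_docs=100))  # occurs in every document: score is 0
```

The two properties stated on the slide follow directly: the score grows with tf and shrinks as df approaches N.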
Term and Synset Relevance
Term Relevance: relevance score of synset members
$$rlv(t|d) = \log(tf_{t,d}) \cdot \log\left(\frac{N}{df_t}\right)$$
where t represents the term, d the domain, and N is the total number of domains
Synset Relevance: cumulated relevance score for a synset
$$rlv(c|d) = \sum_{t \in c} rlv(t|d)$$
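Term and synset relevance can be sketched as follows, assuming rlv(t|d) = log(tf) * log(N / df) and a synset score that sums the relevance of its terms. The frequencies are invented for illustration, the natural logarithm is an assumption, and treating terms absent from the domain corpus as contributing zero is a choice made for this sketch.

```python
import math


def term_relevance(tf_td, df_t, n_domains):
    """rlv(t|d) = log(tf_{t,d}) * log(N / df_t).

    Terms that do not occur in the domain corpus contribute 0
    (an assumption of this sketch).
    """
    if tf_td == 0:
        return 0.0
    return math.log(tf_td) * math.log(n_domains / df_t)


def synset_relevance(synset, tf_in_domain, df, n_domains):
    """rlv(c|d): sum of the term relevance over the synset members."""
    return sum(
        term_relevance(tf_in_domain.get(t, 0), df.get(t, 1), n_domains)
        for t in synset
    )


# Invented counts for a medical corpus and three domains in total.
TF_MEDICAL = {"Zelle": 100}
DF = {"Zelle": 2, "Gefängniszelle": 1}
N_DOMAINS = 3

prison = synset_relevance(["Gefängniszelle", "Zelle"], TF_MEDICAL, DF, N_DOMAINS)
living = synset_relevance(["Zelle"], TF_MEDICAL, DF, N_DOMAINS)
print(prison, living)
```

With the plain sum, the "prison cell" and "living cell" synsets receive the same score in the medical domain, because only the shared term Zelle occurs there.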
Extended Synset Relevance
Lexical Coverage: take the length of the synset into account
  [Gefängniszelle, Zelle] ("prison cell")
  [Zelle] ("living cell")
$$rlv(c|d) = \frac{T}{|c|} \sum_{t \in c} rlv(t|d)$$
Hyponyms: take hyponyms into account
  [Zelle, Gefängniszelle, Todeszelle]
  [Zelle, Körperzelle, Pflanzenzelle]
$$rlv(c|d) = \frac{T}{|c|} \sum_{t \in c} rlv(t|d)$$
  (here c stands for the synset extended with the terms of its hyponyms)
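The lexical coverage variant can be sketched by scaling the summed term relevance by T / |c|. The slide does not define T; this sketch assumes T is the number of synset terms that occur in the domain corpus and |c| the number of terms in the synset. The function name and the relevance values are hypothetical.

```python
def extended_synset_relevance(synset, term_rlv, corpus_terms):
    """(T / |c|) * sum of rlv(t|d) over the synset terms.

    T is assumed to be the number of synset terms found in the
    domain corpus; |c| is the number of terms in the synset.
    """
    covered = [t for t in synset if t in corpus_terms]
    total = sum(term_rlv.get(t, 0.0) for t in synset)
    return len(covered) / len(synset) * total


# Invented relevance value for a medical corpus containing only "Zelle".
TERM_RLV = {"Zelle": 2.0}
CORPUS_TERMS = {"Zelle"}

prison_cell = extended_synset_relevance(["Gefängniszelle", "Zelle"], TERM_RLV, CORPUS_TERMS)
living_cell = extended_synset_relevance(["Zelle"], TERM_RLV, CORPUS_TERMS)
print(prison_cell, living_cell)
```

Under these assumptions the "living cell" synset now outranks "prison cell" in the medical domain, since only half of the latter's terms are covered. Passing a synset extended with its hyponym terms gives the hyponym variant.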
Experiment – Medical Domain
Related Recent Work
Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant senses in untagged text. In Proc. of ACL 2004.
Chan, Yee Seng and Ng, Hwee Tou. Word Sense Disambiguation with Distribution Estimation. Proc. of IJCAI 2005.
Mohammad, Saif and Hirst, Graeme. Determining word sense dominance using a thesaurus. Proc. of EACL 2006.
Day 2 - Introduction
Words and Object Descriptions
Semantics on the Semantic Web
Semantic Web, Ontologies and Natural Language Processing
The Lexical Semantic Web
Knowledge Representation as Word Meaning
A Lexicon Model for Ontologies
Enriching Ontologies with Linguistic Information
Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Web
Web Consists of Non-Interpreted Data
Text, DBs, Images, Tables
Web
Markup
Interpretation through Markup - Categories
"Web 2.0"
Markup
Interpretation through Markup – User Tags
Semantic Web
Formal Interpretation - Knowledge Markup
Knowledge Markup, Ontologies
Knowledge Markup, Ontologies
Turns the Web into a Knowledge Base
Knowledge Markup, Ontologies
Semantic Web Services
Enables Semantic Web Services …
Intelligent Man-Machine Interface
Knowledge Markup, Ontologies
Semantic Web Services
… and Intelligent Man-Machine Interface
Semantic Web Layer cake
Resource Description Framework (RDF)
[Graph: node1 --name--> "DFKI GmbH"; node1 --location--> "Kaiserslautern"; node1 --www--> http://www.dfki.de]
RDF : XML-based Representation
<?xml version='1.0' ?>
<rdf:RDF
    xmlns:rdf="… rdf-syntax-ns#"
    xmlns:rdfs="… rdf-schema#"
    xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de" />
  </rdf:Description>
</rdf:RDF>
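The same description can be viewed as three subject-predicate-object triples. The sketch below uses plain Python tuples rather than an RDF library; the list `TRIPLES`, the `_:node1` blank-node label, and the helper `objects` are illustrative names only.

```python
# The RDF description as (subject, predicate, object) triples.
TRIPLES = [
    ("_:node1", "name", "DFKI GmbH"),
    ("_:node1", "location", "Kaiserslautern"),
    ("_:node1", "www", "http://www.dfki.de"),
]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects stated for a subject/predicate pair."""
    return [o for s, p, o in triples if s == subject and p == predicate]


print(objects("_:node1", "location"))
```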
RDF Schema (RDFS)
Representation of classes and properties
[Diagram: classes Person, Teacher, Student and Course; properties name (with value rdf:Literal), teaches and enrolledIn; two is-a (subclass) edges]
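What a class hierarchy adds over plain RDF can be sketched as a subclass lookup. The sketch assumes that the two is-a edges in the diagram are "Teacher is-a Person" and "Student is-a Person", which the extracted slide text does not state explicitly; `SUBCLASS_OF`, `superclasses` and `is_instance_of` are hypothetical names.

```python
# Assumed reading of the diagram's two is-a edges.
SUBCLASS_OF = {"Teacher": "Person", "Student": "Person"}


def superclasses(cls):
    """Return all classes above `cls`, nearest first."""
    result = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def is_instance_of(instance_class, target):
    """True if something typed `instance_class` also belongs to `target`."""
    return instance_class == target or target in superclasses(instance_class)


print(is_instance_of("Teacher", "Person"))
print(is_instance_of("Teacher", "Course"))
```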
RDFS : XML-based Representation
Web Ontology Language (OWL)
OWL adds further modelling vocabulary on top of RDFS, e.g.
  Class equivalence
  Property types (data vs. object property)
Based on Description Logics; three versions:
  OWL Lite
  OWL DL
  OWL Full
OWL
Extended knowledge representation
[Diagram: the RDFS example (classes Person, Teacher, Student and Course; properties name, teaches and enrolledIn; two is-a edges) extended with a disjoint constraint]
OWL : XML-based Representation
XML – RDF – RDFS - OWL
[Figure: XML and RDF arranged from syntax to semantics (labels: XML Schema, Namespaces, Data Types, Interpretation Context); RDF Schema: formalization of class definitions and properties; OWL: formalization of extended class definitions, properties and property types]
Ontologies – What they are
Ontology refers to an engineering artifact
  a specific vocabulary used to describe a certain reality
  a set of explicit assumptions regarding the intended meaning of the vocabulary
An Ontology is
  an explicit specification of a conceptualization [Gruber 93]
  a shared understanding of a domain of interest [Uschold/Gruninger 96]
Ontologies – Why you need them
Make domain assumptions explicit
  Easier to exchange domain assumptions
  Easier to understand and update legacy data
Separate domain knowledge from operational knowledge
  Re-use domain and operational knowledge separately
A community reference for applications
Shared understanding of what particular information means
Applications of Ontologies
NLP
  Information Extraction, e.g. Buitelaar et al. 06, Mädche, Staab & Neumann 00, Nedellec, Rebholz
  Information Retrieval (Semantic Search), e.g. WebKB (Martin et al. 00), OntoSeek (Guarino et al. 99), Ontobroker (Decker et al. 99)
  Question Answering, e.g. Harabagiu, Schlobach & de Rijke, Aqualog (Lopez and Motta 04)
  Machine Translation, e.g. Nirenburg et al. 04, Beale et al. 95, Hovy, Knight
Other
  Business Process Modeling, e.g. Uschold et al. 98
  Digital Libraries, e.g. Amann & Fundulaki 99
  Information Integration, e.g. Kashyap 99; Wiederhold 92
  Knowledge Management (incl. Semantic Web), e.g. Fensel 01, Staab & Schnurr 00; Sure et al. 00, Abecker et al. 97
  Software Agents, e.g. Gluschko et al. 99; Smith & Poulter 99
  User Interfaces, e.g. Kesseler 96
Ontologies and Their Relatives
Catalogs
Glossaries & Terminologies
Thesauri
Semantic Networks
Formal is-a
Formal Instance
General logical constraints
Axioms: Disjoint/Inverse …
Thesauri – Examples : EuroVoc
EuroVoc covers terminology in all of the official EU languages for all fields (27) that concern the EU institutions, e.g. politics, trade, law, science, energy, agriculture
  MT   3606 natural and applied sciences
  UF   gene pool
       genetic resource
       genetic stock
       genotype
       heredity
  BT1  biology
  BT2  life sciences
  NT1  DNA
  NT1  eugenics
  RT   genetic engineering (6411)
Thesauri – Examples : MeSH
MeSH (Medical Subject Headings)
  organized by terms (~ 250,000) that correspond to medical subjects
  for each term, syntactic, morphological or semantic variants are given
  MeSH Heading  Databases, Genetic
  Entry Term    Genetic Databases
  Entry Term    Genetic Sequence Databases
  Entry Term    OMIM
  Entry Term    Online Mendelian Inheritance in Man
  Entry Term    Genetic Data Banks
  Entry Term    Genetic Data Bases
  Entry Term    Genetic Databanks
  Entry Term    Genetic Information Databases
  See Also      Genetic Screening
Semantic Networks - Examples : UMLS
Unified Medical Language System
  integrates linguistic, terminological and semantic information
  Semantic Network consists of 134 semantic types and 54 relations between types
  Pharmacologic Substance affects Pathologic Function
  Pharmacologic Substance causes Pathologic Function
  Pharmacologic Substance complicates Pathologic Function
  Pharmacologic Substance diagnoses Pathologic Function
  Pharmacologic Substance prevents Pathologic Function
  Pharmacologic Substance treats Pathologic Function
Semantic Networks - Examples : GO
GO (Gene Ontology)
  Aligns descriptions of gene products in different databases, including plant, animal and microbial genomes
  Organizing principles are molecular function, biological process and cellular component
  Accession: GO:0009292
  Ontology: biological process
  Synonyms: broad: genetic exchange
  Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
  Term Lineage
    all : all (164142)
      GO:0008150 : biological process (115947)
        GO:0007275 : development (11892)
          GO:0009292 : genetic transfer (69)
Ontologies – Example I
Ontologies – Example II
[Figure (design: Philipp Cimiano): an ontology in an F-Logic-like notation. Concepts: Geographical Entity (GE) with the subconcepts Natural GE (river, mountain) and Inhabited GE (country, city), linked by is-a. Instances, linked by instance_of: Neckar, Zugspitze, Germany, Berlin, Stuttgart. Relations: flow_through, located_in, capital_of. Attributes: length (km) 367, height (m) 2962.]
Ontologies for NLP
Information Retrieval: Query Expansion
Machine Translation: Interlingua
Information Extraction: Template Definition, Semantic Integration
Question Answering: Question Analysis, Answer Selection
Information Extraction
Class-based Template Definition
  Allows for Reasoning over Extracted Templates with Respect to the Ontology (see e.g. [Nedellec and Nazarenko 2005] for discussion)
Semantic Integration
  Extraction from Heterogeneous Sources (Text, Tables and other Semi-Structured Data, Image Captions) – SmartWeb [Buitelaar et al. 06]
  Multi-Document Information Extraction – ArtEquAKT [Alani et al. 2003]
Question Answering
Question Analysis
  Ontology/WordNet-based Semantic Question Interpretation (e.g. [Pasca and Harabagiu 01])
Answer Selection
  Ontology/WordNet-based Reasoning for Answer Type-Checking
    Ontology of Events [Sinha and Narayanan 05]
    Geographical Ontology, WordNet [Schlobach & de Rijke 04]
    WordNet [Pasca and Harabagiu 01]
Ontology-based Question Answering
  Derive Answers from a Knowledge Base (e.g. Aqualog [Lopez & Motta 04])
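Answer type-checking can be illustrated with a toy taxonomy. The sketch below is not the method of any of the cited systems: it assumes a naive "Which X ..." question pattern, a hand-written is-a table, and pre-typed answer candidates, and every name in it is hypothetical.

```python
# Toy is-a table and candidate types; both are illustrative assumptions.
IS_A = {"river": "natural ge", "mountain": "natural ge", "city": "inhabited ge"}
CANDIDATE_TYPE = {"Neckar": "river", "Zugspitze": "mountain", "Stuttgart": "city"}


def expected_answer_type(question):
    """Naive question analysis: take the word after 'which' or 'what'."""
    words = question.lower().rstrip("?").split()
    if len(words) >= 2 and words[0] in ("which", "what"):
        return words[1]
    return None


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or lies below it in IS_A."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False


def type_check(candidates, expected):
    """Keep only the candidates whose type is compatible with `expected`."""
    return [c for c in candidates if is_a(CANDIDATE_TYPE.get(c), expected)]


question = "Which river flows through Stuttgart?"
answers = type_check(["Neckar", "Stuttgart", "Zugspitze"], expected_answer_type(question))
```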
Ontology Life Cycle
Create/Select: Development and/or Selection
Populate: Knowledge Base Generation
Validate: Consistency Checks
Evolve: Extension, Modification
Maintain: Usability Tests
Deploy: Knowledge Retrieval
NLP in the Ontology Life Cycle
Ontology Population: Information Extraction
Ontology Learning: Text Mining
KB Retrieval: Question Answering
Ontology Learning
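Terms: river, country, nation, city, capital, ...
(Multilingual) Synonyms: {country, nation, Land}
Concept Formation: $c := \mathrm{country} := \langle i(c), \|c\|, \mathrm{Ref}_C(c) \rangle$
Concept Hierarchy: $\mathrm{capital} \leq_C \mathrm{city}$, $\mathrm{city} \leq_C \mathrm{Inhabited\ GE}$
Relations: $\mathrm{flow\_through}(\mathrm{dom}{:}\ \mathrm{river},\ \mathrm{range}{:}\ \mathrm{GE})$
Relation Hierarchy: $\mathrm{capital\_of} \leq_R \mathrm{located\_in}$
Axiom Schemata: $\mathrm{disjoint}(\mathrm{river}, \mathrm{mountain})$
General Axioms: $\forall x\,(\mathrm{country}(x) \rightarrow \exists y\,\mathrm{capital\_of}(y,x) \wedge \forall z\,(\mathrm{capital\_of}(z,x) \rightarrow y = z))$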
Design: Philipp Cimiano
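The two axiom layers can be checked against a small knowledge base. The sketch below tests the disjointness schema for river and mountain and the general axiom that every country has exactly one capital; the knowledge base contents and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical knowledge base: (instance, class) and (capital, country) pairs.
KB_INSTANCE_OF = {("Neckar", "river"), ("Zugspitze", "mountain"), ("Germany", "country")}
KB_CAPITAL_OF = {("Berlin", "Germany")}


def disjointness_violations(instance_of, class_a, class_b):
    """Instances typed as both classes, violating disjoint(class_a, class_b)."""
    in_a = {i for i, c in instance_of if c == class_a}
    in_b = {i for i, c in instance_of if c == class_b}
    return in_a & in_b


def countries_without_unique_capital(instance_of, capital_of):
    """Countries that do not have exactly one capital_of link pointing at them."""
    countries = {i for i, c in instance_of if c == "country"}
    violating = set()
    for country in countries:
        capitals = {y for y, x in capital_of if x == country}
        if len(capitals) != 1:
            violating.add(country)
    return violating
```

Both functions return the offending individuals rather than a boolean, which makes a failed consistency check easier to inspect.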
Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Dictionary: Words and Senses
Represent interpretations of words through senses, very much like classes that are assigned to a word, e.g.
article
1. An individual thing or element of a class…
2. A particular section or item of a series in a written document…
3. A non-fictional literary composition that forms an independent part of a publication…
4. The part of speech used to indicate nouns and to specify their application
5. A particular part or subject; a specific matter or point
(as provided by http://dictionary.reference.com/)
Ontology: Classes and Labels - I
Ontologies assign labels (i.e. words) to a given class
In the COMMA ontology on document management the class article corresponds to sense 2 (‘section of a written document’):
http://pauillac.inria.fr/cdrom/ftp/ocomma/comma.rdfs
Ontology: Classes and Labels - II
In the GOLD ontology on linguistics, the class label article corresponds to sense 4 (‘part of speech’):
http://emeld.org/gold
The Meaning of Director - I
The Semantic Web can be viewed as a large, distributed dictionary (or rather a semantic lexicon) in which we can look up the meaning of words, e.g. director
… as a ‘role’ (AgentCities ontology)
http://www-agentcities.doc.ic.ac.uk/ontology/shows.daml
The Meaning of Director - II
… as ‘head of a program’ (University Benchmark ontology)
http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl
Exploring the Lexical Semantic Web
Collect ontologies
  OntoSelect
Analyse the use of class/property labels
Treat class/property labels as lexical entries
  Normalize
  Organize by language
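The "normalize" and "organize by language" steps might look as follows. This is a sketch under stated assumptions, not the OntoSelect implementation: labels are assumed to arrive as (label, language, ontology) tuples, normalization is assumed to mean splitting camelCase and underscores and lowercasing, and the sample label spellings are invented for illustration.

```python
import re
from collections import defaultdict


def normalize(label):
    """Turn an identifier-style label into a lowercase, space-separated entry."""
    spaced = re.sub(r"[_\-]+", " ", label)                    # capital_of -> capital of
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)   # GraduateStudent -> Graduate Student
    return " ".join(spaced.lower().split())


def build_lexicon(labels):
    """Map language -> normalized label -> set of ontologies using that label."""
    lexicon = defaultdict(lambda: defaultdict(set))
    for label, language, ontology in labels:
        lexicon[language][normalize(label)].add(ontology)
    return lexicon


# Hypothetical input: the label spellings are assumptions for illustration.
LABELS = [
    ("Director", "en", "shows.daml"),
    ("director", "en", "univ-bench.owl"),
    ("capital_of", "en", "geo-example"),
]
LEXICON = build_lexicon(LABELS)
```

An entry that maps to several ontologies, such as `director` here, is a candidate for the kind of cross-ontology ambiguity discussed above.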
Ontology Collection
OntoSelect
  Web Monitor on DAML, RDFS, OWL Files
  Download, Analyze and Store Included Information and Metadata
    Class and Property Labels
    Multilingual Information
    Included Ontologies
  Ontology Ranking and Selection Functionalities
http://olp.dfki.de/OntoSelect
OntoSelect
Multilinguality on the Semantic Web
Multilingual Labels
“Lexical Semantic Ambiguity”
Words and Object Descriptions
Semantics on the Semantic Web
The “Lexical Semantic Web”
A Lexicon Model for Ontologies
Ontologies – Example III
Ontologies – Example III (continued)
[Figure: university ontology]
Classes: Campus, University, “Fakultät”, Student, Staff
Relations: located_at, is_part_of, studies_at, works_at
Ontologies – Example III (continued)
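[Figure: university ontology with terms attached to the class “Fakultät”]
Classes: Campus, University, “Fakultät”, Student, Staff
Relations: located_at, is_part_of, studies_at, works_at
Terms for “Fakultät”:
  has_German_term: Fakultät
  has_US_English_term: School
  has_Dutch_term: Faculteit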
Ontologies – Example III (continued)
[Figure: terms modelled as instances of a class Term]
“Fakultät” is_part_of University
“Fakultät” has_term Term
Instances of Term (instance_of):
  Fakultät, language: DE
  faculteit, language: NL
  school, language: EN-US
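The last variant, in which terms are instances of a Term class carrying a language attribute, can be sketched as a small data structure. The class and function names below are hypothetical; only the three terms and their language codes come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """A lexical term: a surface form plus the language it belongs to."""
    form: str
    language: str


# has_term links of the class "Fakultät", as listed on the slide.
FAKULTAET_TERMS = [
    Term("Fakultät", "DE"),
    Term("faculteit", "NL"),
    Term("school", "EN-US"),
]


def label_for(terms, language):
    """Return the form of the first term in `language`, or None if there is none."""
    for term in terms:
        if term.language == language:
            return term.form
    return None
```

Compared with one property per language (has_German_term, has_Dutch_term, ...), this design adds a language without changing the ontology schema.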
Semiotic Triangle
  Ogden & Richards, 1923
  based on Structural Linguistics studies (de Saussure, 1916)
  adopted in Knowledge Representation (e.g. Sowa, 1984)
LingInfo Model – Simplified
Design: Michael Sintek
LingInfo Model
LingInfo Instances - Example
Fußballspielers
„of the football player“
LingInfo Predicate-Arg Structure
Design: Anette Frank
Conclusions
WordNet: Appropriate Use may include
  Introduction of underspecified senses (sense grouping)
  Tuning to a domain
The “Lexical Semantic Web”
  The Semantic Web (and Web 2.0) is a potentially rich resource for (formal) lexical semantics
  Mining such resources for lexical semantics (i.e. compilation of a distributed semantic lexicon) only just started
  Ontologies to be extended with linguistic/lexical information