LREC 2004, ISO Working Group on the Representation of Multimodal Semantic Information
Linguistic and Semantic Information for the Semantic Web
A Multi-Layered, XML-Based Approach to the Integration of Linguistic and Semantic Annotations
Thierry Declerck, Paul Buitelaar
University of the Saarland & DFKI GmbH, Saarbrücken, Germany
This presentation also includes slides and graphics taken from three presentations at EUROLAN 2003 in Bucharest, by P. Vossen (Wordnet, EuroWordNet, Global Wordnet), A. Lenci (Computational Lexicons and the Semantic Web) and Srini Narayanan (FrameNet Meets the Semantic Web). Also included are graphics by M. Fernández-López and A. Gómez-Pérez (UPM) from deliverable 1.2 of the Esperonto Project.
Overview: Motivations
Semantic Web Applications of LT
• Annotation of Web Documents with Ontology-based Metadata (Knowledge Markup)
• Ontology Learning through Text Mining from Annotated Corpora
Integration of Annotations
• Use of Different Tools
• Use of Different Knowledge Sources
Overview: Objectives
Integration of…
… Linguistic and Semantic Annotations
• Linguistic: e.g. PoS, Lemma, Phrase Structure
• Semantic: e.g. Concepts, Relations, Events
… Annotations from Different Resources, e.g. Different Domains
… Annotations in Different Formats, e.g. from Different Tools
Knowledge Markup and Knowledge Extraction
[Figure: diagram with the labels Text/Speech/Image-Video; Linguistic and Media Analysis; Linguistic, Low-level Image and Semantic Annotations; Text/Speech/Media Mining; Concepts, Relations, Events]
Annotations: Projects, Tools and Resources
Projects
• MuchMore: Cross-lingual Information Retrieval, Medical Domain
• Mumis: Content-based Multimedia Retrieval, Soccer Domain
Tools and Resources
• MuchMore: Integration of Shprot (TnT, Mmorph, Chunkie) with Semantic Tagging Tools (UMLS – Medical Semantic Resource, EuroWordNet)
• Mumis: Schug, Integration of SPPC with Rule-based Chunking and Shallow Dependency Analysis, Event Structure (Mumis Soccer Ontology)
Annotations: MuchMore
[Figure: tree diagram of the MuchMore annotation format. Top-level nodes: document, sentence. Layers under a sentence: umlsterms, xrceterms, ewnterms, semrels, gramrels, chunks, text. Elements of these layers: umlsterm, xrceterm, ewnterm, semrel, gramrel, chunk, token. Further node and attribute labels: cui, sense, msh, id, from, to, offset, code, type, term1, term2, pref, tui, pos, lemma]
Annotations: MuchMore: Linguistic
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.
<text>
  <token id="w1" pos="NN">Balint</token>
  <token id="w2" pos="NN">syndrom</token>
  <token id="w3" pos="VBZ" lemma="be">is</token>
  <token id="w4" pos="DT" lemma="a">a</token>
  <token id="w5" pos="NN" lemma="combination">combination</token>
  <token id="w6" pos="IN" lemma="of">of</token>
  <token id="w7" pos="NNS" lemma="symptom">symptoms</token>
  ...
  <token id="w20" pos="JJ" lemma="spatial">spatial</token>
  <token id="w21" pos="NN" lemma="perception">perception</token>
  <token id="w22" pos="CC" lemma="and">and</token>
  <token id="w23" pos="NN" lemma="representation">representation</token>
  ...
</text>
<chunks>
  <chunk id="c1" from="w1" to="w2" type="NP"/>
  <chunk id="c7" from="w20" to="w23" type="NP"/>
</chunks>
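The chunk layer points into the token layer by id rather than wrapping the words. A minimal Python sketch of how such pointers could be resolved follows. It is not the MuchMore tooling: the enclosing `<sentence>` element, the helper names `token_index` and `chunk_texts`, and the reading of ids as `w` plus a number are assumptions made for illustration, and the sample is an abridged copy of the slide's example.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the slide's example; the elided tokens ("...") are left out.
# The <sentence> wrapper is an assumption, added so the sample has one root.
SENTENCE = """
<sentence>
  <text>
    <token id="w1" pos="NN">Balint</token>
    <token id="w2" pos="NN">syndrom</token>
    <token id="w20" pos="JJ" lemma="spatial">spatial</token>
    <token id="w21" pos="NN" lemma="perception">perception</token>
    <token id="w22" pos="CC" lemma="and">and</token>
    <token id="w23" pos="NN" lemma="representation">representation</token>
  </text>
  <chunks>
    <chunk id="c1" from="w1" to="w2" type="NP"/>
    <chunk id="c7" from="w20" to="w23" type="NP"/>
  </chunks>
</sentence>
"""


def token_index(word_id):
    """Numeric part of a token id such as 'w21' (assumes the 'w<number>' scheme)."""
    return int(word_id[1:])


def chunk_texts(sentence_xml):
    """Map each chunk id to (chunk type, string of the tokens it covers)."""
    root = ET.fromstring(sentence_xml)
    tokens = [(token_index(t.get("id")), t.text) for t in root.iter("token")]
    result = {}
    for chunk in root.iter("chunk"):
        first = token_index(chunk.get("from"))
        last = token_index(chunk.get("to"))
        words = [text for idx, text in tokens if first <= idx <= last]
        result[chunk.get("id")] = (chunk.get("type"), " ".join(words))
    return result


print(chunk_texts(SENTENCE))
```

Because the chunk elements carry only `from` and `to` pointers, further layers can refer to the same tokens without touching the text layer.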
Annotations: MuchMore: Semantic
Balint syndrom is a combination of symptoms including simultanagnosia, a disorder of spatial and object-based attention, disturbed spatial perception and representation, and optic ataxia resulting from bilateral parieto-occipital lesions.
<umlsterm id="t7" from="w20" to="w21">
  <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
    <msh code="F2.463.593.778"/>
    <msh code="F2.463.593.932.869"/>
  </concept>
</umlsterm>
<umlsterm id="t8" from="w26" to="w26">
  <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
    <msh code="H1.671.606"/>
  </concept>
</umlsterm>
<semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
<ewnterm id="e2" from="w21" to="w21">
  <sense offset="0487490"/>
  <sense offset="3955418"/>
  <sense offset="4002483"/>
</ewnterm>
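In the semantic layers, a `semrel` refers to concepts by id, and each concept sits inside a `umlsterm` that points at tokens. The following Python sketch shows one way such a relation could be read out as a triple of preferred names. The `<sentence>`, `<umlsterms>` and `<semrels>` wrappers and the function name `relation_triples` are assumptions for illustration; the element content is copied from the slide.

```python
import xml.etree.ElementTree as ET

# Content copied from the slide; the wrapper elements are assumed.
SEMANTIC_LAYERS = """
<sentence>
  <umlsterms>
    <umlsterm id="t7" from="w20" to="w21">
      <concept id="t7.1" cui="C0037744" preferred="Space Perception" tui="T041">
        <msh code="F2.463.593.778"/>
        <msh code="F2.463.593.932.869"/>
      </concept>
    </umlsterm>
    <umlsterm id="t8" from="w26" to="w26">
      <concept id="t8.1" cui="C0029144" preferred="Optics" tui="T090">
        <msh code="H1.671.606"/>
      </concept>
    </umlsterm>
  </umlsterms>
  <semrels>
    <semrel id="r7" term1="t7.1" term2="t8.1" reltype="issue_in"/>
  </semrels>
</sentence>
"""


def relation_triples(xml_text):
    """Return (preferred name, relation type, preferred name) for each semrel."""
    root = ET.fromstring(xml_text)
    # Index all concepts by id so that semrel pointers can be followed.
    concepts = {c.get("id"): c for c in root.iter("concept")}
    triples = []
    for rel in root.iter("semrel"):
        left = concepts[rel.get("term1")]
        right = concepts[rel.get("term2")]
        triples.append(
            (left.get("preferred"), rel.get("reltype"), right.get("preferred"))
        )
    return triples


print(relation_triples(SEMANTIC_LAYERS))
```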
Annotations: Mumis
[Figure: tree diagram of the Mumis annotation format with the node labels Document, Paragraph, Sentence, Subord-Clause, NP, PP, VG, AP, AdvP, NE]
Annotations: Mumis
[Figure: the AP node with the labels TYPE, STRUK, AP_AGR, STRING, AP_HEADW]
Annotations: Mumis
[Figure: the VG node with the labels TYPE, VG_SUBCAT_STEM, STRING, KLAMMER, VG_STRG, SENT_STRING, VG_TYPE, VG_AGR, STRUK, VG_HEAD, ..., and the child nodes VG and W]
Annotations: Mumis
[Figure: the W and CLAUSE nodes with the labels INFL, STRING, STEM, TYPE, POS, TC, SENT_STRING, CLAUSE_TYPE, CLAUSE_SUBJ, CLAUSE_PRED_STRG, CLAUSE_PRED_SUBCAT, CLAUSE_PRED_AGR, CLAUSE_VG_LIST, CLAUSE_NP_LIST, CLAUSE_PP_LIST, CLAUSE_PP_ADJUNKT, ...]
Annotations: Integration
Objectives
• Integrate Linguistic and Semantic Information from the MuchMore and Mumis Annotations, e.g.
• Enrich MuchMore: Head/Complement of Chunks, Clauses
• Enrich Mumis: EuroWordNet, Medical Ontology
Approach
• MuchMore uses Multilayered Annotation over Indexes (‘standoff’)
• Introduce Mumis Annotations as Additional Layers
Problems
• Integration of Overlapping Layers (i.e. Additional Attributes)
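The approach and the problem named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the projects: the function `integrate`, the two input structures `EXTRA_CHUNK_ATTRS` and `EXTRA_CLAUSES`, and the `<sentence>` wrapper are hypothetical. The attribute values are taken from the integrated example shown later in the deck. An overlapping layer (chunks produced by both tools over the same spans) is handled by adding attributes to the existing elements, while a layer that exists in only one format (clauses) is appended as a new sibling layer.

```python
import xml.etree.ElementTree as ET

# Abridged MuchMore-style chunk layer; the <sentence> wrapper is assumed.
MUCHMORE = """
<sentence>
  <chunks>
    <chunk id="c1" from="w1" to="w5" type="NP"/>
    <chunk id="c2" from="w6" to="w6" type="VG"/>
    <chunk id="c3" from="w7" to="w10" type="PP"/>
  </chunks>
</sentence>
"""

# Hypothetical output of a second analyser, keyed by token span.
EXTRA_CHUNK_ATTRS = {
    ("w1", "w5"): {"head": "w1,w3,w5"},
    ("w7", "w10"): {"head": "w7", "complement": "w8,w9,w10"},
}
# Clause copied from the deck's example; chunk c4 lies outside this abridged sample.
EXTRA_CLAUSES = [
    {"id": "cl1", "from": "c1", "to": "c4", "pred_struct": "c2 c4", "GF_Subj": "c1"},
]


def integrate(sentence_xml, chunk_attrs, clauses):
    """Merge extra chunk attributes and append a clause layer."""
    root = ET.fromstring(sentence_xml)
    # Overlapping layer: same spans, so only add attributes not yet present.
    for chunk in root.iter("chunk"):
        span = (chunk.get("from"), chunk.get("to"))
        for name, value in chunk_attrs.get(span, {}).items():
            if chunk.get(name) is None:
                chunk.set(name, value)
    # New layer: stored next to the existing ones, pointing at chunk ids.
    layer = ET.SubElement(root, "clauses")
    for attrs in clauses:
        ET.SubElement(layer, "clause", attrs)
    return root


merged = integrate(MUCHMORE, EXTRA_CHUNK_ATTRS, EXTRA_CLAUSES)
print(ET.tostring(merged, encoding="unicode"))
```

Matching by span is the simplest choice here; it silently skips chunks whose boundaries differ between the two tools, which is exactly the case a real integration would have to resolve.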
Annotations: Mumis: Linguistic
Industrie, Handel und Dienstleistungen werden in der ersten Liste aufgeführt, wobei die in Klammern gesetzten Zahlen auf die Mutterfirmen hinweisen.
(Industry, trade and services are mentioned in the first list, in which numbers within brackets point to parent companies.)
<chunks>
  <chunk id="c1" from="w1" to="w5" type="NP" head="w1,w3,w5"/>
  <chunk id="c2" from="w6" to="w6" type="VG"/>
  <chunk id="c3" from="w7" to="w10" type="PP" head="w7" complement="w8,w9,w10"/>
  <chunk id="c4" from="w11" to="w1" type="VG"/>
  …
</chunks>
<clauses>
  <clause id="cl1" from="c1" to="c4" pred_struct="c2 c4" GF_Subj="c1"/>
  <clause id="cl2" from="c6" to="c9" pred_struct="c9" GF_Subj="c6"/>
</clauses>
Annotations: Mumis: Semantic
Ein Freistoss von Christian Ziege aus 25 Metern geht über das Tor.
(A 25-meter free-kick by Christian Ziege goes over the goal.)
<clauses>
  <clause id="cls1" from="c1" to="c4" pred_struct="c3" GF_Subj="c1"/>
</clauses>
<events>
  <event id="e1" clause="cls1" event-name="free-kick">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
  <event id="e2" clause="cls1" event-name="goal-scene-fail">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
</events>
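Event arguments mix pointers into the token layer (`w4, w5`) with literal values (`25-meter`, `07:00`). A small Python sketch of how a consumer might read the event layer is given below. The token table `TOKENS` is an assumption: the slide does not show the token layer, so the ids are reconstructed by counting the words of the example sentence. The function name `read_events` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Event layer copied from the slide, with quotation marks normalized.
EVENTS = """
<events>
  <event id="e1" clause="cls1" event-name="free-kick">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
  <event id="e2" clause="cls1" event-name="goal-scene-fail">
    <arguments>
      <argument id="arg1" name="player" value="w4, w5"/>
      <argument id="arg2" name="location" value="25-meter"/>
      <argument id="arg3" name="time" value="07:00"/>
    </arguments>
  </event>
</events>
"""

# Assumed token ids for the start of the example sentence (not on the slide).
TOKENS = {"w1": "Ein", "w2": "Freistoss", "w3": "von", "w4": "Christian", "w5": "Ziege"}


def read_events(events_xml, tokens):
    """Return (event name, clause id, argument dict) for each event."""
    events = []
    for event in ET.fromstring(events_xml).iter("event"):
        args = {}
        for arg in event.iter("argument"):
            parts = [p.strip() for p in arg.get("value").split(",")]
            if all(p in tokens for p in parts):
                # Every part is a token id: replace the pointers by the words.
                args[arg.get("name")] = " ".join(tokens[p] for p in parts)
            else:
                # Otherwise keep the literal value as given.
                args[arg.get("name")] = arg.get("value")
        events.append((event.get("event-name"), event.get("clause"), args))
    return events


for name, clause, args in read_events(EVENTS, TOKENS):
    print(name, clause, args)
```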
Conclusions: MuchMore and MUMIS
Work in Progress
• Development of Compatibility between the Formats
• Full Integration of the Formats
Possible Future Work
• Integration of the Formats on a more Abstract Level, i.e. by Use of Data Categories as Specified by ISO/TC37/SC4
• Separating Text Data from Annotation. Multiple pointing to Annotations.
• Extension to Multimedia
Esperonto: Overview
[Figure: architecture diagram with the labels Applications; World Wide Web; Semantic Web; Web Server Provider; Dynamic Information Provider; Static Information Provider; Multimedia Data Provider; Visualization Service Provider; SemASP; Portal Agent; Agent; Router; Tagger/Wrapper; Service; Certificate; Ontology Repository; Workbench; Maintenance; Multilinguality; Reengineering; Mapping; Multilingual NL Understanding; Multilingual NL Generation; XML, DAML, OIL, RDF(S); Semantic indices, Concept instances]
Ontologies (Classification)
Lassila and McGuinness [Lassila and McGuinness, 2001] categorization
Ontologies (Classification)
Van Heijst and colleagues [Van Heijst et al., 1997] categorization
Knowledge Architecture
Esperonto Knowledge Architecture
Abstracting over Linguistic Information in Esperonto
Ontology_1: NP
• Head: N
• Mod: {Adj*, PP?}
• Spec: {Det? PossPron}
• Type: {RefNP, ProNP, DateNP, etc.}
Ontology_2: PP
• Head: Prep
• Type: {LocPP, DatePP, etc.}
• Comp: NP
Ontology_3: Grammatical Functions
• Subject, Object, Ind. Object, NP Adjunct, PP Adjunct, etc.
Ontology_4: Dependencies
• Head, Comp, Mod, Spec
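The NP and PP descriptions above read like record types with a head, modifiers, a specifier and a type. As a loose illustration only, they could be written down as Python dataclasses. The class and field names, the default type values and the example phrase analysis are assumptions made for this sketch and are not part of the Esperonto specification.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class NP:
    head: str                                  # Head: N
    mod: List[Union[str, "PP"]] = field(default_factory=list)  # Mod: {Adj*, PP?}
    spec: Optional[str] = None                 # Spec: Det or PossPron
    np_type: str = "RefNP"                     # Type: RefNP, ProNP, DateNP, ...


@dataclass
class PP:
    head: str                                  # Head: Prep
    comp: NP                                   # Comp: NP
    pp_type: str = "LocPP"                     # Type: LocPP, DatePP, ...


# "in der ersten Liste" (in the first list), analysed by hand for illustration.
first_list = PP(head="in", comp=NP(head="Liste", mod=["ersten"], spec="der"))
print(first_list)
```

Grammatical functions and dependencies (Ontology_3 and Ontology_4) would then be relations between such records rather than fields inside them.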
From WordNet to EuroWordNet
[Figure: two hierarchies side by side, Wordnet1.5 and Dutch Wordnet.
Wordnet1.5 nodes: object; natural object (an object occurring naturally); artifact, artefact (a man-made object); instrumentality; block; body; container; device; implement; tool; instrument; bag; spoon; box.
Dutch Wordnet nodes: voorwerp {object}; lepel {spoon}; werktuig {tool}; tas {bag}; bak {box}; blok {block}; lichaam {body}]
Relations of EWN to Top-Level Ontologies
[Figure: diagram linking language-specific wordnets to a language-neutral ontology.
Language-Neutral Ontology: Reference Ontology Classes for BOX: ContainerProduct; SolidTangibleThing. EuroWordNet Top-Ontology: Form: Cubic; Function: Contain; Origin: Artifact; Composition: Whole.
Language-Specific Wordnets: WordNet1.5 (object, container, box); Dutch Wordnet (voorwerp, doos)]
FrameNet: Events in Syntactic Context
• events
• artifacts, built objects
• natural kinds, parts and aggregates
• institutions, belief systems, practices
• space, time, location, motion
• etc.
Take a commercial transaction as an example of an event. The following (partial) word list shows lexical realizations of the event:
• Verbs: pay, spend, cost, buy, sell, charge
• Nouns: cost, price, payment
• Adjectives: expensive, cheap
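The word list for the commercial transaction event can be held as a simple lookup from part of speech to words. The Python sketch below is illustrative only: the names `COMMERCIAL_TRANSACTION` and `parts_of_speech` are made up for this example, and the structure is not the FrameNet data format.

```python
# Partial word list from the slide, grouped by part of speech.
COMMERCIAL_TRANSACTION = {
    "verb": ["pay", "spend", "cost", "buy", "sell", "charge"],
    "noun": ["cost", "price", "payment"],
    "adjective": ["expensive", "cheap"],
}


def parts_of_speech(word, frame=COMMERCIAL_TRANSACTION):
    """Parts of speech under which a word is listed for the event."""
    return sorted(pos for pos, words in frame.items() if word in words)


print(parts_of_speech("cost"))   # ['noun', 'verb']
print(parts_of_speech("goal"))   # []
```

The word "cost" appears under two parts of speech, which is why the event has to be identified in syntactic context rather than from the word form alone.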
Semantic and Domain Specific Information in the Simple/Parole Framework
[Figure labels: semantic frame; semantic relations; ontology]
Combining Ontological and “Linguistic Ontology” (EWN, Parole/Simple)
<lex-element id="ID" concept="Shot-on-goal">
  <... lang="DE" type="main" ewn="[digit+]" pos="N"
       mod={"von" concept="Player" | concept="player" ewn="[digit+]" gender="gen" | pos="posspron"}>Torschuss</term>
  <... lang="DE" type="synonym" ewn="[digit+]" pos="V"
       comp={"SUBJ" concept="Player"}>abzieh</term>
  <definition>URL: DFB home page/glossary</definition>
</lex-element>
Current Work
• Including FrameNet for 3 languages.
• Including new semantic classes for adjectives, adverbs, polarity etc.
• New, improved annotation schema for syntactic/semantic annotation.
• A declarative set of mapping rules: Linguistic to Ontology (domain ontologies). The Onto-LT framework (see paper by P. Buitelaar et al. at LREC).