Language Technology I © 2006 Paul Buitelaar
Language Technology I 2005/06
Paul Buitelaar, German Research Center for Artificial Intelligence (DFKI)
Knowledge Extraction/Semantic Web
Overview
• Semantic Web Introduction
  – Semantic Web Representation and Query Languages
  – Semantic Web Tools
• Ontologies and Knowledge Markup
  – Ontologies and other Knowledge Organization Systems
  – Knowledge Markup for Ontology Population
  – Ontology Life-Cycle
• Knowledge Extraction
  – Ontology Population
  – Ontology Learning
Web
[Figure labels: Web Docs, Data]
Web > Semantic Web
[Figure, built up over three slides. Labels: Web Docs, Data; Knowledge Markup; Ontologies]
Accessing the Semantic Web - Machines
[Figure labels: Knowledge Markup; Ontologies; Semantic Web Services]
Accessing the Semantic Web - Humans
[Figure labels: Knowledge Markup; Ontologies; Semantic Web Services; Intelligent Man-Machine Interface]
Semantic Web Layer cake
• Introduced by Tim Berners-Lee in 2001
• Built upon existing WWW standards
Resource Description Framework (RDF)
• RDF is an extensible language for expressing graph structures
• Serializes to XML
[Figure: RDF graph. node1 has name "DFKI GmbH", location "Kaiserslautern" and www http://www.dfki.de]
<?xml version='1.0' ?>
<rdf:RDF
    xmlns:rdf="… rdf-syntax-ns#"
    xmlns:rdfs="… rdf-schema#"
    xmlns="http://example.org">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de" />
  </rdf:Description>
</rdf:RDF>
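As a rough illustration of the graph view, the Python sketch below reads an RDF/XML description like the one on this slide with the standard library only and lists it as subject-predicate-object triples. Assumptions: the rdf namespace is written out with the standard W3C URI so that the parser can run, the default namespace is given a trailing slash, and the helper name description_triples is hypothetical. It handles only flat rdf:Description blocks and is not a full RDF parser.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URI for the rdf: prefix (written out here as an assumption,
# the slide abbreviates it).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<?xml version='1.0'?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/">
  <rdf:Description rdf:nodeID="node1">
    <name>DFKI GmbH</name>
    <location>Kaiserslautern</location>
    <www rdf:resource="http://www.dfki.de"/>
  </rdf:Description>
</rdf:RDF>"""


def description_triples(rdf_xml):
    """Return (subject, predicate, object) triples for flat rdf:Description blocks."""
    triples = []
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        # A blank node is written as "_:" plus its node ID.
        subject = "_:" + desc.get(f"{{{RDF_NS}}}nodeID")
        for prop in desc:
            # ElementTree stores qualified names as "{namespace}local".
            predicate = prop.tag.replace("{", "").replace("}", "")
            resource = prop.get(f"{{{RDF_NS}}}resource")
            # A property either points to a resource or carries a literal value.
            obj = resource if resource is not None else prop.text
            triples.append((subject, predicate, obj))
    return triples


triples = description_triples(RDF_XML)
for triple in triples:
    print(triple)
```

Each printed tuple corresponds to one labelled edge of the graph in the figure.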
RDF Schema (RDFS)
• Adds a vocabulary for representing classes and properties to RDF
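[Figure: class diagram with the classes Person, Teacher, Student and Course. Teacher is-a Person; Student is-a Person; Person has a name (rdfs:Literal); Teacher teaches Course; Student enrolledIn Course.]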
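A minimal Python sketch of what the RDFS vocabulary buys: from is-a (subclass) links and the domain and range of properties, types of individuals can be derived. The domain and range assignments are read off the diagram and are an assumption, as are the individuals anna, ben and lt1 and the function names.

```python
# Schema as read off the class diagram (assumed direction of the arrows).
SUBCLASS_OF = {"Teacher": "Person", "Student": "Person"}
DOMAIN = {"teaches": "Teacher", "enrolledIn": "Student", "name": "Person"}
RANGE = {"teaches": "Course", "enrolledIn": "Course"}


def with_superclasses(cls):
    """Return cls followed by all of its ancestors in the is-a hierarchy."""
    result = [cls]
    while result[-1] in SUBCLASS_OF:
        result.append(SUBCLASS_OF[result[-1]])
    return result


def infer_types(triples):
    """Derive the classes of subjects and objects from domain, range and is-a."""
    types = {}
    for subject, prop, obj in triples:
        if prop in DOMAIN:
            types.setdefault(subject, set()).update(with_superclasses(DOMAIN[prop]))
        if prop in RANGE:
            types.setdefault(obj, set()).update(with_superclasses(RANGE[prop]))
    return types


# Hypothetical instance data.
facts = [("anna", "teaches", "lt1"), ("ben", "enrolledIn", "lt1")]
types = infer_types(facts)
for individual in sorted(types):
    print(individual, sorted(types[individual]))
```

Note that anna is never declared to be a Person: this follows from the domain of teaches together with Teacher is-a Person.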
Web Ontology Language (OWL)
• OWL - Based on Description Logics
• Adds further modelling vocabulary on top of RDFS
[Figure: language stack ordered from syntax to semantics: XML, RDF, RDF Schema, OWL. Further labels in the figure: XML Schema, Namespaces, Interpretation Context.]
• RDF Schema – Formalization: Classes (Inheritance), Properties
• OWL – Formalization: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity), Data Types
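To make the property-type example (transitivity) concrete, here is a small Python sketch that computes what follows once a property is declared transitive. The property located_in and its two facts are invented for illustration; a real OWL reasoner does much more than this.

```python
def transitive_closure(pairs):
    """Return all pairs implied by treating the relation as transitive."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a -> b and b -> d together imply a -> d.
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical facts for a property that is declared transitive.
located_in = {
    ("Kaiserslautern", "Rhineland-Palatinate"),
    ("Rhineland-Palatinate", "Germany"),
}
inferred = transitive_closure(located_in) - located_in
print(inferred)
```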
Semantic Web Query Languages - SPARQL
• SPARQL - query language developed by W3C
• Syntactically based on SQL:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?foafName
WHERE {
  ?x foaf:name ?foafName .
  OPTIONAL { ?x foaf:mbox ?mbox } .
}
• Results available as XML Documents
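The effect of OPTIONAL in the query can be sketched in plain Python over an in-memory list of triples. The data and the function name select_names are invented for illustration; this mimics the one query above and is not a SPARQL engine.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical data: two people, only one of whom has a mailbox.
TRIPLES = [
    ("_:a", FOAF + "name", "Alice"),
    ("_:a", FOAF + "mbox", "mailto:alice@example.org"),
    ("_:b", FOAF + "name", "Bob"),
]


def select_names(triples):
    """Mimic: SELECT ?foafName WHERE { ?x foaf:name ?foafName . OPTIONAL { ?x foaf:mbox ?mbox } }"""
    solutions = []
    for x, p, name in triples:
        if p != FOAF + "name":
            continue
        mboxes = [o for s, q, o in triples if s == x and q == FOAF + "mbox"]
        # OPTIONAL: keep the solution even if no mbox matches (None = unbound).
        for mbox in mboxes or [None]:
            solutions.append({"x": x, "foafName": name, "mbox": mbox})
    # SELECT ?foafName projects every solution onto that one variable.
    return [s["foafName"] for s in solutions]


names = select_names(TRIPLES)
print(names)
```

Without the OPTIONAL wrapper, the pattern ?x foaf:mbox ?mbox would be mandatory and Bob would drop out of the result.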
Semantic Web Tools
• Programming APIs
  – Jena - Java
  – Redland - Python, …
  – RAP - PHP
• Editors
  – Protégé
  – OntoStudio
  – Triple20 - Prolog
• Storage
  – Sesame
  – OntoBroker
Ontologies and Knowledge Markup
Ontologies in Philosophy
• Ontology is a branch of philosophy that deals with the nature and the organization of reality
• Science of Being (Aristotle, Metaphysics)
  – What characterizes being?
  – Ultimately, what is being?
Ontologies in Computer Science
• Ontology refers to an engineering artifact
  – a specific vocabulary used to describe a certain reality
  – a set of explicit assumptions regarding the intended meaning of the vocabulary
• An Ontology is
  – an explicit specification of a conceptualization [Gruber 93]
  – a shared understanding of a domain of interest [Uschold/Gruninger 96]
Why Develop an Ontology?
• Make domain assumptions explicit
  – Easier to change domain assumptions
  – Easier to understand and update legacy data
• Separate domain knowledge from operational knowledge
  – Re-use domain and operational knowledge separately
• A community reference for applications
• Shared understanding of what information means
Types of Ontologies
[Guarino, 98]
• Top-level ontologies describe very general concepts like space, time, event, which are independent of a particular problem or domain. It seems reasonable to have unified top-level ontologies for large communities of users.
• Domain ontologies describe the vocabulary related to a generic domain by specializing the concepts introduced in the top-level ontology.
• Task ontologies describe the vocabulary related to a generic task or activity by specializing the top-level ontologies.
• Application ontologies are the most specific ontologies. Concepts in application ontologies often correspond to roles played by domain entities while performing a certain activity.
Ontologies and Their Relatives
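[Figure: spectrum of knowledge organization systems, in the order given on the slide]
• Catalog / ID
• Terms / Glossary
• Thesauri
• Informal Is-a
• Formal Is-a
• Formal Instance
• Frames
• Value Restrictions
• General logical constraints
• Axioms: Disjoint, Inverse Relations, ...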
Knowledge Organization Systems
• Semantic Lexicons – e.g. WordNet
  … group together words according to lexical semantic relations like synonymy, hyponymy, meronymy, antonymy, etc.
• Thesauri
  … group together domain terms according to a set of taxonomic relations, including broader term, narrower term, sibling, etc.
• Semantic Networks and Ontologies
  … group together classes of objects according to a set of relations that originate in the nature of the domain of application.
Ontologies are defined by a formal semantics, but semantic networks may be informally defined. Therefore all ontologies are semantic networks, but not all semantic networks are ontologies.
Thesauri - Examples
MeSH example:
  MeSH Heading  Databases, Genetic
  Entry Term    Genetic Databases
  Entry Term    Genetic Sequence Databases
  Entry Term    OMIM
  Entry Term    Online Mendelian Inheritance in Man
  Entry Term    Genetic Data Banks
  Entry Term    Genetic Data Bases
  Entry Term    Genetic Databanks
  Entry Term    Genetic Information Databases
  See Also      Genetic Screening
EuroVoc example:
  MT   3606 natural and applied sciences
  UF   gene pool
       genetic resource
       genetic stock
       genotype
       heredity
  BT1  biology
  BT2  life sciences
  NT1  DNA
  NT1  eugenics
  RT   genetic engineering (6411)
EuroVoc covers terminology in all of the official EU languages for all fields that concern the EU institutions, e.g., politics, trade, law, science, energy, agriculture; there are 27 such fields in total.
MeSH (Medical Subject Headings) is organized by terms (currently over 250,000), each of which corresponds to a specific medical subject. For each such term a list of syntactic, morphological or semantic variants is given.
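One practical use of such variant lists is term normalization: any entry term is mapped back to its heading. A minimal Python sketch using the MeSH record from this slide; the function names and the case-insensitive matching are assumptions for illustration.

```python
MESH_HEADING = "Databases, Genetic"
ENTRY_TERMS = [
    "Genetic Databases",
    "Genetic Sequence Databases",
    "OMIM",
    "Online Mendelian Inheritance in Man",
    "Genetic Data Banks",
    "Genetic Data Bases",
    "Genetic Databanks",
    "Genetic Information Databases",
]


def build_index(heading, entry_terms):
    """Map the heading and each of its variants (lower-cased) to the heading."""
    index = {heading.lower(): heading}
    for term in entry_terms:
        index[term.lower()] = heading
    return index


INDEX = build_index(MESH_HEADING, ENTRY_TERMS)


def normalize(term):
    """Return the heading for a known variant, or None if the term is unknown."""
    return INDEX.get(term.strip().lower())


print(normalize("genetic databanks"))
print(normalize("OMIM"))
```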
Semantic Networks - Examples
Pharmacologic Substance affects Pathologic Function
Pharmacologic Substance causes Pathologic Function
Pharmacologic Substance complicates Pathologic Function
Pharmacologic Substance diagnoses Pathologic Function
Pharmacologic Substance prevents Pathologic Function
Pharmacologic Substance treats Pathologic Function
Accession: GO:0009292
Ontology: biological process
Synonyms: broad: genetic exchange
Definition: In the absence of a sexual life cycle, the processes involved in the introduction of genetic information to create a genetically different individual.
Term Lineage
  all : all (164142)
    GO:0008150 : biological process (115947)
      GO:0007275 : development (11892)
        GO:0009292 : genetic transfer (69)
GO (Gene Ontology) allows for “consistent descriptions of gene products in different databases, including several of the world’s major repositories for plant, animal and microbial genomes…” Organizing principles are molecular function, biological process and cellular component.
UMLS (Unified Medical Language System) integrates linguistic, terminological and semantic information. The Semantic Network consists of 134 semantic types and 54 relations between types.
Example Ontology
Consider an Example Ontology for the Newspaper Domain
Knowledge Markup
• Ontologies are used to semantically organize and retrieve data (structured, textual, multimedia) through knowledge markup
• Knowledge Markup from Text is based on Named-Entity Recognition, Semantic Tagging (Term to Class Mapping) and Relation Extraction
Consider the following example:
<news:story xmlns:jobs="http://www.jobs.org/owl-jobs#" xmlns:com="http://www.companies.org/owl-companies#" xmlns:it="http://www.it.net/owl-it#">
“We were surprised by several of the results, particularly the order of finish,” said <jobs:SystemsAnalyst>Dan Olds</jobs:SystemsAnalyst>. <com:Company>IBM</com:Company> finished first with very strong results, and <com:Company>HP</com:Company> scored a solid number two; we expected to see <com:Company>Sun Microsystems</com:Company> challenging for first place or at least a strong second place. As the largest <it:operatingsystem>UNIX</it:operatingsystem> vendor in terms of number of installed systems, a third place finish should put their management on notice that their installed base may be vulnerable.
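Once text carries such markup, retrieval by ontology class becomes a simple lookup. The Python sketch below pulls the annotations out of an abridged copy of the example with a regular expression; the pattern and the function name annotations are assumptions for illustration, and a real system would use a proper XML parser on complete, well-formed documents.

```python
import re

# Abridged copy of the marked-up example text.
MARKED_UP = (
    "said <jobs:SystemsAnalyst>Dan Olds</jobs:SystemsAnalyst>. "
    "<com:Company>IBM</com:Company> finished first with very strong results, and "
    "<com:Company>HP</com:Company> scored a solid number two; we expected to see "
    "<com:Company>Sun Microsystems</com:Company> challenging for first place. "
    "As the largest <it:operatingsystem>UNIX</it:operatingsystem> vendor"
)

# <prefix:Class>text</prefix:Class>; the closing tag must repeat the opening one.
TAG = re.compile(r"<(\w+):(\w+)>(.*?)</\1:\2>")


def annotations(text):
    """Return (ontology prefix, class, annotated text) for every inline tag."""
    return [match.groups() for match in TAG.finditer(text)]


for prefix, cls, mention in annotations(MARKED_UP):
    print(prefix, cls, mention)

# Retrieval by class: all mentions annotated as com:Company.
companies = [m for p, c, m in annotations(MARKED_UP) if (p, c) == ("com", "Company")]
print(companies)
```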
Knowledge Markup - Images
Semantic Annotation of Medical Images
(miAKT Project - UK)
Knowledge Markup - Images
Semantic Annotation of Video
(SmartMedia – DFKI KM)
Ontology Life-Cycle
• Create/Select: Development and/or Selection
• Populate: Knowledge Base Generation
• Validate: Consistency Checks
• Evolve: Extension, Modification
• Maintain: Usability Tests
• Deploy: Knowledge Retrieval
Knowledge Extraction
Ontology Population & Ontology Learning
Ontology Life-Cycle – Ontology Population
Ontology Population with SOBA
SOBA: SmartWeb Ontology-based Annotation
• Application Context
  – SmartWeb (http://www.smartweb-projekt.de/) – German Project around World-Cup 2006
  – Integrates
      – Multimodal Dialog Processing
      – IR-based Question Answering
      – Ontology-Based Information Extraction
      – Semantic Web Services
• Ontology-Based Information Extraction … Combines:
  – Semantic Wrapping of Semi-Structured Data
  – Semantic and Linguistic Annotation of Free Text
  – Inference Rules for Instantiation and Integration of Annotated Entities and Events
• … and Display
  – Ontology-driven Hyperlink Generation for Display of Extracted Information
SOBA – Processing and Data Flow
[Figure: data flow between Documents, Ontologies and the Knowledge Base. Processing components: Wrapping of Semi-Structured Data; PDF Analysis; Image Extraction; Linguistic Annotation; Named Entity Recognition & Semantic Tagging; Inference Rules for Instantiation & Integration.]
SWIntO: SmartWeb Integrated Ontology
[Figure: ontology excerpt with the nodes SmartDOLCE:Entity, SmartSUMO:Attribute, SmartSUMO:SocialRole, SmartSUMO:Proposition, SportEvent:FootballPlayer, SportEvent:Goalkeeper, SportEvent:FootballOrganizationPerson, SportEvent:FootballClubPresident, …]
SWIntO (by AIFB, DFKI KM/IUI, EML) covers
• Foundational (DOLCE) and General (SUMO) Knowledge
• Domain- and Task-Specific Knowledge
  – Football / Sport Events
  – Navigation, Discourse, Multimedia
  – other
SmartWeb Integrated Ontology (by AIFB, DFKI KM/IUI, EML)
SmartWeb Corpus
• (Growing) Web Corpus through Monitor on
  – http://fifaworldcup.yahoo.com/
  – http://www.uefa.com/competitions/worldcup
• Semi-Structured Data
  – Tabular: Match Reports, Teams, etc.
• Free Text
  – Match Reports
  – Image Captions
Semi-Structured Data - HTML
Semi-Structured Data - XML
Semi-Structured Data – F-Logic
Information Extraction from Free Text
[Figure labels: MatchEvent [Score, Team1, Team2]; FootballPlayer]
Information Extraction from Image Captions
[Figure labels: FoulEvent [FootballPlayer]; FootballPlayer]
Linguistic and Semantic Annotation
Mark Crossley saved twice with his legs from Huckerby.
Named Entity Recognition & Semantic Tagging
[Mark Crossley GOALKEEPER] [saved GOALKEEPER_ACTION] twice with his legs from [Huckerby PLAYER].
Linguistic Annotation
[Mark Crossley GOALKEEPER : SUBJ] [saved PRED : GOALKEEPER_ACTION] twice [with his legs PP_OBJ] [from [Huckerby PLAYER] PP_ADJUNCT].
[ GOALKEEPER_ACTION = 'save', GOALKEEPER = 'Mark Crossley', PLAYER = 'Huckerby', MANNER = 'legs' ]
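The step from the annotated sentence to the attribute-value structure can be sketched in Python as follows. The chunk representation, the lemma table and the rule that the last word of the untagged PP object gives the MANNER are assumptions made for this illustration, not the actual SOBA rules.

```python
# Chunks of the example sentence: (text, semantic tag, grammatical function).
CHUNKS = [
    ("Mark Crossley", "GOALKEEPER", "SUBJ"),
    ("saved", "GOALKEEPER_ACTION", "PRED"),
    ("with his legs", None, "PP_OBJ"),
    ("Huckerby", "PLAYER", "PP_ADJUNCT"),
]

# Hypothetical lemma lookup for inflected forms.
LEMMAS = {"saved": "save"}


def fill_template(chunks, lemmas):
    """Collect semantically tagged chunks into an attribute-value structure."""
    template = {}
    for text, tag, function in chunks:
        if tag is not None:
            # Tagged chunks fill the slot named after their semantic tag.
            template[tag] = lemmas.get(text, text)
        elif function == "PP_OBJ":
            # Assumption: the last word of an untagged PP object names the manner.
            template["MANNER"] = text.split()[-1]
    return template


template = fill_template(CHUNKS, LEMMAS)
print(template)
```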
Annotation/Extraction Example
Example Sentence from Match Report
Allerdings ist Petrow fuer die Partie gegen Schweden gesperrt und kann erst gegen Ungarn eingesetzt werden.
“However, Petrow is suspended for the match against Sweden and cannot be fielded again until the match against Hungary.”
Annotated/Extracted Information (with SProUT IE Tool - DFKI-LT)
player_action & [GAME_EVENT "Ban",
                 AGENT player & [SURNAME "PETROW"],
                 IN_MATCH game & [TEAM2 "SWE", TOURNAMENT "Match"]]
team & [NAME "HUN"]
Knowledge Base Generation
Transformation of SProUT Output to F-Logic via Declarative Mappings, e.g.:
<type orig="player" target="dolce#natural-person-denomination">
  <link type="dolce#natural-person" method="dolce#HAS-DENOMINATION" id=""/>
  <map>
    <simple-mapping>
      <input>
        <arg orig="GIVEN_NAME" target="VAR1"/>
      </input>
      <output method="dolce#FIRSTNAME" value="VAR1"/>
    </simple-mapping>
    <simple-mapping>
      <input>
        <arg orig="SURNAME" target="VAR1"/>
      </input>
      <output method="dolce#LASTNAME" value="VAR1"/>
    </simple-mapping>
  </map>
</type>
SProUT to F-Logic
SProUT:
<FS type="player_action">
  <F name="GAME_EVENT"><FS type="world champion"/></F>
  <F name="ACTION_TIME"><FS type="1990"/></F>
  <F name="ACTION_LOCATION"><FS type="Italy"/></F>
  <F name="AGENT">
    <FS type="player">
      <F name="SURNAME"><FS type="Buchwald"/></F>
      <F name="GIVEN_NAME"><FS type="Guido"/></F>
    </FS>
  </F>
</FS>
F-Logic:
soba#player124:sportevent#FootballPlayer
  [sportevent#impersonatedBy -> soba#Guido_BUCHWALD].
soba#Guido_BUCHWALD:dolce#"natural-person"
  [dolce#"HAS-DENOMINATION" -> soba#Guido_BUCHWALD_Denomination].
soba#Guido_BUCHWALD_Denomination:dolce#"natural-person-denomination"
  [dolce#LASTNAME -> "Buchwald"; dolce#FIRSTNAME -> "Guido"].
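The transformation can be imitated in a few lines of Python. This is only a sketch under assumptions: the feature structure is given as a plain dictionary, the feature-to-method pairs are copied from the declarative mapping on the previous slide, the naming scheme for instance identifiers is read off the example output, and player_to_flogic is a hypothetical helper, not part of SOBA.

```python
# Feature-to-method pairs taken from the declarative mapping, in output order.
NAME_MAPPING = [("SURNAME", "dolce#LASTNAME"), ("GIVEN_NAME", "dolce#FIRSTNAME")]


def player_to_flogic(player, player_id):
    """Render a player feature structure as three F-Logic statements."""
    # Identifier scheme assumed from the slide: given name, then surname in capitals.
    person = "soba#{}_{}".format(player["GIVEN_NAME"], player["SURNAME"].upper())
    denomination = person + "_Denomination"
    names = "; ".join(
        '{} -> "{}"'.format(method, player[feature]) for feature, method in NAME_MAPPING
    )
    return [
        "{}:sportevent#FootballPlayer [sportevent#impersonatedBy -> {}].".format(
            player_id, person
        ),
        '{}:dolce#"natural-person" [dolce#"HAS-DENOMINATION" -> {}].'.format(
            person, denomination
        ),
        '{}:dolce#"natural-person-denomination" [{}].'.format(denomination, names),
    ]


statements = player_to_flogic(
    {"SURNAME": "Buchwald", "GIVEN_NAME": "Guido"}, "soba#player124"
)
for statement in statements:
    print(statement)
```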
A Complex Example
semistruct#"Bolivien_vs_Brasilien_09_Oct_05_16_00_Luis_CRISTALDO":sportevent#FieldMatchFootballPlayer [
  externalRepresentation@(de) ->> "Luis CRISTALDO (7)";
  sportevent#number -> 7;
  sportevent#impersonatedBy -> semistruct#"Luis_CRISTALDO"
].
semistruct#"Bolivien_vs_Brasilien_09_Oct_05_16_00" [
  sportevent#matchEvents -> soba#ID25
].
soba#ID25:sportevent#Foul [
  sportevent#commitedBy -> semistruct#"Bolivien_vs_Brasilien_09_Oct_05_16_00_Luis_CRISTALDO"
].
mediainst#ID67:media#Picture [
  media#URL -> "http://fifaworldcup.yahoo.com/06/de/photos/index.html?aid=124155&d=1";
  media#shows -> ID25
].
Display of Extracted Information
Ontology Life-Cycle – Ontology Learning
Ontology Learning Layer Cake
Introduced in: Philipp Cimiano, PhD Thesis University of Karlsruhe, forthcoming
Layers, from bottom to top, with examples:
• Terms: disease, doctor, hospital
• (Multilingual) Synonyms: {disease, illness, Krankheit}
• Concepts: DISEASE:=<Int, Ext, Lex>
• Taxonomy: is_a(DOCTOR, PERSON)
• Relations: cure(dom:DOCTOR, range:DISEASE)
• Rules & Axioms: $\forall x, y\,(\mathit{sufferFrom}(x, y) \rightarrow \mathit{ill}(x))$
Some Current Work on Ontology Learning from Text
• Term Extraction
  – Statistical Analysis
  – Patterns
  – (Shallow) Linguistic Parsing
  – Term Disambiguation & Compositional Interpretation
  – Combinations
• Taxonomy Extraction
  – Statistical Analysis & Clustering (e.g. FCA)
  – Patterns
  – (Shallow) Linguistic Parsing
  – WordNet
  – Combinations
• Relation Extraction
  – Anonymous Relations (e.g. with Association Rules)
  – Named Relations (Linguistic Parsing)
  – (Linguistic) Compound Analysis
  – Web Mining, Social Network Analysis
  – Combinations
• Definition Extraction
  – (Linguistic) Compound Analysis (incl. WordNet)
Overview of Current Work: Paul Buitelaar, Philipp Cimiano, Bernardo Magnini: Ontology Learning from Text: Methods, Evaluation and Applications. Frontiers in Artificial Intelligence and Applications Series, Vol. 123, IOS Press, July 2005.
RelExt - Relation Extraction for Ontology Learning
[Figure: Ontology Learning Layer Cake (Terms, (Multilingual) Synonyms, Concepts, Taxonomy, Relations, Rules & Axioms)]
RelExt - Motivation
• Extend Ontology with Relations
  – Currently ~ 60 Relations in the Sport Events Ontology
  – Mostly Properties, e.g. hasName, atMinute, …
• Representation of (Verbal) Relations Enables Better Modeling of Events for Information Extraction Purposes
Example
“Ballack shoots the ball in the net.”
Relation: Shoot (Domain: FootballPlayer, Range: BallObject)
RelExt – System Architecture
[Figure: processing pipeline in three stages]
• Linguistic Annotation
  – Corpus
  – Named-Entity Rec. & Semantic Tagging
  – Shallow Parsing
  – Annotated Corpus
• Statistical Processing
  – Relevance Measure (Frequencies in BNC, NZZ)
  – Relevance Scores: Heads, Preds
  – Co-occurrence Measure
  – Co-occurrence Scores: Heads <> Preds
• Relation Extraction and Evaluation
  – Triple Generation
  – Triples: Head : Pred : Head
  – Evaluation
Linguistic Annotation
• Named-Entity Recognition
  – “Michael Ballack” : FootballPlayer
• Semantic Tagging
  – “Ball” (ball), “Leder” (leather) : BallObject
• Shallow Parsing
  – Part-of-Speech Tagging
      Fussballspieler (soccer player): Noun
  – Morphological Analysis
      Fussballspieler: Fussball – Spieler
  – Dependency Structure Analysis
      “The team won the second match.”
      [The team SUBJECT] [won PREDICATE] [the second match DIRECT_OBJECT]
Relevance Ranking
Top-10 Head-Nouns before and after mapping to Ontology Classes

Before mapping (head nouns):
Rank  Relevance Score  Headnoun                  Frequency
1     125245.24        Ball (ball)               6849
2     121888.52        Tor (goal)                7767
3     95003.21         Meter (meters)            5967
4     64157.18         Schuss (shot / drive)     3575
5     57185.76         Ecke (corner)             3132
6     45474.96         Strafraum (penalty area)  2298
7     34668.11         Freistoss (freekick)      1752
8     30017.75         Leder (leather / ball)    1561
9     27989.09         Flanke (cross)            1479
10    27414.66         Pfosten (post)            1457

After mapping (ontology classes):
Rank  Relevance Score  Concept Label   Frequency
1     565510.99        FOOTBALLPLAYER  28494
2     162137.82        GOALOBJECT      8188
3     143528.88        BALLOBJECT      7249
4     138535.44        GOALKEEPER      6887
5     70814.86         SHOT            3578
6     49018.16         TEAM            2477
7     45474.96         PENALTYAREA     2298
8     34668.11         FREEKICK        1752
9     29324.54         WING            1482
10    28829.78         POST            1457

Top-10 Predicates
Rank  Relevance Score  Predicate                      Frequency
1     27167.41         flanken (to cross)             1373
2     22045.39         klaeren (to clear)             1435
3     21908.37         schiessen (to shoot)           1503
4     20439.09         koepfen (to head)              1033
5     16342.99         lassen (to let / to leave)     826
6     9563.41          ziehen (to pull / to drag)     1548
7     9468.57          passen (to pass / to play)     814
8     7752.84          spielen (to play / to pass)    1559
9     7653.68          lenken (to divert)             537
10    7637.45          parieren (to parry / to save)  405
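The slides do not spell out how the relevance scores are computed. One common way to compare a term's frequency in a domain corpus with its frequency in a reference corpus (such as BNC or NZZ) is a chi-square statistic over a 2x2 contingency table. The Python sketch below assumes that choice; the function name and all counts in the usage lines are invented for illustration and are not the figures from the tables.

```python
def chi_square(freq_domain, size_domain, freq_reference, size_reference):
    """Chi-square statistic for a 2x2 table: term vs. other tokens, domain vs. reference.

    Assumes the term occurs at least once overall and does not make up a whole corpus,
    so that no expected count is zero.
    """
    observed = [
        [freq_domain, size_domain - freq_domain],
        [freq_reference, size_reference - freq_reference],
    ]
    total = size_domain + size_reference
    row_totals = [size_domain, size_reference]
    col_totals = [freq_domain + freq_reference, total - freq_domain - freq_reference]
    score = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if the term were equally frequent in both corpora.
            expected = row_totals[i] * col_totals[j] / total
            score += (observed[i][j] - expected) ** 2 / expected
    return score


# Invented counts: (frequency in domain corpus, frequency in reference corpus),
# both corpora with 1000 tokens.
counts = {"Ball": (100, 10), "Jahr": (50, 50), "Tor": (20, 10)}
ranking = sorted(
    counts, key=lambda term: chi_square(counts[term][0], 1000, counts[term][1], 1000),
    reverse=True,
)
print(ranking)
```

The statistic is symmetric, so a term that is unusually rare in the domain corpus also scores high; a ranking for domain relevance would additionally check the direction of the difference.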
Co-Occurrence Analysis
Rank  Relevance Score  Concept Label   Frequency
1     565510.99        FOOTBALLPLAYER  28494
2     162137.82        GOALOBJECT      8188
3     143528.88        BALLOBJECT      7249
4     138535.44        GOALKEEPER      6887
…

Rank  Relevance Score  Predicate             Frequency
1     27167.41         flanken (to cross)    1373
2     22045.39         klaeren (to clear)    1435
3     21908.37         schiessen (to shoot)  1503
4     20439.09         koepfen (to head)     1033
…

Co-occurrences of predicates with heads:
flanken SUBJ:FOOTBALLPLAYER “Klasnic”
flanken DOBJ:FOOTBALLPLAYER “Klose”
flanken_in PP_ADJ “Zuschauer” (audience)
…
beschimpfen (to insult) SUBJ:FOOTBALLPLAYER “Klasnic”
…
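How co-occurrence observations of this kind could be turned into Head : Pred : Head triples is sketched below in Python. The slides do not name the co-occurrence measure, so this sketch simply takes the most frequent ontology class per argument slot; the observations and function names are invented for illustration.

```python
from collections import Counter

# (predicate, grammatical function, ontology class) observations; invented data.
OBSERVATIONS = [
    ("flanken", "SUBJ", "FOOTBALLPLAYER"),
    ("flanken", "SUBJ", "FOOTBALLPLAYER"),
    ("flanken", "DOBJ", "BALLOBJECT"),
    ("flanken", "DOBJ", "BALLOBJECT"),
    ("flanken", "DOBJ", "FOOTBALLPLAYER"),
    ("beschimpfen", "SUBJ", "FOOTBALLPLAYER"),
]


def best_argument(observations, predicate, function):
    """Most frequent class in the given slot of the predicate, with its relative frequency."""
    counts = Counter(c for p, f, c in observations if p == predicate and f == function)
    if not counts:
        return None
    cls, n = counts.most_common(1)[0]
    return cls, n / sum(counts.values())


def generate_triples(observations):
    """Build Head : Pred : Head triples for predicates seen with a subject and an object."""
    triples = []
    for predicate in sorted({p for p, _, _ in observations}):
        subj = best_argument(observations, predicate, "SUBJ")
        dobj = best_argument(observations, predicate, "DOBJ")
        if subj and dobj:
            triples.append((subj[0], predicate, dobj[0]))
    return triples


print(generate_triples(OBSERVATIONS))
```

In this toy data beschimpfen yields no triple because it was never observed with a direct object.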
Integration into Ontology Development
OntoLT – Protégé Plug-In for Ontology Extraction from Text
[Figure: Ontology Learning Layer Cake (Terms, (Multilingual) Synonyms, Concepts, Taxonomy, Relations, Rules & Axioms)]
OntoLT – Basic Idea
• Middleware Solution in Ontology Development
  – Supports the Ontology Engineer through Semi-Automatic Extraction of Ontology Fragments from Domain-Relevant Document Collections
  – Download: http://olp.dfki.de/OntoLT/OntoLT.htm
• Based on
  – Automatic Linguistic Annotation
  – Manual Definition of Mapping Rules
  – Statistical Preprocessing (Option)
  – Interactive Validation of Candidates
  – Generation in Protégé of Ontology Fragments
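The idea of a mapping rule from linguistic structure to ontology fragments can be illustrated with a small Python sketch. The rule shown here (head noun becomes a class, modifier plus head noun becomes a subclass) and the input phrases are hypothetical examples; they stand in for rules an ontology engineer would define and are not the OntoLT rule language itself.

```python
# Hypothetical linguistic annotation: noun phrases as (modifier or None, head noun).
PHRASES = [("semantic", "web"), (None, "ontology"), ("domain", "ontology")]


def head_to_class_mod_to_subclass(phrases):
    """Map each head noun to a class and each modifier+head pair to a subclass of it."""
    classes = {}
    for modifier, head in phrases:
        subclasses = classes.setdefault(head.capitalize(), set())
        if modifier:
            subclasses.add(modifier.capitalize() + head.capitalize())
    return classes


candidates = head_to_class_mod_to_subclass(PHRASES)
for cls in sorted(candidates):
    print(cls, sorted(candidates[cls]))
```

In the workflow listed above, such candidates would still go through interactive validation before they are generated as classes in Protégé.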
OntoLT – System Architecture
[Figure: system architecture]
• Corpus → Linguistic Annotation → Annotated Corpus (XML)
• Definition of Mappings → Mappings: XML (Linguistic Structure) <=> Protégé (Classes, Slots)
• OntoLT: Extraction → Extracted Ontology
• Protégé: Edit Extracted Ontology
Corpus Example – KMI News
Statistical Relevance
Extract Candidates
Generate Ontology Fragments