© Paul Buitelaar: eJustice Presentation, July 15th, 2004
Ontologies: Contributions from Language Technology
Paul Buitelaar
DFKI GmbH
Language Technology Lab
DFKI Competence Center Semantic Web
Saarbrücken, Germany
Overview
Ontologies and the Semantic Web
  Semantic Web Intro
  Ontologies and Knowledge Markup
  Ontology Development
  Ontology Lifecycle & Language Technology
Language Technology
  Levels of Automatic Linguistic Analysis
Ontologies in Multilingual Information Access
  A Medical Example: MuchMore Project
  Semantic Resources in the Medical Domain
  Demo MuchMore System
  Language Technology in Annotation and Indexing
Conclusions
  MuchMore for the Legal Domain…
Semantic Web
[Diagram labels: Semantic Web; Intelligent Man-Machine Interface; Knowledge Markup; Ontologies; Semantic Web Services]
Knowledge Markup
Ontology-based Knowledge Markup
  Semantic Metadata
    Metadata, e.g. Dublin Core: Title, Author, etc.
    Semantic: Formal Properties of Objects of Class Author
  <xmlns:jobs="http://www.jobs.org/daml+oil-jobs-ontology#">
  <jobs:systems-analyst>John Smith</jobs:systems-analyst>
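As a minimal sketch of how such markup could be read by a program: the snippet below wraps the slide's `jobs:systems-analyst` element in a hypothetical `page` root element (an assumption, since the slide shows only a fragment) and uses Python's standard `xml.etree.ElementTree` to recover the ontology namespace, the class name, and the marked-up text.

```python
import xml.etree.ElementTree as ET

JOBS_NS = "http://www.jobs.org/daml+oil-jobs-ontology#"

# Hypothetical wrapper element "page"; only the jobs:systems-analyst
# element and the namespace URI come from the slide.
DOC = (
    f'<page xmlns:jobs="{JOBS_NS}">'
    "<jobs:systems-analyst>John Smith</jobs:systems-analyst>"
    "</page>"
)

def extract_markup(xml_text):
    """Return (namespace, class name, text) for every namespaced element."""
    root = ET.fromstring(xml_text)
    results = []
    for element in root.iter():
        # ElementTree expands a prefixed tag to "{namespace}localname".
        if element.tag.startswith("{"):
            namespace, local_name = element.tag[1:].split("}", 1)
            results.append((namespace, local_name, (element.text or "").strip()))
    return results

markup = extract_markup(DOC)
```

Here `markup` holds one entry that ties the string "John Smith" to the class `systems-analyst` in the jobs ontology, which is the kind of semantic metadata the slide describes.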
Semantic Web Architecture
Layered Architecture (Tim Berners-Lee)
Knowledge Markup Languages
XML (Syntax)
  XML Schema, Namespaces
    Interpretation Context
RDF (Semantics)
  RDF Schema
    Formalization: Classes (Inheritance), Properties
  OWL (DAML+OIL)
    Formalization: Classes, Class Definitions, Properties, Property Types (e.g. Transitivity), Data Types
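The two formalization levels can be illustrated with a small sketch. Class inheritance (RDF Schema level) and a property declared transitive (OWL level) both license inferences that follow chains of statements. The class and property names below are hypothetical and not taken from any real ontology; the closure routine is a plain Python illustration, not an RDF or OWL reasoner.

```python
def transitive_closure(pairs):
    """Return every (a, c) implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# RDF Schema level: class inheritance (hypothetical classes).
SUBCLASS_OF = {("SystemsAnalyst", "Analyst"), ("Analyst", "Employee")}

# OWL level: a property declared transitive (hypothetical property part_of).
PART_OF = {("motherboard", "computer"), ("computer", "workstation")}

inherited = transitive_closure(SUBCLASS_OF)
parts = transitive_closure(PART_OF)

def superclasses(cls):
    """All direct and inherited superclasses of a class, sorted by name."""
    return sorted(sup for sub, sup in inherited if sub == cls)
```

With these toy statements, `superclasses("SystemsAnalyst")` includes `Employee`, and `parts` contains the inferred pair `("motherboard", "workstation")`.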
Ontologies: Basic Idea
Definition
  “… Explicit, Formal Specification of a Shared Conceptualization of a Domain of Interest”
  T. Gruber, Towards principles for the design of ontologies used for knowledge sharing. Int. J. of Human and Computer Studies, 1994
Purpose
  Knowledge Sharing (e.g. between Agents)
  Inference (over Sets of Instances)
Related Areas, e.g.
  Terminologies, Controlled Vocabulary, Thesauri, Taxonomies, Semantic Lexicons, Wordnets, etc.
  Conceptual Models, Schemas, etc.
Ontologies: Applications, e.g.
Semantic Web Services
  Interoperability for (Semantic) Web Services
Intelligent Agents
  Domain Models for Intelligent Agents
Text Interpretation
  Ontology-aware Information Extraction
Multimedia Integration
  Ontology-based Alignment of Extracted Objects in Text, Audio, Video
Intelligent Search/Navigation
  Ontology-based Indexing in Web-Retrieval
Ontologies: Development
Ontology Editor / KB Management
  Most Widely Used: Protégé (Stanford University, Medical Informatics, USA)
    Originally for Development and Maintenance of Medical Expert Systems
  Other, e.g.
    KAON: University of Karlsruhe - AIFB, Germany
    WebODE: UPM – Ontology Group, Madrid, Spain
    WebOnto: Open University - KMI, UK
  Overview at XML.com by Michael Denny: Ontology Building: A Survey of Editing Tools
Class Hierarchy
Slot Descriptions
http://dmag.upf.es/ontologies/2003/12/ipronto.owl
Ontology Lifecycle
Creating
Populating
Validating
Evolving
Maintaining
Deploying
LT in the Ontology Lifecycle
Language Technology (LT) for Ontology:
  Language Technology = Automated Linguistic Analysis
Creating & Evolving
  Linguistic Analysis to Extract Classes / Relations
Populating (Knowledge Base Generation)
  Linguistic Analysis to Extract Instances
[Diagram labels: Documents (Text); Ontology (Knowledge); Classes, Relations/Properties; Instances]
Linguistic Analysis: Example
The Dell computer with a flat screen had to be rejected because of a failure in the motherboard.
[Diagram labels: Dell computer; flat screen; motherboard; has-a; has-a; reject; failure; location-of; animate-entity]
Part-of-Speech, Morphology
Part-of-Speech
  e.g.: noun, verb, adjective, preposition, …
  PoS tag sets may have between 10 and 50 (or more) tags
Morphology
  Most languages have inflection and declension, e.g.:
    Singular/Plural: computer, computers
    Present/Past: reject, rejected
  Many languages also have complex (de)composition, e.g.:
    Flachbildschirm (flat screen) > flach + Bildschirm > flach + Bild + Schirm
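The decomposition step can be sketched as a lexicon-driven split. The function below is a minimal illustration, assuming a tiny hand-made lexicon (hypothetical, four entries); it enumerates every way to cut a compound into known words. It ignores linking elements and inflection, which a real German decompounder would have to handle.

```python
# Toy lexicon, assumed for illustration only.
LEXICON = {"flach", "bild", "schirm", "bildschirm"}

def segmentations(word, lexicon):
    """Return all ways to split `word` into a sequence of lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one way to segment the empty string: no parts
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            # Keep this prefix and segment the remainder recursively.
            for rest in segmentations(word[i:], lexicon):
                results.append([head] + rest)
    return results

splits = segmentations("Flachbildschirm", LEXICON)
```

For this lexicon, `splits` contains both levels shown on the slide: `flach + bildschirm` and `flach + bild + schirm`.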
Phrases, Terms, Named Entities
Semantic Units
  Phrases (e.g. nominal - NP, prepositional - PP)
    NP: a flat screen
    PP: with a flat screen
    NP (recursive): the Dell computer with a flat screen; a failure in the motherboard
  Terms (domain-specific phrases)
    Dell computer
    Dell computer with a flat screen
  Named Entities (phrases corresponding to dates, names, …)
    COMPANY: Dell
    COMPANY: Dell Computer Corporation
    PERSON: Michael Dell
Dependency Structure
Semantic Structure
  Dependencies between Predicates and Arguments
the Dell computer with a flat screen had to be rejected
  PRED: reject
  ARG1: ENTITY
  ARG2: ‘the Dell computer with a flat screen’
‘Logical Form’: reject(x,y) & animate-entity(x) & computer(y) & …
The Dell computer that has been rejected was claimed to have suffered from handling.
  reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)
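To show how a logical form like the one above links predicates through shared variables, here is a minimal sketch that parses the conjunction into (predicate, arguments) pairs and looks up everything asserted about one variable. The helper names `parse_logical_form` and `predicates_about` are hypothetical, and the regular expression only covers the flat notation used on the slide.

```python
import re

LOGICAL_FORM = (
    "reject(e1,x1,y1) & animate-entity(x1) & Dell_computer(y1) & "
    "claim(e2,x2,e3) & animate-entity(x2) & suffer_from(e3,y1,y2) & handling(y2)"
)

def parse_logical_form(text):
    """Split a flat conjunction into (predicate, argument tuple) pairs."""
    predicates = []
    # A predicate name (letters, digits, "_" or "-") followed by "(args)".
    for name, args in re.findall(r"([\w-]+)\s*\(([^)]*)\)", text):
        predicates.append((name, tuple(a.strip() for a in args.split(","))))
    return predicates

def predicates_about(predicates, variable):
    """Names of all predicates that mention the given variable."""
    return [name for name, args in predicates if variable in args]

parsed = parse_logical_form(LOGICAL_FORM)
about_y1 = predicates_about(parsed, "y1")
```

The variable `y1` appears in `reject`, `Dell_computer`, and `suffer_from`, so the rejected object and the thing that suffered from handling are identified as the same Dell computer.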
MuchMore Project
Demonstration Prototype
  Real-Life Medical Scenario for Cross-Lingual Information Retrieval
Research & Development
  Combined Data- and Knowledge-Driven
Performance Evaluation
  Performance Comparison of Existing and Novel Methods
http://muchmore.dfki.de
Semantic Resources
General
  WordNet (EN), GermaNet (DE), EuroWordNet (“linked”)
Medical Domain
  UMLS: Unified Medical Language System
    Medical MetaThesaurus (only MeSH2001 is used)
      English, German, Spanish, …
      730,000 Concepts
      9 Relations (Broader, Narrower, …)
    Semantic Network
      134 Semantic Types
      54 Semantic Relations
MetaThesaurus, SemNet
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|ENG|S|L0020128|VS|S0049756|Human Immunodeficiency Virus|0|
C0019682|ENG|S|L0020128|VWS|S0098727|Virus, Human Immunodeficiency|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
Concept Names: 1,734,706
  ENGLISH 1,462,202
  GERMAN 66,381
  other languages
Each CUI (Concept Unique Identifier) is mapped to one out of 134 Semantic Types or TUI (Type Unique Identifier)
  Clozapine: C0009079
  Pharmacologic Substance: T121
Semantic Types are organized in a Network through 54 Relations
  T121|T154|T047
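The pipe-delimited concept-name rows above can be read with a few lines of code. The sketch below is an illustration only: the field names (`cui`, `lat`, `ts`, `lui`, `stt`, `sui`, `str`, `lrl`) are an assumption based on the column layout of the UMLS concept-names file and are not stated on the slide. It groups the names of one concept by language and picks the row marked `P` as the preferred term.

```python
from collections import defaultdict

ROWS = """\
C0019682|ENG|P|L0019682|PF|S0048631|HIV|0|
C0019682|ENG|S|L0020103|PF|S0049688|HTLV-III|0|
C0019682|FRE|P|L0168651|PF|S0233132|HIV|3|
C0019682|FRE|S|L0206547|PF|S0277133|VIRUS IMMUNODEFICIENCE HUMAINE|3|
C0019682|GER|P|L0413854|PF|S0538136|HIV|3|
C0019682|GER|S|L1261793|PF|S1503739|Humanes T-Zell-lymphotropes Virus Typ III|3|
""".splitlines()

# Assumed field names for the eight columns (not given on the slide).
FIELDS = ("cui", "lat", "ts", "lui", "stt", "sui", "str", "lrl")

def parse_row(line):
    """Turn one pipe-delimited row into a dict keyed by FIELDS."""
    return dict(zip(FIELDS, line.rstrip("|").split("|")))

records = [parse_row(line) for line in ROWS]

names_by_language = defaultdict(list)
preferred_terms = {}
for record in records:
    names_by_language[record["lat"]].append(record["str"])
    if record["ts"] == "P":  # "P" assumed to mark the preferred term
        preferred_terms[record["lat"]] = record["str"]
```

Because every row carries the same CUI, the English, French, and German names all resolve to one language-independent concept, which is what makes the MetaThesaurus usable for cross-lingual retrieval.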
Annotation & Indexing
Token (with Part-of-Speech)
  German: Kreuzbandes
  English: ligaments
Lemma (or Sequence of Lemmas - Decomposition)
  German: Faserknorpel > Faser + Knorpel
  English: ligament
UMLS Concept Code and Semantic Type
  ligament: C0022745_T030
MeSH Code
  A2.513
Semantic Relation (over a Pair of UMLS Concepts)
  C0022745_T030 interconnects C0047693_T065
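The annotation layers listed for this slide can be sketched as a small data structure plus a concept-based index. Everything below is a hedged illustration: the `Annotation` class, the document identifiers, and the assumption that the German and English tokens were annotated with the same concept code are hypothetical and do not describe the MuchMore implementation.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    token: str
    lemmas: tuple  # one lemma, or several after decomposition
    cui: str       # UMLS concept code
    tui: str       # UMLS semantic type

def split_concept_code(code):
    """Split a combined code such as 'C0022745_T030' into (CUI, TUI)."""
    cui, tui = code.split("_", 1)
    return cui, tui

def build_index(documents):
    """Map each concept code to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, annotations in documents.items():
        for annotation in annotations:
            index[annotation.cui].add(doc_id)
    return index

cui, tui = split_concept_code("C0022745_T030")
documents = {
    # Hypothetical documents; the German lemma is assumed for illustration.
    "doc-en-1": [Annotation("ligaments", ("ligament",), cui, tui)],
    "doc-de-1": [Annotation("Kreuzbandes", ("Kreuzband",), cui, tui)],
}
index = build_index(documents)
```

Indexing by concept code rather than by surface token is what lets a query in one language retrieve documents in another: both hypothetical documents end up under `C0022745`.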
Relations
UMLS Semantic Network specifies 54 types of relations between 134 semantic types
  Pharmacologic Substance affects Cell Function
Relations are generic and potentially false
  Therapeutic Procedure method_of Occupation, Discipline
  *discectomy method_of history
Relations are ambiguous
  Therapeutic Procedure prevents Neoplastic Process
  Therapeutic Procedure complicates Neoplastic Process
  Therapeutic Procedure affects Neoplastic Process
  Therapeutic Procedure treats Neoplastic Process
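The ambiguity problem can be made concrete with a small lookup table. The sketch below stores only the type-level relations quoted on this slide (a tiny excerpt, not the full Semantic Network) and returns every relation licensed between two semantic types; the function name `candidate_relations` is hypothetical.

```python
from collections import defaultdict

# Type-level relations quoted on the slide: (subject type, relation, object type).
SEMANTIC_NETWORK = [
    ("Pharmacologic Substance", "affects", "Cell Function"),
    ("Therapeutic Procedure", "method_of", "Occupation, Discipline"),
    ("Therapeutic Procedure", "prevents", "Neoplastic Process"),
    ("Therapeutic Procedure", "complicates", "Neoplastic Process"),
    ("Therapeutic Procedure", "affects", "Neoplastic Process"),
    ("Therapeutic Procedure", "treats", "Neoplastic Process"),
]

relations_by_pair = defaultdict(set)
for subject_type, relation, object_type in SEMANTIC_NETWORK:
    relations_by_pair[(subject_type, object_type)].add(relation)

def candidate_relations(subject_type, object_type):
    """All relations the network allows between two semantic types, sorted."""
    return sorted(relations_by_pair.get((subject_type, object_type), ()))

ambiguous = candidate_relations("Therapeutic Procedure", "Neoplastic Process")
```

For a concrete procedure and a concrete neoplastic process, the type-level network alone offers four candidates, so choosing the right relation needs further evidence from the text.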
Example
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.
Example: Terms/Concepts
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.
Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations
Example: Relations
Discontinuation of heparin is a simple and essential maneuvre, and anticoagulation has to be continued by alternative drugs.
Terms:
  C0019134 Heparin
  C0005790 Blood coagulation tests
  C0013227 Pharmaceutical preparations
Relations:
  C0019134 interacts_with C0013227
  C0005790 analyses C0019134
  C0005790 analyses C0013227
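As a small illustration of how the concept-level relations in this example connect back to the terms, the sketch below resolves each concept code to its term and renders the relation triples in readable form. The codes, terms, and relations are those given on the slide; the helper names are hypothetical.

```python
TERMS = {
    "C0019134": "Heparin",
    "C0005790": "Blood coagulation tests",
    "C0013227": "Pharmaceutical preparations",
}

RELATIONS = [
    ("C0019134", "interacts_with", "C0013227"),
    ("C0005790", "analyses", "C0019134"),
    ("C0005790", "analyses", "C0013227"),
]

def verbalize(relations, terms):
    """Render each (subject, relation, object) triple using term names."""
    return [f"{terms[s]} {relation} {terms[o]}" for s, relation, o in relations]

def relations_involving(cui, relations):
    """All triples in which the concept appears as subject or object."""
    return [triple for triple in relations if cui in (triple[0], triple[2])]

readable = verbalize(RELATIONS, TERMS)
heparin_relations = relations_involving("C0019134", RELATIONS)
```

The first entry of `readable` is "Heparin interacts_with Pharmaceutical preparations", and heparin takes part in two of the three relations.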
Conclusions
MuchMore for the Legal Domain…
Resources
  Legal Domain Ontology with…
  …Large-scale Terminology for Multiple Languages, or if not available…
  …Large Legal Domain Corpora in Multiple Languages for Term Extraction…
  …and for Relation Extraction if Ontology Needs to be Constructed/Adapted
Tools
  Linguistic Analysis (PoS, Morphology, Term Grammars, etc.)…
  …for Multiple Languages…
  …Tuned to the Legal Domain…
  Information Retrieval Infrastructure, Interface Design, etc.