ENABLER/ELSNET Workshop, 28-29 August 2003 An Ontology-Based Knowledge Portal for Language Technology Hans Uszkoreit, Brigitte Jörg, Gregor Erbach

Project COLLATE

Theme: Computational Linguistics and Language Technology for Real World Applications

Partners: DFKI Saarbrücken, Saarland University

Support: A grant by the German Federal Ministry for Education and Research for RTD, strengthening the position of Saarbrücken as a Competence Center for Language Technology

PIs: Hans Uszkoreit, Manfred Pinkal and Wolfgang Wahlster

Duration: Spring 2001 - end of 2003

Information Service about Language Technology

www.lt-world.org

Ontology-based

XML Import and Export Formats

Visual and Structural Design

Information Center: LT World

Objectives

• distributed information service
• combines and offers for each aspect of LT the best contents available
• exploits hypermedia technology for including useful contents
• is flexible and scalable enough to support the evolution of the discipline
• exhibits a structure that is transparent for both experts and visitors from outside the field
• increasingly utilizes language and knowledge technologies for improved management and presentation of the information
• is open for exchange of data with other information services
• potential for interoperability with future knowledge services
• is suited for the sophisticated metadata schemes of the envisaged semantic web

LT World - Levels and Tasks

Levels: Conceptual Level, Specification Level, Technical Realization Level, Content Level

Diagram labels (table layout not recoverable):
• underlying logical structure
• data maintenance structure
• presentational structure
• ontology specifications
• concrete architecture
• XML specifications
• DBs, XML pages, HTML pages
• generic design, CI
• actual design of pages
• selection of sources
• organization of collection/production
• content in DBs, documents, links
• presented contents

User View: Four Top Level Areas

Information and Knowledge

Players and Teams

Resources and Results

Communication/Interaction

Information and Knowledge

Basic knowledge about all areas of LT
source: Survey of the State of the Art in Human Language Technology (1997, new edition in preparation)

Pointers to specialized knowledge (links to literature, projects, systems, products, people, resources, standards...)
source: link collection by DFKI

Glossary of the field
source: DFKI with input from HLT Survey

Players and Teams

DB with all researchers in LT
names, affiliations, links to homepages
number of entries: 2235

DB of projects
number of entries: 659

DB of research organisations, companies, funding agencies
number of entries: 1561

Resources and Results

DB of prototypes, research systems and products
source: ACL Software Registry (operated by DFKI)

Links to resource initiatives: ELRA, LDC,

For resources link to search service of OLAC

Communication/Interaction

News about technologies, people, products, centers, etc.
source: collection by DFKI and contributions by users
number of entries: 370

List of Events: Conferences, Workshops, Summer Schools, etc.
source: collection by DFKI and contributions by users
number of entries: 251

Links Topic-Centered Mailing Lists
source: collection of existing lists

Usage of LT World

Month   Unique visitors   Number of visits   Pages    Hits     Bandwidth
Jan     795               1502               15185    33635    171.58 MB
Feb     808               1443               12127    28622    140.89 MB
Mar     1036              1780               15751    41622    199.38 MB
Apr     989               1778               17994    47452    231.71 MB
May     1006              1922               16143    44624    180.93 MB
Jun     944               1963               18458    42912    237.69 MB
Jul     912               2103               16066    41712    208.18 MB
…       --                --                 --       --       --
Total   6496              12499              111745   280600   1.34 GB

Systematics of the Discipline

Mature scientific or engineering disciplines have developed a systematics of the subject

Younger disciplines have outgrown their first systematics

LT or CL does not yet have a systematics or a classification scheme

Logical Structuring: Two Options

Tree-Structured Classification

Libraries

Encyclopedias and Handbooks

Multidimensional Structuring

Multiple-Inheritance Hierarchies

And-Or Hierarchies

Means for Ordering

Terminology

Thesaurus

Classification vs. Systematics

Taxonomy = Classification + Nomenclature

Ontology
• formal ontology
• relational ontology

Our Setup

Immediately visible structure: easy and transparent

Some multidimensional structuring through chapter structure of the Survey

For internal storage and DB search: complex multidimensional structure

Underlying systematics: multilayered and multidimensional ontology

Ontologies

Theoretical Ontologies
• Epistemological reasons
• Phenomenological systematics

Practical Ontologies
• Support of processes
• Data Maintenance
• Information Services

Systematics/Ontologies

Generic Core: Dublin Core

Special Ontologies underlying exchange formats for special information types such as
• OLAC (for linguistic resources)
• BibTex (for scientific literature)
• Languages (for language codes)

Generic ontologies for the scientific discipline and technology sector

General Multidimensional Classification for CL and LT

Science: Actor, Subject, NewKnowledge, (Scientific) Means

Research: Actors, Subject, ResearchGoals, Means

ResearchProject: Actors, Subject, ResearchGoals, Means, Duration

Applied Science: Actor, Subject, NewKnowledge, Means, Applications

Applied Research: Actors, Subject, ResearchGoals, Methods, Applications

Applied ResearchProject: Actors, Subject, ResearchGoals, Methods, Duration, Applications
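The concept/slot listing above can be read as a small schema in which more specific concepts add slots to more general ones. The following is a minimal sketch of that reading, not the project's actual ontology code: the dictionary `SLOTS` and the helper `added_slots` are hypothetical names, and "(Scientific) Means" is shortened to "Means" for comparison.

```python
# Slot lists transcribed from the slide; the encoding itself is an assumption.
SLOTS = {
    "Science": ["Actor", "Subject", "NewKnowledge", "Means"],
    "Research": ["Actors", "Subject", "ResearchGoals", "Means"],
    "ResearchProject": ["Actors", "Subject", "ResearchGoals", "Means", "Duration"],
    "Applied Science": ["Actor", "Subject", "NewKnowledge", "Means", "Applications"],
    "Applied Research": ["Actors", "Subject", "ResearchGoals", "Methods", "Applications"],
    "Applied ResearchProject": [
        "Actors", "Subject", "ResearchGoals", "Methods", "Duration", "Applications",
    ],
}


def added_slots(general: str, specific: str) -> list:
    """Return the slots of `specific` that `general` does not have."""
    general_slots = set(SLOTS[general])
    return [slot for slot in SLOTS[specific] if slot not in general_slots]


if __name__ == "__main__":
    # A project adds a Duration to the corresponding kind of research.
    print(added_slots("Research", "ResearchProject"))
    # The applied variant of science adds Applications.
    print(added_slots("Science", "Applied Science"))
```

Comparing two concepts this way makes it easy to see which slots a specialization contributes, for example when checking that a new concept is consistent with its neighbours in the schema.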

Funded Research Project

Name: Acronym, Full Name

Actors: Organizations, PI, Other Roles, Researchers

Subject: Discipline/Area

Objectives: Goals, Means, Program

Duration: StartDate, EndDate

Funding: Agency, Program, Funding Number
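As an illustration of how such a slot structure might be held in a program, here is a minimal sketch using Python dataclasses. The class and field names (`Funding`, `FundedResearchProject`, `missing_slots`) are hypothetical and chosen only to mirror the slots listed above; the example values are taken from the Project COLLATE slide, and mapping its "Theme" to the Subject slot is an assumption.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Funding:
    agency: str
    program: Optional[str] = None
    funding_number: Optional[str] = None


@dataclass
class FundedResearchProject:
    acronym: str
    full_name: Optional[str] = None
    organizations: List[str] = field(default_factory=list)
    principal_investigators: List[str] = field(default_factory=list)
    researchers: List[str] = field(default_factory=list)
    subject: Optional[str] = None
    goals: List[str] = field(default_factory=list)
    means: List[str] = field(default_factory=list)
    start_date: Optional[str] = None
    end_date: Optional[str] = None
    funding: Optional[Funding] = None

    def missing_slots(self) -> List[str]:
        """Names of top-level slots that are still empty (useful for maintenance)."""
        return [f.name for f in fields(self) if getattr(self, f.name) in (None, "", [])]


# Values below come from the COLLATE slide; unknown slots are left empty.
collate = FundedResearchProject(
    acronym="COLLATE",
    organizations=["DFKI Saarbrücken", "Saarland University"],
    principal_investigators=["Hans Uszkoreit", "Manfred Pinkal", "Wolfgang Wahlster"],
    subject="Computational Linguistics and Language Technology for Real World Applications",
    start_date="Spring 2001",
    end_date="end of 2003",
    funding=Funding(agency="German Federal Ministry for Education and Research"),
)

if __name__ == "__main__":
    print(collate.missing_slots())
```

Leaving unknown slots empty, rather than guessing values, keeps the record honest and lets a maintenance tool list what still needs to be collected.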

Diagram labels (layout not recoverable): Search, Science, Production, Technology, Education, ExtraScientific Purpose, Research, Scientific, Education, Applied Research, ExtraScientific Purpose, Technical Product

Multidimensional Classification for CL and LT

Dimensions

Generic:
• Type of Resource (web page, metaindex, publication, person, product, patent, project, ...)
• People
• Geolocation
• Date/Comments

Discipline-Specific (not all may apply for a given resource):
• Application (grammar checking, text translation, IR)
• Linguality (monolingual, bilingual, multilingual, translingual, language-independent)
• Languages/Language Pairs (Romanian, Thai, <en-fr>, ...)
• Technologies (HMM, FSA, EBT, linear programming, ...)
• Linguistic Area (morphology, syntax, pragmatics, ...)
• Linguistic Approach (Two-Level Morphology, systemic functional g., DRT)
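A multidimensional classification of this kind can be searched facet by facet. The sketch below is only an illustration under stated assumptions: each resource is a plain dictionary mapping a dimension name to a set of values, the dimension keys and the `search` helper are hypothetical, the "Babel" values follow the RDF export shown later in the deck, and "ExampleTagger" is an invented entry used purely for contrast.

```python
# Each resource maps a dimension to the set of values that apply to it.
# A dimension that does not apply is simply absent.
RESOURCES = [
    {
        "name": "Babel",
        "resource_type": {"system"},
        "linguality": {"monolingual"},
        "languages": {"German"},
        "linguistic_area": {"syntax"},
        "linguistic_approach": {"HPSG"},
    },
    {
        "name": "ExampleTagger",  # hypothetical entry, not from the source
        "resource_type": {"system"},
        "linguality": {"multilingual"},
        "linguistic_area": {"morphology"},
    },
]


def matches(resource: dict, query: dict) -> bool:
    """True if the resource carries every requested value in its dimension."""
    return all(value in resource.get(dim, set()) for dim, value in query.items())


def search(resources: list, **query: str) -> list:
    """Return the names of all resources matching every facet in the query."""
    return [r["name"] for r in resources if matches(r, query)]


if __name__ == "__main__":
    print(search(RESOURCES, resource_type="system"))
    print(search(RESOURCES, linguistic_area="syntax", linguality="monolingual"))
```

Because a missing dimension is treated as an empty set, a resource for which a discipline-specific dimension does not apply is never matched on that dimension, which fits the remark that not all dimensions apply to every resource.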

Excerpt from the Ontology

Dublin Core

OLAC

Languages

LT World

Language Technology

Technology

BibTex

Information & Knowledge

Teams & Players

Systems & Resources

Communication & Events

Publications

Area Nodes

Example of the shallow hierarchy for technologies

Text Technologies
...
Text Summarization
...
Information Extraction
• Named Entity Recognition
• Terminology Extraction
• Relation Extraction
• Answer Extraction
...
Text Generation
...
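A shallow hierarchy of area nodes like the one above can be kept as a nested mapping and queried for the path to any node. This is a minimal sketch, assuming that "Text Technologies" is the parent of the other areas shown (the slide layout suggests this but does not state it); elided entries ("...") are left out rather than filled in, and `AREAS` and `find_path` are hypothetical names.

```python
# Nested dict: each key is an area node, each value holds its sub-areas.
AREAS = {
    "Text Technologies": {
        "Text Summarization": {},
        "Information Extraction": {
            "Named Entity Recognition": {},
            "Terminology Extraction": {},
            "Relation Extraction": {},
            "Answer Extraction": {},
        },
        "Text Generation": {},
    }
}


def find_path(tree: dict, target: str, prefix: tuple = ()):
    """Depth-first search; returns the list of node names from the root to target."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return list(path)
        found = find_path(children, target, path)
        if found:
            return found
    return None


if __name__ == "__main__":
    print(find_path(AREAS, "Relation Extraction"))
```

The returned path could serve, for example, as a breadcrumb when presenting an area page to a visitor.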

Main Info for Each Subject Area

Name, Acronyms, aka's, Term Translations, Short Definition, Explanation, Topic Websites, R&D Prototypes/Products, Projects, People, Literature

Ontology Modelling and Interchange Formats

Ontologies maintained with Protégé 2000

Ontology Modelling with Protégé

Export / Interchange Formats

Protégé: Class View

Protégé: Slot View

Protégé: Form View (Input-Configuration)

Protégé: Instance View (Input-Interface)

<LT:System rdf:about="&LT;LT_00398"
    LT:applications="Structure Building"
    LT:dc.coverage="66123 Saarbruecken"
    LT:dc.identifier="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:lt.linguality="monolingual"
    LT:lt.linguistic_approach="HPSG"
    LT:lt.linguistic_area="syntax"
    LT:olac.type.functionality="Written Language"
    LT:olac.type.linguistic="HPSG"
    LT:resource.contact="[email protected]"
    LT:resource.homepage_url="http://www.dfki.de/~stefan/Babel/e_babel.html"
    LT:resource.name="Babel"
    LT:resource.type="system"
    LT:technological_method="Written Language"
    LT:type="system"
    rdfs:label="Babel">
  <LT:resource.description>Babel is a Prolog System with Web-Interface in Perl and Java. Its main purpose is the test of an HPSG grammar for German.</LT:resource.description>
  <LT:dc.language rdf:resource="&LT;English"/>
  <LT:lt.languages rdf:resource="&LT;German"/>
  <LT:dc.creator rdf:resource="&LT;LT_00399"/>
  <LT:developed-by rdf:resource="&LT;LT_00399"/>
  <LT:dc.rights rdf:resource="&LT;ont_051002_00178"/>
  <LT:developed-by rdf:resource="&LT;ont_051002_00209"/>
  <LT:olac.format.os>Windows 95</LT:olac.format.os>
  <LT:olac.format.os>Windows NT</LT:olac.format.os>
</LT:System>

Protégé: RDF-Export

Instance of the Babel system (figure callouts: Relations, Attributes)
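To show how an export of this shape might be consumed, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Several things are assumptions made so the sample is self-contained: the namespace URI `http://example.org/lt#` is a placeholder (the real value of the `&LT;` entity is not given in the slides), the entity is written out by hand, only a few of the Babel attributes are copied, and the split into "attributes" (XML attributes), "relations" (children with `rdf:resource`) and "literals" (children with text) is one reading of the figure callouts.

```python
import xml.etree.ElementTree as ET

LT_NS = "http://example.org/lt#"  # placeholder; the real LT namespace is not in the source
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rdfs="{RDFS_NS}" xmlns:LT="{LT_NS}">
  <LT:System rdf:about="{LT_NS}LT_00398"
      LT:lt.linguality="monolingual"
      LT:lt.linguistic_approach="HPSG"
      LT:lt.linguistic_area="syntax"
      LT:resource.name="Babel"
      rdfs:label="Babel">
    <LT:lt.languages rdf:resource="{LT_NS}German"/>
    <LT:developed-by rdf:resource="{LT_NS}LT_00399"/>
    <LT:olac.format.os>Windows 95</LT:olac.format.os>
    <LT:olac.format.os>Windows NT</LT:olac.format.os>
  </LT:System>
</rdf:RDF>"""


def local(name: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on qualified names."""
    return name.split("}", 1)[-1]


def read_instance(elem: ET.Element) -> dict:
    """Split one exported instance into id, attributes, relations and literals."""
    about_key = f"{{{RDF_NS}}}about"
    resource_key = f"{{{RDF_NS}}}resource"
    attributes = {local(k): v for k, v in elem.attrib.items() if k != about_key}
    relations, literals = [], []
    for child in elem:
        target = child.get(resource_key)
        if target is not None:
            relations.append((local(child.tag), target))
        else:
            literals.append((local(child.tag), (child.text or "").strip()))
    return {
        "id": elem.get(about_key),
        "attributes": attributes,
        "relations": relations,
        "literals": literals,
    }


root = ET.fromstring(SAMPLE)
babel = read_instance(root.find(f"{{{LT_NS}}}System"))

if __name__ == "__main__":
    print(babel["attributes"]["resource.name"], babel["relations"])
```

For real exports a dedicated RDF library would usually be a more robust choice than plain XML parsing; the sketch only illustrates how attributes and relations are distinguished in the serialized instance.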

Organizational Issues

Division of Labour

In the beginning all contents and references were collected and maintained by DFKI

Input from the authors/area specialists of the Survey for distributed authoring and content maintenance

Input from the LT community via HTML forms and XML import format

News and conferences maintained and updated by DFKI

Relationships to External Resources

Included but autonomous resources: ACL NL Software Registry, Language Technology Survey

Systematically cross-linked and cross-searchable resources: all OLAC resources, such as LDC, SIL, ACL SR, and OLAC Home

Systematically cross-linked resources: HLT Central, ELSNET, EACL, ACL NLP Universe

Linked resources: all other resources relevant for LT