Page 1: Nicoletta Calzolari - Emerging technologies for Digital Libraries, Poland, November 2006

Nicoletta Calzolari

Istituto di Linguistica Computazionale - CNR - Pisa

[email protected]

An Infrastructure of Language Resources & Language Technologies: Why do we need it? Priorities & Challenges

Page 2:

What have we (LT & LR) been assembling for many years?

• Lexicons & their ontologies: written, spoken, ItalWordNets, PAROLE/SIMPLE, …
• Annotated corpora/treebanks
• Basic tools
• Integrated architecture for annotation at various levels (from morphological to conceptual), acquisition/learning, classification, ontology creation, …
• Methodologies
• Know-how & expertise
• Infrastructural bodies (on which to build)
• Standards

… a very large infrastructure of LRs & LT

Page 3:

History: Some international LRs initiatives

ACQUILEX [since ’88], MULTILEX, ET-7, ET-10, TEI, NERC, RELATOR, ONOMASTICA, MULTEXT, COLSIT, LSGRAM, DELIS, EAGLES, PAROLE, SIMPLE, SPARKLE, ELSNET, EuroWordNet, MATE, NITE, Cluster 488 (Italian), TAL (Italian), ISLE, ENABLER, INTERA, …, SENSEVAL, WRITE, Forum TAL (Italian), …, LIRICS, ISO, ELRA, LREC, LRE Journal, NEDO, …

Essential role of the EC in starting a basic infrastructure

EU at the forefront in the areas of LRs and standards in the ’90s

Established a model

Page 4:

Today: a broad “potential” Infrastructure

• RELATOR, EAGLES/ISLE, ENABLER, ELSNET, TELRI, INTERA, …, LIRICS, ELRA
• BLARK, Unified Lexicon (W/S)
• LREC, LRE journal, …, ERANET-LangNet, …
• LDC & others, ISO, COCOSDA/WRITE
• US Cyberinfrastructure
• Japan COE21, NEDO, …
• CLARIN (ESFRI proposal)

Levels: EU, International, National, …

Cooperative initiatives: links to…

Vitality & success signs… for LRs

Page 5:

WordNets: synsets linked by semantic relations

{casa, abitazione, dimora}

Hyperonym: {edificio, ..}

Hyponym: {villetta} {catapecchia, bicocca, ..} {cottage} {bungalow}

Role_location: {stare, abitare, ...}

Role_target_direction: {rincasare}

Role_patient: {affitto, locazione}

Mero_part: {vestibolo} {stanza}

Holo_part: {casale} {frazione} {caseggiato}

{home, domicile, ..} {house}

TOP Concepts: Object, Artifact, Building
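The synset on this slide can be read as a small typed graph: a set of synonymous lemmas plus named links to other synsets. The following is a minimal Python sketch of that reading. The `Synset` class and its method names are hypothetical illustrations, not the ItalWordNet data format, and only a few of the relations listed on the slide are encoded.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Synset:
    """A set of synonymous lemmas plus typed links to other synsets."""
    lemmas: frozenset
    relations: dict = field(default_factory=dict)  # relation name -> list of Synset

    def link(self, relation, target):
        """Add a directed, named relation from this synset to `target`."""
        self.relations.setdefault(relation, []).append(target)

    def related(self, relation):
        """Return the (sorted) lemma lists reachable through one relation type."""
        return [sorted(s.lemmas) for s in self.relations.get(relation, [])]


# A subset of the {casa, abitazione, dimora} example from the slide.
casa = Synset(frozenset({"casa", "abitazione", "dimora"}))
casa.link("hyperonym", Synset(frozenset({"edificio"})))
casa.link("hyponym", Synset(frozenset({"villetta"})))
casa.link("hyponym", Synset(frozenset({"catapecchia", "bicocca"})))
casa.link("mero_part", Synset(frozenset({"vestibolo"})))
casa.link("role_location", Synset(frozenset({"stare", "abitare"})))

print(casa.related("hyponym"))
```

Keeping relation names as plain strings is a deliberate simplification: it lets new relation types (Role_patient, Holo_part, and so on) be added without changing the class.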

Page 6:

Terminological Wordnets: e.g. Jur-WordNet

Jur-WordNet: extension of ItalWordNet for the juridical domain (with ITTIG-CNR - Istituto di Teoria e Tecniche dell’Informazione Giuridica)

• Knowledge base for multilingual access to sources of legal information
• Source of metadata for semantic markup of legal texts
• To be used, together with the generic ItalWordNet, in applications of Information Extraction, Question Answering, Automatic Tagging, Knowledge Sharing, Norm Comparison, etc.

Page 7:

PAROLE-SIMPLE-CLIPS Lexicon: … harmonised model for 12 European languages

Page 8:

[Diagram: hierarchy of semantic relations, about 100 relations in total]

Top

Formal, Constitutive, Agentive, Telic

Is_a, Is_a_part_of, Property, Contains, Created_by, Agentive_cause, Indirect_telic, Purpose, Instrumental, Is_the_habit_of, Used_for, Used_as, Activity, ...

The targets of relations identify:

• prototypical semantic information associated with a SemU
• elements of dictionary definitions of SemUs
• typical corpus collocates of the SemU

For a BioLexicon

Page 9:

mangiare

Domain - Semantic class

Page 10:

Domain - Semantic class

[Diagram: semantic network around "mangiare" and "cucinare". Recoverable labels follow.]

Relations: Used_for, Object_of_the_activity, Is_the_activity_of, Created_by

Qualia roles: AGENTIVE, TELIC

Words: tavola, forchetta, posata, ristorante, mestolo, pentola, friggitrice, bollitore, pesciera, cuoco, pesce, coniglio, carne, mela, carota, arrosto, zucchero, alloro, tartufo

Verbs: mangiare, cucinare, cuocere, arrostire, bollire, lessare, stufare, friggere, rosolare, grigliare, …

Semantic classes: FURNITURE, INSTRUMENT, BUILDING, CONTAINER, PROFESSION, ARTIFACT_FOOD, VEGETABLES, FRUIT, FOOD, SUBSTANCE_FOOD, VEGETAL_ENTITY, FLAVOURING, NATURAL_SUBSTANCE

Feature: +edible
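One way to read this slide is that a "domain" emerges when many semantic units point, through typed relations, at the same activity. The Python sketch below indexes relation triples by their target to show that idea. The triples are illustrative assumptions built from words and relation names visible on the slide; the exact links of the original figure are not recoverable, and `index_by_target` is a hypothetical helper name.

```python
from collections import defaultdict

# Illustrative (source SemU, relation, target SemU) triples.
# Assumed for this sketch; not copied from the actual lexicon.
TRIPLES = [
    ("forchetta", "Used_for", "mangiare"),
    ("carne", "Object_of_the_activity", "mangiare"),
    ("pentola", "Used_for", "cucinare"),
    ("cuoco", "Is_the_activity_of", "cucinare"),
    ("arrosto", "Created_by", "cucinare"),
]


def index_by_target(triples):
    """Group (relation, source) pairs under the SemU they point to."""
    index = defaultdict(list)
    for source, relation, target in triples:
        index[target].append((relation, source))
    return dict(index)


domain = index_by_target(TRIPLES)
print(domain["mangiare"])
```

Everything grouped under one target can then be inspected as a candidate domain cluster, whatever the semantic class of the individual words.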

Page 11:

In the ’90s: there was a global vision of the field & its main components: Standards, Creation of LRs, Automatic acquisition, Distribution

Today: the wealth of data & basic technology is such that we should look again at the field as a whole & ask if these are still “the” important components, or how they have changed/must change

… Which new challenges for a mature infrastructure of LRs & LT?

• Dynamic LRs
• Sharing
• Collaborative creation & management
• Content interoperability
• + Distributed architectures

Tools are needed; the technology exists

These dimensions could be the basis of a new paradigm for LRs & LT & of a new infrastructure

Page 12:

Challenges & Priorities for LRs, with technological and/or organisational/political aspects

• Basic LR coverage for all languages (BLARK/ELARK)
• Specific (new) types of LRs: opinion, sentiment, emotion, subjectivity; “example-based” context-sensitive LRs; lexicon & corpus together, dynamically created; new ways to extract value from large linguistic repositories: the Web exploited as a multilingual corpus
• Tools to quickly develop LRs (acquisition, annotation, porting between domains/languages)
• Coordinate the development of LTs & LRs (also across languages)
• Knowledge transfer across languages
• Maintenance of LRs
• Cooperation between the communities of HLT & Semantic Web/ontologists
• 'Open Source' concept for LRs & LT; open & distributed architectures for LRs and LT; wiki-mode?
• Collaborative infrastructures
• Interoperability & standards
• GRID technology
• …

Side labels: Multilinguality; Unifying frameworks

Page 13:

LT & “new” topics

• Subjectivity, opinion, sentiment, emotion: an issue orthogonal to objective content. Detection/separation of subjective from objective content, opinion mining, and extraction of positive & negative perceptions have an obvious and big impact in many applications, e.g. business intelligence
• Commonsense understanding, with major implications: it allows commonsense reasoning/inference (plausible vs logical) for fail-soft applications; it can be pursued in a distributed and collaborative fashion by the community as a whole; it relates to how an agent might put together SW services to accomplish high-level goals for the user
• Temporal structure, for which de facto standards are emerging (TimeML)
• Integration of text, speech and gesture
• Strategies for handling miscommunication
• Hybrid approaches, interdisciplinary approaches
• …

Side label: Multimodality

Page 14:

In the Semantic Web vision ...

Natural convergence with HLT:

• multilingual semantic processing
• ontologies
• semantic-syntactic computational lexicons

… need to tackle the twofold challenge of content availability & multilinguality

Page 15:

Issues in LR & LT research agenda converging with Semantic Web needs

From LT:

• Meaning & content
• Knowledge
• Semantic markup: concept-based text representation
• Semantic lexicons/terminologies/ontologies

To create a web of metadata

Vice versa, from SW:

• LRs as web services
• Ontologies for LRs & LT
• Collaborative & distributed infrastructure; open access
• Interoperability & standards

to add meaning to Web data & make it usable for processing, mining, add spatial & temporal metadata, …

Page 16:

Computational Lexicons: challenges from the Semantic Web

The Semantic Web Vision: turning the WWW into a machine-understandable knowledge base

[Diagram labels: Semantic Web; Ontologies; Knowledge Markup; Intelligent Agents; Applications; Documents; Databases; Computational Lexicons; Linguistic Markup]

Page 17:

Ontologies and Computational Lexicons

[Diagram labels: Concept Space; Ontology; Computational Lexicon; Language/s; Semantics; Syntax; Morphology; Multilinguality; polysemy, context-sensitiveness, etc.]

Page 18:

Ontology Learning (TL+ML)

term extraction from text

{museo, quadro, pinacoteca, biblioteca, sito_archeologico, museo_archeologico, museo_etrusco, scultura, affresco, …}

conceptual clustering of terms

C_MUSEO: {museo, pinacoteca, …}

C_MUSEO_ARCHEOLOGICO: {museo_archeologico, museo_etrusco, …}

C_OPERA_ARTISTICA: {quadro, scultura, affresco, …}

concept structuring

C_MUSEO_ARCHEOLOGICO is_a C_MUSEO

Ontology
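The last step on this slide, concept structuring, can be illustrated with a toy heuristic. The Python sketch below takes the clusters as given and proposes an is_a link whenever a multiword term has its head word in another cluster. This head-word rule is an assumption chosen for illustration (Italian multiword terms are typically head-initial); it is not presented as the method actually used by the author's tools, and the function names are hypothetical.

```python
def head(term):
    """Return the first token of a multiword term: 'museo_archeologico' -> 'museo'."""
    return term.split("_")[0]


def is_a_links(clusters):
    """Propose (child, parent) links: a cluster is_a the cluster holding the head of one of its terms."""
    member_of = {term: name for name, terms in clusters.items() for term in terms}
    links = set()
    for name, terms in clusters.items():
        for term in terms:
            h = head(term)
            # Only multiword terms whose head lives in a different cluster yield a link.
            if h != term and h in member_of and member_of[h] != name:
                links.add((name, member_of[h]))
    return sorted(links)


# Clusters as shown on the slide (elided members omitted).
clusters = {
    "C_MUSEO": {"museo", "pinacoteca"},
    "C_MUSEO_ARCHEOLOGICO": {"museo_archeologico", "museo_etrusco"},
    "C_OPERA_ARTISTICA": {"quadro", "scultura", "affresco"},
}

print(is_a_links(clusters))
```

The clustering step itself (deciding that museo and pinacoteca belong together) would need distributional or other evidence and is outside this sketch.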

Page 19:

Ontology Learning in T2K: from thesaurus to conceptual map

Identification of horizontal relations among terms through the events which best characterise them

events - situations involving domain entities

Page 20:

For Applications: Semantic/Conceptual Annotation of Texts

[Diagram labels: Reference Lexical Resources; Tools for terminology extraction; Tools for annotation of the logical structure; Structured Knowledge; LOGICAL FORM; Module of analysis of Italian]

Page 21:

Content Interoperable LRs & LT

[Diagram labels: Semantic Web; LT & LRs; Language Tech … & … Knowledge, Content; Knowledge Markup; Hum&SS]

Ready?

How to cooperate?

Page 22:

A new paradigm of R&D in LRs & LT

• Open & distributed linguistic infrastructures for LRs & LT
• adopting the paradigm of accumulation of knowledge so successful in more mature disciplines, based on sharing LRs & tools
• ability to build on each other's achievements, results accessible to various systems, allowing controlled & effective cooperation of many groups on common tasks (see HGP HLP)
• Emerging concept of collective intelligence
• Emphasize interoperability among LRs, LT & knowledge bases, e.g. initiatives aimed at achieving international consensus on annotation guidelines: to merge annotation efforts, produce coherent, comprehensive linguistic annotations to be readily disseminated throughout the community

Page 23:

ISO & LIRICS: Meta-model & Data Categories

e.g. Proposal for an ISO standard for NLP lexica

• Define a Lexical Markup Framework, a general & abstract meta-model & a set of structural nodes relevant for linguistic description
• Define a flexible environment, enabling specific implementations of user-defined mark-up languages (called LML) on the basis of common DCs

Objectives:

• Design of the abstract lexical meta-model
• Definition of the common set of related Data Categories

The field is mature

Builds also on EAGLES/ISLE
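The split between an abstract meta-model and a shared set of Data Categories can be pictured as a registry against which individual lexical entries are checked. The Python sketch below assumes a tiny registry and entry layout invented for illustration; the category names, values and the `validate` function are hypothetical and are not taken from the ISO proposal.

```python
# Hypothetical shared registry: data category -> permitted values.
DATA_CATEGORIES = {
    "partOfSpeech": {"noun", "verb", "adjective"},
    "grammaticalGender": {"masculine", "feminine"},
}


def validate(entry, registry=DATA_CATEGORIES):
    """Return a list of problems found in an entry's features (empty if none)."""
    problems = []
    for category, value in entry["features"].items():
        if category not in registry:
            problems.append(f"unknown data category: {category}")
        elif value not in registry[category]:
            problems.append(f"invalid value for {category}: {value}")
    return problems


# Two entries in an assumed user-defined format that reuses the common categories.
good = {"lemma": "casa", "features": {"partOfSpeech": "noun", "grammaticalGender": "feminine"}}
bad = {"lemma": "mangiare", "features": {"partOfSpeech": "verbo", "tense": "present"}}

print(validate(good))
print(validate(bad))
```

Two lexicons that validate against the same registry can differ in structure and still be compared feature by feature, which is the sense of content interoperability sketched here.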

Page 24:

MILE Lexical Model: Data Categories for Content Interoperability

MILE: Multilingual ISLE Lexical Entry

[Diagram labels: MILE Entry Schema; MILE Lexical Classes; User Defined MDC; MDC Registry; RDF/S Descriptions; Monolingual/Multilingual Lexicon; ISO TC37 SC4/WG4; LIRICS; NEDO]

Page 25:

Beyond MILE: towards open & distributed Lexicon Infrastructure

Semantic Lexicon: URI = http://www.xxx…

Syntactic Constructions: URI = http://www.yyy…

Ontology: URI = http://www.zzz…

Monolingual/Multilingual Lexicons

Lex_object: semFeature, URI = http://www.xxx…#HUMAN

Lex_object: syntagmaNT, URI = http://www.zzz…#NP

Corpora/Web

Language, Knowledge

… towards the Semantic Web
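In the distributed picture on this slide, a lexical entry does not copy a feature such as HUMAN or NP; it points to it by URI, with the fragment naming the object inside a remote resource. The Python sketch below resolves such references against an in-memory stand-in for the remote repositories. The `example.org` URIs, the repository contents and the `resolve` function are all hypothetical; the slide's own URIs are elided and are not reconstructed here.

```python
from urllib.parse import urldefrag

# In-memory stand-in for remote repositories: base URI -> {fragment -> object}.
# All URIs and contents are assumptions made for this sketch.
REPOSITORIES = {
    "http://example.org/semantic-lexicon": {"HUMAN": {"lex_object": "semFeature"}},
    "http://example.org/syntactic-constructions": {"NP": {"lex_object": "syntagmaNT"}},
}


def resolve(reference):
    """Split a URI into resource and fragment, then look the object up."""
    base, fragment = urldefrag(reference)
    return REPOSITORIES[base][fragment]


# A lexical entry that only stores references to shared objects.
entry = {
    "lemma": "cuoco",
    "semantic_feature": "http://example.org/semantic-lexicon#HUMAN",
    "syntactic_frame": "http://example.org/syntactic-constructions#NP",
}

print(resolve(entry["semantic_feature"]))
print(resolve(entry["syntactic_frame"]))
```

In a real deployment the dictionary lookup would be replaced by a network fetch, but the entry itself would not need to change.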

Page 26:

Lexical WEB & Standards for Content Interoperability … still open

as a critical step for semantic mark-up in the SemWeb

[Diagram labels: ComLex; SIMPLE; WordNets; FrameNet; NomLex; Lex_x; Lex_y; MILE; with intelligent agents]

Standards for Interoperability

Page 27:

Open distributed architectures for LRs and LT, interoperability, GRID technology, … & standards

• e-Science: GRID technology for large-scale distributed collaborative processing of huge quantities of “facts & their relations” (development of large-scale annotated LRs, linking them across different sources, …)
• problem of how to coordinate different information sources
• new ways of extending large-scale LRs and knowledge bases relying on volunteer labour, wiki-mode?

Side label: interoperability

Towards: Large online “open source” collaborative projects

Page 28:

Need for tools to make this vision operational & concrete

E.g. new prototype built in Pisa (http://xmlgroup.iit.cnr.it:98/MILE/lexflow/demo.xhtml):

LeXFlow, a web-based collaborative environment for semi-automatic management of lexical resources

It is intended to fulfil the requirements posed by innovative types of LRs by supporting:

• Dynamic language resources, integrating tools for automatic acquisition of information from corpora and cross-fertilization of lexicons
• Content interoperability of resources, by supporting ISLE/ISO standards
• Cooperative & collective creation and management of LRs, by providing a web-based environment for the collaboration and interaction of distributed agents and resources

Page 29:

Why an infrastructure of LRs?

Because what is special in language data …

… is what is more difficult compared with the hard sciences, i.e. “language” and its “ambiguity”

Already in the ENABLER Mission:

Availability of LRs is also a “sensitive” issue, touching the sphere of linguistic & cultural identity, but also with economic & political implications

Putting together technical, organisational, strategic, political issues of LRs

Page 30:

Why an infrastructure of LRs?

Many dimensions around the notion of language

• Cultural issues: Language … and cultural identity; Language … and the Humanities
• Economic, social issues: Applications, Services
• Technical issues: Interdisciplinarity & Multidisciplinarity
• Political issues: e.g. a commonly agreed list of minimal requirements for “national” LRs: BLARK

Side label: Multilingualism

Need of bodies for a broad research agenda & strategic actions for LT & LRs (W/S/MM)

Putting together technical, organisational, strategic, political issues of LRs

Page 31:

Which Communities?

• Core: Language Resources, Language Technology, Standardisation
• Grid, Semantic Web, Ontologists, ICT, …
• Humanities, Social Sciences, Digital Libraries, Cultural Heritage, …
• Many application domains (eculture, egovernment, ehealth, …)

Multilinguality

Enabling infrastructure

Focus on cooperation

Technologies exist, but the infrastructure that puts them together and sustains them is still missing