Tools and resources (not only) for French, Italian and Spanish Thomas Koller NCLT seminar series, 22.11.2005

Page 1: Tools and resources (not only) for  French, Italian and Spanish

Tools and resources (not only) for French, Italian and Spanish

Thomas Koller

NCLT seminar series, 22.11.2005

Page 2: Tools and resources (not only) for  French, Italian and Spanish

Overview

Plurilingual learning

Existing resources

Created resources

Developed tools

Software architecture

Page 4: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual learning

• Exploits learners’ knowledge of similar languages

• Raises language awareness by showing similar properties in several languages

• Aims to avoid learners’ typical errors related to transfer processes

Page 5: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual learning: Fields of similarity

• Pan-Romance vocabulary (dormir, sang, vin)
  – 39 words in all languages
  – 141 words in 8-9 languages
  – 227 words in 5-7 languages

• Sound correspondences
  – sp. ñ → fr. / it. gn, n:
    señor, campaña → seigneur, campagne / signore, campagna
    año → an / anno

• Morphosyntactic elements

Page 6: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual learning: Example

Spanish:  El  padre            habla  con   su   hijo    de la  escuela
Italian:  Il  padre            parla  con   suo  figlio  della  scuola
French:   Le  père             parle  avec  son  fils    de l’  école
              paternal, Pater  parl-        su-  fil-    de la  Schule, school

Page 7: Tools and resources (not only) for  French, Italian and Spanish

Overview

Plurilingual learning

Existing resources

Created resources

Developed tools

Software architecture

Page 8: Tools and resources (not only) for  French, Italian and Spanish

Existing resources: Linguistic tools

• POS tagger
  – TreeTagger
  – SVMTool (Spanish, English, Catalan)

• IBM JFrost lemmatiser
  – provides possible base forms + POS
  – morphological information (no POS tagging)

• Verb conjugator
  – English, German, French, Italian and Spanish
  – generates all forms for all tenses

Page 9: Tools and resources (not only) for  French, Italian and Spanish

Existing plurilingual resources

• Pan-Romance wordlist: 840 words
  eau agua acqua -- utiliser utilizar utilizzare

• Profile words: 340 words
  avec con con -- presque casi quasi

• Sound correspondences:
  – Italian → Spanish: 19    chi- → ll-     chiamare → llamar
  – Italian → French: 19     -ott- → -uit-  notte → nuit
  – Spanish → Italian: 23    -ue- → -uo-    bueno → buono
  – Spanish → French: 31     ll- → pl-      llorar → pleurer
  – French → Italian: 17     qu- → ch-      que → che
  – French → Spanish: 27     -ein → -eno    plein → lleno

Page 10: Tools and resources (not only) for  French, Italian and Spanish

Existing resources

• Bilingual wordlists
  – wordlists can easily be converted into
    • different XML formats
    • relational databases
  – used to create multilingual XML lexicons

• Plurilingual lexicon
  – French, Italian, Spanish (Portuguese, Romanian)
  – 1800 entries
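
The wordlist-to-XML conversion mentioned on this slide could look roughly like the following sketch. The slides do not show the actual XML format, so the element and attribute names (`lexicon`, `entry`, `form`, `lang`) and the function `wordlist_to_xml` are hypothetical; the word pairs are taken from the profile-word examples.

```python
import xml.etree.ElementTree as ET


def wordlist_to_xml(pairs, src_lang, tgt_lang):
    """Wrap (source, target) word pairs in a small XML lexicon.

    The tag names used here are illustrative assumptions, not the
    format used in the presentation.
    """
    root = ET.Element("lexicon")
    for src, tgt in pairs:
        entry = ET.SubElement(root, "entry")
        ET.SubElement(entry, "form", lang=src_lang).text = src
        ET.SubElement(entry, "form", lang=tgt_lang).text = tgt
    return root


# French/Spanish profile words quoted in the slides
pairs = [("avec", "con"), ("presque", "casi")]
lexicon = wordlist_to_xml(pairs, "fr", "es")
xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```

The same list of pairs could just as well be written to a relational table; only the serialisation step changes.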

Page 11: Tools and resources (not only) for  French, Italian and Spanish

Existing resources: Plurilingual lexicon

– Spanish: [1]actuar, [2]tratarse
  French:  agir [v] {1 intransitif, 2 pronominal impers.}
  Italian: [1]agire, [2]trattarsi

– Spanish: [caldo->'bouillon'], caliente
  French:  chaud [adj]
  Italian: caldo

– Spanish: contar [+'raconter']
  French:  compter [v]
  Italian: contare [+'raconter']

Page 12: Tools and resources (not only) for  French, Italian and Spanish

Overview

Plurilingual learning

Existing resources

Created resources

Developed tools

Software architecture

Page 13: Tools and resources (not only) for  French, Italian and Spanish

Created resources

• Multilingual XML lexicon
  – 43 topics
  – French: 11,500 lemmas / 14,900 entries
  – Italian: 13,400 lemmas / 17,800 entries
  – Spanish: 14,600 lemmas / 19,700 entries
  – English: 17,600 lemmas / 25,900 entries
  – German: 5,200 lemmas / 7,300 entries
  – POS: nouns (m, n, f), verbs, adverbs, adjectives, conjunctions, articles, pronouns, prepositions, interjections, numerals
  – Language levels: 1 - 4

Page 14: Tools and resources (not only) for  French, Italian and Spanish

Multilingual XML lexicon: sample entry

Page 15: Tools and resources (not only) for  French, Italian and Spanish

Created resources: verb lexicons

Verb lexicons with 500 verbs for each language containing verb pattern information

accepter <vt> <v pron>[de + INF][de faire qch][par][qch de qn][que]
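
As an illustration of how such an entry might be consumed, here is a minimal sketch that splits the sample line into lemma, verb classes and patterns. It assumes that angle brackets always mark verb classes and square brackets always mark patterns, which the single example suggests but does not guarantee; `parse_verb_entry` is a hypothetical helper.

```python
import re


def parse_verb_entry(line):
    """Split a verb-lexicon line into lemma, verb classes and patterns."""
    lemma = line.split()[0]
    classes = re.findall(r"<([^>]+)>", line)      # e.g. "vt", "v pron"
    patterns = re.findall(r"\[([^\]]+)\]", line)  # e.g. "de + INF"
    return {"lemma": lemma, "classes": classes, "patterns": patterns}


entry = parse_verb_entry(
    "accepter <vt> <v pron>[de + INF][de faire qch][par][qch de qn][que]"
)
print(entry["lemma"], entry["classes"], entry["patterns"])
```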

Page 16: Tools and resources (not only) for  French, Italian and Spanish

Created resources: verb lexicons

Full-form verb lexicons for 1500 – 1700 verbs

échappe
  échapper:pres:1s
  échapper:pres:3s
  échapper:subj_pres:1s
  échapper:subj_pres:3s
  échapper:impe:2s

abandonner
  1s_abandonne
  2s_abandonnes
  3s_abandonne
  1p_abandonnons
  2p_abandonnez
  3p_abandonnent
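
The two sample layouts are related: a paradigm listed per lemma can be inverted into a form-to-analysis index like the one shown for "échappe". The sketch below does this for the "abandonner" forms. The tense label `pres` is an assumption (the slide does not name the tense of that paradigm), and `index_paradigm` is a hypothetical helper, not part of the presented tools.

```python
from collections import defaultdict


def index_paradigm(lemma, tense, slots):
    """Map each inflected form to the lemma:tense:person analyses it can have."""
    index = defaultdict(list)
    for slot in slots:
        person, form = slot.split("_", 1)  # "1s_abandonne" -> ("1s", "abandonne")
        index[form].append(f"{lemma}:{tense}:{person}")
    return index


slots = ["1s_abandonne", "2s_abandonnes", "3s_abandonne",
         "1p_abandonnons", "2p_abandonnez", "3p_abandonnent"]
index = index_paradigm("abandonner", "pres", slots)  # "pres" is assumed
print(index["abandonne"])
```

An ambiguous form such as "abandonne" ends up with several analyses, which is what a dictionary tool needs in order to report tense, number and person for a verb form found in running text.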

Page 17: Tools and resources (not only) for  French, Italian and Spanish

Overview

Plurilingual learning

Existing resources

Created resources

Developed tools

Software architecture

Page 18: Tools and resources (not only) for  French, Italian and Spanish

Overview

Developed tools

Animated grammar presentations

Dictionary tools

Plurilingual analysis module

Page 19: Tools and resources (not only) for  French, Italian and Spanish

Animated grammar presentations

• Dynamic representation of grammatical properties / processes

• Tailor-made presentations
  – Replacing indications of place
  – Emphasising the subject
  – Irregular verb conjugations
  – Spatial prepositions and movements

• Authoring tool for creation of slide-based learning materials with animated content
  – produces slide-based learning materials
  – animated and/or static text can be included

Page 20: Tools and resources (not only) for  French, Italian and Spanish

Authoring tool: Presenter

• Can be embedded in web page or used as standalone tool in Windows

• XML data can be created automatically and then fed into the presenter
  → suitable for flexible feedback

• Several XML files can be provided for use in one page and then e.g. chosen via PHP or JavaScript

Page 21: Tools and resources (not only) for  French, Italian and Spanish

Dictionary tools

• Input: any text in French, Italian or Spanish

• Provide word-by-word translations

• Multilingual dictionary tool
  – Tense, number, person for verb forms
  – POS
  – Topic

• Plurilingual dictionary tool
  – Similar word forms
  – Profile words

Page 22: Tools and resources (not only) for  French, Italian and Spanish

Multilingual dictionary: Resources

• Used resources
  – Multilingual XML lexicons, multilingual MySQL database
  – Full-form verb lexicons

• Dictionary tool can easily be used with any other database
  – special language dictionaries
  – monolingual definition dictionaries

Page 23: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual dictionary: Tools and resources

• TreeTagger provides most likely POS

• Pan-Romance wordlist and list of profile words

• Tool makes use of
  – sound correspondences
  – Levenshtein string similarity measure
  – multilingual MySQL database
  to automatically detect graphically similar words with the same meaning

Page 24: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual dictionary: Word detection

• Basically all words of target language with “distance” ≤ 2 are displayed

• Sp. posibilidad -- Fr. possibilité → Normal distance: 4

• Sound correspondence: Sp. -dad -- Fr. -té → Intermediate form: posibilité

• Distance between intermediate form and French form is now only 1
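
The detection step on this slide can be reproduced with a short sketch: plain Levenshtein distance, plus a rewrite of the source word by a sound correspondence before comparing. Only the suffix rule from the slide (Sp. -dad → Fr. -té) is used; the function names and the restriction to word-final rules are simplifying assumptions, since the correspondence lists also contain word-initial and word-internal rules.

```python
def levenshtein(a, b):
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def apply_suffix_rule(word, src_suffix, tgt_suffix):
    """Rewrite a word-final correspondence, e.g. Sp. -dad -> Fr. -té."""
    if word.endswith(src_suffix):
        return word[: -len(src_suffix)] + tgt_suffix
    return word


def is_candidate(src, tgt, rules, max_dist=2):
    """True if tgt is within max_dist of src after applying the suffix rules."""
    intermediate = src
    for src_suffix, tgt_suffix in rules:
        intermediate = apply_suffix_rule(intermediate, src_suffix, tgt_suffix)
    return levenshtein(intermediate, tgt) <= max_dist


rules = [("dad", "té")]
print(levenshtein("posibilidad", "possibilité"))          # 4
print(apply_suffix_rule("posibilidad", "dad", "té"))      # posibilité
print(is_candidate("posibilidad", "possibilité", rules))  # True
```

Without the rule the pair fails the "distance ≤ 2" threshold; with it, the intermediate form is one insertion away from the French word.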

Page 25: Tools and resources (not only) for  French, Italian and Spanish

Plurilingual analysis module

• Exploits similar sentence structures in Romance languages

• Can analyse learner input ranging from simple sentences up to paragraphs of simple sentences, and gives detailed feedback

Page 26: Tools and resources (not only) for  French, Italian and Spanish

Resources

• JFrost:
  – possible lemmas + POS
  – (extended morphological information)

• Verb lexicons

• Hand-crafted grammar

Page 27: Tools and resources (not only) for  French, Italian and Spanish

Parser type

Robust island parser

Hoy la madre no ha vuelto a hablar con su hijo.

Verb group: ha vuelto a hablar (V V P V)

• has a fixed position and extension in the sentence
• only contains verbs and certain POS

[Figure labels: subject, verb group, object]

The sentence is split at potential verb groups; only the parts before and after the verb group are actually parsed.
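
A minimal sketch of the splitting step, applied to the example sentence. The tag set (`V`, `P`, `DET`, `N`, `ADV`, `NEG`) and the rule that a preposition joins the verb group only when a verb follows it directly are assumptions made for this illustration; the slides do not say which non-verb POS the verb group may contain or whether the negation belongs to it.

```python
def split_at_verb_group(tagged):
    """Return (before, verb_group, after) for a list of (word, pos) pairs.

    The verb group starts at the first verb and extends over further verbs;
    a preposition is absorbed only if a verb follows it immediately.
    """
    start = next((i for i, (_, pos) in enumerate(tagged) if pos == "V"), None)
    if start is None:
        return tagged, [], []
    end = start + 1
    while end < len(tagged):
        pos = tagged[end][1]
        next_is_verb = end + 1 < len(tagged) and tagged[end + 1][1] == "V"
        if pos == "V" or (pos == "P" and next_is_verb):
            end += 1
        else:
            break
    return tagged[:start], tagged[start:end], tagged[end:]


sentence = [("Hoy", "ADV"), ("la", "DET"), ("madre", "N"), ("no", "NEG"),
            ("ha", "V"), ("vuelto", "V"), ("a", "P"), ("hablar", "V"),
            ("con", "P"), ("su", "DET"), ("hijo", "N")]
before, verb_group, after = split_at_verb_group(sentence)
print([w for w, _ in before])
print([w for w, _ in verb_group])
print([w for w, _ in after])
```

Only `before` and `after` would then be handed to the phrase-level parser, which keeps the parser robust against errors elsewhere in the sentence.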

Page 28: Tools and resources (not only) for  French, Italian and Spanish

Analysis module: Recognised errors

• Agreement errors
  – inside NPs
  – between sentence components

• Subcategorisation errors
  – too many/few sentence components
  – wrong preposition
  – wrong non-finite verb form

• Position errors
  – Negation
  – Adverbs

• ...

Page 29: Tools and resources (not only) for  French, Italian and Spanish

Error recognition

• Constraint relaxation
  – no constraints during parsing
  – suite of tests after parsing
    • Agreement
    • Position of adverbs
    • Correctness of Verb group

• Error rules
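
To make the "parse first, test afterwards" idea concrete, here is a sketch of one post-parse test: an agreement check inside an NP that has already been accepted by the parser. The token representation (dictionaries with `form`, `pos`, `gender`, `number`) and the function name are assumptions for this illustration, not the module's actual data structures.

```python
def check_np_agreement(np_tokens):
    """Report gender/number clashes between the head noun and its dependents.

    The NP is assumed to have been parsed without agreement constraints,
    so a phrase like "el madre" reaches this test instead of being rejected.
    """
    head = next(tok for tok in np_tokens if tok["pos"] == "n")
    errors = []
    for tok in np_tokens:
        if tok is head:
            continue
        for feature in ("gender", "number"):
            if feature in tok and tok[feature] != head[feature]:
                errors.append((tok["form"], feature, tok[feature], head[feature]))
    return errors


learner_np = [
    {"form": "el", "pos": "det", "gender": "m", "number": "sg"},
    {"form": "madre", "pos": "n", "gender": "f", "number": "sg"},
]
correct_np = [
    {"form": "la", "pos": "det", "gender": "f", "number": "sg"},
    {"form": "madre", "pos": "n", "gender": "f", "number": "sg"},
]
print(check_np_agreement(learner_np))  # one gender clash on "el"
print(check_np_agreement(correct_np))  # no errors
```

Because the clash is recorded rather than used to block the parse, the feedback can name the offending word and feature.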

Page 30: Tools and resources (not only) for  French, Italian and Spanish

Modules

• Grammar reader
  – Reads in grammar file
  – Extrapolates phrase structure rules
    NP -> (det) n (AP)
  – Provides direct access to subparts of the grammar
    ”give me all NP rules for Spanish”

• Verb group divider
  – Divides sentence at its verbal group
  – Returns the sentence chunks before and after the VG

• NP finder
  – Finds all possible NP occurrences in sentence chunks
  – Returns positions of NPs in sentence chunks
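
One plausible reading of "extrapolates phrase structure rules" is that optional constituents in parentheses are expanded into every concrete right-hand side, which an NP finder can then match against the POS sequence of a chunk. The sketch below follows that reading; it is an assumption, as are the function names and the simplification that `AP` is treated like a single tag rather than a phrase to be expanded recursively.

```python
from itertools import product


def expand_rule(rule):
    """Expand 'NP -> (det) n (AP)' into all concrete right-hand sides."""
    lhs, rhs = (part.strip() for part in rule.split("->"))
    choices = []
    for symbol in rhs.split():
        if symbol.startswith("(") and symbol.endswith(")"):
            choices.append([symbol[1:-1], None])  # optional: present or absent
        else:
            choices.append([symbol])
    expansions = [[s for s in combo if s is not None]
                  for combo in product(*choices)]
    return lhs, expansions


def find_phrases(tags, expansions):
    """Return all (start, end) spans of a tag sequence matching an expansion."""
    spans = set()
    for start in range(len(tags)):
        for rhs in expansions:
            end = start + len(rhs)
            if tags[start:end] == rhs:
                spans.add((start, end))
    return sorted(spans)


lhs, expansions = expand_rule("NP -> (det) n (AP)")
print(lhs, expansions)
# Tags for the chunk "con su hijo": preposition, determiner, noun
print(find_phrases(["p", "det", "n"], expansions))
```

The finder returns every possible NP span, including the shorter one nested in the longer, matching the slide's "all possible NP occurrences"; choosing among them is left to a later step.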

Page 31: Tools and resources (not only) for  French, Italian and Spanish

Overview

Plurilingual learning

Existing resources

Created resources

Developed tools

Software architecture

Page 32: Tools and resources (not only) for  French, Italian and Spanish

Interaction of software components

[Diagram of component interaction. Recoverable labels: Client, Server, Web page, Flash, Shared Object, XML, MySQL, NLP, PHP, Perl, Java, PDF]

Page 33: Tools and resources (not only) for  French, Italian and Spanish

Software architecture: Pros

• Uniform representation on several platforms, browser-independent

• Easy integration of different media types (audio, video, images, animation)

• Embed fonts for many character sets (Cyrillic, Hebrew, Arabic, Chinese, Japanese, Korean)

• Flash Remoting: sending complex data structures (Java objects, arrays, hashes) to and from server

Page 34: Tools and resources (not only) for  French, Italian and Spanish

Software architecture: Pros

• Flash files can interact mutually via JavaScript, LocalConnection class or using the same Local Shared Objects

• Local Shared Objects provide the opportunity to save structured data (e.g. XML data) on the client side

• No reload necessary for incoming server data

• Can read XML files; XPath and regular expressions can be used

Page 35: Tools and resources (not only) for  French, Italian and Spanish

Software architecture: Cons

• (Requires browser plug-in)

• Steep learning curve at the beginning

• Contents cannot be read by search engines

• Software is not free