Software Applications for Processing Romanian Texts. Demonstration and Comparison
Sanda Cherata
Babeş-Bolyai University
Faculty of Letters
Software Applications
• The Romanian Morphological Dictionary (DMR) – Software ITC SA – RoLingva, www.rolingva.ro
• LEXICON – for updating attributes in lexical entries
• SIASTRO-AM – phrase analysis of noun, adjective, adverb, verb and prepositional phrases
• ETR – term extractor for Romanian specialised texts
DMR
• Paradigm of a given lemma
  • classic form
  • stem + termination
• Accents
• Syllabification
• Morphological analysis of a given word
LEXICON
• Specifying attributes for lexico-morphological classes
• Designed to collect data from multiple users
• Friendly interface
SIASTRO-AM
• Lexico-morphological analysis
• Parsing of noun, adjective, adverb, verb and prepositional phrases
  • Uses a lexicon based on DMR, enriched with new lexical and syntactic attributes added with the LEXICON application
  • Outputs an annotated text
SIASTRO-AM: Tags for text elements
{F – Start sentence
F} – End sentence
{C – Start word
C} – End word
{N – Start unknown word
N} – End unknown word
{D – Start number
D} – End number
{S – Start punctuation sign
S} – End punctuation sign
{L – Start hyphen
L} – End hyphen
{I – Start ignored sequence
I} – End ignored sequence
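To illustrate how the paired tags delimit text elements, here is a minimal Python sketch that reads an annotated string back into sentences and typed elements. The tag letters come from the slide. The whitespace between a tag and its content, the `sample` string and the function name `parse_annotated` are assumptions made for illustration and do not describe the actual SIASTRO-AM output.

```python
import re

# Tag letters as listed on the slide; the names on the right are their meanings.
ELEMENT_KINDS = {
    "C": "word",
    "N": "unknown word",
    "D": "number",
    "S": "punctuation sign",
    "L": "hyphen",
    "I": "ignored sequence",
}

# Assumption: an opening tag is followed by whitespace and a closing tag is
# preceded by whitespace, e.g. "{C casa C}".
SENTENCE_RE = re.compile(r"\{F\s+(.*?)\s+F\}", re.DOTALL)
ELEMENT_RE = re.compile(r"\{([CNDSLI])\s+(.*?)\s+\1\}", re.DOTALL)


def parse_annotated(text):
    """Return a list of sentences, each a list of (kind, content) pairs."""
    sentences = []
    for sentence in SENTENCE_RE.finditer(text):
        elements = [
            (ELEMENT_KINDS[m.group(1)], m.group(2))
            for m in ELEMENT_RE.finditer(sentence.group(1))
        ]
        sentences.append(elements)
    return sentences


# Hypothetical annotated input, not real tool output.
sample = "{F {C casa C} {N xyzzy N} {D 42 D} {S . S} F}"

for kind, content in parse_annotated(sample)[0]:
    print(kind, "->", content)
```

The sketch relies on the closing tag repeating the letter of the opening tag, which is what lets a single backreference pattern cover all six element types.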
SIASTRO-AM: Tags for words
{C word (part of speech + grammatical category + grammatical category + ..., part of speech + grammatical category + grammatical category + ...) syllabification + accent position:, (...), ... (...) syllabification + accent position: + lemma +: ... C}
• The comma inside the parentheses separates parts of speech
• The comma after a syllabification + accent position separates homographs
Example:
{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+: C}
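The word tag above can be unpacked mechanically. The following Python sketch splits the `date` example from the slide into its readings, syllabification, accent position and lemmas. The field layout is inferred from that single example, the function name `parse_word_tag` is hypothetical, and tags carrying several homograph groups are not handled.

```python
import re

# Inferred layout: {C word (reading, reading, ...) syllables+accent:+lemma+:... C}
WORD_TAG_RE = re.compile(r"\{C\s+(\S+)\s*\((.*?)\)\s*(.*?)\s*C\}", re.DOTALL)


def parse_word_tag(tag):
    """Split one {C ... C} word tag into a dictionary of its fields."""
    match = WORD_TAG_RE.fullmatch(tag.strip())
    if match is None:
        raise ValueError("not a {C ... C} word tag")
    word, readings_part, tail = match.groups()

    # Each comma-separated reading is "part of speech + category + ...".
    analyses = []
    for reading in readings_part.split(","):
        fields = [f for f in reading.strip().split("+") if f]
        if fields:
            analyses.append({"pos": fields[0], "categories": fields[1:]})

    # The tail is "syllables+accent" followed by ":"-separated "+lemma+" items.
    head, *lemma_parts = tail.split(":")
    syllables, _, accent = head.partition("+")
    lemmas = [p.strip("+") for p in lemma_parts if p.strip("+")]

    return {
        "word": word,
        "analyses": analyses,
        "syllabification": syllables,
        "accent_position": int(accent) if accent.isdigit() else None,
        "lemmas": lemmas,
    }


# The example tag from the slide.
example = (
    "{C date (vrb+p_fp+, sbt+fdpn+fisn+fipn+fvpa+, "
    "adj+fdpn+fisn+fipn+fvpa+) da-te+2:+da+:+dată+:+dat+: C}"
)

result = parse_word_tag(example)
print(result["word"], result["syllabification"], result["lemmas"])
```

For the slide's example this yields three readings (`vrb`, `sbt`, `adj`) and three lemmas (`da`, `dată`, `dat`). The sketch does not pair each lemma with a particular reading, since the slide does not state that correspondence.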