TOWARDS THE CREATION OF A BELARUSIAN GRAMMATICAL DICTIONARY

Igor V. Shevchenko (ULIF NANU, Kyiv)
Natalia Kotsyba (ISPAN, Warsaw)
Kiryl Kurshuk (Hrodna University)


Page 2

Grammatical dictionaries: applications

• provide a description of word declension and word formation
• enable lemmatization
• morphological analysis and synthesis
• grammatical tagging of text corpora
• can be integrated into other dictionaries: explanatory, synonymic, etc. (“Словники України”)

Page 3

Grammatical dictionaries: history

• prototype: Grammatical Dictionary of Russian by Andrey Zaliznyak (1967)

• Grammatical dictionary of the Ukrainian language (UGD) by Igor Shevchenko, 2456 word-inflexion grammatical classes (WIC)

• Grammatical dictionaries for Polish, German, English, etc., in ULIF NASU

Page 4

Grammatical dictionaries: features

• contain information about all the forms of the inflected words of a language and their grammatical features
• uniformity of information is provided by word-inflexion classes (WIC)
• a WIC is a set of words with the same type of word-inflexion
• a WIC is determined by a combination of parameter values
• words belonging to the same WIC differ in their invariable parts only
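The last point can be illustrated with a minimal sketch: if a class is stored as one shared list of endings, every member is fully described by its invariable part. The endings below are a hypothetical fragment (infinitive plus present tense) chosen for illustration; they are not taken from the dictionary's actual data, and `paradigm` is an assumed helper name.

```python
# Hypothetical fragment of one verb class: infinitive + present-tense endings.
# The list is illustrative only and is not the dictionary's own representation.
ENDINGS = ["вати", "ю", "єш", "є", "ємо", "єте", "ють"]


def paradigm(invariable_part, endings=ENDINGS):
    """Build word forms by attaching the shared endings to the invariable part."""
    return [invariable_part + ending for ending in endings]


# Two members of the same class differ only in the invariable part.
for stem in ("компенсу", "ліквіду"):
    print(paradigm(stem))
```

Storing endings once per class rather than once per word is what keeps the description uniform across the whole dictionary.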

Page 5

POS-independent value parameters

• part of speech (or WI-generalization, WI-type), e.g. masculine nouns, feminine nouns
• type of word stem
• conjugation pattern
• type of consonant-vowel changes
• paradigm incompleteness
• non-typical features of wordforms in certain grammatical meanings
• type of accent distribution in the word-inflexion paradigm

Page 6

POS-dependent value parameters

For verbs:
• aspect, reflexivity, imperative form, passive participle suffix

For nouns:
• gender, animacy, genitive for masculine nouns, locative for masculine and neuter nouns, dative for masculine nouns, accusative case in plural
• etc.

Page 7

WIC examples: #1540 and #382

Lexeme | POS | declension | basis | change | anim | genitive | WIC
ви́борець, видаве́ць, іспа́нець, промисло́вець | n | 2dec | soft | -е | person | а | 1540

Lexeme | POS | conjugation | basis | final suffix | aspect | reflex | change | WIC
компенсува́ти, ліквідува́ти, нормалізува́ти | v | 1dec | iota | -ати | imperf + perf | – | – | 382
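One way to read these rows is as lookup keys: a combination of parameter values identifies the class. The sketch below encodes only the noun row for WIC 1540. The `NounParams` type, its field names and the `WIC_BY_PARAMS` table are assumptions made for illustration, not the dictionary's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounParams:
    """Hypothetical record holding the noun parameters shown in the table."""
    declension: str
    basis: str
    change: str
    animacy: str
    genitive: str


# Only the one row quoted on the slide; a real table would hold every class.
WIC_BY_PARAMS = {
    NounParams("2dec", "soft", "-е", "person", "а"): 1540,
}


def find_wic(params):
    """Return the class number for a parameter combination, or None if unknown."""
    return WIC_BY_PARAMS.get(params)


print(find_wic(NounParams("2dec", "soft", "-е", "person", "а")))
```

A frozen dataclass is hashable, so the whole parameter combination can serve directly as a dictionary key.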

Page 8

WIC examples: #2335 and #1145

Lexeme | POS | declension | basis | change | animal coord | pecul | WIC
Віта́ліїв, змі́їв, ну́нціїв | a | possessive | hard | ї-є | + | – | 2335
вся́кий, де́котрий, жо́дний | pron | general | hard | – | + | – | 1145

Page 9

Word-inflexion parameter: scope

• a word-inflexion parameter can be regarded as a discrete function with a limited range of possible values (the value area)
• the parameter “type of the word stem” can take one of 5 values: hard, soft, combined, iotized and r-type
• the parameter “gender” has potentially 10 different values for a lexeme (three genders, six ordered combinations of two genders and one combination of all three genders)
• genitive for masculine nouns has three values: -а (or -я, depending on the word ending), -у (-ю), or both -а (-я) and -у (-ю)
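The value areas listed above are small enough to enumerate. The sketch below assumes that the six two-gender combinations are ordered pairs, since that is the reading under which the count comes to 10. All constant and function names are hypothetical.

```python
from itertools import permutations

GENDERS = ("masculine", "feminine", "neuter")

# Value area of "gender": 3 single genders + 6 ordered pairs + 1 triple.
# Treating the pairs as ordered is an assumption made to match the count of 10.
GENDER_VALUES = (
    [(g,) for g in GENDERS]
    + list(permutations(GENDERS, 2))
    + [GENDERS]
)

STEM_TYPES = ("hard", "soft", "combined", "iotized", "r-type")
MASC_GENITIVE = ("-а (-я)", "-у (-ю)", "-а (-я) / -у (-ю)")


def in_value_area(value, value_area):
    """A parameter behaves like a discrete function: only listed values are legal."""
    return value in value_area


print(len(GENDER_VALUES), len(STEM_TYPES), len(MASC_GENITIVE))
```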

Page 10

GD for Belarusian

• Dictionary of the Belarusian Language. Spelling. Orthoepy. Accentuation. Word-Inflection. (Слоўнік беларускай мовы. Арфаграфія. Арфаэпія. Акцэнтуацыя. Словазмяненне), Мінск, 1987
• ca. 100,000 words, including material from:
  • “The Belarusian-Russian Dictionary” (Moscow, 1962)
  • Explanatory Dictionary of the Belarusian Language (“Тлумачальны слоўнік беларускай мовы”, vol. 1-5, 1977-1984)
  • materials of the lexical card file of the Yakub Kolas Institute of Linguistics

Page 11

Examples of word entries

• абвяза́ць зак., абвяжу́, абвя́жаш, -жа, -жам, -жаце, -жуць
• каці́ць незак., качу́, ко́ціш, ко́ціць, ко́цім, ко́ціце, ко́цяць
• вы́ган с.-г., -ну, -не, -наў; (дзеянне) -ну, -не
• насе́нне [ньне] -нні
• свякру́ха -усе, -ух
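Entries like these give most forms as truncated fragments, so turning them into a full paradigm means re-attaching each fragment to a base form. The sketch below is a heuristic guess at that step, not the procedure used by the authors. It looks for the longest beginning of the fragment that also occurs in the base form and overwrites the base from that point. Stress marks are left out of the strings for simplicity, and `expand` is a hypothetical helper name.

```python
def expand(base, fragment):
    """Re-attach a truncated form such as '-усе' to a full base form.

    Heuristic (an assumption, not the dictionary's documented convention):
    find the longest prefix of the fragment that occurs in the base form,
    take its last occurrence, and replace the rest of the base with the fragment.
    """
    tail = fragment.lstrip("-–")
    for size in range(len(tail), 0, -1):
        position = base.rfind(tail[:size])
        if position != -1:
            return base[:position] + tail
    # No overlap at all: fall back to plain concatenation.
    return base + tail


print(expand("свякруха", "-усе"))
print(expand("насенне", "-нні"))
print(expand("абвяжаш", "-жа"))
```

For verb entries the base is the last fully spelled form rather than the headword, which is why the third call starts from “абвяжаш”.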

Page 12

Grammatical options and the accent

• друка́рня -рні, -рань і -рняў
• мадэ́ль -ллю [льлю], -лей і -ляў

The scope of paradigm presentation varies depending on the mobile accent:

• кнот кно́та, -оце, мн. кнаты́, -то́ў
• лес ле́су, ле́се, мн. лясы́, лясо́ў
• сват сва́та, сва́це, мн. сваты́, -то́ў
• склеп -па, -пе, мн. скляпы́, -по́ў

Page 13

Regular errors

• у́ (under stress) > ў: абагачу > абагачў (“enrich”)
• о́ (under stress) > б: аблокі > аблбкі (“clouds”)
  However: адбракаваны (“rejected”). Some combinations of consonant letters cannot be changed automatically, e.g. дбр into дор.
• е́ (under stress) > ё: смецце > смёцце (“rubbish”)
• ы > ьі
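The regular errors above split into those that look safe to undo mechanically and those that need review. The sketch below rests on two stated assumptions: that “ў” never follows a consonant letter inside a Belarusian word, and that the sequence “ьі” does not occur legitimately. Under those assumptions both can be rewritten automatically. A “б” between consonants is only flagged, because words like “адбракаваны” contain it legitimately. The function names are hypothetical, and only lower-case text is handled.

```python
import re

CONSONANTS = "бвгджзйклмнпрстфхцчш"  # lower-case consonant letters only
ACUTE = "\u0301"  # combining acute accent, used here as the stress mark


def fix_unambiguous(text):
    """Undo the two error types assumed to be safe: 'ьі' for 'ы', 'ў' for stressed 'у'."""
    text = text.replace("ьі", "ы")
    return re.sub(f"(?<=[{CONSONANTS}])ў", "у" + ACUTE, text)


def suspicious_b(word):
    """Return positions of 'б' squeezed between consonants (candidates for stressed 'о').

    These are only candidates for manual review, never automatic replacements.
    """
    pattern = f"(?<=[{CONSONANTS}])б(?=[{CONSONANTS}])"
    return [match.start() for match in re.finditer(pattern, word)]


print(fix_unambiguous("абагачў"))
print(suspicious_b("аблбкі"), suspicious_b("адбракаваны"))
```

Both “аблбкі” and “адбракаваны” are flagged, which shows why this substitution cannot be applied without a check against a word list or a human reader.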

Page 14

Substitution of some affixes

• -цель, -чык, -шчык > -льнік, -нік, -ц-а, -ец, -ар:
  збавіцель > збаўца (“rescuer”)
  натхніцель > натхняльнік (“inspirer”)
  выхавацель > выхавальнік (“educator”)
• -учы (-ючы) > -ц-а:
  выступаючы > выступоўца (“speaker”)
• -енн > -ав:
  дарэформенны > дарэформавы (“pre-reform”)

Page 15

Parallelism in the word-inflexion systems of Ukrainian and Belarusian

• neuter nouns in -нне: “стая́нне” (“standing”), “абогатварэ́нне” (“idolizing”), “абаграва́нне” (“playing up”) = Ukrainian WIC #2108, -ння: “стоя́ння” (“standing”), “малюва́ння” (“drawing”)
• nouns in -сць: “лега́льнасць” (“legality”), “легкава́жнасць” (“light-mindedness”) = Ukrainian WIC #2143, 3rd declension with the change о-і in some cases: “акти́вність” (“activity”), “рапто́вість” (“suddenness”)
• adjectives in -ы: “бе́лы” (“white”), “агу́льны” (“general”) = Ukrainian WIC #2302, hard consonant stem: “бі́лий” (“white”), “спі́льний” (“common”)
• verbs ending in -аць: “дбаць” (“take care”), “спаць” (“sleep”) = Ukrainian WIC #697: “дба́ти” (“take care”), “спа́ти” (“sleep”), i.e. verbs of the 1st conjugation with iotized endings in the present tense and without a passive participle in the paradigm
• infinitives ending in -ацца: “абагашча́цца” (“get rich”), “зжыва́цца” (“get used”) = WIC #700: “вигиня́тися” (“bend”), “зупиня́тися” (“stop”)
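The parallels above suggest a crude first pass over a Belarusian word list: propose a candidate Ukrainian WIC from the word ending alone. This is a deliberately naive sketch under that assumption. The ending alone cannot tell an adjective in -ы from a plural noun, so the result is only a candidate for manual checking. The table holds just the five correspondences listed on the slide, and `candidate_wic` is a hypothetical name.

```python
# Endings and class numbers as listed on the slide; nothing else is covered.
ENDING_TO_UKR_WIC = {
    "ацца": 700,
    "аць": 697,
    "нне": 2108,
    "сць": 2143,
    "ы": 2302,
}


def candidate_wic(word):
    """Propose a Ukrainian WIC for a Belarusian word, trying longer endings first."""
    for ending in sorted(ENDING_TO_UKR_WIC, key=len, reverse=True):
        if word.endswith(ending):
            return ENDING_TO_UKR_WIC[ending]
    return None


for word in ("стаянне", "легальнасць", "белы", "спаць", "абагашчацца"):
    print(word, candidate_wic(word))
```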

Page 16

Some corresponding WICs

lang | Lexeme | POS | declension | basis | change | anim | gen | WIC
Ukr. | стоя́ння | n | 2dec | hard | – | person | а | 2108
Bel. | абаграва́нне | n | 2dec | hard | – | animate | а | 2108
Ukr. | рапто́вість | n | 2dec | hard | і-о | inanimate | а | 2143
Bel. | лега́льнасць | n | 2dec | hard | – | inanimate | а | 2143

lang | Lexeme | POS | declension | basis | change | animal coord | pecul | WIC
Ukr. | бі́лий | adj | general | hard | – | + | – | 2302
Bel. | агу́льны | adj | general | hard | – | + | – | 2302

Page 17

Differences within WICs

• no vowel change in Belarusian feminine nouns in -асць: “тво́рчасць” (“creativity”); such a change is inherent in similar Ukrainian nouns in -ість in some oblique cases: “влу́чність – влу́чності” (“marksmanship”)
• Ukrainian WIC №1607 (masculine nouns of the 2nd declension with hard consonant stems and genitive in -а, without vowel change, designating inanimate objects), e.g. “гриб” (“mushroom”) = similar vowel-invariable Belarusian entries like “маро́з” (“frost”)

But:
• т-ц: “абанеме́нт” – loc. “абанеме́нце” (“season ticket”), WIC № 1615
• д-дз: “пад’е́зд” – loc. “пад’е́здзе” (“doorway”), WIC № 1627
• double change in the lexeme “снег” (“snow”), with the locative “сне́зе” and the nominative plural “снягі́”, WIC № 1635

Page 18

Ukr. WIC > Bel. WIC

Language | lexeme | part of speech | declension | basis | anim | genitive | change | WIC
Ukr. | гриб | n | 2dec | hard | inanimate | а | – | 1607
Bel. | маро́з | n | 2dec | hard | inanimate | а | – | 1607
Bel. | абанеме́нт | n | 2dec | hard | inanimate | а | т-ц | 1615
Bel. | пад’е́зд | n | 2dec | hard | inanimate | а | д-дз | 1627
Bel. | снег | n | 2dec | hard | inanimate | а | г-з, е-я | 1635
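The split of one Ukrainian class into several Belarusian ones follows from the stem-final alternation, which can be detected by comparing the base form with the form in -е quoted for these lexemes. The sketch below is a simplified illustration, assuming a one-letter ending and unaccented strings. It covers only the consonant alternations from the table. The additional е-я change of “снег” appears in the plural and is not detected here, so “снег” is deliberately left without a class. All names are hypothetical.

```python
# Stem-final alternations from the table: (base-form consonant, changed consonant).
ALTERNATIONS = [("т", "ц"), ("д", "дз"), ("г", "з")]

# Classes that are fully determined by a single consonant change (or by none).
WIC_BY_CHANGE = {"–": 1607, "т-ц": 1615, "д-дз": 1627}


def final_change(base_form, e_form):
    """Compare the base form with the form in -е and name the stem-final change."""
    stem = e_form[:-1]  # assumption: the ending is exactly one letter long
    if stem == base_form:
        return "–"
    for plain, changed in ALTERNATIONS:
        if base_form.endswith(plain) and stem == base_form[: -len(plain)] + changed:
            return f"{plain}-{changed}"
    return None


for base_form, e_form in [("мароз", "марозе"), ("абанемент", "абанеменце"),
                          ("пад’езд", "пад’ездзе"), ("снег", "снезе")]:
    change = final_change(base_form, e_form)
    print(base_form, change, WIC_BY_CHANGE.get(change))
```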

Page 19

Ukr. WIC > Bel. WIC

• appearance of the prothetic в- in some grammatical meanings
• Ukrainian WIC #1991, neuter nouns with hard endings = Belarusian WIC #1991: “гаспада́рства” (“economy”)

But also:
• “акно́” (“window”): в- inserted in the plural (акно́ – во́кны), WIC #2001
• “во́зера” (“lake”): omission of the same в- (во́зера – азёры), WIC #2002

Page 20

Ukr. WIC > Bel. WIC

• Ukr. verbal class #490, “огорну́ти” (“embrace”) = Bel. WIC #490, “недацягну́ць” (“fail to hold out”)

But also to:
• WIC #491, ending vowel change of е-: абамкну́ць, -ну́, -нёш, -не́, -нём, -няце́, -ну́ць (“surround”)
• WIC #494, stem vowel change а-о: “абгарну́ць, абгарну́, абго́рнеш” (“embrace”)

Page 21

Ukr. WIC > Bel. WIC

Page 22

Ukr. WIC < Bel. WIC

• Ukr. WIC #1628, vocative change к-ч: “чолові́к” (“man”) – voc. “чолові́че” (“o, man!”)
• no vocative in modern Belarusian
• Bel. “чалаве́к” (“man”) belongs to the more general class of masculine nouns in -к, Ukr. WIC #1788, “мі́стык” (“mystic”), no vocative change

Page 23

Ukr. WIC < Bel. WIC

• two alternative forms of the accusative plural in Ukrainian for nouns designating animals, coinciding with the nominative plural and the genitive plural: “пасти коні” and “пасти коней” (“to graze horses”)
• vs. one form (coinciding with the genitive plural) for nouns designating people: “зустріти дівчат”, but not “*зустріти дівчата” (“to meet girls”)
• no such differentiation in Belarusian
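The Ukrainian rule described above can be stated as a small function. This is only a sketch of the two cases mentioned on the slide. Other animacy values and the Belarusian behaviour are not specified there, so the function refuses them rather than guessing. The function name and the animacy labels are assumptions.

```python
def ukr_accusative_plural(nominative_plural, genitive_plural, animacy):
    """Return the admissible Ukrainian accusative plural forms for a noun."""
    if animacy == "animal":
        # Both the nominative-like and the genitive-like form are possible.
        return [nominative_plural, genitive_plural]
    if animacy == "person":
        # Only the genitive-like form is possible.
        return [genitive_plural]
    raise ValueError(f"animacy value not covered by this sketch: {animacy!r}")


print(ukr_accusative_plural("коні", "коней", "animal"))
print(ukr_accusative_plural("дівчата", "дівчат", "person"))
```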

Page 24

language | lexeme | part of speech | declension | basis | vocative change | anim | genitive | WIC
Ukr. | мі́стик | n | 2dec | hard | – | person | А | 1788
Bel. | мі́стык | n | 2dec | hard |  | animate | А | 1788
Ukr. | за́йчик | n | 2dec | hard | – | animal | А | 1789
Bel. | за́йчык | n | 2dec | hard |  | animate | А | 1788
Ukr. | чолові́к | n | 2dec | hard | к-ч | person | А | 1628
Bel. | чалаве́к | n | 2dec | hard |  | animate | А | 1788
Ukr. | їжа́к | n | 2dec | hard | к-ч | animal | А | 1629
Bel. | во́жык | n | 2dec | hard |  | animate | А | 1788
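The table shows four Ukrainian classes collapsing into a single Belarusian one. A minimal sketch of that many-to-one correspondence follows. The mapping holds only the rows shown above, and the dictionary and function names are hypothetical.

```python
from collections import defaultdict

# Ukrainian WIC -> Belarusian WIC, taken from the rows of the table only.
UKR_TO_BEL_WIC = {1788: 1788, 1789: 1788, 1628: 1788, 1629: 1788}


def merged_classes(mapping):
    """Group Ukrainian classes by the Belarusian class they correspond to."""
    groups = defaultdict(list)
    for ukr_wic, bel_wic in mapping.items():
        groups[bel_wic].append(ukr_wic)
    return {bel_wic: sorted(ukr_wics) for bel_wic, ukr_wics in groups.items()}


print(merged_classes(UKR_TO_BEL_WIC))
```

Inverting the mapping makes it easy to see which distinctions (here vocative change and the person/animal split) are not needed on the Belarusian side.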

Page 25

Conclusions about GDs

• Statistics of usage given by a GD can help us trace more common patterns of word-inflexion in similar classes of words, which can be useful for recommendations on standardization, considering the current variability of existing forms in both Ukrainian and Belarusian.
• Statistics of WICs can be of use in grammatical homonymy disambiguation.
• GDs can be a powerful tool for comparative studies.
• GDs are corpus-driven, so they help us reveal information about a language that is not covered in grammars, or is not covered consistently or clearly enough for the users.