NON-TECHNICAL COMPUTER THESAURUS VERSUS SPECIALIZED COMPUTER THESAURUS
Olena Siruk
Laboratory for Computational Linguistics, Institute of Philology
National Taras Schevchenko University of Kyiv, Ukraine
[email protected]
Metalanguage and encoding scheme design for digital lexicography, 15-16.04.2009, Bratislava, Slovakia
1. Topicality of the research
Compilation of general and specialised (terminological) thesauri
Ukrainian lexicography development
Users’ requirements in integrated information
Development of computer technologies
• Development of formalised principles of thesauri modelling
• Systematisation of terms
• Standardisation of definitions
Non-technical (Computer Thesaurus of Ukrainian Verbs): tested on the basis of the semantic field of speech
The Thesaurus joins terms on the conceptual principle
Specialized (Specialized Thesaurus of Computer Ideography)
2. CT units (CT of CI versus CT of UV)
Quantity – 75 terms (considered complete) / about 2000 units in the semantic field of speech
Type – nouns, noun-noun and noun-adjective compounds / verbs
Amount – from 1 to 4 words in a term / LSV (lexico-semantic variant)
Content – from highly specialised terms to terms related to other linguistic disciplines / verbs of the semantic field of speech
3. CT of Nouns versus CT of Verbs
It is precisely the noun that holds pride of place in ideographical dictionaries of different languages.
The basis for the semantic scheme of nouns is adopted from objective extralinguistic reality.
Verbs are included in thesauri of different types considerably less often than nouns, and especially seldom in terminological thesauri.
Significative semantics prevails in the meaning of a verb.
4. CT of Nouns
Consequently, for a noun:
1) external, denotative choice of concepts is characteristic;
2) a deductive approach to structuring the material is mostly applied;
3) word-formation and the valency potential of a noun are not very important for the creation of the synoptic scheme;
4) whole–part relations are substantial, taxonomy is prevalent.
It is precisely the noun that holds pride of place in ideographical dictionaries of different languages.
5. CT of Verbs
In light of this, for verbs:
1) an internal, significative concept selection strategy based on the analysis of meaning is more acceptable;
2) an inductive approach to ordering lexemes is more adequate;
3) relations based on word-formation type (derivation hyponymy) and valency potential (a basis for connections between parts of speech) are essential;
4) taxonomy and whole–part relations are irrelevant.
6. CT macrostructure
Synoptic scheme represented as a term index
Maximum depth – 6 levels of hierarchy
Both types of CT have certain common, analogous and uniting features:
1) both dictionaries represent more or less completely the relations between units;
2) both dictionaries either have an explicit synoptic scheme, that is a division of the universe into thematic classes, or such a scheme is present implicitly;
3) the rubric (a class of synonymous words in non-technical thesauri and a descriptor article in specialized thesauri) serves as interpretation, or as context, in both dictionaries;
4) there are cross-references between entries in both dictionaries.
The lexical semantics of verbs determines how an ideographical dictionary of verbs differs from an analogous dictionary of nouns in the organization of its external structure (macrostructure). Verbs have been categorized primarily on a semantic basis, using component analysis and stepwise identification of verbal meanings.
Computer lexicography (CL) – level 0
  Computer ideography (CI) – level 1
    Relations between CT units – level 2
    Computer thesaurus (CT) – level 2
    CT units – level 2
    CT compilation – level 2
      CT database – level 3
      Linguistic processor – level 3
        Linguistic algorithm – level 4
          Algorithm flowchart – level 5
      CT macrostructure – level 3
      CT compilation methodology – level 3
        Deductive method – level 4
        Inductive method – level 4
        Method of component analysis – level 4
        Method of stepwise identification – level 4
      CT microstructure – level 3
Synoptic scheme of the CT
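The scheme above is a pre-order listing in which every term carries its level number, so the hierarchy can be rebuilt from the listing alone. The following is a minimal sketch of that idea, not the thesaurus's actual implementation: the `Node` class and the `build_tree` and `depth` functions are hypothetical names, and the labels are English glosses of a few of the scheme's terms.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    level: int
    children: list["Node"] = field(default_factory=list)


def build_tree(entries):
    """Build a tree from (label, level) pairs given in pre-order."""
    root = None
    stack = []  # path from the root to the most recently added node
    for label, level in entries:
        node = Node(label, level)
        # Pop everything that cannot be an ancestor of the new node.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        elif root is None:
            root = node
        else:
            raise ValueError("the listing has more than one root")
        stack.append(node)
    return root


def depth(node):
    """Number of hierarchy levels in the subtree rooted at node."""
    return 1 + max((depth(child) for child in node.children), default=0)


# A few terms from the scheme (illustrative subset, English glosses).
SCHEME = [
    ("Computer lexicography (CL)", 0),
    ("Computer ideography (CI)", 1),
    ("CT compilation", 2),
    ("Linguistic processor", 3),
    ("Linguistic algorithm", 4),
    ("Algorithm flowchart", 5),
    ("CT compilation methodology", 3),
    ("Inductive method", 4),
]

tree = build_tree(SCHEME)
print(depth(tree))  # 6, i.e. levels 0 to 5
```

Levels 0 to 5 give six levels in total, which agrees with the maximum depth stated for the macrostructure.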
CT fragment (online version)
Activity – level 0
  Speech activity – level 1
    Expression of a thought / feeling (висловлювати, 'to express') – level 2
    Exchange of thoughts (розмовляти, 'to converse') – level 2
    Features of pronunciation (вимовляти, 'to pronounce') – level 2
      * a lot, emptily, about trivial matters – level 3
      * in a bass voice – level 3
      * inserting one's own words into someone else's speech – level 3
      * rudely – level 3
      * for another person to write down – level 3
      * for a long time, carried away by the conversation – level 3
      * wittily – level 3
      * very loudly – level 3
        * * with a negative outcome – level 4
        * * with a positive outcome – level 4
        * * once – level 4
        * * time after time – level 4
        * * constantly – level 4
        […]
      * clearly – level 3
      […]
    Communication of information (повідомляти, 'to inform') – level 2
    […]
  Ability, capability, skill – level 1
  […]
Synoptic scheme of speech verbs in the CT
7. CT microstructure (CT of CI versus CT of UV)
Title – term / verb
Definition – genus-species (for a term) or close to encyclopaedic (for a concept) / interpretation
Relations – genus-species and synonymic / + manner-of-action relations and relations between the verb and other parts of speech
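The three fields of an entry can be pictured as a small record type. The sketch below is only an illustration under stated assumptions: the `Entry` class, its field names, and the relation labels are hypothetical, the definition strings are placeholders, and the example relation is read off the synoptic scheme rather than taken from a real entry.

```python
from dataclasses import dataclass, field

# Relation kinds named for both thesauri, and the extra ones named for verbs.
BASE_RELATIONS = {"genus-species", "synonymic"}
VERB_RELATIONS = BASE_RELATIONS | {"manner of action", "other parts of speech"}


@dataclass
class Entry:
    title: str          # title term (CT of CI) or verb (CT of UV)
    definition: str     # definition of a term, or interpretation of a verb
    is_verb: bool = False
    relations: dict[str, list[str]] = field(default_factory=dict)

    def allowed_relations(self):
        return VERB_RELATIONS if self.is_verb else BASE_RELATIONS

    def add_relation(self, kind, target):
        """Record a relation, rejecting kinds not used for this entry type."""
        if kind not in self.allowed_relations():
            raise ValueError(f"{kind!r} is not used for this entry type")
        self.relations.setdefault(kind, []).append(target)


# Placeholder content: the definition texts are not taken from the thesauri.
term = Entry("Inductive method", "<definition>")
term.add_relation("genus-species", "CT compilation methodology")

verb = Entry("вимовляти", "<interpretation>", is_verb=True)
verb.add_relation("manner of action", "<verb of manner of action>")
```

Keeping one record type with a flag, rather than two classes, mirrors the slide's point that the verb entry extends the term entry with additional relation kinds.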
8. Semantic relations in CT
Hierarchical interverbal relations, or hyponymy, derivational hyponymy in particular, represented by hyperonyms, hyponyms, and verbs of manner of action (VMA).
Same-level interverbal relations, i.e., synonymy (represented by complete (absolute) synonyms, in particular, by phonetic variants of verbs, stylistic and derivational synonyms) as well as antonymy (represented by antonyms).
Relations between the verb and other parts of speech, based on verbal derivation within parts of speech and the valency potential of the verb.
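The three groups of relations can be sketched as typed links between entries. This is a hedged illustration, not the CT's data model: the `Rel` enum, the `Relations` class, and the automatic storing of inverse links are assumptions made for the example, and `verb_a`, `verb_b`, `verb_c` are placeholder names rather than real entries.

```python
from collections import defaultdict
from enum import Enum


class Rel(Enum):
    HYPERONYM = "hyperonym"
    HYPONYM = "hyponym"
    VMA = "verb of manner of action"
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    CROSS_POS = "other part of speech"


# Grouping of relation types as listed on the slide.
GROUP = {
    Rel.HYPERONYM: "hierarchical",
    Rel.HYPONYM: "hierarchical",
    Rel.VMA: "hierarchical",
    Rel.SYNONYM: "same-level",
    Rel.ANTONYM: "same-level",
    Rel.CROSS_POS: "verb and other parts of speech",
}

# Assumption: these relations are stored in both directions.
INVERSE = {
    Rel.HYPERONYM: Rel.HYPONYM,
    Rel.HYPONYM: Rel.HYPERONYM,
    Rel.SYNONYM: Rel.SYNONYM,
    Rel.ANTONYM: Rel.ANTONYM,
}


class Relations:
    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, rel, target):
        """Record that target is a `rel` of source, plus the inverse link."""
        self._edges[source].add((rel, target))
        inverse = INVERSE.get(rel)
        if inverse is not None:
            self._edges[target].add((inverse, source))

    def by_group(self, word, group):
        """All words linked to `word` by a relation from the given group."""
        return sorted(t for r, t in self._edges.get(word, ()) if GROUP[r] == group)


rels = Relations()
rels.add("verb_a", Rel.HYPERONYM, "verb_b")  # verb_b is the hyperonym of verb_a
rels.add("verb_a", Rel.SYNONYM, "verb_c")

print(rels.by_group("verb_b", "hierarchical"))  # ['verb_a']
print(rels.by_group("verb_c", "same-level"))    # ['verb_a']
```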
Example of a CT entry
Example of a CT entry
9. Application of CT
As an inquiry system
For teaching purposes
10. Audience
The Specialised Thesaurus of Computer Ideography is intended for:
Specialists in philology
Students of philology
The Computer Thesaurus of Ukrainian Verbs has a wider audience: thanks to its specification, it can be used as a multi-level information system and as a base for further linguistic research.
11. How to use CT
Computer program
Paper project
Computer version – on the linguistic portal MOVA.info, in the dictionary section
Thank you!
Contact information:
Olena Siruk
Laboratory for Computational Linguistics
National Taras Schevchenko University of Kyiv
[email protected]