Indo WordNet A WordNet for Hindi
Centre for Technology Development for Indian Languages
Computer Science and Engineering Department, IIT Bombay
Debasri Chakrabarti, Dipak Kumar Narayan,
Prabhakar Pandey, Madhu Prasad Sharma
Introduction
WordNet – A lexical database
Searching the dictionary conceptually
Different organizing principle for each syntactic category
Synsets, or synonymy sets, are the basic building blocks
A lexical knowledge base is the heart of any intelligent information processing system
WordNet for Hindi
Hindi WordNet is an online lexical database for the Hindi language
Its design has been inspired by the famous English WordNet
Unique features
  Graded antonyms and meronymy relationships
  Efficient underlying database design
  Cross part-of-speech linkage
Semantic relations in WordNet
Synonymy
Hypernymy / Hyponymy
Antonymy
Meronymy / Holonymy
Gradation
Entailment
Troponymy
Semantic Relations
Synonymy
  True synonyms are rare
  Synonymy is related to a context
{घर, कमरा} (house, room)
{घर, आवास} (house, dwelling)
{घर, जन्मकुंडलीय स्थान} (house, house in a horoscope)
{घर, स्वदेश} (house, homeland)
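The point that one word form belongs to several synsets, one per context, can be sketched in Python as follows. This is only an illustration: the four synsets are the ones listed on the slide (written in Devanagari), while the integer ids and the function names `build_index` and `senses` are assumptions made for the example, not part of the Hindi WordNet software.

```python
from collections import defaultdict

# Synsets copied from the slide; the integer ids are hypothetical.
SYNSETS = {
    1: ["घर", "कमरा"],
    2: ["घर", "आवास"],
    3: ["घर", "जन्मकुंडलीय स्थान"],
    4: ["घर", "स्वदेश"],
}


def build_index(synsets):
    """Map each word form to the ids of every synset that contains it."""
    index = defaultdict(list)
    for synset_id, words in synsets.items():
        for word in words:
            index[word].append(synset_id)
    return dict(index)


INDEX = build_index(SYNSETS)


def senses(word):
    """Return the synsets a word form belongs to, one per sense."""
    return [SYNSETS[synset_id] for synset_id in INDEX.get(word, [])]
```

Looking up घर returns all four synsets, while कमरा returns only the first, which is the sense in which synonymy depends on context.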
Hypernymy and Hyponymy
  Relation between word meanings (synsets)
  X is a hyponym of Y if X is a kind of Y
  Hyponymy is transitive and asymmetrical
  Hypernymy is the inverse of hyponymy
lion → animal → living entity → entity
शेर → पशु → सजीव → अस्तित्व
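A minimal sketch of how transitivity and asymmetry play out on the lion example, assuming each synset has a single direct hypernym. The dictionary `HYPERNYM` and the helper names are hypothetical and chosen only for this illustration.

```python
# Direct hypernym of each synset, taken from the lion example.
# A single-parent chain is an assumption made to keep the sketch short.
HYPERNYM = {
    "lion": "animal",
    "animal": "living entity",
    "living entity": "entity",
}


def hypernym_chain(word):
    """Follow direct hypernym links upward until the root is reached."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """X is a hyponym of Y if Y appears anywhere above X (transitivity)."""
    return y in hypernym_chain(x)
```

Here `is_hyponym_of("lion", "entity")` holds through two intermediate links, while the reverse query fails, which reflects the asymmetry of the relation.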
Antonymy
  Oppositeness in meaning
  Relation between word forms
Meronymy and Holonymy
  Part-whole relation: a branch is a part of a tree
  X is a meronym of Y if X is a part of Y
  Meronymy is transitive and asymmetrical
  Holonymy is the inverse relation of meronymy
Troponym and Entailment
Entailment
  (खर्राटा लेना – सोना) (to snore – to sleep)
Troponym
  (लँगड़ाना, कदमताल करना – चलना) (to limp, to march – to walk)
  (फुसफुसाना – बोलना) (to whisper – to speak)
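Antonymy Relation
  Size: छोटा – बड़ा (small – big)
  Quality: अच्छा – बुरा (good – bad)
  State: रात – दिन (night – day)
  Personality: राम – रावण (Rama – Ravana)
  Direction: पूर्व – पश्चिम (east – west)
  Action: लेना – देना (to take – to give)
  Amount: कम – ज्यादा (less – more)
  Place: दूर – पास (far – near)
  Time: सुबह – शाम (morning – evening)
  Gender: बेटा – बेटी (son – daughter)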
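Meronymy Relation
  Component-object: माथा – शरीर (forehead – body)
  Stuff-object: पत्थर – मूर्ति (stone – statue)
  Member-collection: पेड़ – जंगल (tree – forest)
  Feature-Activity: भाषण – समारोह (speech – ceremony)
  Place-Area: दिल्ली – भारत (Delhi – India)
  Phase-State: जवानी – उम्र (youth – age)
  Resource-process: कलम – लेखन (pen – writing)
  Position-Area: चिकित्सक – चिकित्सा (physician – medicine)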
Gradation
  State: बचपन, जवानी, बुढ़ापा (childhood, youth, old age)
  Size: बड़ा, मँझला, छोटा (big, medium, small)
  Light: उजाला, धुँधला, अँधेरा (bright, dim, dark)
  Gender: मर्द, नपुंसक, औरत (man, neuter, woman)
  Temperature: गरम, गुनगुना, ठंडा (hot, lukewarm, cold)
  Color: गोरा, साँवला, काला (fair, dusky, black)
  Time: दिन, गोधूलि, रात (day, twilight, night)
  Quality: अच्छा, सामान्य, खराब (good, ordinary, bad)
  Action: सोना, ऊँघना, जागना (to sleep, to doze, to be awake)
  Manner: तेजी से, मध्यम गति से, धीरे-धीरे (fast, at moderate speed, slowly)
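One way to picture gradation is as an ordered scale whose two ends are the opposed terms and whose middle entry is the intermediate grade. The sketch below assumes exactly three grades per category, as in the examples listed for this slide; the names `GRADATION`, `poles` and `intermediate` are hypothetical.

```python
# Three-step scales taken from the gradation examples (Devanagari forms).
# Storing each scale as an ordered tuple is an assumption of this sketch.
GRADATION = {
    "Temperature": ("गरम", "गुनगुना", "ठंडा"),
    "Time": ("दिन", "गोधूलि", "रात"),
    "Quality": ("अच्छा", "सामान्य", "खराब"),
}


def poles(category):
    """Return the two end points of a scale, i.e. the opposed terms."""
    first, _middle, last = GRADATION[category]
    return first, last


def intermediate(category):
    """Return the grade lying between the two poles."""
    return GRADATION[category][1]
```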
Classification of verbs
Simple verbs (सरल क्रिया): सोना, खाना (to sleep, to eat)
Conjunct verbs (संयुक्त क्रिया)
Compound verbs (सामासिक क्रिया): खाना-पीना (to eat and drink)
Causative verbs (प्रेरणात्मक क्रिया): सुलवाना (to have someone put to sleep)
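WordNet Sub-Graph
[Figure: sub-graph centred on the synset {घर, गृह} (house). Its gloss reads: मनुष्यों का छाया हुआ वह स्थान जो दीवारों से घेर कर बनाया जाता है (a roofed place for people, built by enclosing it with walls). Edges are labelled Gloss, Hypernymy, Hyponymy and Meronymy. Neighbouring nodes: संरचना (structure); आवास, निवास (dwelling, residence); अतिथि गृह (guest house); आश्रम (hermitage); झोपड़ी (hut); अध्ययन कक्ष (study room); शयन कक्ष (bedroom); रसोईघर (kitchen); बरामदा (veranda); आँगन (courtyard).]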
Design and Implementation
Basic relations, or lexical links, hold between synonym sets
The lexical database is stored in MySQL
Sub-tasks identified
  Database design
  Data entry interface
  Implementation of the Organizer Utility
  Application programs to access and display the information in the lexical database
Data Entry Interface
GUI designed in Java/JFC
Separate screen for data entry of each category
Automatic generation of synset IDs
Screen to view the entered data
Synset Entry Interface
Organizer Utility
Designed to preprocess the data
Reflexive pointers are generated, e.g. if A is a hypernym of B, then "B is a hyponym of A" is generated automatically
Each semantic relation is mapped to a separate table (normalized)
Font conversion between Roman and Hindi (DV-TTYogesh)
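The reflexive-pointer step can be sketched as below. This is not the authors' Organizer code, which the slides describe as a Perl utility; it is a small Python illustration under the assumption that relations are held as (source, relation, target) triples, where the triple ("A", "hypernym", "B") reads "A is a hypernym of B". The `INVERSE` table and the function name are hypothetical.

```python
# Relation names paired with their inverses. Only the pairs named in the
# slides are listed; any other relation is left without a reflexive pointer.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


def add_reflexive_pointers(relations):
    """Return the input triples plus the inverse of each invertible triple.

    Each triple (source, relation, target) reads
    "source is a <relation> of target".
    """
    completed = set(relations)
    for source, relation, target in relations:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            completed.add((target, inverse, source))
    return completed
```

Using a set means that a pointer entered by hand and the same pointer generated automatically collapse into a single entry.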
Storage Structure
Relation between synsets: table tblNounHypernyms
  Columns: Synset_Id, HyperSynset_Id
Relation between word forms: table tblNounAntonyms
  Columns: Synset_Id, Synset_Word, Anto_Id, Anto_Word, Anto_Type
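The two tables can be tried out with the sketch below. The table and column names come from the slide; everything else is an assumption: the column types, the primary key, the sample ids, and the use of Python's built-in `sqlite3` module as a stand-in for the MySQL server the project actually uses.

```python
import sqlite3

# In-memory SQLite database standing in for the project's MySQL back end.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblNounHypernyms (
        Synset_Id      INTEGER NOT NULL,
        HyperSynset_Id INTEGER NOT NULL,
        PRIMARY KEY (Synset_Id, HyperSynset_Id)
    );
    CREATE TABLE tblNounAntonyms (
        Synset_Id   INTEGER NOT NULL,
        Synset_Word TEXT    NOT NULL,
        Anto_Id     INTEGER NOT NULL,
        Anto_Word   TEXT    NOT NULL,
        Anto_Type   TEXT    NOT NULL
    );
    """
)

# Hypothetical sample rows: ids are invented for the example.
conn.executemany(
    "INSERT INTO tblNounHypernyms VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4)],
)
conn.execute(
    "INSERT INTO tblNounAntonyms VALUES (?, ?, ?, ?, ?)",
    (10, "बेटा", 11, "बेटी", "Gender"),
)


def direct_hypernyms(synset_id):
    """Synset-level relation: look up hypernym synsets by synset id."""
    rows = conn.execute(
        "SELECT HyperSynset_Id FROM tblNounHypernyms WHERE Synset_Id = ?",
        (synset_id,),
    )
    return [row[0] for row in rows]


def antonyms(word):
    """Word-form relation: look up antonyms by the word itself."""
    rows = conn.execute(
        "SELECT Anto_Word, Anto_Type FROM tblNounAntonyms WHERE Synset_Word = ?",
        (word,),
    )
    return rows.fetchall()
```

The difference in keys mirrors the slide: hypernymy is stored between synset ids only, while the antonym table also carries the word forms because antonymy holds between words rather than whole synsets.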
System Statistics
Over 8500 synsets entered in the database
MySQL used as the back-end database server
Data entry interface designed in Java/JFC
Organizer utility written in Perl
Web-based data retrieval system developed in HTML and PHP
DV-TTYogesh font used to display Hindi text
Application of WordNet
Word sense disambiguation
Interface to Internet search engines
Text classification
Information retrieval systems
Document similarity
Conclusion
The structure of the Hindi language has been studied, and new features have been introduced in the Hindi WordNet
Currently over 8500 synsets have been inserted into the database
The MySQL database has been found to be quite efficient
The web interface for querying the lexical database is under continuous evolution