Indo WordNet: A WordNet for Hindi

Centre for Technology Development for Indian Languages

Computer Science and Engineering Department, IIT Bombay

Debasri Chakrabarti, Dipak Kumar Narayan, Prabhakar Pandey, Madhu Prasad Sharma

Introduction

• WordNet: a lexical database
• Supports searching the dictionary conceptually
• Uses a different organizing principle for each syntactic category
• Synsets (synonymy sets) are the basic building blocks
• A lexical knowledge base is at the heart of any intelligent information processing system

WordNet for Hindi

• Hindi WordNet is an online lexical database for the Hindi language
• Its design is inspired by the well-known English WordNet
• Unique features:
  ◦ Graded antonyms and meronymy relationships
  ◦ Efficient underlying database design
  ◦ Cross part-of-speech linkage

Semantic relations in WordNet

• Synonymy
• Hypernymy / Hyponymy
• Antonymy
• Meronymy / Holonymy
• Gradation
• Entailment
• Troponymy

Semantic Relations

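Synonymy
• True synonyms are rare
• Synonymy is relative to a context: the word घर (house) forms a different synset in each of its senses:
  ◦ {घर, कमरा} (room)
  ◦ {घर, आवास} (residence)
  ◦ {घर, जन्मकुंडलीय स्थान} (house in a horoscope)
  ◦ {घर, स्वदेश} (homeland)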
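The synonymy examples above amount to one word form belonging to several synsets, one per sense. A minimal Python sketch of that index follows; the integer synset IDs and the names `SYNSETS`, `build_word_index` and `synonyms_in_sense` are hypothetical and not part of the Hindi WordNet itself.

```python
from collections import defaultdict

# Hypothetical synset IDs. Each synset is one sense, stored as a set of word forms.
SYNSETS = {
    1: frozenset({"घर", "कमरा"}),              # house as "room"
    2: frozenset({"घर", "आवास"}),              # house as "residence"
    3: frozenset({"घर", "जन्मकुंडलीय स्थान"}),  # house as a position in a horoscope
    4: frozenset({"घर", "स्वदेश"}),            # house as "homeland"
}


def build_word_index(synsets):
    """Map every word form to the sorted list of synset IDs (senses) it belongs to."""
    index = defaultdict(list)
    for synset_id in sorted(synsets):
        for word in synsets[synset_id]:
            index[word].append(synset_id)
    return dict(index)


def synonyms_in_sense(word, synset_id):
    """Synonyms of `word` in one particular sense: the other members of that synset."""
    return set(SYNSETS[synset_id] - {word})


INDEX = build_word_index(SYNSETS)
print(INDEX["घर"])                  # [1, 2, 3, 4]: one word form, four senses
print(synonyms_in_sense("घर", 2))   # {'आवास'}
```

Because synonyms are looked up per synset rather than per word, "synonym of घर" is only meaningful once a sense has been chosen, which is the point the slide makes.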

Semantic Relations

Hypernymy and Hyponymy
• A relation between word meanings (synsets)
• X is a hyponym of Y if X is a kind of Y
• Hyponymy is transitive and asymmetric
• Hypernymy is the inverse of hyponymy

Example chain: lion → animal → living entity → entity

In Hindi: शेर → पशु → सजीव → अस्तित्व
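Transitivity and asymmetry can be made concrete by walking hypernym links upward. The sketch below assumes, for simplicity, that every synset has a single hypernym and that the links are acyclic; the table `HYPERNYM` and the helper names are hypothetical, and the English glosses stand in for synsets.

```python
# Hypothetical hypernym links (hyponym -> hypernym), glossed in English.
# Assumes each synset has a single hypernym and the links contain no cycles.
HYPERNYM = {
    "lion": "animal",
    "animal": "living entity",
    "living entity": "entity",
}


def hypernym_chain(synset):
    """Walk upward from `synset`, returning every ancestor from nearest to root."""
    chain = []
    while synset in HYPERNYM:
        synset = HYPERNYM[synset]
        chain.append(synset)
    return chain


def is_hyponym_of(x, y):
    """Transitive test: X is a hyponym of Y if Y appears anywhere above X."""
    return y in hypernym_chain(x)


def hyponyms(y):
    """Inverse relation: direct hyponyms of Y, derived from the same table."""
    return sorted(x for x, parent in HYPERNYM.items() if parent == y)


print(hypernym_chain("lion"))           # ['animal', 'living entity', 'entity']
print(is_hyponym_of("lion", "entity"))  # True (transitivity)
print(is_hyponym_of("entity", "lion"))  # False (asymmetry)
```

Only one direction is stored here; `hyponyms` derives the inverse on demand, which mirrors the statement that hypernymy is the inverse of hyponymy.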

Semantic Relations

Antonymy
• Oppositeness in meaning
• A relation between word forms

Meronymy and Holonymy
• Part-whole relation: a branch is a part of a tree
• X is a meronym of Y if X is a part of Y
• Meronymy is transitive and asymmetric
• Holonymy is the inverse relation of meronymy
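Unlike the single-parent hypernym chain, a part can belong to several wholes, so transitive meronymy is naturally computed as a graph search. A minimal sketch, assuming a hypothetical `PART_OF` table: "branch → tree" is the slide's example, while "leaf → branch" is added only to show transitivity.

```python
from collections import deque

# Hypothetical part-of links: each part maps to the wholes it belongs to.
# "branch -> tree" is the example above; "leaf -> branch" is added for illustration.
PART_OF = {
    "leaf": {"branch"},
    "branch": {"tree"},
}


def holonyms(part):
    """All wholes that contain `part`, directly or transitively (breadth-first)."""
    seen = set()
    queue = deque(PART_OF.get(part, ()))
    while queue:
        whole = queue.popleft()
        if whole not in seen:
            seen.add(whole)
            queue.extend(PART_OF.get(whole, ()))
    return seen


def meronyms(whole):
    """Inverse relation: every part contained in `whole`, directly or transitively."""
    return {part for part in PART_OF if whole in holonyms(part)}


print(sorted(holonyms("leaf")))  # ['branch', 'tree']
print(sorted(meronyms("tree")))  # ['branch', 'leaf']
```

The `seen` set keeps the search from revisiting a whole reachable by two paths.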

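Troponymy and Entailment

Entailment: खर्राटा लेना (to snore) entails सोना (to sleep)

Troponymy: लँगड़ाना (to limp) and कदमताल करना (to march) are troponyms of चलना (to walk), i.e. particular manners of walking; फुसफुसाना (to whisper) is a troponym of बोलना (to speak)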
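The verb examples above can be stored as two link tables and queried together. This sketch follows the usual treatment in the English WordNet literature, where troponymy is described as a special case of entailment (limping entails walking); treating it that way here is an assumption, and the table and function names are hypothetical.

```python
# Hypothetical verb relations, using the examples above.
ENTAILS = {"खर्राटा लेना": {"सोना"}}      # snoring entails sleeping
TROPONYM_OF = {
    "लँगड़ाना": {"चलना"},      # to limp is a manner of walking
    "कदमताल करना": {"चलना"},   # to march is a manner of walking
    "फुसफुसाना": {"बोलना"},    # to whisper is a manner of speaking
}


def entailed_verbs(verb):
    """Verbs that `verb` entails, following entailment and troponym links
    transitively (assumes a troponym entails its more general verb)."""
    result = set()
    stack = [verb]
    while stack:
        current = stack.pop()
        for nxt in ENTAILS.get(current, set()) | TROPONYM_OF.get(current, set()):
            if nxt not in result:
                result.add(nxt)
                stack.append(nxt)
    return result


def troponyms_of(verb):
    """Inverse lookup: the more specific manners of doing `verb`."""
    return {v for v, general in TROPONYM_OF.items() if verb in general}
```

For example, `entailed_verbs("फुसफुसाना")` returns `{"बोलना"}`, and `troponyms_of("चलना")` returns the two manners of walking.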

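Antonymy Relation
• Size: छोटा (small) vs. बड़ा (big)
• Quality: अच्छा (good) vs. बुरा (bad)
• State: रात (night) vs. दिन (day)
• Personality: राम (Rama) vs. रावण (Ravana)
• Direction: पूर्व (east) vs. पश्चिम (west)
• Action: लेना (take) vs. देना (give)
• Amount: कम (less) vs. ज्यादा (more)
• Place: दूर (far) vs. पास (near)
• Time: सुबह (morning) vs. शाम (evening)
• Gender: बेटा (son) vs. बेटी (daughter)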
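Since antonymy is a relation between word forms and is symmetric, a lookup table can be keyed by word and filled in both directions, carrying the category label with each pair. A minimal sketch using a few pairs from the slide; `ANTONYM_PAIRS` and `build_antonym_index` are hypothetical names.

```python
# Hypothetical antonym table: (word, antonym, category), a few pairs from the slide.
ANTONYM_PAIRS = [
    ("छोटा", "बड़ा", "Size"),      # small / big
    ("अच्छा", "बुरा", "Quality"),  # good / bad
    ("रात", "दिन", "State"),       # night / day
    ("लेना", "देना", "Action"),    # take / give
    ("बेटा", "बेटी", "Gender"),    # son / daughter
]


def build_antonym_index(pairs):
    """Antonymy is symmetric, so store each pair in both directions, keyed by word form."""
    index = {}
    for word, anto, category in pairs:
        index.setdefault(word, []).append((anto, category))
        index.setdefault(anto, []).append((word, category))
    return index


ANTONYMS = build_antonym_index(ANTONYM_PAIRS)
print(ANTONYMS["दिन"])  # [('रात', 'State')]
```

Storing both directions doubles the rows but makes every lookup a single dictionary access.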

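Meronymy Relation (meronym → holonym)
• Component-object: माथा (forehead) → शरीर (body)
• Stuff-object: पत्थर (stone) → मूर्ति (statue)
• Member-collection: पेड़ (tree) → जंगल (forest)
• Feature-activity: भाषण (speech) → समारोह (ceremony)
• Place-area: दिल्ली (Delhi) → भारत (India)
• Phase-state: जवानी (youth) → उम्र (age)
• Resource-process: कलम (pen) → लेखन (writing)
• Position-area: चिकित्सक (physician) → चिकित्सा (medicine)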

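Gradation
• State: बचपन (childhood) → जवानी (youth) → बुढ़ापा (old age)
• Size: बड़ा (big) → मँझला (medium) → छोटा (small)
• Light: उजाला (bright) → धुँधला (dim) → अँधेरा (dark)
• Gender: मर्द (man) → नपुंसक (neuter) → औरत (woman)
• Temperature: गरम (hot) → गुनगुना (lukewarm) → ठंडा (cold)
• Color: गोरा (fair) → साँवला (dusky) → काला (dark)
• Time: दिन (day) → गोधूलि (dusk) → रात (night)
• Quality: अच्छा (good) → सामान्य (average) → खराब (bad)
• Action: सोना (sleep) → ऊँघना (doze) → जागना (wake)
• Manner: तेज़ी से (quickly) → मध्यम गति से (at moderate speed) → धीरे-धीरे (slowly)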
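Gradation orders several terms along one scale, so the two poles behave like an antonym pair and the remaining terms sit between them. A minimal sketch, assuming each scale is stored as an ordered list; `SCALES` and the helper functions are hypothetical and cover only three of the scales listed.

```python
# Hypothetical encoding of three gradation scales, each ordered from one pole to the other.
SCALES = {
    "Temperature": ["गरम", "गुनगुना", "ठंडा"],  # hot, lukewarm, cold
    "Light": ["उजाला", "धुँधला", "अँधेरा"],     # bright, dim, dark
    "State": ["बचपन", "जवानी", "बुढ़ापा"],       # childhood, youth, old age
}


def find_scale(word):
    """Return (scale name, position) for a word, or None if it is on no scale."""
    for name, terms in SCALES.items():
        if word in terms:
            return name, terms.index(word)
    return None


def intermediate_terms(a, b):
    """Terms strictly between two words on the same scale."""
    sa, sb = find_scale(a), find_scale(b)
    if sa is None or sb is None or sa[0] != sb[0]:
        raise ValueError("words are not on the same gradation scale")
    lo, hi = sorted((sa[1], sb[1]))
    return SCALES[sa[0]][lo + 1:hi]


def poles(scale_name):
    """The two end points of a scale, which behave like a graded antonym pair."""
    terms = SCALES[scale_name]
    return terms[0], terms[-1]
```

For instance, `intermediate_terms("गरम", "ठंडा")` returns `["गुनगुना"]` regardless of argument order, because the positions are sorted before slicing.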

Classification of verbs

• Simple verbs (सरल क्रिया): सोना (sleep), खाना (eat)
• Conjunct verbs (संयुक्त क्रिया)
• Compound verbs (समासिक क्रिया): खाना-पीना (eat and drink)
• Causative verbs (प्रेरणात्मक क्रिया): सुलवाना (to have someone put to sleep)

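[Figure: WordNet Sub-Graph for the synset {घर, गृह} (house), with the gloss "मनुष्यों का छाया हुआ वह स्थान जो दीवारों से घेर कर बनाया जाता है" (a roofed place for people, made by enclosing it with walls), linked through Hypernymy, Hyponymy and Meronymy edges to synsets including संरचना (structure), आवास, निवास (residence, dwelling), अतिथि गृह (guest house), आश्रम (hermitage), झोपड़ी (hut), अध्ययन कक्ष (study room), शयन कक्ष (bedroom), रसोईघर (kitchen), बरामदा (veranda) and आँगन (courtyard).]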

Design and Implementation

• Basic relations, or lexical links, hold between synonym sets
• The lexical database is stored in MySQL
• Sub-tasks identified:
  ◦ Database design
  ◦ Data entry interface
  ◦ Implementation of the Organizer Utility
  ◦ Application programs to access and display the information in the lexical database

Data Entry Interface

• GUI designed in Java/JFC
• A separate data entry screen for each category
• Automatic generation of synset IDs
• A screen to view the entered data

Synset Entry Interface

Organizer Utility

• Designed to preprocess the data
• Generates reciprocal (inverse) pointers automatically: e.g. if A is a hypernym of B, the pointer "B is a hyponym of A" is generated
• Maps each semantic relation to a separate (normalized) table
• Performs font conversion between Roman-script Hindi and DV-TTYogesh
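The two preprocessing steps above, generating inverse pointers and splitting relations into one table each, can be sketched as follows. The project's Organizer Utility was written in Perl; this Python version only illustrates the idea, and the relation names, the `INVERSE` map and the triple convention are hypothetical.

```python
# Hypothetical inverse map: each relation and the relation generated in the opposite direction.
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


def add_inverse_pointers(pointers):
    """Given (source, relation, target) triples, read as "target is the <relation> of
    source", return the full set including automatically generated inverse triples."""
    complete = set(pointers)
    for source, relation, target in pointers:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            complete.add((target, inverse, source))
    return complete


def split_by_relation(pointers):
    """Normalize: one sorted list of (source, target) rows per relation, like one table each."""
    tables = {}
    for source, relation, target in sorted(pointers):
        tables.setdefault(relation, []).append((source, target))
    return tables


entered = {("lion", "hypernym", "animal"), ("branch", "holonym", "tree")}
tables = split_by_relation(add_inverse_pointers(entered))
print(tables["hyponym"])  # [('animal', 'lion')]
print(tables["meronym"])  # [('tree', 'branch')]
```

Lexicographers enter each link once; the inverse rows are derived, so the two directions cannot drift out of sync.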

Storage Structure

• Relation between synsets: table tblNounHypernyms, columns Synset_Id, HyperSynset_Id
• Relation between word forms: table tblNounAntonyms, columns Synset_Id, Synset_Word, Anto_Id, Anto_Word, Anto_Type
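The two storage layouts can be tried out with Python's built-in `sqlite3` module, used here only as a stand-in for the MySQL back end. Table and column names follow the slide; the column types, the composite primary key, the reading of Anto_Type as a category label, and the sample rows are all assumptions.

```python
import sqlite3

# sqlite3 stands in for MySQL; names follow the slide, types and sample rows are assumed.
SCHEMA = """
CREATE TABLE tblNounHypernyms (
    Synset_Id      INTEGER NOT NULL,
    HyperSynset_Id INTEGER NOT NULL,
    PRIMARY KEY (Synset_Id, HyperSynset_Id)
);
CREATE TABLE tblNounAntonyms (
    Synset_Id   INTEGER NOT NULL,
    Synset_Word TEXT    NOT NULL,
    Anto_Id     INTEGER NOT NULL,
    Anto_Word   TEXT    NOT NULL,
    Anto_Type   TEXT
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Synset-level link: hypothetical synset 10 ("lion") has hypernym synset 20 ("animal").
conn.execute("INSERT INTO tblNounHypernyms VALUES (?, ?)", (10, 20))
# Word-level link: antonymy records the specific word forms, not just the synset IDs.
conn.execute(
    "INSERT INTO tblNounAntonyms VALUES (?, ?, ?, ?, ?)",
    (30, "बेटा", 31, "बेटी", "Gender"),
)

hypernyms_of_10 = [row[0] for row in conn.execute(
    "SELECT HyperSynset_Id FROM tblNounHypernyms WHERE Synset_Id = ?", (10,))]
antonym = conn.execute(
    "SELECT Anto_Word, Anto_Type FROM tblNounAntonyms WHERE Synset_Word = ?", ("बेटा",)
).fetchone()
print(hypernyms_of_10)  # [20]
print(antonym)          # ('बेटी', 'Gender')
```

The contrast is in the columns: the hypernym table needs only synset IDs, while the antonym table must also carry the word forms, because antonymy holds between words rather than whole synsets.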

System Statistics

• Over 8,500 synsets entered in the database
• MySQL used as the back-end database server
• Data entry interface designed in Java/JFC
• Organizer utility written in Perl
• Web-based data retrieval system developed in HTML and PHP
• DV-TTYogesh font used to display Hindi text

Applications of WordNet

• Word sense disambiguation
• Interface to Internet search engines
• Text classification
• Information retrieval systems
• Document similarity
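As one illustration of the word sense disambiguation application, the simplified Lesk method picks the sense whose gloss shares the most words with the surrounding context. This is a standard textbook technique, not necessarily what the Hindi WordNet team used; the sense IDs, English glosses and stopword list are hypothetical paraphrases.

```python
# Simplified Lesk: choose the sense whose gloss overlaps most with the context.
# Sense IDs, glosses and stopwords below are hypothetical.
SENSES = {
    "घर": {
        "house.residence": "roofed place where people live enclosed by walls",
        "house.horoscope": "one of the twelve divisions of a horoscope chart",
        "house.homeland": "the country where a person was born",
    }
}

STOPWORDS = {"the", "a", "of", "where", "by", "one", "was", "in", "is", "to"}


def tokens(text):
    """Lower-cased content words of `text` (whitespace tokenization, stopwords removed)."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def simplified_lesk(word, context):
    """Return the sense of `word` with the largest gloss/context word overlap.
    Ties go to the sense listed first."""
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(tokens(gloss) & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("घर", "the astrologer read the horoscope chart"))  # house.horoscope
```

A real system would compare Hindi context words against the Hindi glosses stored in the database; English is used here only to keep the overlap easy to follow.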

Conclusion

• The structure of the Hindi language has been studied, and new features have been introduced in the Hindi WordNet
• Over 8,500 synsets have been inserted into the database so far
• The MySQL database has been found to be quite efficient
• The web interface for querying the lexical database is under continuous development