Networking the Asian WordNet on WordNet Management System (WNMS)

Virach Sornlertlamvanich

National Electronics and Computer Technology Center (NECTEC), Thailand, and Thai Computational Linguistics Laboratory (TCL), NICT, Thailand

[email protected]

The 2nd International Workshop with Mentors on Databases, Web and Information Management for Young Researchers, Aoyama Gakuin University, Tokyo, Japan

August 2-4, 2010

Motivation

Need of a computational ontology Implementation

Quick start approach Reusability

Less language resource

Online collaborative environment Social networking

Multilingual development

Cross language web service Sharing

Interoperability

Evaluation

Dissemination

Approaches

Asian WordNet Development

Translation approach

Use of the existing bilingual dictionaries

Synset assignment

KUI for collaborative editing

WNMS (WordNet Management System)

Distributed WordNet service

Service for cross language WordNet retrieval

Synset Assignment (CS=4)

Example:

L0: เป้าหมาย

E0: aim
E1: target

S0: purpose, intent, intention, aim, design
S1: aim, object, objective, target
S2: aim

Accept the Synset that includes more than one English Equivalent, with a confidence score of 4.

[Diagram: links among L0, E0, E1, S0, S1, and S2]

Synset Assignment (CS=3)

Example:

L0: จ้อง
L1: เพ่งมอง (synonym of L0)

E0: stare
E1: gaze

S0: stare
S1: gaze, stare

Accept the Synset that includes more than one English Equivalent when the equivalents of target-language synonyms are taken together, with a confidence score of 3.

[Diagram: links among L0, L1, E0, E1, S0, S1, and S2]

Synset Assignment (CS=2)

Example:

L0: สูติแพทย์

E0: obstetrician

S0: obstetrician, accoucheur

Accept the only Synset that includes the English Equivalent, with a confidence score of 2.

[Diagram: L0, E0, S0]

Synset Assignment (CS=1)

Example:

L0: ช่อง

E0: hole
E1: canal

S0: hole, hollow
S1: hole, trap, cakehole, maw, yap, gap
S2: canal, duct, epithelial duct, channel

Accept more than one Synset, each including one of the English Equivalents, with a confidence score of 1.

[Diagram: links among L0, E0, E1, S0, S1, and S2]
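The four confidence-score cases can be summarized in a short Python sketch. This is one possible reading of the rules on the slides, not the authors' implementation: the function name `assign_synsets`, its arguments, and the choice to return only the highest-scoring case are assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def assign_synsets(
    own_equivalents: Iterable[str],
    synonym_equivalents: Iterable[str],
    synsets: Dict[str, List[str]],
) -> Dict[str, int]:
    """Return {synset_id: confidence_score} for one lexical entry.

    own_equivalents: English equivalents of the entry itself (E0, E1, ...).
    synonym_equivalents: English equivalents of the entry's synonyms.
    synsets: candidate synsets, each given as a list of English words.
    """
    own = set(own_equivalents)
    pooled = own | set(synonym_equivalents)

    # CS=4: a synset contains more than one of the entry's own equivalents.
    accepted = {sid: 4 for sid, words in synsets.items()
                if len(own & set(words)) > 1}
    if accepted:
        return accepted

    # CS=3: more than one equivalent once synonyms' equivalents are pooled in.
    accepted = {sid: 3 for sid, words in synsets.items()
                if len(pooled & set(words)) > 1}
    if accepted:
        return accepted

    # Synsets that contain at least one of the entry's own equivalents.
    matching = [sid for sid, words in synsets.items() if own & set(words)]
    # CS=2: exactly one synset matches; CS=1: several synsets match.
    score = 2 if len(matching) == 1 else 1
    return {sid: score for sid in matching}


# The CS=4 example from the slides: "aim" and "target" both occur in S1.
aim_synsets = {
    "S0": ["purpose", "intent", "intention", "aim", "design"],
    "S1": ["aim", "object", "objective", "target"],
    "S2": ["aim"],
}
print(assign_synsets(["aim", "target"], [], aim_synsets))  # {'S1': 4}
```

Trying the higher-confidence cases first means a synset supported by several equivalents is never diluted by weaker single-word matches; whether the original system filters this way is not stated on the slides.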

Asian WordNet Development

[Diagram labels]

GWN, AWN

Applications: Dictionary, Ontology, CL-Search, MT, Summarization, IE/IR, ...

KUI: Lookup, Discussion, Addition, Correction, Voting, Translation

WN, merged-WN

Bilingual dictionaries: Thai-English, Indonesian-English, X-English

KUI for AWN

KUI (Knowledge Unifying Editor)

In the initial stage, KUI was developed for collaborative editing to review and complete the translation.

Advantage

Suitable for building a community

Disadvantages

Translation is for word translation rather than sense translation

Cannot show the relation between senses

System is not fully distributed

WNMS for AWN

WNMS (WordNet Management System)

Sense-based translation rather than word-based translation

Show the relation between senses

System is fully distributed, with nodes connected through a standard Open API

Collaborative editing tools based on KUI concept

WMS for BalkaNet, GWC2004

WordNet Exploitation through a Distributed Network of Servers, I. D. Koutsoubos et al.

WNMS (WordNet Management System)

Participation (Translate)

Input a word to search

Input a translated word, and select degree of confidence

Input a comment or memo, if any

Delete

Participation (Vote)

Read the comment or memo

Vote: vote up or vote down

Distributed WordNet Service

Distribute the WordNet service node

Service node can be locally maintained

Synset ID (or Synset Offset) is the key to link between nodes
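As a rough illustration of how a shared synset key can link independently maintained nodes, the sketch below models each language node as an in-memory table keyed by (POS, synset offset). The `NODES` table, the function name, and the sample words are invented for illustration and are not taken from AWN data; real nodes would be separate services.

```python
from typing import Dict, List, Tuple

# (part of speech, 8-digit synset offset) is the shared key between nodes.
SynsetKey = Tuple[str, str]

# Hypothetical node contents, one table per language node.
NODES: Dict[str, Dict[SynsetKey, List[str]]] = {
    "th": {("n", "02958343"): ["รถยนต์"]},
    "id": {("n", "02958343"): ["mobil"]},
    "ja": {},  # a node that has not translated this synset yet
}


def translations_across_nodes(
    pos: str,
    offset: str,
    nodes: Dict[str, Dict[SynsetKey, List[str]]] = NODES,
) -> Dict[str, List[str]]:
    """Collect the words each node holds for one synset key."""
    key = (pos, offset)
    return {lang: table[key] for lang, table in nodes.items() if key in table}


print(translations_across_nodes("n", "02958343"))
```

Because every node uses the same key, a node can be added, removed, or updated locally without touching the others.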

Representation of Synset Translation

Types of Services ‘sense’

Thai Sense (Get word translation by POS and SYNSET_OFFSET)

Service URI: http://th.asianwordnet.org/services/sense/output/[callback]/pos/synset_offset

Service Name: sense

Parameters: pos = PartOfSpeech {n,v,r,s}; synset_offset is an English Princeton WordNet v3.0 offset, represented in 8 digits

Example: http://th.asianwordnet.org/services/sense/xml/n/02958343
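A small Python helper can assemble a request URL for the 'sense' service from the pattern above. This is a sketch under assumptions: the function name `sense_url` is hypothetical, `xml` is the only output value shown on the slide, and treating `[callback]` as an optional path segment is a guess from the bracket notation. The code only builds the URL and does not contact the service.

```python
import re
from typing import Optional

BASE = "http://th.asianwordnet.org/services"
VALID_POS = {"n", "v", "r", "s"}  # part-of-speech codes listed on the slide


def sense_url(
    pos: str,
    synset_offset: str,
    output: str = "xml",             # only "xml" appears in the slide example
    callback: Optional[str] = None,  # assumed optional path segment
) -> str:
    """Build a 'sense' service URL from POS and an 8-digit synset offset."""
    if pos not in VALID_POS:
        raise ValueError(f"pos must be one of {sorted(VALID_POS)}")
    if not re.fullmatch(r"[0-9]{8}", synset_offset):
        raise ValueError("synset_offset must be exactly 8 digits")
    parts = [BASE, "sense", output]
    if callback:
        parts.append(callback)
    parts += [pos, synset_offset]
    return "/".join(parts)


print(sense_url("n", "02958343"))
# http://th.asianwordnet.org/services/sense/xml/n/02958343
```

Keeping the offset as a string preserves its leading zero, which an integer would drop.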

Types of Services ‘dictionary’

E-Dictionary (Get word translation by word entry)

Service URI: http://th.asianwordnet.org/services/dictionary/output/[callback]/type_of_dict/search_word

Service Name: dictionary

Parameters: type_of_dict = {en2th, th2en}; search_word is the word you want to search for

Types of Services

Auto complete (Get a list of words existing in WordNet by prefix auto completion)

Service URI: http://th.asianwordnet.org/services/autocomplete/output/[callback]/language/search_word

Service Name: autocomplete

Parameters: language = {en,th}; search_word is the prefix you want to autocomplete (results are limited to 50 records)

WN-Browser (Browse WordNet and its semantic relations)

Service URI: http://th.asianwordnet.org/services/browse/output/[callback]/language/search_word

Service Name: browse

Parameters: language = {en,th}; search_word is the word whose semantic relations you want to retrieve
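The dictionary, autocomplete, and browse services share one URL shape, so a single helper can cover all three. The sketch below is illustrative only: `word_service_url` is a hypothetical name, `xml` as the output value is carried over from the 'sense' example, the optional `[callback]` segment is a guess from the bracket notation, and percent-encoding the search word is an assumption about what the service accepts.

```python
from typing import Optional
from urllib.parse import quote

BASE = "http://th.asianwordnet.org/services"

# Allowed values for the segment before search_word, per service.
SELECTORS = {
    "dictionary": {"en2th", "th2en"},  # type_of_dict
    "autocomplete": {"en", "th"},      # language
    "browse": {"en", "th"},            # language
}


def word_service_url(
    service: str,
    selector: str,
    search_word: str,
    output: str = "xml",             # assumed; only shown for 'sense'
    callback: Optional[str] = None,  # assumed optional path segment
) -> str:
    """Build a URL for the dictionary, autocomplete, or browse service."""
    allowed = SELECTORS.get(service)
    if allowed is None:
        raise ValueError(f"unknown service: {service}")
    if selector not in allowed:
        raise ValueError(f"{service} expects one of {sorted(allowed)}")
    parts = [BASE, service, output]
    if callback:
        parts.append(callback)
    # Percent-encode the word so Thai text and spaces are URL-safe.
    parts += [selector, quote(search_word, safe="")]
    return "/".join(parts)


print(word_service_url("dictionary", "en2th", "car"))
# http://th.asianwordnet.org/services/dictionary/xml/en2th/car
```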

Visualization of AWN (http://www.asianwordnet.org/)

Asian WordNet

Visualization of Asian WordNet

Function: cross-language visualization

3 modes of visualization

Progress (number of words)

Thai 80098

Lao 72672

Japanese 66648

Korean 65483

Myanmar 26033

Indonesian 21584

Vietnamese 17767

Mongolian 2283

Bengali 1775

Sinhala 117

Collaboration TCL

ADD members

English->Japanese

Thai->English

Thai->Indonesian

Guideline in WordNet Translation

A word entry must be translated into the appropriate WORD(s), avoiding phrases and explanations of meaning.

Words in a Synset must be interchangeable.

Translational Issues

There are many cases in which a gloss needs to be expressed as a phrase or an explanation, especially for technical terms and scientific vocabulary.

Ex. Chaperon

POS: Noun

Synset: chaperon, chaperone

Gloss: one who accompanies and supervises a young woman or gatherings of young people

Thai: ผู้ตามควบคุมหญิงสาว

Such concepts are not common in the Thai language.

Translational Issues (cont.)

A gloss can be expressed by two or more Thai words. These words share the core meaning but occur in different contexts. Should the synset be divided into more specific concepts?

Ex. Appear

POS: Verb

Synset: appear, come out

Gloss: be issued or published; "Did your latest book appear yet?"; "The new Woody Allen film hasn't come out yet"

Thai: T1 = ตีพิมพ์; T2 = ออกฉาย

T1 occurs in the context of printed matter

T2 occurs in the context of film or movie

AWN: A Platform for Collaboration

http://www.asianwordnet.org

Current

AWN Partnership

Hindi

IITB (Indian Institute of Technology, Bombay), India

Pushpak Bhattacharyya <[email protected]>

Indonesian

BPPT (Badan Pengkajian dan Penerapan Teknologi), Indonesia

Hammam Riza <[email protected]>

Japanese

NICT (National Institute of Information and Communications Technology), Japan

Hitoshi Isahara <[email protected]>, Kou Kuroda <[email protected]>

Nanyang Technological University (NTU), Singapore

Francis Bond <[email protected]>

Lao

NAST (National Authority of Science and Technology), Lao PDR

Valaxay Dalaloy <[email protected]>

Mongolian

NUM (National University of Mongolia), Mongolia

Purev Jaimai <[email protected]>

AWN Partnership

Burmese

MCF (Myanmar Computer Federation), Myanmar

Myint Myint Than <[email protected]>

Nepali

MPP (Madan Puraskar Pustakalaya), Nepal

Laxmi Pd Khatiwada <[email protected]>

Sinhala

UCSC (University of Colombo School of Computing), Sri Lanka

Ruvan Weerasinghe <[email protected]>

Thai

NECTEC (National Electronics and Computer Technology Center), Thailand

TCL (Thai Computational Linguistics Laboratory), Thailand

Virach Sornlertlamvanich <[email protected]>

Vietnamese

VAST (Vietnamese Academy of Science and Technology), Vietnam

Luong Chi Mai <[email protected]>

Conclusion and Future Work

Asian WordNet Community

Language resource conversion and alignment

Language technology sharing

Collaborative development platform

AWN and language technology web service

Applications on digital heritage understanding etc.

Asian WordNet: http://www.asianwordnet.org/

Join us!
