Pronominal Anaphora Resolution in Nepali Language
Dev Bahadur Poudel (03314) and Bivod Aale Magar (03307)
Nepal Engineering College, Changunarayan, Bhaktapur
07/31/12


Undergraduate thesis (final year project)


Contents

- Brief Introduction and Background
- Approach to the Algorithm
- Implementation in Nepali Discourse
- Overview of our system
- Scope of our system
- Conclusion


What is Anaphora?

An anaphor is an expression that refers back to an entity previously introduced in the discourse; that earlier mention is its antecedent.


What is Anaphora Resolution?

Process of determining the antecedent of an anaphor.


Anaphora resolution in Nepali

राम स्कूल जान्छ । ऊ घर फर्कन्छ । (Ram goes to school. He returns home.)

Antecedent: राम (Ram); anaphor: ऊ (he)

ऊ = राम


Can a machine resolve anaphora?

Humans can easily identify the referent an anaphor points to.

Can we build a system that resolves anaphors to their antecedents?


Corpus

A collection of linguistic data, either written texts or transcriptions of recorded speech, which can be used as a starting point for linguistic description or as a means of verifying hypotheses about a language.


Unicode

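An industry standard that allows computers to represent and manipulate text consistently.

It consists of about 100,000 characters, a set of code charts for visual reference, an encoding methodology, and a set of standard character encodings.

Unlike ASCII, which uses 7 bits per character and can therefore represent only 128 characters, Unicode is not limited to 16 bits: the 65,536-character limit applied only to its earliest versions. Its code space now runs from U+0000 to U+10FFFF (1,114,112 code points), stored in encodings such as UTF-8, UTF-16 and UTF-32. Nepali is written in Devanagari, which occupies the block U+0900 to U+097F.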
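The bit-width point is easy to check in Java, the language the system was written in. The sketch below (the class and method names are hypothetical, not taken from the project) prints the code points of राम, checks that they fall in the Devanagari block, and shows that a code point above U+FFFF occupies two 16-bit Java chars.

```java
import java.util.stream.Collectors;

public class DevanagariCodePoints {
    // Format each Unicode code point of a string as U+XXXX.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // True if the code point lies in the Devanagari block (U+0900 to U+097F).
    static boolean isDevanagari(int cp) {
        return cp >= 0x0900 && cp <= 0x097F;
    }

    public static void main(String[] args) {
        String ram = "\u0930\u093E\u092E"; // "ram" in Devanagari: RA, vowel sign AA, MA
        System.out.println(codePoints(ram)); // U+0930 U+093E U+092E
        System.out.println(ram.codePoints().allMatch(DevanagariCodePoints::isDevanagari)); // true

        // A code point above U+FFFF needs two 16-bit Java chars (a UTF-16 surrogate pair).
        String grin = new String(Character.toChars(0x1F600));
        System.out.println(grin.length() + " " + grin.codePointCount(0, grin.length())); // 2 1
    }
}
```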


Approach to the Algorithm

- Non-probabilistic
  - Lappin and Leass algorithm (1994)
  - Hobbs's tree search algorithm (1978)

- Probabilistic
  - Centering algorithm
  - Mitkov's weak-knowledge algorithm


Approach to the Algorithm

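Lappin and Leass algorithm (1994): an algorithm based on salience factors assigned to nouns and pronouns. Each candidate antecedent accumulates the weights of the factors it satisfies, and a pronoun is resolved to the highest-salience candidate that passes the agreement and syntactic filters.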


Salience factors in Lappin and Leass's algorithm

Factor                                              Weight
Sentence recency                                    100
Subject emphasis                                    80
Existential emphasis                                70
Accusative (direct object) emphasis                 50
Indirect object and oblique complement emphasis     40
Non-adverbial emphasis                              50
Head noun emphasis                                  80
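The weight table maps directly onto code. A minimal sketch, assuming the factors a mention satisfies have already been identified (the enum and method names are illustrative, not the project's): a mention's initial salience is the sum of the weights of its factors.

```java
import java.util.EnumSet;
import java.util.Set;

public class SalienceWeights {
    // Lappin and Leass (1994) salience factors with the weights from the table.
    enum Factor {
        SENTENCE_RECENCY(100),
        SUBJECT(80),
        EXISTENTIAL(70),
        DIRECT_OBJECT(50),
        INDIRECT_OBJECT_OR_OBLIQUE(40),
        NON_ADVERBIAL(50),
        HEAD_NOUN(80);

        final int weight;

        Factor(int weight) {
            this.weight = weight;
        }
    }

    // Initial salience of a mention: the sum of the weights of the factors it satisfies.
    static int score(Set<Factor> factors) {
        int total = 0;
        for (Factor f : factors) {
            total += f.weight;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hari in sentence 2: recent, subject, not in an adverbial phrase, head noun.
        System.out.println(score(EnumSet.of(Factor.SENTENCE_RECENCY, Factor.SUBJECT,
                Factor.NON_ADVERBIAL, Factor.HEAD_NOUN))); // 310
    }
}
```

The same sum gives 280 for a direct-object head noun (100 + 50 + 50 + 80) and 270 for an indirect object (100 + 40 + 50 + 80), the totals used in the worked example.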


Implementation

The system can be implemented in different languages, such as Java or PHP; our system uses Java.


Block diagram of the system

Input → Tokenizer and Tagger → Salience Factor Assigner → Output
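A minimal sketch of the Tokenizer and Tagger stage, assuming that sentences end with the danda (।), tokens are separated by whitespace, and each word's tag comes from a hand-annotated dictionary with no morphological analysis. The class, enum, and method names are hypothetical, and the example input is romanized only to keep the source ASCII.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TokenizerTagger {
    // Tags in the hypothetical hand-annotated dictionary; UNKNOWN marks a word not found.
    enum Tag { NOUN, PRONOUN, OTHER, UNKNOWN }

    record Token(String word, Tag tag) {}

    // Split a paragraph into sentences on the danda (U+0964), the Nepali full stop.
    static List<String> sentences(String paragraph) {
        List<String> out = new ArrayList<>();
        for (String s : paragraph.split("\u0964")) {
            if (!s.isBlank()) out.add(s.trim());
        }
        return out;
    }

    // Split a sentence on whitespace and tag each token by dictionary lookup.
    static List<Token> tag(String sentence, Map<String, Tag> dictionary) {
        List<Token> out = new ArrayList<>();
        for (String w : sentence.trim().split("\\s+")) {
            if (!w.isEmpty()) out.add(new Token(w, dictionary.getOrDefault(w, Tag.UNKNOWN)));
        }
        return out;
    }

    public static void main(String[] args) {
        // Romanized for readability; real input would be Devanagari.
        String paragraph = "ram school janchha \u0964 u ghar pharkanchha \u0964";
        Map<String, Tag> dict = Map.of("ram", Tag.NOUN, "school", Tag.NOUN, "u", Tag.PRONOUN,
                "ghar", Tag.NOUN, "janchha", Tag.OTHER, "pharkanchha", Tag.OTHER);
        for (String s : sentences(paragraph)) {
            System.out.println(tag(s, dict));
        }
    }
}
```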


Flowchart

1. START: input a paragraph.
2. Take a sentence and tokenize it.
3. Take each token and check it in the corpus; if it is not found, log an error.
4. Classify the token as a noun or a pronoun, and as subject or object.
5. Give it a salience value and calculate the total weights.
6. Next sentence? If yes, halve the salience values and return to step 2. If no, determine the correct referents, display the results, and END.
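The salience bookkeeping in the flowchart can be sketched as follows. Assumptions, not taken from the project: each noun mention arrives with its grammatical role from the annotated corpus, every mention is a non-adverbial head noun (so the base weight is 100 + 50 + 80), pronouns are left out because their resolution is a separate step, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SalienceLoop {
    // Grammatical role of a noun mention, assumed to come from the hand-annotated corpus.
    enum Role { SUBJECT, DIRECT_OBJECT, INDIRECT_OBJECT, OTHER }

    record Mention(String word, Role role) {}

    // Lappin and Leass weights, assuming every mention is a non-adverbial head noun
    // in the current sentence: recency 100 + non-adverbial 50 + head noun 80 = 230.
    static int weight(Role role) {
        int base = 100 + 50 + 80;
        return switch (role) {
            case SUBJECT -> base + 80;
            case DIRECT_OBJECT -> base + 50;
            case INDIRECT_OBJECT -> base + 40;
            case OTHER -> base;
        };
    }

    // One pass over the paragraph: halve old values at each new sentence,
    // check each word in the corpus (logging misses), then add its weight.
    static Map<String, Double> run(List<List<Mention>> sentences, Set<String> corpus,
                                   List<String> errorLog) {
        Map<String, Double> salience = new LinkedHashMap<>();
        for (int i = 0; i < sentences.size(); i++) {
            if (i > 0) salience.replaceAll((word, value) -> value / 2);
            for (Mention m : sentences.get(i)) {
                if (!corpus.contains(m.word())) {
                    errorLog.add(m.word());
                    continue;
                }
                salience.merge(m.word(), (double) weight(m.role()), Double::sum);
            }
        }
        return salience;
    }

    public static void main(String[] args) {
        List<List<Mention>> sentences = List.of(
                List.of(new Mention("ram", Role.SUBJECT), new Mention("ghadi", Role.DIRECT_OBJECT)),
                List.of(new Mention("hari", Role.SUBJECT), new Mention("pasal", Role.OTHER)));
        List<String> errors = new ArrayList<>();
        Map<String, Double> result = run(sentences, Set.of("ram", "ghadi", "hari", "pasal"), errors);
        System.out.println(result); // {ram=155.0, ghadi=140.0, hari=310.0, pasal=230.0}
        System.out.println(errors); // []
    }
}
```

With the romanized nouns of the first two example sentences, राम and घडी fall from 310 and 280 to 155 and 140 when the second sentence is read, while हरि gets 310 and पसल 230.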


User Interface


An Example in Nepali

१. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)

२. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)

३. उसले उसलाई देखायो । (He showed it to him.)


१. राम घडी किन्न चाहन्छ । (Ram wants to buy a watch.)

Decrease the salience values by a factor of 2 when reading the next sentence.


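२. हरिले त्यो पसलमा देख्यो । (Hari saw it in the shop.)

हरि (Hari) gets 310 (recency 100 + subject 80 + non-adverbial 50 + head noun 80).
त्यो (it) gets 280 (recency 100 + direct object 50 + non-adverbial 50 + head noun 80).
त्यो is resolved to घडी (watch) because of the high salience value of घडी.
पसल (shop) gets 230 (recency 100 + non-adverbial 50 + head noun 80).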


Updated Discourse Model

Divide the previous salience values by two.


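३. उसले उसलाई देखायो । (He showed it to him.)

उसले (he) is resolved to हरि (Hari) because of its higher salience, adding 310 (recency 100 + subject 80 + non-adverbial 50 + head noun 80).

उसलाई (him) cannot refer to हरि because of syntactic constraints: a non-reflexive pronoun cannot corefer with a co-argument of the same verb, and उसले, the subject of देखायो, already refers to हरि. So उसलाई is resolved to राम (Ram), adding 270 (recency 100 + indirect object 40 + non-adverbial 50 + head noun 80).

Updated Discourse Model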
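The final selection step, with a simplified syntactic filter, might look like the sketch below. Assumptions: agreement and animacy filtering have already removed inanimate candidates such as घडी and पसल; the only syntactic constraint modeled is that a non-reflexive pronoun cannot share an antecedent with a co-argument of the same verb; the salience values for हरि and राम are illustrative (the slide only states that हरि's is higher); and the names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class PronounResolver {
    // Returns the highest-salience candidate not ruled out by the syntactic filter,
    // or null if every candidate is excluded.
    static String resolve(Map<String, Double> salience, Set<String> excluded) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> e : salience.entrySet()) {
            if (excluded.contains(e.getKey())) continue;
            if (e.getValue() > bestScore) {
                best = e.getKey();
                bestScore = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative salience values for the animate candidates before sentence 3.
        Map<String, Double> salience = new HashMap<>(Map.of("hari", 155.0, "ram", 77.5));

        // usle (subject): no co-argument has been resolved yet, so nothing is excluded.
        String usle = resolve(salience, Set.of());
        salience.merge(usle, 310.0, Double::sum); // subject weight, as on the slide

        // usalaai (indirect object): excluded from sharing the subject's antecedent.
        String usalaai = resolve(salience, Set.of(usle));
        salience.merge(usalaai, 270.0, Double::sum); // indirect-object weight

        System.out.println(usle + " " + usalaai); // hari ram
    }
}
```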


Result


Paragraph      Samples used   Antecedents   Anaphors   Correctly resolved   Incorrectly resolved   Zero anaphors   Efficiency
2-sentence     15             37            22         15                   7                      0               68%
3-sentence     15             50            37         28                   9                      0               75%
4-sentence     10             35            35         22                   11                     2               62.8%
5-sentence     10             43            41         25                   14                     2               60.9%
> 5-sentence   5              28            31         17                   11                     3               54%
Total          55             193           166        107                  52                     7               64%

Efficiency = correctly resolved anaphors / total anaphors.
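The Efficiency column can be recomputed as correctly resolved anaphors divided by total anaphors. The short sketch below (names are illustrative) does this for each row of the results table; the printed values, such as 75.68% for the 3-sentence row, suggest the table truncates rather than rounds.

```java
public class Efficiency {
    // Efficiency as used in the results table: correctly resolved / total anaphors, in percent.
    static double efficiency(int correct, int totalAnaphors) {
        return 100.0 * correct / totalAnaphors;
    }

    public static void main(String[] args) {
        // Rows from the results table: {correctly resolved, total anaphors}.
        int[][] rows = {{15, 22}, {28, 37}, {22, 35}, {25, 41}, {17, 31}, {107, 166}};
        for (int[] r : rows) {
            System.out.printf("%d/%d = %.2f%%%n", r[0], r[1], efficiency(r[0], r[1]));
        }
    }
}
```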


Scope of the Project

- Natural language processing
- Question answering
- Text summarization
- Information extraction
- Interaction with query interfaces and dialogue interpretation
- Natural language generation


Limitations

- The lack of a tagger and parser prevents the system from handling a large corpus, so we had to use a hand-annotated corpus.
- Input sentences are limited to the words defined in our corpus.
- The system handles only third-person pronouns, not reflexive ones.


Further Work

- Morphological analysis can be added.
- The system can be extended to work on a larger number of sentences.
- This project can be combined with other NLP projects in the Nepali language for further research.
- Statistical methods can be applied to achieve higher efficiency.


Conclusion

- This research examines how a basic approach such as Lappin and Leass's algorithm performs for the Nepali language.
- It applies to non-reflexive third-person pronouns.
- Anaphora resolution is an emerging concept in Nepali language processing.
- Understanding discourse remains a challenge for computer intelligence.
- Without a tagger and parser, our system is heavily dictionary-dependent.
- Our work can aid future research on the Nepali language.


Thank You.