Coreference Recognition in Arabic


Alshunabier, Atheer; Aldakheel, Bushra

Supervisor: Alsaif, Amal

Overview

Agenda: Introduction; Characteristics of the Arabic Language; Related work; Methodology; Discussion; Conclusion

Introduction

In linguistics, coreference occurs when multiple expressions in a sentence or document refer to the same thing. For example, in

"Mary said she would help me"

"Mary" and "she" can refer to the same person.


Characteristics of the Arabic Language

Arabic differs greatly from English and from other Germanic and Latin-based languages. There are certain grammatical differences you must know before you begin to understand the language.

Arabic is a synthetic language, as opposed to an analytic one.

Written Arabic usually leaves out short vowels, so it is difficult to read unless you have extensive knowledge of the written language.
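The omission of short vowels can be illustrated with a minimal sketch. It assumes the short vowels are encoded as Unicode combining marks, which is how Arabic diacritics are normally stored; the function name `strip_short_vowels` is hypothetical and is not taken from the slides.

```python
import unicodedata


def strip_short_vowels(text: str) -> str:
    """Drop every Unicode combining mark, which includes Arabic short-vowel signs."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


# Kaf, teh, beh, each followed by a fatha (short "a"): the fully vocalized "kataba".
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"

# The usual written form keeps only the three consonants.
unvocalized = strip_short_vowels(vocalized)
print(len(vocalized), len(unvocalized))
```

The unvocalized form is ambiguous between several readings, which is one reason the reader needs prior knowledge of the written language.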

Tri-Consonantal Root Verb

Consonants are attached to a tri-consonantal root verb to create a different meaning.

In English: "He wrote". In Arabic: "kataba".

In English: "He dictated". In Arabic: "aktaba".

Active Participle

The active participle identifies the person who performs an action.

In English: "drink". In Arabic: the root "sh-r-b".

In English: "drinker". In Arabic: "sharib".
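The root-and-pattern idea behind both slides can be sketched in a few lines. This is only an illustration over simplified Latin transliterations: the digit-slot pattern notation and the function `apply_pattern` are assumptions made for this example, and long vowels are not marked. The form "kataba" ("he wrote") is the standard transliteration of the base verb of the root k-t-b.

```python
def apply_pattern(root, pattern):
    """Fill the slots 1, 2, 3 of a pattern with the three root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


KTB = ("k", "t", "b")     # root associated with writing
SHRB = ("sh", "r", "b")   # root associated with drinking

print(apply_pattern(KTB, "1a2a3a"))   # base verb form
print(apply_pattern(KTB, "a12a3a"))   # derived form with an added prefix
print(apply_pattern(SHRB, "1a2i3"))   # active participle (simplified)
```

Keeping the root and the pattern separate mirrors how one root yields many related words, which is what makes Arabic stemming harder than simple suffix stripping.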


Related work

Kehler (1997) used maximum entropy modeling to assign a probability distribution to alternative sets of coreference relationships among noun phrase entity templates, whereas we used decision tree learning.

Ge, Hale, and Charniak (1998) used a statistical model for resolving pronouns, whereas we used a decision tree learning algorithm and resolved general noun phrases, not just pronouns.

The work of Cardie and Wagstaff (1999) also falls under the machine learning approach.


MT-based stemmer

Partition the Arabic words into clusters based on their English translations. Arabic words whose English translations match after English stopwords are removed are placed in the same cluster.
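The clustering step could be sketched as follows. This is a minimal sketch under stated assumptions, not the stemmer itself: it assumes each Arabic word comes with a single English translation and that two words share a cluster when their translations are identical once stopwords are dropped. The stopword list, the transliterated toy lexicon, and the name `cluster_by_translation` are all hypothetical.

```python
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
ENGLISH_STOPWORDS = {"the", "a", "an", "of", "to"}


def cluster_by_translation(translations):
    """Group Arabic words by their English translation with stopwords removed."""
    clusters = defaultdict(list)
    for arabic_word, english in translations.items():
        key = tuple(w for w in english.lower().split() if w not in ENGLISH_STOPWORDS)
        clusters[key].append(arabic_word)
    return dict(clusters)


# Hypothetical toy lexicon in Latin transliteration.
toy_lexicon = {
    "kitab": "book",
    "alkitab": "the book",
    "kutub": "books",
    "walad": "a boy",
}

for key, words in cluster_by_translation(toy_lexicon).items():
    print(key, words)
```

With exact matching, "kutub" ("books") stays in its own cluster. Merging it with "kitab" would require an additional step such as stemming the English side, which this sketch does not do.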


Coreference Resolution

Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world.
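Pairwise decisions of this kind are usually merged into coreference chains. The sketch below assumes the pairwise "same entity" judgments are already available (for example from a classifier) and groups them with a small union-find structure. The function `build_chains` and the use of plain strings as mentions are simplifications made for this example; a real system would identify mentions by their position in the text.

```python
def build_chains(mentions, coreferent_pairs):
    """Merge pairwise coreference links into chains of mentions."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent links to the representative, compressing the path.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in coreferent_pairs:
        parent[find(a)] = find(b)

    chains = {}
    for m in mentions:
        chains.setdefault(find(m), []).append(m)
    # Mentions linked to nothing are not reported as chains.
    return [chain for chain in chains.values() if len(chain) > 1]


mentions = ["Mary", "she", "me"]
links = [("Mary", "she")]
print(build_chains(mentions, links))
```

Union-find makes the links transitive: if A is linked to B and B to C, all three end up in one chain even when A and C were never compared directly.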

Anaphora

Anaphora is a linguistic relation between two textual entities, in which the interpretation of one (the anaphor) depends on the other (the antecedent).

Pronominal anaphora in the Quran

Why is Arabic Information Extraction difficult?

The Arabic alphabet consists of 28 letters that can be extended to 90 by additional shapes, marks, and vowels.

There are two genders (masculine and feminine), three numbers (singular, dual, and plural), and three grammatical cases (nominative, genitive, and accusative).
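Gender and number are the kind of features a resolver can use to discard impossible antecedents for a pronoun. The sketch below is a hedged illustration of that idea, not a method taken from the slides: the `Mention` class, the feature labels, and the transliterated examples are hypothetical. Case is left out on purpose, because it reflects a word's role in its own sentence rather than a property shared by coreferent mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    gender: str  # "masc" or "fem"
    number: str  # "sg", "dual" or "pl"


def agrees(pronoun, candidate):
    """A candidate is compatible only if gender and number both match."""
    return pronoun.gender == candidate.gender and pronoun.number == candidate.number


def candidate_antecedents(pronoun, preceding):
    """Keep the preceding mentions that agree with the pronoun."""
    return [m for m in preceding if agrees(pronoun, m)]


# Hypothetical transliterated examples: "hiya" (she), "al-waladan" (the two boys).
pronoun = Mention("hiya", "fem", "sg")
preceding = [Mention("Maryam", "fem", "sg"), Mention("al-waladan", "masc", "dual")]
print([m.text for m in candidate_antecedents(pronoun, preceding)])
```

The three-way number distinction means a filter of this kind needs a dual value that an English resolver would not have.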

Annotated Corpora

The number of corpora annotated both anaphorically and coreferentially has increased.

For English, there are resources such as the Lancaster Anaphoric Treebank and others.

For Arabic, there has been no comparable expansion in anaphoric or coreferential corpus annotation.

Annotating Tools

Annotating anaphoric or coreferential relations requires considerable effort from the human annotator.

Tools such as Callisto, MMAX2, and PALinkA support this task. These tools are written in Java.


Conclusion
