The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval. Jiang Zhu, Haifeng Wang. Toshiba (Chinese) Research and Development Center (ACL 2006)


Page 1: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval Jiang Zhu Haifeng Wang

Toshiba (Chinese) Research and Development Center (ACL 2006)

Page 2: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Introduction

In CLIR query translation, we can use:

Machine Translation (MT) based methods (MT systems)

dictionary-based methods (bilingual wordlists)

corpus-based methods (parallel or comparable corpora)

Page 3: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Why….

Previous research mainly focused on studying the effectiveness of bilingual wordlists or parallel corpora from two aspects: size and lexical coverage.

Why is there so little research analyzing the effect of an MT system's translation quality on CLIR performance?

Page 4: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

So, in this paper…

They explore the relationship between the translation quality of the MT system and the retrieval effectiveness.

But the problem is: how to control the translation quality of the MT system!

Page 5: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Experiment Method

The system is a rule-based English-Chinese MT (ECMT) system whose translation resources include a large dictionary and a rule base.

We degrade the MT system in two ways. One is to degrade the rule base of the system by progressively removing rules from it. The other is to degrade the dictionary by gradually removing word entries from it.

Page 6: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Degrade the rule base

Decrease the size of the rule base by randomly removing rules from it.

The 27,000 rules in the rule base are randomly divided into 36 segments, each containing 750 rules.
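The segmentation and step-by-step removal described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the rules are stood in for by integer ids, and the names `split_into_segments` and `degraded_versions` as well as the fixed seed are hypothetical.

```python
import random


def split_into_segments(items, num_segments, seed=0):
    """Shuffle the items and split them into equally sized segments."""
    shuffled = list(items)
    if len(shuffled) % num_segments != 0:
        raise ValueError("items must divide evenly into the segments")
    random.Random(seed).shuffle(shuffled)
    size = len(shuffled) // num_segments
    return [shuffled[i * size:(i + 1) * size] for i in range(num_segments)]


def degraded_versions(items, segments):
    """Yield the surviving item set after removing one more segment per step."""
    remaining = set(items)
    for segment in segments:
        remaining -= set(segment)
        yield set(remaining)


# Hypothetical rule ids standing in for the 27,000 real rules.
rules = range(27000)
segments = split_into_segments(rules, 36)
sizes = [len(version) for version in degraded_versions(rules, segments)]
print(len(segments[0]), sizes[0], sizes[-1])
```

Each yielded set corresponds to one degraded system (MT1’ through MT36’); after the last step no rules from the segmented pool remain.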

Page 7: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Degrade the dictionary

Decrease the size of the dictionary by iteratively removing a fixed number of randomly chosen word entries.

Randomly select 43,200 word entries for degradation, then split them equally into 36 segments.

(The dictionary contains 169,000 word entries. Function words are not removed from the dictionary.)
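As a quick arithmetic check (derived from the figures on this slide, not quoted from the paper), the segment size and the share of the dictionary that is subject to removal are:

```latex
\frac{43{,}200}{36} = 1{,}200 \ \text{entries per segment},
\qquad
\frac{43{,}200}{169{,}000} \approx 25.6\% \ \text{of the dictionary}
```

So even complete dictionary degradation leaves roughly three quarters of the entries in place.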

Page 8: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

[Diagram: the MT system is degraded step by step into MT1’, MT2’, MT3’, …, MT36’. Each degraded system translates query q into q1’, q2’, q3’, …, q36’, and the IR system returns MAP1’, MAP2’, MAP3’, …, MAP36’ for them. NIST is used as the evaluation measure of the MT system's translation quality.]

Page 9: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

[Same pipeline diagram as on Page 8, marked "1".]

Page 10: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Experiment Data:

The experiments are conducted on the TREC5&6 Chinese collection. (It includes 164,789 documents in total. The topic set contains 54 topics.)

The topic format is shown in Figure 1. In the experiments, we use two kinds of queries:

Title queries (use only the title field)

Desc queries (use only the description field)

Figure 1

Page 11: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

[Same pipeline diagram as on Page 8, marked "2".]

Page 12: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Evaluation of translation quality: Title query

Two reference translations are introduced for each title query:

The Chinese title (C-title) in the title field of the original TREC topic

A translation of the title query given by a human translator

Page 13: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Evaluation of translation quality: Desc query

Only one reference is used for each desc query:

The Chinese description (C-desc) in the description field of the original TREC topic

Page 14: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

[Same pipeline diagram as on Page 8, marked "3".]

Page 15: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Runs

rule-title: MT-based title query translation with degraded rule base

rule-desc: MT-based desc query translation with degraded rule base

dic-title: MT-based title query translation with degraded dictionary

dic-desc: MT-based desc query translation with degraded dictionary

Page 16: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Results

For baseline comparison, we conduct Chinese monolingual runs with title queries and desc queries:

title-cn1: use reference translation 1 of each title query as Chinese query

title-cn2: use reference translation 2 of each title query as Chinese query

desc-cn: use reference translation of each desc query as Chinese query

Run          MAP
title-cn1    0.3143
title-cn2    0.3001
desc-cn      0.3514
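MAP here is mean average precision over the topic set. A minimal sketch of the standard non-interpolated computation follows. The document ids and relevance judgments are toy values invented for illustration (not TREC data), and the function names are assumptions rather than part of any evaluation toolkit.

```python
def average_precision(ranked_docs, relevant):
    """Non-interpolated average precision for a single topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant hit
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of the per-topic average precision values."""
    return sum(average_precision(docs, rel) for docs, rel in runs) / len(runs)


# Toy topics: (ranked result list, set of relevant documents).
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6", "d7"], {"d6"}),              # AP = 1/2
]
print(round(mean_average_precision(toy_runs), 4))
```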

Page 17: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Results on Rule-Based degradation

Page 18: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Results on Dictionary-Based Degradation

Page 19: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Summary of the Results

Run                     NIST Score    MAP

Title queries
No degradation          7.3548        0.3126
Complete: rule-title    5.9155        0.2810
Complete: dic-title     6.0067        0.1894

Desc queries
No degradation          5.0297        0.2877
Complete: rule-desc     4.8497        0.2759
Complete: dic-desc      4.4879        0.2471
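To make the summary easier to read, the relative drop from the undegraded system can be computed for each run. The sketch below uses only the values from the summary table; the helper names `relative_drop` and `summarize` are hypothetical.

```python
# (NIST score, MAP) pairs copied from the summary table.
baseline = {"title": (7.3548, 0.3126), "desc": (5.0297, 0.2877)}
degraded = {
    "rule-title": (5.9155, 0.2810),
    "dic-title": (6.0067, 0.1894),
    "rule-desc": (4.8497, 0.2759),
    "dic-desc": (4.4879, 0.2471),
}


def relative_drop(before, after):
    """Fraction of the original score lost after complete degradation."""
    return (before - after) / before


def summarize():
    """Map each run to its (NIST drop, MAP drop) relative to no degradation."""
    rows = {}
    for run, (nist, map_score) in degraded.items():
        query_type = run.split("-")[1]  # "title" or "desc"
        base_nist, base_map = baseline[query_type]
        rows[run] = (relative_drop(base_nist, nist),
                     relative_drop(base_map, map_score))
    return rows


for run, (nist_drop, map_drop) in summarize().items():
    print(f"{run}: NIST -{nist_drop:.1%}, MAP -{map_drop:.1%}")
```

For title queries, MAP falls by roughly 10% under complete rule-base degradation but by roughly 39% under complete dictionary degradation, even though the two NIST drops are similar (about 20% and 18%).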

Page 20: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Discussion

Analyze the correlations between NIST scores and MAP:

There is a strong correlation between translation quality and retrieval effectiveness.

Better MT performance leads to better retrieval performance.

Run           Correlation
rule-title    0.9728
rule-desc     0.9500
dic-title     0.9521
dic-desc      0.9582
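A correlation of this kind can be computed over the per-step (NIST, MAP) pairs of a run. The sketch below assumes Pearson's correlation coefficient, which the slides do not state explicitly. The input values are placeholders chosen for illustration only and are not the paper's measurements.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only; NOT taken from the paper.
nist_scores = [7.0, 6.5, 6.0, 5.5]
map_scores = [0.30, 0.28, 0.27, 0.24]
print(round(pearson(nist_scores, map_scores), 4))
```

In the experiment, each run would contribute one such pair per degradation step, so the coefficient summarizes how closely MAP tracks the NIST score as the system is degraded.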

Page 21: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Discussion

Impacts of Query Format

Comparing the results shows that the MT system performs better on title queries than on desc queries.

Impacts of Rules and Dictionary

The results indicate that, for title queries, retrieval effectiveness is more sensitive to the size of the dictionary than to the size of the rule base.

This is because retrieval systems are typically more tolerant of syntactic than semantic translation errors (Fluhr, 1997).

Page 22: The Effect of Translation Quality in MT-Based Cross-Language Information Retrieval

Conclusion

The performance of the MT system and that of the IR system correlate highly with each other.

There are two main factors in MT-based CLIR: one is the query format, the other is the translation resources comprised in the MT system.

Therefore, to improve the retrieval effectiveness of an MT-based CLIR application, it is more effective to develop a larger dictionary than to develop more rules.