Intelligent Database Systems Lab
國立雲林科技大學National Yunlin University of Science and Technology
A Study of Learning a Merge Model for Multilingual Information Retrieval
Presenter: Cheng-Hui Chen
Authors: Ming-Feng Tsai, Yu-Ting Wang, Hsin-Hsi Chen
SIGIR 2008
N.Y.U.S.T.
I. M.
Outline
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· In multilingual information retrieval (MLIR), the merged result list usually includes more irrelevant results.
· Traditional merging methods for MLIR assume that relevant documents are homogeneously distributed over the monolingual result lists.
Objectives
· Merge the monolingual result lists into a single result list while accounting for the differing translation and retrieval quality across collections.
· Propose a merge method that does not assume relevant documents are homogeneously distributed over the monolingual result lists.
· Improve the quality of the merge model.
Methodology
· Traditional MLIR framework
─ Raw-score
─ Round-robin
─ Normalized-by-top1
─ Normalized-by-top-k
· Proposed learning method
─ FRank
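The traditional merging strategies named above can be sketched as follows. This is a minimal illustration using the common textbook formulations of these methods; the slides do not spell them out, so the exact definitions, the function names, and the toy scores are assumptions. Scores are assumed to be positive.

```python
from itertools import zip_longest

# Each monolingual result list is a list of (doc_id, score) pairs,
# already sorted by descending score. All names and data are hypothetical.

def raw_score_merge(lists):
    """Pool all results and sort by the unmodified retrieval score."""
    pooled = [pair for results in lists for pair in results]
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)

def round_robin_merge(lists):
    """Interleave the lists by rank: all rank-1 results, then rank-2, and so on."""
    merged = []
    for tier in zip_longest(*lists):
        merged.extend(pair for pair in tier if pair is not None)
    return merged

def normalized_by_top_k_merge(lists, k=1):
    """Divide each score by the mean of its list's top-k scores, then pool.

    With k=1 this corresponds to normalized-by-top1.
    """
    pooled = []
    for results in lists:
        if not results:
            continue
        top = [score for _, score in results[:k]]
        norm = sum(top) / len(top)
        pooled.extend((doc, score / norm) for doc, score in results)
    return sorted(pooled, key=lambda pair: pair[1], reverse=True)

english = [("en1", 12.0), ("en2", 9.0), ("en3", 3.0)]
chinese = [("zh1", 4.0), ("zh2", 2.0)]

print([d for d, _ in raw_score_merge([english, chinese])])
print([d for d, _ in round_robin_merge([english, chinese])])
print([d for d, _ in normalized_by_top_k_merge([english, chinese], k=1)])
```

In the toy data the English scores are on a larger scale, so raw-score merging pushes the Chinese results down, while normalization puts the two lists on a comparable scale before pooling.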
MLIR merge process
Feature set
1. Query level
2. Document level
3. Translation level
Construction of a merge model
1. FRank ranking algorithm
2. BM25
Feature set
· Query level
─ The terms within a query are manually classified into several pre-defined categories: location/country names (Loc), organization names (Org), event names (EN), and technical terms (TT).
· Document level
─ The document length (Dlength) and title length (Tlength) are extracted.
Feature set
· Translation level
─ The size of the bilingual dictionary used for each language (DictSize).
─ The average number of translation equivalents per query term (AvgTAD). For example, if a query has two query terms, each with three translation equivalents, the AvgTAD of the query is (3 + 3)/2 = 3.
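The AvgTAD arithmetic can be written out as a short sketch. The function name and the dictionary-based representation of a translated query are assumptions made for illustration, not something taken from the paper.

```python
def avg_tad(translations):
    """Average number of translation equivalents per query term.

    `translations` maps each query term to its list of translation
    equivalents (a hypothetical representation of a dictionary lookup).
    """
    if not translations:
        return 0.0
    total = sum(len(equivalents) for equivalents in translations.values())
    return total / len(translations)

# The example from the text: two query terms, three equivalents each.
query = {
    "term_a": ["a1", "a2", "a3"],
    "term_b": ["b1", "b2", "b3"],
}
print(avg_tad(query))  # (3 + 3) / 2 = 3.0
```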
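[Figure: worked example of the translation-level features for the English-to-Chinese query terms "Order" and "Park". "Order" has four Chinese translation equivalents (訂單 "purchase order", 順序 "sequence", 命令 "command", 隊形 "formation") and "Park" has two (公園 "park", 停車 "parking"), so AvgTAD = (number of Chinese translations) / (number of query terms) = (4 + 2)/2 = 3. The figure also has a DicSize column and shows query-level category labels (EN, Loc) on example terms such as 斗六 (Douliu) and 食べる ("to eat").]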
The Construction of the Merge Model
· Following FRank's generalized additive model, a merge model can be represented as:
$M(x) = \sum_{t=1}^{T} \alpha_t m_t(x)$
─ m_t(x) is a weak learner
─ α_t is the learned weight
─ T is the number of selected weak learners
· The merge model is combined with a retrieval model (BM25) by linear combination.
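A minimal sketch of how an additive merge model of this kind could be scored and then combined with BM25. Two things here are assumptions rather than details from the slides: the weak learner is modeled as a simple threshold on one feature, and the linear combination is taken to be λ · merge score + (1 − λ) · BM25 score. The class and function names and the feature values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class WeakLearner:
    """Hypothetical weak learner: outputs 1 if a feature exceeds a threshold."""
    feature: str
    threshold: float
    alpha: float  # learned weight for this weak learner

    def __call__(self, x):
        return 1.0 if x[self.feature] > self.threshold else 0.0

def merge_score(learners, x):
    """Additive model: sum of (learned weight * weak learner output)."""
    return sum(m.alpha * m(x) for m in learners)

def combined_score(learners, x, bm25, lam):
    """Assumed linear combination of the merge model score and a BM25 score."""
    return lam * merge_score(learners, x) + (1.0 - lam) * bm25

learners = [
    WeakLearner("AvgTAD", 2.0, 0.5),
    WeakLearner("Dlength", 500.0, 0.25),
]
doc = {"AvgTAD": 3.0, "Dlength": 320.0}

print(merge_score(learners, doc))               # only the AvgTAD learner fires: 0.5
print(combined_score(learners, doc, 1.0, 0.5))  # 0.5 * 0.5 + 0.5 * 1.0 = 0.75
```

In practice the two scores would likely need to be on comparable scales before they are mixed; the sketch skips any normalization.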
Experiments
· Data set
─ Details of the experimental collections
─ Percentage of retrieved documents
Experiments
· Mean Average Precision (MAP)
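For reference, MAP can be computed as in the sketch below. It uses the standard definition (average precision per query, averaged over queries); the function names and the toy rankings are hypothetical and are not results from the paper.

```python
def average_precision(ranked, relevant):
    """Average precision for one query.

    Sums precision at each rank where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)

runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(mean_average_precision(runs))
```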
Experiments
· Experimental results of our method using different values of the combination coefficient λ
Experiments
· Feature analysis
Conclusions
· The proposed merge model can significantly improve merging quality.
· The merge model indicates that the key factors are the number of translatable terms and compound words.
Conclusions
· Future work
─ Use other learning-based ranking algorithms, such as RankSVM and RankNet.
─ Extract more representative features, such as linguistic features, to construct a merge model.
─ Discover more relations among query terms, such as query term association and substitution.
Comments
· Advantage
─ Improves merging quality.
· Drawback
· Application
─ Multilingual information retrieval.