Exploiting Ranking Factorization Machines for Microblog Retrieval
Runwei Qiang, Feng Liang, Jianwu Yang
Institute of Computer Science and Technology
Peking University
CIKM 2013
Problem Definition
Real-time search: "At time t, find tweets about topic X." (TREC 2011)
Input: a tweet collection and timestamped queries (Q1, t1), (Q2, t2), …, (Qn, tn)
Tweets posted after a query's timestamp are not available to that query
Output: for each query Q1, Q2, …, Qn, a ranking of tweets by relevance
Motivations
IR for microblogs is a non-trivial problem
Documents are very short: the vocabulary-mismatch problem is severe, so how should query expansion be applied?
Shortened URLs are abundant: they offer a way to expand documents, but how can they be used?
Pointless babble is abundant: how can tweet quality be used to filter out non-informative messages?
Motivations
Learning-to-rank methods can make full use of the different models and factors in microblog retrieval
Different factors => different features
Many features have proven useful:
Semantic features between query and document
Tweet quality features such as link, retweet, and mention (count or binary)
Limitations
Features are treated as independent
Some features are closely related to each other
For example, the RT and @ symbols frequently occur in the same tweet
Feature utilization
Link feature: binary => semantic information
Example title of a linked page: "Small plane crashes at big airport; no one notices - CNN.com"
Proposal
Employ a Ranking FM framework
Adopt FM as the ranking function to model interactions between features
Use several effective features that are neglected in existing work
Optimize Ranking FM with two optimization methods:
Stochastic Gradient Descent
Adaptive Regularization
Outline
Ranking FM for Microblog Retrieval
Ranking FM Framework
Optimization Methods
Feature Description
Experiments
Summary
Ranking FM Framework
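Pairwise approach
Given two labeled tweets $(x_p, y_p)$ and $(x_q, y_q)$, build a training instance $(x_p, x_q, z)$ with
$$z = \begin{cases} +1, & y_p > y_q \\ -1, & y_q > y_p \end{cases}$$
Objective over the $l$ training pairs:
$$\min_{\Theta} L(\Theta) = \sum_{t=1}^{l} \ell\big(f(x_p^{(t)}, x_q^{(t)}; \Theta),\, z^{(t)}\big) + \lambda \lVert \Theta \rVert^2$$
$\ell$: loss function (hinge loss)
$f$: FM ranking function
$\lambda \lVert \Theta \rVert^2$: regularization term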
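The pairwise setup on this slide can be illustrated with a minimal sketch. It assumes that pairs are built among the tweets of a single query, that ties are skipped, and that the ranking function is supplied by the caller; the names `make_pairs`, `hinge`, and `objective`, the toy data, and the linear scorer in the usage lines are illustrative and not taken from the slides.

```python
from itertools import combinations


def make_pairs(examples):
    """Turn (features, relevance) examples of one query into (x_p, x_q, z) triples.

    z is +1 when x_p is more relevant than x_q and -1 when it is less relevant.
    Pairs with equal relevance carry no preference and are skipped.
    """
    pairs = []
    for (x_p, y_p), (x_q, y_q) in combinations(examples, 2):
        if y_p == y_q:
            continue
        pairs.append((x_p, x_q, 1 if y_p > y_q else -1))
    return pairs


def hinge(score, z):
    """Hinge loss for a pair score and a preference label z in {+1, -1}."""
    return max(0.0, 1.0 - z * score)


def objective(pairs, f, params, lam):
    """Sum of pairwise hinge losses plus the L2 penalty lam * ||params||^2."""
    data_loss = sum(hinge(f(x_p, x_q), z) for x_p, x_q, z in pairs)
    return data_loss + lam * sum(w * w for w in params)


# Toy usage with a linear scorer standing in for the FM ranking function.
examples = [([1.0, 0.0], 2), ([0.0, 1.0], 0), ([0.5, 0.5], 0)]
pairs = make_pairs(examples)
w = [1.0, 0.0]


def linear_pair_score(x_p, x_q):
    # Assumed pair score: difference of the two individual scores.
    return sum(wi * a for wi, a in zip(w, x_p)) - sum(wi * b for wi, b in zip(w, x_q))


total = objective(pairs, linear_pair_score, w, lam=0.1)
print(len(pairs), total)
```

Swapping `linear_pair_score` for a score built on an FM keeps the rest of the objective unchanged, which is the point of treating the ranking function as a plug-in.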
Factorization Machines Model
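Second-order FM model with nested feature interactions:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$
Factorized interaction parameters, where $k$ is the factorization dimensionality:
$$\langle v_i, v_j \rangle = \sum_{f=1}^{k} v_{i,f} \cdot v_{j,f}$$
Equivalent form computable in $O(k \cdot n)$:
$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \frac{1}{2} \sum_{f=1}^{k} \left[ \left( \sum_{i=1}^{n} v_{i,f} x_i \right)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2 \right]$$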
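The two forms of the FM model can be checked against each other numerically. The sketch below is a plain-Python illustration, not the authors' implementation: `fm_predict_naive` evaluates the nested pairwise sum directly, `fm_predict_fast` uses the O(k·n) reformulation, and the random parameters are made up. The sizes n = 17 and k = 3 simply echo the feature count (3 + 9 + 5) and the k value that appear elsewhere in the slides.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))


def fm_predict_naive(x, w0, w, V):
    """Second-order FM with an explicit loop over feature pairs: O(k * n^2)."""
    n = len(x)
    y = w0 + dot(w, x)
    for i in range(n):
        for j in range(i + 1, n):
            y += dot(V[i], V[j]) * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    """Same model via the reformulated interaction term: O(k * n)."""
    n, k = len(x), len(V[0])
    y = w0 + dot(w, x)
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))            # sum_i v_{i,f} x_i
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))  # sum_i v_{i,f}^2 x_i^2
        y += 0.5 * (s * s - s_sq)
    return y


# Random parameters, for illustration only.
rng = random.Random(0)
n, k = 17, 3
x = [rng.gauss(0, 1) for _ in range(n)]
w0 = 0.1
w = [rng.gauss(0, 1) for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]

slow = fm_predict_naive(x, w0, w, V)
fast = fm_predict_fast(x, w0, w, V)
assert abs(slow - fast) < 1e-9
```

The fast form works because the sum over pairs i < j equals half of the squared sum minus the sum of squares, computed separately for each factor dimension f.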
Learn Ranking FM
Stochastic Gradient Descent
Grid search on the validation set to find the best λ
Adaptive Regularization [2]
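Alternate between two steps, following [2]:
$$\Theta^{(t+1)} \mid \lambda^{(t)} := \arg\min_{\Theta} \sum_{(x,y) \in S_T} \ell\big(\hat{y}(x \mid \Theta), y\big) + \lambda^{(t)} \lVert \Theta \rVert^2$$
$$\lambda^{(t+1)} \mid \Theta^{(t+1)} := \arg\min_{\lambda} \sum_{(x,y) \in S_V} \ell\big(\hat{y}(x \mid \Theta^{(t+1)}), y\big)$$
$S_T$: training set; $S_V$: validation set
Adapts the regularization automatically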
Time-consuming
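One SGD update for a pairwise-trained FM might look like the sketch below. It rests on assumptions the slides do not spell out: the pair score is taken to be the difference ŷ(x_p) − ŷ(x_q), so the bias w0 cancels, the loss is hinge loss with an L2 penalty, and λ is fixed by hand. The function names, learning rate, and λ value are illustrative, and the adaptive regularization variant of [2] is not covered.

```python
def fm_score(x, w, V):
    """FM score without the bias term, which cancels in a score difference."""
    n, k = len(x), len(V[0])
    y = sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        y += 0.5 * (s * s - s_sq)
    return y


def fm_grad(x, V):
    """Gradients of the FM score with respect to w (length n) and V (n x k)."""
    n, k = len(x), len(V[0])
    sums = [sum(V[j][f] * x[j] for j in range(n)) for f in range(k)]
    g_w = list(x)
    g_V = [[x[i] * sums[f] - V[i][f] * x[i] ** 2 for f in range(k)]
           for i in range(n)]
    return g_w, g_V


def sgd_step(x_p, x_q, z, w, V, lr=0.01, lam=0.001):
    """Update w and V in place on one pair; return the margin before the update."""
    margin = z * (fm_score(x_p, w, V) - fm_score(x_q, w, V))
    active = margin < 1.0  # the hinge subgradient is zero once the margin reaches 1
    gw_p, gV_p = fm_grad(x_p, V)
    gw_q, gV_q = fm_grad(x_q, V)
    for i in range(len(w)):
        g = -z * (gw_p[i] - gw_q[i]) if active else 0.0
        w[i] -= lr * (g + 2.0 * lam * w[i])
        for f in range(len(V[i])):
            g = -z * (gV_p[i][f] - gV_q[i][f]) if active else 0.0
            V[i][f] -= lr * (g + 2.0 * lam * V[i][f])
    return margin


# Toy usage: repeated updates on one preference pair push its margin up.
w = [0.0, 0.0]
V = [[0.1], [0.1]]
first = sgd_step([1.0, 0.0], [0.0, 1.0], 1, w, V)
for _ in range(200):
    last = sgd_step([1.0, 0.0], [0.0, 1.0], 1, w, V)
print(first, last)
```

With a fixed `lam`, each candidate value requires a full training run followed by a validation check, which is what the grid search on the slide amounts to.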
Feature Description
Content Relevance Features (3)
Computed between the query and the tweet
BM25, TF-IDF, and language model scores
Semantic Expansion Features (3 x 3 = 9)
Computed between three text pairs: query and topic info; expanded query and tweet; expanded query and topic info
BM25, TF-IDF, and language model scores for each pair
Quality Features (5)
Binary features for mention, retweet, hashtag, and link
Tweet length
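The five quality features can be extracted from a tweet's text along the lines of the following sketch. The regular expressions, the feature names, and the choice to measure length in whitespace-separated tokens are assumptions made for illustration; the slides only name the features. The sample tweet is made up.

```python
import re


def quality_features(tweet):
    """Return four binary quality features and the tweet length as a dict."""
    return {
        "has_mention": int(bool(re.search(r"@\w+", tweet))),
        "has_retweet": int(bool(re.search(r"\bRT\b", tweet))),
        "has_hashtag": int(bool(re.search(r"#\w+", tweet))),
        "has_link": int(bool(re.search(r"https?://\S+", tweet))),
        "length": len(tweet.split()),  # assumed unit: whitespace-separated tokens
    }


sample = "RT @cnn: Small plane crashes at big airport http://example.com/x #news"
print(quality_features(sample))
```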
Experimental Setup
Dataset
TREC Tweet11 Corpus
About two weeks of Twitter data
TopicInfo Corpus
Title field of the linked pages
TREC’11: 50 queries
TREC’12: 60 queries
Evaluation Metrics
P@30 and MAP
Summary statistics of Tweet11 Corpus
HTTP Code    Status       # of tweets
200          OK           8,084,724
302          Found        815,794
403          Forbidden    817,273
404          Not Found    868,667
Null         Null         67,011
Searchable (200 + 302)    8,900,518

Summary statistics of TopicInfo Corpus
HTTP Code    Status       # of tweets
200          OK           1,225,947
302          Found        688
403          Forbidden    5,050
404          Not Found    92,378
Null         Null         265,468
Searchable (200 + 302)    1,226,635
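The two evaluation metrics named on this slide, P@30 and MAP, follow their standard definitions and can be computed as in the sketch below. The function names and the toy ranking are illustrative; the official TREC evaluation tooling may differ in details such as tie handling.

```python
def precision_at_k(ranked, relevant, k=30):
    """Fraction of the top-k ranked tweet ids that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant tweet appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy usage with made-up tweet ids.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(precision_at_k(ranked, relevant, k=2))
print(average_precision(ranked, relevant))
```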
Baselines
KL2SFBLoc [3]
Expanded language model with two-stage query expansion
Performed very well in the TREC’11 real-time search task
hitURLrun3 [4]
Uses a logistic regression model to learn a pairwise ranking for microblog retrieval
Best-performing system in the TREC’12 real-time search task
RSVM_Full
Ranking SVM with a linear kernel
Uses the same feature set as Ranking FM
Ranking FM Performance
Metric    KL2SFBLoc    RSVM_Full    hitURLrun3    RFM_FullSGD    RFM_FullAR
P@30      0.2441       0.2616       0.2701        0.2808         0.2746
MAP       0.2506       0.2597       0.2642        0.2694         0.2678
RFM_FullSGD and RFM_FullAR are the Ranking FM runs; hitURLrun3 was the best TREC’12 system
RFM_FullSGD improves P@30 by about 4% over hitURLrun3 and by about 7% over RSVM_Full
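The improvement figures on this slide can be reproduced from the P@30 row of the table. The sketch below computes the relative gain of RFM_FullSGD over each baseline; pairing the 4% figure with hitURLrun3 and the 7% figure with RSVM_Full is an inference from this arithmetic, since the slide's callouts do not name their baselines in the extracted text.

```python
# P@30 values copied from the results table.
p30 = {
    "KL2SFBLoc": 0.2441,
    "RSVM_Full": 0.2616,
    "hitURLrun3": 0.2701,
    "RFM_FullSGD": 0.2808,
    "RFM_FullAR": 0.2746,
}


def relative_gain(new, old):
    """Relative improvement of new over old, in percent."""
    return 100.0 * (new - old) / old


for baseline in ("KL2SFBLoc", "RSVM_Full", "hitURLrun3"):
    gain = relative_gain(p30["RFM_FullSGD"], p30[baseline])
    print(f"RFM_FullSGD vs {baseline}: {gain:.1f}%")
```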
Feature Study
[Figure: P@N versus N (N from 0 to 30) for Ranking FM with k=3 optimized by SGD. Curves: Full, -Quality, -Document Expansion, -Query Expansion, -Content Relevance, Only Content Relevance.]
Influence of the hyper-parameter k
[Figure: two panels for RFM_FullSGD (Ranking FM optimized by SGD), plotting P@30 versus k and MAP versus k, for k from 0 to 15.]
Stochastic Gradient Descent vs. Adaptive Regularization
[Figure: training time in seconds (axis scale x 10^4) versus k, for k from 0 to 15. Curves: Stochastic Gradient Descent, Adaptive Regularization.]
Method         P@5       P@10      P@30      MAP
RFM_FullSGD    0.4068    0.3695    0.2808    0.2694
RFM_FullAR     0.4034    0.3678    0.2746    0.2678
Summary
Ranking FM Framework
Pairwise approach
Use Factorization Machines as ranking function
Two optimization methods
Stochastic Gradient Descent
Adaptive Regularization
Three groups of features
Content Relevance Features
Semantic Expansion Features
Quality Features
References
[1] I. Ounis, J. Lin, and I. Soboroff. Overview of the TREC-2011 Microblog Track. In Proceedings of TREC 2011, 2012.
[2] S. Rendle. Learning recommender systems with adaptive regularization. In Proceedings of the fifth ACM international conference on Web search and data mining, WSDM ’12, pages 133–142. ACM, 2012.
[3] F. Liang, R. Qiang, and J. Yang. Exploiting real-time information retrieval in the microblogosphere. JCDL ’12, pages 267–276. ACM, 2012.
[4] Z. Han, X. Li, M. Yang, H. Qi, S. Li, and T. Zhao. HIT at TREC 2012 Microblog Track. In Proceedings of TREC 2012, 2013.