Diversified Ranking on Large Graphs: An Optimization Viewpoint
Hanghang Tong, Jingrui He, Zhen Wen, Ching-Yung Lin, Ravi Konuru
KDD 2011, August 21-24, San Diego, CA
© 2010 IBM Corporation
Background: Why Diversity?
A1: Uncertainty & Ambiguity in an Information Need
Case 1: Uncertainty from the query
Case 2: Uncertainty from the user
Background: Why Diversity? (cont.)
A2: Uncertainty & ambiguity of an information need
– C1: Product search → want different reviews
– C2: Political issue debate → desire different opinions
– C3: Legal search → get an overview of a topic
– C4: Team assembling → find a set of relevant & diversified experts
A3: Become a better and safer employee
– Better: A 1% increase in diversity → an additional $886 of monthly revenue
– Safer: A 1% increase in diversity → an increase of 11.8% in job retention
Problem Definitions & Challenges
Problem 1 (Evaluate/measure a given top-k ranking list)
– Given: A large graph A, the query vector p, the damping factor c, and a subset of k nodes S;
– Measure: the goodness of the subset of nodes S by a single number in terms of (a) the relevance of each node in S wrt the query vector p, and (b) the diversity among all the nodes in the subset S.
Problem 2 (Find a near-optimal top-k ranking list)
– Given: A large graph A, the query vector p, the damping factor c, and the budget k;
– Find: A subset of k nodes S that maximizes the goodness measure f(S).
Challenges
– (for Prob. 1) No existing measure encoding both relevance and diversity
– (for Prob. 2) Subset-level optimization
Our Solutions (10 seconds introduction!)
Problem 1 (Evaluate/measure a given top-k ranking list)
A1: A weighted sum between relevance and similarity
Problem 2 (Find a near-optimal top-k ranking list)
A2: A greedy algorithm (near-optimal, linear scalability)
[Figure: the goodness measure with its parts annotated as "weight", "relevance", and "diversity"]
Measure Relevance (r) by RWR (a.k.a. Personalized PageRank)
[Figure: a 12-node example graph; node 4 is the starting (query) node]
r = c A r + (1-c) e
– r: ranking vector (n x 1)
– A: adjacency matrix (n x n), column-normalized
– e: starting vector (n x 1)
– 1-c: restart probability
Example (n = 12, c = 0.9, 1-c = 0.1):
r = (0.13, 0.10, 0.13, 0.22, 0.13, 0.05, 0.05, 0.08, 0.04, 0.03, 0.04, 0.02)’
e = (0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0)’
A =
0    1/3  1/3  1/3  0    0    0    0    0    0    0    0
1/3  0    1/3  0    0    0    0    1/4  0    0    0    0
1/3  1/3  0    1/3  0    0    0    0    0    0    0    0
1/3  0    1/3  0    1/4  0    0    0    0    0    0    0
0    0    0    1/3  0    1/2  1/2  1/4  0    0    0    0
0    0    0    0    1/4  0    1/2  0    0    0    0    0
0    0    0    0    1/4  1/2  0    0    0    0    0    0
0    1/3  0    0    1/4  0    0    0    1/2  0    1/3  0
0    0    0    0    0    0    0    1/4  0    1/3  0    0
0    0    0    0    0    0    0    0    1/2  0    1/3  1/2
0    0    0    0    0    0    0    1/4  0    1/3  0    1/2
0    0    0    0    0    0    0    0    0    1/3  1/3  0
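The recurrence r = c A r + (1-c) e can be solved by simple iteration. The following is a minimal Python sketch, not the authors' code. It assumes the 17 undirected edges below, which were read off the non-zero pattern of the 12 x 12 adjacency matrix on this slide, and it assumes node 4 is the query node. The function names are illustrative.

```python
# Undirected edges of the 12-node example graph (1-based node ids),
# read off the non-zero pattern of the slide's adjacency matrix (assumption).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10),
         (10, 11), (10, 12), (11, 12)]


def column_normalized_adjacency(n, edges):
    """Return A as a list of rows, with A[i][j] = 1/deg(j) if i and j are linked."""
    neighbors = [[] for _ in range(n)]
    for u, v in edges:
        neighbors[u - 1].append(v - 1)
        neighbors[v - 1].append(u - 1)
    A = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in neighbors[j]:
            A[i][j] = 1.0 / len(neighbors[j])
    return A


def rwr(A, e, c=0.9, tol=1e-12, max_iter=10000):
    """Iterate r <- c A r + (1 - c) e until the largest change is below tol."""
    n = len(e)
    r = list(e)
    for _ in range(max_iter):
        new_r = [c * sum(A[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
                 for i in range(n)]
        if max(abs(a - b) for a, b in zip(new_r, r)) < tol:
            return new_r
        r = new_r
    return r


n = 12
A = column_normalized_adjacency(n, EDGES)
e = [0.0] * n
e[3] = 1.0  # query node 4 (0-based index 3)
r = rwr(A, e, c=0.9)
print([round(x, 2) for x in r])
```

Because A is column-normalized and e sums to 1, every iterate sums to 1, so r can be read as a probability distribution over nodes. The dense matrix is used here only for readability; a large graph would need a sparse representation.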
Diversity ~ reverse of weighted similarity on the personalized graph
r = c A r + (1-c) e = [c A + (1-c) e 1’] r = B r (the second step uses 1’ r = 1, since the entries of r sum to 1)
B: Personalized Graph (a.k.a. ‘Google Matrix’)
B(i,j): How node i and node j are connected in the personalized graph
[Figure: two drawings of the 12-node example graph]
g(S) = w ∑_{i in S} r(i) - ∑_{i,j in S} B(i,j) r(j)
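To make the measure concrete, here is a minimal Python sketch that builds B = c A + (1-c) e 1’ and evaluates g(S) for a few subsets. The 4-node path graph, the function names, and the default w = 2 are assumptions chosen for illustration; none of them come from the slides.

```python
def rwr(A, e, c, iters=500):
    """Power iteration for r = c A r + (1 - c) e."""
    n = len(e)
    r = list(e)
    for _ in range(iters):
        r = [c * sum(A[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
             for i in range(n)]
    return r


def personalized_matrix(A, e, c):
    """B = c A + (1 - c) e 1', i.e. B[i][j] = c A[i][j] + (1 - c) e[i]."""
    n = len(e)
    return [[c * A[i][j] + (1 - c) * e[i] for j in range(n)] for i in range(n)]


def goodness(S, r, B, w=2.0):
    """g(S) = w * sum_{i in S} r(i) - sum_{i,j in S} B(i,j) r(j)."""
    relevance = sum(r[i] for i in S)
    similarity = sum(B[i][j] * r[j] for i in S for j in S)
    return w * relevance - similarity


# Hypothetical 4-node path graph 0-1-2-3, column-normalized by degree.
A = [[0.0, 0.5, 0.0, 0.0],
     [1.0, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 1.0],
     [0.0, 0.0, 0.5, 0.0]]
e = [1.0, 0.0, 0.0, 0.0]  # query node 0
c = 0.9
r = rwr(A, e, c)
B = personalized_matrix(A, e, c)

print(goodness([], r, B))      # empty subset
print(goodness([0, 1], r, B))  # two adjacent nodes
print(goodness([0, 3], r, B))  # two nodes at opposite ends of the path
```

The first term rewards relevant nodes; the second term penalizes pairs in S that are strongly connected in the personalized graph, which is how the measure trades relevance against diversity.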
Properties of g(S): Why is it a Good Measure?
P1: g(S) = 0 for an empty set S
P2: g(S) is sub-modular for any w > 0
P3: g(S) is monotonically non-decreasing for any w >= 2
A greedy algorithm (Dragon) leads to a near-optimal solution, for any w >= 2
– Quality: g(S) >= (1−1/e) g(S*), where S* is the optimal subset maximizing g(S)
– Complexity: O(m) for both time and space
Footnote: Dragon stands for Diversified Ranking on Graph: An Optimization Viewpoint
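The greedy idea can be sketched as follows: start from the empty set and repeatedly add the node with the largest marginal gain in g(S). This is an illustrative Python sketch under stated assumptions, not the authors' Dragon implementation: it uses dense matrices (so it does not achieve the O(m) cost claimed above), the edge list is read off the 12-node example matrix from the RWR slide, and node 4, c = 0.9, w = 2, and k = 3 are chosen only for the demonstration.

```python
from itertools import combinations

# Edges of the 12-node example graph, 1-based (assumption: read off the slide).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10),
         (10, 11), (10, 12), (11, 12)]
N, C, W, K = 12, 0.9, 2.0, 3


def build(n, edges, query, c, iters=500):
    """Return (r, B) for a column-normalized graph and a single query node."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    A = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        A[u - 1][v - 1] = 1.0 / deg[v - 1]
        A[v - 1][u - 1] = 1.0 / deg[u - 1]
    e = [1.0 if i == query else 0.0 for i in range(n)]
    r = list(e)
    for _ in range(iters):  # r <- c A r + (1 - c) e
        r = [c * sum(A[i][j] * r[j] for j in range(n)) + (1 - c) * e[i]
             for i in range(n)]
    B = [[c * A[i][j] + (1 - c) * e[i] for j in range(n)] for i in range(n)]
    return r, B


def goodness(S, r, B, w):
    """g(S) = w * sum_{i in S} r(i) - sum_{i,j in S} B(i,j) r(j)."""
    return w * sum(r[i] for i in S) - sum(B[i][j] * r[j] for i in S for j in S)


def greedy(r, B, k, w):
    """Pick k nodes, each time adding the node with the largest marginal gain."""
    n = len(r)
    # Gain of adding node i to the empty set: (w - B(i,i)) r(i).
    score = [(w - B[i][i]) * r[i] for i in range(n)]
    chosen = []
    for _ in range(k):
        pick = max((i for i in range(n) if i not in chosen),
                   key=lambda i: score[i])
        chosen.append(pick)
        # Adding `pick` lowers each node's gain by its cross terms with `pick`.
        for j in range(n):
            score[j] -= B[j][pick] * r[pick] + B[pick][j] * r[j]
    return chosen


r, B = build(N, EDGES, query=3, c=C)  # query node 4 (0-based index 3)
S = greedy(r, B, K, W)
best_value = max(goodness(T, r, B, W) for T in combinations(range(N), K))
print([i + 1 for i in S], goodness(S, r, B, W), best_value)
```

On a graph this small, exhaustive search over all size-k subsets is feasible, so the last lines compare the greedy value with the true optimum as a sanity check on the (1−1/e) bound.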
Experimental Results
[Figure: four panels titled "An Illustrative Example", "Compare w/ alternative choices", "Quality-Time Balance", and "Scalability"; axis labels include Quality, Budget, and Time, with an "Opt. Quality" reference marked]
Conclusion
Problem 1 (Evaluate/measure a given top-k ranking list)
A1: A weighted sum between relevance and similarity
Problem 2 (Find a near-optimal top-k ranking list)
A2: A greedy algorithm (near-optimal, linear scalability)
Contact: Hanghang Tong ([email protected])
Academic Literature: More Detailed Comparison
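[Table: comparison with prior work [6] and [7], with columns "For Problem 1" and "For Problem 2"; the cell contents for [6] and [7] are not recoverable]
This Disclosure Proposes
(1) For Problem 1: The first measure that combines both relevance & diversity
(2) For Problem 2: The first method that (a) leads to a near-optimal solution with (b) linear complexity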