A Topic Modeling Approach and its Integration into the Random Walk Framework for Academic Search
Jie Tang¹, Ruoming Jin², and Jing Zhang¹
¹Knowledge Engineering Group, Dept. of Computer Science and Technology, Tsinghua University
²Department of Computer Science, Kent State University
Dec. 25th 2008
Motivation
However, the results are still not satisfactory …
"Academic search is treated as document search, but ignores semantics."
Examples – Expertise search
• Search with keyword: modeling using VSM
  [Figure: the query vector compared against binary term vectors of Doc1, Doc3, and Doc4]
  – Returns:
    Principles of Data Mining. DJ Hand - Drug Safety, 2007 - drugsafety.adisonline.com
    Advances in Knowledge Discovery and Data Mining. UM Fayyad, G Piatetsky-Shapiro, P Smyth, R…
    Data Mining: Concepts and Techniques. J Han, M Kamber - 2001…
• Search with semantic modeling: modeling using semantic topics
  – The query "Data mining" is mapped to topics: Data mining (0.4), Association Rules (0.2), Database systems (0.15), Data management (0.1), Web databases (0.05), Information systems (0.02)
  – Returns: experts, expertise conferences, and expertise papers
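The contrast between the two search styles can be sketched as follows, assuming cosine similarity as the VSM matching function and comparing topic distributions with the same measure. The vocabulary, topic weights, and the `cosine` helper are hypothetical and chosen only for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Keyword (VSM) view over a hypothetical vocabulary [data, mining, association, rules]
query_terms = [1, 1, 0, 0]   # query "data mining"
doc_terms = [0, 0, 1, 1]     # a paper about "association rules"

# Topic view over two hypothetical topics [data mining, database systems]
query_topics = [0.8, 0.2]
doc_topics = [0.7, 0.3]

keyword_score = cosine(query_terms, doc_terms)   # no shared terms
topic_score = cosine(query_topics, doc_topics)   # similar topic mixtures
```

Here `keyword_score` is 0.0 because the paper shares no terms with the query, while `topic_score` is about 0.99 because both concentrate on the same topic.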
Challenges
1. How to model the heterogeneous academic network?
2. How to capture the link information for ranking objects in the academic network?
[Figure: the heterogeneous academic network, linking papers, authors, and conferences through cite, write, co-write/co-author, PC member/chair, and publish relations]
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Online System: ArnetMiner.org
Previous Work
• Search with keyword: Language Model [Zhai, 01], VSM, etc.
• Search with semantic topics: LSI [Berry, 95], pLSI [Hofmann, 99], LDA [Blei, 03] [Wei, 06], etc.
• Ranking: PageRank [Page, 99], HITS [Kleinberg, 99], PopRank [Nie, 05], Link Fusion [Xi, 04], AuthorRank [Liu, 05], etc.
• Combining links and contents: A Joint Probabilistic Model [Cohn and Hofmann, 01], Topical PageRank [Nie, 06], etc.
Modeling the Academic Network using ACT Models
[Figure: graphical models of three variants, ACT1, ACT2, and ACT3, each relating authors, topics, words, and conferences]
Author-Conference-Topic Model [Tang et al., 08]
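Generative Story of ACT1 Model
• Generative process, illustrated on the paper "Latent Dirichlet Co-clustering" by Shafiei and Milios:
  – Each author (Shafiei, Milios) has a distribution over topics such as NLP, ML, DM, and IR.
  – Each topic z has a distribution over conferences P(c|z) and a distribution over words P(w|z), for example:
    one topic: P(c|z): ICDM 0.23, KDD 0.19, …; P(w|z): mining 0.23, clustering 0.19, classification 0.17, …
    another topic: P(c|z): ICML 0.23, NIPS 0.19, …; P(w|z): model 0.23, learning 0.19, boost 0.17, …
  – Each word of the paper is generated together with a conference stamp from a sampled topic, e.g., "clustering" with ICDM and "inference" with NIPS.
• Abstract of the paper: "We present a generative model for clustering documents and terms. Our model is a four hierarchical Bayesian model. We present efficient inference techniques based on Markov Chain Monte Carlo. We report results in document modeling, document and terms clustering …"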
ACT Model 1
Generative process:
[Figure: graphical model of ACT1 relating authors, topics, words, and conferences]
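A minimal sketch of an ACT1-style generative process, assuming the per-token scheme: for each word, pick one of the paper's authors uniformly (x from a_d), draw a topic z from that author's distribution θ, then draw the word from Φ_z and a conference stamp from ψ_z. Symmetric Dirichlet priors with hypothetical values for α, β, and μ are sampled through normalized Gamma variates from Python's standard library; all function names are illustrative, not the authors' code.

```python
import random


def sample_dirichlet(rng, concentration, k):
    """Symmetric Dirichlet draw via normalized Gamma variates."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def sample_index(rng, probs):
    """Draw an index i with probability probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def draw_act1_parameters(rng, authors, num_topics, vocab, conferences,
                         alpha=0.5, beta=0.1, mu=0.1):
    """Corpus-level parameters (hyperparameter values are hypothetical):
    theta: author -> topic, phi: topic -> word, psi: topic -> conference."""
    theta = {a: sample_dirichlet(rng, alpha, num_topics) for a in authors}
    phi = [sample_dirichlet(rng, beta, len(vocab)) for _ in range(num_topics)]
    psi = [sample_dirichlet(rng, mu, len(conferences)) for _ in range(num_topics)]
    return theta, phi, psi


def generate_paper(rng, paper_authors, n_words, theta, phi, psi, vocab, conferences):
    """Generate (author x, topic z, word w, conference c) for each token of one paper."""
    tokens = []
    for _ in range(n_words):
        x = rng.choice(paper_authors)                # author of this token, uniform over a_d
        z = sample_index(rng, theta[x])              # topic z ~ theta_x
        w = vocab[sample_index(rng, phi[z])]         # word w ~ phi_z
        c = conferences[sample_index(rng, psi[z])]   # conference stamp c ~ psi_z
        tokens.append((x, z, w, c))
    return tokens
```

Inference in the model runs in the opposite direction (recovering θ, Φ, and ψ from observed papers); this sketch only shows the forward sampling story.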
Integrating Topic Model into Random Walk
Random walk over the academic network + Modeling the academic network with topics = ?
Combination Method 1
[Figure: a heterogeneous graph made of a paper graph Gp (e.g., "Tree CRF...", "EOS...", "Association..."), an author graph Ge (e.g., Prof. Wang, Prof. Tang, Jing Zhang), and a conference graph Gc (e.g., ISWC, IJCAI, WWW), connected by transitions λde, λed, λcd, λdc, and λdd; a topic layer links the query "Data mining" to these objects]
• Stage 1: Random walk, giving a ranking score
• Stage 2: Topic-based relevance, giving a topic-based relevance score
• Combination by multiplication of the two scores
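The two stages can be sketched as follows, under stated assumptions. Stage 1 is assumed to be a PageRank-style walk in which each relation type carries a hypothetical weight keyed by source and target type ("dd", "ed", "cd", ... mirroring λdd, λed, λcd), with a node's score split evenly among its out-neighbors of that type. Stage 2 assumes a topic-based relevance of the form P(q|v) = Π_w Σ_z P(w|z) P(z|v). The final score multiplies the two, as the slide states; function names and weights are illustrative.

```python
from collections import defaultdict


def typed_random_walk(node_type, edges, lam, damping=0.85, iters=100):
    """PageRank-style walk over a heterogeneous graph (sketch).

    node_type: node -> 'd' (paper), 'e' (expert/author) or 'c' (conference)
    edges: directed (u, v) pairs
    lam: hypothetical relation weights keyed by source+target type, e.g. "dd", "ed", "cd"
    """
    nodes = list(node_type)
    n = len(nodes)
    # Group each node's out-neighbors by the type of the target node.
    out = defaultdict(lambda: defaultdict(list))
    for u, v in edges:
        out[u][node_type[v]].append(v)
    r = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        nxt = {v: (1.0 - damping) / n for v in nodes}  # uniform random-jump term
        for u in nodes:
            for t, targets in out[u].items():
                w = lam.get(node_type[u] + t, 0.0)
                share = damping * w * r[u] / len(targets)  # split evenly within one relation type
                for v in targets:
                    nxt[v] += share
        r = nxt
    return r  # unnormalized: sums to 1 only if the weights conserve mass


def topic_relevance(query_words, p_w_given_z, p_z_given_v):
    """Assumed topic-based relevance P(q|v) = prod_w sum_z P(w|z) P(z|v)."""
    score = 1.0
    for w in query_words:
        score *= sum(p_w_given_z[z].get(w, 0.0) * pz for z, pz in enumerate(p_z_given_v))
    return score


def combine_by_multiplication(rw_scores, relevance):
    """Final score = random-walk score x topic-based relevance score."""
    return {v: rw_scores[v] * relevance.get(v, 0.0) for v in rw_scores}
```

Multiplying the scores means an object needs both structural importance and topical relevance to rank high; a relevance of zero removes it from the results regardless of its random-walk score.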
Combination Method 2
Query: ontology alignment
[Figure: the paper graph Gp, author graph Ge, and conference graph Gc (transitions λde, λed, λcd, λdc, λdd) extended with a hidden theme graph Gt (themes such as "pos", "owl", and "Web service"), linked to papers via λtd/λdt and to the query via λqt/λtq]
• Ranking score
• Transition probability
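One way to sketch this idea, assuming the hidden themes become ordinary nodes in the graph: the query links to theme nodes (a weight such as P(z|q), standing in for λqt), themes link to papers (standing in for λtd), and papers link back to themes, and a random walk that restarts at the query node yields the ranking scores. This restart-at-query design, the node names, and the weights are hypothetical illustrations, not the formulas from the slide.

```python
def query_restart_walk(weights, query, damping=0.85, iters=100):
    """Random walk with restart at the query node over a weighted graph.

    weights: u -> {v: nonnegative weight}; each node's outgoing weights are
    normalized into transition probabilities. `query` must appear in `weights`.
    """
    nodes = set(weights) | {v for nbrs in weights.values() for v in nbrs}
    totals = {u: sum(weights.get(u, {}).values()) for u in nodes}
    r = {v: 0.0 for v in nodes}
    r[query] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[query] += 1.0 - damping  # restart mass always returns to the query
        for u in nodes:
            if totals[u] == 0:
                nxt[query] += damping * r[u]  # dangling node: send its mass to the query
                continue
            for v, w in weights[u].items():
                nxt[v] += damping * r[u] * w / totals[u]
        r = nxt
    return r  # sums to 1: restart and dangling mass are both routed to the query


# Hypothetical graph for the query "ontology alignment"
graph = {
    "q": {"t_owl": 0.9, "t_web_service": 0.1},  # query -> themes
    "t_owl": {"paper1": 1.0},                   # theme -> paper
    "t_web_service": {"paper2": 1.0},
    "paper1": {"t_owl": 1.0},                   # paper -> theme
    "paper2": {"t_web_service": 1.0},
}
scores = query_restart_walk(graph, "q")
```

With these weights, `paper1` scores above `paper2` because the query sends most of its mass to the theme that `paper1` belongs to.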
Experimental Setting
• ArnetMiner data (http://arnetminer.org)
  – 14,134 authors, 10,716 papers, 1,434 conferences/journals
  – and the relationships between them
• Evaluation measures
  – pooled relevance + human judgment
  – P@5, P@10, P@20, R-pre, MAP
• Baselines
  – Language Model (LM)
  – LDA
  – Author-Topic (AT)
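The evaluation measures above are standard IR metrics. A minimal sketch of their usual definitions, assuming binary relevance where the relevant set for each query comes from the pooled human judgments; function names are illustrative.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that are relevant (P@k)."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items (R-pre)."""
    return precision_at_k(ranked, relevant, len(relevant)) if relevant else 0.0


def average_precision(ranked, relevant):
    """Sum of precision values at the ranks of relevant hits, divided by |relevant|."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """MAP over (ranked list, relevant set) pairs, one pair per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

For example, with the ranking `["a", "b", "c", "d", "e"]` and relevant set `{"a", "c", "f"}`, P@5 is 0.4, R-pre is 2/3, and average precision is (1 + 2/3) / 3 = 5/9, since the unretrieved item "f" still counts in the denominator.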
Discovered Topics
200 topics have been discovered automatically from the academic network.
Expertise Search Results
Expertise Search Results (cont.)
Online System: ArnetMiner (http://arnetminer.org)
• Publication
• Social Graph
• User Interests and Evolution
• Basic Profile Information
• Experts, Expertise conferences, Expertise papers
Outline
• Previous Work
• Our Approach
  – Ranking with Topic Model and Random Walk
• Experimental Results
• Conclusion & Future Work
Conclusion & Future Work
• Investigate the problem of modeling the heterogeneous academic network using a unified probabilistic model.
• Propose two methods to combine topic models with the random walk framework for academic search.
• Experimental results show that our approach can significantly improve the performance of academic search.
• Our approach is general. Variations of the approach can be applied to many other applications such as social search and blog search.
Thanks!
Q&A & Demo
HP: http://keg.cs.tsinghua.edu.cn/persons/tj/
Online URL: http://arnetminer.org