Brief History of Web Search
•Boolean term matching
•Sergey Brin and Larry Page
•Reputation based ranking
•PageRank
Reputation
•Count links to a page
•Weight links by how many come from a page
•Further weight links by the reputation of the linker
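The three refinements above can be sketched on a toy graph. This is a minimal illustration, not the production algorithm: the four-page `links` dictionary is hypothetical, and the repeated update `r = r @ H` omits the damping and dangling-node fixes introduced later.

```python
import numpy as np

# Hypothetical 4-page web: links[i] lists the pages that page i links to.
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n = len(links)

# Step 1: count the links pointing to each page.
in_counts = np.zeros(n)
for src, targets in links.items():
    for dst in targets:
        in_counts[dst] += 1

# Step 2: a link from a page with k out-links is worth 1/k.
H = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        H[src, dst] = 1.0 / len(targets)
weighted_counts = H.sum(axis=0)

# Step 3: also weight each link by the reputation of the linking page,
# and repeat until the reputations stop changing.
r = np.full(n, 1.0 / n)
for _ in range(100):
    r = r @ H

print(in_counts, weighted_counts, r.round(3))
```

Page 3 has no in-links, so its reputation drains to zero under this undamped update.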
Definitions
•Markov chain
•The conditional probability of each future state depends only on the present state
•Markov matrix
•Transition matrix of a Markov chain
Markov Matrix Properties
•Row-stochastic
•Stationary vector gives long-term probability of each state
•All eigenvalues satisfy |λ| ≤ 1
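A small numerical check of these properties, assuming a hypothetical 3-state transition matrix `P` chosen only for illustration:

```python
import numpy as np

# Hypothetical transition matrix: row i holds P(next state = j | current state = i).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

# Row-stochastic: nonnegative entries, each row sums to 1.
row_sums = P.sum(axis=1)

# No eigenvalue exceeds 1 in modulus, and 1 itself is an eigenvalue.
spectral_radius = np.abs(np.linalg.eigvals(P)).max()

# Stationary vector: a left eigenvector for eigenvalue 1, found here by
# repeatedly applying P to an arbitrary starting distribution.
pi = np.full(3, 1.0 / 3.0)
for _ in range(200):
    pi = pi @ P

print(row_sums, spectral_radius, pi)
```

For this particular chain the iteration settles quickly because the other eigenvalues are well below 1 in modulus; a periodic or reducible chain need not converge this way.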
S may or may not be reducible, so we make one more fix:
The Google Matrix:
Now G is a positive, irreducible, row-stochastic matrix, and the power method will converge,
but we’ve lost sparsity.
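A minimal sketch of this construction, assuming the standard form G = αS + (1 − α)evᵀ with S = H + avᵀ, where H is the row-normalized hyperlink matrix, a flags the dangling nodes, e is the all-ones vector, and v is a personalization vector. The value α = 0.85, the uniform v, and the 4-page matrix `H` are assumptions made for illustration.

```python
import numpy as np

def google_matrix(H, alpha=0.85):
    """Build a dense Google matrix from a row-normalized hyperlink matrix H."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n)                          # assumed uniform personalization
    a = (H.sum(axis=1) == 0).astype(float)           # 1 for dangling nodes (zero rows)
    S = H + np.outer(a, v)                           # dangling-node fix: row-stochastic
    return alpha * S + (1 - alpha) * np.outer(np.ones(n), v)

def power_method(G, tol=1e-12, max_iter=1000):
    """Iterate pi <- pi G until the 1-norm change drops below tol."""
    pi = np.full(G.shape[0], 1.0 / G.shape[0])
    for _ in range(max_iter):
        nxt = pi @ G
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical 4-page web; page 3 is a dangling node.
H = np.array([[0, 0.5, 0.5, 0],
              [0, 0, 1, 0],
              [1/3, 1/3, 0, 1/3],
              [0, 0, 0, 0]])
G = google_matrix(H)
pi = power_method(G)
print(pi, np.count_nonzero(G), "of", G.size, "entries are nonzero")
```

Every entry of G is nonzero even though H is mostly zeros, which is the loss of sparsity noted above. Practical implementations typically apply the rank-one terms implicitly instead of forming G.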
A Linear System Formulation
•Amy Langville and Carl Meyer
•Exploit dangling nodes
•Solve a system instead of iterating
Langville and Meyer Algorithm 1
•Reorder rows and columns so that dangling nodes are lumped at the bottom
•Solve
•Compute
•Normalize
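The four steps can be sketched with dense NumPy arrays. This assumes the steps take the form given in the cited reordering paper: with the reordered H = [[H11, H12], [0, 0]], solve π1ᵀ(I − αH11) = v1ᵀ, compute π2ᵀ = απ1ᵀH12 + v2ᵀ, then divide by the 1-norm. The function name, α = 0.85, the uniform v, and the 5-page matrix are illustrative assumptions.

```python
import numpy as np

def pagerank_dangling_reorder(H, alpha=0.85):
    """Sketch of the dangling-node reordering; H has zero rows for dangling nodes."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n)                       # assumed uniform personalization
    dangling = H.sum(axis=1) == 0

    # Reorder: non-dangling nodes first, dangling nodes lumped at the bottom.
    order = np.concatenate([np.flatnonzero(~dangling), np.flatnonzero(dangling)])
    k = int((~dangling).sum())
    Hp = H[np.ix_(order, order)]
    vp = v[order]
    H11, H12 = Hp[:k, :k], Hp[:k, k:]

    # Solve pi1^T (I - alpha H11) = v1^T (transposed to fit numpy.linalg.solve).
    pi1 = np.linalg.solve((np.eye(k) - alpha * H11).T, vp[:k])
    # Compute pi2^T = alpha pi1^T H12 + v2^T.
    pi2 = alpha * (pi1 @ H12) + vp[k:]
    # Normalize so the entries sum to 1.
    pi_perm = np.concatenate([pi1, pi2])
    pi_perm /= pi_perm.sum()

    # Undo the permutation so pi[i] refers to the original page i.
    pi = np.empty(n)
    pi[order] = pi_perm
    return pi

# Hypothetical 5-page web; pages 1 and 4 are dangling.
H = np.array([[0, 0.5, 0, 0.5, 0],
              [0, 0, 0, 0, 0],
              [1/3, 1/3, 0, 0, 1/3],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]])
pi = pagerank_dangling_reorder(H)
print(pi)
```

The linear solve involves only the k non-dangling pages, so the more dangling nodes a crawl contains, the smaller the system becomes. A real implementation would use sparse storage and a sparse or iterative solver in place of the dense solve shown here.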
Improvement
•In testing, Algorithm 1 reduces the time needed to find the PageRank vector by a factor of 1 to 6
•The speedup is data-dependent
Langville and Meyer Algorithm 2
•Reorder rows and columns so that all submatrices have zero rows at the bottom
•Solve
•For i = 2 to b, compute
•Normalize
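A sketch of the recursive variant, assuming the form given in the cited reordering paper: after reordering into b blocks, solve π1ᵀ(I − αH11) = v1ᵀ, then for i = 2 to b compute πiᵀ = α Σ_{j<i} πjᵀHji + viᵀ, and normalize. For brevity the code records which block each page belongs to without physically permuting H; the function name, α = 0.85, the uniform v, and the 6-page matrix are illustrative assumptions.

```python
import numpy as np

def pagerank_recursive_reorder(H, alpha=0.85):
    """Sketch: repeatedly peel off zero rows, solve on the core, then substitute."""
    n = H.shape[0]
    v = np.full(n, 1.0 / n)                       # assumed uniform personalization
    active = np.ones(n, dtype=bool)
    layers = []

    # Peel layers: pages whose rows are zero within the still-active submatrix.
    while active.any():
        zero_rows = active & (H[:, active].sum(axis=1) == 0)
        if not zero_rows.any():
            break
        layers.append(np.flatnonzero(zero_rows))
        active &= ~zero_rows

    core = np.flatnonzero(active)
    blocks = [core] + layers[::-1]                # block 1 = core, block b = dangling

    x = np.zeros(n)
    if core.size:
        # Solve pi1^T (I - alpha H11) = v1^T on the core block only.
        A = np.eye(core.size) - alpha * H[np.ix_(core, core)]
        x[core] = np.linalg.solve(A.T, v[core])
    # For i = 2..b: pi_i^T = alpha * sum_{j<i} pi_j^T H_ji + v_i^T.
    # Entries of x for later blocks are still zero, so they contribute nothing.
    for blk in blocks[1:]:
        x[blk] = alpha * (x @ H[:, blk]) + v[blk]
    return x / x.sum(), len(blocks)

# Hypothetical 6-page web: 4 and 5 are dangling, 2 links only to 4,
# and 3 links only to 2 and 5.
H = np.array([[0, 0.5, 0.5, 0, 0, 0],
              [0.5, 0, 0, 0.5, 0, 0],
              [0, 0, 0, 0, 1, 0],
              [0, 0, 0.5, 0, 0, 0.5],
              [0, 0, 0, 0, 0, 0],
              [0, 0, 0, 0, 0, 0]])
pi, b = pagerank_recursive_reorder(H)
print(b, pi)
```

Here the solve shrinks to the two core pages, but the peeling loop rescans the active submatrix on every pass, which is the search cost the next slide weighs against the savings.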
Problem with Algorithm 2
•Finding the submatrices of zero rows takes longer than the time saved in the solve step
•Langville and Meyer wait until all submatrices are reordered before solving the primary system
Sources
DeGroot, M. and Schervish, M., Probability and Statistics, 3rd Ed., Addison Wesley, 2002
Langville, A. and Meyer, C., A Reordering for the PageRank Problem, SIAM Journal on Scientific Computing, Vol. 27, No. 6, 2006
Langville, A. and Meyer, C., Deeper Inside PageRank, 2004
Lee, C., Golub, G. and Zenios, S., A Fast Two-Stage Algorithm for Computing PageRank, undated
Rebaza, J., Lecture Notes