arvind-devaraj
Page Ranking
presented by
Arvind, Chintan, Raveendra
Motivation
● The World Wide Web was released in 1991, and within a few years it justified its name.
● In January 2001, the number of hosts stood at 110 million and the number of web sites had reached 30 million.
● A search engine must search through these millions of sites to give the most 'relevant' results to the user.
● This is where the concept of ranking pages becomes useful.
Goals of page ranking
• A page must have a high PageRank if many pages point to it.
• A page must also have a high rank if only some pages point to it but those pages themselves have a high PageRank.
• Pages that are well cited from many places around the web (like http://www.iisc.ernet.in/) are worth looking at.
• Pages that have perhaps only one citation, from something like the Yahoo! homepage, are also generally worth looking at.
Example
• A problem similar to page ranking arises in rating sports teams.
• Consider the ranking of cricket teams. Their performance in a tournament is shown below.
• Can win count alone suffice for rating teams?
• Here we would like to rate A higher than B, since A won against B.
Won (W) / Lost (L)   India   Australia   Pakistan   Kenya   Total wins
India (A)            -       W           W          L       2
Australia (B)        L       -           W          W       2
Pakistan (C)         L       L           -          W       1
Kenya (D)            W       L           L          -       1
Example (...contd.)
• Consider the graph, where an edge is drawn from loser to winner.
• First assign equal weights (w) to every team, and then assign them new weights (w') as follows:
$$w'_a = \sum_{i} \frac{w_i}{k(i)}$$
where the sum runs over every team i that lost against a, and k(i) is the total number of losses of team i.
This is because we want a team to rise further in the ranking for winning against a team that is already highly ranked than for winning against a team that is not highly rated, as shown in the 2nd figure.
• Weights get refined in successive iterations, as shown in the diagrams alongside.
• Continuing like this, we converge to an equilibrium state, as shown in the figure below:
Example (...contd.)
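The refinement rule can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the `lost_to` dictionary is transcribed from the results table, while the function names, the starting weight of 1/4 per team, and the number of rounds are assumptions made for the example.

```python
# For each team, the teams it lost to (taken from the results table).
# An edge runs from each key (loser) to each listed team (winner).
lost_to = {
    "India": ["Kenya"],
    "Australia": ["India"],
    "Pakistan": ["India", "Australia"],
    "Kenya": ["Australia", "Pakistan"],
}


def refine(weights):
    """One round of w'_a = sum of w_i / k(i) over the teams i that lost to a."""
    new_weights = {team: 0.0 for team in weights}
    for loser, winners in lost_to.items():
        share = weights[loser] / len(winners)  # k(loser) = number of losses
        for winner in winners:
            new_weights[winner] += share
    return new_weights


def equilibrium(rounds=200):
    """Start from equal weights and refine repeatedly (round count is an assumption)."""
    weights = {team: 1.0 / len(lost_to) for team in lost_to}
    for _ in range(rounds):
        weights = refine(weights)
    return weights


weights = equilibrium()
for team, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{team}: {w:.3f}")
```

Under these assumptions India ends up rated above Australia even though both have two wins, which is the behaviour the example asks for.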
Eigen Vectors
• Speaking in terms of matrices, we are using a matrix M normalized along the columns:
$$M = \begin{pmatrix} 0 & 1 & \tfrac12 & 0 \\ 0 & 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 0 & \tfrac12 \\ 1 & 0 & 0 & 0 \end{pmatrix}$$
• Then we multiply M by the initial weight vector W to get a new weight vector W', which is again multiplied by M to get W'', and so on until we get a vector $W'_i$ such that
$$W'_{i+1} = M * W'_i = W'_i$$
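As a worked step, assume the rows and columns of M are ordered India, Australia, Pakistan, Kenya, and that the equal starting weights are 1/4 each (the slides do not fix the starting value, so this choice is an assumption). One multiplication then gives:

```latex
W_1 = M W_0 =
\begin{pmatrix}
0 & 1 & \tfrac12 & 0 \\
0 & 0 & \tfrac12 & \tfrac12 \\
0 & 0 & 0 & \tfrac12 \\
1 & 0 & 0 & 0
\end{pmatrix}
\begin{pmatrix} \tfrac14 \\ \tfrac14 \\ \tfrac14 \\ \tfrac14 \end{pmatrix}
=
\begin{pmatrix} \tfrac38 \\ \tfrac14 \\ \tfrac18 \\ \tfrac14 \end{pmatrix}
```

Because every column of M sums to 1, the entries of the new vector still sum to 1, so the total weight is only redistributed between teams.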
Eigen Vectors (...contd.)
• This is nothing but the eigenvalue problem with eigenvalue 1.
• That is, we want to solve the equation M * W = W, i.e. we want to find an eigenvector W with eigenvalue 1.
● We could have used this concept directly to find the required weights for the teams.
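To illustrate the direct route, the sketch below solves M * W = W exactly for the cricket matrix using plain Gauss-Jordan elimination over fractions. The function name `stationary_weights`, the team ordering (India, Australia, Pakistan, Kenya), and the normalization sum(W) = 1 are assumptions for the example. The slides only ask for an eigenvector, which is defined up to scale.

```python
from fractions import Fraction as F

# Column-normalized matrix for the cricket example,
# rows and columns ordered India, Australia, Pakistan, Kenya.
M = [
    [F(0), F(1), F(1, 2), F(0)],
    [F(0), F(0), F(1, 2), F(1, 2)],
    [F(0), F(0), F(0), F(1, 2)],
    [F(1), F(0), F(0), F(0)],
]


def stationary_weights(matrix):
    """Solve matrix * W = W with sum(W) = 1 by exact Gauss-Jordan elimination."""
    n = len(matrix)
    # Augmented rows of (matrix - I) W = 0.
    rows = [
        [F(matrix[r][c]) - (1 if r == c else 0) for c in range(n)] + [F(0)]
        for r in range(n)
    ]
    # The columns sum to 1, so one equation is redundant;
    # replace the last one with the normalization sum(W) = 1.
    rows[-1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[r][n] for r in range(n)]


W = stationary_weights(M)
print(W)
```

Exact fractions are used here only to make the answer easy to check by hand. For large matrices an iterative method is the more practical choice.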
Extending to web pages
• We can use the same concept to find weights for web pages and rank them.
[Figure: link graph over the pages csa_showcase.com, yahooindia.com, linux.org, bogus.com and waste.com, with edge weights of 1/3, 1/2 and 1]
Here, M is the column-normalized link matrix of this graph. Solving the equation M * W = W gives us the following weights:
csa_showcase.com 0.4
linux.org 0.4
yahooindia.com 0.2
bogus.com 0
waste.com 0
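The same iteration carries over to pages. The sketch below is illustrative only: the `links` dictionary is a HYPOTHETICAL link structure, chosen because it matches the edge weights visible in the figure (six of 1/3, two of 1/2, two of 1) and the listed result. It is not necessarily the graph the presenters drew. The function name and round count are also assumptions, and the code assumes every page has at least one outgoing link.

```python
# HYPOTHETICAL link structure (page -> pages it links to); an assumption,
# not recovered from the original figure.
links = {
    "csa_showcase.com": ["linux.org"],
    "linux.org": ["csa_showcase.com", "yahooindia.com"],
    "yahooindia.com": ["csa_showcase.com"],
    "bogus.com": ["csa_showcase.com", "linux.org", "yahooindia.com"],
    "waste.com": ["csa_showcase.com", "linux.org", "yahooindia.com"],
}


def page_weights(links, rounds=200):
    """Repeatedly apply W <- M * W, where each page splits its weight
    equally among the pages it links to (requires no dangling pages)."""
    pages = list(links)
    weights = {page: 1.0 / len(pages) for page in pages}
    for _ in range(rounds):
        new_weights = {page: 0.0 for page in pages}
        for page, targets in links.items():
            share = weights[page] / len(targets)
            for target in targets:
                new_weights[target] += share
        weights = new_weights
    return weights


ranks = page_weights(links)
for page, w in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {w:.2f}")
```

In this assumed graph nothing links to bogus.com or waste.com, so their weight drains away after the first round, matching the zeros in the list above.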
Add-ons
• PageRank computation can be viewed as finding the stationary distribution of a Markov chain.
• The eigenvalue computation W = M * W can be viewed as finding a fixed point. This is similar to the equation X = F(X), whose solution can be found iteratively, for example by fixed-point iteration X_{n+1} = F(X_n), or by applying the Newton-Raphson method to F(X) - X = 0.
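The Markov chain view can be checked with a small simulation on the cricket example: a walker standing on a team moves to a uniformly chosen team that beat it, and the fraction of time spent on each team should approach the eigenvector weights. The function name, starting team, step count and random seed below are all assumptions made for this sketch.

```python
import random

# For each team, the teams it lost to (from the cricket results table).
lost_to = {
    "India": ["Kenya"],
    "Australia": ["India"],
    "Pakistan": ["India", "Australia"],
    "Kenya": ["Australia", "Pakistan"],
}


def visit_frequencies(steps=200_000, seed=0):
    """Random walk along loser -> winner edges; returns the share of visits per team."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    visits = {team: 0 for team in lost_to}
    current = "India"  # arbitrary starting team
    for _ in range(steps):
        current = rng.choice(lost_to[current])
        visits[current] += 1
    return {team: count / steps for team, count in visits.items()}


freq = visit_frequencies()
for team, share in freq.items():
    print(f"{team}: {share:.3f}")
```

The simulation converges much more slowly than multiplying by M, so it is useful mainly as a check on the interpretation rather than as a way to compute ranks.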
Reference
• Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd. “The PageRank Citation Ranking: Bringing Order to the Web.” Technical Report, Stanford University, 1998.