Google PageRank Algorithm By: Danny Lin

Table of Contents

Google Search

History / What is Page Rank?

Page Rank Algorithm

Inbound/Outbound Links

Dangling Nodes

Constraints

Calculating your page rank

How to maximize your page rank score

Loopholes

Neat stuff

Google Search

Google search using PageRank:

1) Crawl the web and locate all publicly accessible webpages

2) Index the data from step 1 to allow for efficient searches for keywords or phrases

3) Rate the importance of each page in the database – using PageRank

4) Return results in descending order of importance with respect to the search query

Google’s Original Architectural Design

Source: http://infolab.stanford.edu/~backrub/over.gif

History

• Page Rank was conceptualized by Sergey Brin and Lawrence Page and described in their 1998 paper, "The Anatomy of a Large-Scale Hypertextual Web Search Engine" (http://infolab.stanford.edu/~backrub/google.html)

• Used to rank the importance of web pages

Source: https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/PageRank-hi-res.png/1280px-PageRank-hi-res.png

Page Rank Algorithm

PR(A) = (1-d) + d(PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

T1, …, Tn - The pages that link to page A.

PR(Tn) - The importance of page Tn.

C(Tn) - The number of outgoing links for page Tn.

PR(Tn)/C(Tn) - The calculated importance passed to page A from page Tn.

d - Damping factor (usually set to 0.85).
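To make the formula concrete, here is a minimal Python sketch that applies it repeatedly to a made-up three-page web. The `toy_links` table, the function name `pagerank_formula`, the starting value of 1.0 for every page, and the fixed count of 50 passes are all assumptions for illustration, not part of the slides. The sketch also assumes every page has at least one outgoing link, so C(T) is never zero.

```python
def pagerank_formula(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to every page.

    `links` maps each page to the list of pages it links to.
    """
    pr = {page: 1.0 for page in links}  # assumed starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to `page`.
            incoming = sum(
                pr[t] / len(links[t]) for t in links if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical web: A links to B and C, B links to C, C links to A.
toy_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank_formula(toy_links)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```

In this toy web, C collects importance from both A and B, so it ends up ranked highest, while B, which is linked only from A and receives half of A's importance, ends up lowest.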

Inbound/Outbound Links

With respect to page A:

Inbound links – links that point towards page A

Outbound links – links within page A pointing towards other pages
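As a small illustration of the two definitions, the Python sketch below derives each page's inbound links from a table of outbound links. The three-page `outbound` table and the function name `inbound_links` are made up for this example and do not come from the slides.

```python
def inbound_links(outbound):
    """Invert an outbound-link table: for each page, list the pages linking to it."""
    inbound = {page: [] for page in outbound}
    for source, targets in outbound.items():
        for target in targets:
            # `source` has an outbound link to `target`,
            # so `target` gains an inbound link from `source`.
            inbound.setdefault(target, []).append(source)
    return inbound


# Hypothetical web: A links to B and C, B links to C, C links to A.
outbound = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
inbound = inbound_links(outbound)
print("Inbound links of A:", inbound["A"])
print("Outbound links of A:", outbound["A"])
```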

Dangling Nodes

A dangling node is a page that does not have any outbound links.

Issue: Dangling nodes act as sinks that drain importance from the web.

Solution: Assume that the dangling node has a link to every other page, so the next page is selected uniformly at random. This makes the link matrix stochastic: all entries are nonnegative and the sum of each column is equal to 1.

Source: http://www.webworkshop.net/images/pr1.gif
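The dangling-node fix can be sketched in Python as follows. The sketch follows the wording above literally and spreads a dangling page's weight over every other page, which assumes at least two pages; other treatments spread it over all pages, including the page itself. The names `build_stochastic_matrix`, `pages`, and `links`, and the toy data, are hypothetical.

```python
def build_stochastic_matrix(pages, links):
    """Return S, where S[i][j] is the probability of moving from page j to page i."""
    n = len(pages)
    index = {page: i for i, page in enumerate(pages)}
    S = [[0.0] * n for _ in range(n)]
    for source in pages:
        j = index[source]
        targets = links.get(source, [])
        if not targets:
            # Dangling node: pretend it links to every other page.
            targets = [page for page in pages if page != source]
        for target in targets:
            # Each outgoing link carries an equal share of column j.
            S[index[target]][j] += 1.0 / len(targets)
    return S


# Hypothetical web in which page C is a dangling node.
pages = ["A", "B", "C"]
links = {"A": ["B", "C"], "B": ["C"], "C": []}
S = build_stochastic_matrix(pages, links)
column_sums = [sum(S[i][j] for i in range(len(pages))) for j in range(len(pages))]
print(column_sums)  # every column should sum to 1
```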

Constraints

The link matrix S must be primitive, i.e. for some n, S^n has all positive entries. In that case λ1 = 1 and |λ2| < 1.

S must be stochastic, i.e. all entries are nonnegative and the sum of each column is equal to 1.

S must be irreducible, i.e. no permutation applied to both its rows and columns should bring it into block upper-triangular form. Equivalently, the link graph must be strongly connected.
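These constraints can be checked numerically on small matrices. The Python sketch below tests the stochastic condition directly and tests primitivity by raising the zero/nonzero pattern of the matrix to successive powers. It relies on Wielandt's bound, which says that a primitive n-by-n matrix already has all positive entries at the power (n-1)^2 + 1. Irreducibility is not tested separately because a primitive matrix is always irreducible. The function names and both example matrices are assumptions for illustration.

```python
def is_column_stochastic(S, tol=1e-9):
    """All entries nonnegative and every column sums to 1."""
    n = len(S)
    nonnegative = all(x >= 0 for row in S for x in row)
    sums_ok = all(
        abs(sum(S[i][j] for i in range(n)) - 1.0) <= tol for j in range(n)
    )
    return nonnegative and sums_ok


def is_primitive(S):
    """True if some power of S has all positive entries."""
    n = len(S)
    pattern = [[x > 0 for x in row] for row in S]  # only zero/nonzero matters
    power = pattern
    # Check powers 1 .. (n-1)^2 + 1 (Wielandt's bound).
    for _ in range((n - 1) ** 2 + 1):
        if all(all(row) for row in power):
            return True
        # Boolean matrix product: power = power * pattern.
        power = [
            [any(power[i][k] and pattern[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return False


# Hypothetical web: A links to B and C, B links to C, C links to A.
link_matrix = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 1.0, 0.0]]
# Two pages that only link to each other: stochastic but periodic.
two_cycle = [[0.0, 1.0], [1.0, 0.0]]

print(is_column_stochastic(link_matrix), is_primitive(link_matrix))
print(is_column_stochastic(two_cycle), is_primitive(two_cycle))
```

The two-page cycle shows why the stochastic condition alone is not enough: its powers alternate between two patterns and never become all positive, so repeated multiplication can oscillate rather than settle.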

Calculating your page rank

“Page Rank can be calculated using a simple iterative algorithm and corresponds to the principal eigenvector of the normalized link matrix (probability distribution) of the web”

Algorithm to calculate the normalized probability distribution:

1) Multiply the stochastic matrix, S, by a random starting vector, i_1, to get a new vector, i_2 = S i_1

2) Repeat step 1 until i_(n-1) = i_n (approx.). The vector the iteration settles on is the principal eigenvector of S.
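The two steps above can be sketched in plain Python as follows. This is a minimal illustration, not a production implementation: the function name `power_iteration`, the tolerance, and the example matrix are assumptions, and the starting vector is uniform rather than random so that the run is reproducible.

```python
def power_iteration(S, tol=1e-10, max_iter=1000):
    """Multiply by S until the vector stops changing (within `tol`)."""
    n = len(S)
    rank = [1.0 / n] * n  # assumed uniform start instead of a random one
    for _ in range(max_iter):
        # Step 1: new vector = S times the current vector.
        new_rank = [sum(S[i][j] * rank[j] for j in range(n)) for i in range(n)]
        # Step 2: stop once consecutive vectors are approximately equal.
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


# Hypothetical column-stochastic matrix for a web where
# A links to B and C, B links to C, and C links to A.
S = [[0.0, 0.0, 1.0], [0.5, 0.0, 0.0], [0.5, 1.0, 0.0]]
rank = power_iteration(S)
print([round(x, 4) for x in rank])  # approximately [0.4, 0.2, 0.4]
```

For this matrix the result can be checked by hand: the vector (0.4, 0.2, 0.4) sums to 1 and is left unchanged when multiplied by S.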

LINEAR ALGEBRA TIME!!! Page Rank calculation time!

How to maximize your page rank score

Internal linking – having links to other pages within your website

• Hierarchical

• Fully meshed

Good and plentiful content, e.g. a news website

Provide a useful service or product, e.g. phpBB, an online bulletin board system

Loopholes

SEO (Search Engine Optimization): optimizing webpages to increase traffic flow, which leads to conversions and revenue ($$)

An issue that arose from this: the selling of links from high-PR pages

Source: http://www.bloggingcage.com/wp-content/uploads/2015/07/pr8links.png

Neat stuff

Overview of a Google search (1-2 minutes):

http://www.google.com/insidesearch/howsearchworks/thestory/index.html

How search has evolved (6 minutes):

https://www.youtube.com/watch?v=mTBShTwCnD4

Changes to Google’s search algorithm:

https://moz.com/google-algorithm-change

References

Content

http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html

http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

http://www.ams.org/samplings/feature-column/fcarc-pagerank

http://infolab.stanford.edu/~backrub/google.html

http://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf

http://www.google.com/insidesearch/howsearchworks/thestory/index.html

Images

https://lh4.googleusercontent.com/-vAlbgOEKiNI/TtkBZvZLnDI/AAAAAAAAMrw/ooZ1Thuutmw/w1034-h587-no/OriginalGooglePage.PNG

http://infolab.stanford.edu/~backrub/over.gif

https://upload.wikimedia.org/wikipedia/commons/thumb/6/69/PageRank-hi-res.png/1280px-PageRank-hi-res.png

http://www.webworkshop.net/images/pr1.gif

http://www.bloggingcage.com/wp-content/uploads/2015/07/pr8links.png

Questions?

Source: https://lh4.googleusercontent.com/-vAlbgOEKiNI/TtkBZvZLnDI/AAAAAAAAMrw/ooZ1Thuutmw/w1034-h587-no/OriginalGooglePage.PNG