Divyansh Verma, SAU/AM(M)/2014/14
South Asian University
LINEAR ALGEBRA BEHIND GOOGLE SEARCH
Contents
• Search Engine: Google
• Magic Behind Google's Success
• PageRank Algorithm
• PageRank – How It Works
• Importance of Linear Algebra in the PageRank Algorithm
• References
Search Engine: Google
What is a search engine?
A web search engine is a software system designed to search for information on the World Wide Web, e.g. Google, Bing, Yahoo, Ask.
Why Google?
• It is the most popular search engine.
• It is simple, fast, and precise.
• It adapts to the growing Internet.
Magic Behind Google's Success
When Google went online in the late 1990s, one thing that set it apart from other search engines was that its search result listings consistently delivered "good stuff".
Search engines like Google have to do three basic things:
1. Crawl the web and locate all publicly accessible web pages.
2. Index the crawled data so that it can be searched efficiently.
3. Rate the importance of each page in the database, so that when a user searches, the more important pages are presented first.
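The slides do not say how Google indexes pages. As a hedged illustration of tasks 2 and 3, here is a minimal sketch that uses an inverted index, a standard indexing structure assumed here for illustration. The page names, page texts, and importance scores are all hypothetical. In Google's case, the scores would come from PageRank.

```python
from collections import defaultdict

# Hypothetical toy "crawl" result: page name -> page text (illustrative only).
pages = {
    "a.html": "linear algebra behind google search",
    "b.html": "google pagerank eigenvector",
    "c.html": "linear algebra eigenvector basics",
}

def build_index(docs):
    """Task 2 (sketch): map each word to the set of pages that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, query, importance):
    """Task 3 (sketch): pages containing every query word, most important first."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits, key=lambda p: importance[p], reverse=True)

index = build_index(pages)
# Hypothetical importance scores standing in for PageRank values.
importance = {"a.html": 0.2, "b.html": 0.5, "c.html": 0.3}
print(search(index, "eigenvector", importance))
```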
A big part of the magic behind Google's success is its PageRank algorithm.
PageRank Algorithm
The PageRank algorithm was developed by Google's founders, Larry Page and Sergey Brin, when they were graduate students at Stanford University.
PageRank is a link analysis algorithm that ranks the relative importance of all web pages within a network.
Three features for determining PageRank:
• Outgoing links: the number of links found on a page.
• Incoming links: the number of times other pages have cited (linked to) this page.
• Rank: a value representing the page's relative importance in the network.
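Here is a minimal sketch of the first two features, which counts outgoing and incoming links from a list of directed links. The link list is the five-page example graph that this deck later encodes in the matrix S. The helper name `link_counts` is hypothetical.

```python
# Directed links (source page -> target page) of the five-page example graph.
links = [(1, 2), (1, 4), (2, 3), (2, 4), (2, 5), (3, 4), (4, 5), (5, 1)]

def link_counts(edges):
    """Return (outgoing, incoming) link counts per page."""
    outgoing, incoming = {}, {}
    for src, dst in edges:
        outgoing[src] = outgoing.get(src, 0) + 1
        incoming[dst] = incoming.get(dst, 0) + 1
    return outgoing, incoming

out_links, in_links = link_counts(links)
for page in sorted(set(out_links) | set(in_links)):
    print(page, "out:", out_links.get(page, 0), "in:", in_links.get(page, 0))
```

Raw counts are not the rank itself. PageRank also weights each incoming link by the importance of the page it comes from, and it divides that page's vote among its outgoing links.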
PageRank – How It Works
Mathematical Model of the Internet
1. Represent the Internet as a graph.
2. Represent the graph as a stochastic matrix.
3. Make the stochastic matrix more convenient ⇒ Google matrix.
4. Find the dominant eigenvector of the Google matrix ⇒ PageRank.
Internet as a Graph
A link from one web page to another web page is a directed edge.
Web graph: web pages = nodes, links = edges.
Web graph as a Matrix
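Links = nonzero elements in the matrix.
Every page $i$ has $\ell_i \ge 1$ outlinks, and
$$S_{ij} = \begin{cases} 1/\ell_i & \text{if page } i \text{ has a link to page } j,\\ 0 & \text{otherwise.} \end{cases}$$
$S$ is a sparse matrix, since most of its entries are zero. $S_{ij}$ is the probability that a random surfer moves from page $i$ to page $j$. Because every row sums to 1, $S$ is a (row-)stochastic matrix.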
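(Figure: example web graph with pages 1, 2, 3, 4, 5.)
$$S = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 & 0\\ 0 & 0 & 1/3 & 1/3 & 1/3\\ 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 1\\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$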
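The definition of $S_{ij}$ can be checked mechanically. This sketch uses a hypothetical helper `link_matrix` and exact arithmetic from Python's `fractions` module. It rebuilds $S$ from the outlinks of the example graph and assumes, as the slides do, that every page has at least one outlink.

```python
from fractions import Fraction

# Outlinks of the example graph: page -> pages it links to (pages numbered 1..5).
outlinks = {1: [2, 4], 2: [3, 4, 5], 3: [4], 4: [5], 5: [1]}

def link_matrix(out, n):
    """Row-stochastic S with S[i][j] = 1/l_i if page i links to page j, else 0."""
    S = [[Fraction(0)] * n for _ in range(n)]
    for i, targets in out.items():
        l_i = len(targets)  # assumes l_i >= 1 (no dangling pages)
        for j in targets:
            S[i - 1][j - 1] = Fraction(1, l_i)
    return S

S = link_matrix(outlinks, 5)
for row in S:
    print([str(x) for x in row], "row sum =", sum(row))
```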
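Google Matrix
A convex combination of two stochastic matrices gives the Google matrix $G$, which is again stochastic. When $v > 0$ (e.g. $v_i = 1/n$), all entries of $G$ are positive, so $G$ is irreducible and more convenient to work with:
$$G = \underbrace{\alpha S}_{\text{links}} + \underbrace{(1-\alpha)\,\mathbf{1}v^T}_{\text{teleportation}}$$
Here $0 \le \alpha < 1$ is the damping factor, and $\mathbf{1}$ is the column vector whose entries are all 1, so every row of $\mathbf{1}v^T$ equals $v^T$. The teleportation vector is $v \ge 0$, whose entries sum to 1; $v_i$ is the probability of jumping to web page $i$.
The eigenvalues of $G$ are $1, \alpha\lambda_2(S), \alpha\lambda_3(S), \dots$, where $1 = \lambda_1(S), \lambda_2(S), \dots$ are the eigenvalues of $S$ ordered by modulus. Since $|\lambda_i(S)| \le 1$, we have $1 > \alpha|\lambda_2(S)| \ge \alpha|\lambda_3(S)| \ge \cdots$.
Hence $G$ has a unique dominant left eigenvector: $\pi^T G = \pi^T$, $\pi \ge 0$, $\pi^T\mathbf{1} = 1$.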
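Here is a sketch of the Google matrix for the same example. It assumes $\alpha = 0.85$, a value commonly cited from Brin and Page's original paper, and a uniform teleportation vector with every entry equal to 1/5. Neither value comes from these slides, and the helper name `google_matrix` is hypothetical.

```python
from fractions import Fraction

# Example graph from the slides; alpha and v below are illustrative assumptions.
outlinks = {1: [2, 4], 2: [3, 4, 5], 3: [4], 4: [5], 5: [1]}
n = 5
alpha = Fraction(85, 100)
v = [Fraction(1, n)] * n  # uniform teleportation vector, entries sum to 1

def google_matrix(out, n, alpha, v):
    """G = alpha*S + (1 - alpha)*1*v^T; each row of the teleportation part is v^T."""
    G = [[(1 - alpha) * v[j] for j in range(n)] for _ in range(n)]
    for i, targets in out.items():
        for j in targets:
            G[i - 1][j - 1] += alpha * Fraction(1, len(targets))
    return G

G = google_matrix(outlinks, n, alpha, v)
print("all entries positive:", all(x > 0 for row in G for x in row))
print("rows sum to 1:", all(sum(row) == 1 for row in G))
```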
PageRank
The dominant left eigenvector $\pi^T$ of $G$ gives the PageRank of every web page:
$$\pi^T G = \pi^T, \qquad \pi \ge 0,$$
where $\pi_i$ is the PageRank corresponding to web page $i$.
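A common way to compute this eigenvector is the power method. This sketch assumes $\alpha = 0.85$, a uniform $v$, and a tolerance of 1e-12, all illustrative choices. The function name `pagerank` is hypothetical.

The sketch never forms $G$ explicitly. Because the entries of $\pi$ sum to 1, $\pi^T G = \alpha\,\pi^T S + (1-\alpha)\,v^T$, so each step only touches the links. By the eigenvalue bound for $G$, the error shrinks roughly like $\alpha^k$ after $k$ steps.

```python
# Power method for the dominant left eigenvector pi^T G = pi^T (a sketch).
outlinks = {1: [2, 4], 2: [3, 4, 5], 3: [4], 4: [5], 5: [1]}
n, alpha = 5, 0.85          # alpha is an assumed damping factor
v = [1.0 / n] * n           # assumed uniform teleportation vector

def pagerank(out, n, alpha, v, tol=1e-12, max_iter=1000):
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # pi^T G = alpha * pi^T S + (1 - alpha) * v^T, valid because sum(pi) == 1.
        new = [(1 - alpha) * v[j] for j in range(n)]
        for i, targets in out.items():
            share = alpha * pi[i - 1] / len(targets)  # page i splits its vote evenly
            for j in targets:
                new[j - 1] += share
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi

pi = pagerank(outlinks, n, alpha, v)
for page, score in enumerate(pi, start=1):
    print(f"page {page}: {score:.4f}")
```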
How Google Ranks Web Pages
• Model: Internet → web graph → stochastic matrix $G$
• Computation: dominant eigenvector of $G$ ⇒ PageRank $\pi_i$
• Display: if $\pi_i > \pi_k$, then page $i$ may* be displayed before page $k$
*depending on hypertext analysis
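For a small graph, $\pi$ can also be obtained exactly from the linear system $\pi^T(I - \alpha S) = (1-\alpha)\,v^T$. This system is equivalent to $\pi^T G = \pi^T$ when the entries of $\pi$ sum to 1.

The sketch below uses a hypothetical helper `solve`, which performs plain Gauss–Jordan elimination over fractions, and assumes $\alpha = 85/100$ and a uniform $v$. It solves the system and then lists the pages in decreasing PageRank, the order the Display step would use. It ignores the hypertext analysis mentioned above.

```python
from fractions import Fraction

# Exact PageRank from pi^T (I - alpha S) = (1 - alpha) v^T; alpha and v are assumptions.
outlinks = {1: [2, 4], 2: [3, 4, 5], 3: [4], 4: [5], 5: [1]}
n, alpha = 5, Fraction(85, 100)
v = [Fraction(1, n)] * n

def solve(A, b):
    """Gauss-Jordan elimination with exact Fractions; A is assumed nonsingular."""
    m = len(A)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(m):
        p = next(r for r in range(c, m) if M[r][c] != 0)  # pick a pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(m):
            if r != c and M[r][c] != 0:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[r][m] / M[r][r] for r in range(m)]

# Transposed system (I - alpha S)^T pi = (1 - alpha) v, i.e. A[j][i] = delta_ij - alpha*S_ij.
A = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
for i, targets in outlinks.items():
    for j in targets:
        A[j - 1][i - 1] -= alpha / len(targets)
pi = solve(A, [(1 - alpha) * x for x in v])

# Display step (sketch): pages from highest to lowest PageRank.
ranking = sorted(range(1, n + 1), key=lambda page: pi[page - 1], reverse=True)
print("display order:", ranking)
```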
Importance of Linear Algebra
Using techniques of linear algebra, one can compute a unique solution to the PageRank problem.
This solution expresses the importance of every web page as that page's entry in the PageRank eigenvector.
Linear algebra supplies both the guarantee that this solution exists and is unique (Perron–Frobenius theory) and a practical way to compute it (the power method). This is why it sits at the core of PageRank.
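The claim that linear algebra yields a unique solution can be made concrete with a short derivation. It assumes $0 \le \alpha < 1$, a teleportation vector $v \ge 0$ whose entries sum to 1, and $\pi$ normalized so that $\pi^T\mathbf{1} = 1$:

```latex
\begin{aligned}
\pi^T G &= \alpha\,\pi^T S + (1-\alpha)\,(\pi^T \mathbf{1})\,v^T
         = \alpha\,\pi^T S + (1-\alpha)\,v^T
         && \text{(using } \pi^T \mathbf{1} = 1\text{)},\\
\pi^T G = \pi^T
  &\iff \pi^T (I - \alpha S) = (1-\alpha)\,v^T .
\end{aligned}
```

Because $S$ is stochastic, every eigenvalue of $\alpha S$ has modulus at most $\alpha < 1$. So $I - \alpha S$ is nonsingular, and $\pi^T = (1-\alpha)\,v^T (I - \alpha S)^{-1}$ is unique. Expanding $(I - \alpha S)^{-1} = \sum_{k \ge 0} \alpha^k S^k$, a sum of nonnegative terms, shows that $\pi \ge 0$.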
References
https://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf
http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html
http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.html
http://blog.kleinproject.org/?p=280
THANK YOU