Divyansh Verma SAU/AM(M)/2014/14 South Asian University Email : [email protected] LINEAR ALGEBRA BEHIND GOOGLE SEARCH


Page 2: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

Contents
• Search Engine : Google
• Magic Behind Google Success
• PageRank Algorithm
• PageRank - How it works ?
• Importance of Linear Algebra in Page Ranking Algorithm
• References

Page 3: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

Search Engine : Google
What is a search engine?
A web search engine is a software system designed to search for information on the World Wide Web, e.g. Google, Bing, Yahoo, Ask.

Why Google?
• It is the most popular search engine.
• It is very simple, fast and precise.
• It adapts to the growing internet.

Page 4: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

Magic Behind Google Success
When Google went online in the late 1990s, one thing that set it apart from other search engines was that its search result listings always delivered the "good stuff".

Search engines like Google have to do three basic things:
1. Crawl the web and locate all web pages with public access.
2. Index the collected data so that it can be searched efficiently.
3. Rate the importance of each page in the database, so that when the user does a search, the more important pages are presented first.

A big part of the magic behind Google's success is its PageRank algorithm.

Page 5: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

PageRank Algorithm
The PageRank algorithm was developed by Google's founders, Larry Page and Sergey Brin, when they were graduate students at Stanford University.

PageRank is a link analysis algorithm that ranks the relative importance of all web pages within a network.

Three features for determining PageRank:
• Outgoing Links - the number of links found in a page
• Incoming Links - the number of times other pages have cited this page
• Rank - a value representing the page's relative importance in the network

Page 6: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

PageRank – How it Works ?

Mathematical Model of Internet
1. Represent Internet as Graph
2. Represent Graph as Stochastic Matrix
3. Make stochastic matrix more convenient ⇒ Google Matrix
4. Find Dominant eigenvector of Google Matrix ⇒ PageRank

Internet as a Graph
A link from one web page to another web page is a directed edge of the graph.

Web graph : Web pages = nodes, Links = edges
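A minimal sketch of this graph model, assuming a hypothetical five-page web stored as an adjacency list. The name `web_graph`, the helper functions and the links themselves are illustrative assumptions, not data from a real crawl:

```python
# Hypothetical five-page web: each key is a page (node),
# each value lists the pages it links to (directed edges).
web_graph = {
    1: [2, 4],
    2: [3, 4, 5],
    3: [4],
    4: [5],
    5: [1],
}


def out_degree(graph, page):
    """Number of outgoing links found on a page."""
    return len(graph[page])


def in_degree(graph, page):
    """Number of pages that link to the given page."""
    return sum(page in targets for targets in graph.values())


# Print page, outgoing link count, incoming link count.
for page in sorted(web_graph):
    print(page, out_degree(web_graph, page), in_degree(web_graph, page))
```

An adjacency list keeps only the links that exist, which suits a web graph where each page links to a tiny fraction of all pages.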

Page 7: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

PageRank – How it Works ?

Web graph as a Matrix

Links = nonzero elements in matrix

Every page $i$ has $l_i \geq 1$ outlinks.

$$S_{ij} = \begin{cases} 1/l_i & \text{if page } i \text{ has a link to page } j \\ 0 & \text{otherwise} \end{cases}$$

S is a sparse matrix, as most of its entries are zero. The entry $S_{ij}$ is the probability that a surfer moves from page $i$ to page $j$.

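[Figure: web graph with five nodes, numbered 1 to 5]

$$S = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 1/3 & 1/3 & 1/3 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 & 0 \end{pmatrix}$$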
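The construction of S can be sketched as follows, assuming the five-page example is given as a dictionary of outgoing links and that every page has at least one outlink with no repeated targets. The names `links` and `link_matrix` are hypothetical; exact fractions are used so the entries read like the matrix above:

```python
from fractions import Fraction

# Outgoing links of the five-page example (page -> pages it links to).
links = {1: [2, 4], 2: [3, 4, 5], 3: [4], 4: [5], 5: [1]}


def link_matrix(links):
    """Build the row-stochastic matrix S with S[i][j] = 1/l_i for each link i -> j.

    Assumes every page has at least one outlink and lists each target once.
    """
    pages = sorted(links)
    index = {page: k for k, page in enumerate(pages)}
    n = len(pages)
    S = [[Fraction(0)] * n for _ in range(n)]
    for page, targets in links.items():
        for target in targets:
            S[index[page]][index[target]] = Fraction(1, len(targets))
    return S


S = link_matrix(links)
for row in S:
    print([str(entry) for entry in row])
```

Each row sums to 1, which is what makes S a stochastic matrix: a surfer on page i has to go somewhere.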

Page 8: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

PageRank – How it Works ?

Google Matrix
A convex combination of two stochastic matrices gives the Google matrix, which is itself stochastic and more convenient to work with. When α < 1 and every entry of v is positive, it is also irreducible.

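$$G = \underbrace{\alpha S}_{\text{links}} + \underbrace{(1-\alpha)\,\mathbf{1} v^T}_{\text{teleportation}}$$

where $0 \leq \alpha < 1$ is the damping factor, $\mathbf{1}$ is the column vector whose entries are all 1, and $v^T$ is a probability row vector that models teleportation, with $v_i$ the probability of jumping to webpage $i$.

Eigenvalues of $G$: $1 > \alpha|\lambda_2(S)| \geq \alpha|\lambda_3(S)| \geq \dots$

Unique dominant left eigenvector: $\pi^T G = \pi^T$, $\pi \geq 0$, $\|\pi\|_1 = 1$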
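A minimal sketch of how G could be assembled for the five-page example. The damping factor 0.85 and the uniform teleportation vector are assumptions chosen for illustration, and `google_matrix` is a hypothetical helper name:

```python
# Link matrix S of the five-page example, written with floats.
S = [
    [0, 1/2, 0, 1/2, 0],
    [0, 0, 1/3, 1/3, 1/3],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]


def google_matrix(S, alpha, v):
    """Return G = alpha * S + (1 - alpha) * 1 v^T as a list of rows.

    The product 1 v^T is the matrix whose every row equals v,
    so entry (i, j) of the teleportation term is (1 - alpha) * v[j].
    """
    n = len(S)
    return [
        [alpha * S[i][j] + (1 - alpha) * v[j] for j in range(n)]
        for i in range(n)
    ]


alpha = 0.85          # assumed damping factor
v = [1 / 5] * 5       # assumed uniform teleportation vector
G = google_matrix(S, alpha, v)

for row in G:
    print([round(entry, 4) for entry in row])
```

With a strictly positive v and α below 1, every entry of G is positive, so the surfer can reach any page from any other page in one step.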

Page 9: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

PageRank – How it Works ?

PageRank
The dominant left eigenvector $\pi^T$ of $G$ gives the PageRank of every webpage:

$$\pi^T G = \pi^T, \quad \pi \geq 0$$

$\pi_i$ is the PageRank of webpage $i$.

How Google Ranks Web Pages
• Model : Internet → Web Graph → Stochastic Matrix G
• Computation : Dominant eigenvector of G for PageRank $\pi_i$
• Display : if $\pi_i > \pi_k$, then page $i$ may* be displayed before page $k$

*depending on hypertext analysis
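The computation step can be sketched with power iteration, one common way to approximate a dominant eigenvector. The slides do not say which method Google uses, so this is an illustrative choice. The function name `pagerank`, the damping factor 0.85, the uniform teleportation vector and the tolerance are all assumptions:

```python
# Link matrix S of the five-page example.
S = [
    [0, 1/2, 0, 1/2, 0],
    [0, 0, 1/3, 1/3, 1/3],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0],
]


def pagerank(S, alpha=0.85, v=None, tol=1e-12, max_iter=1000):
    """Approximate pi with pi^T G = pi^T by repeatedly applying G.

    Because the entries of pi sum to 1, pi^T G simplifies to
    alpha * pi^T S + (1 - alpha) * v^T, so G is never formed explicitly.
    """
    n = len(S)
    if v is None:
        v = [1.0 / n] * n  # assumed uniform teleportation
    pi = list(v)
    for _ in range(max_iter):
        new = [
            alpha * sum(pi[i] * S[i][j] for i in range(n)) + (1 - alpha) * v[j]
            for j in range(n)
        ]
        # Stop once successive iterates agree in the 1-norm.
        if sum(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


pi = pagerank(S)
# Pages are numbered from 1; sort by decreasing PageRank.
ranking = sorted(range(1, len(pi) + 1), key=lambda page: -pi[page - 1])
print([round(x, 4) for x in pi])
print(ranking)
```

Working with S and v separately keeps the sparsity of S, which matters at web scale because G itself is a dense matrix.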

Page 10: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

Importance of Linear Algebra

Using techniques of Linear Algebra, one can compute a unique solution for PageRank Problem.

It gives the importance of every webpage as the corresponding entry of the PageRank eigenvector.

No successful technique other than linear algebra is available to solve this problem.
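The uniqueness claim can be made concrete with a short derivation, sketched under two assumptions: the damping factor satisfies $0 \leq \alpha < 1$, and $\pi$ is normalized so that its entries sum to 1, i.e. $\pi^T \mathbf{1} = 1$. Substituting $G = \alpha S + (1-\alpha)\,\mathbf{1} v^T$ into the eigenvector equation gives

$$\pi^T G = \alpha\, \pi^T S + (1-\alpha)\,(\pi^T \mathbf{1})\, v^T = \alpha\, \pi^T S + (1-\alpha)\, v^T ,$$

so $\pi^T G = \pi^T$ is equivalent to the linear system

$$\pi^T (I - \alpha S) = (1-\alpha)\, v^T .$$

Since $S$ is stochastic, every eigenvalue of $\alpha S$ has modulus at most $\alpha < 1$, so $I - \alpha S$ is nonsingular and

$$\pi^T = (1-\alpha)\, v^T (I - \alpha S)^{-1}$$

is the only solution.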

Page 11: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

References
https://www.rose-hulman.edu/~bryan/googleFinalVersionFixed.pdf

http://www.math.cornell.edu/~mec/Winter2009/RalucaRemus/Lecture3/lecture3.html

http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.html

http://blog.kleinproject.org/?p=280

Page 12: LINEAR ALGEBRA BEHIND GOOGLE SEARCH

THANK YOU