L23: PageRank, Jeff M. Phillips, April 13, 2020

(jeffp/teaching/cs5140/L23-notes.pdf)


L23: PageRank

Jeff M. Phillips

April 13, 2020


Final Report

At most 4 pages/student. Don’t cram in too much!

• Succinct title (and names)
• Problem definition and motivation.
• Explain your data.
• Key idea
• What did you do? (which techniques, an implementation, a comparison, an extension)
• What did you learn? Artifacts (charts, plots, examples, math) and intuition (in words, did it work?)

← Same for posters

Webpage Search: Inverted Index

word → pages containing it (e.g., "car" → list of pages)

Define the most relevant webpages.

Crawlers: programs that walk around the web:
① read page, update feature vector for the inverted index
② follow random hyperlinks

Use hyperlink info: <a href="www.pie.com">pie</a>

Spammers build fake pages that link to your page w/ hyperlink tag.

Directories: alternative to search engines

Yahoo! and LookSmart

Built an organized, curated collection of websites

PageRank

A page is important if important pages are linked to it (balance both).

A page is important if a random walk (MCMC, a "random surfer") were to find it.

Web is a big graph $G = (V, E)$
$V = \{\text{set of all pages}\}$
$E = \{e_{i,j} = \text{link } p_i \to p_j\}$

Run the Markov chain → $q^*$ ⇐ the distribution vector it converges to.

$q^*(j)$ says how important page $j$ is.

Compute $q^*$ of web graph

• Keep track of crawlers: how frequently they return.
• Buy a big computer: compute eig($P$) ← $P$ = prob trans($G$)
• Precompute $P^n = P \cdot P \cdots P$ ← too big
• Power method: $q_0$ ← $q^*$ from last night, then iterate $q_{i+1} = P q_i$
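The power method above can be sketched in a few lines. This is a minimal illustration, not the course's implementation: the function name `power_method`, the tolerance, and the uniform starting vector (standing in for last night's $q^*$) are assumptions. The 4-page matrix is the example $M$ used later in these notes, read as column-stochastic so that `P[i][j]` is the probability of stepping from page j to page i.

```python
def power_method(P, q0, tol=1e-12, max_iter=1000):
    """Iterate q <- P q until the L1 change between iterates is below tol.

    P is a column-stochastic matrix stored as a list of rows.
    """
    n = len(P)
    q = list(q0)
    for _ in range(max_iter):
        nxt = [sum(P[i][j] * q[j] for j in range(n)) for i in range(n)]
        change = sum(abs(a - b) for a, b in zip(nxt, q))
        q = nxt
        if change < tol:
            break
    return q


# 4-page example; every column sums to 1.
P = [
    [0.0, 1 / 2, 0.0, 0.0],
    [1 / 3, 0.0, 1.0, 1 / 2],
    [1 / 3, 0.0, 0.0, 1 / 2],
    [1 / 3, 1 / 2, 0.0, 0.0],
]
q0 = [1 / 4] * 4  # assumption: uniform start instead of last night's q*
q_star = power_method(P, q0)
print([round(x, 4) for x in q_star])
```

Each iteration costs one sparse matrix-vector product, which is why this is preferred over precomputing $P^n$.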

Anatomy of Web

[Figure: anatomy of the web, with regions labeled Strongly Connected Component, IN, OUT, Tubes, Tendrils (IN), Tendrils (OUT), and Disconnected.]

Is this G ergodic?


Can we make G ergodic?

• Teleportation / taxation
  → about once every $1/\beta \approx 7$ steps
  → jump to random node

$P$ = prob trans($G$), $\beta = 0.15$

$R = (1 - \beta) P + \beta Q$, where $Q = \frac{1}{n} \mathbf{1} \mathbf{1}^T$ ↳ dense

$R q = (1 - \beta) P q + \beta Q q$

$q_{i+1} = (1 - \beta) P q_i + \frac{\beta}{n} \mathbf{1}$ ← $\mathbf{1}$ is an $n \times 1$ vector
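A short sketch of the teleportation update, written so that the dense matrix $Q$ is never formed: each step applies $P$ and then adds $\beta / n$ to every entry. The function name `pagerank_teleport` and the 3-page graph are hypothetical. The graph (pages 0 and 1 link to each other, page 2 links to page 0) is periodic without teleportation, so the plain power method would not settle on it.

```python
def pagerank_teleport(P, beta=0.15, tol=1e-12, max_iter=1000):
    """Iterate q <- (1 - beta) P q + (beta / n) 1 without building R."""
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [
            (1 - beta) * sum(P[i][j] * q[j] for j in range(n)) + beta / n
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, q))
        q = nxt
        if change < tol:
            break
    return q


# Hypothetical graph: 0 -> 1, 1 -> 0, 2 -> 0 (columns sum to 1).
P = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
q = pagerank_teleport(P)
print(q)
```

Page 2 has no in-links, so all of its mass comes from teleportation and its score is exactly $\beta / n$.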

Spam Farms

Google counter:

Trust (noes?)

only teleport to trusted pages.

$r \leftarrow q^*$, PageRank

$t \leftarrow q^*$ with trusted teleport

$r(j) - t(j)$: if large → spam

↳ truthfulness of webpage
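The comparison of $r$ and $t$ can be sketched as follows. This is an illustration under stated assumptions: the helper `pagerank`, the 4-page graph, and the choice of page 0 as the only trusted page are all hypothetical. Pages 0 and 1 link to each other, and pages 2 and 3 form a small farm that links only to itself.

```python
def pagerank(P, teleport, beta=0.15, tol=1e-12, max_iter=1000):
    """Iterate q <- (1 - beta) P q + beta * teleport.

    teleport is a probability vector saying where a teleporting
    surfer lands (uniform for r, trusted pages only for t).
    """
    n = len(P)
    q = [1.0 / n] * n
    for _ in range(max_iter):
        nxt = [
            (1 - beta) * sum(P[i][j] * q[j] for j in range(n))
            + beta * teleport[i]
            for i in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(nxt, q))
        q = nxt
        if change < tol:
            break
    return q


# Hypothetical graph: 0 <-> 1 (trusted side), 2 <-> 3 (farm).
P = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
n = len(P)
uniform = [1.0 / n] * n
trusted = [1.0, 0.0, 0.0, 0.0]  # assumption: only page 0 is trusted

r = pagerank(P, uniform)   # ordinary PageRank
t = pagerank(P, trusted)   # teleport only to trusted pages
scores = [r_j - t_j for r_j, t_j in zip(r, t)]
print(scores)
```

The farm pages are unreachable from the trusted page, so their $t$ values vanish and $r(j) - t(j)$ is large, while the trusted side gets negative scores.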


Word Count

Consider as input all of English Wikipedia stored in DFS. Goal is to count how many times each word is used.
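One possible decomposition, assuming a MapReduce-style map / shuffle / reduce pipeline (the slide itself does not fix the framework). The function names and the two-page input are hypothetical, and the shuffle is simulated in memory with a dictionary.

```python
from collections import defaultdict


def map_words(page_text):
    """Map step: emit (word, 1) for every word on one page."""
    for word in page_text.lower().split():
        yield word, 1


def reduce_counts(word, ones):
    """Reduce step: add up all the 1s emitted for one word."""
    return word, sum(ones)


def word_count(pages):
    # Shuffle step: group the emitted values by key (the word).
    grouped = defaultdict(list)
    for text in pages:
        for word, one in map_words(text):
            grouped[word].append(one)
    return dict(reduce_counts(w, ones) for w, ones in grouped.items())


# Tiny stand-in for Wikipedia pages (hypothetical data).
pages = ["the cat sat", "The cat ran"]
counts = word_count(pages)
print(counts)
```

In a real deployment a combiner would usually pre-sum counts on each mapper to cut the data sent through the shuffle.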

Inverted Index

Consider as input all of English Wikipedia stored in DFS. Goal is to build an index, so each word has a list of pages it is in.
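A sketch of the same map / shuffle / reduce pattern for the index, again assuming a MapReduce-style setting. The page ids, the page texts, and the function names are hypothetical. The key is the word and the value is the page id.

```python
from collections import defaultdict


def map_page(page_id, text):
    """Map step: emit (word, page_id) once per distinct word on the page."""
    for word in set(text.lower().split()):
        yield word, page_id


def reduce_postings(word, page_ids):
    """Reduce step: the sorted list of pages that contain the word."""
    return word, sorted(set(page_ids))


def inverted_index(pages):
    grouped = defaultdict(list)  # shuffle: group page ids by word
    for page_id, text in pages.items():
        for word, pid in map_page(page_id, text):
            grouped[word].append(pid)
    return dict(reduce_postings(w, ids) for w, ids in grouped.items())


pages = {"A": "pie is good", "B": "car is fast"}  # hypothetical pages
index = inverted_index(pages)
print(index["is"])
```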

Phrases

Consider as input all of English Wikipedia stored in DFS. Goal is to build an index on 3-grams (sequences of 3 words) that appear on exactly one page, with a link to that page.
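One way this could be organized, assuming the same MapReduce-style pattern: the mapper emits each distinct 3-gram of a page once, and the reducer keeps a 3-gram only when a single page emitted it. All names and the sample pages are hypothetical.

```python
from collections import defaultdict


def map_trigrams(page_id, text):
    """Map step: emit (3-gram, page_id) once per distinct 3-gram on the page."""
    words = text.lower().split()
    seen = set()
    for i in range(len(words) - 2):
        gram = tuple(words[i:i + 3])
        if gram not in seen:
            seen.add(gram)
            yield gram, page_id


def reduce_unique(gram, page_ids):
    """Reduce step: keep the 3-gram only if exactly one page emitted it."""
    if len(page_ids) == 1:
        return gram, page_ids[0]
    return None


def unique_phrase_index(pages):
    grouped = defaultdict(list)  # shuffle: group page ids by 3-gram
    for page_id, text in pages.items():
        for gram, pid in map_trigrams(page_id, text):
            grouped[gram].append(pid)
    results = (reduce_unique(g, ids) for g, ids in grouped.items())
    return dict(r for r in results if r is not None)


pages = {"A": "the quick brown fox", "B": "the quick brown dog"}
phrases = unique_phrase_index(pages)
print(phrases)
```

Deduplicating inside the mapper matters: without it, a 3-gram repeated on one page would look like it came from two pages.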

Label Propagation (Graph)

Consider a large graph $G = (V, E)$ (e.g., a social network), with a subset of nodes $V' \subset V$ with labels (e.g., {pos, neg}). Each node stores its label (if any) and edges.

Assign a vertex a label if (a) unlabeled, (b) has ≥ 5 labeled neighbors, (c) based on majority vote.
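A minimal sketch of one synchronous round of this rule. Two choices here are assumptions, not part of the slide: ties are left unlabeled (a strict majority is required), and new labels only become visible in the next round. The function name `propagate_once` and the small graph are hypothetical.

```python
from collections import Counter


def propagate_once(adj, labels, min_labeled=5):
    """Return labels after one round of majority-vote propagation.

    adj maps each vertex to a list of neighbors; labels maps the
    already-labeled vertices to their label.
    """
    new_labels = dict(labels)
    for v, nbrs in adj.items():
        if v in labels:  # (a) only unlabeled vertices are candidates
            continue
        votes = Counter(labels[u] for u in nbrs if u in labels)
        if sum(votes.values()) < min_labeled:  # (b) need >= 5 labeled neighbors
            continue
        ranked = votes.most_common(2)
        # (c) majority vote; assumption: skip the vertex on a tie
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            new_labels[v] = ranked[0][0]
    return new_labels


adj = {
    "c": ["a1", "a2", "a3", "a4", "a5"],  # 5 labeled neighbors
    "d": ["a1", "a2", "a3", "a4"],        # only 4 labeled neighbors
}
labels = {"a1": "pos", "a2": "pos", "a3": "pos", "a4": "neg", "a5": "neg"}
result = propagate_once(adj, labels)
print(result.get("c"), result.get("d"))
```

In a distributed setting each vertex would send its label along its edges (map), and each unlabeled vertex would tally what it receives (reduce); repeating the round lets labels spread further.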

Label Propagation (Embedding)

Consider a data set $X \subset \mathbb{R}^d$, with a subset of points $X' \subset X$ with labels (e.g., {pos, neg}). Implicitly defines a graph with $V = X$ and $E$ using $k = 20$ nearest neighbors.

Assign a vertex a label if (a) unlabeled, (b) has ≥ 5 labeled neighbors, (c) based on majority vote.
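A small sketch of the embedding variant: build the graph by brute-force nearest-neighbor search, then apply one round of the voting rule. Assumptions: each point's neighbors are its own k nearest points (a directed k-NN graph), ties in the vote leave the point unlabeled, and the brute-force search stands in for whatever scalable method would be used on real data. The names and the 1-dimensional sample are hypothetical, and the example uses k = 5 only because the sample is tiny.

```python
import math
from collections import Counter


def knn_graph(X, k=20):
    """Map each point index to the indices of its k nearest other points."""
    adj = {}
    for i, x in enumerate(X):
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(x, X[j]),
        )
        adj[i] = others[:k]
    return adj


def propagate_once(adj, labels, min_labeled=5):
    """One round: label unlabeled points with >= min_labeled labeled neighbors."""
    new_labels = dict(labels)
    for v, nbrs in adj.items():
        if v in labels:
            continue
        votes = Counter(labels[u] for u in nbrs if u in labels)
        if sum(votes.values()) < min_labeled:
            continue
        ranked = votes.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            new_labels[v] = ranked[0][0]
    return new_labels


# Six labeled points on a line plus one unlabeled point in the middle.
X = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,), (0.5,), (0.25,)]
labels = {i: "pos" for i in range(6)}
adj = knn_graph(X, k=5)
new_labels = propagate_once(adj, labels)
print(new_labels[6])
```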

Example PageRank

$$M = \begin{bmatrix} 0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{bmatrix}$$

Stripes:

$$M_1 = \begin{bmatrix} 0 \\ 1/3 \\ 1/3 \\ 1/3 \end{bmatrix} \quad M_2 = \begin{bmatrix} 1/2 \\ 0 \\ 0 \\ 1/2 \end{bmatrix} \quad M_3 = \begin{bmatrix} 0 \\ 1 \\ 0 \\ 0 \end{bmatrix} \quad M_4 = \begin{bmatrix} 0 \\ 1/2 \\ 1/2 \\ 0 \end{bmatrix}$$

These are stored as $(1 : (1/3, 2), (1/3, 3), (1/3, 4))$, $(2 : (1/2, 1), (1/2, 4))$, $(3 : (1, 2))$, and $(4 : (1/2, 2), (1/2, 3))$, where stripe $j$ lists the nonzero entries of column $j$ as (value, row) pairs.
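To illustrate how stripes are used, here is a sketch of one step $q \leftarrow M q$ computed directly from the sparse column stripes. The stripe contents are read off the columns of $M$ as (value, row) pairs; the dictionary layout and the function name `multiply_by_stripes` are assumptions made for this example. Exact fractions are used so the result can be checked by hand.

```python
from fractions import Fraction as F

# stripes[j] holds the nonzero entries of column j as (value, row) pairs.
stripes = {
    1: [(F(1, 3), 2), (F(1, 3), 3), (F(1, 3), 4)],
    2: [(F(1, 2), 1), (F(1, 2), 4)],
    3: [(F(1), 2)],
    4: [(F(1, 2), 2), (F(1, 2), 3)],
}


def multiply_by_stripes(stripes, q):
    """Compute M q; stripe j only needs the single entry q[j]."""
    out = {i: F(0) for i in q}
    for j, entries in stripes.items():
        for value, i in entries:
            out[i] += value * q[j]
    return out


q = {j: F(1, 4) for j in range(1, 5)}  # uniform start
q_next = multiply_by_stripes(stripes, q)
print(q_next)
```

Each stripe can be processed on its own machine with just one entry of $q$; the partial results for each row are then summed.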


Example PageRank

$$M = \begin{bmatrix} 0 & 1/2 & 0 & 0 \\ 1/3 & 0 & 1 & 1/2 \\ 1/3 & 0 & 0 & 1/2 \\ 1/3 & 1/2 & 0 & 0 \end{bmatrix}$$

Blocks:

$$M_{1,1} = \begin{bmatrix} 0 & 1/2 \\ 1/3 & 0 \end{bmatrix} \quad M_{1,2} = \begin{bmatrix} 0 & 0 \\ 1 & 1/2 \end{bmatrix} \quad M_{2,1} = \begin{bmatrix} 1/3 & 0 \\ 1/3 & 1/2 \end{bmatrix} \quad M_{2,2} = \begin{bmatrix} 0 & 1/2 \\ 0 & 0 \end{bmatrix}$$

These are stored as $(1 : (1/2, 2))$, $(2 : (1/3, 1))$, as $(2 : (1, 3), (1/2, 4))$, as $(3 : (1/3, 1))$, $(4 : (1/3, 1), (1/2, 2))$, and as $(3 : (1/2, 4))$.
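A sketch of one step $q \leftarrow M q$ using the block storage. Here each block is keyed by its (block row, block column) position and maps a row index to (value, column) pairs, matching the listing above; the dictionary layout and the function name `multiply_by_blocks` are assumptions made for this example.

```python
from fractions import Fraction as F

# blocks[(I, J)] maps a row index to the (value, column) pairs in that block.
blocks = {
    (1, 1): {1: [(F(1, 2), 2)], 2: [(F(1, 3), 1)]},
    (1, 2): {2: [(F(1), 3), (F(1, 2), 4)]},
    (2, 1): {3: [(F(1, 3), 1)], 4: [(F(1, 3), 1), (F(1, 2), 2)]},
    (2, 2): {3: [(F(1, 2), 4)]},
}


def multiply_by_blocks(blocks, q):
    """Compute M q; block (I, J) reads only part J of q and writes part I."""
    out = {i: F(0) for i in q}
    for rows in blocks.values():
        for i, entries in rows.items():
            for value, j in entries:
                out[i] += value * q[j]
    return out


q = {j: F(1, 4) for j in range(1, 5)}  # uniform start
q_next = multiply_by_blocks(blocks, q)
print(q_next)
```

Compared with stripes, each block touches only a slice of the input vector and a slice of the output vector, which keeps both small enough to hold in memory per machine.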