22 May 2006 Wu, Goel and Davison Models of Trust for the Web (MTW) WWW2006 Workshop LEHIGH UNIVERSITY


Introduction: Web Search

Web search: the access to the Web for hundreds of millions of people
Hundreds of millions of queries per day
Queries + people = TRAFFIC
A HUGE incentive for web site owners to rank highly in search engine results
  Communicate some message (advertising, political statement)
  Install viruses, adware, etc.

[Figure: search engines, including Google, Yahoo!, MSN Search, Ask, A9, Exalead, Gigablast + metasearch + many more!]


Introduction: Web Spam

a.k.a. search engine spam, spamdexing
Any technique to manipulate search engine results
  Target page gets an undeservedly higher ranking
Many methods
  Link farms, keyword stuffing, cloaking, link bombs, and more
The target of much of our work!


Propagating Trust and Distrust to Demote Web Spam

Baoning Wu, Vinay Goel, and Brian D. Davison

Computer Science & Engineering
Lehigh University
Bethlehem, PA USA


Outline

Background and motivation
Proposed methods
Experimental results


Background: PageRank (Page and Brin, 1998)

Uses number and status of “parents” to determine status of child
  r(i+1) = (1-α) * T * r(i) + α * s
  r: PageRank score vector (with N nodes)
  T: transition matrix (NxN)
  (1-α): decay factor; α: jump probability
  s: uniform distribution of 1/N
PageRank score generates a ranking of node importance
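
The update rule above can be sketched as a simple power iteration. This is only an illustrative sketch: the function name `pagerank`, the dictionary graph representation, the fixed iteration count, and the handling of nodes without out-links (spreading their score uniformly) are assumptions, not details from the slides.

```python
def pagerank(out_links, alpha=0.15, iterations=50):
    """Iterate r(i+1) = (1-alpha) * T * r(i) + alpha * s with uniform s."""
    nodes = sorted(out_links)
    n = len(nodes)
    s = {v: 1.0 / n for v in nodes}   # uniform jump distribution, 1/N each
    r = dict(s)                       # start from the uniform vector
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent in nodes:
            children = out_links[parent]
            if not children:
                # Assumption: a node with no out-links spreads its score uniformly.
                for v in nodes:
                    nxt[v] += (1 - alpha) * r[parent] / n
            else:
                # Each parent divides its decayed score equally among its children.
                share = (1 - alpha) * r[parent] / len(children)
                for child in children:
                    nxt[child] += share
        r = nxt
    return r


# Toy graph: every key lists the nodes it links to.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
```

In this toy graph, node "c" has the most parents and ends up with the highest score, while "d" (no parents) keeps only its jump share.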


Background: TrustRank (Gyongyi and Garcia-Molina, VLDB 2004)

Uses number and trust of “parents” to determine trust status of child
  t(i+1) = (1-α) * T * t(i) + α * s
  t: TrustRank score vector (with N nodes)
  T: transition matrix (NxN)
  (1-α): decay factor
  s: seed set trust score distribution
    Vector of size N, but only seed nodes are non-zero
Demotes web spam by propagating trust from a known good seed set.
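
The only change from PageRank in the rule above is the vector s, which is non-zero only on seed nodes. A minimal sketch follows, assuming trust is spread uniformly over the seeds and that nodes without out-links simply pass nothing on; the name `trustrank` and the toy graph are hypothetical.

```python
def trustrank(out_links, seeds, alpha=0.15, iterations=50):
    """Iterate t(i+1) = (1-alpha) * T * t(i) + alpha * s, with s non-zero only on seeds."""
    nodes = sorted(out_links)
    # Assumption: the seed distribution is uniform over the trusted seed set.
    s = {v: (1.0 / len(seeds) if v in seeds else 0.0) for v in nodes}
    t = dict(s)
    for _ in range(iterations):
        nxt = {v: alpha * s[v] for v in nodes}
        for parent in nodes:
            children = out_links[parent]
            for child in children:
                # Parent divides its decayed trust equally among its children.
                nxt[child] += (1 - alpha) * t[parent] / len(children)
        t = nxt
    return t


# "spam" and "farm" link to each other; no trusted node links to them.
graph = {
    "good": ["page"],
    "page": ["good"],
    "spam": ["page", "farm"],
    "farm": ["spam"],
}
trust = trustrank(graph, seeds={"good"})
```

Because trust only flows along outgoing links from the seed, "spam" and "farm" receive no trust at all in this example, even though "spam" links to a trusted page.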


Specific Motivation

In TrustRank
  Parent divides its trust among its children.
  This may not be optimal: real-world trust relationships are independent of the number of trusted entities.
Distrust can also be propagated.

[Figure: a hyperlink between nodes A and B, annotated with the directions of trust propagation and distrust propagation.]


Key steps in propagation

Decay of trust (d)
  Trust is not perfectly transitive.
Splitting of trust
  For each parent, how to divide its score among its children.
Accumulation of trust
  For each child, how to accumulate the overall score given the portions from all of its parents.


Outline

Background and motivation
Proposed methods
Experimental results


Choices for Trust Splitting

Given a node i with trust score TR(i) and O(i) outgoing links:
Equal splitting
  Gives d*TR(i)/O(i) to each child (used by TrustRank)
Constant splitting
  Gives d*TR(i) to each child
Logarithmic splitting
  Gives d*TR(i)/log(1+O(i)) to each child
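
The three splitting choices can be written side by side as small functions. This is a sketch only: the function names are hypothetical, and the natural logarithm is an assumption, since the slides do not state the base of the log.

```python
import math


def equal_split(tr, out_degree, d):
    """Equal splitting: each child receives d*TR(i)/O(i)."""
    return d * tr / out_degree


def constant_split(tr, out_degree, d):
    """Constant splitting: each child receives d*TR(i), whatever the out-degree."""
    return d * tr


def log_split(tr, out_degree, d):
    """Logarithmic splitting: each child receives d*TR(i)/log(1+O(i)).

    Assumption: natural logarithm; the source does not give the base.
    """
    return d * tr / math.log(1 + out_degree)


# Share sent to each child by a parent with TR(i) = 1.0, O(i) = 4 and d = 0.5.
per_child = {
    "equal": equal_split(1.0, 4, 0.5),
    "constant": constant_split(1.0, 4, 0.5),
    "log": log_split(1.0, 4, 0.5),
}
```

For a parent with four outgoing links, the logarithmic share falls between the equal share and the constant share, which is the intended middle ground between the two.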


Choices for Trust Accumulation

Simple summation
  Sum the trust values from each parent
Maximum share
  Use the maximum of the trust values sent by the parents
Maximum parent
  Sum the trust values but never exceed the trust score of most-trusted parent
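
The accumulation choices can be sketched as functions of the shares a child receives and the trust scores of the parents that sent them. The function names and the two-list interface are hypothetical, and the sketch assumes the child has at least one parent.

```python
def simple_summation(shares, parent_scores):
    """Sum the trust values sent by all parents."""
    return sum(shares)


def maximum_share(shares, parent_scores):
    """Keep only the largest single trust value sent by any parent."""
    return max(shares)


def maximum_parent(shares, parent_scores):
    """Sum the trust values, capped at the score of the most-trusted parent."""
    return min(sum(shares), max(parent_scores))


# A child with three parents: the shares they send and their own trust scores.
shares = [0.2, 0.3, 0.4]
parent_scores = [0.5, 0.6, 0.7]

summed = simple_summation(shares, parent_scores)   # about 0.9
largest = maximum_share(shares, parent_scores)     # 0.4
capped = maximum_parent(shares, parent_scores)     # 0.7, the cap applies
```

With these numbers the plain sum exceeds the best parent's score, so maximum parent returns the cap rather than the sum.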


Propagating Distrust

Distrust can be propagated from a seed set of bad nodes.

Similar to trust propagation, but in reverse – follow incoming links, not outgoing links

Same key choices for decay, splitting and accumulation
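
One way to picture the reversed direction is to invert the link graph and push distrust from each bad node to the nodes that link to it. The sketch below picks constant splitting and maximum share purely for illustration; the function names, the seed distrust value of 1.0, the decay d = 0.5 and the fixed number of rounds are all assumptions.

```python
def reverse_graph(out_links):
    """Map each node to the list of nodes that link to it."""
    in_links = {v: [] for v in out_links}
    for parent, children in out_links.items():
        for child in children:
            in_links.setdefault(child, []).append(parent)
    return in_links


def propagate_distrust(out_links, bad_seeds, d=0.5, rounds=10):
    """Spread distrust from bad seeds to the nodes that link to them."""
    in_links = reverse_graph(out_links)
    # Assumption: each bad seed starts with a distrust score of 1.0.
    distrust = {v: (1.0 if v in bad_seeds else 0.0) for v in in_links}
    for _ in range(rounds):
        nxt = dict(distrust)
        for node, linkers in in_links.items():
            for linker in linkers:
                # Constant splitting (d * score) with maximum-share accumulation.
                nxt[linker] = max(nxt[linker], d * distrust[node])
        distrust = nxt
    return distrust


# "b" links to "a", "a" links to "spam", and "spam" links to "c".
graph = {"a": ["spam"], "b": ["a"], "spam": ["c"], "c": []}
result = propagate_distrust(graph, bad_seeds={"spam"})
```

Here "a" and "b" pick up distrust because they lead to the bad seed, while "c" stays at zero: being linked to by a spam site does not make a site distrusted in this scheme.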


Combining Trust and Distrust

For each node i with trust score TR(i) and distrust score DIS_TR(i), the combined score Total(i) can be

  Total(i) = η * TR(i) - β * DIS_TR(i)

where 0 ≤ η ≤ 1, 0 ≤ β ≤ 1
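
As a small worked sketch of the combination formula (the function name `total_score` and the sample scores are hypothetical):

```python
def total_score(tr, dis_tr, eta, beta):
    """Total(i) = eta * TR(i) - beta * DIS_TR(i), with eta and beta in [0, 1]."""
    if not (0.0 <= eta <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("eta and beta must both lie in [0, 1]")
    return eta * tr - beta * dis_tr


# A node with trust 0.8 and distrust 0.2 under three weightings.
equal_weights = total_score(0.8, 0.2, eta=0.5, beta=0.5)   # about 0.3
trust_only = total_score(0.8, 0.2, eta=1.0, beta=0.0)      # 0.8
distrust_only = total_score(0.8, 0.2, eta=0.0, beta=1.0)   # -0.2
```

Setting eta to 1 and beta to 0 reduces the score to trust alone, and the opposite setting ranks nodes purely by (negated) distrust.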


Outline

Background and motivation
Proposed methods
Experimental results


Data set

20M pages from the Swiss search engine [search.ch] in 2004
350K sites with “.ch” domain
  We used only this site graph
Seed sets
  3,589 sites labeled as using web spam with various techniques (provided)
  20,005 sites with pages in dir.search.ch topics as trusted set


Experimental Design

Explore various combinations of trust and distrust propagation
Evaluation
  Performance of TrustRank is the number of spam sites found among the highest-ranked ~1% of sites.
  We use the same metric in this work.
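
The metric can be sketched as a count over the top slice of a ranking. This is a simplified illustration: the function name `spam_in_top`, the plain fractional cutoff and the synthetic scores are assumptions, and the actual experiments may define the top ~1% differently (for example via buckets).

```python
def spam_in_top(scores, spam_sites, fraction=0.01):
    """Count known spam sites among the highest-ranked fraction of sites."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    # Assumption: the cutoff is a simple fraction of all ranked sites.
    cutoff = max(1, int(len(ranked) * fraction))
    return sum(1 for site in ranked[:cutoff] if site in spam_sites)


# 1,000 hypothetical sites, where site0 has the highest score.
scores = {f"site{i}": 1000 - i for i in range(1000)}
spam_sites = {"site3", "site7", "site500"}
found = spam_in_top(scores, spam_sites)
```

A lower count is better: it means fewer labeled spam sites made it into the highest-ranked slice.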


Baseline result

Algorithm                                 Num. spam sites
PageRank                                  90
TrustRank                                 58
Topical TrustRank (Wu et al., WWW2006)    33-42


Simple TrustRank Improvement: Increase jump probability (α)

[Figure: number of spam sites in top 10 buckets (y-axis, 0 to 65) plotted against jump probability α (x-axis, 0.9 down to 0.1); the default α = 0.15 is marked.]


Other trust propagation methods

Algorithm           Constant Splitting      Logarithmic Splitting
Decay =             0.1  0.3  0.7  0.9      0.1  0.3  0.7  0.9
Simple Summation    364  364  364  364      364  364  364  364
Maximum Share        34   34   34   34       13   12   20   18
Maximum Parent       27   32   33   33      372   27   29   32


Results of propagating distrust
Combined equally with TrustRank, 200 seeds

Algorithm           Constant Splitting      Logarithmic Splitting
d_Distrust =        0.1  0.3  0.7  0.9      0.1  0.3  0.7  0.9
Simple Summation     53   53   55   55       57   53   53   53
Maximum Share        53   53   53   53       59   53   52   52
Maximum Parent       53   53   53   53       57   53   53   53


Combining trust and distrust
Using best scoring trust and distrust formulations, beta=(1-eta)

[Figure: number of spam sites in top 1.1% (y-axis, 0 to 12) plotted against the value of eta (x-axis, 0 to 1, labeled from Distrust Only to Trust Only) for Trial 1, Trial 2 and Trial 3; one off-scale value is annotated >2200.]


Coverage of trust propagation

Algorithm         Constant Splitting              Logarithmic Splitting
Decay             0.1    0.3    0.7    0.9        0.1    0.3    0.7    0.9
Maximum Share     77.71  77.73  77.74  77.74      77.19  77.72  77.73  77.73
Maximum Parent    77.52  77.71  77.73  77.74      76.93  77.60  77.71  77.72

Percentage of sites affected by approach. TrustRank reached 76.05%.


Conclusions

Propagating trust based on outdegree does not appear to be optimal.
  Alternative splitting and accumulation methods can help to demote top ranked spam sites.
Propagating distrust can also help to demote top ranked spam sites.
Additional tests needed!
  E.g., to examine impact on retrieval


Thank You!

Questions?

Contact Info:
Dr. Brian D. Davison
davison(at)cse.lehigh.edu
WUME Laboratory
Computer Science and Engineering
Lehigh University
Bethlehem, PA 18015 USA

The WUME Lab http://wume.cse.lehigh.edu/