WikiTrust: Turning Wikipedia Quantity into Quality B. Thomas Adler, Luca de Alfaro, and Ian Pye


•Wikipedia:

•3,000,000+ Articles

•1,000,000,000+ Revisions

Our Goal: Crowd-sourcing community consensus


Vandalism

•Prevents Wikipedia from being taken fully seriously

•Harder to use Wikipedia in schools

•Harder to make static selections


Vandalism Detection

Given a new revision, classify it as Vandalism or Regular.

•Zero-delay: Use only the features that are available at the time the revision is created (no lookahead).

•Historical: Use the full set of WikiTrust features, including how subsequent authors treat the revision (lookahead).


Revision Selection

Given an article, select the “best” revision to show to a user.

•Wikipedia 1.0 Project: Aims to extract a static snapshot of Wikipedia.

•Use in schools, developing countries, and the OLPC Project.


Core Concepts

•Wikipedia Article

•Many Revisions

•1 Author per Revision

•Author has Reputation, Revision has Trust.

•Binary Classifier: Either A or B.


Zero Day Features

•Author is Anonymous (Turns out we don’t care)

•Time interval after the previous edit (Useful, but only as a predicate time > 12 seconds)

•Time of day of edit (Not used)
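The only metadata feature kept from this slide is the time-interval predicate. A minimal sketch of that predicate follows, assuming edit timestamps are available as `datetime` values; the function name is hypothetical, and only the 12-second threshold comes from the slide.

```python
from datetime import datetime, timedelta

# Threshold from the slide; the name and function shape are illustrative.
MIN_INTERVAL = timedelta(seconds=12)


def interval_exceeds_threshold(previous_edit: datetime, current_edit: datetime) -> bool:
    """Return True when more than 12 seconds passed since the previous edit."""
    return (current_edit - previous_edit) > MIN_INTERVAL
```

The feature is a single boolean rather than the raw number of seconds, which matches the slide's remark that the interval is useful only as a predicate.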


Zero Day Features

•Difference from previous revisions (Not really)

•Comment Length (Nope)


Zero Day Features (we care about these)

•Previous Text Trust Histogram

•Current Text Trust Histogram

•Histogram Difference
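The three histogram features can be sketched as follows. This is an illustration only: it assumes each word carries a numeric trust value and that values are bucketed into ten integer levels (`NUM_BINS` is an assumption, as are the function names).

```python
from typing import List, Sequence

NUM_BINS = 10  # assumed number of trust levels; not stated on the slide


def trust_histogram(word_trust: Sequence[float], num_bins: int = NUM_BINS) -> List[int]:
    """Count how many words fall into each integer trust level."""
    hist = [0] * num_bins
    for t in word_trust:
        # Clamp so out-of-range values land in the first or last bin.
        bin_index = min(max(int(t), 0), num_bins - 1)
        hist[bin_index] += 1
    return hist


def histogram_difference(previous: Sequence[int], current: Sequence[int]) -> List[int]:
    """Per-bin change from the previous revision to the current one."""
    return [c - p for p, c in zip(previous, current)]


# Example: an edit appends two low-trust words to a high-trust paragraph.
previous = trust_histogram([9, 9, 8.5, 7])
current = trust_histogram([9, 9, 8.5, 7, 0.5, 0.2])
difference = histogram_difference(previous, current)
```

In this example the difference is concentrated in the lowest bin, which is the kind of shift a classifier could pick up on when low-reputation text is added to an otherwise trusted page.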


Text Trust

•New text starts with a trust value proportional to the author's reputation.

•Text can gain trust when revised.

•Cut-and-paste, deletions result in local trust loss.

•We remember deleted text and its trust.


A Sequence of Differences

•For revisions v1, v2, v3, ... of a wiki article, word trust is computed from the difference between each revision vi and its predecessor vi-1

•How did we arrive at the current version of an article?


Text Trust: The Algorithm Illustrated

1) Trust of new text

2) New block borders have the same trust as new text

3) The revision effect increases the trust of existing text

4) Note: this is not a new border

[Figure: example revision with regions labeled 1-4, matching the steps above.]
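Steps 1 to 3 can be sketched in code. This is a rough illustration under stated assumptions, not the WikiTrust implementation: the `Word` record, the two constants, and the linear form of the "revision effect" are all hypothetical, and the diff that decides which words are new or sit at a new block border is taken as given.

```python
from dataclasses import dataclass
from typing import List, Optional

NEW_TEXT_SCALE = 0.5  # assumed proportionality constant for new text
REVISION_GAIN = 0.2   # assumed strength of the revision effect


@dataclass
class Word:
    text: str
    prev_trust: Optional[float]  # None for words inserted in this revision
    at_new_border: bool = False  # True if the word sits at a new block border


def updated_trust(words: List[Word], author_reputation: float) -> List[float]:
    """Apply rules 1-3 from the slide to every word of the new revision."""
    new_text_trust = NEW_TEXT_SCALE * author_reputation
    result = []
    for w in words:
        if w.prev_trust is None:
            # Rule 1: new text gets trust proportional to author reputation.
            result.append(new_text_trust)
        elif w.at_new_border:
            # Rule 2: new block borders get the same trust as new text.
            result.append(new_text_trust)
        elif author_reputation > w.prev_trust:
            # Rule 3: surviving text moves part of the way toward the
            # reputation of the author who left it in place.
            gain = REVISION_GAIN * (author_reputation - w.prev_trust)
            result.append(w.prev_trust + gain)
        else:
            # Text already trusted more than the author keeps its trust.
            result.append(w.prev_trust)
    return result


words = [
    Word("kept", 4.0),
    Word("edge", 6.0, at_new_border=True),
    Word("fresh", None),
    Word("solid", 9.0),
]
trust = updated_trust(words, author_reputation=8.0)
```

With an author reputation of 8.0, the kept word rises from 4.0 toward 8.0, the border word drops to the new-text level, and the word that already had trust 9.0 is left unchanged.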


Zero Day Features (we care about these)

•Previous Text Trust Histogram

•Current Text Trust Histogram

•Histogram Difference


Historical Features

•Next revision comment length (length > 110 chars)

•Next revision comment has the word revert in it (too noisy)


Historical Features

•Author Reputation (How do other users judge this user’s edits?)


Historical Features

•Minimum Revision Quality

•Average Revision Quality

•Maximum Dissent


Historical Features

•Total Weight of Judges (not at all)


ROC AUC Scoring

Probability that a binary classifier ranks a randomly chosen positive example above a randomly chosen negative one

•> 0.9 = Excellent

•0.8 - 0.9 = Good

•< 0.8 = Poor

•0.5 = Expected result from flipping a coin
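ROC AUC can be read as a ranking probability: the chance that a randomly chosen positive example (here, a vandalism revision) receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it directly from that definition; the function name and the toy scores are illustrative, not data from the talk.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Toy example: label 1 = vandalism, label 0 = regular.
example_auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

A classifier that gives every revision the same score has every pair tied and lands at 0.5, the coin-flip baseline listed on the slide. The pairwise loop is quadratic in the number of examples, so it suits small illustrations rather than full evaluation sets.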


Results (PAN 2010)

ROC of 0.937

Results (PAN 2010)

ROC of 0.937 X

ROC of 0.914 ?

Results (PAN 2010)

ROC of 0.937 X

ROC of 0.904 ?


Other Directions

•Wikipedia 1.0

•Vandalism API

•Newsgroup Reputation

•IP Address Reputation


Revision Quality

The fraction of the change that goes in the same direction as the future.

• Qual = 1: vj is a totally good edit

• Qual = -1: vj is reverted

• -1 ≤ Qual ≤ 1

[Figure: triangle of revisions vi (the past), vj, and vk (the future). “Work done” is d(vi, vj); “progress” is d(vi, vk) - d(vj, vk).]
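The quality measure on this slide can be sketched as progress divided by work done, Qual = (d(vi, vk) - d(vj, vk)) / d(vi, vj), which is the form consistent with the two endpoint cases listed (1 for an edit fully kept by the future, -1 for a revert). The code below is an illustration under assumptions: it uses a plain word-level Levenshtein distance as a stand-in for d, which may differ from the edit distance WikiTrust actually uses, and all names are hypothetical.

```python
from typing import List


def edit_distance(a: List[str], b: List[str]) -> int:
    """Word-level Levenshtein distance, used here as a stand-in for d."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        cur = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            cur.append(min(prev[j] + 1,          # delete word_a
                           cur[j - 1] + 1,       # insert word_b
                           prev[j - 1] + cost))  # keep or substitute
        prev = cur
    return prev[-1]


def revision_quality(v_i: List[str], v_j: List[str], v_k: List[str]) -> float:
    """Progress toward the future revision v_k, per unit of work done in v_j."""
    work = edit_distance(v_i, v_j)
    if work == 0:
        raise ValueError("v_j does not differ from v_i")
    progress = edit_distance(v_i, v_k) - edit_distance(v_j, v_k)
    return progress / work


past = "the cat sat".split()
vandalized = "the cat sat LOL".split()
improved = "the cat sat down".split()

reverted_quality = revision_quality(past, vandalized, past)   # future undoes v_j
kept_quality = revision_quality(past, improved, improved)     # future keeps v_j
```

Because Levenshtein distance satisfies the triangle inequality, the ratio stays within the interval from -1 to 1, matching the bound on the slide.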
