To trust or not, is hardly the question! Wikipedia Sai Moturu


We're never so vulnerable than when we trust someone, but paradoxically, if we cannot trust, neither can we find love or joy.

- Walter Anderson

Trust Quality

Popularity

Reach

How much we can trust is the right question…

Agenda

Review two articles

Briefly summarize other publications

Content quality

What are the hallmarks of consistently good information?

Objectivity: unbiased information

Completeness: self-explanatory

Pluralism: not restricted to a particular viewpoint

Define propositions of trust

Propositions of trust

UML Model for Wikipedia

Macro-areas of analysis

Six macro-areas: quality of users, user distribution and leadership, stability, controllability, quality of editing, and importance of an article.

Using the ten propositions, 50 sources of trust evidence are identified.

Logic conditions

Each trust factor must be interpreted in relation to the others:

IF stability is high AND (length is short OR edit is low OR importance is low) THEN warning

IF leadership is high AND dictatorship is high THEN warning

IF length is high AND importance is low THEN warning
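
The three logic conditions can be sketched as a small rule check. This is a minimal sketch assuming each trust factor has already been mapped to a hypothetical ordinal level (LOW, MEDIUM, HIGH), with "short" length treated as LOW; the factor keys and the warning strings are illustrative labels, not taken from the source.

```python
from typing import Dict, List

# Hypothetical ordinal levels for the trust factors; "short" length is treated as LOW.
LOW, MEDIUM, HIGH = 0, 1, 2


def trust_warnings(f: Dict[str, int]) -> List[str]:
    """Apply the three consistency rules to one article's trust-factor levels."""
    warnings = []
    # Rule 1: a very stable article is suspicious if it is short, rarely edited or unimportant.
    if f["stability"] == HIGH and (
        f["length"] == LOW or f["edit"] == LOW or f["importance"] == LOW
    ):
        warnings.append("rule 1: high stability with short length, low editing or low importance")
    # Rule 2: strong leadership combined with dictatorship.
    if f["leadership"] == HIGH and f["dictatorship"] == HIGH:
        warnings.append("rule 2: high leadership with high dictatorship")
    # Rule 3: a long article on a topic of low importance.
    if f["length"] == HIGH and f["importance"] == LOW:
        warnings.append("rule 3: long article with low importance")
    return warnings


article = {"stability": HIGH, "length": LOW, "edit": MEDIUM,
           "importance": MEDIUM, "leadership": HIGH, "dictatorship": LOW}
print(trust_warnings(article))
```

A warning here does not lower trust by itself; it flags that the individual factors contradict each other and should not be taken at face value.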

Calculation of Trust

Evaluation

Featured articles vs. standard articles

Cluster Analysis

Models

Basic: The better the authors, the better the article quality

PeerReview: Assumption: a contributor reviews the content before modifying it, thereby approving the content that he/she does not edit

Models

ProbReview: Improved assumption: a contributor may not review the entire article before modifying it

The farther a word is from a word that the author has written, the lower the probability that he/she has read it

When probabilities conflict, the higher one is used

Probability is modeled as a monotonically decaying function of the distance between the words

Naïve: The longer the article is, the better its quality. Used as a baseline for comparison
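
The ProbReview assumption can be illustrated with a small sketch. The exponential decay and its `tau` parameter are hypothetical stand-ins: the slides do not say what the decay schemes are, so this is not the paper's scheme 2 or 3. Taking the maximum over all of the author's words is one reading of the rule that the higher probability wins in conflicts.

```python
import math
from typing import List


def decay(distance: int, tau: float = 5.0) -> float:
    """Hypothetical monotonically decaying function: 1 at distance 0, approaching 0 far away."""
    return math.exp(-distance / tau)


def review_probabilities(n_words: int, authored: List[int], tau: float = 5.0) -> List[float]:
    """Probability that the author read each word position of an n_words-long article.

    `authored` holds the positions of the words this author wrote. When several
    authored words give different probabilities, the highest one is kept.
    """
    if not authored:
        return [0.0] * n_words
    return [max(decay(abs(i - j), tau) for j in authored) for i in range(n_words)]


p = review_probabilities(10, authored=[2], tau=2.0)
print([round(x, 2) for x in p])
```

Because the decay is monotone, keeping the maximum is the same as using the distance to the nearest word the author wrote.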

Iterative computation

1. Initialize all quality and authority values equally

2. For each iteration: use the authority values from the previous iteration to compute quality, use the quality values to compute authority, then normalize all quality and authority values

3. Repeat step 2 until convergence (alternatives: repeat until the difference is very small or until a maximum number of iterations has been reached)
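
The iterative computation can be sketched for the Basic model as follows. This assumes a HITS-style mutual reinforcement: an article's quality is the contribution-weighted sum of its authors' authority, and an author's authority is the contribution-weighted sum of the quality of the articles they edited. The contribution matrix `contrib` (for example, words contributed) and the sum-to-one normalization are assumptions for illustration.

```python
from typing import List, Tuple


def basic_model(contrib: List[List[float]], tol: float = 1e-9,
                max_iter: int = 1000) -> Tuple[List[float], List[float]]:
    """contrib[i][j]: hypothetical amount (e.g. words) author j contributed to article i."""
    n_articles, n_authors = len(contrib), len(contrib[0])
    # Step 1: initialize all quality and authority values equally.
    quality = [1.0 / n_articles] * n_articles
    authority = [1.0 / n_authors] * n_authors
    for _ in range(max_iter):
        # Step 2a: quality from the previous iteration's authority values.
        new_q = [sum(contrib[i][j] * authority[j] for j in range(n_authors))
                 for i in range(n_articles)]
        # Step 2b: authority from the freshly computed quality values.
        new_a = [sum(contrib[i][j] * new_q[i] for i in range(n_articles))
                 for j in range(n_authors)]
        # Step 2c: normalize so that each vector sums to 1.
        sq, sa = sum(new_q), sum(new_a)
        new_q = [q / sq for q in new_q]
        new_a = [a / sa for a in new_a]
        # Step 3: stop once the largest change falls below the tolerance.
        delta = max(max(abs(x - y) for x, y in zip(new_q, quality)),
                    max(abs(x - y) for x, y in zip(new_a, authority)))
        quality, authority = new_q, new_a
        if delta < tol:
            break
    return quality, authority


q, a = basic_model([[3, 0], [1, 1], [0, 2]])
print([round(x, 3) for x in q], [round(x, 3) for x in a])
```

The `max_iter` cap implements the "maximum iterations" alternative from step 3, and `tol` implements the "difference is very small" alternative.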

Evaluation

Uses a set of articles on countries that have been assigned quality labels by Wikipedia’s editorial team

Preprocessing: bot revisions were removed from the analysis. Consecutive edits by the same user were collapsed, and only the final edit was used.
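
The preprocessing step can be sketched as follows, assuming a revision history ordered oldest first and a known set of bot account names (both hypothetical inputs). Here bot revisions are dropped before consecutive edits are collapsed; the slide does not say in which order the two steps were applied.

```python
from typing import List, NamedTuple, Set


class Revision(NamedTuple):
    user: str
    text: str


def preprocess(history: List[Revision], bots: Set[str]) -> List[Revision]:
    """Drop bot revisions, then keep only the last edit of each run by the same user."""
    human = [r for r in history if r.user not in bots]
    kept: List[Revision] = []
    for rev in human:
        if kept and kept[-1].user == rev.user:
            # Same user edited again: the later edit replaces the earlier one.
            kept[-1] = rev
        else:
            kept.append(rev)
    return kept


history = [Revision("A", "v1"), Revision("A", "v2"), Revision("bot", "v3"),
           Revision("B", "v4"), Revision("A", "v5")]
print(preprocess(history, bots={"bot"}))
```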

Evaluation metrics

Normalized discounted cumulative gain at top k (NDCG@k): suited for ranked articles that have multiple levels of assessment

Spearman’s rank correlation: relevant for comparing the agreement between two rankings of the same set of objects
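
Both metrics can be sketched in plain Python. The NDCG@k sketch uses the common (2^rel - 1) / log2(position + 1) gain; the slides do not say which NDCG variant was used. Spearman's rho is computed as the Pearson correlation of average ranks, which handles ties such as several articles sharing one quality label.

```python
import math
from typing import List, Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain of the first k items, with position 1 discounted by log2(2)."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int) -> float:
    """relevances: graded labels of the articles, in the order the model ranked them."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for t in range(i, j + 1):
            ranks[order[t]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the two average-rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Quality labels of articles in model-ranked order, and model scores vs. labels.
print(ndcg_at_k([3, 3, 2, 0, 1], k=3))
print(spearman([0.9, 0.7, 0.4, 0.2], [3, 3, 1, 2]))
```

Spearman's rho is undefined when either input is constant, so this sketch would raise a division error in that case.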

Results

Conclusions

ProbReview works best with decay scheme 2 or 3.

Article length seems to be correlated with article quality.

Adding length to the Basic and PeerReview models showed some improvement, but ProbReview did not benefit.

Summary

A revision trust model may help address article trust, fragment trust and author trust

A dynamic Bayesian network is used to model the evolution of article trust over revisions

Wikipedia featured articles, clean-up articles and normal articles are used for evaluation

Results

Summary

Uses revision history as well as the reputation of the contributing authors

Assigns trust to text

Summary

Proposes the use of a trust tab in Wikipedia

Link ratio: the ratio between the number of citations and the number of non-cited occurrences of the encyclopedia term

Evaluation: compare link-ratio values for featured, normal and clean-up articles
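
The link ratio can be computed from raw counts as below. The hypothetical `count_occurrences` helper assumes the pages are available as wikitext, where a cited occurrence is a `[[term]]` or `[[term|label]]` link and any other exact, case-sensitive mention is non-cited; this markup-based counting rule is an assumption, not the paper's procedure.

```python
import math
import re
from typing import Iterable, Tuple


def count_occurrences(term: str, pages: Iterable[str]) -> Tuple[int, int]:
    """Count linked (cited) vs. plain (non-cited) mentions of term in wikitext pages."""
    link = re.compile(r"\[\[" + re.escape(term) + r"(?:\|[^\]]*)?\]\]")
    word = re.compile(r"\b" + re.escape(term) + r"\b")
    cited = non_cited = 0
    for page in pages:
        # Replace each link with a space so its target and label are not counted again.
        stripped, n_links = link.subn(" ", page)
        cited += n_links
        non_cited += len(word.findall(stripped))
    return cited, non_cited


def link_ratio(cited: int, non_cited: int) -> float:
    """Cited occurrences divided by non-cited ones; infinite if the term is always cited."""
    if non_cited == 0:
        return math.inf if cited > 0 else 0.0
    return cited / non_cited


pages = ["[[Paris]] is big. Paris is old.",
         "I like [[Paris|the capital]] and Paris."]
c, n = count_occurrences("Paris", pages)
print(c, n, link_ratio(c, n))
```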

Summary

Proposes a content-driven reputation system for authors

Authors gain reputation when their work is preserved by subsequent authors and lose reputation when their edits are undone or quickly rolled back

Evaluation: edits by low-reputation authors have a larger-than-average probability of being of poor quality, as judged by human observers, and of being undone by later editors
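
The gain/loss idea behind the content-driven reputation system can be sketched with a deliberately simple update rule. Each event records an author, the number of words an edit added, and how many of those words survived in later revisions (an undone edit has zero survivors). The per-word `gain` and `loss` constants, the starting reputation and the floor at zero are all hypothetical; the actual system's update rule is not described in the slides.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def update_reputation(events: Iterable[Tuple[str, int, int]],
                      gain: float = 0.1, loss: float = 0.2,
                      initial: float = 1.0) -> Dict[str, float]:
    """events: (author, words_added, words_surviving_later) for each edit, oldest first."""
    rep: DefaultDict[str, float] = defaultdict(lambda: initial)
    for author, added, surviving in events:
        removed = added - surviving
        # Preserved words raise reputation; removed (undone) words lower it, never below 0.
        rep[author] = max(0.0, rep[author] + gain * surviving - loss * removed)
    return dict(rep)


events = [("alice", 10, 10), ("bob", 10, 0), ("alice", 5, 4)]
print(update_reputation(events))
```

Setting `loss` higher than `gain` makes undone work cost more than preserved work earns, which is one simple way to penalize vandalism that is quickly rolled back.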

Summary

A different question: which articles are controversial?

Uses edit and collaboration history

Two models: Basic and Contributor Rank. The Contributor Rank model tries to differentiate between disputes caused by the article and disputes caused by the aggressiveness of the contributors; only the former is meant to be measured

Evaluation: identification of labeled controversial articles

Conclusions

Interesting area to work on

Different angles to consider, and different questions too

Data is easily available and has lots of relevant features

Articles classified by the Wikipedia editorial team help evaluation

Great scope for more work in this area

I want to look at this from the health perspective

Feb 29, 2008

Thank You