REPUTATION SYSTEMS FOR OPEN COLLABORATION
CACM 2010Bo Adler, Luca de Alfaro, Ashutosh Kulshreshtha, Ian Pye
Reviewed by: Minghao Yan
Reputation Systems 2
Introduction
• Open Collaboration:
  • egalitarian, meritocratic, self-organizing
  • efficient, but with challenges
    • quality: spam, vandalism
    • trust: how much can you rely on the content?
• Reputation Systems:
  • compute reputation scores for objects within a domain, based on the content itself or on external ratings
  • help stem abuse
  • offer indications of content quality
  • regulate people’s interactions in open collaboration
• Relevance to our course content:
  • recommendation systems
  • PageRank and HITS are “page” reputation systems
3/25/13
Content-driven vs. User-driven
content-driven reputation:
• automated content analysis
• derives feedback uniformly from an analysis of user actions
• can deliver results immediately
• algorithmic in nature, hard for users to understand and to trust
• examples: WikiTrust, Crowdsensus
user-driven reputation:
• explicit user feedback and ratings
• suffers from biased selections and unpredictable behaviors
• depends crucially on the availability of user feedback
• easy to understand and trust
• examples: eBay, Amazon
WikiTrust
• a reputation system for wiki authors and content
• goals:
  • incentivize users to make lasting contributions
  • help increase the quality of content and spot vandalism
  • offer a guide to the quality of content
• consists of:
  • a user reputation system
    • gain reputation: when a user’s edits are preserved by later revisions
    • lose reputation: when a user’s edits are undone by other users
  • a content reputation system
    • gain reputation: when the text is revised by a high-reputation user
    • lose reputation: when the text is disturbed by edits
User Reputation System
• assumptions:
  • a sequence of revisions is made by different authors
  • it is possible to compare two revisions and measure their difference
  • it is possible to track unchanged content across revisions
• user reputation reflects:
  • the quality and quantity of the contributions a user makes
• contribution quality:
  • good quality: the change is preserved in subsequent revisions
  • bad quality: the change is rolled back in subsequent revisions
  • how do we measure how good a contribution is?
Contribution Quality
• relies on an edit distance function d:
  • d(r, r') = how many words have been deleted, inserted, replaced, and displaced going from r to r'
  • language-independent
• notation: b is the current revision, a a past revision, c a future revision
• -1 ≤ q(b | a, c) ≤ 1
  • q(b | a, c) = 1: revision b fully preserved
  • q(b | a, c) = -1: revision b fully reverted
• unable to judge newly created revisions!
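The quality score can be made concrete with a small sketch. The slide gives only the bounds and endpoints of q(b | a, c); the formula below, q = (d(a, c) − d(b, c)) / d(a, b), is one definition that satisfies them (to the best of my knowledge it matches WikiTrust's, but treat it as an assumption here), and the difflib-based word distance is a rough stand-in for the paper's edit distance d:

```python
import difflib

def d(r1, r2):
    """Rough word-level edit distance: counts words deleted, inserted, or
    replaced between revisions r1 and r2 (a stand-in for WikiTrust's d,
    which also accounts for displaced blocks)."""
    sm = difflib.SequenceMatcher(a=r1.split(), b=r2.split())
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes()
               if op != "equal")

def quality(a, b, c):
    """q(b | a, c) in [-1, 1]: +1 if b's change survives into c, -1 if reverted."""
    if d(a, b) == 0:
        return 0.0  # revision b changed nothing relative to a
    q = (d(a, c) - d(b, c)) / d(a, b)
    return max(-1.0, min(1.0, q))  # clamp, since our d is only approximate
```

For example, if b inserts a word into a and the future revision c keeps it, q is 1.0; if c reverts back to a, q is -1.0.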
User Reputation
• only non-negative reputation values are considered
• a new user is assigned a reputation close to 0
• selecting the judging revisions:
  • 5 subsequent, 5 preceding, 2 previous revisions by high-reputation authors, and 2 previous revisions with high average text reputation
  • why? – to make the system difficult to subvert
• calculating user reputation:
  • r(B) = k * d(a, b) * q(b | a, c) * log(r(C))
  • r(B) is the reputation increment of author B of revision b
  • r(C) is the reputation of author C of revision c
  • why a logarithm? – it balances the influence of reputation contributions across users
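The update rule above can be sketched in a few lines. The scaling constant k, the guard before taking the logarithm, and the flooring at zero are assumptions of this sketch; the slide only gives the increment formula and states that reputations are non-negative:

```python
import math

def update_reputation(rep, B, C, d_ab, q):
    """Apply the increment r(B) = k * d(a, b) * q(b | a, c) * log(r(C)).
    rep maps authors to non-negative reputation values, as on the slide."""
    k = 0.1                          # scaling constant (assumed value)
    r_C = max(rep.get(C, 0.0), 1.0)  # guard: only take log of values >= 1 (assumption)
    increment = k * d_ab * q * math.log(r_C)
    rep[B] = max(0.0, rep.get(B, 0.0) + increment)  # reputations stay non-negative
    return rep[B]
```

A positive-quality edit judged by a high-reputation author C raises B's reputation; a reverted edit (q < 0) lowers it, floored at zero.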
User Reputation
• resistant to manipulation
  • the only way to damage someone’s reputation is to actually revert their revision
• maintains fairness and resists sybil attacks
  • the reputation of B is increased only if C has a higher reputation
  • sybil attack – creating fake identities to gain reputation
• evaluation
  • measures the ability of user reputation to predict the quality of future contributions
  • recall is high: high-reputation users are unlikely to be reverted
  • precision is low: many novice authors make good contributions
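The recall/precision claims can be made concrete with the usual definitions, treating "the author had low reputation" as the prediction that a revision will be reverted. The helper and the example numbers below are illustrative, not taken from the paper:

```python
def precision_recall(predicted, actual):
    """predicted: revisions flagged as bad (their author had low reputation);
    actual: revisions that really were reverted. Both are sets of ids."""
    tp = len(predicted & actual)  # correctly flagged revisions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

With hypothetical numbers, if 4 revisions came from low-reputation authors but only 2 were actually reverted, and no reverted revision came from a high-reputation author, precision is 0.5 while recall is 1.0, matching the pattern reported on the slide.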
Content Reputation
• informative, robust, explainable
• how? – based on how the content has been revised and on the reputation of the revision’s author
  • the edited part is assigned a small fraction of the author’s reputation
  • the unchanged part gains reputation
• tweaks
  • deleting or re-arranging text leaves a low-reputation mark
  • text reputation is raised only up to the author’s own reputation
  • each word is associated with the last few editing authors who raised its reputation
  • block moves are handled
  • edit distances are weighted
Crowdsensus
• a reputation system that analyzes user edits to Google Maps
• goals
  • measure the accuracy of users contributing information
  • reconstruct the likely correct listing information
• design space
  • relies on the existence of a ground truth
  • user reputation is not visible
  • the notion of identity is stronger
  • global computation is possible
Crowdsensus
• input
  • triples (u, a, v) – user u asserts that attribute a has value v
• structure – a fixpoint graph algorithm
  • vertices are users and attributes
  • for each (u, a, v), insert an edge valued v from u to a and back
  • each user vertex is associated with a truthfulness value q_u
• iterations
  • all q_u are initialized to an a-priori default
  • each user vertex sends its (q, v) pairs to the attribute vertices
  • an attribute inference algorithm derives a probability distribution over (v1, v2, ..., vn)
  • the probability that each v_i is correct is sent back to the user vertices
  • a truthfulness inference algorithm estimates the truthfulness of users
  • repeat for another iteration
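A minimal sketch of the fixpoint iteration described above. The weighted-vote attribute inference and the averaging truthfulness update are stand-ins of my own; the actual Crowdsensus inference algorithms are more sophisticated (the following slide notes they are not plain Bayesian inference):

```python
from collections import defaultdict

def crowdsensus(assertions, iters=10, default_q=0.5):
    """assertions: list of (user, attribute, value) triples, as on the slide."""
    q = defaultdict(lambda: default_q)  # per-user truthfulness, a-priori default
    belief = {}                         # attribute -> {value: probability it is correct}
    for _ in range(iters):
        # Attribute inference (assumed weighted vote): each user vertex sends
        # its (q, v) pairs; votes for each value are weighted by truthfulness.
        votes = defaultdict(lambda: defaultdict(float))
        for u, a, v in assertions:
            votes[a][v] += q[u]
        belief = {}
        for a, vs in votes.items():
            total = sum(vs.values())
            belief[a] = {v: w / total for v, w in vs.items()}
        # Truthfulness inference (assumed): a user's truthfulness is the
        # average probability of the values that user asserted.
        asserted = defaultdict(list)
        for u, a, v in assertions:
            asserted[u].append(belief[a][v])
        for u, probs in asserted.items():
            q[u] = sum(probs) / len(probs)
    return dict(q), belief
```

With two users asserting one phone number and a third asserting another, the majority value ends up with the higher probability and the agreeing users with the higher truthfulness, which is the self-reinforcing behavior the fixpoint loop is meant to produce.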
Crowdsensus
• the heart of Crowdsensus – the attribute inference algorithm
  • the standard choice would be Bayesian inference, which works poorly in real cases:
    • pieces of information are not independent
    • different business attributes have different characteristics
• the complete system also:
  • handles attributes with multiple correct values
  • deals with spam
  • protects the system from abuse
  • is integrated with other data pipeline components
Design Space
• content-driven vs. user-driven
• is the reputation system visible to users?
• weak identity vs. strong identity
• existence of ground truth
  • affects which algorithm is used
• chronological vs. global reputation updates
  • a global model can exploit information in the graph topology (PageRank, HITS)
  • a chronological model can leverage past and future revisions to prevent attacks (e.g., sybil attacks)
Design Space
WikiTrust:
• content-driven
• visible to users
• weak identity
• no ground truth
• chronological updates
Crowdsensus:
• content-driven
• not visible to users
• strong identity
• ground truth exists
• global updates
Conclusion
• reputation systems are the on-line equivalent of the body of laws that regulates interactions between people in the real world
• reputation systems give users ways to evaluate content and improve the level of trust
• the design of a reputation system should weigh the different aspects of the design space
• reputation systems should be robust and invulnerable to attacks (or there is no trust)
• future directions: reputation systems with a population-dynamics approach; reputation systems with multiple goals
Pros
• well-defined reputation system characteristics and goals
• discussion of design aspects and their influence on reputation systems
• detailed WikiTrust implementation tweaks for protecting the system from abuse and attacks
• the comparison of two content-driven systems is well illustrated and supports the discussion of design considerations
• good evaluation measures of system accuracy on real wiki data
Cons
• lacks a deeper explanation of the algorithms in Crowdsensus
• lacks evidence that the Crowdsensus algorithm outperforms standard Bayesian inference on real data
• lacks a comparison of the performance of user-driven and content-driven models, and of how the two could work together