REPUTATION SYSTEMS FOR OPEN COLLABORATION
CACM 2010Bo Adler, Luca de Alfaro, Ashutosh Kulshreshtha, Ian Pye
Reviewed by: Minghao Yan
Reputation Systems 2
Introduction
• Open Collaboration:
  • egalitarian, meritocratic, self-organizing
  • efficient, but with challenges
    • quality: spam, vandalism
    • trust: how much can you rely on the content?
• Reputation Systems:
  • compute reputation scores for objects within a domain, based on the content itself or on external ratings
  • help stem abuse
  • offer indications of content quality
  • regulate people’s interactions in open collaboration
• Relevance to our course content:
  • recommendation systems
  • PageRank and HITS are “page” reputation systems
3/25/13
Content-driven vs. User-driven
content-driven reputation:
• automated content analysis
• derives feedback uniformly from an analysis of user actions
• can deliver results immediately
• algorithmic in nature, hard for users to understand and to trust
• examples: WikiTrust, Crowdsensus
user-driven reputation:
• explicit user feedback and ratings
• suffers from biased selections and unpredictable behaviors
• depends crucially on the availability of user feedback
• easy to understand and trust
• examples: eBay, Amazon
WikiTrust
• a reputation system for wiki authors and content
• goals:
  • incentivize users to make lasting contributions
  • help increase the quality of content and spot vandalism
  • offer a guide to the quality of content
• consists of:
  • a user reputation system
    • gain reputation: when a user’s edits are preserved by later revisions
    • lose reputation: when a user’s edits are undone by other users
  • a content reputation system
    • gain reputation: when the text is revised by a high-reputation user
    • lose reputation: when the text is disturbed by edits
User Reputation System
• assumptions:
  • a sequence of revisions is made by different authors
  • it is possible to compare two revisions and measure their difference
  • it is possible to track unchanged content across revisions
• user reputation reflects:
  • the quality and quantity of the contributions a user makes
• contribution quality:
  • good quality: the change is preserved in subsequent revisions
  • bad quality: the change is rolled back in subsequent revisions
  • how do we measure how good a contribution is?
Contribution Quality
• relies on an edit distance function d:
  • d(r, r') = how many words have been deleted, inserted, replaced, and displaced going from r to r'
  • language-independent
• notation: b is the current revision, a a past revision, c a future revision
• -1 ≤ q(b | a, c) ≤ 1
  • q(b | a, c) = 1: revision b fully preserved
  • q(b | a, c) = -1: revision b fully reverted
• unable to judge newly created revisions!
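The quality score can be made concrete with a small sketch. The slide gives only the bounds and endpoints of q(b | a, c); the formula below, q = (d(a, c) − d(b, c)) / d(a, b), is one definition that satisfies them (to the best of my knowledge it matches WikiTrust's, but treat it as an assumption here), and the difflib-based word distance is a rough stand-in for the paper's edit distance d:

```python
import difflib

def d(r1, r2):
    """Rough word-level edit distance: counts words deleted, inserted, or
    replaced between revisions r1 and r2 (a stand-in for WikiTrust's d,
    which also accounts for displaced blocks)."""
    sm = difflib.SequenceMatcher(a=r1.split(), b=r2.split())
    return sum(max(i2 - i1, j2 - j1)
               for op, i1, i2, j1, j2 in sm.get_opcodes()
               if op != "equal")

def quality(a, b, c):
    """q(b | a, c) in [-1, 1]: +1 if b's change survives into c, -1 if reverted."""
    if d(a, b) == 0:
        return 0.0  # revision b changed nothing relative to a
    q = (d(a, c) - d(b, c)) / d(a, b)
    return max(-1.0, min(1.0, q))  # clamp, since our d is only approximate
```

For example, if b inserts a word into a and the future revision c keeps it, q is 1.0; if c reverts back to a, q is -1.0.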
User Reputation
• only non-negative reputation values are considered
• a new user is assigned a reputation close to 0
• selecting the judging revisions:
  • 5 subsequent, 5 preceding, 2 previous revisions by high-reputation authors, and 2 previous revisions with high average text reputation
  • why? – to make the system difficult to subvert
• calculating user reputation:
  • r(B) = k * d(a, b) * q(b | a, c) * log(r(C))
  • r(B) is the reputation increment of author B of revision b
  • r(C) is the reputation of author C of revision c
  • why a logarithm? – it balances the influence of reputation contributions across users
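The update rule above can be sketched in a few lines. The scaling constant k, the guard before taking the logarithm, and the flooring at zero are assumptions of this sketch; the slide only gives the increment formula and states that reputations are non-negative:

```python
import math

def update_reputation(rep, B, C, d_ab, q):
    """Apply the increment r(B) = k * d(a, b) * q(b | a, c) * log(r(C)).
    rep maps authors to non-negative reputation values, as on the slide."""
    k = 0.1                          # scaling constant (assumed value)
    r_C = max(rep.get(C, 0.0), 1.0)  # guard: only take log of values >= 1 (assumption)
    increment = k * d_ab * q * math.log(r_C)
    rep[B] = max(0.0, rep.get(B, 0.0) + increment)  # reputations stay non-negative
    return rep[B]
```

A positive-quality edit judged by a high-reputation author C raises B's reputation; a reverted edit (q < 0) lowers it, floored at zero.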
User Reputation
• resistant to manipulation
  • the only way to damage someone’s reputation is to actually revert their revision
• maintains fairness and resists sybil attacks
  • the reputation of B is increased only if C has a higher reputation
  • sybil attack – creating fake identities to gain reputation
• evaluation
  • measures the ability of user reputation to predict the quality of future contributions
  • recall is high: high-reputation users are unlikely to be reverted
  • precision is low: many novice authors make good contributions
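The recall/precision claims can be made concrete with the usual definitions, treating "the author had low reputation" as the prediction that a revision will be reverted. The helper and the example numbers below are illustrative, not taken from the paper:

```python
def precision_recall(predicted, actual):
    """predicted: revisions flagged as bad (their author had low reputation);
    actual: revisions that really were reverted. Both are sets of ids."""
    tp = len(predicted & actual)  # correctly flagged revisions
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    return precision, recall
```

With hypothetical numbers, if 4 revisions came from low-reputation authors but only 2 were actually reverted, and no reverted revision came from a high-reputation author, precision is 0.5 while recall is 1.0, matching the pattern reported on the slide.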
Content Reputation
• informative, robust, explainable
• how? – based on how the content has been revised and on the reputation of the revision’s author
  • the edited part is assigned a small fraction of the author’s reputation
  • the unchanged part gains reputation
• tweaks
  • deleting or re-arranging text leaves a low-reputation mark
  • text reputation is raised only up to the author’s own reputation
  • each word is associated with the last few editing authors who raised its reputation
  • block moves are handled
  • edit distances are weighted
Crowdsensus
• a reputation system that analyzes user edits to Google Maps
• goals
  • measure the accuracy of users contributing information
  • reconstruct the likely correct listing information
• design space
  • relies on the existence of a ground truth
  • user reputation is not visible
  • the notion of identity is stronger
  • global computation is possible
Crowdsensus
• input
  • triples (u, a, v) – user u asserts that attribute a has value v
• structure – a fixpoint graph algorithm
  • vertices are users and attributes
  • for each (u, a, v), insert an edge valued v from u to a and back
  • each user vertex is associated with a truthfulness value q_u
• iterations
  • all q_u are initialized to an a-priori default
  • each user vertex sends its (q, v) pairs to the attribute vertices
  • an attribute inference algorithm derives a probability distribution over (v1, v2, ..., vn)
  • the probability that each v_i is correct is sent back to the user vertices
  • a truthfulness inference algorithm estimates the truthfulness of users
  • repeat for another iteration
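A minimal sketch of the fixpoint iteration described above. The weighted-vote attribute inference and the averaging truthfulness update are stand-ins of my own; the actual Crowdsensus inference algorithms are more sophisticated (the following slide notes they are not plain Bayesian inference):

```python
from collections import defaultdict

def crowdsensus(assertions, iters=10, default_q=0.5):
    """assertions: list of (user, attribute, value) triples, as on the slide."""
    q = defaultdict(lambda: default_q)  # per-user truthfulness, a-priori default
    belief = {}                         # attribute -> {value: probability it is correct}
    for _ in range(iters):
        # Attribute inference (assumed weighted vote): each user vertex sends
        # its (q, v) pairs; votes for each value are weighted by truthfulness.
        votes = defaultdict(lambda: defaultdict(float))
        for u, a, v in assertions:
            votes[a][v] += q[u]
        belief = {}
        for a, vs in votes.items():
            total = sum(vs.values())
            belief[a] = {v: w / total for v, w in vs.items()}
        # Truthfulness inference (assumed): a user's truthfulness is the
        # average probability of the values that user asserted.
        asserted = defaultdict(list)
        for u, a, v in assertions:
            asserted[u].append(belief[a][v])
        for u, probs in asserted.items():
            q[u] = sum(probs) / len(probs)
    return dict(q), belief
```

With two users asserting one phone number and a third asserting another, the majority value ends up with the higher probability and the agreeing users with the higher truthfulness, which is the self-reinforcing behavior the fixpoint loop is meant to produce.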
Crowdsensus
• the heart of Crowdsensus – the attribute inference algorithm
  • the standard choice would be Bayesian inference, which works poorly in real cases:
    • pieces of information are not independent
    • different business attributes have different characteristics
• the complete system also:
  • handles attributes with multiple correct values
  • deals with spam
  • protects the system from abuse
  • is integrated with other data pipeline components
Design Space
• content-driven vs. user-driven
• is the reputation system visible to users?
• weak identity vs. strong identity
• existence of ground truth
  • affects which algorithm is used
• chronological vs. global reputation updates
  • a global model can exploit information in the graph topology (PageRank, HITS)
  • a chronological model can leverage past and future revisions to prevent attacks (e.g., sybil attacks)
Design Space
WikiTrust:
• content-driven
• visible to users
• weak identity
• no ground truth
• chronological updates
Crowdsensus:
• content-driven
• not visible to users
• strong identity
• ground truth exists
• global updates
Conclusion
• reputation systems are the on-line equivalent of the body of laws that regulates interactions between people in the real world
• reputation systems give users ways to evaluate content and improve the level of trust
• the design of a reputation system should weigh the different aspects of the design space
• reputation systems should be robust and invulnerable to attacks (or there is no trust)
• future directions: reputation systems with a population-dynamics approach; reputation systems with multiple goals
Pros
• well-defined reputation system characteristics and goals
• discussion of design aspects and their influence on reputation systems
• detailed WikiTrust implementation tweaks for protecting the system from abuse and attacks
• the comparison of two content-driven systems is well illustrated and supports the discussion of design considerations
• good evaluation measures of system accuracy on real wiki data
Cons
• lacks a deeper explanation of the algorithms in Crowdsensus
• lacks evidence that the Crowdsensus algorithm outperforms standard Bayesian inference on real data
• lacks a comparison of the performance of user-driven and content-driven models, and of how the two could work together