Can Peer-to-Peer File-sharing be of Help for Research Communities?
Julita Vassileva
Computer Science Department (MADMUC Lab)
University of Saskatchewan
Outline
• Motivation
• Problems: user participation, trust
• Motivating user participation
  – User modelling
  – Reward with better QoS
  – Social awareness (visualization)
• Ensuring trust
• Conclusions
Motivation
• Need a search engine for locally stored papers
  – Web: links disappear, protected sites
  – Hard disks too large
• Why P2P?
  – Harvest the resources of a community of users
  – Advantages of a distributed approach vs. a centralized one
COMTELLA
• A P2P (Gnutella-based) system for file sharing and service
  – Users share academic papers, code snippets
• Non-centralized digital library for a research group / class
• Can be downloaded from: http://bistrica.usask.ca/madmuc/news.htm
Christopher Cox: NSERC Summer 2002 project
Helen Bretzke: CRA-W and NSERC Summer 2002 project
Lingling Sun: graduate student
Yamini Upadrashta: graduate student
Vassileva, J. (2002) Supporting Peer-to-Peer User Communities, in R. Meersman, Z. Tari et al. (Eds.) "On the Move to Meaningful Internet Systems 2002: CoopIS, DOA, and ODBASE", Coordinated International Conferences Proceedings, Irvine, Springer LNCS 2519, 230-247.
Vassileva, J. (2002) Motivating Participation in Peer to Peer Communities, in P. Petta, R. Tolksdorf, F. Zambonelli (Eds.) Engineering Societies in the Agents World III, Proceedings of the 3rd International Workshop ESAW'02, Madrid, Springer LNAI 2577, 141-155.
Bretzke, H., Vassileva, J. (2003) Motivating Cooperation in Peer to Peer Networks, in P. Brusilovsky, A. Corbett, F. De Rosis (Eds.) Proceedings of the 9th International Conference on User Modelling, UM03, Johnstown, PA, Springer LNCS, 218-227.
Problems
• User participation
  – “critical mass” needed
  – most users are free-riders
  – why do people contribute?
    • satisfies a need (is useful)
    • doesn’t cost (effort, money, inconvenience)
    • there is some incentive (money, glory, power)
    • serves a greater cause (e.g. cancer research, SETI@home, etc.)
• Trust
  – sure that contributing won’t cause harm
  – able to identify trustworthy peers
First condition: system must be useful
• Allow searching own files
  – Any file stored on disk can be found with Comtella
  – Shared files can be stored anywhere on disk
• Integration with other tools
  – With browser (e.g. IE, Netscape, Mozilla, etc.)
    • allows viewing files directly from Comtella
    • prompts the user to share papers when a PDF file is opened
  – With word processor (e.g. MS Word)
    • generates lists of references automatically
• Additional functionality
  – Adding annotations and ratings to papers
Levels of participation
• Bring new files
• Provide disk space / processor time
• Dispatch requests
• Stay on-line
• Use and quit
How to motivate participation?
Why do people offer their time and resources? Different people have different motivations:
• Altruistic: some are altruists
• Socially motivated: some would help their friends and hope to make new friends through helping
• Materialistic: some seek glory, some seek high marks, or money…
• Utilitarian: some would expect better service
Incentives
Micro-payments for each transaction?
Shirky says it won’t ever work (e.g. Mojo Nation); flat rates work better (e.g. Internet, cable)
How to map virtual currency into real money?
How to motivate participation?
Why do people offer their time and resources? Different people have different motivations:
• Altruistic: some are altruists (for the cause)
• Socially motivated: some would help their friends and hope to make new friends through helping
• Materialistic: some seek glory, some seek high marks, or money…
• Utilitarian: some would expect better service
Know your user!
• User Type: Altruist? Socialist? Utilitarian?
• User Interests: What does she search / need?
• User Relationships and Community: Who shares interest with the user? Potential “friends” and “foes”.
Modelling
Modelling user interests
• Define a taxonomy of subject categories (e.g. ACM subject index)
• Keep track of the categories of queries (user interests)
• Keep track of resources offered by the user in each interest category
• Update user level of interest in each sub-category using reinforcement learning
• Cluster users in interest-based groups
Computing user interests
• Reinforcement learning
The user’s strength of interest S in an area a is calculated based on how frequently and how recently the user has searched in this area:
$S_a(e_t, t) = i \cdot S_a(e_{t-1}, t-1) + (1 - i) \cdot e_t$
where $e_t \in [0, 1]$ is calculated as $e_t = 1/d$, and $d = 1 + \mathrm{level\_distance}$, the distance between the level of the sub-area of the query and the level of the area a in the ontology hierarchy. Currently, the ontology hierarchy has only 2 levels, so $e_t = 0.5$.
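The interest update above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name the weight i or give its value, so the parameter name `inertia` and its default of 0.7 are hypothetical, as are the function names.

```python
def evidence(level_distance: int) -> float:
    """Evidence e_t = 1 / d, with d = 1 + level_distance (as on the slide)."""
    if level_distance < 0:
        raise ValueError("level_distance must be non-negative")
    return 1.0 / (1 + level_distance)


def update_interest(previous: float, e_t: float, inertia: float = 0.7) -> float:
    """One step of S_a(t) = i * S_a(t-1) + (1 - i) * e_t.

    `inertia` stands for the weight i; 0.7 is an assumed value.
    """
    if not 0.0 <= inertia <= 1.0:
        raise ValueError("inertia must lie in [0, 1]")
    return inertia * previous + (1.0 - inertia) * e_t


# Three consecutive queries in a sub-area one level below area a.
strength = 0.0
for _ in range(3):
    strength = update_interest(strength, evidence(1))
```

Repeated queries in the same area pull the strength towards the evidence value, while older evidence decays geometrically with the weight i.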
Modelling user relationships
• Monitor whose files the user chooses, the quality of the files (does the user keep the files?), and who downloads files offered by the user
• Represent each user relationship, for each area of interest:
  – Strength: how successful the service given was (reinforcement learning used, similar to user interests)
  – Balance: reciprocity of services used / given
• Adapt P2P topology: form a neighborhood for search using the best relationships (“friends”) in the area of search
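As a rough illustration of forming a search neighborhood from the best relationships, the sketch below picks the strongest peers within one area of interest. The `Relationship` record, the `neighborhood` function, and the neighborhood size are hypothetical; the slides do not say how Comtella ranks or limits friends.

```python
from dataclasses import dataclass


@dataclass
class Relationship:
    peer: str        # identifier of the other peer
    area: str        # area of interest the relationship refers to
    strength: float  # how successful past service was
    balance: float   # reciprocity of services used / given, in [-1, 1]


def neighborhood(relationships, area, size=3):
    """Return up to `size` peers with the strongest relationships in `area`."""
    in_area = [r for r in relationships if r.area == area]
    ranked = sorted(in_area, key=lambda r: r.strength, reverse=True)
    return [r.peer for r in ranked[:size]]
```

A query in a given area would then be sent first to the peers returned by `neighborhood(...)` rather than to arbitrary neighbors.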
Computing the balance of a relationship
• $B_{XY} = (N_{XY} - N_{YX}) / (N_{XY} + N_{YX})$, with $B_{XY} \in [-1, 1]$
$N_{XY}$: number of times X took from Y
$N_{YX}$: number of times Y took from X
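A minimal sketch of the balance computation follows. The function name is hypothetical, and returning 0 when the two peers have never exchanged files is an assumption, since the slide does not cover that case.

```python
def balance(n_xy: int, n_yx: int) -> float:
    """B_XY = (N_XY - N_YX) / (N_XY + N_YX).

    n_xy: number of times X took from Y
    n_yx: number of times Y took from X
    """
    total = n_xy + n_yx
    if total == 0:
        return 0.0  # assumption: no exchanges yet counts as balanced
    return (n_xy - n_yx) / total
```

For example, if X took three files from Y and Y took one from X, `balance(3, 1)` gives 0.5; if only Y took files, the balance is -1.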
Modelling user type
• Monitor user’s actions regarding file sharing, relative time spent on-line, acts of interrupting service, total balance of user’s giving / taking
• Update a number in [-1, 1] representing user’s cooperativeness
• Motivational actions in the interface triggered by passing certain thresholds
Computing user type
• The measure of user cooperativeness at time t:
$C(w_t, t) = i \cdot C(w_{t-1}, t-1) + (1 - i) \cdot w_t$
where $w \in [-1, 0) \cup (0, 1]$ represents the weight of evidence; $w < 0$ is a selfish act while $w > 0$ is an altruistic act.
$\mathrm{overallBalance} = (1/n) \sum_Y B_{XY}$
$\mathrm{userType} = (C(w_t, t) + \mathrm{overallBalance}) / 2$
If userType is in [-1, -0.5) then the user is selfish, if it is in [-0.5, 0.5] then the user is reciprocal, and if it is in (0.5, 1] then the user is altruistic.
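The classification can be sketched as below. The function names and the default weight of 0.7 for i are assumptions, as is treating a user with no relationships as having an overall balance of 0. The sketch follows the slide literally in adding overallBalance to cooperativeness; the slides do not spell out how the sign of the balance is meant to feed into the score.

```python
def update_cooperativeness(previous: float, w_t: float, inertia: float = 0.7) -> float:
    """C(t) = i * C(t-1) + (1 - i) * w_t, with w_t in [-1, 0) or (0, 1]."""
    if w_t == 0 or not -1.0 <= w_t <= 1.0:
        raise ValueError("w_t must be non-zero and lie in [-1, 1]")
    return inertia * previous + (1.0 - inertia) * w_t


def overall_balance(balances) -> float:
    """Mean of the user's relationship balances (0.0 if there are none)."""
    balances = list(balances)
    if not balances:
        return 0.0  # assumption: no relationships counts as neutral
    return sum(balances) / len(balances)


def user_type(cooperativeness: float, balances) -> str:
    """Map (C + overallBalance) / 2 onto the three types from the slide."""
    score = (cooperativeness + overall_balance(balances)) / 2
    if score < -0.5:
        return "selfish"
    if score <= 0.5:
        return "reciprocal"
    return "altruistic"
```

The thresholds at -0.5 and 0.5 are the ones on the slide; the interface could trigger its motivational actions whenever `user_type` changes.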
Rewarding relationships
• People who share a lot of useful files and behave cooperatively will have more friends
• Friends are treated differently
  – Transfers not interrupted
  – Queries processed with priority
  – Queries are propagated farther
• Queries sent to friends in the area
  – Higher chance of having relevant files
  – Faster responses
  – Better quality of files
• People with more friends get better Quality of Service!
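One way to picture "queries processed with priority" and "propagated farther" is a priority queue in which queries from friends are served first and receive a larger hop limit. This is a sketch only: the `QueryQueue` class and the TTL values are hypothetical and are not taken from Comtella.

```python
import heapq
from itertools import count

BASE_TTL = 4          # assumed hop limit for ordinary queries
FRIEND_TTL_BONUS = 2  # assumed extra hops for queries from friends


class QueryQueue:
    """Serve queries from friends before others, in arrival order within each group."""

    def __init__(self, friends):
        self._friends = set(friends)
        self._heap = []
        self._arrival = count()  # tie-breaker that preserves arrival order

    def submit(self, sender: str, query: str) -> None:
        is_friend = sender in self._friends
        priority = 0 if is_friend else 1
        ttl = BASE_TTL + (FRIEND_TTL_BONUS if is_friend else 0)
        heapq.heappush(self._heap, (priority, next(self._arrival), sender, query, ttl))

    def next_query(self):
        """Pop the highest-priority query as (sender, query, ttl)."""
        _, _, sender, query, ttl = heapq.heappop(self._heap)
        return sender, query, ttl
```

With `QueryQueue({"bob"})`, a query from "bob" submitted after one from a stranger is still returned first, and with the larger hop limit.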
Evaluation results - simulation
Comparing the round trip time obtained for queries without a friends’ list with the round trip time for queries with a friends’ list
[Figure: round trip time (ms) for simulation runs 1 to 3, with list vs. without list]
Evaluation results – user experiment
[Figure: average query elapsed time vs. total size of files shared in bytes; 8 users over a week]
Summary of results
• The simulation results show that peers obtain results faster when searching for files in categories for which they have friends
• The user evaluation is still underway: does the QoS reward motivate participation?
Social awareness
In cities, the sidewalks provide the right kinds and numbers of interactions from which neighborhoods emerge.
In isolation, selfishness is logical
To gain perspective, users need feedback about their social environment
A matter of scale
An astronomical metaphor
• Provides visual feedback
• Resolves scale
• Attractive & interesting
Views of the community
• Connectivity (currently reachable peers)
• Ranking of peers by contribution
  – number of shared files
  – balance of relationships
• Papers shared by each peer
• Interests of each peer
Architecture
Introducing a non-vital server or many servers
Server:
• Collect info. from peers
• Generate community views
Personalized views
• Who are my friends in this area?
• How strong is my relationship with them?
• How much have they contributed?
• Do I owe them or do they owe me?
• Which files do they share?
• What have they been searching for / downloading recently?
Trust
• We already model the strength of relationships between users
  – Based on counting the number of downloads / uploads
  – We can incorporate an explicit measure of the quality of the resource
• Idea: let users
  – Rate their resources (quality of paper)
  – Add annotations (summaries) of papers
Immediate benefit
• Learning effect: compiling reviews of articles
• Visualization of document ranking in a given category of interest: “top 10 list”
Professor / boss will know who has read and annotated a paper and who has not; this could have a motivating effect on participation.
Reputation
• Global reputation of peers can be computed
  – Ranking of peers based on
    • how many highly rated papers they share
    • how many times they have introduced a new paper in the system that has become highly rated
    • how the users’ ratings correlate with those of their peers and with high-rank peers
  – Emergence of “power peers”:
    • What extra rights will they have (reward)?
    • Could have a motivational effect, as in Slashdot.com
Community views
• Connectivity (currently reachable peers)
• What are these peers interested in / sharing
• Ranking of peers by contribution
• Shared interest clusters
• Personalized views (who are my friends?)
• Ranking of resources (papers)
• Reputation of peers
Updating trust in peers
• Relationships: subjective trust in the source of the paper (the other peer)
  – Trust depends on the evaluation criteria of the peer
    • Compare own rating of paper with the rating given by the source
    • If ratings are sufficiently close, increase trust in source, else decrease trust
  – Trust depends on category of interest
• Combined trust measures for peers?
• Peers share their trust measures (gossip)
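The rating comparison described on this slide could look roughly like the sketch below. The slides only say "sufficiently close", so the tolerance, the step size, the [0, 1] trust range, the neutral starting value of 0.5, and all names are assumptions made for illustration.

```python
from collections import defaultdict


def update_trust(trust: float, own_rating: float, source_rating: float,
                 tolerance: float = 1.0, step: float = 0.1) -> float:
    """Raise trust if the two ratings are within `tolerance`, else lower it.

    The result is clamped to [0, 1]; tolerance and step are assumed values.
    """
    if abs(own_rating - source_rating) <= tolerance:
        trust += step
    else:
        trust -= step
    return max(0.0, min(1.0, trust))


# Trust is kept per (peer, category), since it depends on the category of interest.
trust_table = defaultdict(lambda: 0.5)  # assumed neutral starting trust


def record_rating(peer: str, category: str, own_rating: float, source_rating: float) -> float:
    """Update and return the trust in `peer` for papers in `category`."""
    key = (peer, category)
    trust_table[key] = update_trust(trust_table[key], own_rating, source_rating)
    return trust_table[key]
```

Because the table is keyed by category, a peer can be trusted for papers in one subject area and not in another.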
Trust and reputation
Yao Wang, Ph.D. student
Wang Y., Vassileva J. (to appear) Bayesian Network-Based Trust Model, Proc. of IEEE/WIC International Conference on Web Intelligence (WI 2003), October 13-17, 2003, Halifax, Canada (best paper award nominee).
Applying a Bayesian network trust model to COMTELLA
[Diagram: Bayesian network with node T and nodes File quality, Paper category (subject area), Paper rating, and Reliability (download)]
Future work
• Incorporating a trust & reputation mechanism into Comtella:
  – to protect from malicious file-sharers
  – to ensure that users share papers with appropriate peers and benefit most from their articles and comments
“Take-home” messages
• Motivating user participation is crucial
  – Encouraging contribution, building relationships
  – Rewards by better quality of service, reputation / visibility
  – Techniques:
    • Modeling user interests, relationships, user type
    • Creating community awareness through visualization
• Building in mechanisms for trust and reputation
  – Will allow users to find reputable sources
  – May protect community from malicious or irresponsible peers