CHI2007 talk on Conflicts in Wikipedia


Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed H. Chi. He Says, She Says: Conflict and Coordination in Wikipedia. In Proc. of ACM Conference on Human Factors in Computing Systems (CHI2007), pp. 453-462, April 2007. ACM Press. San Jose, CA. http://www-users.cs.umn.edu/~echi/papers/2007-CHI/2007-Wikipedia-coordination-PARC-CHI2007.pdf


He Says, She Says: Conflict and Coordination in Wikipedia

Aniket Kittur, Bongwon Suh, Bryan Pendleton, Ed Chi

UCLA

Augmented Social Cognition Group

Palo Alto Research Center

What is Wikipedia?

“Wikipedia is the best thing ever. Anyone in the world can write anything they want about any subject, so you know you’re getting the best possible information.”

– Steve Carell, The Office

Spreading conflict


Policy and procedure

“The degree of success that one meets in dealing with conflicts... often depends on the efficiency with which one can quote policy and precedent.”

– Wikipedia admin (survey data)

Collaborative work beneath the surface

• Visitors only look at article pages

• But much of Wikipedia consists of other pages
– Conflict resolution, coordination, policies and procedures

Characterizing coordination and conflict


Exponential growth

Costs of growth

• Increase in conflict and coordination costs
– Software development (Boehm, 1981; Brooks, 1975)
– MUDs/MOOs (Curtis, 1992; Dibbell, 1993)
– Mailing lists (Sproull & Kiesler, 1991)

• How has growth affected Wikipedia?
– Millions of new users and articles

Infrastructure

• Analyze entire history of Wikipedia
– Every edit to every article

• Large amount of data (as of June 2006)
– 4+ million pages
– 58+ million revisions
– 800+ GB

• Distributed processing
– Hadoop distributed filesystem
– Map/reduce to process data in parallel
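The map/reduce step can be illustrated with a minimal, single-process Python sketch. It is not Hadoop code: the revision tuples, the namespace labels and the `run_job` helper are hypothetical stand-ins, and the in-memory "shuffle" only mimics what the framework does across machines. The job counts edits per (year, namespace), the kind of tally behind the edit-proportion trends on the following slides.

```python
from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator

# Hypothetical revision record: (ISO timestamp, namespace label).
Revision = tuple[str, str]
Key = tuple[str, str]  # (year, namespace)


def map_revision(rev: Revision) -> Iterator[tuple[Key, int]]:
    """Map step: emit ((year, namespace), 1) for one revision."""
    timestamp, namespace = rev
    yield (timestamp[:4], namespace), 1


def reduce_counts(key: Key, values: Iterable[int]) -> tuple[Key, int]:
    """Reduce step: sum all counts emitted for one key."""
    return key, sum(values)


def run_job(shards: list[list[Revision]]) -> dict[Key, int]:
    """Run map, shuffle and reduce over in-memory shards of the history."""
    # In Hadoop, each shard would be a block of the dump mapped in parallel.
    mapped = chain.from_iterable(
        map_revision(rev) for shard in shards for rev in shard
    )
    # Shuffle: group intermediate values by key.
    groups: dict[Key, list[int]] = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())
```

For example, two shards holding three 2005 revisions (two article edits, one user-talk edit) and one 2006 article edit reduce to `{("2005", "Article"): 2, ("2005", "User talk"): 1, ("2006", "Article"): 1}`.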

Types of work

Direct work: immediately consumable (article)

Indirect work: coordination, conflict (article talk, user, procedure)

Maintenance work: reverts, vandalism

Less direct work

• Decrease in proportion of edits to article page

[Figure: proportion of edits to article pages by year, 2001-2006 (y-axis: edit proportion)]

70%

More indirect work

• Increase in proportion of edits to user talk

[Figure: edit proportion by year, 2001-2006]

8%

More indirect work

• Increase in proportion of edits to user talk

• Increase in proportion of edits to procedure

[Figure: edit proportion by year, 2001-2006]

11%

More maintenance work

• Increase in proportion of edits that are reverts

[Figure: proportion of edits that are reverts by year, 2001-2006 (y-axis: edit proportion)]

7%
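One common way to flag reverts, assumed here rather than taken from the talk, is to treat a revision whose full text is identical to an earlier, non-adjacent revision as an "identity revert". The sketch below hashes each revision's text with SHA-256 and reports the share of a page's revisions that are flagged; `detect_reverts` and `revert_proportion` are hypothetical helpers.

```python
import hashlib


def detect_reverts(texts: list[str]) -> list[bool]:
    """Flag revision i as an identity revert if its text matches an earlier,
    non-adjacent revision (so at least one intervening edit was undone)."""
    last_seen: dict[str, int] = {}
    flags: list[bool] = []
    for i, text in enumerate(texts):
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        j = last_seen.get(digest)
        # j == i - 1 would be a null edit (text unchanged), not a revert.
        flags.append(j is not None and j < i - 1)
        last_seen[digest] = i
    return flags


def revert_proportion(texts: list[str]) -> float:
    """Share of a page's revisions that are identity reverts."""
    flags = detect_reverts(texts)
    return sum(flags) / len(flags) if flags else 0.0
```

Hashing keeps the lookup table small when pages have thousands of long revisions; partial reverts (undoing only part of an edit) are not caught by this approach.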

More wasted work

• Increase in proportion of edits that are reverts

• Increase in proportion of edits reverting vandalism

[Figure: edit proportion by year, 2001-2005]

1-2%

Global level

• Conflict and coordination costs are growing
– Less direct work (articles)
– More indirect work (article talk, user, procedure)
– More maintenance work (reverts, vandalism)

[Figure: percentage of total edits by page type, 2001-2006; series: Article, User, Article Talk, User Talk, Other, Maintenance]
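Given per-year edit counts by page type, each percentage in this kind of chart is that type's share of the year's total edits. A minimal sketch follows; the function name and the toy counts below are illustrative, not data from the study.

```python
from collections import defaultdict


def proportions_by_year(
    counts: dict[tuple[str, str], int]
) -> dict[str, dict[str, float]]:
    """Convert {(year, page_type): edits} into {year: {page_type: share}}."""
    totals: dict[str, int] = defaultdict(int)
    for (year, _page_type), n in counts.items():
        totals[year] += n
    shares: dict[str, dict[str, float]] = defaultdict(dict)
    for (year, page_type), n in counts.items():
        shares[year][page_type] = n / totals[year]
    return {year: dict(by_type) for year, by_type in shares.items()}
```

With toy counts of 6 article edits, 3 user-talk edits and 1 maintenance edit in one year, the article share is 0.6.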

Characterizing coordination and conflict

Conflict at the article level

• What defines conflict in articles?

• Build a characterization model of article conflict
– Identify page features and metrics associated with conflict
– Automatically identify high-conflict articles

Page metrics

• Chose metrics for identifying conflict in articles
– Easily computable, scalable

Metric type                      Page type
Revisions (#)                    Article, talk, article/talk
Page length                      Article, talk, article/talk
Unique editors                   Article, talk, article/talk
Unique editors / revisions       Article, talk
Links from other articles        Article, talk
Links to other articles          Article, talk
Anonymous edits (#, %)           Article, talk
Administrator edits (#, %)       Article, talk
Minor edits (#, %)               Article, talk
Reverts (#, by unique editors)   Article
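Most of these metrics reduce to simple counts over a page's revision history, which is what makes them cheap to compute at scale. The sketch below assumes a hypothetical `Rev` record with the editor name, anonymity, minor-edit and revert flags already extracted; page length and link counts need the page text, and administrator edits need a list of admins, so those are omitted. Reading "Reverts (#, by unique editors)" as the number of reverts plus the number of distinct editors who made them is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class Rev:
    """Hypothetical per-revision record extracted from the history dump."""
    editor: str       # username, or IP address for an anonymous edit
    anonymous: bool
    minor: bool
    is_revert: bool


def page_metrics(revs: list[Rev]) -> dict[str, float]:
    """Compute count-based conflict metrics for one page's history."""
    n = len(revs)
    editors = {r.editor for r in revs}
    anon = sum(r.anonymous for r in revs)
    minor = sum(r.minor for r in revs)
    reverts = [r for r in revs if r.is_revert]

    def pct(k: int) -> float:
        return 100.0 * k / n if n else 0.0

    return {
        "revisions": n,
        "unique_editors": len(editors),
        "unique_editors_per_revision": len(editors) / n if n else 0.0,
        "anonymous_edits": anon,
        "anonymous_pct": pct(anon),
        "minor_edits": minor,
        "minor_pct": pct(minor),
        "reverts": len(reverts),
        "reverting_editors": len({r.editor for r in reverts}),
    }
```

Computing the same dictionary separately for an article and its talk page gives the "article, talk" columns; ratios between the two give the "article/talk" variants.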

Defining conflict

• Operational definition for conflict
– Revisions tagged controversial
– Conflict revision count
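The conflict revision count can be sketched as the number of revisions of a page whose text carries the controversial tag. Matching a `{{controversial` template prefix case-insensitively is an assumption made for illustration; the exact tag the study matched is not given here.

```python
import re

# Assumption: the tag appears as a {{controversial...}} template in the text.
CONTROVERSIAL_TAG = re.compile(r"\{\{\s*controversial\b", re.IGNORECASE)


def conflict_revision_count(revision_texts: list[str]) -> int:
    """Count revisions whose text contains the controversial tag."""
    return sum(1 for text in revision_texts if CONTROVERSIAL_TAG.search(text))
```

Because the count grows with every tagged revision, a page that stays tagged through a long dispute scores higher than one tagged briefly, which is what lets the model predict "how much" conflict rather than a yes/no label.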

Machine learning

• Predict conflict from page metrics
– Training set of “controversial” pages
– Support vector machine regression predicting # controversial revisions (SMOreg; Smola & Schölkopf, 1998)

• Not just conflict/no conflict, but how much conflict

Performance: Cross-validation

• 5x cross-validation, R² = 0.897

[Figure: actual vs. predicted controversial revisions (both axes 0-10,000)]
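One way to produce a predicted-versus-actual comparison like this is to pool the out-of-fold predictions from k-fold cross-validation and score them with R². The sketch reads "5x" as 5-fold, which is an assumption. To stay dependency-free it swaps in a one-feature least-squares line (`fit_line`) for the SMOreg support vector regression used in the study; any function with the same `fit(xs, ys) -> model` shape could be plugged in instead. All names are hypothetical.

```python
import random
from typing import Callable, Sequence

Model = Callable[[float], float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Model:
    """Stand-in regressor: one-feature ordinary least squares (not SMOreg)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot (actual must vary)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def cross_validated_r2(
    xs: Sequence[float],
    ys: Sequence[float],
    k: int = 5,
    fit: Callable[[Sequence[float], Sequence[float]], Model] = fit_line,
    seed: int = 0,
) -> float:
    """Train on k-1 folds, predict the held-out fold, then score the
    pooled out-of-fold predictions with R^2."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    actual: list[float] = []
    predicted: list[float] = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            actual.append(ys[i])
            predicted.append(model(xs[i]))
    return r_squared(actual, predicted)
```

Every prediction is made by a model that never saw that page during training, so the resulting R² estimates how well the model generalizes rather than how well it memorizes.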


Determinants of conflict

Highly weighted metrics of conflict model:

1. — Revisions (talk)
2. — Minor edits (talk)
3. ˜ Unique editors (talk)
4. — Revisions (article)
5. ˜ Unique editors (article)
6. — Anonymous edits (talk)
7. ˜ Anonymous edits (article)

Identifying untagged articles

• Detect conflicts for unlabeled articles
– Majority of articles have never been conflict tagged

• Testing model generalization
– Applied model to untagged articles
– Sample rated by expert Wikipedians

• Significant positive correlation with predicted scores
– By rank correlation, p < 0.013 (Spearman’s rho)
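Spearman's rho is the Pearson correlation of the two variables' ranks, so it rewards agreement in ordering rather than in raw values. A minimal sketch that could compare model scores against expert ratings, with tied values given their average rank; it computes rho only, not the p-value reported above, and the function names are illustrative.

```python
from math import sqrt
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient (both inputs must vary)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def spearman_rho(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```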

Characterizing coordination and conflict

Conflict at the user level

• How can we identify conflict between users?

• Reverts as a proxy for user conflict

• Revert patterns between users

• Force-directed layout to cluster users
– Group similar viewpoints
– Find conflicts between groups
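Reverts can be turned into a who-reverted-whom graph. This sketch reuses the identity-revert assumption (a revision whose text matches an earlier, non-adjacent revision undoes everything in between) and adds one edge from the reverting editor to each distinct editor whose revisions were undone, ignoring self-reverts. The weighted edges could then feed a force-directed layout; the layout step itself is not shown, and `revert_graph` is a hypothetical helper.

```python
from collections import Counter


def revert_graph(history: list[tuple[str, str]]) -> Counter:
    """Build weighted (reverter, reverted) edges from one page's history.

    history: chronological (editor, page_text) pairs. A revision whose text
    matches an earlier, non-adjacent revision is treated as a revert of every
    revision in between (identity-revert assumption).
    """
    last_seen: dict[str, int] = {}
    edges: Counter = Counter()
    for i, (editor, text) in enumerate(history):
        j = last_seen.get(text)
        if j is not None and j < i - 1:
            undone_editors = {e for e, _ in history[j + 1 : i]}
            for victim in undone_editors - {editor}:  # skip self-reverts
                edges[(editor, victim)] += 1
        last_seen[text] = i
    return edges
```

Summing these counters across all of an article's revisions gives the edge weights; users who repeatedly revert the same people, or are reverted by them, end up structurally close, which is what lets a layout separate opposing groups.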

Dokdo/Takeshima opinion groups

Group A

Group B

Group C

Group D

Terri Schiavo

Mediators

Sympathetic to parents

Sympathetic to husband

Anonymous (vandals/spammers)

Summary: Characterizing Wikipedia

• Coordination costs and conflict are increasing

• Global level: trend identification
– Decrease in direct article work
– Increase in indirect coordination work
– Increase in maintenance work

• Article level: prediction using machine learning
– Identify characteristics of article conflict
– Detect conflict-heavy articles needing extra attention

• User level: user conflict visualization
– Make sense of user conflicts and identify shared viewpoints

Future Work

• Apply to many domains
– Corporate memory (Socialtext)
– Intelligence gathering (Intellipedia)
– Scholarly research (Scholarpedia)
– Collaborative problem solving (Lostpedia)

• Application: Social Dashboard
– Identify high-conflict articles
– Surface editing patterns to readers
– Route attention to articles that need it most


Thank you!
