Presentation given at the Wikimedia Foundation and at the WikiSym 2009 conference on surprising data that show a change in how Wikipedia is growing and how the editor population inside the system might be changing. It also shows resistance to new content and shifts in the editor population.
The Singularity is Not Near: Slowing Growth of Wikipedia
Bongwon Suh, Gregorio Convertino, Ed H. Chi, Peter Pirolli
Augmented Social Cognition Area
Palo Alto Research Center
Exponential Growth of Wikipedia
Number of Articles (Log Scale)
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
Active Editors
Growth of Monthly Edits
Monthly Edits
Something happened in early 2007
Monthly Edits
Slowing Growth in Global Activity
Monthly Active Editors
So What Happened?
No major incident reported
– Wikipedia temporarily blocks the entire nation of Qatar by mistake
– Microsoft offers cash for Wikipedia edit
– “Essjay” controversy
Characterizing the evolution of contribution patterns inside Wikipedia
– Data-driven approach
http://www.flickr.com/photos/lightpaintbrush/3747649160/
Slowing Growth of Wikipedia
1. Global Activity
– No longer following an exponential growth curve
– Which model can explain the slowing growth?
2. Activity Patterns by Editor Classes
3. Analysis of Population by Editor Class
Global Activity of Wikipedia
Edit Activity
Active Editor
Earlier Exponential Growth Model
Number of Articles (panels: Log Scale, Linear Scale)
Earlier Exponential Growth Model
Edits beget edits
– The more previous edits there are, the more new edits follow
$$N(t) = N_0 \cdot e^{rt}$$
$$\frac{dN}{dt} = r \cdot N$$
– dN/dt: growth rate of population
– N: current population
– r: growth rate of the population
Growth rate depends on current population N
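The exponential model on this slide, N(t) = N0 · e^(rt), can be sketched in a few lines. This is only an illustration: the function names and the values of `n0` and `r` are hypothetical and are not taken from the Wikipedia data.

```python
import math


def exponential_population(n0: float, r: float, t: float) -> float:
    """Closed-form solution N(t) = N0 * exp(r * t) of dN/dt = r * N."""
    return n0 * math.exp(r * t)


def doubling_time(r: float) -> float:
    """Time needed for N to double under a constant growth rate r > 0."""
    return math.log(2) / r


# Hypothetical values, chosen only for illustration.
n0, r = 1000.0, 0.5
doubled = exponential_population(n0, r, doubling_time(r))  # close to 2 * n0
```

Under this model the doubling time is constant, which is why exponential growth appears as a straight line on a log-scale plot.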
Logistic Growth Model
Ecological population growth model
– r, growth rate of the population
– K, carrying capacity (due to resource limitation)
$$\frac{dN}{dt} = r \cdot N \cdot \left(1 - \frac{N}{K}\right)$$
[Figure: logistic growth curve; x-axis: Year (2000 to 2010), y-axis: Population (0 to 4,000,000), leveling off at carrying capacity K]
http://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia’s_growth
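A minimal sketch of the logistic model dN/dt = r·N·(1 − N/K), using its standard closed-form solution N(t) = K / (1 + ((K − N0)/N0)·e^(−rt)). The function names and all parameter values are assumptions for illustration; they are not fitted to Wikipedia's article counts.

```python
import math


def logistic_rate(n: float, r: float, k: float) -> float:
    """Right-hand side of the logistic equation: dN/dt = r * N * (1 - N / K)."""
    return r * n * (1.0 - n / k)


def logistic_population(n0: float, r: float, k: float, t: float) -> float:
    """Closed-form solution of the logistic equation with N(0) = n0."""
    return k / (1.0 + ((k - n0) / n0) * math.exp(-r * t))


# Hypothetical parameters: start at 1,000, carrying capacity of 4,000,000.
n0, r, k = 1000.0, 1.0, 4_000_000.0
early = logistic_population(n0, r, k, 1.0)   # still close to exponential growth
late = logistic_population(n0, r, k, 50.0)   # saturates near K
```

Early on, the curve is nearly indistinguishable from exponential growth; the two models only separate as N approaches K.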
Number of New Articles
Follows a logistic growth curve
New Article
A Modified Logistic Model
Carrying capacity as a function of time.
[Figure: x-axis: Year (2000 to 2010), y-axis: Population, with time-varying carrying capacity K(t)]
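The slide does not state a functional form for K(t). The sketch below therefore assumes, purely for illustration, a carrying capacity that grows linearly over time, and integrates dN/dt = r·N·(1 − N/K(t)) with a simple forward-Euler step. The function name and every number are hypothetical.

```python
from typing import Callable, List


def simulate_logistic_variable_k(
    n0: float,
    r: float,
    k_of_t: Callable[[float], float],
    t_end: float,
    dt: float = 0.01,
) -> List[float]:
    """Forward-Euler integration of dN/dt = r * N * (1 - N / K(t))."""
    n, t = n0, 0.0
    trajectory = [n]
    for _ in range(int(round(t_end / dt))):
        n += dt * r * n * (1.0 - n / k_of_t(t))
        t += dt
        trajectory.append(n)
    return trajectory


# Assumed forms of K(t), not taken from the slides.
constant_k = simulate_logistic_variable_k(10.0, 1.0, lambda t: 1000.0, 30.0)
growing_k = simulate_logistic_variable_k(10.0, 1.0, lambda t: 1000.0 + 100.0 * t, 30.0)
```

With a constant K the population settles at K; with a growing K(t) it keeps rising and trails slightly below the moving ceiling.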
Struggle for Existence - Darwin
Biological system
– Competition increases as the population hits the limits of the ecology
– Advantage goes to members of the population that have competitive dominance over others
Analogy
– Limited opportunities to make novel contributions
– Increased patterns of conflict and dominance
Slowing Growth of Wikipedia
1. Global Activity
– No longer following an exponential growth curve.
– Logistic growth is now a better explanation
2. Activity Patterns by Editor Classes
– Patterns of conflict and dominance?
3. Analysis of Population by Editor Class
Defining Editor Classes
Highly skewed contribution pattern
– Top 3% of users contribute 50%+ of edits
– A lot of single-edit users
Five Editor Classes
– Defined by monthly edit count
– No bots or vandalism included in the analysis
– 1000+: editors who made 1000 or more edits in that month
– 100-999
– 10-99
– 2-9
– 1
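The five editor classes can be expressed as a small bucketing function over monthly edit counts. This is a sketch under assumptions: it treats "1000+" as 1000 or more edits so that the classes leave no gap above 100-999, and the editor names and counts in `sample` are invented.

```python
from collections import Counter
from typing import Dict, Mapping

CLASS_LABELS = ("1000+", "100-999", "10-99", "2-9", "1")


def editor_class(monthly_edits: int) -> str:
    """Map an editor's edit count for one month to one of the five classes."""
    if monthly_edits < 1:
        raise ValueError("an active editor has at least one edit in the month")
    if monthly_edits >= 1000:
        return "1000+"
    if monthly_edits >= 100:
        return "100-999"
    if monthly_edits >= 10:
        return "10-99"
    if monthly_edits >= 2:
        return "2-9"
    return "1"


def class_sizes(edits_by_editor: Mapping[str, int]) -> Dict[str, int]:
    """Count how many editors fall into each class for one month."""
    counts = Counter(editor_class(n) for n in edits_by_editor.values())
    return {label: counts.get(label, 0) for label in CLASS_LABELS}


# Invented sample month (bots and vandalism assumed to be filtered out already).
sample = {"alice": 1500, "bob": 120, "carol": 12, "dave": 3, "erin": 1, "frank": 1}
sizes = class_sizes(sample)
```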
Activity By Editor Classes
Monthly Edits by Editor Class (in thousands)
Monthly % of Edits by Editor Class
Revert as a measure of resistance
Revert: Undoing one or more edits
– The page is restored to a version that existed sometime previously.
– Often used to fight vandalism
Revert ratio as resistance metric
– # of reverted edits / # of total edits
– This analysis excludes vandalism to model “resistance”
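The revert ratio (# of reverted edits / # of total edits) is simple to compute per editor class. The sketch below is illustrative only: the counts are invented, and returning 0.0 when a class has no edits is a convenience choice of this example rather than something the slides specify.

```python
from typing import Dict, Tuple


def revert_ratio(reverted_edits: int, total_edits: int) -> float:
    """Share of edits that were reverted; 0.0 when there are no edits (assumed convention)."""
    if total_edits == 0:
        return 0.0
    return reverted_edits / total_edits


def revert_ratio_by_class(counts: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """counts maps an editor class to (reverted edits, total edits) for one month."""
    return {label: revert_ratio(rev, tot) for label, (rev, tot) in counts.items()}


# Invented monthly counts with vandalism-related reverts assumed already removed.
monthly_counts = {"1": (30, 200), "1000+": (5, 500)}
ratios = revert_ratio_by_class(monthly_counts)
```

Comparing the ratio across classes month by month is what makes it usable as a "resistance" measure: a higher ratio for one class means its edits are undone more often.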
Ratio of Reverted Contribution
Monthly Ratio of Reverted Edits
A take away: Growing Resistance
Two interpretations:
– Overall increased resistance from the Wikipedia community to changing content
– Disparity in the treatment of edits
  » Occasional editors have been reverted at a higher rate
Example of increased patterns of conflict and dominance
Photo: http://www.flickr.com/photos/efan78/3619921561/
Slowing Growth of Wikipedia
1. Global Activity
– No longer following an exponential growth curve.
– Logistic growth is now a better explanation
2. Activity Patterns by Editor Classes
– Yes. We see patterns of conflict and dominance
– Different editor classes experience differing resistances
3. Analysis of Population by Editor Class
Slowing Growth of Wikipedia
1. Global Activity
– No longer following an exponential growth curve.
– Logistic growth is now a better explanation
2. Activity Patterns by Editor Classes
– Patterns of conflict and dominance
– Different editor classes experience differing resistances
3. Analysis of Population by Editor Class
– Is the population stable?
Population Analysis
Monthly Active Editors (log scale)
Population by Editor Class
Slowing Growth of Wikipedia
1. Global Activity
– No longer following an exponential growth curve.
– Logistic growth is now a better explanation
2. Activity Patterns by Editor Classes
– Patterns of conflict and dominance
– Different editor classes experience differing resistances
3. Analysis of Population by Editor Class
– The middle class is a smaller portion of the pyramid
So… Is It Bad?
It depends…
http://alfalfamedia.com/wp-content/uploads/2009/02/content-is-king.jpg
http://www.collegesofdistinction.com/collegetemplates/images/2386/community.jpg
If what you care about is the sustainability of the community…
Community is King
– Sustainability of a system
– Member to leader
– A shrinking middle class suggests less resilience in a community
– Connects with population ecology research [Gunderson & Holling 2001]
Illustration: http://www.thestar.com/article/540066
Factors Affecting the Growth of Wikipedia
People-ware
– Growing resistance to changing content
– Coordination cost and bureaucracy
Knowledge-ware: Availability of easy topics to write about
Tool-ware: Quality of tools used by editors and admins
http://www.aerostich.com/
http://www.mikestreetmedia.co.uk/blog/wp-content/uploads/2009/01/knowledge.jpg
http://youropenbook.agitprop.co.uk/growing.php?p=2
Thank you
Bongwon Suh [email protected]
Augmented Social Cognition Area
Palo Alto Research Center
Monthly Ratio of Reverted Edits
Ecological Growth Model
– r, growth rate
– K, carrying capacity
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$
[Figure: growth curve; x-axis: Year (2000 to 2010), y-axis ticks from 0 to 4,000,000]
r dominates when N is small ⇒ $\left(1 - \frac{N}{K}\right) \approx 1$
K dominates when N ≈ K ⇒ $\left(1 - \frac{N}{K}\right) \approx 0$
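The two regimes can be checked numerically by looking at the per-capita growth rate, (dN/dt)/N = r·(1 − N/K). This is a small sketch with hypothetical numbers; the function name is an assumption of this example.

```python
def per_capita_growth(n: float, r: float, k: float) -> float:
    """Per-capita growth rate (dN/dt) / N = r * (1 - N / K) of the logistic model."""
    return r * (1.0 - n / k)


# Hypothetical parameters for illustration only.
r, k = 0.5, 4_000_000.0
small_n = per_capita_growth(100.0, r, k)     # close to r: the r term dominates
near_k = per_capita_growth(3_999_000.0, r, k)  # close to 0: K limits growth
```

When N is far below K the factor (1 − N/K) is nearly 1 and growth is effectively exponential at rate r; near K the same factor drives growth toward zero.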
r/K Selection Theory
r-Strategist
– Growth or exploitation
– Less-crowded niches / produce many offspring
K-Strategist
– Conservation
– Strong competitors in crowded niches / invest more heavily in fewer offspring
Evolution cycle
– Resilience of an ecological system
– Gunderson & Holling 2001
User-Block and Vandalism
Page-Protection and Vandalism
Moore’s Law – Ray Kurzweil’s Extension
http://en.wikipedia.org/wiki/Accelerating_change
Exponential Growth of Computing
http://en.wikipedia.org/wiki/Accelerating_change
The Singularity
“The Singularity is Near” by Ray Kurzweil (2005)
Year 2045
– $1000 buys a computer a billion times more powerful than the human brain.
– A.I. surpasses human beings as the smartest and most capable life form on Earth.
– An extremely disruptive, world-altering event that forever changes the course of human history.
– There will be a movie!
Logistic Growth Model
Exponential growth model
– Growth rate depends on the current N
$$\frac{dN}{dt} = r \cdot N$$
Ecological population growth model
– r, growth rate of the population
– K, carrying capacity (due to resource limitation)
$$\frac{dN}{dt} = rN\left(1 - \frac{N}{K}\right)$$