Building Real Time, Open-Source Tools for Wikipedia


WikiWash: Ideation to Product

A tool for uncovering spin on Wikipedia

Rob Kenedi

Entrepreneur-In-Residence, @twg

@rkenedi

Table of contents

1. Context
2. The problem
3. TWG’s approach
4. The solution: WikiWash
5. Lessons learned
6. What’s next

Context

Techraking

Connecting journalists with technologists and designers to address problems around the world

The Problem

Today, some of the most relevant stories can only be told by poring over datasets and crunching numbers in Excel. It’s imperative reporters have tools to find the stories hidden in the data.

-Luke Simcoe, Data Journalist, Metro News Canada

Currently, English Wikipedia includes 4,852,854 articles. More than 800 new articles are added every single day. *

*Source: Wikipedia

There is no political power without control of the archive, if not of memory.

-Jacques Derrida (1998)

TWG’s Approach

Problem

• Spin is introduced into Wiki pages by biased edits
• Can’t connect edits with users, or uncover agendas / story angles
• Can’t get the data out of the system
• Hard to visualize data to find patterns
• Can’t track changes to pages (relating to branded entities)
• Can’t find all brand references on Wikipedia
• Wikipedia is perceived to be susceptible to biased revisions
• Very hard to track revisions on Wikipedia, either historically or as they occur

Solution

• Associate page edits with users, and download the data
• Ability to compare multiple pages to uncover patterns in edits, and download the data
• Ability to track activity and alert to edit activity / trends (that may indicate biased intent)

Key Metrics

• Number of pages ‘un-washed’
• Number of connections / biased edits uncovered
• Number of edits to Wikipedia pages caused by uncovered biases
• Number of stories published citing data from the site

Channels

• Viral, word of mouth
• Partnerships with print / online media organizations (cross promotion)
• Social media referrals
• PPC, Display, Email, SEO

Unique Value Propositions

• Clearly demonstrate connections between Wikipedia page edits and the users making those edits
• Ability to track and uncover spin and malicious edits
• Track page edits in near-real-time, and offer alerts that uncover trends and emerging stories

Unfair Advantages

• Developed by and for working reporters

Customer Segments

• High use: Reporters, Activists, Academics & Students, Citizen Journalists
• $: PR & Media, Brand Stakeholders, Wikipedia

Competitors / Comparables

• Existing Wikipedia revision history page
• Wikistats
• Wikiwatchdog
• Article Revision Stats, Wiki Blame, etc.

Cost Structure

• IT Infrastructure
• Continuous reporting / scraping (unless we partner with Wikipedia)
• Marketing & Promotion

Revenue Streams

• Free for single use on historic data / edits
• Subscription model for activity alerts and real-time tracking (uncover breaking stories / bias)


The Solution: WikiWash

WikiWash decreases spin and bias on Wikipedia by holding those who make changes accountable

How does it do that?

• Realtime
• Open source
• Export your data
• Free!
• Works with Wikipedia’s API
• Built in JavaScript
• Uses Node.js, Express.js, Angular.js, Socket.IO to facilitate involvement from others
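To make the "works with Wikipedia’s API" point concrete, here is a minimal sketch of asking the MediaWiki Action API for a page's revision history and grouping the returned edits by the user who made them. This is not taken from the WikiWash source: the helper names `buildRevisionsUrl` and `groupRevisionsByUser` are hypothetical, and the sample response is made-up data in the shape the API returns when `formatversion=2` is requested.

```javascript
// Hypothetical helpers for illustration; not WikiWash's actual code.
const API_ENDPOINT = "https://en.wikipedia.org/w/api.php";

// Build a MediaWiki Action API URL that lists recent revisions of one page.
function buildRevisionsUrl(title, limit = 50) {
  const params = new URLSearchParams({
    action: "query",
    prop: "revisions",
    titles: title,
    rvprop: "ids|timestamp|user|comment|size",
    rvlimit: String(limit),
    format: "json",
    formatversion: "2",
  });
  return `${API_ENDPOINT}?${params.toString()}`;
}

// Group the revisions found in an API response by the user who made them.
function groupRevisionsByUser(response) {
  const byUser = new Map();
  const pages = (response.query && response.query.pages) || [];
  for (const page of pages) {
    for (const rev of page.revisions || []) {
      // Assumption: label revisions without a user field as "(hidden)".
      const user = rev.user || "(hidden)";
      if (!byUser.has(user)) byUser.set(user, []);
      byUser.get(user).push({
        revid: rev.revid,
        timestamp: rev.timestamp,
        comment: rev.comment,
      });
    }
  }
  return byUser;
}

// Made-up sample data in the formatversion=2 response shape.
const sampleResponse = {
  query: {
    pages: [
      {
        pageid: 1,
        title: "Example",
        revisions: [
          { revid: 103, user: "Alice", timestamp: "2015-03-03T12:00:00Z", comment: "copyedit" },
          { revid: 102, user: "Bob", timestamp: "2015-03-02T09:30:00Z", comment: "add source" },
          { revid: 101, user: "Alice", timestamp: "2015-03-01T08:15:00Z", comment: "new section" },
        ],
      },
    ],
  },
};

const grouped = groupRevisionsByUser(sampleResponse);
for (const [user, revisions] of grouped) {
  console.log(user, revisions.length);
}
```

In a real application the URL from `buildRevisionsUrl` would be handed to an HTTP client, and the grouped result could be serialized for the "export your data" feature. Keeping the grouping step a pure function makes it easy to test without touching the network.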

http://blog.twg.ca/2014/11/building-wikiwash/

wikiwash.org

Lessons Learned

• Focus on realtime changes
• Trending articles aid understanding
• Focus on product first, aesthetics second
• Limited by API; caching data

Wikitext

Wikipedia API Load Times

Preemptive Caching
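One way to read "preemptive caching" is: fetch the data a reader is likely to ask for next before they ask, so slow API calls are hidden behind the current view. The sketch below illustrates that idea only. It is an assumption about the technique, not WikiWash's implementation; the class name `PrefetchingCache`, the 60-second lifetime, and the stand-in fetch function are all hypothetical.

```javascript
// Hypothetical sketch; names and behaviour are assumptions, not WikiWash code.
class PrefetchingCache {
  constructor(fetchFn, ttlMs = 60000, now = () => Date.now()) {
    this.fetchFn = fetchFn; // (key) => value or promise, e.g. a Wikipedia API call
    this.ttlMs = ttlMs; // how long an entry counts as fresh
    this.now = now; // injectable clock, which makes expiry testable
    this.entries = new Map(); // key -> { promise, expiresAt }
  }

  // Return the cached promise while it is fresh, otherwise start a new fetch.
  get(key) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.promise;
    // The async wrapper turns synchronous throws into rejected promises.
    const promise = (async () => this.fetchFn(key))();
    this.entries.set(key, { promise, expiresAt: this.now() + this.ttlMs });
    // Drop failed fetches so a later call can retry.
    promise.catch(() => {
      const current = this.entries.get(key);
      if (current && current.promise === promise) this.entries.delete(key);
    });
    return promise;
  }

  // Warm the cache for keys the reader is likely to request next.
  prefetch(keys) {
    for (const key of keys) this.get(key).catch(() => {});
  }
}

// Stand-in for a slow API call; counts how often it is actually invoked.
let calls = 0;
const cache = new PrefetchingCache(async (revisionId) => {
  calls += 1;
  return `revision ${revisionId}`;
});

cache.get(100); // the revision being viewed
cache.prefetch([99, 101]); // its neighbours, fetched ahead of time
const next = cache.get(101); // reuses the prefetched promise, no new call
next.then((value) => console.log(value, "after", calls, "fetches"));
```

Caching promises rather than resolved values means a request that arrives while a prefetch is still in flight shares that fetch instead of starting a second one.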


What’s Next?

Feature Roadmap

• Notifications via email
• Website embed capability
• Access to Wikipedia’s firehose
• UX improvements
• Language support
• Next / previous navigation

How TWG Decides Next Steps

• Clear product ownership
• Product / market fit
• Pirate Metrics as a guide
• Qualitative & quantitative feedback
• Incrementally invest until inflection point

Fork the project on GitHub to improve it

github.com/twg/wikiwash

Send us your feedback, ideas and feature requests

wikiwash@twg.ca

Let’s talk about how to bring digital products to market, or set up a live demo of WikiWash

Thank You

@rkenedi

rkenedi@twg.ca