From publisher to platform: How the Guardian embraced the internet using content, search, and Open Source Stephen Dunn, Guardian News and Media [email protected], 25th May, 2011 Twitter: @cuica, @openplatform Thursday, 26 May 2011

How The Guardian Embraced the Internet using Content, Search, and Open Source

DESCRIPTION

In 2009 The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications with The Guardian's rich content. The content API, hosted on Solr instances on EC2, contains JSON representations of all Guardian articles back to 1999 (over 1 million articles) and is an increasingly complete representation of the output of the organisation. The DataStore contains curated data sets for use in applications and visualizations.

Page 1: How The Guardian Embraced the Internet using Content, Search, and Open Source

From publisher to platform: How the Guardian embraced the internet using content, search, and Open Source

Stephen Dunn, Guardian News and Media [email protected], 25th May, 2011

Twitter: @cuica, @openplatform

Thursday, 26 May 2011

Page 2

From publisher to platform

How the Guardian embraced the Internet using content, search, and Open Source

Stephen Dunn, Guardian News and Media

Page 3

The publishing era

Page 4

We started a long time ago:

Page 5

Swine flu

Keyword page

Twitter updates

Content partnerships

Audio

Video

Open platform API

Live blogs

Comment

Mobile site

Apps

Newspapers

Page 6

To become the world's leading liberal voice.

To secure the financial and editorial independence of the Guardian in perpetuity.

To promote freedom in the press and liberal journalism globally.

Page 7

Open Web Principles

Page 8

2009

Page 9

1. Permanent

• “A cool URI is one that does not change” (Tim Berners-Lee, 1998)

• 1.5 million resources redirected to new scheme

http://www.flickr.com/photos/fstorr/
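Redirecting old resources to a new URL scheme can be sketched as a lookup from a retired path to its permanent replacement, answered with HTTP 301 (Moved Permanently). This is a minimal illustration only: the paths, the `LEGACY_TO_CANONICAL` table and the `resolve` function are hypothetical and do not describe the Guardian's actual scheme.

```python
from http import HTTPStatus

# Hypothetical mapping from retired paths to their permanent replacements.
LEGACY_TO_CANONICAL = {
    "/old/story/12345.html": "/world/2009/example-story",
}


def resolve(path):
    """Return (status, location) for a requested path.

    Retired paths answer 301 so that links and bookmarks keep working;
    every other path is served as-is.
    """
    target = LEGACY_TO_CANONICAL.get(path)
    if target is not None:
        return HTTPStatus.MOVED_PERMANENTLY, target
    return HTTPStatus.OK, path


status, location = resolve("/old/story/12345.html")
print(int(status), location)
```

A permanent redirect, rather than a temporary one, tells clients and crawlers that the new address is the one to remember.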

Page 10

2. Addressable

★ Resources are “about” something - ready for the social web.

★ We live in “the age of point-at-things” (Coates 2005)

Page 11

3. Discoverable

★ Multiple routes to content

★ Tagging drives discovery

Page 12

4. Open

Page 14

Results...

Page 15

Site traffic growth

[Chart: unique users, Sep 2005 to Dec 2008; y-axis from 3,750,000 to 30,000,000. Annotations: Pre-project, First release, Final release, 40M.]

Page 16

However...

Page 17

1 Billion+ Internet Users!

Page 21

...“How I stopped worrying about my website and learned to love the whole internet.”

Matt McAlister

Page 22

The Open Strategy

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other platforms

Page 24

“Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”

Page 34

“The Guardian alongside Al Jazeera was the one news source that everybody on the streets in Tahrir - not just in Cairo but in surrounding cities and major centers of revolutionary activity - that people were talking about.”

Jack Shenker

Page 35

The Open Strategy

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other platforms

Page 36

The Open Platform

Page 37

The suite of services enabling partners to build applications with the Guardian

Page 38

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other platforms

Page 39

CONTENT API

A service for selecting and collecting content from the Guardian for re-use

DATA STORE

A directory of useful data curated by Guardian editors

POLITICS API

Open database of candidates, voting records, constituencies, election results, live data on election day
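Because the Content API serves JSON representations of articles, a client mostly parses a response and picks out the fields it needs. The sketch below works on a hypothetical, simplified payload written for this example; it is not captured API output, and the field names `response`, `results`, `id` and `webTitle` are assumptions to check against the Open Platform documentation.

```python
import json

# Hypothetical, simplified payload for illustration only.
SAMPLE_PAYLOAD = """
{"response": {"status": "ok", "results": [
    {"id": "world/2009/example-one", "webTitle": "First example headline"},
    {"id": "world/2009/example-two", "webTitle": "Second example headline"}
]}}
"""


def headlines(payload):
    """Extract the headline of every article in a JSON response string."""
    data = json.loads(payload)
    return [item["webTitle"] for item in data["response"]["results"]]


for title in headlines(SAMPLE_PAYLOAD):
    print(title)
```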

Page 40

Mutualised news!

Page 41

Mutualised news!

Page 42

Mutualised news!

Page 47

DATA STORE

A directory of useful data curated by Guardian editors

Page 48

POLITICS API

Open database of candidates, voting records, constituencies, election results, live data on election day

Page 49

POLITICS API

Open database of candidates, voting records, constituencies, election results, live data on election day

Page 50

<OBLIGATORY DOGFOOD SLIDE>

Page 56

Open for Business

Page 57

3 Tiers of access, 3 Revenue models

Keyless: Take our headlines. You keep associated revenues.

Approved: Take our full article content, but with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue.

Bespoke: Take, reformat, augment our content. Revenue model to be negotiated. Combination of Media, Fees, Downloads.
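The three tiers can be written down as a small lookup table. In this sketch the tier descriptions come from the slide, while the `Tier` class, the `tier_for` function and its selection rule (no key, a key, a negotiated contract) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    content: str  # what the partner may take
    revenue: str  # how revenue is split


TIERS = {
    "keyless": Tier("Keyless", "headlines",
                    "partner keeps associated revenues"),
    "approved": Tier("Approved", "full article content, with an advert",
                     "Guardian keeps ad revenue, partner keeps rest-of-page revenue"),
    "bespoke": Tier("Bespoke", "content that may be reformatted and augmented",
                    "negotiated: combination of media, fees, downloads"),
}


def tier_for(api_key=None, has_contract=False):
    """Pick a tier. The rule here is an assumption, not a documented policy."""
    if has_contract:
        return TIERS["bespoke"]
    if api_key:
        return TIERS["approved"]
    return TIERS["keyless"]


print(tier_for().name)
print(tier_for(api_key="example-key").name)
print(tier_for(api_key="example-key", has_contract=True).name)
```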

Page 59

What this means

Open Out: Developers can now access full content APIs on demand with keys post-approved

Platform is positioned as a place to do business

So rapid scalability, reliability and performance are now core requirements

Page 60

OPEN IN

Bring in data and apps from the internet

OPEN OUT

Allow partners to build applications using Guardian content and services for other platforms

Page 61

MICROAPPS

A framework for integrating 3rd party applications into guardian.co.uk

Simple REST/HTTP framework allows lightweight development

Applications proxied for performance

Apps generally hosted in the cloud, allows hot deployment into production
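One way to read "applications proxied for performance" is a proxy that fetches a fragment from an externally hosted app over HTTP and caches it for a short time, so page rendering does not wait on the app for every request. The sketch below assumes that reading. `FragmentProxy`, its time-to-live policy and the example URL are hypothetical, and the fetch function is injected so the example runs without a network.

```python
import time


class FragmentProxy:
    """Cache fragments fetched from externally hosted apps."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch      # callable: url -> fragment body
        self._ttl = ttl_seconds  # how long a cached fragment stays fresh
        self._clock = clock      # injectable clock, useful in tests
        self._cache = {}         # url -> (expiry time, body)

    def get(self, url):
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and hit[0] > now:
            return hit[1]        # still fresh: serve from cache
        body = self._fetch(url)  # missing or stale: ask the app
        self._cache[url] = (now + self._ttl, body)
        return body


calls = []


def fake_fetch(url):
    """Stand-in for an HTTP GET against a hosted app."""
    calls.append(url)
    return "<div>recipe widget</div>"


proxy = FragmentProxy(fake_fetch)
first = proxy.get("http://apps.example.com/recipes/fragment")
second = proxy.get("http://apps.example.com/recipes/fragment")
print(first == second, len(calls))
```

With a cache in front, a slow or briefly unavailable app affects only the occasional refresh rather than every page view.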

Page 62

MICROAPPS

A framework for integrating 3rd party applications into guardian.co.uk

Page 63

• What could I cook?

Page 64

Bringing it together

Page 66

App showcase

Page 67

From publisher to platform

Seeking massive growth, but no longer only broadcasting content on the website

User/partner engagement & contribution on journalism, data, software, applications, revenue and ads

Support developers and partners with data and APIs; need scalability, reliability, speed

Page 68

Evolving the architecture

Page 69

Web server, Web server, Web server

App server, App server, App server

CMS

Oracle

Memcached (added later)

Page 70

Web server, Web server, Web server

App server, App server, App server

CMS

Oracle

Memcached

Why RDBMS?

5 years ago, fewer alternatives

Understand operations procedures

Can easily recruit DBAs / devs

Developer/ops tools

Business critical system: a safe choice

Page 71

Scaling traffic

[Chart: unique users, Sep 2005 to Sep 2008; y-axis from 3,750,000 to 30,000,000.]

Page 78

We chose Solr/Lucene

Can perform complex queries, including full-text search

We can change the schema with no downtime

Most queries are of similar cost

Scales very well horizontally

“Just worked” in the cloud

No strange control processes/engines

Developers just loved working with it!
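A query that combines full-text search with a filter can be sketched as a Solr select URL. `q`, `fq`, `rows` and `wt` are standard Solr request parameters. The base URL, the `tag` field name and the `solr_select_url` helper are assumptions for illustration and say nothing about the Guardian's actual schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def solr_select_url(base, text, tag=None, rows=10):
    """Build a Solr select URL: full-text query plus an optional tag filter."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    if tag is not None:
        # fq narrows the result set without affecting relevance scoring.
        params.append(("fq", f'tag:"{tag}"'))
    return f"{base}/select?{urlencode(params)}"


url = solr_select_url("http://localhost:8983/solr", "swine flu",
                      tag="world/swine-flu")
print(url)
# Decode the query string again to confirm the parameters round-trip.
print(parse_qs(urlsplit(url).query))
```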

Page 80

[Diagram labels: App server, Web servers, CMS, Memcached, RDBMS; Solr (six instances); Cloud, EC2; API]
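The diagram shows several Solr instances, but the deck does not say how requests are spread across them. A minimal sketch, assuming simple round-robin selection over identical replicas, could look like this; `ReplicaPool` and the host names are hypothetical.

```python
from itertools import cycle


class ReplicaPool:
    """Hand out replica hosts in round-robin order."""

    def __init__(self, hosts):
        hosts = list(hosts)
        if not hosts:
            raise ValueError("at least one replica host is required")
        self._hosts = cycle(hosts)

    def next_host(self):
        return next(self._hosts)


pool = ReplicaPool(["solr-1.example.com", "solr-2.example.com",
                    "solr-3.example.com"])
for _ in range(4):
    print(pool.next_host())
```

Under this assumption, adding capacity means adding another host to the list, which is the practical meaning of scaling horizontally.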

Page 81

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other platforms

What about Open In?

Page 82

[Diagram labels: App server, Web servers, CMS, Memcached, RDBMS; Apps (six instances); Proxy; external hosting, app engine etc.]

Page 83

[Diagram labels: Core: App server, Web servers, CMS, Memcached, RDBMS. Out: Solr (six instances), Cloud, EC2. In: App (six instances), Proxy, external hosting, app engine etc.]
