Web-scale IA using Linked Open Data

DESCRIPTION

A preview of the talk I'll be giving at the 2014 IA Summit in San Diego. An introduction to the web of data, and how the BBC and other organisations create products which remix original content with third-party data.

@MikeAtherton | #IAS14

WEB-SCALE IA USING LINKED OPEN DATA

Mike Atherton

RedUXD

The path ahead

Tim Berners-Lee 1989

Defining standards

• Use a common format for publishing documents (HTML)

• Use a common system of addresses to identify and locate documents (URL)

• Establish a method of contextual linking between documents (HREF hyperlink)

Actual feedback on Tim Berners-Lee’s proposal

WEBSITES! WEBSITES! PARTY TIME! EXCELLENT!

What wonderful things we wrote for people!

As humans we can extract meaning and context from documents automatically.

Spot the difference.

The context of keywords doesn’t travel with them.

Tag: “Apple”

We can pick out the important things and relationships just by reading.

For humans, the distinction between documents and data is subtle.

By defining real-world things, we can teach computers the relationships between those things.

Computers need to be told which things our documents contain.

If a computer knows what ‘Mount Everest’ is and what ‘tall’ means, it can do the legwork for us.

“How tall is Mount Everest?”

By understanding terms and linking to data services, computers can even find out things they don’t know.

“Where can I get a beer?”

Actual queries from Facebook’s Graph Search tool.

Cross-referencing data points gives new insight.

http://actualfacebookgraphsearches.tumblr.com/

TED conference 2009

Use web addresses to represent real-world things

Tim Berners-Lee

Rule #1 of data publishing

Return useful data about each resource, in a standard format.

Tim Berners-Lee

Rule #2 of data publishing

Include links to other data, so people can discover more things.

Tim Berners-Lee

Rule #3 of data publishing

Linked data

• Use web addresses to represent real-world things.

• Return useful data about each resource, in a standard format.

• Include links to other data, so people can discover more things.
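
The three rules above can be sketched in a few lines of Python. This is a toy, in-memory stand-in for the web rather than a real client: the example.org addresses and the `lookup` and `follow` helpers are invented for illustration, and a real system would fetch each address over HTTP.

```python
# Toy "web of data". Every key is a web address naming a real-world thing (rule 1).
# Looking an address up returns structured data about that thing (rule 2), and
# some values are themselves addresses that can be followed (rule 3).
# The example.org addresses are hypothetical placeholders.
WEB = {
    "http://example.org/id/haunted-mansion": {
        "label": "The Haunted Mansion",
        "locatedIn": "http://example.org/id/new-orleans-square",
    },
    "http://example.org/id/new-orleans-square": {
        "label": "New Orleans Square",
        "locatedIn": "http://example.org/id/disneyland",
    },
    "http://example.org/id/disneyland": {
        "label": "Disneyland",
    },
}


def lookup(uri):
    """Stand-in for an HTTP GET returning data about the thing named by `uri`."""
    return WEB.get(uri, {})


def follow(uri, link="locatedIn"):
    """Follow `link` from resource to resource, collecting labels on the way."""
    labels = []
    seen = set()
    while uri and uri not in seen:  # `seen` guards against circular links
        seen.add(uri)
        data = lookup(uri)
        if "label" in data:
            labels.append(data["label"])
        uri = data.get(link)
    return labels


trail = follow("http://example.org/id/haunted-mansion")
print(" -> ".join(trail))
```

Because each resource links onward, a program that starts with one address can discover things it was never told about.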

Data sources combined create more insight than studying them separately.

Researchers attempting to discover new drugs to treat Alzheimer’s Disease.

“Which proteins are involved in signal transduction AND are related to pyramidal neurons?”

Web search: 223,000 results, 0 answers

Linked healthcare data query: 32 results, 32 answers

We can even create entirely new value propositions from remixing existing content.

Linked data helps us make sense of information.

data.gov.uk Newspaper: Hyper-local news publishing

Land Registry Price Paid: Historical property data

Voter power: Local constituency data

A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014

CONTENT MODELS AT WEB-SCALE

Where next for your content model?

THEME PARKS CONTENT MODEL

[Diagram: content model for theme parks]

Things: Location, Resort, Park, Hotel, Weenie, Land, Meal, Restaurant, Attraction, Character, Creator, Work

Relationships: locatedIn, hasWeenie, ParentResort, hasEvent, features, adaptationOf, hasAttraction, CreatedBy, appearsIn, hasPark, contains

Use HTTP web addresses to represent real-world things.

But ideally, those addresses should offer robot-readable data.

http://disneyland.disney.go.com/attractions/disneyland/haunted-mansion/

http://www.geonames.org/ontology#locatedIn

http://en.wikipedia.org/wiki/New_Orleans_Square

The Haunted Mansion (is) located in New Orleans Square

The Resource Description Framework is the web’s lingua franca for data integration.

RDF lets different data sources play nice together.

<subject> <predicate> <object>

<Charles Dickens> <is the author of> <Great Expectations>
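
As a rough illustration of the subject, predicate, object shape, triples can be held as plain three-part tuples and searched by pattern. The `Triple` type and `match` helper below are invented for this sketch and are not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Plain-language triples in the style of the slide. Real RDF would use web
# addresses in place of these readable names.
triples = [
    Triple("Charles Dickens", "is the author of", "Great Expectations"),
    Triple("Charles Dickens", "is a", "Person"),
    Triple("Great Expectations", "is a", "Book"),
]


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples matching the given parts; None acts as a wildcard."""
    return [
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


works = [t.obj for t in match(triples, subject="Charles Dickens",
                              predicate="is the author of")]
print(works)
```

Every statement has the same three-part shape, so one small function can answer many different questions.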

RDF is an abstract syntax, so all these ‘serialisations’ are equivalent.

RDF can be written in different ways.

N-Triples

<http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
<http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .

RDF/XML

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dbpedia-owl:Person xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
      rdf:about="http://dbpedia.org/resource/Charles_Dickens">
    <dbpedia-owl:author rdf:resource="http://dbpedia.org/resource/Great_Expectations"/>
  </dbpedia-owl:Person>
</rdf:RDF>

Turtle

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://dbpedia.org/resource/Charles_Dickens> a <http://dbpedia.org/ontology/Person> ;
    <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
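
One way to see that a serialisation is only a way of writing the triples down: the sketch below writes a set of triples out as N-Triples text and reads them back unchanged. It is deliberately minimal and handles only triples made entirely of web addresses (no literals or blank nodes). The `to_ntriples` and `from_ntriples` helpers are invented here. A real project would more likely use an RDF library.

```python
import re

DICKENS = "http://dbpedia.org/resource/Charles_Dickens"

# The two statements from the slide, held as (subject, predicate, object) tuples.
TRIPLES = {
    (DICKENS, "http://dbpedia.org/ontology/author",
     "http://dbpedia.org/resource/Great_Expectations"),
    (DICKENS, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dbpedia.org/ontology/Person"),
}

# One N-Triples statement: three <address> terms followed by a full stop.
NT_LINE = re.compile(r"^<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.$")


def to_ntriples(triples):
    """Write address-only triples as N-Triples text, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


def from_ntriples(text):
    """Read address-only N-Triples text back into a set of tuples."""
    result = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines and comments
            continue
        found = NT_LINE.match(line)
        if found is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        result.add(found.groups())
    return result


text = to_ntriples(TRIPLES)
print(text)
round_tripped = from_ntriples(text)
```

The text format changes nothing about the data: what comes back is the same set of triples that went in.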

Your data

BBC

New York Times

Wikipedia is great for defining individual concepts.

But only for humans! What if we had a common way to define a concept for a robot?

DBpedia is Wikipedia for robots.

It turns Wikipedia content into machine-readable linked data.

Ok computer…

• When did Disneyland first open?

• What is its official homepage?

• Who operates the park?

• When is it open?

• What’s its theme?

MusicBrainz is the open music encyclopedia.

It crowdsources music metadata for use by humans and robots.

By saying our concept is the ‘same as’ an accepted identifier, we all speak the same language.

Shared identifiers act as intermediaries.

http://dbpedia.org/resource/China
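
A small sketch of how a shared identifier acts as an intermediary. Two imaginary publishers each use their own address for China and declare it the 'same as' the DBpedia address above. Everything under example.org and example.com, and the `hasStory` and `hasGuide` relationships, are made up for illustration.

```python
from collections import defaultdict

# Hypothetical local identifiers, each declared 'same as' the shared one.
SAME_AS = {
    "http://news.example.org/places/china": "http://dbpedia.org/resource/China",
    "http://travel.example.com/destinations/cn": "http://dbpedia.org/resource/China",
}

# Made-up facts, each keyed by its publisher's own identifier.
FACTS = [
    ("http://news.example.org/places/china", "hasStory",
     "http://news.example.org/stories/1"),
    ("http://travel.example.com/destinations/cn", "hasGuide",
     "http://travel.example.com/guides/7"),
]


def canonical(uri):
    """Swap a local identifier for the shared one, if a 'same as' link exists."""
    return SAME_AS.get(uri, uri)


def merge(facts):
    """Group facts from every publisher under the shared identifier."""
    merged = defaultdict(list)
    for subject, predicate, obj in facts:
        merged[canonical(subject)].append((predicate, obj))
    return dict(merged)


merged = merge(FACTS)
for subject, statements in merged.items():
    print(subject, statements)
```

Neither publisher needs to know about the other. Agreeing on the shared identifier is enough for their data to line up.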

BBC MUSIC

How BBC Music used linked data to get more people listening to the radio.

How can I find out which one I should listen to?

10 national BBC radio stations.

A continuously-updated record of every song played on-air.

The BBC radio playout system was a data goldmine.

The sources combine to create a new and useful product.

Linked data builds a composite picture of the world.

Which song is playing on the radio now?

Who is this artist?

What other stuff has this artist done?

What TV or radio clips of this artist do we have?

By exposing common identifiers, your website becomes its own API.

Maintaining identifiers in your URIs makes playing with your stuff easier.

http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94

http://musicbrainz.org/artist/4d2956d1-a3f7-44bb-9a41-67563e1a0c94
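
The two addresses above share the same artist identifier, which is what makes the site usable as an API. A minimal sketch of that idea follows. The BBC URL pattern is inferred only from the pair of addresses on the slide, and the helper names are invented. Whether either address still resolves is not assumed.

```python
import re
from urllib.parse import urlparse

# MusicBrainz identifiers are UUIDs: 8-4-4-4-12 hexadecimal characters.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)


def artist_id(url):
    """Pull the shared artist identifier out of a URL path, or return None."""
    found = UUID_RE.search(urlparse(url).path)
    return found.group(0) if found else None


def bbc_artist_url(musicbrainz_url):
    """Build the BBC Music address for the artist a MusicBrainz address names."""
    mbid = artist_id(musicbrainz_url)
    if mbid is None:
        raise ValueError(f"no artist identifier found in {musicbrainz_url!r}")
    # Pattern inferred from the slide's example, not from BBC documentation.
    return f"http://www.bbc.co.uk/music/artists/{mbid}"


bbc_url = bbc_artist_url(
    "http://musicbrainz.org/artist/4d2956d1-a3f7-44bb-9a41-67563e1a0c94"
)
print(bbc_url)
```

No lookup table is needed in either direction, because the identifier travels inside the address itself.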

Creating an artist profile on MusicBrainz automatically creates a BBC Music artist page.

Linked data lets you use the web as a content management system.

freshonthenet.co.uk/musicbrainz/

Filling in the blanks

• The BBC knows when it's played a record by Tom Waits

• MusicBrainz knows all the records Tom Waits ever released

• DBpedia knows Tom Waits is from San Diego

• DBpedia knows Blink-182 are also from San Diego

• The BBC knows when it's played a record by Blink-182
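
The chain of facts above amounts to a join across sources. The sketch below stands in for two of them with small hand-made tables. The hometowns follow the slide, while the play timestamps and the `also_from_hometown` helper are invented for illustration.

```python
# Stand-in for DBpedia: where each artist is from (as stated on the slide).
DBPEDIA_HOMETOWN = {
    "Tom Waits": "San Diego",
    "Blink-182": "San Diego",
}

# Stand-in for the BBC playout record. The timestamps are made up.
BBC_PLAYS = [
    ("Tom Waits", "2014-03-01T09:15"),
    ("Blink-182", "2014-03-01T10:40"),
]


def also_from_hometown(artist):
    """Artists the BBC has played who share a hometown with `artist`."""
    town = DBPEDIA_HOMETOWN.get(artist)
    if town is None:
        return []
    played = {name for name, _ in BBC_PLAYS}
    return sorted(
        other for other, other_town in DBPEDIA_HOMETOWN.items()
        if other_town == town and other != artist and other in played
    )


suggestions = also_from_hometown("Tom Waits")
print(suggestions)
```

Neither source could answer "who else from this artist's hometown have we played?" on its own. The answer only appears once the two are cross-referenced.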

The ‘SPARQL Protocol and RDF Query Language’ lets us query linked data as easily as a local database.

If you want linked data magic, try SPARQL.

SQL: Centralised relational queries

SPARQL: Distributed graph queries

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?s ?title ?author WHERE {
  ?s rdf:type dbpedia-owl:Book .
  ?s dbpedia-owl:author ?author_uri .
  ?author_uri dbpedia-owl:birthName ?author .
  ?s dbpprop:name ?title .
  FILTER (REGEX(STR(?title), "Great Expectations", "i"))
}

SPARQL query ‘Who wrote Great Expectations?’

The places where the terms we’ll use are defined

Bring back the stuff that matches what I’m about to say…

…which is the birth name of anything said to be the author of a book…

…but only if that book is titled ‘Great Expectations’
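
To show what the query is doing under the hood, here is a rough Python sketch of graph pattern matching over a handful of hand-made triples. It mirrors the four patterns and the filter in the query above, but it is not a SPARQL engine. The `dbr:` prefix, the sample data, and the `match_pattern` and `query` helpers are assumptions made for this example.

```python
# A tiny hand-made graph using prefixed names in place of full web addresses.
TRIPLES = [
    ("dbr:Great_Expectations", "rdf:type", "dbpedia-owl:Book"),
    ("dbr:Great_Expectations", "dbpedia-owl:author", "dbr:Charles_Dickens"),
    ("dbr:Great_Expectations", "dbpprop:name", "Great Expectations"),
    ("dbr:Oliver_Twist", "rdf:type", "dbpedia-owl:Book"),
    ("dbr:Oliver_Twist", "dbpedia-owl:author", "dbr:Charles_Dickens"),
    ("dbr:Oliver_Twist", "dbpprop:name", "Oliver Twist"),
    ("dbr:Charles_Dickens", "dbpedia-owl:birthName", "Charles John Huffam Dickens"),
]


def match_pattern(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` matches `triple`, else None."""
    new = dict(binding)
    for part, value in zip(pattern, triple):
        if part.startswith("?"):  # a variable: bind it, or check the old binding
            if new.setdefault(part, value) != value:
                return None
        elif part != value:  # a constant: must match exactly
            return None
    return new


def query(patterns, triples):
    """Join the patterns one by one, keeping every consistent set of bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


PATTERNS = [
    ("?s", "rdf:type", "dbpedia-owl:Book"),
    ("?s", "dbpedia-owl:author", "?author_uri"),
    ("?author_uri", "dbpedia-owl:birthName", "?author"),
    ("?s", "dbpprop:name", "?title"),
]

# The FILTER step: keep only rows whose title matches, ignoring case.
rows = [row for row in query(PATTERNS, TRIPLES)
        if "great expectations" in row["?title"].lower()]
authors = [row["?author"] for row in rows]
print(authors)
```

Each pattern narrows the candidate answers, and variables shared between patterns (such as `?author_uri`) are what join one statement to the next.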

The various sources that make up BBC Wildlife Finder are structured according to the Wildlife Ontology.

The ontology defines how everything hangs together.

http://dbpedia.org/resource/Giant_panda

http://www.bbc.co.uk/programmes/p00k3nx

http://www.bbc.co.uk/news/world-asia-china-24784767

http://worldwildlife.org/species/giant-panda

http://www.iucnredlist.org/details/712/0

http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml

ONTOLOGIES

Creating, publishing, and consuming the words that help us mean what we say.

BBC Wildlife Ontology

http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml

BBC News attempted to model news coverage to better represent how events are related.

Ontologies start with a high-level understanding of the IA.

The chronological chain of events and the graph of supporting coverage are modelled to aid understanding.

News updates connect in sequence to a storyline.

BBC News give journalists the tools to tag stories with web-scale identifiers as they write.

Articles and storylines are tagged with people, places, and subjects.

News Storylines model http://purl.org/ontology/storyline

Detail from the News Storyline ontology

http://purl.org/ontology/storyline

Used by NYT to aggregate news stories around themed topic pages.

The New York Times offers identifiers for people, places, and subjects.

Mashing up sources of data can yield playful or surprisingly useful results.

Linked open data weaves tales of the unexpected.

Earn your stars!

make your stuff available under an open license

make it available as structured data

use non-proprietary formats

use URIs to identify things

link your data to other data to provide context

Linked Open Data cloud 2007

Linked Open Data cloud 2011

FIRST STEPS WITH LINKED DATA

This all sounds awesome! Now what?

1. Mark up content with RDFa

• RDFa is RDF embedded into HTML code to state our subject, predicate, and object.

• Typically:

<subject>: The page we’re adding the markup to

<predicate>: The verb, as defined by an external vocabulary

<object>: The external URI we’re expressing a relationship to

RDFa in action

<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">

<div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">

<div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>

<div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>

</div>

</div>

Define which city council this page is about.

Define what we mean by ‘about’ using the rNews vocabulary.

State that the city council we’re talking about is the same one referenced at Open Data Communities.

Using our own ontology, state that this council governs a region identified on data.gov.uk

Thanks to @r4isstatic for this example!
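
To make the reading of that markup concrete, the sketch below pulls the three statements out of it using only Python's standard library. It is a deliberately simplified reader that understands just the `vocab`, `rel` and `resource` attributes used in this example, and it is not a conforming RDFa processor. The `SimpleRDFaReader` class is invented for illustration, and the bracketed placeholders are kept exactly as they appear on the slide.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

MARKUP = """
<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">
  <div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">
    <div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>
    <div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>
  </div>
</div>
"""


class SimpleRDFaReader(HTMLParser):
    """Collect (subject, predicate, object) triples from rel/resource markup."""

    def __init__(self):
        super().__init__()
        self.stack = []    # one (subject, vocab) pair per open element
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        parent_subject, parent_vocab = self.stack[-1] if self.stack else (None, "")
        vocab = attributes.get("vocab", parent_vocab)
        resource = attributes.get("resource")
        rel = attributes.get("rel")
        if rel and resource and parent_subject:
            # The enclosing resource is the subject, this resource is the object.
            self.triples.append((parent_subject, vocab + rel, resource))
        if tag not in VOID_TAGS:
            # A resource becomes the subject for anything nested inside it.
            self.stack.append((resource or parent_subject, vocab))

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


reader = SimpleRDFaReader()
reader.feed(MARKUP)
for triple in reader.triples:
    print(triple)
```

The output lines up with the four annotations above: the page is about a thing, that thing is the same as one at Open Data Communities, and it governs a region identified on data.gov.uk.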

2. Publish an ontology

• Ontologies describe your content model in detail, defining the vocabulary for things, types of thing, and types of relationship:

• Classes: ‘person’, ‘book’, ‘wine’

• Properties: ‘age’, ‘ISBN’, ‘hasDistillery’

• Individuals: ‘Charles Dickens’, ‘Great Expectations’, ‘Laphroaig’
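
The split between classes, properties and individuals can be sketched with a few Python dataclasses. This is only an analogy for how an ontology constrains a content model, not an ontology language such as OWL. The type names, the `misused_properties` helper and the placeholder values are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyClass:
    """A type of thing, such as 'person' or 'book'."""
    name: str


@dataclass(frozen=True)
class Property:
    """A type of relationship or attribute, tied to the class it describes."""
    name: str
    domain: OntologyClass


@dataclass
class Individual:
    """One specific thing, such as 'Great Expectations'."""
    name: str
    kind: OntologyClass
    values: dict = field(default_factory=dict)


PERSON = OntologyClass("person")
BOOK = OntologyClass("book")
PROPERTIES = {p.name: p for p in (Property("age", PERSON), Property("ISBN", BOOK))}


def misused_properties(individual):
    """Names of properties that are unknown or belong to a different class."""
    return sorted(
        name for name in individual.values
        if name not in PROPERTIES or PROPERTIES[name].domain != individual.kind
    )


# The property values are placeholders, not real data.
novel = Individual("Great Expectations", BOOK, {"ISBN": "placeholder"})
confused = Individual("Great Expectations", BOOK, {"age": "placeholder"})
print(misused_properties(novel), misused_properties(confused))
```

Publishing the vocabulary gives everyone, human or robot, a way to check that a statement makes sense before relying on it.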

Many ontologies are published and available for reuse, or to build upon.

Ontologies are guidebooks to help us explore and understand subjects.

Schema.org: General purpose vocabulary

FOAF: Person-to-person relationships

rNews: News story publishing

3. Make your CMS work harder

• Content management systems mostly suck for this stuff, but some - like Drupal and Umbraco - have growing support for publishing RDF (and other semantic formats).

• These systems even have some SPARQL support allowing linked data to be added to your own page views.

• Most showcase linked data projects aren’t using an off-the-shelf CMS, but things are improving with more semantically-friendly CMSs like Webnodes and Ximdex.

<rdf:Description rdf:about="/nature/species/Giant_Panda">
  <foaf:primaryTopic rdf:resource="/nature/species/Giant_Panda#species"/>
  <rdfs:seeAlso rdf:resource="/nature/species"/>
</rdf:Description>

<wo:Species rdf:about="/nature/life/Giant_Panda#species">
  <rdfs:label>Giant panda</rdfs:label>
  <wo:name rdf:resource="http://www.bbc.co.uk/nature/species/Giant_Panda#name"/>
  <foaf:depiction rdf:resource="http://ichef.bbci.co.uk/naturelibrary/images/ic/640x360/g/gi/giant_panda/giant_panda_1.jpg"/>

  <dc:description>The giant panda is a rare, endangered and elusive <a href="http://www.bbc.co.uk/nature/life/Bear">bear</a>, making the videos below of a newborn baby giant panda and the remarkable courtship scene filmed in the wild unique. Giant pandas are famous for their love of bamboo, a diet so nutritionally poor that the pandas have to consume up to 20kg each day. The extra digit on the panda's hand helps them to tear the bamboo and their gut is covered with a thick layer of mucus to protect against splinters. Habitat loss is the greatest cause of the giant panda's decline, and today their range is restricted to six separate mountain ranges in western <a href="http://www.bbc.co.uk/nature/places/China">China</a>. <br/> <br/> <b>Did you know?</b><br/> A giant panda is born pink, hairless, blind and 1/900th the size of its mother. <br/></dc:description>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>

Human version: http://www.bbc.co.uk/nature/life/Giant_Panda

Robot version: http://www.bbc.co.uk/nature/life/Giant_Panda.rdf
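
A short sketch of what a robot might do with its version of the page. It parses a cut-down, well-formed excerpt modelled on the data above and picks out the label and the 'same as' link. The excerpt is abbreviated by hand, the Wildlife Ontology namespace address is an assumption, and the `robot_url` helper simply follows the `.rdf` suffix pattern shown on the slide.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "wo": "http://purl.org/ontology/wo/",  # assumed namespace for this sketch
}

# Abbreviated excerpt with namespace declarations added so that it parses.
DOCUMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:wo="http://purl.org/ontology/wo/">
  <wo:Species rdf:about="/nature/life/Giant_Panda#species">
    <rdfs:label>Giant panda</rdfs:label>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>
  </wo:Species>
</rdf:RDF>"""


def robot_url(human_url):
    """Address of the machine-readable version, following the slide's pattern."""
    return human_url.rstrip("/") + ".rdf"


root = ET.fromstring(DOCUMENT)
species = root.find("wo:Species", NS)
label = species.findtext("rdfs:label", namespaces=NS)
same_as = [
    element.get(f"{{{NS['rdf']}}}resource")  # attribute name in {namespace}local form
    for element in species.findall("owl:sameAs", NS)
]

print(label, same_as)
print(robot_url("http://www.bbc.co.uk/nature/life/Giant_Panda"))
```

The `sameAs` link is the useful part for a remix: it tells the robot where to go next for more data about the same animal.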

4. Be a pirate!

• Use your model to audit the content you have ready to go.

• Find the gaps - the concepts which are important to the subject domain, but which you don’t have content for.

• Sail the high seas in search of third-party content or data.

• Enrich your content with third-party data, then pay it forward by publishing linked data back out to the web.
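
The audit described above comes down to comparing sets. In this sketch the concept names are borrowed from the theme parks content model, while the content counts, the third-party sources and the `audit` helper are hypothetical.

```python
# Concepts the content model says matter to the subject domain.
MODEL_CONCEPTS = {"Park", "Land", "Attraction", "Restaurant", "Hotel", "Character"}

# Hypothetical count of items we already hold for each concept.
OWN_CONTENT = {"Park": 2, "Attraction": 48, "Hotel": 3}

# Hypothetical third-party sources found while sailing the high seas.
THIRD_PARTY = {"Character": "DBpedia", "Land": "DBpedia"}


def audit(concepts, own, third_party):
    """Split the gaps into those a third party can fill and those still open."""
    covered = {concept for concept, count in own.items() if count > 0}
    gaps = concepts - covered
    fillable = {c: third_party[c] for c in sorted(gaps) if c in third_party}
    unfilled = sorted(gaps - set(fillable))
    return fillable, unfilled


fillable, unfilled = audit(MODEL_CONCEPTS, OWN_CONTENT, THIRD_PARTY)
print("Fill from third parties:", fillable)
print("Still missing:", unfilled)
```

What remains in the unfilled list is where original content is genuinely needed.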

Islands of treasure await the brave adventurer.

5. IAs should code

<CONTROVERSY KLAXON!>

• Unexpected ideas can come from throwing different sources of data together.

• A little coding knowledge goes a long way toward building rough prototypes which prove concepts. Right now, more designer-friendly tools don’t exist.

• Python and Rails are popular choices among IAs who don’t mind a little hacking.

• If UX designers are encouraged to use native web tools, shouldn’t we also?

DBpedia: Animal descriptions

Freebase: Species taxonomy

Geonames: Location data

BBC Wildlife Finder: Video clips

Flickr API: Tagged photos

‘Wildlife Near You’ was an experiment in bootstrapping an entire content-rich product from no original content whatsoever.

All the world’s a stage. What stories will we tell?

Consider how your offering benefits the web as a whole.

Stitch your content into the fabric of the web.

THE PATH AHEAD

Where next for the web, and for information architecture?

It’s been an amazing ride, but the best is yet to come.

The web was designed to break down barriers.

The web was designed to build bridges of understanding.

https://www.flickr.com/photos/raveland

Time to let the robot army do the heavy lifting.

Time to tool up and face the challenges that lie ahead.

‘Designers should code’. But that’s not code…

That’s code! With zombies.

Today the tools are rough and ready, as once they were for HTML.

The Linked Data and Information Architecture communities have much to discuss.

Information Architecture must continue to evolve, learn from others, and expand its range of influence.

Ready to play?

Thanks for listening.

This presentation is now available at http://slideshare.net/reduxd

To find out more about getting started with Linked Data, visit EUCLID. http://euclid-project.eu/

Dedicated to the coalition of the willing: Silver Oliver (@silveroliver), Michael Smethurst (@fantasticlife), Paul Rissen (@r4isstatic), Tom Scott (@derivadow), Leigh Dodds (@ldodds), Chris Sizemore (@onpause), and the London and Reading Linked Data Meetup groups.

Interested in content modelling? http://www.slideshare.net/reduxd/beyond-the-polar-bear
