DESCRIPTION
A preview of the talk I'll be giving at the 2014 IA Summit in San Diego. An introduction to the web of data, and how the BBC and other organisations create products which remix original content with third-party data.
@MikeAtherton | #IAS14
WEB-SCALE IA USING LINKED OPEN DATA
The path ahead
Mike Atherton
RedUXD
Tim Berners-Lee 1989
Defining standards
• Use a common format for publishing documents (HTML)
• Use a common system of addresses to identify and locate documents (URL)
• Establish a method of contextual linking between documents (HREF hyperlink)
Actual feedback on Tim Berners-Lee’s proposal
WEBSITES! WEBSITES! PARTY TIME! EXCELLENT!
What wonderful things we wrote for people!
As humans we can extract meaning and context from documents automatically.
Spot the difference.
The context of keywords doesn’t travel with them.
Tag: “Apple”
We can pick out the important things and relationships just by reading.
For humans, the distinction between documents and data is subtle.
By defining real-world things, we can teach computers the relationships between those things.
Computers need to be told which things our documents contain.
If a computer knows what ‘Mount Everest’ is and what ‘tall’ means, it can do the legwork for us.
“How tall is Mount Everest?”
By understanding terms and linking to data services, computers can even find out things they don’t know.
“Where can I get a beer?”
Actual queries from Facebook’s Graph Search tool.
Cross-referencing data points gives new insight.
http://actualfacebookgraphsearches.tumblr.com/
TED conference 2009
Use web addresses to represent real-world things
Tim Berners-Lee
Rule #1 of data publishing
Return useful data about each resource, in a standard format.
Tim Berners-Lee
Rule #2 of data publishing
Include links to other data, so people can discover more things.
Tim Berners-Lee
Rule #3 of data publishing
Linked data
• Use web addresses to represent real-world things.
• Return useful data about each resource, in a standard format.
• Include links to other data, so people can discover more things.
Data sources combined create more insight than studying them separately.
Researchers attempting to discover new drugs to treat Alzheimer’s Disease.
“Which proteins are involved in signal transduction AND are related to pyramidal neurons?”
Web search: 223,000 results, 0 answers
Linked healthcare data query: 32 results, 32 answers
We can even create entirely new value propositions from remixing existing content.
Linked data helps us make sense of information.
data.gov.uk Newspaper: Hyper-local news publishing
Land Registry Price Paid: Historical property data
Voter power: Local constituency data
A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014
CONTENT MODELS AT WEB-SCALE
Where next for your content model?
THEME PARKS CONTENT MODEL
[Diagram. Entities: Location, Resort, Park, Hotel, Weenie, Land, Meal, Restaurant, Attraction, Character, Creator, Work. Relationships: locatedIn, hasWeenie, ParentResort, hasEvent, features, adaptationOf, hasAttraction, CreatedBy, appearsIn, hasPark, contains.]
But ideally, those addresses should offer robot-readable data.
Use http web addresses to represent real-world things.
http://disneyland.disney.go.com/attractions/disneyland/haunted-mansion/
http://www.geonames.org/ontology#locatedIn
http://en.wikipedia.org/wiki/New_Orleans_Square
The Haunted Mansion (is) located in New Orleans Square
The Resource Description Framework is the web’s lingua franca for data integration.
RDF lets different data sources play nice together.
<subject> <predicate> <object>
<Charles Dickens> <is the author of> <Great Expectations>
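A triple can be sketched in Python as a plain three-field record. This is only an illustration using the standard library: the `Triple` class and the `to_ntriples` helper are names invented for this example, and the DBpedia URIs are the ones used elsewhere in this deck.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    object: str


def to_ntriples(triple: Triple) -> str:
    """Write a triple of URIs as a single N-Triples line."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.object}> ."


# <Charles Dickens> <is the author of> <Great Expectations>
dickens_wrote = Triple(
    "http://dbpedia.org/resource/Charles_Dickens",
    "http://dbpedia.org/ontology/author",
    "http://dbpedia.org/resource/Great_Expectations",
)

print(to_ntriples(dickens_wrote))
```

Because every part is a web address, a statement written by one publisher can be combined with statements from another without agreeing a shared database schema first.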
RDF is an abstract syntax, so all these ‘serialisations’ are equivalent.
RDF can be written in different ways.
RDF/XML
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dbpedia-owl:Person xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
      rdf:about="http://dbpedia.org/resource/Charles_Dickens">
    <dbpedia-owl:author rdf:resource="http://dbpedia.org/resource/Great_Expectations"/>
  </dbpedia-owl:Person>
</rdf:RDF>
Turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<http://dbpedia.org/resource/Charles_Dickens> a <http://dbpedia.org/ontology/Person> ;
    <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
N-Triples
<http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
<http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
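One way to see why the serialisations are equivalent is to reduce them to the same set of triples. The sketch below is a rough illustration, not a real RDF parser: it reads only N-Triples lines made of three URIs, then regroups them by subject in the way Turtle's ';' shorthand does. The function names are invented for this example.

```python
import re

# Matches one N-Triples line whose three terms are all URIs
# (literals and blank nodes are deliberately not handled).
NTRIPLE = re.compile(r"^<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.$")


def parse_ntriples(text):
    """Return the set of (subject, predicate, object) URI triples in the text."""
    triples = set()
    for line in text.splitlines():
        match = NTRIPLE.match(line.strip())
        if match:
            triples.add(match.groups())
    return triples


def group_by_subject(triples):
    """Group triples the way Turtle's ';' shorthand does: one entry per subject."""
    grouped = {}
    for s, p, o in triples:
        grouped.setdefault(s, set()).add((p, o))
    return grouped


def flatten(grouped):
    """Expand the grouped form back into a flat set of triples."""
    return {(s, p, o) for s, pairs in grouped.items() for p, o in pairs}


N_TRIPLES = """
<http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
<http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
"""

flat = parse_ntriples(N_TRIPLES)
grouped = group_by_subject(flat)
print(flatten(grouped) == flat)
```

The grouped and flat forms carry the same statements; only the notation differs, which is the sense in which RDF is an abstract syntax.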
Your data
BBC
New York Times
But only for humans! What if we had a common way to define a concept for a robot?
Wikipedia is great for defining individual concepts.
It turns Wikipedia content into machine-readable linked data.
DBpedia is Wikipedia for robots.
Ok computer…
• When did Disneyland first open?
• What is its official homepage?
• Who operates the park?
• When is it open?
• What’s its theme?
It crowdsources music metadata for use by humans and robots.
MusicBrainz is the open music encyclopedia.
By saying our concept is the ‘same as’ an accepted identifier, we all speak the same language.
Shared identifiers act as intermediaries.
http://dbpedia.org/resource/China
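A small sketch of how a shared identifier can act as the intermediary between two datasets. Everything except the DBpedia URI is hypothetical: the local keys, the record fields, and the helper names are made up to show the idea.

```python
# Hypothetical records held by two publishers, each under its own local key.
OUR_RECORDS = {"country/cn": {"label": "China", "capital": "Beijing"}}
THEIR_RECORDS = {"places/0042": {"label": "People's Republic of China", "continent": "Asia"}}

# Each publisher states which shared identifier its local key is the 'same as'.
OUR_SAME_AS = {"country/cn": "http://dbpedia.org/resource/China"}
THEIR_SAME_AS = {"places/0042": "http://dbpedia.org/resource/China"}


def by_shared_id(records, same_as):
    """Re-key local records by the shared identifier they are declared equal to."""
    return {same_as[key]: value for key, value in records.items() if key in same_as}


def merge(ours, theirs):
    """Combine facts about every thing that both sources identify the same way."""
    merged = {}
    for uri in ours.keys() & theirs.keys():
        merged[uri] = {**theirs[uri], **ours[uri]}  # our values win on conflict
    return merged


combined = merge(
    by_shared_id(OUR_RECORDS, OUR_SAME_AS),
    by_shared_id(THEIR_RECORDS, THEIR_SAME_AS),
)
print(combined["http://dbpedia.org/resource/China"])
```

Neither publisher needs to know the other's local keys; agreeing on the accepted identifier is enough to line the records up.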
BBC MUSIC
How BBC Music used linked data to get more people listening to the radio.
How can I find out which one I should listen to?
10 national BBC radio stations.
A continuously-updated record of every song played on-air.
The BBC radio playout system was a data goldmine.
The sources combine to create a new and useful product.
Linked data builds a composite picture of the world.
Which song is playing on the radio now?
Who is this artist?
What other stuff has this artist done?
What TV or radio clips of this artist do we have?
By exposing common identifiers, your website becomes its own API.
Maintaining identifiers in your URIs makes playing with your stuff easier.
http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94
http://musicbrainz.org/artist/4d2956d1-a3f7-44bb-9a41-67563e1a0c94
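The two addresses above share the same identifier, so one can be derived from the other. A minimal sketch, assuming the identifier is always a UUID somewhere in the URL path; the helper names are invented for this example and nothing here calls either site.

```python
import re
from urllib.parse import urlparse

# MusicBrainz identifiers are UUIDs: 8-4-4-4-12 hexadecimal characters.
MBID = re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}")


def extract_mbid(url):
    """Return the MusicBrainz-style identifier found in the URL path, or None."""
    match = MBID.search(urlparse(url).path.lower())
    return match.group(0) if match else None


def musicbrainz_artist_url(bbc_url):
    """Build the MusicBrainz artist URL that shares the BBC page's identifier."""
    mbid = extract_mbid(bbc_url)
    if mbid is None:
        raise ValueError(f"no identifier found in {bbc_url!r}")
    return f"http://musicbrainz.org/artist/{mbid}"


bbc = "http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94"
print(musicbrainz_artist_url(bbc))
```

This is the sense in which a website with stable, shared identifiers in its URIs behaves like an API: a third party can compute the address of a page without asking for it.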
Creating an artist profile on MusicBrainz automatically creates a BBC Music artist page.
Linked data lets you use the web as a content management system.
freshonthenet.co.uk/musicbrainz/
Filling in the blanks
• The BBC knows when it's played a record by Tom Waits
• MusicBrainz knows all the records Tom Waits ever released
• DBpedia knows Tom Waits is from San Diego
• DBpedia knows Blink-182 are also from San Diego
• The BBC knows when it's played a record by Blink-182
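The bullets above can be sketched as three small sets of triples that only become useful once they are joined. The predicate names ("played", "released", "origin"), the "bbc" subject, and the helper functions are assumptions made for this illustration, not the vocabularies those services actually publish.

```python
# Each source contributes the facts it is best placed to know.
BBC = {("bbc", "played", "Tom Waits"), ("bbc", "played", "Blink-182")}
MUSICBRAINZ = {("Tom Waits", "released", "Rain Dogs")}
DBPEDIA = {("Tom Waits", "origin", "San Diego"), ("Blink-182", "origin", "San Diego")}

# Linking the data is, at its simplest, taking the union of the statements.
GRAPH = BBC | MUSICBRAINZ | DBPEDIA


def objects(graph, subject, predicate):
    """Everything that `subject` is linked to via `predicate`."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """Everything linked to `obj` via `predicate`."""
    return {s for s, p, o in graph if p == predicate and o == obj}


def played_artists_from_same_place(graph, artist):
    """Other artists the BBC has played that share a place of origin with `artist`."""
    played = objects(graph, "bbc", "played")
    neighbours = set()
    for place in objects(graph, artist, "origin"):
        neighbours |= subjects(graph, "origin", place)
    return (neighbours & played) - {artist}


print(played_artists_from_same_place(GRAPH, "Tom Waits"))
```

No single source could answer "who else from this artist's home town have we played?"; the answer only exists in the combined graph.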
The ‘SPARQL Protocol and RDF Query Language’ lets us query linked data as easily as a local database.
If you want linked data magic, try SPARQL.
SQL: Centralised relational queries
SPARQL: Distributed graph queries
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?s ?title ?author
WHERE {
  ?s rdf:type dbpedia-owl:Book .
  ?s dbpedia-owl:author ?author_uri .
  ?author_uri dbpedia-owl:birthName ?author .
  ?s dbpprop:name ?title .
  FILTER (REGEX(STR(?title), "Great Expectations", "i"))
}
SPARQL query ‘Who wrote Great Expectations?’
The places where the terms we’ll use are defined
Bring back the stuff that matches what I’m about to say…
…which is the birth name of anything said to be the author of a book…
…but only if that book is titled ‘Great Expectations’
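For anyone who wants to try the query from code, here is a rough sketch of how it could be packaged as an HTTP request. It assumes DBpedia's public endpoint at http://dbpedia.org/sparql and the standard `query` parameter of the SPARQL protocol; it only builds the URL and does not send anything. The `build_request_url` helper is a name invented for this example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: DBpedia's public SPARQL endpoint, queried with an HTTP GET.
ENDPOINT = "http://dbpedia.org/sparql"

QUERY = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?s ?title ?author
WHERE {
  ?s rdf:type dbpedia-owl:Book .
  ?s dbpedia-owl:author ?author_uri .
  ?author_uri dbpedia-owl:birthName ?author .
  ?s dbpprop:name ?title .
  FILTER (REGEX(STR(?title), "Great Expectations", "i"))
}
"""


def build_request_url(endpoint, query):
    """Encode the query text as the 'query' parameter of a GET request."""
    return f"{endpoint}?{urlencode({'query': query.strip()})}"


url = build_request_url(ENDPOINT, QUERY)

# Decoding the URL again should give back exactly the query we started with.
round_trip = parse_qs(urlparse(url).query)["query"][0]
print(round_trip == QUERY.strip())
```

Sending the request (for instance with `urllib.request` and an `Accept: application/sparql-results+json` header) would be the next step. Whether the property names from this 2014 slide still return results today is not something this sketch checks.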
The various sources that make up BBC Wildlife Finder are structured according to the Wildlife Ontology.
The ontology defines how everything hangs together.
http://dbpedia.org/resource/Giant_panda
http://www.bbc.co.uk/programmes/p00k3nx
http://www.bbc.co.uk/news/world-asia-china-24784767
http://worldwildlife.org/species/giant-panda
http://www.iucnredlist.org/details/712/0
http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml
ONTOLOGIES
Creating, publishing, and consuming the words that help us mean what we say.
BBC Wildlife Ontology
http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml
BBC News attempted to model news coverage to better represent how events are related.
Ontologies start with a high-level understanding of the IA.
The chronological chain of events and the graph of supporting coverage are modelled to aid understanding.
News updates connect in sequence to a storyline.
BBC News give journalists the tools to tag stories with web-scale identifiers as they write.
Articles and storylines are tagged with people, places, and subjects.
News Storylines model http://purl.org/ontology/storyline
Detail from the News Storyline ontology
http://purl.org/ontology/storyline
Used by NYT to aggregate news stories around themed topic pages.
The New York Times offers identifiers for people, places, and subjects.
Mashing up sources of data can yield playful or surprisingly useful results.
Linked open data weaves tales of the unexpected.
Earn your stars!
• make your stuff available under an open license
• make it available as structured data
• use non-proprietary formats
• use URIs to identify things
• link your data to other data to provide context
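The five steps above build on one another, so a dataset's star count is the number of consecutive steps it satisfies from the first. A small sketch of that scoring, with dictionary keys and a sample dataset that are invented for the example:

```python
# The five steps listed above, in order; each star assumes all the earlier ones.
STEPS = [
    "open_license",
    "structured_data",
    "non_proprietary_format",
    "uris_for_things",
    "linked_to_other_data",
]


def star_rating(dataset):
    """Count consecutive steps satisfied, starting from the first."""
    stars = 0
    for step in STEPS:
        if not dataset.get(step, False):
            break
        stars += 1
    return stars


# Hypothetical dataset: openly licensed CSV with no URIs or outbound links.
open_csv = {
    "open_license": True,
    "structured_data": True,
    "non_proprietary_format": True,
}
print(star_rating(open_csv))
```

Stopping at the first unmet step is a design choice in this sketch: using URIs without an open license, for example, would still score zero here.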
Linked Open Data cloud 2007
Linked Open Data cloud 2011
FIRST STEPS WITH LINKED DATA
This all sounds awesome! Now what?
1. Mark up content with RDFa
• RDFa is RDF embedded into HTML code to state our subject, predicate, and object.
• Typically:
<subject>: The page we’re adding the markup to
<predicate>: The verb, as defined by an external vocabulary
<object>: The external URI we’re expressing a relationship to
RDFa in action
<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">
  <div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">
    <div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>
    <div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>
  </div>
</div>
Define which city council this page is about.
Define what we mean by ‘about’ using the rNews vocabulary.
State that the city council we’re talking about is the same one referenced at Open Data Communities.
Using our own ontology, state that this council governs a region identified on data.gov.uk
Thanks to @r4isstatic for this example!
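To show what a robot might pull out of that markup, here is a deliberately simplified reader built on Python's standard `html.parser`. It is not a conformant RDFa processor (it ignores subject inheritance, prefixes, and `typeof`); it just joins each tag's `vocab` and `rel` into a predicate URI and pairs it with the `resource`. The class name is invented for this example, and the bracketed placeholders are kept exactly as they appear on the slide.

```python
from html.parser import HTMLParser


class RelResourceExtractor(HTMLParser):
    """Collect (predicate URI, resource) pairs from tags with rel and resource."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "rel" in attributes and "resource" in attributes:
            vocab = attributes.get("vocab", "")
            predicate = vocab + attributes["rel"]
            self.statements.append((predicate, attributes["resource"]))


MARKUP = """
<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">
  <div vocab="http://iptc.org/std/rNews/2011-10-07#" rel="about" resource="http://www.bbc.co.uk/things/[GUID]#id">
    <div vocab="http://www.w3.org/2002/07/owl#" rel="sameAs" resource="http://opendatacommunities.org/id/[COUNCIL-TYPE]/[COUNCIL-NAME]"></div>
    <div vocab="http://www.bbc.co.uk/ontologies/politics#" rel="governsGSS" resource="http://statistics.data.gov.uk/id/statistical-geography/[GSSID]"></div>
  </div>
</div>
"""

extractor = RelResourceExtractor()
extractor.feed(MARKUP)
for predicate, target in extractor.statements:
    print(predicate, "->", target)
```

The outer div has a `resource` but no `rel`, so it is skipped here; in real RDFa it sets the subject that the nested statements are about.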
2. Publish an ontology
• Ontologies describe your content model in detail, defining the vocabulary for things, types of thing, and types of relationship:
• Classes: ‘person’, ‘book’, ‘wine’
• Properties: ‘age’, ‘ISBN’, ‘hasDistillery’
• Individuals: ‘Charles Dickens’, ‘Great Expectations’, ‘Laphroaig’
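The three building blocks can be sketched as plain Python data classes. This is an analogy rather than an ontology language: the class names, the idea of tying each property to a single class, and the placeholder ISBN value are all assumptions made for the illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class OntologyClass:
    """A type of thing, e.g. 'person' or 'book'."""
    name: str


@dataclass(frozen=True)
class Property:
    """A type of attribute or relationship, valid for one class (its domain)."""
    name: str
    domain: OntologyClass


@dataclass
class Individual:
    """A specific thing of a given class, with values for some properties."""
    name: str
    kind: OntologyClass
    values: dict = field(default_factory=dict)

    def set(self, prop, value):
        """Record a value, refusing properties that do not apply to this class."""
        if prop.domain != self.kind:
            raise TypeError(f"'{prop.name}' does not apply to a {self.kind.name}")
        self.values[prop.name] = value


person, book = OntologyClass("person"), OntologyClass("book")
age, isbn = Property("age", person), Property("ISBN", book)

great_expectations = Individual("Great Expectations", book)
great_expectations.set(isbn, "ISBN-PLACEHOLDER")  # placeholder, not a real ISBN
print(great_expectations.values)
```

Trying `great_expectations.set(age, 150)` raises a `TypeError`, which is the kind of consistency a published vocabulary lets other people rely on.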
Many ontologies are published and available for reuse, or to build upon.
Ontologies are guidebooks to help us explore and understand subjects.
Schema.org: General purpose vocabulary
FOAF: Person-to-person relationships
rNews: News story publishing
3. Make your CMS work harder
• Content management systems mostly suck for this stuff, but some - like Drupal and Umbraco - have growing support for publishing RDF (and other semantic formats).
• These systems even have some SPARQL support allowing linked data to be added to your own page views.
• Most showcase linked data projects aren’t using an off-the-shelf CMS, but things are improving with more semantically-friendly CMSs like Webnodes and Ximdex.
<rdf:Description rdf:about="/nature/species/Giant_Panda">
  <foaf:primaryTopic rdf:resource="/nature/species/Giant_Panda#species"/>
  <rdfs:seeAlso rdf:resource="/nature/species"/>
</rdf:Description>

<wo:Species rdf:about="/nature/life/Giant_Panda#species">
  <rdfs:label>Giant panda</rdfs:label>
  <wo:name rdf:resource="http://www.bbc.co.uk/nature/species/Giant_Panda#name"/>
  <foaf:depiction rdf:resource="http://ichef.bbci.co.uk/naturelibrary/images/ic/640x360/g/gi/giant_panda/giant_panda_1.jpg"/>

  <dc:description>The giant panda is a rare, endangered and elusive <a href="http://www.bbc.co.uk/nature/life/Bear">bear</a>, making the videos below of a newborn baby giant panda and the remarkable courtship scene filmed in the wild unique. Giant pandas are famous for their love of bamboo, a diet so nutritionally poor that the pandas have to consume up to 20kg each day. The extra digit on the panda's hand helps them to tear the bamboo and their gut is covered with a thick layer of mucus to protect against splinters. Habitat loss is the greatest cause of the giant panda's decline, and today their range is restricted to six separate mountain ranges in western <a href="http://www.bbc.co.uk/nature/places/China">China</a>. <br/> <br/> <b>Did you know?</b><br/> A giant panda is born pink, hairless, blind and 1/900th the size of its mother. <br/></dc:description>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>
Human version: http://www.bbc.co.uk/nature/life/Giant_Panda
Robot version: http://www.bbc.co.uk/nature/life/Giant_Panda.rdf
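A sketch of what a robot might do with the robot version: read the RDF/XML and follow the `owl:sameAs` link out to DBpedia. The sample document below is a trimmed, well-formed stand-in written for this example, not the BBC's actual output. In particular it uses `rdf:Description` rather than the slide's `wo:Species` element, to avoid guessing the Wildlife Ontology namespace, and the `same_as_links` helper is an invented name.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Assumed, simplified stand-in for the document served at the .rdf address.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:owl="{OWL}">
  <rdf:Description rdf:about="/nature/life/Giant_Panda#species">
    <rdfs:label>Giant panda</rdfs:label>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>
  </rdf:Description>
</rdf:RDF>
"""


def same_as_links(rdf_xml):
    """Map each described resource to the external URIs it is the 'same as'."""
    root = ET.fromstring(rdf_xml)
    links = {}
    for node in root:
        about = node.get(f"{{{RDF}}}about")
        for same in node.findall(f"{{{OWL}}}sameAs"):
            links.setdefault(about, []).append(same.get(f"{{{RDF}}}resource"))
    return links


print(same_as_links(SAMPLE))
```

The human page and the robot page describe the same thing; the robot page just states its links to the rest of the web explicitly.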
4. Be a pirate!
• Use your model to audit the content you have ready to go.
• Find the gaps - the concepts which are important to the subject domain, but which you don’t have content for.
• Sail the high seas in search of third-party content or data.
• Enrich your content with third-party data, then pay it forward by publishing linked data back out to the web.
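The audit and gap-finding steps can be roughed out in a few lines. The concept names come from the theme parks content model earlier in the deck; the inventory counts are hypothetical and `find_gaps` is a name invented for this sketch.

```python
# Concepts the content model says matter to the subject domain.
MODEL_CONCEPTS = {"Park", "Hotel", "Land", "Restaurant", "Attraction", "Character"}

# Hypothetical inventory: how many items of each type we already hold.
INVENTORY = {"Park": 6, "Attraction": 140, "Restaurant": 0, "Hotel": 12}


def find_gaps(model, inventory):
    """Concepts in the model for which we hold no content at all."""
    return sorted(concept for concept in model if inventory.get(concept, 0) == 0)


print(find_gaps(MODEL_CONCEPTS, INVENTORY))
```

Each gap is a candidate for third-party content or data rather than something to write from scratch.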
Islands of treasure await the brave adventurer.
5. IAs should code
<CONTROVERSY KLAXON!>
• Unexpected ideas can come from throwing different sources of data together.
• A little coding knowledge goes a long way toward building rough prototypes which prove concepts. Right now, more designer-friendly tools don’t exist.
• Python and Rails are popular choices among IAs who don’t mind a little hacking.
• If UX designers are encouraged to use native web tools, shouldn’t we also?
DBpedia: Animal descriptions
Freebase: Species taxonomy
Geonames: Location data
BBC Wildlife Finder: Video clips
Flickr API: Tagged photos
‘Wildlife Near You’ was an experiment in bootstrapping an entire content-rich product from no original content whatsoever.
All the world’s a stage. What stories will we tell?
Consider how your offering benefits the web as a whole.
Stitch your content into the fabric of the web.
THE PATH AHEAD
Where next for the web, and for information architecture?
It’s been an amazing ride, but the best is yet to come.
The web was designed to break down barriers.
The web was designed to build bridges of understanding.
https://www.flickr.com/photos/raveland
Time to let the robot army do the heavy lifting.
Time to tool up and face the challenges that lie ahead.
‘Designers should code’. But that’s not code…
That’s code! With zombies.
Today the tools are rough and ready, as once they were for HTML.
The Linked Data and Information Architecture communities have much to discuss.
Information Architecture must continue to evolve, learn from others, and expand its range of influence.
Ready to play?
Thanks for listening.
This presentation is now available at http://slideshare.net/reduxd
To find out more about getting started with Linked Data, visit EUCLID. http://euclid-project.eu/
Dedicated to the coalition of the willing:
Silver Oliver @silveroliver
Michael Smethurst @fantasticlife
Paul Rissen @r4isstatic
Tom Scott @derivadow
Leigh Dodds @ldodds
Chris Sizemore @onpause
and the London and Reading Linked Data Meetup groups
Interested in content modelling? http://www.slideshare.net/reduxd/beyond-the-polar-bear