Web-scale IA using Linked Open Data

@MikeAtherton | #IAS14

WEB-SCALE IA USING LINKED OPEN DATA

Mike Atherton

RedUXD

The path ahead

Tim Berners-Lee 1989

Defining standards

• Use a common format for publishing documents (HTML)

• Use a common system of addresses to identify and locate documents (URL)

• Establish a method of contextual linking between documents (HREF hyperlink)

Actual feedback on Tim Berners-Lee’s proposal

WEBSITES! WEBSITES! PARTY TIME! EXCELLENT!

What wonderful things we wrote for people!

As humans we can extract meaning and context from documents automatically.

Spot the difference.

The context of keywords doesn’t travel with them.

Tag: “Apple”

We can pick out the important things and relationships just by reading.

For humans, the distinction between documents and data is subtle.

By defining real-world things, we can teach computers the relationships between those things.

Computers need to be told which things our documents contain.

If a computer knows what ‘Mount Everest’ is and what ‘tall’ means, it can do the legwork for us.

“How tall is Mount Everest?”

By understanding terms and linking to data services, computers can even find out things they don’t know.

“Where can I get a beer?”

Actual queries from Facebook’s Graph Search tool.

Cross-referencing data points gives new insight.

http://actualfacebookgraphsearches.tumblr.com/

TED conference 2009

Use web addresses to represent real-world things

Tim Berners-Lee

Rule #1 of data publishing

Return useful data about each resource, in a standard format.

Tim Berners-Lee

Include links to other data, so people can discover more things.

Tim Berners-Lee

Linked data

• Use web addresses to represent real-world things

• Return useful data about each resource, in a standard format.

• Include links to other data, so people can discover more things.

Data sources combined create more insight than studying them separately.

Researchers attempting to discover new drugs to treat Alzheimer’s Disease.

“Which proteins are involved in signal transduction AND are related to pyramidal neurons?”

Web search: 223,000 results, 0 answers

Linked healthcare data query: 32 results, 32 answers

We can even create entirely new value propositions from remixing existing content.

Linked data helps us make sense of information.

data.gov.uk Newspaper Hyper-local news publishing

Land Registry Price Paid Historical property data

Voter power Local constituency data

A MINIMUM VIABLE PRESENTATION FOR IA SUMMIT 2014

CONTENT MODELS AT WEB-SCALE

Where next for your content model?

THEME PARKS CONTENT MODEL

[Diagram: a content model connecting the classes Location, Resort, Weenie, Restaurant, Attraction, Character, and Creator through the relationships locatedIn, hasWeenie, ParentResort, hasEvent, features, adaptationOf, hasAttraction, CreatedBy, appearsIn, hasPark, and contains.]

But ideally, those addresses should offer robot-readable data.

Use http web addresses to represent real-world things.

http://disneyland.disney.go.com/attractions/disneyland/haunted-mansion/

http://www.geonames.org/ontology#locatedIn

http://en.wikipedia.org/wiki/New_Orleans_Square

The Haunted Mansion (is) located in New Orleans Square
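The sentence above is a subject, predicate, object statement built from three web addresses. As a rough sketch (the `Triple` type and `to_ntriples` helper are illustrative names, not part of any RDF library), the statement could be held in code and written out as a single N-Triples line:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object, each a web address."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triple: Triple) -> str:
    """Write a triple as one N-Triples line (handles URI objects only)."""
    return f"<{triple.subject}> <{triple.predicate}> <{triple.obj}> ."


# The three addresses shown on the slides above.
haunted_mansion = Triple(
    subject="http://disneyland.disney.go.com/attractions/disneyland/haunted-mansion/",
    predicate="http://www.geonames.org/ontology#locatedIn",
    obj="http://en.wikipedia.org/wiki/New_Orleans_Square",
)

print(to_ntriples(haunted_mansion))
```

Literal values such as names or dates would need extra handling; this sketch only covers the URI-to-URI case used here.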

The Resource Description Framework is the web’s lingua franca for data integration.

RDF lets different data sources play nice together.

RDF is an abstract syntax, so all these ‘serialisations’ are equivalent.

RDF can be written in different ways.

N-Triples

<http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
<http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .

RDF/XML

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dbpedia-owl:Person xmlns:dbpedia-owl="http://dbpedia.org/ontology/"
      rdf:about="http://dbpedia.org/resource/Charles_Dickens">
    <dbpedia-owl:author rdf:resource="http://dbpedia.org/resource/Great_Expectations"/>
  </dbpedia-owl:Person>
</rdf:RDF>

Turtle

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://dbpedia.org/resource/Charles_Dickens> a <http://dbpedia.org/ontology/Person> ;
    <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
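To make "all these serialisations are equivalent" concrete, here is a minimal sketch that reads the Charles Dickens statements from an N-Triples string and from an RDF/XML string, then checks that both yield the same set of triples. It uses only the Python standard library and handles just the tiny subset of each syntax needed here; a real project would more likely reach for a full RDF parser. The helper names are illustrative, and the property is written as `author` in both inputs, an assumption made so the two agree.

```python
import re
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

N_TRIPLES = """\
<http://dbpedia.org/resource/Charles_Dickens> <http://dbpedia.org/ontology/author> <http://dbpedia.org/resource/Great_Expectations> .
<http://dbpedia.org/resource/Charles_Dickens> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Person> .
"""

RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpedia-owl="http://dbpedia.org/ontology/">
  <dbpedia-owl:Person rdf:about="http://dbpedia.org/resource/Charles_Dickens">
    <dbpedia-owl:author rdf:resource="http://dbpedia.org/resource/Great_Expectations"/>
  </dbpedia-owl:Person>
</rdf:RDF>
"""


def qname_to_uri(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def parse_ntriples(text: str) -> set:
    """Read URI-only N-Triples lines into (subject, predicate, object) tuples."""
    pattern = re.compile(r"<([^>]+)>\s+<([^>]+)>\s+<([^>]+)>\s*\.")
    return {m.groups() for m in pattern.finditer(text)}


def parse_rdfxml(text: str) -> set:
    """Read the small 'typed node' subset of RDF/XML used in this example."""
    triples = set()
    for node in ET.fromstring(text):
        subject = node.attrib[f"{{{RDF_NS}}}about"]
        # The element name itself states the type of the subject.
        triples.add((subject, RDF_NS + "type", qname_to_uri(node.tag)))
        for prop in node:
            target = prop.attrib[f"{{{RDF_NS}}}resource"]
            triples.add((subject, qname_to_uri(prop.tag), target))
    return triples


print(parse_ntriples(N_TRIPLES) == parse_rdfxml(RDF_XML))
```

Both parsers reduce their input to the same two statements, which is the point of the slide: the syntax differs, the graph does not.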

Your data

New York Times

But only for humans! What if we had a common way to define a concept for a robot?

Wikipedia is great for defining individual concepts.

It turns Wikipedia content into machine-readable linked data.

DBpedia is Wikipedia for robots.

Ok computer…

• When did Disneyland first open?

• What is its official homepage?

• Who operates the park?

• When is it open?

• What’s its theme?

It crowdsources music metadata for use by humans and robots.

MusicBrainz is the open music encyclopedia.

By saying our concept is the ‘same as’ an accepted identifier, we all speak the same language.

Shared identifiers act as intermediaries.

http://dbpedia.org/resource/China
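A small sketch of how a shared identifier acts as an intermediary. The two local addresses below are hypothetical publishers invented for illustration; only the DBpedia URI comes from the slide. Each publisher says its own concept is the 'same as' the DBpedia one, which is enough to discover that they are talking about the same thing.

```python
from collections import defaultdict

DBPEDIA_CHINA = "http://dbpedia.org/resource/China"

# Hypothetical local identifiers from two unrelated publishers. Each one
# declares that its concept is the 'same as' the shared DBpedia URI.
same_as = {
    "http://news.example.org/places/china": DBPEDIA_CHINA,
    "http://travel.example.com/destinations/cn": DBPEDIA_CHINA,
}


def group_by_shared_identifier(links: dict) -> dict:
    """Group local URIs under the shared identifier they point to."""
    groups = defaultdict(set)
    for local_uri, shared_uri in links.items():
        groups[shared_uri].add(local_uri)
    return dict(groups)


for shared, local_uris in group_by_shared_identifier(same_as).items():
    print(shared, "connects", sorted(local_uris))
```

Neither publisher needs to know about the other; agreeing on the shared identifier is what joins them up.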

BBC MUSIC

How BBC Music used linked data to get more people listening to the radio.

How can I find out which one I should listen to?

10 national BBC radio stations.

A continuously-updated record of every song played on-air.

The BBC radio playout system was a data goldmine.

The sources combine to create a new and useful product.

Linked data builds a composite picture of the world.

Which song is playing on the radio now?

Who is this artist?

What other stuff has this artist done?

What TV or radio clips of this artist do we have?

By exposing common identifiers, your website becomes its own API.

Maintaining identifiers in your URIs makes playing with your stuff easier.

http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94

http://musicbrainz.org/artist/4d2956d1-a3f7-44bb-9a41-67563e1a0c94
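The two addresses above end in the same identifier, so moving between them is simple string handling. A minimal sketch, assuming the URL patterns shown on the slide (the function name is illustrative, and the live sites may have changed their address schemes since):

```python
from urllib.parse import urlparse


def musicbrainz_url_for(bbc_artist_url: str) -> str:
    """Reuse the identifier at the end of a BBC Music artist URL."""
    artist_id = urlparse(bbc_artist_url).path.rstrip("/").rsplit("/", 1)[-1]
    return f"http://musicbrainz.org/artist/{artist_id}"


bbc_url = "http://www.bbc.co.uk/music/artists/4d2956d1-a3f7-44bb-9a41-67563e1a0c94"
print(musicbrainz_url_for(bbc_url))
```

No lookup table is needed, which is what makes the website behave like its own API.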

Creating an artist profile on MusicBrainz automatically creates a BBC Music artist page.

Linked data lets you use the web as a content management system.

freshonthenet.co.uk/musicbrainz/

Filling in the blanks

• The BBC knows when it's played a record by Tom Waits

• MusicBrainz knows all the records Tom Waits ever released

• DBpedia knows Tom Waits is from San Diego

• DBpedia knows Blink-182 are also from San Diego

• The BBC knows when it's played a record by Blink-182
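The facts above can be joined to answer a question no single source could: "which other artists played on the BBC come from the same place as Tom Waits?" A toy sketch, using plain Python collections as stand-ins for the BBC and DBpedia sources and only the facts stated on the slide (the function name is illustrative):

```python
# Tiny stand-ins for each source, holding only what the slide states.
bbc_played = {"Tom Waits", "Blink-182"}   # artists the BBC has played
dbpedia_hometown = {                       # where DBpedia says they are from
    "Tom Waits": "San Diego",
    "Blink-182": "San Diego",
}


def played_artists_from_same_place(artist: str) -> set:
    """Find other BBC-played artists who share the given artist's hometown."""
    hometown = dbpedia_hometown[artist]
    return {
        other
        for other in bbc_played
        if other != artist and dbpedia_hometown.get(other) == hometown
    }


print(played_artists_from_same_place("Tom Waits"))
```

With real linked data the two sources would be joined on shared identifiers rather than artist names, but the shape of the join is the same.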

The ‘SPARQL Protocol and RDF Query Language’ lets us query linked data as easily as a local database.

If you want linked data magic, try SPARQL.

SQL: Centralised relational queries

SPARQL: Distributed graph queries

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>

SELECT ?s ?title ?author WHERE {
  ?s rdf:type dbpedia-owl:Book .
  ?s dbpedia-owl:author ?author_uri .
  ?author_uri dbpedia-owl:birthName ?author .
  ?s dbpprop:name ?title .
  FILTER (REGEX(STR(?title), "Great Expectations", "i"))
}

SPARQL query ‘Who wrote Great Expectations?’

The places where the terms we’ll use are defined

Bring back the stuff that matches what I’m about to say…

…which is the birth name of anything said to be the author of a book…

…but only if that book is titled ‘Great Expectations’
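To show what the query engine is doing with those patterns, here is a toy matcher in Python. It is not SPARQL and not how a real triple store is implemented; it is a sketch that walks a small hand-written graph (a stand-in for DBpedia, with made-up `book:` and `person:` shorthand for the subjects) and binds the `?variables` one pattern at a time, then applies the title filter.

```python
import re

# A hand-written stand-in for the DBpedia graph (not live data).
TRIPLES = [
    ("book:Great_Expectations", "rdf:type", "dbpedia-owl:Book"),
    ("book:Great_Expectations", "dbpedia-owl:author", "person:Charles_Dickens"),
    ("book:Great_Expectations", "dbpprop:name", "Great Expectations"),
    ("person:Charles_Dickens", "dbpedia-owl:birthName", "Charles John Huffam Dickens"),
    ("book:Bleak_House", "rdf:type", "dbpedia-owl:Book"),
    ("book:Bleak_House", "dbpedia-owl:author", "person:Charles_Dickens"),
    ("book:Bleak_House", "dbpprop:name", "Bleak House"),
]

# The four triple patterns from the WHERE clause.
PATTERNS = [
    ("?s", "rdf:type", "dbpedia-owl:Book"),
    ("?s", "dbpedia-owl:author", "?author_uri"),
    ("?author_uri", "dbpedia-owl:birthName", "?author"),
    ("?s", "dbpprop:name", "?title"),
]


def match(pattern, triple, bindings):
    """Try to extend the bindings so that the pattern equals the triple."""
    new = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def solve(patterns, bindings=None):
    """Yield every set of variable bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for triple in TRIPLES:
        extended = match(patterns[0], triple, bindings)
        if extended is not None:
            yield from solve(patterns[1:], extended)


# The FILTER step: keep only books whose title matches, ignoring case.
answers = [
    b["?author"]
    for b in solve(PATTERNS)
    if re.search("Great Expectations", b["?title"], re.IGNORECASE)
]
print(answers)
```

Bleak House matches all four patterns too, but the filter on the title drops it, leaving only the author's birth name for Great Expectations.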

The various sources that make up BBC Wildlife Finder are structured according to the Wildlife Ontology.

The ontology defines how everything hangs together.

http://dbpedia.org/resource/Giant_panda

http://www.bbc.co.uk/programmes/p00k3nx

http://www.bbc.co.uk/news/world-asia-china-24784767

http://worldwildlife.org/species/giant-panda

http://www.iucnredlist.org/details/712/0

http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml

ONTOLOGIES

Creating, publishing, and consuming the words that help us mean what we say.

BBC Wildlife Ontology

http://www.bbc.co.uk/ontologies/wildlife/2010-11-04.shtml

BBC News attempted to model news coverage to better represent how events are related.

Ontologies start with a high-level understanding of the IA.

The chronological chain of events and the graph of supporting coverage are modelled to aid understanding.

News updates connect in sequence to a storyline.

BBC News gives journalists the tools to tag stories with web-scale identifiers as they write.

Articles and storylines are tagged with people, places, and subjects.

News Storylines model http://purl.org/ontology/storyline

Detail from the News Storyline ontology

http://purl.org/ontology/storyline

Used by NYT to aggregate news stories around themed topic pages.

The New York Times offers identifiers for people, places, and subjects.

Mashing up sources of data can yield playful or surprisingly useful results.

Linked open data weaves tales of the unexpected.

Earn your stars!

make your stuff available under an open license

make it available as structured data

use non-proprietary formats

use URIs to identify things

link your data to other data to provide context
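One way to read the five steps above is as a ladder, where each star builds on the one before. A minimal sketch under that assumption (the criterion keys, the function name, and the example dataset are all made up for illustration):

```python
# The five steps from the slide, in order. The key names are illustrative.
STAR_CRITERIA = [
    "open_license",     # available under an open license
    "structured",       # available as structured data
    "non_proprietary",  # uses non-proprietary formats
    "uses_uris",        # uses URIs to identify things
    "linked",           # links to other data to provide context
]


def count_stars(dataset: dict) -> int:
    """Count stars, stopping at the first criterion the dataset misses."""
    stars = 0
    for criterion in STAR_CRITERIA:
        if not dataset.get(criterion, False):
            break
        stars += 1
    return stars


# Hypothetical example: an openly licensed CSV file with no URIs or links.
open_csv = {"open_license": True, "structured": True, "non_proprietary": True}
print(count_stars(open_csv))
```

Under this reading, a dataset that links out to other data but has no open license would still score zero, since the ladder stops at the first missing step.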

Linked Open Data cloud 2007

Linked Open Data cloud 2011

FIRST STEPS WITH LINKED DATA

This all sounds awesome! Now what?

1. Mark up content with RDFa

• RDFa is RDF embedded into HTML code to state our subject, predicate, and object.

• Typically:
<subject>: The page we’re adding the markup to
<predicate>: The verb, as defined by an external vocabulary
<object>: The external URI we’re expressing a relationship to

RDFa in action

<div class="vote2013-council-meta" resource="http://www.bbc.co.uk/news/politics/councils/[GSSID]">

</div>

Define which city council this page is about.

Define what we mean by ‘about’ using the rNews vocabulary.

State that the city council we’re talking about is the same one referenced at Open Data Communities.

Using our own ontology, state that this council governs a region identified on data.gov.uk

Thanks to @r4isstatic for this example!

2. Publish an ontology

• Ontologies describe your content model in detail, defining the vocabulary for things, types of thing, and types of relationship:

• Classes: ‘person’, ‘book’, ‘wine’

• Properties: ‘age’, ‘ISBN’, ‘hasDistillery’

• Individuals: ‘Charles Dickens’, ‘Great Expectations’, ‘Laphroaig’
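The split between classes, properties, and individuals can be sketched in a few lines of Python. This is a toy model, not an ontology language like OWL, and it uses only the person and book examples from the list above. Which property belongs to which class is an assumption made here for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """A toy vocabulary: classes, properties, and named individuals."""
    classes: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)   # property -> class it describes
    individuals: dict = field(default_factory=dict)  # individual -> class it belongs to

    def add_property(self, name: str, domain: str) -> None:
        self._require_class(domain)
        self.properties[name] = domain

    def add_individual(self, name: str, class_name: str) -> None:
        self._require_class(class_name)
        self.individuals[name] = class_name

    def _require_class(self, class_name: str) -> None:
        if class_name not in self.classes:
            raise ValueError(f"Unknown class: {class_name}")

    def properties_of(self, individual: str) -> list:
        """List the properties that apply to an individual via its class."""
        class_name = self.individuals[individual]
        return sorted(p for p, d in self.properties.items() if d == class_name)


books = Ontology(classes={"person", "book"})
books.add_property("age", "person")    # assumed: 'age' describes a person
books.add_property("ISBN", "book")     # assumed: 'ISBN' describes a book
books.add_individual("Charles Dickens", "person")
books.add_individual("Great Expectations", "book")

print(books.properties_of("Great Expectations"))
```

The individual never states which properties it can have; it inherits them from its class, which is the kind of shared understanding a published ontology provides.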

Many ontologies are published and available for reuse, or to build upon.

Ontologies are guidebooks to help us explore and understand subjects.

Schema.org General purpose vocabulary

FOAF Person-to-person relationships

rNews News story publishing

3. Make your CMS work harder

• Content management systems mostly suck for this stuff, but some - like Drupal and Umbraco - have growing support for publishing RDF (and other semantic formats).

• These systems even have some SPARQL support allowing linked data to be added to your own page views.

• Most showcase linked data projects aren’t using an off-the-shelf CMS, but things are improving with more semantically-friendly CMSs like Webnodes and Ximdex.

<rdf:Description rdf:about="/nature/species/Giant_Panda">
  <foaf:primaryTopic rdf:resource="/nature/species/Giant_Panda#species"/>
  <rdfs:seeAlso rdf:resource="/nature/species"/>
</rdf:Description>

<wo:Species rdf:about="/nature/life/Giant_Panda#species">
  <rdfs:label>Giant panda</rdfs:label>
  <wo:name rdf:resource="http://www.bbc.co.uk/nature/species/Giant_Panda#name"/>
  <foaf:depiction rdf:resource="http://ichef.bbci.co.uk/naturelibrary/images/ic/640x360/g/gi/giant_panda/giant_panda_1.jpg"/>

  <dc:description>The giant panda is a rare, endangered and elusive <a href="http://www.bbc.co.uk/nature/life/Bear">bear</a>, making the videos below of a newborn baby giant panda and the remarkable courtship scene filmed in the wild unique. Giant pandas are famous for their love of bamboo, a diet so nutritionally poor that the pandas have to consume up to 20kg each day. The extra digit on the panda's hand helps them to tear the bamboo and their gut is covered with a thick layer of mucus to protect against splinters. Habitat loss is the greatest cause of the giant panda's decline, and today their range is restricted to six separate mountain ranges in western <a href="http://www.bbc.co.uk/nature/places/China">China</a>. <br/> <br/> <b>Did you know?</b><br/> A giant panda is born pink, hairless, blind and 1/900th the size of its mother. <br/></dc:description>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>

Human version: http://www.bbc.co.uk/nature/life/Giant_Panda

Robot version: http://www.bbc.co.uk/nature/life/Giant_Panda.rdf
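What a robot might do with its version: a rough sketch that reads a trimmed-down copy of the giant panda RDF and pulls out the label and the DBpedia link. The snippet is cut down from the slide and wrapped in a root element so it parses on its own. The `wo` namespace URI is an assumption; the others are the standard RDF, RDFS, and OWL namespaces. Nothing is fetched from the web here.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "wo": "http://purl.org/ontology/wo/",  # assumed Wildlife Ontology namespace
}

# A trimmed-down stand-in for the 'robot version' of the page.
ROBOT_VERSION = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:wo="http://purl.org/ontology/wo/">
  <wo:Species rdf:about="/nature/life/Giant_Panda#species">
    <rdfs:label>Giant panda</rdfs:label>
    <owl:sameAs rdf:resource="http://dbpedia.org/resource/Giant_Panda"/>
  </wo:Species>
</rdf:RDF>
"""

root = ET.fromstring(ROBOT_VERSION)
species = root.find("wo:Species", NS)
label = species.find("rdfs:label", NS).text
same_as = species.find("owl:sameAs", NS).attrib[f"{{{NS['rdf']}}}resource"]

print(label, "is the same as", same_as)
```

The `owl:sameAs` link is the useful part: it tells the robot where to go next for more data about the same animal.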

4. Be a pirate!

• Use your model to audit the content you have ready to go.

• Find the gaps - the concepts which are important to the subject domain, but which you don’t have content for.

• Sail the high seas in search of third-party content or data.

• Enrich your content with third-party data, then pay it forward by publishing linked data back out to the web.

Islands of treasure await the brave adventurer.

5. IAs should code

<CONTROVERSY KLAXON!>

• Unexpected ideas can come from throwing different sources of data together.

• A little coding knowledge goes a long way toward building rough prototypes which prove concepts. Right now, more designer-friendly tools don’t exist.

• Python and Rails are popular choices among IAs who don’t mind a little hacking.

• If UX designers are encouraged to use native web tools, shouldn’t we also?

DBpedia Animal descriptions

Freebase Species taxonomy

Geonames Location data

BBC Wildlife finder Video clips

Flickr API Tagged photos

‘Wildlife Near You’ was an experiment in bootstrapping an entire content-rich product from no original content whatsoever.

All the world’s a stage. What stories will we tell?

Consider how your offering benefits the web as a whole.

Stitch your content into the fabric of the web.

THE PATH AHEAD

Where next for the web, and for information architecture?

It’s been an amazing ride, but the best is yet to come.

The web was designed to break down barriers.

The web was designed to build bridges of understanding.

https://www.flickr.com/photos/raveland

Time to let the robot army do the heavy lifting.

Time to tool up and face the challenges that lie ahead.

‘Designers should code’. But that’s not code…

That’s code! With zombies.

Today the tools are rough and ready, as once they were for HTML.

The Linked Data and Information Architecture communities have much to discuss.

Information Architecture must continue to evolve, learn from others, and expand its range of influence.

Ready to play?

Thanks for listening.

This presentation is now available at http://slideshare.net/reduxd

To find out more about getting started with Linked Data, visit EUCLID. http://euclid-project.eu/

Dedicated to the coalition of the willing:

Silver Oliver @silveroliver
Michael Smethurst @fantasticlife
Paul Rissen @r4isstatic
Tom Scott @derivadow
Leigh Dodds @ldodds
Chris Sizemore @onpause

and the London and Reading Linked Data Meetup groups

Interested in content modelling? http://www.slideshare.net/reduxd/beyond-the-polar-bear
