
BBC Muddy Boots 08


Page 1: BBC Muddy Boots 08

Muddy Boots

From Flickr user Garrulus: http://flickr.com/photos/garrulus/82714475/

Rattle Research: http://www.rattleresearch.com

Muddy Boots, tramping new trails through the BBC's existing pristine navigation paths

Page 2: BBC Muddy Boots 08

A brief interlude: Muddy Boots was developed as part of the Innovation Labs process. Rattle are a digital R&D agency specialising in social innovation on the web

Page 3: BBC Muddy Boots 08

Innovation Labs+

A brief interlude: Muddy Boots was developed as part of the Innovation Labs process. Rattle are a digital R&D agency specialising in social innovation on the web

Page 4: BBC Muddy Boots 08

Original Workflow

Identify Key Entities (context)

Find reference URIs from Wikipedia

Find URI context from del.icio.us

Rank URIs via context 'rating'

Original MuddyBoots workflow for creating related links based on Wikipedia data. It produced interesting results, but 'interesting' isn't testable or measurable, and we had one big problem
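For illustration only, a minimal Python sketch of the final ranking step, assuming the candidate Wikipedia URIs and their del.icio.us tag context have already been fetched; the function, its inputs, and the simple overlap 'rating' are hypothetical stand-ins rather than the actual MuddyBoots code.

# Sketch: rank candidate Wikipedia URIs by how well their del.icio.us
# tag context overlaps with terms from the article. All inputs are
# assumed to have been gathered earlier in the workflow.

def rank_uris(article_terms, uri_contexts):
    """uri_contexts maps each candidate URI to the set of del.icio.us
    tags found for it; higher overlap with the article = higher rank."""
    article_terms = set(t.lower() for t in article_terms)
    scored = []
    for uri, tags in uri_contexts.items():
        tags = set(t.lower() for t in tags)
        rating = len(article_terms & tags) / len(tags) if tags else 0.0
        scored.append((rating, uri))
    return [uri for rating, uri in sorted(scored, reverse=True)]

# Example:
# rank_uris(["apple", "iphone", "jobs"],
#           {"http://en.wikipedia.org/wiki/Apple_Inc.": ["apple", "mac", "iphone"],
#            "http://en.wikipedia.org/wiki/Apple": ["fruit", "apple", "food"]})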

Page 5: BBC Muddy Boots 08

Ambiguity of term: how can you tell which 'Apple' is referenced in the article?

Page 6: BBC Muddy Boots 08

Apple Apple ?

Ambiguity of term: how can you tell which 'Apple' is referenced in the article?

Page 7: BBC Muddy Boots 08

Simplify the problem:

http://www.flickr.com/photos/donnagrayson/195244498/sizes/l/

Back to basics: only try to do one thing well, and solve the problem of ambiguity

Page 8: BBC Muddy Boots 08

Simplify the problem:

Unambiguously identify the “main actors” in a news story

Then add semantic markup for them

Answers the “who” in who, what, where, why ...

http://www.flickr.com/photos/donnagrayson/195244498/sizes/l/

Back to basics: only try to do one thing well, and solve the problem of ambiguity

Page 9: BBC Muddy Boots 08

One Possible Workflow

Extract (& Classify) Entities

Find In DBpedia / Wikipedia

Classify Entities via DBpedia

Extract Required Attributes

Parse Content & Markup

Entity extraction: many methods are available. Entity classification via DBpedia is very extensible. Finally, microformat markup is good for machines and for the semantic web as a whole
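As a rough illustration of how these stages could hang together, here is a runnable Python skeleton in which every function is a trivial stand-in for the real components described on the following slides; none of the names or logic come from the actual MuddyBoots code.

# Illustrative skeleton of the workflow above; every step is a trivial
# placeholder for the real components described on later slides.

def extract_entities(text):
    # placeholder: real version uses several extraction services plus voting
    return [w for w in text.split() if w.istitle()]

def find_in_dbpedia(entity, text):
    # placeholder: real version matches against DBpedia/Wikipedia and disambiguates
    return "http://dbpedia.org/resource/" + entity.replace(" ", "_")

def classify_and_extract(resource):
    # placeholder: real version reads DBpedia predicates (see later slides)
    return {"resource": resource, "type": "unknown", "attributes": {}}

def related_entities(story_text):
    entities = extract_entities(story_text)
    resources = [find_in_dbpedia(e, story_text) for e in entities]
    return [classify_and_extract(r) for r in resources]

print(related_entities("Gordon Brown met executives from Apple in London"))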

Page 10: BBC Muddy Boots 08

Entity Extraction (& classification?)

Leveraging existing web services to perform entity extraction is useful, especially when employing a voting system. We also used a local named entity extraction service; this is more useful in the future, as we have some direction over its evolution

Page 11: BBC Muddy Boots 08

Entity Extraction (& classification?)

Yahoo term extraction

TagThe.net

LingPipe

Voting System

Leveraging existing web services to perform entity extraction is useful, especially when employing a voting system. We also used a local named entity extraction service; this is more useful in the future, as we have some direction over its evolution
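A minimal Python sketch of the voting idea, assuming each service's response has already been normalised into a plain list of candidate entity strings; the service calls themselves are omitted, and the min_votes threshold is an assumption for illustration.

from collections import Counter

# Sketch of the voting step: each extraction service contributes one
# "vote" per candidate entity; keep candidates that at least `min_votes`
# services agree on. The input lists stand in for the (already parsed)
# responses from Yahoo term extraction, TagThe.net, LingPipe, etc.

def vote_on_entities(results_per_service, min_votes=2):
    votes = Counter()
    for candidates in results_per_service:
        for entity in set(c.strip().lower() for c in candidates):
            votes[entity] += 1
    return [entity for entity, n in votes.most_common() if n >= min_votes]

# Example:
yahoo    = ["Gordon Brown", "Downing Street", "economy"]
tagthe   = ["Gordon Brown", "London", "economy"]
lingpipe = ["Gordon Brown", "Downing Street"]
print(vote_on_entities([yahoo, tagthe, lingpipe]))
# -> ['gordon brown', 'downing street', 'economy']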

Page 12: BBC Muddy Boots 08

Using Wikipedia as a controlled vocabulary

Lucene vs DBpedia 'disambiguates' predicates: both have different advantages. Lucene finds something every time, whereas disambiguating with the predicates produces fewer false matches but also fewer results

Page 13: BBC Muddy Boots 08

Using Wikipedia as a controlled vocabulary

conText

Uses a Lucene-based index to look up entities and find the 'best match' for an entity (no explicit disambiguation required)

Muddy Boots

Uses DBpedia to find a resource match, then disambiguates using 'disambiguates' predicates and by comparing the original story text to each resource

Lucene vs DBpedia 'disambiguates' predicates: both have different advantages. Lucene finds something every time, whereas disambiguating with the predicates produces fewer false matches but also fewer results
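A Python sketch of this style of disambiguation against the public DBpedia SPARQL endpoint. Several details are assumptions for illustration: the disambiguation page is assumed to follow the "<label>_(disambiguation)" naming pattern, the predicate name is taken from current DBpedia releases (dbo:wikiPageDisambiguates; the 2008 data used a different namespace), and the word-overlap score is a naive stand-in for however the real system compared story text to each resource.

import requests

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

def disambiguation_candidates(label):
    # Candidates listed on the term's disambiguation page, with their
    # English abstracts. Page naming and predicate are assumptions (see above).
    page = label.replace(" ", "_") + "_(disambiguation)"
    query = """
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?target ?abstract WHERE {
      <http://dbpedia.org/resource/%s> dbo:wikiPageDisambiguates ?target .
      ?target dbo:abstract ?abstract .
      FILTER (lang(?abstract) = "en")
    }""" % page
    resp = requests.get(SPARQL_ENDPOINT,
                        params={"query": query,
                                "format": "application/sparql-results+json"})
    rows = resp.json()["results"]["bindings"]
    return [(r["target"]["value"], r["abstract"]["value"]) for r in rows]

def best_match(label, story_text):
    # Naive disambiguation: prefer the candidate whose abstract shares
    # the most words with the story text.
    story_words = set(story_text.lower().split())
    candidates = disambiguation_candidates(label)
    if not candidates:
        return None
    return max(candidates,
               key=lambda c: len(story_words & set(c[1].lower().split())))[0]

# e.g. best_match("Apple", "The computer maker's shares rose after the iPhone launch")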

Page 14: BBC Muddy Boots 08

Entity classification and attribute selection

Example of classifying a person: use the predicates in DBpedia to perform classification; certain predicates only exist for a person
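A sketch of that classification idea in Python: since certain predicates only appear on certain kinds of resource, their mere presence can be used as a type signal. The predicate-to-type mapping below is an illustrative assumption, not the actual MuddyBoots rules.

import requests

SPARQL_ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative rules: if a resource carries one of these predicates,
# guess the corresponding class. The real MuddyBoots rules may differ.
TYPE_SIGNALS = {
    "http://dbpedia.org/ontology/birthDate":       "person",
    "http://dbpedia.org/ontology/birthPlace":      "person",
    "http://dbpedia.org/ontology/populationTotal": "place",
    "http://dbpedia.org/ontology/foundingDate":    "organisation",
}

def classify(resource_uri):
    # List every predicate attached to the resource, then look for signals.
    query = "SELECT DISTINCT ?p WHERE { <%s> ?p ?o }" % resource_uri
    resp = requests.get(SPARQL_ENDPOINT,
                        params={"query": query,
                                "format": "application/sparql-results+json"})
    predicates = {row["p"]["value"] for row in resp.json()["results"]["bindings"]}
    for predicate, kind in TYPE_SIGNALS.items():
        if predicate in predicates:
            return kind
    return "unknown"

# e.g. classify("http://dbpedia.org/resource/Gordon_Brown")  # -> "person"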

Page 15: BBC Muddy Boots 08

Sample of MuddyBoots output: the classification of a BBC news article. It demonstrates 'main actor' discovery, automated microformatting, and the inclusion of extra content from DBpedia in a 'featured actors' sidebar. The inclusion of microformats means machines can now query this page in a more granular fashion
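As a toy illustration of the automated microformatting: once a 'main actor' has been identified as a person, their name in the article body can be wrapped in hCard markup so that machines can pick it out. The Python function below is a simplified sketch, not the MuddyBoots implementation; real markup would need to cope with overlapping names, existing HTML, and so on.

import re

def hcard_markup(html, person_name, dbpedia_uri):
    # Wrap occurrences of the person's name in minimal hCard markup,
    # linking back to the DBpedia resource we disambiguated to.
    # Simplified: assumes plain-text occurrences with no nested markup.
    vcard = ('<span class="vcard">'
             '<a class="fn url" href="%s">%s</a>'
             '</span>' % (dbpedia_uri, person_name))
    return re.sub(re.escape(person_name), vcard, html)

print(hcard_markup("<p>Gordon Brown spoke to reporters today.</p>",
                   "Gordon Brown",
                   "http://dbpedia.org/resource/Gordon_Brown"))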

Page 16: BBC Muddy Boots 08

Added bonus of creating semantic links and using 'web-scale identifiers'. BBC Music beta aggregates around MusicBrainz identifiers, and DBpedia knows about MusicBrainz; therefore we can provide news feeds for any artist on BBC Music beta using this relationship
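A small Python sketch of that bridge: once an artist's DBpedia resource yields a MusicBrainz ID, the same ID addresses the artist on BBC Music beta, so per-artist links and feeds can be keyed off it. The MBID is assumed to have been looked up from DBpedia already (the property holding it varies between releases), the BBC URL follows the familiar /music/artists/<mbid> pattern, and the feed URL is a purely hypothetical example.

# Sketch: bridging DBpedia-derived artists to BBC Music beta via their
# shared MusicBrainz identifier. The mbid is assumed to have been
# pulled out of DBpedia earlier (the property name varies by release).

def bbc_music_links(mbid):
    base = "http://www.bbc.co.uk/music/artists/" + mbid
    return {
        "artist_page": base,
        # hypothetical: a MuddyBoots-generated news feed keyed by the same id
        "news_feed": "http://example.org/muddyboots/news/" + mbid + ".rss",
    }

# e.g. the MusicBrainz id for the band Queen
print(bbc_music_links("0383dadf-2a4e-4d10-a46a-e9e041da8eb3"))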

Page 17: BBC Muddy Boots 08

The problems

Incorrect data in DBpedia/Wikipedia

Time-sensitive data in DBpedia

"Really tricky" disambiguations

Query and response times

'The Queen' vs 'Queen' is still a problem

Page 18: BBC Muddy Boots 08

The problems

Incorrect data in DBpedia/Wikipedia

Time-sensitive data in DBpedia

"Really tricky" disambiguations

Query and response times

'The Queen' vs 'Queen' is still a problem

Page 19: BBC Muddy Boots 08

Next steps

Testing phase

Improved NE classification

Speed improvements

Add more entity types

Identify applications

Page 20: BBC Muddy Boots 08

Next steps

Testing phase

Improved NE classification

Speed improvements

Add more entity types

Identify applications

Page 21: BBC Muddy Boots 08

http://www.flickr.com/photos/25094278@N02/2368194103/sizes/l/

Let's move away from silos of data, towards a shared, linked vision of our data