Semanticnews 230913-final

Mark A Greenwood, Jonathon Hare, David R Newman, Wim Peters

SemanticMedia@TheBritishLibrary
Monday 23rd September 2013

The Project Vision
• Semantic News is a 6-month project:
  • June to November 2013
  • Two 50% FTEs (1 Southampton, 1 Sheffield)
• An interactive ‘second screen’ to provide contextual information on Question Time questions
• Use multiple data sources
• Perform named entity recognition
• Exploit Linked Open Datasets
• Towards an almost real-time system

Where is the Data? (1)
• Question Time in 2010
  • 34 episodes, 163 questions
• BBC Subtitles
  • XML encoded
  • Broadcast as the subtitles stream

Where is the Data? (2)
• BBC Programmes Data
  • XML encoded
  • Information about the programme (panellists, topics, broadcast dates, etc.)
• Tweets
  • Taken from the Twitter ‘Garden Hose’ (10% stream)

Pre-parsing Subtitles Data
• Raw XML subtitles
• Remove duplicate words
• Parse into CSV
  • time offset
  • sentence
• Break into questions
  • BBC Programmes data provides question time offsets
  • Compare with subtitles time offsets and split
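
The splitting step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual code: the function names, the sample rows and the offsets are hypothetical, offsets are assumed to be in seconds, and "duplicate words" is read here as immediately repeated words.

```python
from bisect import bisect_right


def drop_repeated_words(sentence):
    """Collapse immediately repeated words, e.g. 'to to' -> 'to'."""
    kept = []
    for word in sentence.split():
        if not kept or kept[-1].lower() != word.lower():
            kept.append(word)
    return " ".join(kept)


def split_into_questions(subtitles, question_offsets):
    """Group (offset, sentence) rows under the question that precedes them.

    subtitles: iterable of (offset_seconds, sentence) rows
    question_offsets: sorted list of question start offsets in seconds
    Rows broadcast before the first question are stored under index -1.
    """
    groups = {}
    for offset, sentence in subtitles:
        # Index of the last question whose start offset is <= this row.
        index = bisect_right(question_offsets, offset) - 1
        groups.setdefault(index, []).append(drop_repeated_words(sentence))
    return groups


# Hypothetical sample data standing in for the parsed CSV rows.
subtitles = [
    (5.0, "Welcome to to Question Time."),
    (62.0, "Our first first question tonight."),
    (70.5, "Thank you."),
    (900.0, "On to the next question."),
]
question_offsets = [60.0, 895.0]
groups = split_into_questions(subtitles, question_offsets)
```

A binary search over the sorted question offsets keeps the comparison cheap even when an episode has many subtitle rows.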

Pre-parsing Twitter Data
• Twitter ‘Garden Hose’ for 2010 Dataset
• Used Apache Hadoop and filtered on:
  • @bbcqt, @bbcquestiontime
  • #bbcqt, #bbcquestiontime, #questiontime
  • “Question Time”, “David Dimbleby”
• Collated JSON results and imported into OpenRefine
• Removed irrelevant fields
• Filtered out tweets that did not contain “bbc”
• Exported as CSV
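
The filter criteria above could look roughly like the following when applied to one JSON tweet per line. This is a plain-Python sketch rather than the Hadoop job used in the project; the function names and sample tweets are hypothetical, and it assumes each tweet object carries its body in a `text` field.

```python
import json
import re

# Filter terms taken from the slide, lower-cased for matching.
HANDLES = {"@bbcqt", "@bbcquestiontime"}
HASHTAGS = {"#bbcqt", "#bbcquestiontime", "#questiontime"}
PHRASES = ("question time", "david dimbleby")

TAG_RE = re.compile(r"[@#]\w+")


def is_relevant(text):
    """True if the tweet text mentions a tracked handle, hashtag or phrase."""
    lowered = text.lower()
    tags = set(TAG_RE.findall(lowered))
    if tags & (HANDLES | HASHTAGS):
        return True
    return any(phrase in lowered for phrase in PHRASES)


def filter_tweets(json_lines):
    """Yield parsed tweets whose text passes is_relevant."""
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: the tweet body lives in a "text" field.
        if is_relevant(tweet.get("text", "")):
            yield tweet


# Hypothetical sample input, one JSON document per line.
sample = [
    json.dumps({"id": 1, "text": "Great panel on #bbcqt tonight"}),
    json.dumps({"id": 2, "text": "David Dimbleby is on form"}),
    json.dumps({"id": 3, "text": "Watching the football"}),
]
kept = [tweet["id"] for tweet in filter_tweets(sample)]
```

Handles and hashtags are matched as whole tokens, so a tag such as `#bbcqtfan` would not count as `#bbcqt`; the phrases are matched as case-insensitive substrings.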

Information Extraction with GATE
● General Architecture for Text Engineering (GATE)
● Developed by the University of Sheffield since 2000
● Used by many researchers, scientists and organisations all over the world
● Includes various components for language processing
  ● Parsers, machine learning tools, stemmers, IR tools, IE components for various languages...
● Also provides tools for visualising and manipulating text, annotations, ontologies, parse trees, etc., and for evaluation

Linguistic pre-processing
● Techniques
  ● Tokenization
  ● Sentence Splitting
  ● Language Identification
  ● POS tagging
  ● Morphological analysis
● Adapted for use with social media like Twitter
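
To illustrate why tokenization needs adapting for Twitter, here is a minimal regex-based sketch that keeps mentions, hashtags and URLs as single tokens. It is not GATE's tokenizer; the pattern and the function name are assumptions made for this example.

```python
import re

# Alternatives are tried left to right: URLs, then @mentions and #hashtags,
# then ordinary words (with an optional apostrophe part), then any other
# single non-space symbol.
TWEET_TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split tweet text into tokens, keeping Twitter-specific items whole."""
    return TWEET_TOKEN_RE.findall(text)


tokens = tokenize("@bbcqt Ken Clarke is brilliant! #bbcqt http://example.org/x")
```

A tokenizer written for newswire would typically split `#bbcqt` into `#` and `bbcqt` and break the URL at its punctuation, which makes later entity lookup harder.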

Named Entity Recognition
● Approaches
  ● Gazetteer lookup
  ● JAPE grammars
  ● Co-reference
● Types
  ● Location: countries, regions, cities etc.
  ● Organisation: names of companies, government organisations, committees, agencies, universities, etc.
  ● Person: names of people
  ● Date: absolute dates like ‘October 2012’ or ‘2007’, as well as relative dates, such as ‘last year’
  ● Measurements: e.g. “8,596 km”, “one fifth”, percentages and probabilities
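
Gazetteer lookup, the first approach listed above, amounts to matching token sequences against lists of known names. The sketch below shows the idea with a greedy longest-match scan; the gazetteer entries and function name are hypothetical, and GATE's own gazetteer component is considerably more capable.

```python
# Hypothetical gazetteer: lower-cased token tuples mapped to entity types.
GAZETTEER = {
    ("ken", "clarke"): "Person",
    ("david", "dimbleby"): "Person",
    ("labour",): "Organisation",
    ("london",): "Location",
}
MAX_ENTRY_LENGTH = max(len(entry) for entry in GAZETTEER)


def gazetteer_lookup(tokens):
    """Return (start, end, type) spans, preferring the longest match."""
    annotations = []
    i = 0
    while i < len(tokens):
        longest = min(MAX_ENTRY_LENGTH, len(tokens) - i)
        for n in range(longest, 0, -1):
            key = tuple(token.lower() for token in tokens[i:i + n])
            if key in GAZETTEER:
                annotations.append((i, i + n, GAZETTEER[key]))
                i += n  # Skip past the matched span.
                break
        else:
            i += 1  # No entry starts at this token.
    return annotations


tokens = "Ken Clarke attacked Labour in London".split()
annotations = gazetteer_lookup(tokens)
```

Spans are token indices with an exclusive end, so `(0, 2, "Person")` covers the tokens `Ken Clarke`.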

Enrichment: LODIE
● Under constant development in various projects
● Associates the most probable LOD URI with named entities
● Disambiguation against DBpedia
● Various techniques to enhance recall
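
One simple way to picture "most probable URI" is to score each candidate by a prior and by how many of its keywords appear around the mention. The sketch below does exactly that and nothing more. It is not LODIE's algorithm; the scoring formula, the priors, the keyword sets and the candidate URIs are all illustrative assumptions and have not been checked against DBpedia.

```python
def disambiguate(context_tokens, candidates):
    """Return the URI of the best-scoring candidate for a mention.

    Each candidate is a dict with "uri", "prior" and "keywords" keys.
    Score = prior * (1 + number of keywords found in the context).
    """
    context = {token.lower() for token in context_tokens}

    def score(candidate):
        overlap = len(context & candidate["keywords"])
        return candidate["prior"] * (1 + overlap)

    return max(candidates, key=score)["uri"]


# Hypothetical candidates for the mention "Labour".
CANDIDATES = [
    {
        "uri": "http://dbpedia.org/resource/Labour_Party_(UK)",
        "prior": 0.5,
        "keywords": {"hain", "clarke", "votes", "mp"},
    },
    {
        "uri": "http://dbpedia.org/resource/Labour_economics",
        "prior": 0.5,
        "keywords": {"wages", "employment", "market"},
    },
]

tokens = "Hain just lost Labour votes".split()
best = disambiguate(tokens, CANDIDATES)
```

With equal priors, the context words `Hain` and `votes` tip the choice towards the political party.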

Enrichment: LODIE

“Ken Clarke: The Labour plotters hide behind the knife and stab with the cloak! Brilliant!!”

“Hain just lost Labour votes by supporting the £25k benefits of an extremist.”

Representing Extracted Information

Conceptualising a Question

http://www.youtube.com/watch?v=O3l9Mi-KylI

Show Me The Data!
• Use (Linked) Open Data Datasets
  • Crime Data
  • Election Data (constituencies, majorities, etc.)
  • MP voting records
  • School league tables
  • NHS performance league tables
  • Economic Figures (GDP, Inflation, Unemployment)
• Compare and contrast

Let’s have some questions from our audience.
