NoTube: Metadata Enrichment

WP4: TV Data Text Enrichment

Pavel Mihaylov (OT) and partners

Contents
• Ontotext and its role in the project
• WP4: text, audio and video
• Goals and achievements
• Demo
• Conclusions

26-27 March 2012, NoTube 3rd review

Ontotext
• Semantic technology developer, est. in 2000
  – Staff: 65 employees and multiple contractors
• Global leader in semantic technologies
  – Semantic Databases: high-performance RDF DBMS, scalable reasoning
  – Semantic Search: text mining (IE), Information Retrieval (IR)
  – Web Mining: focused crawling, screen scraping, data fusion
• Role in NoTube
  – WP4 leader
  – Semantic Enrichment
  – Experience from multiple European projects

WP4: Content Enrichment

Content
• Text: EPGs, programme descriptions
• Audio
• Video

Enrichment
• Adding metadata
• Content about content

Goal: Text enrichment

Semantic annotation component
• Recognises items of interest in text
• Assigns links to Linked Open Data
• Analyses short or free-form text segments
• Extends them with further world knowledge

Goal: Text enrichment (2)

Example programme description:
"Live at the Apollo 2/6 Not Going Out star Lee Mack presents sets from American comic Rich Hall and Scotland’s very own Danny Bhoy."
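A minimal sketch of the annotation step, assuming a plain dictionary (gazetteer) lookup with greedy longest match; the slides do not describe Lupedia's matching internals. The gazetteer, the `Annotation` type, `annotate` and the URIs are hypothetical and chosen only to fit the example description.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer: surface form -> (LOD URI, class).
# The URIs are illustrative DBpedia-style identifiers, not real Lupedia output.
GAZETTEER = {
    "Live at the Apollo": ("http://dbpedia.org/resource/Live_at_the_Apollo", "TelevisionShow"),
    "Not Going Out": ("http://dbpedia.org/resource/Not_Going_Out", "TelevisionShow"),
    "Lee Mack": ("http://dbpedia.org/resource/Lee_Mack", "Person"),
    "Rich Hall": ("http://dbpedia.org/resource/Rich_Hall", "Person"),
    "Danny Bhoy": ("http://dbpedia.org/resource/Danny_Bhoy", "Person"),
    "Scotland": ("http://dbpedia.org/resource/Scotland", "Place"),
}


@dataclass
class Annotation:
    start: int    # character offset where the mention starts
    end: int      # character offset just past the mention
    surface: str  # the matched text
    uri: str      # linked LOD resource
    cls: str      # class of the resource


def annotate(text, gazetteer):
    """Greedy longest-match lookup of (multi-word) gazetteer entries in text."""
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    max_len = max(len(key.split()) for key in gazetteer)
    annotations = []
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest token window first, then shrink it.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            surface = text[start:end]
            if surface in gazetteer:
                uri, cls = gazetteer[surface]
                annotations.append(Annotation(start, end, surface, uri, cls))
                matched = n
                break
        # Skip past the match, or advance one token if nothing matched.
        i += matched or 1
    return annotations
```

Applied to the example description, this tags Live at the Apollo, Not Going Out, Lee Mack, Rich Hall, Scotland and Danny Bhoy. Longest match lets a multi-word title win over any shorter entry it contains; a real annotator would also need disambiguation and filters against spurious matches.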

Goal: Multilingual

TV world: English, German, Italian, Dutch, Arabic, Bulgarian, French, Korean, Turkish
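One way to realise the multilingual goal is to let surface forms in every supported language resolve to the same language-independent LOD URI. The sketch below assumes labels keyed by ISO 639-1 code (matching the ar, bg, nl, en, fr, de, it, ko, tr list given for Lupedia later in these slides); `LABELS`, `build_index` and `link` are hypothetical names, and the label data is illustrative.

```python
from collections import defaultdict

# Hypothetical label data: one language-independent URI, one label per language
# (in the spirit of rdfs:label values with language tags).
LABELS = {
    "http://dbpedia.org/resource/Italy": {
        "en": "Italy", "de": "Italien", "fr": "Italie", "it": "Italia", "nl": "Italië",
    },
    "http://dbpedia.org/resource/Germany": {
        "en": "Germany", "de": "Deutschland", "fr": "Allemagne", "it": "Germania", "nl": "Duitsland",
    },
}


def build_index(labels):
    """Invert {uri: {lang: label}} into {lang: {case-folded label: uri}}."""
    index = defaultdict(dict)
    for uri, by_lang in labels.items():
        for lang, label in by_lang.items():
            index[lang][label.casefold()] = uri
    return index


def link(surface, lang, index):
    """Return the URI whose label in `lang` matches `surface`, or None."""
    return index.get(lang, {}).get(surface.casefold())
```

Because the index is split by language, "Italien" links to the Italy resource only when the input is declared German, which keeps labels that look alike across languages from colliding.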

Goal: Graph enrichment

Build upon basic enrichment
• Entities extracted from text

Exploit relations in Semantic Repository
• Follow a chain of LOD predicates

Extend the basic enrichment
• A richer set of entities
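A minimal sketch of following a chain of LOD predicates from the entities found in text, assuming a small in-memory list of triples in place of the Semantic Repository used in the project. The triples, the `(predicate, direction)` chain notation and `follow_chain` are hypothetical.

```python
from collections import defaultdict

# Toy triples (subject, predicate, object). The dbr:/dbo: prefixes imitate
# DBpedia; ex:Other_Show and ex:Other_Actor are made up for illustration.
TRIPLES = [
    ("dbr:Not_Going_Out", "dbo:starring", "dbr:Lee_Mack"),
    ("ex:Other_Show", "dbo:starring", "dbr:Lee_Mack"),
    ("ex:Other_Show", "dbo:starring", "ex:Other_Actor"),
]


def build_indexes(triples):
    """Index triples by (subject, predicate) and by (object, predicate)."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for s, p, o in triples:
        out_edges[(s, p)].add(o)
        in_edges[(o, p)].add(s)
    return out_edges, in_edges


def follow_chain(seeds, chain, triples):
    """Follow (predicate, direction) steps from the seed entities.

    "out" moves from subject to object, "in" from object to subject.
    Returns the entities reached after the last step, excluding the seeds.
    """
    out_edges, in_edges = build_indexes(triples)
    frontier = set(seeds)
    for predicate, direction in chain:
        index = out_edges if direction == "out" else in_edges
        frontier = set().union(*(index.get((node, predicate), set()) for node in frontier))
    return frontier - set(seeds)
```

Following `dbo:starring` forward and then backward from `dbr:Not_Going_Out` returns `ex:Other_Show`, a programme sharing a cast member with the extracted one.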

Goal: Graph enrichment (2)

Goal: Graph enrichment (3)

Classes to enrich
• Film
• TelevisionShow
• Work
• Band/MusicalArtist
• Actor
• Place

Film enrichment
• Film class
• At least one common indirect relation

TelevisionShow enrichment
• TelevisionShow class
• At least two common indirect relations

Work enrichment
• Work, except Film and TelevisionShow
• At least one common indirect relation

Band/MusicalArtist enrichment
• Band and MusicalArtist
• At least one direct relation

Actor enrichment
• Actor class
• Starring relation from at least two common Works

Place enrichment
• Place class
• At least one direct relation
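The per-class rules can be encoded as a lookup table plus three counting functions. This is a sketch under one reading of the slides, not Ontotext's implementation: a "direct relation" is taken to be a triple linking the candidate to an entity extracted from text, a "common indirect relation" a node that both are linked to, and "starring relation from at least two common Works" at least two Works starring both the candidate and an extracted entity. `RULES`, the function names and the `dbo:starring` predicate are assumptions.

```python
# Hypothetical encoding of the per-class rules: class -> (evidence kind, minimum).
RULES = {
    "Film": ("indirect", 1),
    "TelevisionShow": ("indirect", 2),
    "Work": ("indirect", 1),  # any Work other than Film and TelevisionShow
    "Band": ("direct", 1),
    "MusicalArtist": ("direct", 1),
    "Actor": ("co_starring_works", 2),
    "Place": ("direct", 1),
}

STARRING = "dbo:starring"  # assumed predicate name


def neighbours(node, triples):
    """Nodes linked to `node` by a single triple, in either direction."""
    return {o for s, _, o in triples if s == node} | {s for s, _, o in triples if o == node}


def count_direct(candidate, seeds, triples):
    """Triples linking the candidate directly to an extracted entity."""
    return sum(1 for s, _, o in triples
               if (s == candidate and o in seeds) or (o == candidate and s in seeds))


def count_indirect(candidate, seeds, triples):
    """Distinct intermediate nodes shared by the candidate and an extracted entity."""
    seed_neighbours = set().union(*(neighbours(seed, triples) for seed in seeds))
    shared = neighbours(candidate, triples) & seed_neighbours
    return len(shared - set(seeds) - {candidate})


def count_co_starring_works(candidate, seeds, triples):
    """Works that star both the candidate actor and an extracted entity."""
    facts = set(triples)
    works = {s for s, p, o in triples if p == STARRING and o == candidate}
    return sum(1 for w in works if any((w, STARRING, seed) in facts for seed in seeds))


COUNTERS = {
    "direct": count_direct,
    "indirect": count_indirect,
    "co_starring_works": count_co_starring_works,
}


def accept(candidate, cls, seeds, triples):
    """True if a candidate of class `cls` satisfies that class's rule."""
    rule = RULES.get(cls)
    if rule is None:
        return False  # classes outside the list are not enriched
    kind, minimum = rule
    return COUNTERS[kind](candidate, seeds, triples) >= minimum
```

Keeping the thresholds in one table means tightening TelevisionShow (two relations) relative to Film (one) is a data change rather than a code change.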

Lupedia

Text enrichment service
• Input: plain text, e.g. programme descriptions
• Output: Linked Open Data enrichment
  – XML, JSON, RDFa
• Features:
  – Multilingual
  – Graph enrichment
  – Multiple vocabularies
  – Configurable
  – Fast
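A sketch of serialising one annotation list to two of the listed output formats, JSON and XML, using only the Python standard library. The record fields and element names are assumptions, not Lupedia's actual response schema; RDFa output is not shown.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical annotation records; field names are illustrative only.
annotations = [
    {"start": 42, "end": 50, "surface": "Lee Mack",
     "uri": "http://dbpedia.org/resource/Lee_Mack", "class": "Person"},
]


def to_json(annotations):
    """Wrap the records in a top-level object and serialise them as JSON."""
    return json.dumps({"annotations": annotations}, ensure_ascii=False)


def to_xml(annotations):
    """One <annotation> element per record; the surface form is the element text."""
    root = ET.Element("annotations")
    for ann in annotations:
        attrs = {key: str(value) for key, value in ann.items() if key != "surface"}
        element = ET.SubElement(root, "annotation", attrs)
        element.text = ann["surface"]
    return ET.tostring(root, encoding="unicode")
```

Producing every format from the same in-memory records keeps the outputs consistent with each other.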

Lupedia over time
• Better service
• Multilingualism
• New matching options and filters
• Heuristics: predicate, heuristics and class weights
• Disambiguation: most specific class in output
• Multiple vocabularies: selectable vocabulary
• Graph enrichment
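The "predicate, heuristics and class weights" and "most specific class in output" milestones suggest a weighted ranking of candidate URIs. The sketch assumes a simple additive score; the class hierarchy, every weight, the heuristic names and the candidate records are invented for illustration and are not Lupedia's configuration.

```python
# Hypothetical class hierarchy (child -> parent), loosely DBpedia-ontology style.
PARENT = {
    "Actor": "Artist", "Artist": "Person", "Person": "Agent", "Agent": "Thing",
    "TelevisionShow": "Work", "Film": "Work", "Song": "Work", "Work": "Thing",
}
# Hypothetical weights: which label predicate matched, class priors, heuristics.
PREDICATE_WEIGHT = {"rdfs:label": 1.0, "dbo:wikiPageRedirects": 0.5}
CLASS_WEIGHT = {"TelevisionShow": 1.0, "Film": 0.9, "Person": 0.7, "Work": 0.3}
HEURISTIC_WEIGHT = {"capitalised": 0.2, "exact_case": 0.3}


def ancestors(cls):
    """The class followed by its superclasses up to the root."""
    chain = [cls]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def most_specific(classes):
    """The class with the longest path to the root."""
    return max(classes, key=lambda c: len(ancestors(c)))


def class_weight(cls):
    """Weight of the nearest class (itself or an ancestor) that has one."""
    for c in ancestors(cls):
        if c in CLASS_WEIGHT:
            return CLASS_WEIGHT[c]
    return 0.0


def score(candidate):
    """Additive score: predicate weight + class weight + weighted heuristics."""
    s = PREDICATE_WEIGHT.get(candidate["matched_by"], 0.0)
    s += class_weight(most_specific(candidate["classes"]))
    s += sum(HEURISTIC_WEIGHT.get(h, 0.0) * v for h, v in candidate["heuristics"].items())
    return s


def disambiguate(candidates):
    """Pick the best-scoring candidate and report its most specific class."""
    best = max(candidates, key=score)
    return best["uri"], most_specific(best["classes"])


# Two hypothetical readings of the surface form "Not Going Out".
CANDIDATES = [
    {"uri": "ex:Not_Going_Out_TV", "matched_by": "rdfs:label",
     "classes": ["Work", "TelevisionShow"], "heuristics": {"exact_case": 1.0}},
    {"uri": "ex:Not_Going_Out_Song", "matched_by": "dbo:wikiPageRedirects",
     "classes": ["Work", "Song"], "heuristics": {"exact_case": 1.0}},
]
```

With these made-up weights the TV reading scores 2.3 against 1.1 for the song, so `disambiguate(CANDIDATES)` reports the TV resource with class TelevisionShow rather than the generic Work.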

Evaluation summary

Lupedia compared to OpenCalais and AlchemyAPI
• The only two other similar services
• Much better coverage than either of them
• Comparable precision
• Custom vocabularies & filters
• Tuned to TV domain

Lupedia is a unique service
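The slides do not say how coverage and precision were measured. A common setup, assumed here, counts an annotation as correct when its span and URI exactly match a gold-standard annotation, and reads "coverage" as recall. `evaluate` and the tuple layout are hypothetical.

```python
def evaluate(predicted, gold):
    """Precision and coverage (recall) against a gold standard.

    Annotations are compared as exact (start, end, uri) tuples.
    """
    predicted, gold = set(predicted), set(gold)
    correct = predicted & gold
    precision = len(correct) / len(predicted) if predicted else 0.0
    coverage = len(correct) / len(gold) if gold else 0.0
    return precision, coverage
```

Running `evaluate` for each service against the same gold set gives directly comparable (precision, coverage) pairs; exact matching is strict, and partial-span credit is a common relaxation.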

Links to other WPs
• WP1: Entity URIs point to WP1 models
• WP3: Lupedia in NLP-based profiling and enrichment
• WP5: Lupedia in SmartLink and Watch’n’Buy
• WP6: Integration, enrichment in demo apps
• WP7: 7a news enrichment; 7c programme description enrichment

Lupedia demo

http://lupedia.ontotext.com/

Emerging competition

                 | Lupedia                                 | Yahoo                                                  | WikiMachine                        | EntityPedia
LOD output       | DBpedia & LinkedMDB                     | -                                                      | DBpedia                            | ?
Multilingual     | ar, bg, nl, en, fr, de, it, ko, tr      | en, zh                                                 | en, pt, it                         | ?
Confidence       | yes                                     | yes                                                    | yes                                | ?
Graph enrichment | yes                                     | yes*                                                   | no                                 | ?
Remark           | Tuned to TV domain, one of the pioneers | No direct access to LOD, graph enrichment too abstract | Too generic, precision seems lower | Not yet released

Lessons & Impact

Lessons learnt:
• The emergence of similar services clearly shows the need for such services
• Coverage and language support are important

Lupedia is recognised as one of the major players and is included in NERD:
• NERD aggregates named entity services and compares their performance
• http://nerd.eurecom.fr

Various partners are willing to use Lupedia in other projects

Life after NoTube
• Will be kept alive as a demo service
• Closed source
• Possibly an OpenCalais-like service in the future

Questions?
