
NoTube: Metadata Enrichment


WP4: TV Data Text Enrichment

Pavel Mihaylov (OT) and partners


Contents

Ontotext and its role in the project

WP4: text, audio and video

Goals and achievements

Demo

Conclusions

26-27 March 2012, NoTube 3rd review


Ontotext

• Semantic technology developer, est. in 2000
  – Staff: 65 employees and multiple contractors

• Global leader in semantic technologies
  – Semantic Databases: high-performance RDF DBMS, scalable reasoning
  – Semantic Search: text mining (IE), Information Retrieval (IR)
  – Web Mining: focused crawling, screen scraping, data fusion

• Role in NoTube
  – WP4 leader
  – Semantic Enrichment
  – Experience from multiple European projects


WP4: Content Enrichment

Content
• Text: EPGs, programme descriptions
• Audio
• Video

Enrichment
• Adding metadata
• Content about content


Goal: Text enrichment

Semantic annotation component

Recognising items of interest in text

Assigning links to Linked Open Data

• Analyses short or free-text segments
• Extends them with further world knowledge
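The slides do not say how Lupedia matches text internally. As a minimal sketch, assuming a simple dictionary (gazetteer) lookup, recognising items of interest and assigning Linked Open Data links could look like this. The gazetteer, the `Annotation` class and the `annotate` function are hypothetical; the URIs are illustrative and may not match live DBpedia identifiers; and the longest-match rule for overlaps is one common choice, not necessarily Lupedia's.

```python
import re
from dataclasses import dataclass

# Hypothetical gazetteer mapping surface forms to LOD URIs (illustrative only);
# a real annotator would load labels from DBpedia, LinkedMDB, etc.
GAZETTEER = {
    "Live at the Apollo": "http://dbpedia.org/resource/Live_at_the_Apollo",
    "Lee Mack": "http://dbpedia.org/resource/Lee_Mack",
    "Rich Hall": "http://dbpedia.org/resource/Rich_Hall",
    "Danny Bhoy": "http://dbpedia.org/resource/Danny_Bhoy",
}


@dataclass
class Annotation:
    start: int  # offset of the first matched character
    end: int    # offset just past the last matched character
    text: str   # matched surface form
    uri: str    # Linked Open Data resource assigned to it


def annotate(text: str, gazetteer: dict[str, str]) -> list[Annotation]:
    """Recognise gazetteer labels in text and link each one to its URI.

    Overlapping matches are resolved by keeping the longest label.
    """
    candidates = []
    for label, uri in gazetteer.items():
        # \b anchors stop a label from matching inside a longer word
        pattern = r"\b" + re.escape(label) + r"\b"
        for m in re.finditer(pattern, text):
            candidates.append(Annotation(m.start(), m.end(), label, uri))

    # Longest first (ties: leftmost first), then greedily drop overlaps
    candidates.sort(key=lambda a: (a.start - a.end, a.start))
    chosen: list[Annotation] = []
    for cand in candidates:
        if all(cand.end <= c.start or cand.start >= c.end for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda a: a.start)


if __name__ == "__main__":
    epg = ("Live at the Apollo 2/6 Not Going Out star Lee Mack presents sets "
           "from American comic Rich Hall and Scotland's very own Danny Bhoy.")
    for a in annotate(epg, GAZETTEER):
        print(a.text, "->", a.uri)
```

Run on the sample programme description, this links the four gazetteer entries it contains; anything not in the gazetteer (e.g. "Not Going Out") is simply not recognised.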


Goal: Text enrichment (2)


Live at the Apollo 2/6 Not Going Out star Lee Mack presents sets from American comic Rich Hall and Scotland’s very own Danny Bhoy.


Goal: Multilingual

TV world

English, German, Italian, Dutch, Arabic, Bulgarian, French, Korean, Turkish


Goal: Graph enrichment

Build upon basic enrichment
• Entities extracted from text

Exploit relations in Semantic Repository
• Follow a chain of LOD predicates

Enrich the basic enrichment
• A richer set of entities
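Following a chain of LOD predicates from the extracted entities can be sketched as a breadth-first walk over RDF triples. The function `enrich_graph`, its `max_hops` and `predicates` parameters, and the in-memory triple list are assumptions for illustration; a real system would query the semantic repository instead. This sketch only follows edges from subject to object.

```python
from collections import deque
from typing import Optional

# (subject, predicate, object), as it might be loaded from a semantic repository
Triple = tuple[str, str, str]


def enrich_graph(
    seeds: list[str],
    triples: list[Triple],
    max_hops: int = 2,
    predicates: Optional[set[str]] = None,
) -> dict[str, int]:
    """Follow chains of LOD predicates outward from the extracted entities.

    Returns {entity: hops from the nearest seed}; seeds map to 0.
    If `predicates` is given, only triples with those predicates are followed.
    """
    # Index outgoing edges once so each expansion step is a dict lookup
    out_edges: dict[str, list[str]] = {}
    for s, p, o in triples:
        if predicates is None or p in predicates:
            out_edges.setdefault(s, []).append(o)

    distance = {entity: 0 for entity in seeds}
    queue = deque(distance)  # breadth-first: nearer entities are expanded first
    while queue:
        node = queue.popleft()
        if distance[node] >= max_hops:
            continue  # the chain is already as long as allowed
        for neighbour in out_edges.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance
```

Limiting `max_hops` and the allowed predicates is what keeps the "richer set of entities" from growing into the whole LOD cloud.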


Goal: Graph enrichment (2)


Goal: Graph enrichment (3)

Classes to enrich
• Film
• TelevisionShow
• Work
• Band/MusicalArtist
• Actor
• Place


Film enrichment


• Film class
• At least one common indirect relation


TelevisionShow enrichment

• TelevisionShow class
• At least two common indirect relations


Work enrichment

• Work, except Film and TelevisionShow
• At least one common indirect relation


Band/MusicalArtist enrichment


• Band and MusicalArtist
• At least one direct relation


Actor enrichment


• Actor class
• Starring relation from at least two common Works


Place enrichment


• Place class
• At least one direct relation
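The per-class rules on the Film, TelevisionShow, Work, Band/MusicalArtist, Actor and Place slides can be encoded as a table of (evidence kind, minimum count) pairs. This sketch assumes one reading of the slides: a direct relation is a single triple linking a candidate to an entity found in the text; a common indirect relation is an intermediate node linked to both; and for actors, the common Works are works found in the text whose (hypothetical) `starring` predicate points at the candidate. `RULES`, `evidence` and `keep_candidate` are illustrative names, not Lupedia's code.

```python
# Hypothetical encoding of the per-class filters:
# class -> (kind of evidence, minimum count). Unlisted classes are not enriched.
RULES = {
    "Film": ("indirect", 1),
    "TelevisionShow": ("indirect", 2),
    "Work": ("indirect", 1),  # any Work that is not a Film or TelevisionShow
    "Band": ("direct", 1),
    "MusicalArtist": ("direct", 1),
    "Actor": ("starring", 2),
    "Place": ("direct", 1),
}

Triple = tuple[str, str, str]


def neighbours(node: str, triples: list[Triple]) -> set[str]:
    """Nodes linked to `node` by a single triple, in either direction."""
    result = set()
    for s, _, o in triples:
        if s == node:
            result.add(o)
        elif o == node:
            result.add(s)
    return result


def evidence(candidate: str, kind: str, seeds: set[str], triples: list[Triple]) -> int:
    """Count how strongly `candidate` is tied to the entities found in the text."""
    cand_nb = neighbours(candidate, triples)
    if kind == "direct":
        # seeds linked to the candidate by one triple
        return len(cand_nb & seeds)
    if kind == "indirect":
        # intermediate (non-seed) nodes shared by the candidate and some seed
        seed_nb = set().union(*(neighbours(s, triples) for s in seeds))
        return len((cand_nb & seed_nb) - seeds)
    if kind == "starring":
        # seed works whose "starring" triple points at the candidate
        works = {s for s, p, o in triples if p == "starring" and o == candidate}
        return len(works & seeds)
    raise ValueError(f"unknown evidence kind: {kind}")


def keep_candidate(candidate: str, cls: str, seeds: set[str], triples: list[Triple]) -> bool:
    """Apply the class threshold; `cls` is the candidate's most specific class."""
    rule = RULES.get(cls)
    if rule is None:
        return False
    kind, minimum = rule
    return evidence(candidate, kind, seeds, triples) >= minimum
```

Passing the most specific class is what makes "Work except Film and TelevisionShow" fall out naturally: a film is checked against the Film rule, never the generic Work rule.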


Lupedia  

Text enrichment service

• Input: plain text, e.g. programme descriptions
• Output: Linked Open Data enrichment
  – XML, JSON, RDFa
• Features:
  – Multilingual
  – Graph enrichment
  – Multiple vocabularies
  – Configurable
  – Fast
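Lupedia's actual response schema is not shown in the slides. Assuming each annotation is a dict with hypothetical `start`, `end`, `text` and `uri` fields, the three output formats could be produced with the Python standard library roughly as follows. The RDFa variant marks each span with `about` and `property="rdfs:label"`, which is one plausible encoding among several.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

# Each annotation is a dict with hypothetical fields:
# {"start": int, "end": int, "text": str, "uri": str}


def to_json(annotations: list[dict]) -> str:
    return json.dumps(annotations, ensure_ascii=False)


def to_xml(annotations: list[dict]) -> str:
    root = ET.Element("annotations")
    for a in annotations:
        el = ET.SubElement(root, "annotation",
                           start=str(a["start"]), end=str(a["end"]), uri=a["uri"])
        el.text = a["text"]
    return ET.tostring(root, encoding="unicode")


def to_rdfa(text: str, annotations: list[dict]) -> str:
    """Wrap each annotated span of the original text in an RDFa <span>."""
    parts, pos = [], 0
    for a in sorted(annotations, key=lambda a: a["start"]):
        parts.append(escape(text[pos:a["start"]]))  # unannotated text before the span
        parts.append(f'<span about="{escape(a["uri"])}" property="rdfs:label">'
                     f'{escape(text[a["start"]:a["end"]])}</span>')
        pos = a["end"]
    parts.append(escape(text[pos:]))  # trailing unannotated text
    return "".join(parts)
```

Because the RDFa form keeps the original text around the markup, it can be dropped straight into an EPG web page, while JSON and XML suit programmatic clients.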


Lupedia over time

Better service
• Multilingualism
• New matching options and filters
• Heuristics: predicate, heuristics and class weights
• Disambiguation: most specific class in output
• Multiple vocabularies: selectable vocabulary
• Graph enrichment
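The slides name predicate, heuristic and class weights and say that the most specific class is reported. A minimal sketch of how these could combine, assuming a linear scoring function and a simplified, DBpedia-like class hierarchy: the weights, the heuristic names (`context_overlap`, `popularity`) and the candidate fields are hypothetical, and Lupedia may combine its weights differently.

```python
# Hypothetical, simplified class hierarchy (child -> parent),
# loosely modelled on the DBpedia ontology.
PARENT = {
    "Film": "Work",
    "TelevisionShow": "Work",
    "Work": "Thing",
    "Actor": "Artist",
    "MusicalArtist": "Artist",
    "Artist": "Person",
    "Person": "Thing",
    "Place": "Thing",
}

# Hypothetical weights; a real system would tune these for the TV domain.
PREDICATE_WEIGHT = {"rdfs:label": 1.0, "dbo:alias": 0.5}  # which label predicate matched
CLASS_WEIGHT = {"TelevisionShow": 0.5, "Film": 0.4, "Actor": 0.3}
HEURISTIC_WEIGHT = {"context_overlap": 2.0, "popularity": 0.5}


def depth(cls: str) -> int:
    """Number of steps from `cls` up to the root of the hierarchy."""
    d = 0
    while cls in PARENT:
        cls = PARENT[cls]
        d += 1
    return d


def most_specific(types: list[str]) -> str:
    """Pick the deepest class, e.g. TelevisionShow over Work over Thing."""
    return max(types, key=depth)


def score(candidate: dict) -> float:
    """Linear combination of predicate, class and heuristic weights."""
    s = PREDICATE_WEIGHT.get(candidate["matched_by"], 0.0)
    s += CLASS_WEIGHT.get(most_specific(candidate["types"]), 0.0)
    s += sum(HEURISTIC_WEIGHT.get(name, 0.0) * value
             for name, value in candidate["heuristics"].items())
    return s


def disambiguate(candidates: list[dict]) -> tuple[str, str]:
    """Return (URI, most specific class) of the best-scoring candidate."""
    best = max(candidates, key=score)
    return best["uri"], most_specific(best["types"])
```

Reporting only the deepest class keeps the output compact: a TV show is labelled TelevisionShow rather than also Work and Thing.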


Evaluation summary

Lupedia compared to OpenCalais and AlchemyAPI
• Only two other similar services
• Much better coverage than either of them
• Comparable precision

Lupedia is a unique service
• Custom vocabularies & filters
• Tuned to TV domain


Links to other WPs

WP1
• Entity URIs point to WP1 models

WP3
• Lupedia in NLP-based profiling and enrichment

WP5
• Lupedia in SmartLink and Watch’n’Buy

WP6
• Integration, enrichment in demo apps

WP7
• 7a: news enrichment
• 7c: programme description enrichment


Lupedia demo


http://lupedia.ontotext.com/


Emerging competition

Feature          | Lupedia                                 | Yahoo                                                  | WikiMachine                         | EntityPedia
LOD output       | DBpedia & LinkedMDB                     | none                                                   | DBpedia                             | ?
Multilingual     | ar, bg, nl, en, fr, de, it, ko, tr      | en, zh                                                 | en, pt, it                          | ?
Confidence       | yes                                     | yes                                                    | yes                                 | ?
Graph enrichment | yes                                     | yes*                                                   | no                                  | ?
Remark           | Tuned to TV domain, one of the pioneers | No direct access to LOD, graph enrichment too abstract | Too generic, precision seems lower  | Not yet released


Lessons & Impact

Lessons learnt:
• Emerging similar services clearly show the need for such services
• Coverage and language support are important

Lupedia recognised as one of the major players and included in NERD:
• Aggregating named entity services and comparing their performance
• http://nerd.eurecom.fr

Various partners willing to use Lupedia in other projects


Life after NoTube

Will be kept alive as a demo service

Closed source

Possibly an OpenCalais-like service in the future


Questions?
