
Setting Up a Living Lab for Information Access Research

Frank Hopfgartner, DAI-Labor, Technische Universität Berlin

In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over the period of a few months in a living lab.

Why am I here?

Because I co-organise CLEF NEWSREEL

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

So what are living labs?

They rely on feedback from real users to develop convincing demonstrators that showcase the potential of an idea or a product.

A real-life test and experimentation environment to fill the pre-commercial gap between fundamental research and innovation.

Example: Efficiency House Plus [BMVBS, 2011]

§ National research initiative on energy efficiency in the housing and traffic domains
§ The Efficiency House Plus is a small power plant that can export energy surpluses into the local power grid
§ Equipped with 1,000 data sources such as movement sensors, weather data, etc.

Source: Werner Sobek

What can be studied?

Efficiency House Plus with electromobility, Berlin: a research initiative of the BMVBS, with 1,000 data points:
§ 205 smart meters
§ 39 heat pumps
§ 74 illumination sensors
§ 38 photovoltaic sensors
§ …

§ Detection of resident presence in the home environment: energy consumption is an indicator for presence, but some devices continually consume energy (see the sketch after this list)
§ Recognition of resident activities: draw conclusions about user activity based on the usage of home appliances
§ Recommendation of optimised heating schedules: gradually learn characteristic behaviour to create personalised schedules for heating control
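As a toy illustration of the first point, here is a minimal Python sketch (all names, values, and thresholds are hypothetical, not from the Efficiency House Plus) that treats consumption noticeably above a standby baseline as a presence signal:

```python
# Minimal sketch of presence detection from smart-meter readings.
# Assumption: devices that "continually consume energy" form a roughly
# constant standby load, so presence shows up as consumption above it.

def estimate_standby_load(readings_watts):
    """Use the lowest observed consumption as the standby baseline."""
    return min(readings_watts)

def is_resident_present(current_watts, standby_watts, margin=1.25):
    """Flag presence when consumption exceeds the baseline by a margin."""
    return current_watts > standby_watts * margin

readings = [110, 112, 109, 480, 530, 115]  # hypothetical watt readings
standby = estimate_standby_load(readings)
print([is_resident_present(w, standby) for w in readings])
# -> [False, False, False, True, True, False]
```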

Questions to be addressed (innovation and data analysis)

§ How will people really use the technology?
§ Who is interested in my product?
§ What is the willingness to pay?
§ Is there a need for my product?
§ What parameters do I need?

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

Do we need Living Labs for IR Research? Why?

Let's have a look at the history of IR evaluation:

§ Cranfield (1962-1966)
§ MEDLARS (1966-1967)
§ SMART (1961-1995)
§ TREC (1992-today)
§ NTCIR / CLEF (1999/2000-today)

Cranfield Evaluation Paradigm

Develop system/algorithm -> Prepare appropriate dataset -> Perform user study -> Measure performance

Laboratory Setting

§ Dataset: use a standard test collection (e.g., from TREC) with documents, relevance assessments and search tasks, or create your own domain-specific test collection
§ User study: ask users to perform search tasks in a controlled environment (simulated work task situation)
§ Measurement: standard IR evaluation metrics and qualitative methods (a metric sketch follows below)
§ Systems: a baseline, plus the fancy improvement that will change the world
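To make the "standard IR evaluation metrics" bullet concrete, here is a minimal sketch of Precision@k computed from TREC-style relevance judgments. It is illustrative only; real campaigns typically rely on tools such as trec_eval, and all identifiers below are hypothetical:

```python
# Minimal sketch of a standard IR evaluation metric: Precision@k.

def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Fraction of the top-k retrieved documents judged relevant."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    return hits / k

ranking = ["d3", "d7", "d1", "d9", "d2"]   # hypothetical system output
qrels = {"d1", "d3", "d5"}                  # hypothetical judgments
print(precision_at_k(ranking, qrels, k=5))  # -> 0.4
```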

"Find as many documents as possible for a given search task."
"Act naturally while I watch everything you are doing."
"I tell you what is relevant!"

NOT SUITABLE FOR RESEARCH ON USER-CENTRED IR

Evaluation of User-Centred IR (Personalised Search)

Context:
§ Country
§ Social connection
§ Locality
§ Personal history
§ Mobile search

Evaluation issues:
§ Observer-expectancy effect
§ Atypical search tasks
§ Missing context/background
§ Missing incentive to satisfy one's own information need

An alternative setting

"Use our system to find the information you are looking for."
"Use the system whenever you want, for whatever reason."
"You decide what you consider to be relevant."

How to evaluate?

User simulation [ECIR'08; ACM TOIS, 2011]: allows fine-tuning (White et al., 2005), but does not replace a user study.
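A minimal sketch of what such a user simulation might look like, assuming a simple top-down scanning model with a fixed click probability; the model and all names are assumptions for illustration, not the approach from the cited papers:

```python
# Minimal sketch of user simulation for tuning a ranker offline.
# Assumption: a simulated user scans the ranking top-down and clicks
# each relevant document with a fixed probability.

import random

def simulate_session(ranking, relevant, click_prob=0.7, patience=10):
    """Return the ranks (1-based) the simulated user clicked."""
    clicks = []
    for rank, doc_id in enumerate(ranking[:patience], start=1):
        if doc_id in relevant and random.random() < click_prob:
            clicks.append(rank)
    return clicks

random.seed(42)
print(simulate_session(["d3", "d7", "d1", "d9"], {"d1", "d3"}))
```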

A/B testing

Evaluate, then submit to SIGIR.
"OK, cool. Go for it!"
"Sure… But who has the users?"
"These guys have…"
"OK, then let's pay for the users…"
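For illustration, a minimal sketch of how traffic could be split deterministically between two system variants in an A/B test; the hashing scheme is an assumption, not the approach of any system named here:

```python
# Minimal sketch of A/B testing: deterministically assign users
# to one of two rankers so each user always sees the same variant.

import hashlib

def assign_variant(user_id: str) -> str:
    """Hash the user id to a stable variant label."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

for uid in ["user-1", "user-2", "user-3"]:
    print(uid, "->", assign_variant(uid))
```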

Evaluation campaigns: crowdsourcing works

Micro-tasks:
§ Data annotation
§ Document annotation
§ Document categorisation
§ Iterative system evaluation

Examples:
§ ImageCLEF (Nowak and Rüger, 2010)
§ INEX (Kazai et al., 2011)
§ TREC Blog (McCreadie et al., 2011)
§ MediaEval (Loni et al., 2013)

Activating the crowd

§ Users may have an interest in annotating items that they know well
§ Users may be attracted by incentives to annotate items

But…

§ Personalised search needs users who follow their own information needs.
§ Users need to be driven by their own intrinsic motivation.

EXTRINSIC motivation comes from the outside. INTRINSIC motivation exists within the individual.

Therefore…

"A living laboratory on the Web that brings researchers and searchers together is needed to facilitate ISSS (Information-Seeking Support System) evaluation." (Kelly et al., 2009)

Living Labs for IR evaluation

Examples: local domain search, product search, NEWSREEL.

§ Real users interacting with a system, following their own information needs
§ Realistic setting where users are not restricted by closed laboratory conditions
§ Ideally: many users, so that A/B testing can be performed

Source (guinea pig): http://living-labs.net/wp-content/uploads/2014/05/livinglab.logo_.textunder.square200.png

Challenges

Privacy and security:
§ Hosting data on a secure server
§ Gaining subjects' trust
§ Coping with the need for privacy
§ Alternatives when individuals will not share their data

Legal and ethical issues:
§ User consent
§ Ethics approval
§ Trust between parties
§ Copyright issues
§ Commercial sensitivity of data

Practical challenges:
§ Forming living labs with partners within the research community
§ Obtaining commercial partners
§ Defining tasks and scenarios for evaluation purposes

Technical challenges:
§ Designing and implementing a living labs architecture
§ Cost of implementation
§ Maintenance and adoption
§ Managing the living labs infrastructure

Source: http://living-labs.net/ll14/call-for-papers/

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over the period of a few months in a living lab.

Again… What are recommender systems?

Recommender systems help users to find items that they were not searching for.

Items?

Example: News Articles

§ First living lab for the evaluation of news recommendation algorithms in real time
§ Organised as the plista Contest, as a challenge at ACM RecSys'13, and as a campaign-style evaluation lab at CLEF'14

Source (image): T. Brodt of plista.com

Organisation (CLEF NEWSREEL)

plista: leading provider of a recommendation and advertisement network in Central Europe. Thousands of content providers rely on plista to generate recommendations for their customers (i.e., web users).

DAI-Labor: application-oriented research on smart information systems.

Steering Committee of experts from the fields of IR and RecSys.

Central Innovation Programme SME.

CLEF NEWSREEL Tasks (started in November 2013)

§ Task 1, Offline Evaluation: given a dataset, predict the news articles a user will click on.
§ Task 2, Online Evaluation: recommend articles in real time over several months.

@clefnewsreel | http://www.clef-newsreel.org/

Task 1: Offline Evaluation (predict interactions based on an OFFLINE dataset)

Dataset:
§ Traffic and content updates of 9 German-language news content provider websites
§ Traffic: reading articles, clicking on recommendations
§ Updates: adding and updating news articles
§ Recorded in June 2013
§ 65 GB, 84 million records [Kille et al., 2013]

Evaluation:
§ The dataset is split into different time segments
§ Participants have to predict the interactions within these segments
§ Quality is measured by the ratio of successful predictions to the total number of predictions (a sketch follows below)
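A minimal sketch of this quality measure, with hypothetical data structures; the actual NEWSREEL evaluation operates on time segments of the interaction log, not on the toy dictionaries used here:

```python
# Minimal sketch: ratio of successful predictions to total predictions.

def prediction_accuracy(predictions, observed_interactions):
    """predictions: user_id -> predicted article_id.
    observed_interactions: user_id -> article_id actually clicked."""
    if not predictions:
        return 0.0
    successes = sum(
        1 for user, article in predictions.items()
        if observed_interactions.get(user) == article
    )
    return successes / len(predictions)

preds = {"u1": "a5", "u2": "a9", "u3": "a2"}   # hypothetical predictions
truth = {"u1": "a5", "u2": "a1", "u3": "a2"}   # hypothetical ground truth
print(prediction_accuracy(preds, truth))        # -> 0.666...
```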

Task 2: Online Evaluation (recommend news articles in REAL TIME in a living lab)

§ Provide recommendations for visitors of the news portals of plista's customers
§ Ten portals (local news, sports, business, technology)
§ Communication via the Open Recommender Platform (ORP)
§ Provide recommendations within 100 ms (a VM is provided if necessary); see the sketch after this list
§ Three pre-defined evaluation periods: 5-23 February 2014, 1-14 April 2014, 5-19 May 2014
§ Evaluation criteria: number of clicks, number of requests, click-through rate
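To give a feel for the online task, here is a minimal sketch of a recommendation service built around a recency baseline. The route, payload fields, and event types are assumptions for illustration only; the actual ORP message format is defined by plista:

```python
# Minimal sketch of a recommender behind an HTTP endpoint. A recency
# baseline answers from memory, which helps stay under a 100 ms budget.

from http.server import BaseHTTPRequestHandler, HTTPServer
import collections, json

RECENT = collections.deque(maxlen=100)  # most recently published articles

class Recommender(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if body.get("type") == "item_update":   # hypothetical event type
            RECENT.appendleft(body["item_id"])  # remember new/updated article
            payload = {}
        else:                                   # recommendation request
            payload = {"recs": list(RECENT)[:body.get("limit", 6)]}
        out = json.dumps(payload).encode()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    HTTPServer(("", 8080), Recommender).serve_forever()
```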

Living Lab Scenario

[Diagram: millions of visitors browse the sites of Publisher A … Publisher n; the plista ORP (Open Recommender Platform) relays recommendation requests from the publishers to the participating teams, Researcher 1 … Researcher n, and returns their recommendations. More about it later.]

Evaluation criteria: number of clicks, number of requests, click-through rate.
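The click-through rate is presumably the standard ratio of clicks to recommendation requests:

```latex
\mathrm{CTR} = \frac{\#\,\mathrm{clicks}}{\#\,\mathrm{requests}}
```

For example, a team that answers 10,000 requests and receives 150 clicks achieves a CTR of 1.5% (the numbers are hypothetical).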


Privacy and security
(Challenges: hosting data on a secure server; gaining subjects' trust; coping with the need for privacy; alternatives when individuals will not share their data.)

How NEWSREEL addresses them:
§ No search queries are provided.
§ The data stream is pseudonymised, i.e., users cannot be identified based on their IPs or search queries (an illustrative sketch follows below).
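For illustration only (this is not necessarily plista's method), one common pseudonymisation approach replaces raw IP addresses with salted one-way hashes, so streams can still be correlated per user without exposing identities:

```python
# Illustrative sketch of pseudonymisation via a keyed one-way hash.

import hashlib, hmac

SECRET_SALT = b"rotate-me-regularly"  # hypothetical secret, kept private

def pseudonymise(ip_address: str) -> str:
    """Deterministic pseudonym: same IP -> same token, not reversible."""
    return hmac.new(SECRET_SALT, ip_address.encode(),
                    hashlib.sha256).hexdigest()[:16]

print(pseudonymise("192.0.2.17"))
```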

Legal and ethical issues
(Challenges: user consent; ethics approval; trust between parties; copyright issues; commercial sensitivity of data.)

How NEWSREEL addresses them:
§ Researchers do not interact with users.
§ Users are covered by the business relation between plista and its customers.
§ Participants have to agree to terms before participating.

Technical challenges
(Challenges: designing and implementing a living labs architecture; cost of implementation; maintenance and adoption; managing the living labs infrastructure.)

How NEWSREEL addresses them:
§ The infrastructure was developed in the context of the research project EPEN.
§ The system is constantly monitored.

Practical challenges
(Challenges: forming living labs with partners within the research community; obtaining commercial partners; defining tasks and scenarios for evaluation purposes.)

How NEWSREEL addresses them:
§ Always keep in contact with your participants.
§ Advertise.
§ Make sure no one can cheat!
§ It's a win-win-win-win situation. (-> Torben)

Acknowledgements

Co-organisers:
§ Andreas Lommatzsch
§ Benjamin Kille
§ Till Plumbaum
§ Torben Brodt
§ Tobias Heintz

Steering Committee:
§ Pablo Castells
§ Paolo Cremonesi
§ Hideo Joho
§ Udo Kruschwitz
§ Joemon M. Jose
§ Mounia Lalmas
§ Martha Larson
§ Jimmy Lin
§ Vivien Petras
§ Domonkos Tikk

Frank Hopfgartner, PhD
Director of Competence Center Information Retrieval and Machine Learning
@OkapiBM25

DAI-Labor (Distributed Artificial Intelligence Laboratory)
Technische Universität Berlin, Fakultät IV: Elektrotechnik & Informatik
Ernst-Reuter-Platz 7, 10587 Berlin, Germany

frank.hopfgartner@tu-berlin.de
www.dai-labor.de/~hopfgartner/
Fon: +49 (0)30 314-74 202
Fax: +49 (0)30 314-74 003

Thank you