
Setting Up a Living Lab for Information Access Research

Frank Hopfgartner, DAI-Labor, Technische Universität Berlin

In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over the period of a few months in a living lab.

Why am I here?

Because I co-organise CLEF NEWSREEL

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

So what are living labs?

They rely on feedback from real users to develop convincing demonstrators that showcase the potential of an idea or a product.

A real-life test and experimentation environment to fill the pre-commercial gap between fundamental research and innovation.

Example: Efficiency House Plus [BMVBS, 2011]

§ National research initiative on energy efficiency in the housing and traffic domains
§ The Efficiency House Plus is a small power plant that can export energy surpluses into the local power grid
§ Equipped with 1,000 data sources such as movement sensors, weather data, etc.

Source: Werner Sobek

What can be studied?

Efficiency House Plus with electromobility, Berlin: a research initiative of the BMVBS, with 1,000 data points:
§ 205 smart meters
§ 39 heat pumps
§ 74 illumination sensors
§ 38 photovoltaic sensors
§ …

§ Detection of resident presence in the home environment: energy consumption is an indicator for presence, but some devices continually consume energy (see the sketch after this list)
§ Recognition of resident activities: draw conclusions about user activity based on the usage of home appliances
§ Recommendation of optimised heating schedules: gradually learn characteristic behaviour to create personalised schedules for heating control
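As a toy illustration of the first point, here is a minimal Python sketch (all names, values, and thresholds are hypothetical, not from the Efficiency House Plus) that treats consumption noticeably above a standby baseline as a presence signal:

```python
# Minimal sketch of presence detection from smart-meter readings.
# Assumption: devices that "continually consume energy" form a roughly
# constant standby load, so presence shows up as consumption above it.

def estimate_standby_load(readings_watts):
    """Use the lowest observed consumption as the standby baseline."""
    return min(readings_watts)

def is_resident_present(current_watts, standby_watts, margin=1.25):
    """Flag presence when consumption exceeds the baseline by a margin."""
    return current_watts > standby_watts * margin

readings = [110, 112, 109, 480, 530, 115]  # hypothetical watt readings
standby = estimate_standby_load(readings)
print([is_resident_present(w, standby) for w in readings])
# -> [False, False, False, True, True, False]
```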

Questions to be addressed (innovation and data analysis)

§ How will people really use the technology?
§ Who is interested in my product?
§ What is the willingness to pay?
§ Is there a need for my product?
§ What parameters do I need?

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

Do we need Living Labs for IR Research? Why?

Let's have a look at the history of IR evaluation:

§ Cranfield (1962-1966)
§ MEDLARS (1966-1967)
§ SMART (1961-1995)
§ TREC (1992-today)
§ NTCIR / CLEF (1999/2000-today)

Cranfield Evaluation Paradigm

Develop system/algorithm -> Prepare appropriate dataset -> Perform user study -> Measure performance

Laboratory Setting

§ Dataset: use a standard test collection (e.g., from TREC) with documents, relevance assessments and search tasks, or create your own domain-specific test collection
§ User study: ask users to perform search tasks in a controlled environment (simulated work task situation)
§ Measurement: standard IR evaluation metrics and qualitative methods (a metric sketch follows below)
§ Systems: a baseline, plus the fancy improvement that will change the world
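To make the "standard IR evaluation metrics" bullet concrete, here is a minimal sketch of Precision@k computed from TREC-style relevance judgments. It is illustrative only; real campaigns typically rely on tools such as trec_eval, and all identifiers below are hypothetical:

```python
# Minimal sketch of a standard IR evaluation metric: Precision@k.

def precision_at_k(ranked_doc_ids, relevant_doc_ids, k=10):
    """Fraction of the top-k retrieved documents judged relevant."""
    top_k = ranked_doc_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_doc_ids)
    return hits / k

ranking = ["d3", "d7", "d1", "d9", "d2"]   # hypothetical system output
qrels = {"d1", "d3", "d5"}                  # hypothetical judgments
print(precision_at_k(ranking, qrels, k=5))  # -> 0.4
```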

"Find as many documents as possible for a given search task."
"Act naturally while I watch everything you are doing."
"I tell you what is relevant!"

NOT SUITABLE FOR RESEARCH ON USER-CENTRED IR

Evaluation of User-Centred IR (Personalised Search)

Context:
§ Country
§ Social connection
§ Locality
§ Personal history
§ Mobile search

Evaluation issues:
§ Observer-expectancy effect
§ Atypical search tasks
§ Missing context/background
§ Missing incentive to satisfy one's own information need

An alternative setting

"Use our system to find the information you are looking for."
"Use the system whenever you want, for whatever reason."
"You decide what you consider to be relevant."

How to evaluate?

User simulation [ECIR'08; ACM TOIS, 2011]: allows fine-tuning (White et al., 2005), but does not replace a user study.
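A minimal sketch of what such a user simulation might look like, assuming a simple top-down scanning model with a fixed click probability; the model and all names are assumptions for illustration, not the approach from the cited papers:

```python
# Minimal sketch of user simulation for tuning a ranker offline.
# Assumption: a simulated user scans the ranking top-down and clicks
# each relevant document with a fixed probability.

import random

def simulate_session(ranking, relevant, click_prob=0.7, patience=10):
    """Return the ranks (1-based) the simulated user clicked."""
    clicks = []
    for rank, doc_id in enumerate(ranking[:patience], start=1):
        if doc_id in relevant and random.random() < click_prob:
            clicks.append(rank)
    return clicks

random.seed(42)
print(simulate_session(["d3", "d7", "d1", "d9"], {"d1", "d3"}))
```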

A/B testing

Evaluate, then submit to SIGIR.
"OK, cool. Go for it!"
"Sure… But who has the users?"
"These guys have…"
"OK, then let's pay for the users…"
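For illustration, a minimal sketch of how traffic could be split deterministically between two system variants in an A/B test; the hashing scheme is an assumption, not the approach of any system named here:

```python
# Minimal sketch of A/B testing: deterministically assign users
# to one of two rankers so each user always sees the same variant.

import hashlib

def assign_variant(user_id: str) -> str:
    """Hash the user id to a stable variant label."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

for uid in ["user-1", "user-2", "user-3"]:
    print(uid, "->", assign_variant(uid))
```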

Evaluation campaigns: crowdsourcing works

Micro-tasks:
§ Data annotation
§ Document annotation
§ Document categorisation
§ Iterative system evaluation

Examples:
§ ImageCLEF (Nowak and Rüger, 2010)
§ INEX (Kazai et al., 2011)
§ TREC Blog (McCreadie et al., 2011)
§ MediaEval (Loni et al., 2013)

Activating the crowd

§ Users may have an interest in annotating items that they know well
§ Users may be attracted by incentives to annotate items

But…

§ Personalised search needs users who follow their own information needs.
§ Users need to be driven by their own intrinsic motivation.

EXTRINSIC motivation comes from the outside. INTRINSIC motivation exists within the individual.

Therefore…

"A living laboratory on the Web that brings researchers and searchers together is needed to facilitate ISSS (Information-Seeking Support System) evaluation." (Kelly et al., 2009)

Living Labs for IR evaluation

Examples: local domain search, product search, NEWSREEL.

§ Real users interacting with a system, following their own information needs
§ Realistic setting where users are not restricted by closed laboratory conditions
§ Ideally: many users, so that A/B testing can be performed

Source (guinea pig): http://living-labs.net/wp-content/uploads/2014/05/livinglab.logo_.textunder.square200.png

Challenges

Privacy and security:
§ Hosting data on a secure server
§ Gaining subjects' trust
§ Coping with the need for privacy
§ Alternatives when individuals will not share their data

Legal and ethical issues:
§ User consent
§ Ethics approval
§ Trust between parties
§ Copyright issues
§ Commercial sensitivity of data

Practical challenges:
§ Forming living labs with partners within the research community
§ Obtaining commercial partners
§ Defining tasks and scenarios for evaluation purposes

Technical challenges:
§ Designing and implementing a living labs architecture
§ Cost of implementation
§ Maintenance and adoption
§ Managing the living labs infrastructure

Source: http://living-labs.net/ll14/call-for-papers/

Overview

Part 1 (Academic Overview): Living Labs (Introduction), Living Labs for IR Research
Part 2 (Hands-on Experience): CLEF NEWSREEL

In CLEF NEWSREEL, participants can develop news recommendation algorithms and have them tested by millions of users over the period of a few months in a living lab.

Again… What are recommender systems?

Recommender systems help users to find items that they were not searching for.

Items?

Example: News Articles

§ First living lab for the evaluation of news recommendation algorithms in real time
§ Organised as the plista Contest, as a challenge at ACM RecSys'13, and as a campaign-style evaluation lab at CLEF'14

Source (image): T. Brodt of plista.com

Organisation (CLEF NEWSREEL)

plista: leading provider of a recommendation and advertisement network in Central Europe. Thousands of content providers rely on plista to generate recommendations for their customers (i.e., web users).

DAI-Labor: application-oriented research on smart information systems.

Steering Committee of experts from the fields of IR and RecSys.

Central Innovation Programme SME.

CLEF NEWSREEL Tasks (started in November 2013)

§ Task 1, Offline Evaluation: given a dataset, predict the news articles a user will click on.
§ Task 2, Online Evaluation: recommend articles in real time over several months.

@clefnewsreel | http://www.clef-newsreel.org/

Task 1: Offline Evaluation (predict interactions based on an OFFLINE dataset)

Dataset:
§ Traffic and content updates of 9 German-language news content provider websites
§ Traffic: reading articles, clicking on recommendations
§ Updates: adding and updating news articles
§ Recorded in June 2013
§ 65 GB, 84 million records [Kille et al., 2013]

Evaluation:
§ The dataset is split into different time segments
§ Participants have to predict the interactions within these segments
§ Quality is measured by the ratio of successful predictions to the total number of predictions (a sketch follows below)
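A minimal sketch of this quality measure, with hypothetical data structures; the actual NEWSREEL evaluation operates on time segments of the interaction log, not on the toy dictionaries used here:

```python
# Minimal sketch: ratio of successful predictions to total predictions.

def prediction_accuracy(predictions, observed_interactions):
    """predictions: user_id -> predicted article_id.
    observed_interactions: user_id -> article_id actually clicked."""
    if not predictions:
        return 0.0
    successes = sum(
        1 for user, article in predictions.items()
        if observed_interactions.get(user) == article
    )
    return successes / len(predictions)

preds = {"u1": "a5", "u2": "a9", "u3": "a2"}   # hypothetical predictions
truth = {"u1": "a5", "u2": "a1", "u3": "a2"}   # hypothetical ground truth
print(prediction_accuracy(preds, truth))        # -> 0.666...
```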

Task 2: Online Evaluation (recommend news articles in REAL TIME in a living lab)

§ Provide recommendations for visitors of the news portals of plista's customers
§ Ten portals (local news, sports, business, technology)
§ Communication via the Open Recommender Platform (ORP)
§ Provide recommendations within 100 ms (a VM is provided if necessary); see the sketch after this list
§ Three pre-defined evaluation periods: 5-23 February 2014, 1-14 April 2014, 5-19 May 2014
§ Evaluation criteria: number of clicks, number of requests, click-through rate
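To give a feel for the online task, here is a minimal sketch of a recommendation service built around a recency baseline. The route, payload fields, and event types are assumptions for illustration only; the actual ORP message format is defined by plista:

```python
# Minimal sketch of a recommender behind an HTTP endpoint. A recency
# baseline answers from memory, which helps stay under a 100 ms budget.

from http.server import BaseHTTPRequestHandler, HTTPServer
import collections, json

RECENT = collections.deque(maxlen=100)  # most recently published articles

class Recommender(BaseHTTPRequestHandler):
    def do_POST(self):
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        if body.get("type") == "item_update":   # hypothetical event type
            RECENT.appendleft(body["item_id"])  # remember new/updated article
            payload = {}
        else:                                   # recommendation request
            payload = {"recs": list(RECENT)[:body.get("limit", 6)]}
        out = json.dumps(payload).encode()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    HTTPServer(("", 8080), Recommender).serve_forever()
```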

Living Lab Scenario

[Diagram: millions of visitors browse the sites of Publisher A … Publisher n; the plista ORP (Open Recommender Platform) relays recommendation requests from the publishers to the participating teams, Researcher 1 … Researcher n, and returns their recommendations. More about it later.]

Evaluation criteria: number of clicks, number of requests, click-through rate.
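The click-through rate is presumably the standard ratio of clicks to recommendation requests:

```latex
\mathrm{CTR} = \frac{\#\,\mathrm{clicks}}{\#\,\mathrm{requests}}
```

For example, a team that answers 10,000 requests and receives 150 clicks achieves a CTR of 1.5% (the numbers are hypothetical).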


Privacy and security
(Challenges: hosting data on a secure server; gaining subjects' trust; coping with the need for privacy; alternatives when individuals will not share their data.)

How NEWSREEL addresses them:
§ No search queries are provided.
§ The data stream is pseudonymised, i.e., users cannot be identified based on their IPs or search queries (an illustrative sketch follows below).
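For illustration only (this is not necessarily plista's method), one common pseudonymisation approach replaces raw IP addresses with salted one-way hashes, so streams can still be correlated per user without exposing identities:

```python
# Illustrative sketch of pseudonymisation via a keyed one-way hash.

import hashlib, hmac

SECRET_SALT = b"rotate-me-regularly"  # hypothetical secret, kept private

def pseudonymise(ip_address: str) -> str:
    """Deterministic pseudonym: same IP -> same token, not reversible."""
    return hmac.new(SECRET_SALT, ip_address.encode(),
                    hashlib.sha256).hexdigest()[:16]

print(pseudonymise("192.0.2.17"))
```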

Legal and ethical issues
(Challenges: user consent; ethics approval; trust between parties; copyright issues; commercial sensitivity of data.)

How NEWSREEL addresses them:
§ Researchers do not interact with users.
§ Users are covered by the business relation between plista and its customers.
§ Participants have to agree to terms before participating.

Technical challenges
(Challenges: designing and implementing a living labs architecture; cost of implementation; maintenance and adoption; managing the living labs infrastructure.)

How NEWSREEL addresses them:
§ The infrastructure was developed in the context of the research project EPEN.
§ The system is constantly monitored.

Practical challenges
(Challenges: forming living labs with partners within the research community; obtaining commercial partners; defining tasks and scenarios for evaluation purposes.)

How NEWSREEL addresses them:
§ Always keep in contact with your participants.
§ Advertise.
§ Make sure no one can cheat!
§ It's a win-win-win-win situation. (-> Torben)

Acknowledgements

Co-organisers:
§ Andreas Lommatzsch
§ Benjamin Kille
§ Till Plumbaum
§ Torben Brodt
§ Tobias Heintz

Steering Committee:
§ Pablo Castells
§ Paolo Cremonesi
§ Hideo Joho
§ Udo Kruschwitz
§ Joemon M. Jose
§ Mounia Lalmas
§ Martha Larson
§ Jimmy Lin
§ Vivien Petras
§ Domonkos Tikk

Frank Hopfgartner, PhD
Director of Competence Center Information Retrieval and Machine Learning
@OkapiBM25

DAI-Labor (Distributed Artificial Intelligence Laboratory)
Technische Universität Berlin, Fakultät IV: Elektrotechnik & Informatik
Ernst-Reuter-Platz 7, 10587 Berlin, Germany

frank.hopfgartner@tu-berlin.de
www.dai-labor.de/~hopfgartner/
Fon: +49 (0)30 314-74 202
Fax: +49 (0)30 314-74 003

Thank you