
Page 1

The CLEF 2005 interactive track (iCLEF)

Julio Gonzalo (1), Paul Clough (2) and Alessandro Vallin (3)

(1) Departamento de Lenguajes y Sistemas Informáticos, Universidad Nacional de Educación a Distancia
(2) Department of Information Studies, University of Sheffield, UK
(3) ITC-irst, Trento, Italy

Page 2

Overview
- Consolidation of two pilot user studies at iCLEF 2004:
  - Interactive question answering task
  - Interactive image retrieval task
- iCLEF provides resources and the experiment design
- Participants select a research question to investigate (by comparing the behaviour and search results of users with a reference and a contrastive system)
- Five research groups submitted results:
  - 2 groups for image retrieval
  - 3 groups for QA

Page 3

Agenda
- Image retrieval task
- Question Answering task
- Ideas for 2006: the flickr task

Page 4

Cross-language image retrieval

Page 5

Overview
- Limited evaluation with ranked lists
- Image retrieval systems are highly interactive
- Appealing for CLIR research: language-independent, since the object to be retrieved is an image
- Image retrieval can be:
  - Purely visual (QBE), e.g. "find images like this one" (a minimal sketch follows this slide)
  - Text-based, e.g. web image search
  - A combination of both
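
The QBE mode above is easiest to see in code. Below is a minimal sketch of purely visual retrieval under assumptions of my own: a local directory of JPEG images and colour-histogram intersection as the similarity measure (a classic baseline, not necessarily what any 2005 participant system used).

```python
from pathlib import Path

import numpy as np
from PIL import Image

def colour_histogram(path, bins=8):
    """Quantised RGB histogram, normalised to sum to 1."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins,) * 3, range=((0, 256),) * 3)
    return hist.ravel() / hist.sum()

def qbe_rank(query_image, collection_dir):
    """Rank collection images by histogram intersection with the query:
    'find images like this one', with no text involved at all."""
    query = colour_histogram(query_image)
    ranked = [(np.minimum(query, colour_histogram(p)).sum(), p.name)
              for p in Path(collection_dir).glob("*.jpg")]
    return sorted(ranked, reverse=True)
```

Histogram intersection scores 1.0 for identical colour distributions and 0.0 for disjoint ones, which is what makes the measure language-independent: no query words are ever matched.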

Page 6

Interactive task
- Areas of interaction to study include:
  - Query formulation (visual and textual)
  - Query re-formulation (relevance feedback)
  - Browsing/navigating results
  - Identifying/selecting relevant images
- Based on the iCLEF methodology:
  - Participants require a minimum of 8 users
  - Within-subject experimental design
  - 16 search tasks (5 minutes per task)
- Participants select the area to investigate

Page 7

Target search task

[Example target image: 海 (sea)]

- Clear goal for the user (the task is easy to describe)
- Can be achieved without knowledge of the collection
- Clearly defined measures of success
- Invokes different searching strategies

Page 8

Search tasks

Page 9

Participants
- 11 groups signed up; 2 submitted
- University of Sheffield:
  - Compared two Italian versions of the same system
  - Aimed to test whether automatically-generated menus are better for presenting results than a ranked list
- Miracle:
  - Compared Spanish versus English query formulation
  - Aimed to test whether Boolean AND or OR is better

Page 10

Results
- Miracle:
  - 69% of images found with English queries; 66% with Spanish
  - Domain-specific terminology caused problems for users (and for the system)
- University of Sheffield:
  - 53% of images found using the ranked list; 47% using the menus
  - Users preferred the menus
- Comparison between groups (limited):
  - Miracle: 86/128 images found overall
  - Sheffield: 82/128 images found overall

Page 11

Interactive CL Q&A

Page 12

Q&A task

[Diagram: Question (native language) → Q&A search assistant ↔ Text collection (foreign language) → Answer (native language)]
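
To make the diagram concrete, here is a minimal sketch of the loop such a search assistant implements. The `translate` and `search_passages` callables are hypothetical stand-ins for whatever MT and retrieval components a participating system plugged in; nothing here is taken from an actual iCLEF system.

```python
from typing import Callable, List

def qa_assistant(question: str,
                 translate: Callable[[str, str], str],
                 search_passages: Callable[[str], List[str]],
                 collection_lang: str = "en",
                 user_lang: str = "es") -> str:
    """One interactive CL Q&A round: the user asks and answers in their
    own language; only the collection is in the foreign language."""
    # Cross the language barrier on the way in...
    foreign_query = translate(question, collection_lang)
    # ...retrieve candidate passages from the foreign-language collection...
    passages = search_passages(foreign_query)
    # ...and gist the top candidates back so the user can judge them.
    for passage in passages[:5]:
        print(translate(passage, user_lang))
    return input("Answer (in your own language): ")
```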

Page 13

Q&A vs. interactive Q&A
- People know some of the answers:
  - Questions must be carefully selected
- People can draw inferences:
  - Answers from multiple documents: considered in 2004, but there were problems with assessment
  - Combination of document evidence with user knowledge: avoid definition and other open questions
- People answer in the question language:
  - Need to provide high-quality manual translations for assessment
- People get tired:
  - Exclude NIL questions, limit question types

Page 14

Experimental design
- 8 users (native speakers of the query language)
- 16 evaluation questions (+ 4 for training)
- 5 minutes per search (~3 hours per user)
- Independent variable: CLIR system design (reference/contrastive)
- Dependent variable: accuracy
- Latin square to block user/question effects (illustrated below)
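
As an illustration of the blocking, here is one way to lay out such a counterbalanced schedule. The real assignment matrix came from the track guidelines, so treat the rotation below as a sketch of the idea rather than the official design.

```python
USERS, QUESTIONS = 8, 16
SYSTEMS = ("reference", "contrastive")

def schedule():
    """Rotate the question order per user (rows of a cyclic Latin
    square) and alternate which system handles each half, so that no
    question is tied to a single user, position, or system."""
    plan = {}
    for u in range(USERS):
        order = [(u + i) % QUESTIONS for i in range(QUESTIONS)]
        half = QUESTIONS // 2
        plan[u] = [(question, SYSTEMS[(u + (0 if i < half else 1)) % 2])
                   for i, question in enumerate(order)]
    return plan

# e.g. schedule()[0][:2] -> [(0, 'reference'), (1, 'reference')]
```

Each user still searches all 16 questions, 8 on each system, within the 5-minute limit per search.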

Page 15

Evaluation measures
- Official score: accuracy (same as the Q&A track)
- Additional quantitative data: searching time, number of interactions, log analysis in general
- Additional data: questionnaires (initial, two post-system, one final), observational information
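
For reference, the strict accuracy used as the official score reduces to (standard definition, not quoted from the slides):

```latex
\mathrm{accuracy} = \frac{\#\,\text{questions answered correctly}}{\#\,\text{questions asked}}
```

With 16 evaluation questions per user, each correct answer is worth 1/16 ≈ 0.06 of that user's overall score.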

Page 16

Experiments
- Alicante: how much context do users need to correctly identify answers? (clauses vs. full paragraphs in a QA-based system)
- Salamanca: how useful is MT for the task? (with/without MT) × (poor/good target-language skills) × (EN/FR as target language)
- UNED: is it better to search paragraphs than full documents?

Page 17

Official results

Page 18

Remarkable facts
- UNED & Alicante: accuracy increases with larger contexts
- Salamanca: MT is not very helpful!
- Implications for CL-QA systems?

Page 19

Ideas for 2006

Page 20

Participation

[Chart: number of iCLEF participants vs. CLEF participants per year, 2001–2005 (y-axis 0–80)]

Conclusion: terminate the track!

Page 21

Failure analysis (1)
- High cost of entry:
  - Long, boring guidelines
  - User recruitment, scheduling, training, monitoring
  - Can't really do experiment variations
  - Made a programming mistake? Start recruiting volunteers again.

Page 22

Failure analysis (2)
- "Users screw everything up" (XXX, IR competition organizer). Recruiting, training, monitoring, sometimes even paying… just to see how users ruin your hypothesis.
- Is your search assistant at least good for demonstration purposes? No, because:
  1) Cross-language search ⇒ cross-cultural need
  2) But cross-cultural need is infrequent!
- (Experiment: show it to your mother)

Page 23

[flickr screenshot: a photo page, with the title field highlighted]

Page 24

[flickr screenshot: description, comments, sets and tags (folksonomies) highlighted]

Page 25

[flickr screenshots: photo tags in Spanish, Italian, English and Japanese]

Page 26

Page 27

Advantages of flickr
- Naturally multilingual, a new IR challenge (folksonomies)
- You can show it to your mother! (it is cross-language but it is not cross-cultural)
- Can avoid recruiting users: study the behaviour of real web/flickr users (log analysis)
- Challenges of web scenarios (social network effects) plus the advantages of controlled scenarios (unlike Google or Yahoo image search)

Page 28

Interactive Flickr task (2006)
- Target language: Portuguese
- Data: Flickr images (stored locally or fetched via the Flickr API; see the sketch after this slide)
- Search tasks:
  - "Illustrate this text" (open)
  - "What's behind this house?" (focused, Q&A type)
  - "Sunsets in Mangue Seco" (ad-hoc type)
  - "Pictures where the Nike logo appears" (ad-hoc, content-oriented)
- Track real users with real information needs (log analysis!)
- Experiment design: open!! (let's also compare evaluation methodologies!)
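
Since the slide mentions fetching data via the Flickr API, here is a minimal sketch of the tag search it exposes. The `flickr.photos.search` method and the static-URL pattern are the real, documented API; the key, the example tag and the helper name are placeholders of my own.

```python
import requests

FLICKR_REST = "https://api.flickr.com/services/rest/"

def search_by_tag(api_key: str, tags: str, per_page: int = 10):
    """Return (title, static URL) pairs for public photos matching a tag."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "tags": tags,            # e.g. "mangueseco" for the sunset task above
        "per_page": per_page,
        "format": "json",
        "nojsoncallback": 1,
    }
    photos = requests.get(FLICKR_REST, params=params).json()["photos"]["photo"]
    # Static photo URLs follow Flickr's documented pattern.
    return [(p["title"],
             f"https://live.staticflickr.com/{p['server']}/{p['id']}_{p['secret']}.jpg")
            for p in photos]
```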

Page 29

Plans for 2007
- Make the task compulsory for CLEF participants
- Terminate all other tracks
- Task coordinators hired by Yahoo!

Page 30

Acknowledgments
- People who helped organize iCLEF 2005: Richard Sutcliffe, Christelle Ayache, Víctor Peinado, Fernando López, Javier Artiles, Jianqiang Wang, Daniela Petrelli
- People already helping us shape the flickr task: Javier Artiles, Peter Anick, Jussi Karlgren, Doug Oard, William Hersh, Donna Harman, Daniela Petrelli, Henning Müller
- All participant groups