Christos Katsanos | ckatsanos@ece.upatras.gr
Nikolaos Tselios | nitse@ece.upatras.gr
Nikolaos Avouris | avouris@ece.upatras.gr
Are Ten Participants Enough for Evaluating Information Scent of Web Page Hyperlinks?
IFIP INTERACT | Uppsala, Sweden | 24-28 August 2009
Purpose & Motivation
A critical factor in web navigation is information scent (Fu & Pirolli, 2007; Blackmon et al., 2005; Miller & Remington, 2004)
the user's assessment of the semantic relevance of the navigation options on a webpage
Participants are often called upon to evaluate scent by providing ratings (Miller & Remington, 2004; Brumby & Howes, 2008)
It remains unclear how many raters are required to obtain representative estimates of information scent.
The Study: First Phase
Design & Procedures
Web-based survey
Participants rated the semantic relevance of each link to the provided goal (1 = poor relevance, 5 = high relevance).
101 participants; 8 navigation menus with 8 links each
101 participants × 8 menus × 8 links = 6,464 ratings
Analysis Methodology
Reference case = scent-ratings from all 101 participants
Select 10 random samples for each sample size N, with N = 2, 5, 10, 15, 20, 25, 30, 40 and 50
Compute the average Spearman correlation between each sample's ratings and the ratings of all 101 participants
How many raters are enough to represent the ratings of the whole dataset?
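A minimal sketch of this resampling analysis, assuming the ratings sit in a 101 × 64 participant-by-link matrix; the variable names and the random placeholder data are hypothetical, and the correlation is computed here across all 64 links at once, which may differ from the paper's exact per-menu procedure.

```python
# Sketch of the first-phase resampling analysis (assumed data layout:
# one row per participant, one column per link; values are placeholders).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(101, 64))  # stand-in for the 6,464 real ratings

def sample_correlation(ratings, n_raters, n_samples=10):
    """Average Spearman correlation between the per-link mean ratings of
    random samples of n_raters and those of the whole dataset."""
    full_profile = ratings.mean(axis=0)  # per-link means over all 101 raters
    rs = []
    for _ in range(n_samples):
        idx = rng.choice(ratings.shape[0], size=n_raters, replace=False)
        r, _ = spearmanr(ratings[idx].mean(axis=0), full_profile)
        rs.append(r)
    return np.mean(rs), np.std(rs)

for n in (2, 5, 10, 15, 20, 25, 30, 40, 50):
    r_mean, r_sd = sample_correlation(ratings, n)
    print(f"N={n:2d}: r = {r_mean:.2f} ± {r_sd:.2f}")
```

Squaring the mean correlation gives the "total variance" figure reported on the results slide.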
Results
[Figure: squared Spearman correlation between sample and whole-dataset ratings as a function of sample size N; error bars = (r_MEAN ± r_SD)²]
10 raters account for 84-90% of the total variance, i.e. r ≈ 0.92-0.95.
Doubling the raters (×2) gives practically the same result; tripling them (×3) brings the ratings only 5% closer to the whole dataset.
First-phase: Conclusion
10 raters appear to be a cost-effective solution for evaluating information scent without compromising the quality of the results
But how close are scent-ratings of 10 participants to observed navigation behavior?
The Study: Second Phase
Design & Procedures
Eye-tracking user study
Participants performed the same 8 navigation tasks used in the first phase
54 users (none involved in the first phase)
Two measures of users' behavior: clicks on each link, and fixations adjusted for each link's text length (a sketch of this adjustment follows below)
54 users × 8 tasks = 432 recordings
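The fixations-adjusted-for-text-length measure presumably corrects for longer links attracting more fixations simply because they span more text. A minimal sketch under that assumption; the per-word normalization shown is illustrative, not necessarily the paper's exact formula.

```python
# Sketch: normalize each link's fixation count by its text length in words
# (an assumed adjustment; the study's exact formula may differ).
def adjust_fixations_for_length(fixation_counts, link_texts):
    return [
        count / max(len(text.split()), 1)  # fixations per word of link text
        for count, text in zip(fixation_counts, link_texts)
    ]

links = ["Contact us", "Frequently asked questions", "Home"]
fixations = [6, 12, 3]
print(adjust_fixations_for_length(fixations, links))  # [3.0, 4.0, 3.0]
```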
Analysis Methodology
Reference case = Behavioral data from 54 users
Correlate the scent-ratings from the first-phase samples with the two measures of users' navigation behavior from the second phase
Average Spearman correlation across samples
How many raters are enough to reach an acceptable level of correlation with these two measures?
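A minimal sketch of this second-phase comparison, reusing the resampling idea from the first phase; all arrays below are random placeholders for the real per-link aggregates, and in the study the correlation is averaged over the 10 random samples of each size N.

```python
# Sketch of the second-phase analysis: correlate a rater sample's mean
# scent-ratings with each behavioral measure (placeholder data throughout).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
sample_ratings = rng.integers(1, 6, size=(10, 64)).mean(axis=0)  # 10-rater sample, per-link means
clicks = rng.integers(0, 20, size=64)   # clicks on each link (phase 2)
fixations = rng.random(64)              # length-adjusted fixations on each link (phase 2)

for name, behavior in (("clicks", clicks), ("fixations", fixations)):
    r, p = spearmanr(sample_ratings, behavior)
    print(f"scent-ratings vs {name}: r_s = {r:+.2f}, p = {p:.3f}")
```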
Results
Clicks on each link: r_10-raters differs by only 0.7% from r_101-raters (r_101-raters = 0.80, p < .01)
Fixations on each link: r_10-raters differs by 7.4% from r_101-raters (r_101-raters = 0.40, ns)
[Figure: average correlation with each behavioral measure as a function of sample size N; error bars = r_MEAN ± r_SD]
Second-phase: Conclusion
10 participants provide scent-ratings that are close both to observed link-selection behavior (clicks) and to the distribution of attention (fixations)
However, scent-ratings should be used only as a rough indicator of users' distribution of attention (r_s = 0.40, ns)
Summary & Questions
Investigated the well-known debate of “how many users” in the context of information scent evaluation
Scent-ratings of 10 participants appeared to be enough for a discount evaluation of information scent
More studies required in the context of highly specialized domains and/or varied user group composition
Christos Katsanos | ckatsanos@ece.upatras.gr
EXTRA SLIDES
First-Phase: Question example
Second-Phase: How many users are enough?
[Figure: clicks count and observations count]