1
Presented by: Lashanda Lee
ASSESSING INTERACTIVE SYSTEM EFFECTIVENESS WITH USABILITY DESIGN HEURISTICS AND MARKOV MODELS OF USER BEHAVIOR
2
Motivation
- For HCI to be successful, interfaces must be designed to:
  - Effectively translate the intentions, actions, and inputs of the operator for the computer
  - Effectively translate machine outputs for human comprehension
- Available HCI frameworks can aid in evaluating interface designs and generate subjective data.
- Quantitative, objective data is also needed as a basis for cost justification and determination of ROI.
- Modeling human behavior may reduce the need for experimentation (time, expense).
- Combining data from OR techniques with subjective data can be used to generate a score for overall system effectiveness, allowing comparison of alternative interface designs.
3
Literature Review: HCI frameworks
- Norman's model of HCI: two stages; does not focus on the continuous cycle of communication
- Pipeline model: inputs and outputs of the system operate in parallel; a complex model with many states; does not show the human's cognitive process but explains the computer's processing
- Dix et al. model: focuses on the distances between user and system and on the continuous cycle of communication; chosen as the basis for evaluation in the present research
4
Literature Review: Usability paradigms and principles
- Paradigms describe how humans interact with computers: ubiquitous computing, intelligent systems, virtual reality, and WIMP
- Principles describe how paradigms work: flexibility, consistency, robustness, recoverability, and learnability
- Each paradigm focuses on different usability principles; specific usability measures can be used to assess certain paradigms
- Paradigms address the figurative distance of articulation in Dix's model in different ways. Examples:
  - Intelligent interfaces using NLP: greatest reduction in articulation distance for users, but furthest from the system language
  - Command-line interface: farthest from the user in the Dix framework, but easy for the computer to understand
  - WIMP: easy for both the system and the user to interpret, and equal in distance between user and system in the Dix interaction framework
5
Literature Review: Measures of usability: Qualitative measures
- Subjective data: low cost but low discovery; comparisons of designs are based on interface qualities; the data is hard to analyze and may not lead to design changes because management considers it unreliable
- Inspection methods: low cost and quick discovery of problems using low-skill evaluators; often fail to find many serious problems and do not provide enough evidence to create design recommendations; types include heuristic methods, guidelines, and style and rule inspections
- Verbal reports: give insight into cognition, but it is hard to find an appropriate way to use the data
- Surveys: inexpensive and help find trouble spots; some information is lost due to short-term memory (STM) limitations
6
Literature Review: Measures of usability: Quantitative measures
- Used to compare designs based on quantities associated with certain interface features; useful in presenting information to management
- Goals may be too ambitious or too numerous; cannot cover entire systems
- Subjective responses: rankings, ratings, or fuzzy-set ratings; considered quantitative because they involve manipulation and analysis of data as a basis for comparing interface alternatives
- Objective responses: counts of concrete occurrences, not based on the opinions of users
  - Measures of effectiveness: binary task completion, number of correct tasks completed, and task performance accuracy
  - Measures of efficiency: task completion time, time in mode, usage patterns, and degree of variation from an optimal solution
  - Also: fuzzy sets and user modeling
7
Literature Review: Quantitative objective measures: Fuzzy sets and user modeling
- Fuzzy sets:
  - Used to compare interface alternatives; an aggregate score is produced from a count of interface inadequacies
  - Fuzzy-set logic is used to determine membership for the aggregate score
  - The method uses both subjective and objective measures
  - Requires multiple cycles of user testing to compare scores
  - Does not use variable weights for dimensions; considers them all equal
- User modeling:
  - Used to predict interface action sequences based on prior-use data
  - Limited in revealing actual human performance; not exact
  - Can be used to help guide users while they perform a task with an interface
  - GOMS: estimates task performance times; produces accurate predictions of user actions; takes a long time to create
  - Benefits include modeling one or more types of users and analyzing without additional user testing
8
Literature Review: Usability measures: Summary
- Qualitative measures: used iteratively; low discovery; hard to analyze; usually do not effect change in a display because management considers the data unreliable
- Quantitative measures: appear to be better for detailed usability problem analysis and design recommendations; user modeling can decrease cost; necessary to gain management support
- Combining an objective, quantitative user-modeling approach with subjective usability measures may provide an approach that is effective in finding problems and a basis for interface redesign
9
Literature Review: Operations Research methods of usability evaluation
- Use of techniques such as mathematical modeling to analyze complex situations; used to optimize systems
- Limited use so far in usability evaluation or interface improvement
- Methods used:
  - Markov models: stochastic processes; used for website customization and to predict user behavior (Kitajima et al., Thimbleby et al., Jenamani et al.)
  - Probabilistic finite-state models: include time distributions and transition probabilities; generate predictions of user behavior (Sholl et al.)
  - Critical path models: an algorithm determines the longest time; can also incorporate stochastic process predictions (Baber and Mellor, 2001)
10
Literature Review: Operations Research methods of usability evaluation: Markov models (Kitajima et al. and Thimbleby et al.)
- Kitajima et al.:
  - Markov models used to predict user behavior and determine the number of clicks needed to find relevant articles
  - After interface improvements, used the model to predict the number of clicks; the number of clicks was reduced
  - Used the equation u(i) = 1 + Σ_k P_ik u(k)
- Thimbleby et al.:
  - Applied Markov chains to several applications: a microwave oven and a cell phone
  - Used Markov chains to predict the number of steps; used a Mathematica simulation of the microwave to gather information
  - Mixed a perfect, error-free matrix with knowledge factors from 0 to 1 (1 = a fully knowledgeable user) to simulate user behavior
  - The original design took 120 steps for a random user; the improved design took fewer steps; fewer steps were considered "easier"
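The Kitajima et al. recurrence u(i) = 1 + Σ_k P_ik·u(k) can be solved in one shot as a linear system over the transient states of an absorbing chain. A minimal sketch, using a small hypothetical transition matrix rather than any matrix from the studies cited:

```python
import numpy as np

# Hypothetical 4-state chain: states 0-2 are transient interface
# states, state 3 is the absorbing "task complete" state.
P = np.array([
    [0.0, 0.7, 0.3, 0.0],
    [0.1, 0.0, 0.6, 0.3],
    [0.0, 0.2, 0.0, 0.8],
    [0.0, 0.0, 0.0, 1.0],  # absorbing; every row sums to 1
])

# u(i) = 1 + sum_k P[i, k] * u(k), with u = 0 at the absorbing
# state. Rearranged over the transient block Q: (I - Q) u = 1.
Q = P[:3, :3]
u = np.linalg.solve(np.eye(3) - Q, np.ones(3))
# u[i] is the expected number of clicks to completion from state i.
```

Solving the system once replaces iterating the recurrence, and needs only the transition probabilities estimated from logged clicks.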
11
Literature Review: Operations Research methods of usability evaluation: Summary
Appears to be a viable and useful approach to evaluate interface usability
Provides objective quantitative data without need for several iterations of testing
Used repeatedly to predict behavior, such as number of clicks and task times
Accurately predicts user behavior
12
Summary and Problem Statement
- A framework describing communication between humans and computers is needed to guide design improvements (Dix et al. was chosen for its simplicity and cyclic structure)
- Usability paradigms help identify types of technology that can be used to improve systems and provide direction for evaluating them; the WIMP paradigm was chosen for its simplicity and its accommodation of both user and system
- Many subjective measures exist, but they are not adequate for assessing performance and supporting design changes
- Objective, quantitative measures often gain management support for design changes but are expensive
- OR methods: Markov models accurately predict human behavior
- An approach is needed that uses both types of measures to evaluate usability and requires minimal user testing
- Combined use of subjective system evaluations based on the Dix et al. model and OR modeling techniques to predict user behavior with an interface; both methods are used to produce an overall system effectiveness score for comparing alternative designs
13
Method: Overview of system effectiveness score
- Based on the Dix et al. framework
- Survey for designers: captures perceptions of the importance of each link in the HCI framework
- Survey for users, paired with a Markov model prediction of the average number of interface actions (clicks): users rated the interfaces with respect to the links in the framework
- Novelty: the measure reflects both the designer's intent for the application and the user's perception of the system
- Designer weights and user ratings are multiplied and summed across links
- The weighted sum is divided by the Markov model prediction of the average number of clicks
- The score represents perceived usability per action
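The score construction just described can be sketched numerically; every value below (designer weights, user ratings, predicted clicks) is hypothetical, chosen only to show the arithmetic:

```python
# Hypothetical designer weights (from pair-wise comparisons) and
# average user ratings (1-10 scale) for the four Dix et al. links.
weights = {"articulation": 0.42, "performance": 0.18,
           "presentation": 0.15, "observation": 0.40}
ratings = {"articulation": 8.4, "performance": 8.1,
           "presentation": 8.0, "observation": 9.1}

# Markov model prediction of the average number of clicks.
predicted_clicks = 9.0

# Partial score: designer weight x user rating, summed across links.
partial_score = sum(weights[k] * ratings[k] for k in weights)

# Overall score: perceived usability per predicted action (click).
overall_score = partial_score / predicted_clicks
```

A higher ratio is read as higher overall system effectiveness, so two candidate designs are compared by their `overall_score` values.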
14
Method: Weighting factor determination
- Designers were expected to be most concerned with cognitive load
- Four designers were surveyed using the Dix et al. framework: based on the paradigm for the application (WIMP), how important is each link to system effectiveness?
- Pair-wise comparisons of links; values ranged between 0 and 0.5
- Weighting factors were averaged across designers to determine the weight for each dimension
- Weights were used in calculating the overall subjective system score (designer weights × user ratings)
15
Method: Experimental task
- Used a version of a Lenovo.com prototype to find and order a ThinkPad R60
- Twenty participants: 11 males, 9 females; age range 17-25
- Half of the participants used the old version of the Lenovo.com website:
  - Required 11 clicks to buy (optimal path)
  - Tabs separated the feature information from the ability to purchase
- Half of the participants used a new prototype:
  - Required 9 clicks to buy (optimal path)
  - All information about the type of computer was contained on one page
  - Multi-level navigation structure
  - More salient buttons
16
Method: Developing Markov chain models
- JavaScript recorded user actions
- The old online ordering system was used to identify states: links, tabs, and menu options (radio buttons and popups not included)
- Action sequences were used to create transition probability matrices, based on the actual number of users going from state i to state k
- Assumptions of the Markov model:
  - Each row of the matrix must sum to 1
  - The probability of the next interface state depends only on the current state
- To determine the average number of clicks to task completion, used the Kitajima et al. (2005) equation u(i) = 1 + Σ_k P_ik u(k), which requires:
  - A state probability matrix based on the action sequences
  - The average number of steps from one state to another (based on designer analysis)
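Turning logged click sequences into a transition probability matrix, as described above, might look like the following sketch; the sequences and the five-state space are invented for illustration, standing in for the JavaScript click logs:

```python
import numpy as np

# Hypothetical logged action sequences (state indices per user).
sequences = [
    [0, 1, 2, 4],
    [0, 1, 3, 2, 4],
    [0, 2, 4],
]
n_states = 5

# Count observed transitions from state i to state k.
counts = np.zeros((n_states, n_states))
for seq in sequences:
    for i, k in zip(seq, seq[1:]):
        counts[i, k] += 1

# Normalize each row to sum to 1 (Markov assumption); rows with no
# observed transitions stay all-zero in this sketch.
row_totals = counts.sum(axis=1, keepdims=True)
P = np.divide(counts, row_totals,
              out=np.zeros_like(counts), where=row_totals > 0)
```

Each row of `P` estimates the next-state distribution from that state, which is exactly what the click-prediction equation consumes.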
17
Method: Rating system effectiveness (based on the Dix framework)
- Used the Dix et al. framework
- End users rated the links on a scale from 1 to 10
- The framework was presented at the end of the task
- Average ratings were determined for each link and used in the overall system effectiveness score
18
Method: Overall system effectiveness score and Markov model validation
- Overall score:
  - Used to compare alternative interface designs
  - Average designer weights for each dimension are multiplied by average end-user ratings; the product is the partial score
  - The partial score divided by the predicted average number of clicks is the overall score
  - The highest ratio is considered to indicate higher overall system effectiveness
- Validation:
  - A t-test was used to determine whether the actual observed number of clicks differed significantly from the number of clicks predicted by the Markov model
System Effectiveness:

SE = (a1·art + a2·perf + a3·pres + a4·obs) / (avg. number of clicks), where:
- a1 = average designer weight for the importance of the capability to map psychological intention to input states
- a2 = average designer weight for the importance of the capability to map input options to system functions
- a3 = average designer weight for the importance of the capability to accurately represent system states through output
- a4 = average designer weight for the importance of the capability to map display features to user task requirements
- art = average user rating of the ease of translation of goals to input states
- perf = average user rating of system responsiveness to inputs
- pres = average user rating of system speed and accuracy in presenting output
- obs = average user rating of the ease of translation of outputs to task requirements
19
Results: Assessment of the Markov model assumption
- The transition to the next state must depend only on the current state
- The Durbin-Watson test was used to assess autocorrelation among user steps in the interaction
- Test statistics: 1.2879 (old) and 2.0815 (new)
- A normalization procedure was applied to the original transition probability matrices, and the Durbin-Watson test was conducted on the normalized data
- Test statistics: 1.3920 (old) and 2.27 (new)
- The test revealed mixed evidence; the model was accepted and applied to predict the average number of clicks
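As a rough sketch of the kind of check described above, a Durbin-Watson statistic can be computed directly from a user's step series. This simplified version applies it to a demeaned hypothetical sequence rather than to proper regression residuals:

```python
import numpy as np

# Hypothetical sequence of per-step values for one user session.
steps = np.array([3, 1, 4, 2, 2, 5, 1, 3, 2, 4], dtype=float)

# Durbin-Watson statistic on the demeaned series: sum of squared
# successive differences divided by the sum of squares.
resid = steps - steps.mean()
dw = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
# Values near 2 suggest little autocorrelation; values toward 0 or 4
# suggest positive or negative autocorrelation, respectively.
```

The statistic is bounded between 0 and 4, which is why the reported values of roughly 1.3 to 2.3 read as mixed evidence.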
[Matrices: transition probability matrices for the new interface (P_ik and normalized P_ik) and the existing interface (P_ij and normalized P_ij); rows sum to 1]
20
Results: Computation of average number of steps
- The matrices give the average number of steps it takes to get from any one state to another
- Each entry represents an individual u(k) in the Kitajima et al. equation
- The matrices were created by the designers of the interface

[Matrices: average number of steps between states, M_ik for the new interface and M_ij for the existing interface]
21
Results: Computation of average number of clicks
- Used u(i) = 1 + Σ_k P_ik u(k)
- Considered paths to the absorbing state to determine the average number of clicks
- The Markov model predicted the number of clicks for each interface: 11.5 for the old (actual 12.9) and 9 for the new (actual 9.2)
- A t-test compared actual clicks across interfaces: t = -4.30, p = 0.0004; the actual number of clicks differed across interfaces, with the new interface significantly lower
- T-tests compared actual to predicted click counts for all subjects: p = 0.439 (new), p = 0.0605 (old); no significant difference between actual and predicted on either interface
- A t-test compared predicted clicks across interfaces: p = 0.0033; the new interface reduced the number of clicks
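The two kinds of t-test used here can be reproduced with scipy.stats; the click counts below are hypothetical stand-ins, not the study's data:

```python
from scipy import stats

# Hypothetical actual click counts for ten users of each interface.
old_clicks = [12, 14, 11, 13, 15, 12, 13, 14, 12, 13]
new_clicks = [9, 10, 9, 8, 10, 9, 9, 10, 9, 9]

# Two-sample t-test: do actual clicks differ across interfaces?
t_between, p_between = stats.ttest_ind(old_clicks, new_clicks)

# One-sample t-test: does the observed mean for the new interface
# differ from the Markov model's predicted click count?
predicted_new = 9.0
t_pred, p_pred = stats.ttest_1samp(new_clicks, predicted_new)
```

With data shaped like this, the between-interface difference is significant while the actual-versus-predicted difference is not, mirroring the pattern of results reported above.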
22
Results: Partial system effectiveness score
- Each participant rated the interfaces on each dimension using a scale of 1 to 10
- Designers completed pair-wise comparisons; they were expected to rate articulation and observation higher
- A t-test compared designer ratings of articulation and observation against performance and presentation; articulation and observation were rated higher
- Average designer weights were multiplied by average user ratings
- A t-test compared the partial score of the new interface against the old for all subjects: t = 5.08, p < .0001; the partial score for the new interface was significantly higher
Average user ratings by dimension:
             Articulation  Observation  Presentation  Performance
New          8.4           9.1          8.0           8.1
Existing     4.8           4.9          7.2           6.4

P-values for designer-rating comparisons:
             Performance   Presentation
Articulation p = 0.0004    p = 0.0013
Observation  p = 0.0013    p = 0.0055
23
Results: Overall system effectiveness score
- The partial score was divided by the predicted average number of clicks to yield perceived usability per click: 0.939 (new) vs. 0.475 (old)
- A t-test compared the overall score for the new and old interfaces for all subjects: t = 5.62, p < .0001; the overall system effectiveness score for the new interface was significantly higher than for the old
24
Results: Reducing experimentation
- The purpose of the Markov model was to predict the number of clicks and reduce the need for additional user testing
- Designers can estimate an average number of steps to transition among states in the new interface and multiply by the probabilities determined for the original interface (through user testing)
- The predicted number of clicks for the new interface was 9.35 (actual 9.2)
- A t-test checked whether the actual number of clicks differed from the predicted number: t = 1.15, p = 0.270; the Markov model was accurate in predicting the average number of clicks
- Focus groups would still be necessary to obtain user ratings
- The approach significantly reduces the time and money required for user testing
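The reduced-experimentation step above can be sketched as a single application of the Kitajima et al. recurrence: keep the transition probabilities measured on the original interface, but supply designer-estimated steps-to-completion for the redesign instead of new test data. All numbers here are hypothetical:

```python
import numpy as np

# Transition probabilities measured on the original interface
# (hypothetical 3-state example; rows sum to 1).
P = np.array([
    [0.0, 0.6, 0.4],   # start page
    [0.0, 0.0, 1.0],   # configuration page
    [0.0, 0.0, 1.0],   # absorbing task-complete state
])

# Designer-estimated average steps to completion from each state of
# the proposed redesign (0 at the completed state).
u_est = np.array([3.0, 2.0, 0.0])

# One application of u(i) = 1 + sum_k P[i, k] * u_est(k) from the
# start state gives a predicted click count without new user testing.
predicted_clicks = 1 + P[0] @ u_est
```

The trade-off is that accuracy now depends on the designers' step estimates rather than on observed behavior for the new design.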
25
Discussion: Designer ratings
- Hypothesis: average designer weighting factors for articulation and observation will be higher than those for performance and presentation
- Designers were concerned with cognitive load, as represented by articulation and observation
- If a customer cannot find what he or she is looking for, this may lead to frustration, lost customers, and lost revenue
- Designers realize that effectively reducing cognitive load is important
26
Discussion: Improved usability
- Hypothesis: the new interface will improve perceived usability
- Multi-level navigation was used to reduce cognitive load:
  - Made it easier to find and view all options
  - Users could reach many states with one click
  - Identified by users of the new interface as one of its most usable features
- More prominent buttons aided in easily identifying next steps:
  - In the original interface, users had a difficult time finding the customize button and often scrolled up and down the page or backtracked to determine what to do next
- The partial system effectiveness score was higher for the new interface (8.6) than for the old (5.2)
27
Discussion: Higher system effectiveness score
- Hypothesis: the new interface will produce a higher score because of higher perceived usability
- The old interface degraded performance: from the features tab, some users found it difficult to identify what to do next; once users found the product tab, some scrolled up and down trying to determine the next step (the new interface alleviated both problems by placing all information on one page)
- Higher perceived usability and fewer clicks led to a higher ratio
28
Discussion: Markov model accurately predicted the average number of clicks
- Hypothesis: the Markov model will accurately predict the average number of clicks, using the equation detailed by Kitajima
- Because Markov models represent stochastic behavior, they proved valid in the present work
- The model revealed the variability among participants but does not show the exact magnitude of the error
[Charts: actual vs. predicted number of clicks per participant, for the existing interface and the new interface]
29
Conclusion
- The objective was to create a new measure of usability, motivated by two observations: there are few quantitative objective measures, and many subjective measures are insufficient to justify design changes
- The research supports a subjective measure using the Dix et al. framework combined with an objective measure based on Markov models
- The method is:
  - Effective in objectively selecting among alternative designs and in reducing the amount of experimentation necessary
  - Easy to implement
  - Usable with several alternatives without the need for testing
  - Not applicable to interfaces where selection of the next state depends on previous states and not only on the current state
- Future research:
  - Use Markov models to predict the next steps a user will take, and make the relevant interface options more salient to improve usability
  - Find a way to incorporate time-on-task in the overall effectiveness score: perceived time-on-task will impact customer retention; research a method to predict it accurately