User Interface Design
Lecture 12
Evaluation
C. Patanothai 2110646:10-Evaluation 2
Evaluation Types
• Without user
– cognitive walk-through
– keystroke-level model
– heuristics evaluation
• With user
– usability test
– user test
COGNITIVE WALK-THROUGH
Cognitive Walk-Through
• “Evaluates the steps required to perform a task and attempts to uncover mismatches between how the users think about a task and how the UI designer thinks about the task”
concentrate on LEARNABILITY
Cognitive Walkthrough Method
Step 0:
The user selects a task to be performed and writes down all the steps (actions) in the task.
For each action in the task:
Step 1:
The user explores the artifact, prototype, or task scenario, looking for the action that might enable him or her to perform the selected task.
How does the user know what to do next? Is the correct action sufficiently evident to the user (can he or she recognize it), or does the user have to recall what to do from memory?
Cognitive Walkthrough Method
Step 2:
The user selects the action that appears to match most closely what he or she is trying to do.
Will the user connect the description of the correct action with what he or she is trying to do?
Step 3:
The user interprets the system’s response and assesses whether any progress has been made toward completing the task.
Will the user know if he or she has made a right or wrong choice?
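The per-action questions in Steps 1–3 can be organized as a simple checklist that an analyst fills in for every action in the task. A minimal sketch in Python (the action name and findings below are hypothetical examples, not part of the method itself):

```python
# The three cognitive-walkthrough questions asked for every action.
WALKTHROUGH_QUESTIONS = [
    "Is the correct action sufficiently evident to the user?",
    "Will the user connect the description of the correct action "
    "with what he or she is trying to do?",
    "Will the user know if he or she has made a right or wrong choice?",
]

def walkthrough_report(actions, findings):
    """Pair each action with the analyst's answer to each question.

    `findings` maps an action name to a list of three answer strings,
    one per walkthrough question.
    """
    return [
        (action, question, answer)
        for action in actions
        for question, answer in zip(WALKTHROUGH_QUESTIONS, findings[action])
    ]

# Hypothetical single-action example, loosely based on the VCR task below.
report = walkthrough_report(
    ["UA1: press the PROG button"],
    {"UA1: press the PROG button": [
        "No - nothing indicates PROG starts a timed recording",
        "Only experienced users will make the connection",
        "Yes - the VCR display changes to a form fill-in",
    ]},
)
```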
Remote
Produce a “Cognitive Walkthrough” of a TV or other remote control you may have at home.
•From “Turn ON”
•Actions in between
•To “Turn OFF”
Step 0: Select a Task
• set the VCR for a timed recording of a program starting at 21:00 and finishing at 22:30 on Ch4 on August 18, 2005.
• when switched ON, the VCR displays the number of the channel it last used.
• when switched OFF, it displays a digital clock.
• To set the timed recording, the VCR must be switched ON.
User actions (UA) System responses (SR)
UA1 Press the PROG button on the hand set.
SR1 VCR display shows a form fill-in for setting the start and stop times. These times are divided into separated sections for the hour and minute, separated by a colon. The cursor is flashing on the hour section on the start time.
UA2 Press the up arrow until the number 21 is showing.
SR2 21 is showing in the hour section of the start time.
UA3 Press the right arrow once to move the cursor to the minute section of the start time.
SR3 00 is showing and flashing in the minute section of the start time. This defaulted to 00 on selection of 21 in the hour section.
User actions (UA) System responses (SR)
UA4 00 in the minute section of the start time is what is wanted. Press the right arrow once to move the cursor to the hour section of the finish time.
SR4 The cursor is flashing in the hour section of the finish time.
UA5 Press the up arrow until the number 22 is showing.
SR5 22 is showing in the hour section of the finish time.
UA6 Press the right arrow once to move the cursor to the minute section of the finish time.
SR6 00 is showing and flashing in the minute portion of the finish time. This defaulted to 00 on selection of 22 in the hour section.
User actions (UA) System responses (SR)
UA7 Press the up arrow until the number 30 is shown.
SR7 30 is showing in the minute section of the finish time.
UA8 Press the right arrow once to move the cursor to the day section of the date field.
SR8 On the display, the full date has now defaulted to the current date. The cursor is flashing in the day section of the date.
UA9 Press the up arrow until the number 18 is showing.
SR9 18 is showing in the day section of the date field.
UA10 Press the right arrow once to move the cursor to the month section of the date field
SR10 The cursor is flashing in the month section of the date field.
User actions (UA) System responses (SR)
UA11 Press the up arrow until the number 8 is showing.
SR11 18 is showing in the day section of the date field, and 8 is showing in the month section of the date field.
UA12 Press the right arrow once to move the cursor to the year section.
SR12 The cursor is flashing in the year section of the date field.
UA13 Press the up arrow until the number 04 is showing.
SR13 18 is showing in the day section, 8 is showing in the month section, and 04 is showing in the year section of the date field.
UA14 Press the right arrow once to move the cursor to select the channel to record.
SR14 The cursor is flashing in the channel field.
User actions (UA) System responses (SR)
UA15 Press the up arrow until the number 4 is showing.
SR15 4 is showing in the channel field as the channel to record.
UA16 Press the right arrow once so the system accepts the setting.
SR16 The clock returns to the display. A small 1 is displayed on the left side of the clock, which indicates one timed recording has been set.
UA17 Press the TIMER button to initiate timed recording mode.
SR17 Video switches itself off and into timed recording mode. A small red clock is displayed in the upper right-hand corner of the display to indicate that the video is set for time recording.
UA1 Press the prog button on the handset.
Question 1 Is the correct action sufficiently evident to the user? Neither the handset nor the VCR display gives any indication that the user needs to press the PROG button to do a timed recording.
Question 2 Will the user connect the description of the correct action with what he or she is trying to do? Experienced users might associate timed recording with setting or programming (prog) the VCR. However, this is probably not the case for novice users.
Question 3 Will the user know if he or she has made a right or wrong choice on the basis of the system’s response to the chosen action? Once the PROG button is pressed, the VCR display changes to a form fill-in that guides the user in entering the information (although the display on the handset does not change). Any user who notices the VCR display, or remembers where the form fill-in appears, will know that he or she has made the right choice.
UA2 Press the up arrow until the number 21 is showing in the hour section of the start time.
Question 1 Is the correct action sufficiently evident to the user? No. It is not evident that to set the time one can use only the four unlabeled arrow keys. In fact, the handset is confusing for the user because there is a number pad above the four arrow keys. The user might assume that he/she can use the number pad to enter the time values into the form.
Question 2 Will the user connect the description of the correct action with what he or she is trying to do? No. There are no markings on the arrows themselves, nor anywhere near them, that might indicate that they are to be used for entering information into the programming form fill-in.
Question 3 Will the user know if he or she has made a right or wrong choice on the basis of the system’s response to the chosen action? If the user is lucky enough to discover that the arrow buttons change the times and channel, then there will be feedback on the VCR display as the form gets filled in. However, this could easily be missed if the user stops looking at the VCR display – perhaps because he/she is so engrossed (and irritated) in trying to make the handset work.
Environment and effect
Environmental characteristic How it affects the design
The environment is noisy. The use of sound for alerting users to problems may not be effective
The environment is dusty or dirty.
Equipment might require some type of protective covering (e.g., a keyboard might need a membranous cover).
Users wear protective clothing such as gloves.
Input devices will need to accommodate this.
Environment and effect
Environmental characteristic How it affects the design
The work is highly pressured and subject to frequent interruptions.
The application must allow the user to stop his or her work and restart it later, preferably from the point where the user left off.
There is a need for workers to share information, or the work is designed so that they work in groups rather than in isolation.
The workplace will need to be laid out carefully to take this factor into consideration.
KEYSTROKE LEVEL MODEL
Keystroke Level Model
• Card, Moran, Newell—CACM, July 1980
• Outline
– Problem statement
– Model
– Empirical validation
– Applications
Problem Statement
• Goal – Develop a simple model to describe the time to do a task with a given method on an interactive system
• Ttask = Tacquire + Texecute, where
– Ttask = total time to complete the task
– Tacquire = time to select a method to complete the task
– Texecute = time to perform the method
• Model predicts Texecute
• Assume expert users and no errors
Model
• Texecute = Σ (time to execute each primitive op)
• Primitive operations
– K key press
– B button press
– P point to target with mouse
– H home hands to keyboard or mouse
– D draw line with mouse
– M mental preparation (pause)
– R system response time
Model (cont.)
• Times for primitive operations are predicted from experiments
– Time to press key ranges between
– 0.08 sec/char best typist (135 wpm)
– 0.12 sec/char good typist (90 wpm)
– 0.28 sec/char average typist (40 wpm)
– 0.50 sec/char random letters
– 0.75 sec/char complex codes
– 1.2 sec/char slow typist
Actual Parameters
Parameter   Estimate                           Time (sec)
K           nK * tK                            0.12 (good typist)
                                               0.20 (average typist)
                                               0.28 (average non-typist)
                                               1.20 (non-typist)
B           down/up                            0.10
            click (down + up)                  0.20
P           Fitts' law: 0.1 log2(D/S + 0.5)    1.10 (average)
H                                              0.40
D(nD, lD)   nD = # of segments,                0.9 nD + 0.16 lD
            lD = total length
M                                              0.6 - 1.35 (use 1.2)
R           system response time               (measured per system)
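The Fitts'-law row of the table can be computed directly. A small sketch (D and S must be in the same units; the example distances and target sizes here are made up for illustration):

```python
import math

def pointing_time(distance, size):
    """Keystroke-level-model pointing time (seconds), using the
    Fitts'-law form from the parameter table: t = 0.1 * log2(D/S + 0.5)."""
    return 0.1 * math.log2(distance / size + 0.5)

# A far, small target takes longer than a near, large one.
t_far = pointing_time(distance=800, size=16)   # ~0.57 s
t_near = pointing_time(distance=100, size=64)  # ~0.10 s
```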
Scenario 1 (Assumption)
• One file is to be deleted
• File icon is visible and can be pointed to
• Trash can icon is visible and pointable
• Cursor must end up in the original window that the file icon was in
• Hand starts and ends on mouse
• User is average non-secretary typist (40 wpm)
Scenario 1 (action sequence)
1. point to file icon
2. press and hold mouse button
3. drag file icon to trash can icon
4. release mouse button
5. point to original window
Scenario 1 (Operator sequence)
1. point to file icon P
2. press and hold mouse button B
3. drag file icon to trash can icon P
4. release mouse button B
5. point to original window P
Total time = 3P + 2B = 3*1.1 + 2*0.1 = 3.5 sec
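This arithmetic is easy to mechanize. A minimal sketch of a KLM calculator, using the average operator times from the parameter table (P = 1.1 s, B = 0.1 s, and so on):

```python
# Average operator times (seconds) from the KLM parameter table:
# P = point with mouse, B = button press/release, H = home hands,
# K = keystroke (average non-typist), M = mental preparation.
OPERATOR_TIMES = {"P": 1.1, "B": 0.1, "H": 0.4, "K": 0.28, "M": 1.2}

def klm_time(sequence):
    """Predicted execution time for a string of KLM operators."""
    return sum(OPERATOR_TIMES[op] for op in sequence)

# Scenario 1 (drag to trash): point, press, drag, release, point.
drag_design = klm_time("PBPBP")      # 3P + 2B = 3.5 s
# New design (menu): point, click (BB), point, press, point, release, point.
menu_design = klm_time("PBBPBPBP")   # 4P + 4B = 4.8 s
```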
New design
• In the new design, the procedure we intend people to use is to first select the file to be deleted, and then select DELETE on the FILE menu.
New design (action sequence)
1. point to file icon
2. click mouse button
3. point to file menu
4. press and hold mouse button
5. point to DELETE item
6. release mouse button
7. point to original window
New design (operator sequence)
1. point to file icon P
2. click mouse button BB
3. point to file menu P
4. press and hold mouse button B
5. point to DELETE item P
6. release mouse button B
7. point to original window P
Total time = 4P + 4B = 4*1.1 + 4*0.1 = 4.8 sec
New design
• slower
• requires additional mouse movements
• solution
– power key
New design (power key)
1. point to file icon P
2. click mouse button BB
3. move hand to keyboard H
4. hit command key command-T KK
5. move hand back to mouse H
Total time = P + 2B + 2H + 2K = 1.1 + 0.2 + 0.8 + 0.56 = 2.66 sec
Scenario 2: Hidden trash can
• One file is to be deleted.
• File icon is visible and can be pointed to in window A.
• Trash can icon is covered by one window, B, that must be moved out of the way to see the trash can, but B must remain open on the screen.
• There is room for both window A and window B on the screen at the same time.
• Cursor must end up in window A.
• Hand starts and ends on mouse.
Scenario 2: current design
1. point to the title bar of window B  P
2. hold down the mouse button  B
3. drag the window to another place  P
4. release the mouse button  B
5. point to file icon  P
6. press and hold mouse button  B
7. drag file icon to trash can icon  P
8. release mouse button  B
9. point to original window  P
Total time = 5P + 4B = 5*1.1 + 4*0.1 = 5.9 sec
Scenario 2: new design
• same as scenario 1
• Total time, menu version = 4.8 sec
• Total time, power key version = 2.66 sec
• Current design: Time = 5.9 sec
Including M’s (top-down)
• Initiating a task
• Making a strategy decision
• Retrieving a chunk from memory
• Finding something on the screen
• Thinking of a task parameter
• Verifying that a specification or action is correct
• Some useful conventions
– The values for task parameters have to be explicitly obtained in a step.
– Pointing to an object on the screen should be preceded by a mental operator to locate the object.
– If something on the screen changes in response to user input, there should be a step to verify that the desired result appeared.
Assigning Ms for new vs. expert
• New users verify every step.
• New users have small chunks, expert users have big chunks.
• Experienced users can overlap Ms with physical operators.
Scenario 1a: new users
• New user will stop and check feedback from system at every step.
• The general procedure – find the file icon,
– make sure it is selected,
– and drag it to the trash can,
– making sure the trash can has been hit (reverse videos),
– and then verify the whole process by checking that the trash can is bulging.
Scenario 1a: new users
1. initiate the deletion (decide to do the task)  M
2. find the file icon  M
3. point to file icon  P
4. press and hold mouse button  B
5. verify that the icon is reverse-video  M
6. find the trash can icon  M
7. drag file icon to trash can icon  P
8. verify that the trash can icon is reverse-video  M
9. release mouse button  B
10. verify that the trash can icon is bulging  M
11. find the original window  M
12. point to original window  P
Total time = 3P + 2B + 7M = 11.9 sec
Scenario 1b: expert
• The general procedure: find the file icon to be deleted and drag it to the trash can.
– Experienced user thinks of selecting and dragging an item as a single operation - a chunk.
– The user must find the to-be-deleted icon since it is different every time.
– Moving icons to the trash can is highly practiced:
• The trash can does not have to be located, so finding it is overlapped with pointing to it.
• Verifying that the trash can has been hit is overlapped with pointing to it.
• Final result (bulging can) is not checked since it is redundant with verifying that the can has been hit.
• Pointing to the original window is overlapped with finding it.
Scenario 1b: expert
1. initiate the deletion M
2. find the file icon M
3. point to file icon P
4. press and hold mouse button B
5. drag file icon to trash can icon P
6. release mouse button B
7. point to original window P
Total time = 3P + 2B + 2M = 5.9 sec
Scenario 2a: expert
1. initiate the deletion  M
2. notice that trash is covered and decide to uncover it  M
3. choose a suitable empty place on the screen  M
4. initiate the window move  M
5. point to the title bar of window B  P
6. hold down the mouse button  B
7. drag the window to the other place  P
8. release the mouse button  B
9. find the icon for the to-be-deleted file  M
10. point to file icon  P
11. press and hold mouse button  B
12. drag file icon to trash can icon  P
13. release mouse button  B
14. point to original window  P
Total time = 5P + 4B + 5M = 11.9 sec
Scenario 2b: expert (power key)
1. initiate the deletion M
2. find the icon for the to-be-deleted file M
3. point to file icon P
4. click mouse button BB
5. move hand to keyboard H
6. hit command key command-T KK
7. move hand back to mouse H
Total time = P + 2B + 2H + 2K + 2M = 5.06 sec
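The M-laden totals on the last few slides can all be checked with the same operator times. A quick arithmetic sketch (values as used throughout this section):

```python
# KLM operator times (seconds): point, button, home, keystroke, mental.
P, B, H, K, M = 1.1, 0.1, 0.4, 0.28, 1.2

totals = {
    "1a novice (verifies every step)": 3*P + 2*B + 7*M,            # 11.9 s
    "1b expert (chunked drag)":        3*P + 2*B + 2*M,            #  5.9 s
    "2a expert (uncover trash can)":   5*P + 4*B + 5*M,            # 11.9 s
    "2b expert (power key)":           P + 2*B + 2*H + 2*K + 2*M,  # 5.06 s
}
```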
Why is KLM Useful?
• Provides model to reason about keystrokes required to implement operation
• Reminds us to think about the time required for basic interface actions
• Mental processing (M) is significant
Considerations
• Keystrokes are fast
• Ms, such as memory retrievals and visual searches, are very slow
• The location of M is not as important as # of M’s.
• Mouse moves are very slow
• Switching hand between mouse and keyboard is moderately slow
HEURISTIC EVALUATION
Heuristics (original)
• H1-1: Simple & natural dialog
• H1-2: Speak the users’ language
• H1-3: Minimize users’ memory load
• H1-4: Consistency
• H1-5: Feedback
• H1-6: Clearly marked exits
• H1-7: Shortcuts
• H1-8: Precise & constructive error messages
• H1-9: Prevent errors
• H1-10: Help and documentation
10/6/98 49
Heuristics (revised)
• H2-1. Visibility of System Status
• H2-2. Match Between the System and the Real World
• H2-3. User Control and Freedom
• H2-4. Consistency and Standards
• H2-5. Error Prevention
• H2-6. Recognition Rather than Recall
• H2-7. Flexibility and Efficiency of Use
• H2-8. Aesthetic and Minimalist Design
• H2-9. Help Users Recognize, Diagnose, and Recover from Errors
• H2-10. Help and Documentation
Heuristics (revised set)
• H2-1: Visibility of system status
– keep users informed about what is going on
– example: pay attention to response time
• 0.1 sec: no special indicators needed
• 1.0 sec: user tends to lose track of data
• 10 sec: max. duration for the user to stay focused on the action
• for longer delays, use percent-done progress bars
[Screenshot: percent-done progress bar labeled “searching database for matches”]
Heuristics (cont.)
• Bad example: Mac desktop
– Dragging disk to trash
• should delete it, not eject it
• H2-2: Match between system & real world
– speak the users’ language
– follow real world conventions
Heuristics (cont.)
• Wizards
– must respond to each question before going to the next
– for infrequent tasks (e.g., modem config.)
– not for common tasks
– good for beginners
• have 2 versions (WinZip)
H2-3: User control & freedom
* “exits” for mistaken choices, undo, redo
* don’t force down fixed paths
Heuristics (cont.)
• H2-4: Consistency & standards
Heuristics (cont.)
• Use selection
• breadcrumbs so users can recognize where they are
H2-5: Error prevention
H2-6: Recognition rather than recall
* make objects, actions, options, and directions visible or easily retrievable
Heuristics (cont.)
• H2-7: Flexibility and efficiency of use
– accelerators for experts (e.g., gestures, kb shortcuts)
– allow users to tailor frequent actions (e.g., macros)
Heuristics (cont.)
• H2-8: Aesthetic and minimalist design
– no irrelevant information in dialogues
Heuristics (cont.)
• H2-9: Help users recognize, diagnose, and recover from errors – error messages in plain language
– precisely indicate the problem
– constructively suggest a solution
Heuristics (cont.)
• H2-10: Help and documentation
– easy to search
– focused on the user’s task
– list concrete steps to carry out
– not too large
– support new users and experts
Phases of Heuristic Evaluation
1) Pre-evaluation training – give evaluators needed domain knowledge and information on the scenario
2) Evaluation – individuals evaluate and then aggregate results
3) Severity rating – determine how severe each problem is (priority)
4) Debriefing – discuss the outcome with design team
How to Perform Evaluation
• At least two passes for each evaluator
– first to get feel for flow and scope of system
– second to focus on specific elements
• If system is walk-up-and-use or evaluators are domain experts, then no assistance needed
– otherwise might supply evaluators with scenarios
• Each evaluator produces list of problems
– explain why with reference to heuristic or other info.
– be specific and list each problem separately
Examples
• Can’t copy info from one window to another – violates “Minimize the users’ memory load” (H1-3)
– fix: allow copying
• Typography uses mix of upper/lower case formats and fonts – violates “Consistency and standards” (H2-4)
– slows users down
– probably wouldn’t be found by user testing
– fix: pick a single format for entire interface
How to Perform Evaluation (cont.)
• Why separate listings for each violation? – risk of repeating problematic aspect
– may not be possible to fix all problems
• Where problems may be found – single location in UI
– two or more locations that need to be compared
– problem with overall structure of UI
– something that is missing • hard w/ paper prototypes so work extra hard on those
Severity Rating
• Used to allocate resources to fix problems
• Estimates of need for more usability efforts
• Combination of
– frequency
– impact
– persistence (one time or repeating)
• Should be calculated after all evals. are in
• Should be done independently by all judges
Severity Ratings (cont.)
0 - don’t agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
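Since each judge rates severity independently, the ratings must be combined afterwards; averaging them is one simple way to prioritize the worklist. A minimal sketch (the findings and the judges' ratings below are hypothetical):

```python
from statistics import mean

# Hypothetical ratings from three independent judges for two findings,
# on the 0-4 severity scale above.
ratings = {
    "Save vs. Write file wording": [3, 3, 2],
    "No undo after delete":        [4, 3, 4],
}

# Average across judges, then order the worst problems first.
severity = {problem: mean(r) for problem, r in ratings.items()}
worklist = sorted(severity, key=severity.get, reverse=True)
```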
Debriefing
• Conduct with evaluators, observers, and development team members
• Discuss general characteristics of UI
• Suggest potential improvements to address major usability problems
• Dev. team rates how hard things are to fix
• Make it a brainstorming session
– little criticism until end of session
Severity Ratings Example
1. [H1-4 Consistency] [Severity 3][Fix 0] The interface used the string "Save" on the first screen for saving the user's file, but used the string "Write file" on the second screen. Users may be confused by this different terminology for the same function.
HE vs. User Testing
• HE is much faster – 1-2 hours each evaluator vs. days-weeks
• HE doesn’t require interpreting user’s actions
• User testing is far more accurate (by def.) – takes into account actual users and tasks
– HE may miss problems & find “false positives”
• Good to alternate between HE and user testing – find different problems
– don’t waste participants
Results of Using HE
• Discount: benefit-cost ratio of 48 [Nielsen94]
– cost was $10,500 for benefit of $500,000
– value of each problem ~15K (Nielsen & Landauer)
– how might we calculate this value?
• in-house -> productivity; open market -> sales
• Correlation between severity & finding w/ HE
Results of Using HE (cont.)
• Single evaluator achieves poor results
– only finds 35% of usability problems
– 5 evaluators find ~ 75% of usability problems
– why not more evaluators? 10? 20?
• adding evaluators costs more
• many evaluators won’t find many more problems
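These percentages follow the usual Nielsen–Landauer model, found(n) = 1 − (1 − λ)^n, where λ is the probability that a single evaluator finds a given problem. A sketch, with λ = 0.35 chosen here as an assumption to match the 35% single-evaluator figure above (not a fitted value):

```python
def fraction_found(n_evaluators, detect_rate=0.35):
    """Expected fraction of usability problems found by n independent
    evaluators, each finding any given problem with probability
    detect_rate (the 1 - (1 - lambda)^n model)."""
    return 1.0 - (1.0 - detect_rate) ** n_evaluators

one = fraction_found(1)   # 0.35
five = fraction_found(5)  # ~0.88; each extra evaluator adds less
```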
Decreasing Returns
• Caveat: graphs for a specific example
[Graphs: problems found vs. number of evaluators; benefits/cost ratio vs. number of evaluators]
USABILITY EVALUATION
Why Evaluate the Usability?
• Does the Interface Meet the Usability Requirements?
– Effective
– Efficient
– Engaging
– Error tolerant
– Easy to learn
• Exploring Other Concerns in Evaluations
– e.g.:
• Why users are unable to complete tasks easily.
• Is the UI developed for all levels of users?
• Are all design features acceptable to users?
(Usability attributes: learnability, efficiency, memorability, errors, satisfaction)
The Activities of Usability Evaluations
• The Process of Usability Evaluation Is Iterative
• Techniques for Usability Evaluations
– User Observations
– Inspections of the User Interface (heuristic inspection)
• Conform to usability standards?
– Other Evaluation Techniques
• Variations of user observation or inspection
What Happens in a User Observation Evaluation Session?
• Welcome participant, explain purpose, make participant comfortable
• Ask participant to complete tasks while you observe and record
• Following completion of tasks, ask for the participant’s views, or ask them to complete a posttest questionnaire
• Thank participant.
Strategic choices
• What is the purpose of the evaluation?
• Which user will you choose?
• Where will you do the evaluation?
• What data do you need to collect?
• What product, system, or prototype are you testing?
• What tasks will you ask the participants to try?
• Are there any specific concerns or questions that you want to ask the participant about?
Creating an Evaluation Strategy
• What Is the Purpose of This Evaluation?
– Does system meet usability requirements/concerns
– Qualitative Usability Requirements
• Desired features
– “The users on an e-shopping site should be able to order an item easily and without assistance.”
– “Railway clerks work in extremely noisy environments, so any warning messages to them should be visually distinct and highlighted on the screens.”
Creating an Evaluation Strategy
– Quantitative Usability Requirements/Usability Metrics
• Explicit, numeric measures are used: percentages, timings, or counts.
– “It should be possible for the users to load any page of a web site in 10 seconds using a 56K modem.”
– “It should take no more than two minutes for an experienced user (one who has domain knowledge and has undergone the prescribed level of training when the new system is introduced) to enter a customer’s details in the hotel’s database”
Level of usability metric
• current
• best case
• planned
• worst case
Creating an Evaluation Strategy
• Prioritizing Usability Requirements and Concerns
– The usability requirements most important to the success of the system are given priority.
– Assign values to the five dimensions of usability, the Five Es.
Creating an Evaluation Strategy
– What Type of Data Do I Want to Collect?
• Quantitative data
– Numeric content
• Qualitative data
– Non-numeric content
Evaluation Data
Dimension       Possible quantitative data to collect       Possible qualitative data to collect
Effective       Task completed accurately or not            Task finished correctly or not
Efficient       Keystroke/click elapsed time;               Task is easy or difficult
                navigation paths
Engaging        Numeric measure of satisfaction             User satisfaction surveys
Easy to learn   # of false starts; time spent in            Level of confidence
                incorrect routes; time spent to
                complete a task
Error tolerant  Level of accuracy                           Feeling of confidence
Creating an Evaluation Strategy
• What Am I Evaluating?
• What Constraints Do I Have?
– Money
– Timescales
– Availability of usability equipment
– Availability of participants and the costs of recruiting them
– Availability of evaluators
• Documenting the Evaluation Strategy
Choosing Your Users
• Who Is a Real User?
– Users who reflect the different skills, domain knowledge, system experience
• Users Working Alone or in Pairs
• Number of Participants
• Recruiting Extra Participants
How many participants are needed?
Choosing Your Users
• Ideas for Participants
– colleagues, family members, real users
• Offering Incentives
– Thank you letter, pay for out-of-pocket expenses, samples, gifts
• Recruiting Screeners and Pre-test Questionnaires
Creating a Timetable
• Decide the Duration of the Evaluation Session (30-90 minutes)
• Create an Evaluation Timetable – sessions, evaluation, reporting
• Preparing Task Descriptions – the tasks the participant will perform while interacting with the prototype during the evaluation
– Task Cards
– Task Descriptions for Global Warming
Where Will You Do the Evaluation?
• Field Studies – user’s own environment
• Controlled Studies – other than user’s environment
Deciding how to collect data
• Preparing to Collect Evaluation Data
• Timing and Logging Actions
– Automatic Logging of Keystrokes and Mouse Clicks
– Specialist Logging Software for Usability Evaluations
– Choosing a Logging Product
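Automatic logging can be sketched as little more than a list of timestamped events. A minimal illustration (real evaluations usually rely on the specialist logging software mentioned above; the event names here are hypothetical):

```python
import time

class EventLog:
    """Timestamped log of user-interface events for later analysis."""

    def __init__(self):
        self.events = []

    def record(self, kind, detail=""):
        # monotonic() is unaffected by system clock changes mid-session.
        self.events.append((time.monotonic(), kind, detail))

    def elapsed(self):
        """Seconds between the first and last recorded event."""
        if len(self.events) < 2:
            return 0.0
        return self.events[-1][0] - self.events[0][0]

log = EventLog()
log.record("keystroke", "a")
log.record("click", "Save button")
```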
Think-Aloud and Offering Help
• Using Cognitive Walkthrough Questions
– “Is there anything there that tells you what to do next?”
– “Is there a choice on the screen that lines up with what you want to do? If so, which one?”
– “Now that you’ve tried it, has it done what you wanted it to do?”
Taking Notes When Observing Users
Conducting Post-Session Discussions
• Retrospective protocol
• Post-session interview / debrief
Questionnaires
• Advantages
– Can’t forget to ask a question
– All participants see the same questions
– Ability to collect quantitative data
• Disadvantages
– Difficult to design
– Must predict topics the users will need
– Closed questions don’t give reasons why the users answered the way that they have.
Using Technologies to Help with Recording
• Video and Audio Recording
• Eye-Tracking Equipment
• Practitioner’s Choice of Technology
– What to Do If a Participant Does Not Agree to Be Recorded
Roles for Evaluators
• Facilitator
• Note-Taker
• Equipment Operator
• Observer
• Meeter and Greeter
• Recruiter
• The Lone Evaluator
Analysis and interpretation
• How to Analyze and Interpret Data from Your Evaluation
• Collating the Data
• Summarizing the Data
– Extract key comments from the collated data.
Reviewing the Data to Identify Usability Problems
• Usability Defects (characteristics) – Irritates or confuses the user
– Makes a system hard to install, learn, or use
– Causes mental overload for the user
– Causes poor user performance
– Violates design standards or guidelines
– Reduces trust or credibility of the system
– Tends to cause repeated errors
– Could make the system hard to market
Working with Quantitative Data
• Tabulations, charts, and rankings for visual rep.
• Descriptive statistics: mean (average), median (middle value), mode (most common value)
• Inferential statistics: tests of statistical significance yielding probability.
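The three descriptive statistics named above are all in Python's standard library. A quick sketch with hypothetical task-completion times:

```python
from statistics import mean, median, mode

# Hypothetical task-completion times (seconds) for eight participants.
times = [42, 38, 51, 38, 47, 120, 38, 45]

avg = mean(times)    # average, pulled up by the 120 s outlier -> 52.375
mid = median(times)  # middle value, robust to the outlier     -> 43.5
top = mode(times)    # most common value                       -> 38
```

Note how the outlier inflates the mean but leaves the median almost untouched, which is why both are usually reported.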
Working with Qualitative Data
• Making Decisions with Qualitative Data
– Grouping comments or observed problems
– Establish a coding scheme
Experiment Concerns
• Reliability – whether one would get the same result if the test were repeated
  – individual differences between users:
    • the best user may be 10× as fast as the slowest user
    • the best 25% may be 2× as fast as the slowest 25% of users
  – e.g., user A uses interface X 40% faster than user B using interface Y
    • Is interface X better than interface Y, or is user B simply slower in general than user A? The result with users C and D might be the opposite.
  – use standard statistical tests to estimate the confidence intervals of test results
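The last point can be sketched in pure Python: a 95% confidence interval for a mean task time. The sample times are hypothetical, and the Student's t critical value (2.262 for 9 degrees of freedom, two-tailed 95%) is passed in rather than computed:

```python
from math import sqrt

def mean_ci95(samples, t_crit):
    """Two-sided 95% confidence interval for the mean of `samples`,
    given the Student's t critical value for n-1 degrees of freedom."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    se = sqrt(var / n)                                     # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical task times (seconds); t_crit = 2.262 for 9 d.f.
lo, hi = mean_ci95([10, 12, 11, 13, 9, 11, 12, 10, 14, 8], t_crit=2.262)
```

If the intervals for two interfaces do not overlap, the observed speed difference is unlikely to be due to individual variation alone.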
Experiment Concerns
• Validity – whether the result actually reflects the usability issues being tested
  – requires common sense and an understanding of the test method
  – common problems:
    • using the wrong users
    • giving the users the wrong tasks
    • not including time constraints
  – e.g., testing an MIS with business-school students compared to chemistry students
Some considerations
• The differences between the participants in your evaluation and the real users
• The differences between your test environment and the users’ environment
Hypothesis
• prediction of outcome
  – framed in terms of IV (independent variable) and DV (dependent variable)
  – e.g., "error rate will increase as font size decreases"
• null hypothesis:
  – states no difference between conditions
  – aim is to disprove this
  – e.g., null hypothesis = "no change with font size"
Experimental design
• between-subjects (randomized)
  – each subject performs under only one condition
  – no transfer of learning
  – more users required
  – individual variation can bias results; mitigated by careful selection of participants
• within-subjects (repeated measures)
  – each subject performs the experiment under every condition
  – transfer of learning is possible
    • vary the order of the conditions to counterbalance it
  – less costly and less likely to suffer from user variation
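Varying the order of conditions in a within-subjects design can be done mechanically; a sketch (the function name and condition labels are illustrative, not from the lecture):

```python
from itertools import cycle, permutations

def counterbalanced_orders(participants, conditions):
    """Assign each participant one ordering of the conditions, cycling
    through all orderings so transfer of learning is balanced across
    the group rather than favouring one condition."""
    orders = list(permutations(conditions))
    return {p: order for p, order in zip(participants, cycle(orders))}

# Ten participants, two conditions: five see Natural first, five Abstract first.
plan = counterbalanced_orders(range(1, 11), ["Natural", "Abstract"])
```

With more than two or three conditions the number of orderings grows factorially, which is why Latin-square designs (a balanced subset of the orderings) are often used instead.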
Analysis of data
• Before you start any statistics:
  – look at the data
  – save the original data
• Choice of statistical technique depends on:
  – type of data
  – information required
• Type of data:
  – discrete – finite number of values
  – continuous – any value
Analysis - types of test
• parametric
  – assume a normal distribution
  – give reasonable results even when the data are not precisely normal
• non-parametric
  – do not assume a normal distribution
  – less powerful, but more reliable
• contingency table
  – classify data by discrete attributes
  – count the number of data items in each group
Choosing a technique
Independent variable, dependent variable → suggested test

Parametric:
  – two-valued IV, normal DV: Student's t test on difference of means
  – discrete IV, normal DV: ANOVA (Analysis of Variance)
  – continuous IV, normal DV: linear (or non-linear) regression; factor analysis

Non-parametric:
  – two-valued IV, continuous DV: Wilcoxon (or Mann–Whitney) rank-sum test
  – discrete IV, continuous DV: rank-sum versions of ANOVA
  – continuous IV, continuous DV: Spearman's rank correlation

Contingency tests:
  – two-valued IV, discrete DV: no special test
  – discrete IV, discrete DV: contingency table and chi-squared test
  – continuous IV, discrete DV: (rare) group the independent variable, then as above
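The discrete/discrete case — a contingency table with a chi-squared test — needs no statistics library; a minimal sketch, with hypothetical observed counts:

```python
def chi_squared(table):
    """Pearson chi-squared statistic for an r x c contingency table
    of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: task success/failure under two interface designs.
stat = chi_squared([[30, 10],
                    [20, 20]])  # 5.33 > 3.84 (5% critical value, 1 d.f.)
```

Since 5.33 exceeds the 5% critical value for one degree of freedom (3.84), the association between interface and outcome would be significant in this made-up example.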
Analysis of data
• What information is required?
  – is there a difference?
    • hypothesis testing
  – how big is the difference?
    • point estimation
  – how accurate is the estimate?
    • standard deviation, confidence interval
• Parametric and non-parametric tests mainly address the first of these
Example of non-parametric statistics
• Original data:
– condition A: 33, 42, 25, 79, 52
– condition B: 87, 65, 92, 93, 91, 55
• Sort & rank:
  – 25 → 1, 33 → 2, …, 93 → 11
• Transformed data:
– condition A: 2, 3, 1, 7, 4
– condition B: 8, 6, 10, 11, 9, 5
Example of non-parametric statistics
• Is there any difference between the two conditions? – Wilcoxon test
  – calculate the sum of the ranks for each condition and subtract the least value it could have:
    • 1+2+3+4+5 = 15 for condition A
    • 1+2+3+4+5+6 = 21 for condition B
  – U = rank sum − least possible rank sum:
    • condition A: (2+3+1+7+4) − 15 = 2
    • condition B: (8+6+10+11+9+5) − 21 = 28
    • note: 2 + 28 = 30 = 5 × 6
  – take the smaller U value and compare it with a set of critical values in a book of statistical tables, to see if it is unusually small
Example of non-parametric statistics
• Is there any difference between the two conditions? – Wilcoxon test (continued)
  – take the smaller U value and compare it with a set of critical values in a book of statistical tables, to see if it is unusually small
  – the critical value at the 5% level is 3, and U = 2 < 3
  – so reject the null hypothesis
  – conclusion: there is likely to be a difference between the conditions
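The rank-sum computation above can be reproduced in a few lines of Python (this sketch assumes no tied values, which holds for this data set):

```python
def rank_sum_u(a, b):
    """Wilcoxon/Mann-Whitney U for two independent samples (no ties):
    rank all values together, sum the ranks per condition, then subtract
    the least possible rank sum n*(n+1)/2."""
    rank = {v: i + 1 for i, v in enumerate(sorted(a + b))}
    u_a = sum(rank[v] for v in a) - len(a) * (len(a) + 1) // 2
    u_b = sum(rank[v] for v in b) - len(b) * (len(b) + 1) // 2
    return u_a, u_b

u_a, u_b = rank_sum_u([33, 42, 25, 79, 52],          # condition A
                      [87, 65, 92, 93, 91, 55])      # condition B
# u_a = 2, u_b = 28, and u_a + u_b = 5 * 6 as a sanity check
```

With real data, `scipy.stats.mannwhitneyu` handles ties and also returns the p-value directly instead of requiring a table of critical values.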
Example – icon designs
• Hypothesis: users will remember the natural icons more easily than the abstract ones.
• 10 participants
[Figure: natural vs. abstract icon designs for Copy, Save, and Delete]
Example – icon designs

Completion times (seconds), with counterbalanced presentation order:

Participant  Order  (1) Natural (s)  (2) Abstract (s)  (3) Mean  (4)=(1)−(3)  (5)=(2)−(3)
 1           AN          656              702             679        -23           23
 2           AN          259              339             299        -40           40
 3           AN          612              658             635        -23           23
 4           AN          609              645             627        -18           18
 5           AN         1049             1129            1089        -40           40
 6           NA         1135             1179            1157        -22           22
 7           NA          542              604             573        -31           31
 8           NA          495              551             523        -28           28
 9           NA          905              893             899          6           -6
10           NA          715              803             759        -44           44
mean (µ)                 698              750             724        -26           26
s.d. (σ)                 265              259             262         14           14

Between-subjects, on columns (1) and (2): s.e.d. = 117.15, Student's t = 0.32 (n.s.)
Within-subjects, on columns (4) and (5): s.e. = 4.55, Student's t = 5.78 (p < 1%, two-tailed)
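The significant within-subjects t value in the table can be checked directly from the per-participant differences, without any statistics library:

```python
from math import sqrt

# Completion times (seconds) from the icon-design table above.
natural = [656, 259, 612, 609, 1049, 1135, 542, 495, 905, 715]
abstract = [702, 339, 658, 645, 1129, 1179, 604, 551, 893, 803]

# Paired (within-subjects) t test: working on per-participant differences
# cancels individual speed variation, which is why this test detects the
# effect that the between-subjects comparison of raw means misses.
diffs = [a - n for a, n in zip(abstract, natural)]
n = len(diffs)
mean = sum(diffs) / n
sd = sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
t = mean / (sd / sqrt(n))  # ~5.78, matching the table (p < 1%, two-tailed)
```

The differences here are abstract − natural (twice column (5)), which yields the same t statistic because scaling the differences scales the mean and standard deviation alike.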
Experimental studies on groups
More difficult than single-user experiments
Problems with:
– subject groups
– choice of task
– data gathering
– analysis
Subject groups
larger number of subjects → more expensive
longer time to 'settle down' … even more variation!
difficult to timetable
so … often only three or four groups
The task
must encourage cooperation
perhaps involve multiple channels
options:
– creative task e.g. ‘write a short report on …’
– decision games e.g. desert survival task
– control task e.g. ARKola bottling plant
Data gathering
several video cameras + direct logging of application
problems:
– synchronization
– sheer volume!
one solution:
– record from each perspective
Analysis
• Vast variation between groups
• solutions:
  – within-groups experiments
  – micro-analysis (e.g., gaps in speech)
  – anecdotal and qualitative analysis
    • looking for critical incidents, interesting events, or breakdowns
• look at interactions between group and media
• controlled experiments may 'waste' resources!
Interpretation of User-Observation Data
• Assigning Severities (High / Medium / Low)

Example:
  – Usability observation: the user did not select the right menu item (Options) to initiate the task
  – Evaluator's comments: the user was not sure which menu Options was in
  – Cause: the menu name is inappropriate, as it does not relate to the required action
  – Severity rating: High
Interpretation of User-Observation Data
• Recommending Changes

Example:
  – Usability observation: the user did not select the right menu item (Options) to initiate the task
  – Cause of the usability defect: the menu name is inappropriate, as it does not relate to the required action
  – Severity rating: High
  – Recommended solution: the menu name should be changed to "Group …"
  – Status: make change in next revision
Writing the Evaluation Report
• As a record of what you did.
• To communicate the findings to other stakeholders
• Should You Describe Your Method?
– For academics – yes, for business usually no
• Describing Your Results
– Graphics, screenshots help
Different Purposes of Evaluations
• Exploratory Evaluations
– explore the UI design features
– gather feedback on the preliminary design
– verify assumptions
• Validation Evaluation
• Assessment Evaluation
• Comparison Evaluation
Different Purposes of Evaluations
• Validation Evaluation
– establish a hypothesis
– establish a null hypothesis
– decide the sample size
– ensure randomness
Different Purposes of Evaluations
• Assessment Evaluation
• Comparison Evaluation
– is A better than B?
– between subjects / within subjects