User Interface Design
Lecture 12
Evaluation
C. Patanothai 2110646:10-Evaluation 2
Evaluation Types
• Without user
– cognitive walk-through
– keystroke-level model
– heuristics evaluation
• With user
– usability test
– user test
COGNITIVE WALK-THROUGH
Cognitive Walk-Through
• “Evaluates the steps required to perform a task and attempts to uncover mismatches between how the users think about a task and how the UI designer thinks about the task”
concentrate on LEARNABILITY
Cognitive Walkthrough Method
Step 0:
The user selects a task to be performed and writes down all the steps (actions) in the task.
For each action in the task:
Step 1:
The user explores the artifact, prototype, or task scenario, looking for the action that might enable him or her to perform the selected task.
How does the user know what to do next? Is the correct action sufficiently evident to the user (can he or she recognize it), or does the user have to recall what to do from memory?
Cognitive Walkthrough Method
Step 2:
The user selects the action that appears to match most closely what he or she is trying to do.
Will the user connect the description of the correct action with what he or she is trying to do?
Step 3:
The user interprets the system’s response and assesses whether any progress has been made toward completing the task.
Will the user know if he or she has made a right or wrong choice?
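The per-action questions in Steps 1–3 can be organized as a simple checklist that an analyst fills in for every action in the task. A minimal sketch in Python (the action name and findings below are hypothetical examples, not part of the method itself):

```python
# The three cognitive-walkthrough questions asked for every action.
WALKTHROUGH_QUESTIONS = [
    "Is the correct action sufficiently evident to the user?",
    "Will the user connect the description of the correct action "
    "with what he or she is trying to do?",
    "Will the user know if he or she has made a right or wrong choice?",
]

def walkthrough_report(actions, findings):
    """Pair each action with the analyst's answer to each question.

    `findings` maps an action name to a list of three answer strings,
    one per walkthrough question.
    """
    return [
        (action, question, answer)
        for action in actions
        for question, answer in zip(WALKTHROUGH_QUESTIONS, findings[action])
    ]

# Hypothetical single-action example, loosely based on the VCR task below.
report = walkthrough_report(
    ["UA1: press the PROG button"],
    {"UA1: press the PROG button": [
        "No - nothing indicates PROG starts a timed recording",
        "Only experienced users will make the connection",
        "Yes - the VCR display changes to a form fill-in",
    ]},
)
```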
Remote
Produce a “Cognitive Walkthrough” of a TV or other remote control you may have at home.
•From “Turn ON”
•Actions in between
•To “Turn OFF”
Step 0: Select a Task
• set the VCR for a timed recording of a program starting at 21:00 and finishing at 22:30 on Ch4 on August 18, 2005.
• when switched ON, the VCR displays the number of the channel it last used.
• when switched OFF, it displays a digital clock.
• To set the timed recording, the VCR must be switched ON.
User actions (UA) System responses (SR)
UA1 Press the PROG button on the hand set.
SR1 VCR display shows a form fill-in for setting the start and stop times. These times are divided into separated sections for the hour and minute, separated by a colon. The cursor is flashing on the hour section on the start time.
UA2 Press the up arrow until the number 21 is showing.
SR2 21 is showing in the hour section of the start time.
UA3 Press the right arrow once to move the cursor to the minute section of the start time.
SR3 00 is showing and flashing in the minute section of the start time. This defaulted to 00 on selection of 21 in the hour section.
User actions (UA) System responses (SR)
UA4 00 in the minute section of the start time is what is wanted. Press the right arrow once to move the cursor to the hour section of the finish time.
SR4 The cursor is flashing in the hour section of the finish time.
UA5 Press the up arrow until the number 22 is showing.
SR5 22 is showing in the hour section of the finish time.
UA6 Press the right arrow once to move the cursor to the minute section of the finish time.
SR6 00 is showing and flashing in the minute portion of the finish time. This defaulted to 00 on selection of 22 in the hour section.
User actions (UA) System responses (SR)
UA7 Press the up arrow until the number 30 is shown.
SR7 30 is showing in the minute section of the finish time.
UA8 Press the right arrow once to move the cursor to the day section of the date field.
SR8 On the display, the full date has now defaulted to the current date. The cursor is flashing in the day section of the date.
UA9 Press the up arrow until the number 18 is showing.
SR9 18 is showing in the day section of the date field.
UA10 Press the right arrow once to move the cursor to the month section of the date field
SR10 The cursor is flashing in the month section of the date field.
User actions (UA) System responses (SR)
UA11 Press the up arrow until the number 8 is showing.
SR11 18 is showing in the day section of the date field, and 8 is showing in the month section of the date field.
UA12 Press the right arrow once to move the cursor to the year section.
SR12 The cursor is flashing in the year section of the date field.
UA13 Press the up arrow until the number 04 is showing.
SR13 18 is showing in the day section, 8 is showing in the month section, and 04 is showing in the year section of the date field.
UA14 Press the right arrow once to move the cursor to select the channel to record.
SR14 The cursor is flashing in the channel field.
User actions (UA) System responses (SR)
UA15 Press the up arrow until the number 4 is showing.
SR15 4 is showing in the channel field as the channel to record.
UA16 Press the right arrow once so the system accepts the setting.
SR16 The clock returns to the display. A small 1 is displayed on the left side of the clock, which indicates one timed recording has been set.
UA17 Press the TIMER button to initiate timed recording mode.
SR17 Video switches itself off and into timed recording mode. A small red clock is displayed in the upper right-hand corner of the display to indicate that the video is set for time recording.
UA1 Press the prog button on the handset.
Question 1 Is the correct action sufficiently evident to the user? Neither the handset nor the VCR display gives any indication that the user needs to press the PROG button to do a timed recording.
Question 2 Will the user connect the description of the correct action with what he or she is trying to do? Experienced users might associate timed recording with setting or programming (prog) the VCR. However, this is probably not the case for novice users.
Question 3 Will the user know if he or she has made a right or wrong choice on the basis of the system’s response to the chosen action? Once the PROG button is pressed, the VCR display changes to a form fill-in that guides the user in entering the information (although the display on the handset does not change). Any user who notices the VCR display, or remembers where the form fill-in appears, will know that he or she has made the right choice.
UA2 Press the up arrow until the number 21 is showing in the hour section of the start time.
Question 1 Is the correct action sufficiently evident to the user? No. It is not evident that to set the time one can use only the four unlabeled arrow keys. In fact, the handset is confusing for the user because there is a number pad above the four arrow keys. The user might assume that he/she can use the number pad to enter the time values into the form.
Question 2 Will the user connect the description of the correct action with what he or she is trying to do? No. There are no markings on the arrows themselves, nor anywhere near them, that might indicate that they are to be used for entering information into the programming form fill-in.
Question 3 Will the user know if he or she has made a right or wrong choice on the basis of the system’s response to the chosen action? If the user is lucky enough to discover that the arrow buttons change the times and channel, then there will be feedback on the VCR display as the form gets filled in. However, this could easily be missed if the user stops looking at the VCR display – perhaps because he/she is so engrossed (and irritated) in trying to make the handset work.
Environment and effect
Environmental characteristic How it affects the design
The environment is noisy. The use of sound for alerting users to problems may not be effective
The environment is dusty or dirty.
Equipment might require some type of protective covering (e.g., a keyboard might need a membranous cover).
Users wear protective clothing such as gloves.
Input devices will need to accommodate this.
Environment and effect
Environmental characteristic How it affects the design
The work is highly pressured and subject to frequent interruptions.
The application must allow the user to stop his or her work and restart it later, preferably from the point where the user left off.
There is a need for workers to share information, or the work is designed so that they work in groups rather than in isolation.
The workplace will need to be laid out carefully to take this factor into consideration.
KEYSTROKE LEVEL MODEL
Keystroke Level Model
• Card, Moran, Newell—CACM, July 1980
• Outline
– Problem statement
– Model
– Empirical validation
– Applications
Problem Statement
• Goal – Develop a simple model to describe the time to do a task with a given method on an interactive system
• Ttask = Tacquire + Texecute, where
– Ttask = total time to complete the task
– Tacquire = time to select a method to complete the task
– Texecute = time to perform the method
• Model predicts Texecute
• Assume expert users and no errors
Model
• Texecute = Σ (time to execute each primitive op)
• Primitive operations
– K key press
– B button press
– P point to target with mouse
– H home hands to keyboard or mouse
– D draw line with mouse
– M mental preparation (pause)
– R system response time
Model (cont.)
• Times for primitive operations are predicted from experiments
– Time to press key ranges between
– 0.08 sec/char best typist (135 wpm)
– 0.12 sec/char good typist (90 wpm)
– 0.28 sec/char average typist (40 wpm)
– 0.50 sec/char random letters
– 0.75 sec/char complex codes
– 1.2 sec/char slow typist
Actual Parameters
Parameter   Estimate                           Time (sec)
K           nK * tK                            0.12 (good typist)
                                               0.20 (average typist)
                                               0.28 (average non-typist)
                                               1.20 (non-typist)
B           down/up                            0.10
            click (down + up)                  0.20
P           Fitts' law: 0.1 log2(D/S + 0.5)    1.10 (average)
H                                              0.40
D(nD, lD)   nD = # of segments,                0.9 nD + 0.16 lD
            lD = total length
M                                              0.6 - 1.35 (use 1.2)
R           system response time               (measured per system)
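The Fitts'-law row of the table can be computed directly. A small sketch (D and S must be in the same units; the example distances and target sizes here are made up for illustration):

```python
import math

def pointing_time(distance, size):
    """Keystroke-level-model pointing time (seconds), using the
    Fitts'-law form from the parameter table: t = 0.1 * log2(D/S + 0.5)."""
    return 0.1 * math.log2(distance / size + 0.5)

# A far, small target takes longer than a near, large one.
t_far = pointing_time(distance=800, size=16)   # ~0.57 s
t_near = pointing_time(distance=100, size=64)  # ~0.10 s
```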
Scenario 1 (Assumption)
• One file is to be deleted
• File icon is visible and can be pointed to
• Trash can icon is visible and pointable
• Cursor must end up in the original window that the file icon was in
• Hand starts and ends on mouse
• User is average non-secretary typist (40 wpm)
Scenario 1 (action sequence)
1. point to file icon
2. press and hold mouse button
3. drag file icon to trash can icon
4. release mouse button
5. point to original window
Scenario 1 (Operator sequence)
1. point to file icon P
2. press and hold mouse button B
3. drag file icon to trash can icon P
4. release mouse button B
5. point to original window P
Total time = 3P + 2B = 3*1.1 + 2*0.1 = 3.5 sec
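This arithmetic is easy to mechanize. A minimal sketch of a KLM calculator, using the average operator times from the parameter table (P = 1.1 s, B = 0.1 s, and so on):

```python
# Average operator times (seconds) from the KLM parameter table:
# P = point with mouse, B = button press/release, H = home hands,
# K = keystroke (average non-typist), M = mental preparation.
OPERATOR_TIMES = {"P": 1.1, "B": 0.1, "H": 0.4, "K": 0.28, "M": 1.2}

def klm_time(sequence):
    """Predicted execution time for a string of KLM operators."""
    return sum(OPERATOR_TIMES[op] for op in sequence)

# Scenario 1 (drag to trash): point, press, drag, release, point.
drag_design = klm_time("PBPBP")      # 3P + 2B = 3.5 s
# New design (menu): point, click (BB), point, press, point, release, point.
menu_design = klm_time("PBBPBPBP")   # 4P + 4B = 4.8 s
```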
New design
• In the new design, the procedure we intend people to use is to first select the file to be deleted, and then select DELETE on the FILE menu.
New design (action sequence)
1. point to file icon
2. click mouse button
3. point to file menu
4. press and hold mouse button
5. point to DELETE item
6. release mouse button
7. point to original window
New design (operator sequence)
1. point to file icon P
2. click mouse button BB
3. point to file menu P
4. press and hold mouse button B
5. point to DELETE item P
6. release mouse button B
7. point to original window P
Total time = 4P + 4B = 4*1.1 + 4*0.1 = 4.8 sec
New design
• slower
• requires additional mouse movements
• solution
– power key
New design (power key)
1. point to file icon P
2. click mouse button BB
3. move hand to keyboard H
4. hit command key command-T KK
5. move hand back to mouse H
Total time = P + 2B + 2H + 2K = 1.1 + 0.2 + 0.8 + 0.56 = 2.66 sec
Scenario 2: Hidden trash can
• One file is to be deleted.
• File icon is visible and can be pointed to in window A.
• Trash can icon is covered by one window, B, that must be moved out of the way to see the trash can, but B must remain open on the screen.
• There is room for both window A and window B on the screen at the same time.
• Cursor must end up in window A.
• Hand starts and ends on mouse.
Scenario 2: current design
1. point to the title bar of window B  P
2. hold down the mouse button  B
3. drag the window to another place  P
4. release the mouse button  B
5. point to file icon  P
6. press and hold mouse button  B
7. drag file icon to trash can icon  P
8. release mouse button  B
9. point to original window  P
Total time = 5P + 4B = 5*1.1 + 4*0.1 = 5.9 sec
Scenario 2: new design
• same as scenario 1
• Total time, menu version = 4.8 sec
• Total time, power key version = 2.66 sec
• Current design: Time = 5.9 sec
Including M’s (top-down)
• Initiating a task
• Making a strategy decision
• Retrieving a chunk from memory
• Finding something on the screen
• Thinking of a task parameter
• Verifying that a specification or action is correct
• Some useful conventions
– The values for task parameters have to be explicitly obtained in a step.
– Pointing to an object on the screen should be preceded by a mental operator to locate the object.
– If something on the screen changes in response to user input, there should be a step to verify that the desired result appeared.
Assigning Ms for new vs. expert
• New users verify every step.
• New users have small chunks, expert users have big chunks.
• Experienced users can overlap Ms with physical operators.
Scenario 1a: new users
• New user will stop and check feedback from system at every step.
• The general procedure – find the file icon,
– make sure it is selected,
– and drag it to the trash can,
– making sure the trash can has been hit (reverse videos),
– and then verify the whole process by checking that the trash can is bulging.
Scenario 1a: new users
1. initiate the deletion (decide to do the task)  M
2. find the file icon  M
3. point to file icon  P
4. press and hold mouse button  B
5. verify that the icon is reverse-video  M
6. find the trash can icon  M
7. drag file icon to trash can icon  P
8. verify that the trash can icon is reverse-video  M
9. release mouse button  B
10. verify that the trash can icon is bulging  M
11. find the original window  M
12. point to original window  P
Total time = 3P + 2B + 7M = 11.9 sec
Scenario 1b: expert
• The general procedure: find the file icon to be deleted and drag it to the trash can.
– Experienced user thinks of selecting and dragging an item as a single operation - a chunk.
– The user must find the to-be-deleted icon since it is different every time.
– Moving icons to the trash can is highly practiced:
• The trash can does not have to be located, so finding it is overlapped with pointing to it.
• Verifying that the trash can has been hit is overlapped with pointing to it.
• Final result (bulging can) is not checked since it is redundant with verifying that the can has been hit.
• Pointing to the original window is overlapped with finding it.
Scenario 1b: expert
1. initiate the deletion M
2. find the file icon M
3. point to file icon P
4. press and hold mouse button B
5. drag file icon to trash can icon P
6. release mouse button B
7. point to original window P
Total time = 3P + 2B + 2M = 5.9 sec
Scenario 2a: expert
1. initiate the deletion  M
2. notice that trash is covered and decide to uncover it  M
3. choose a suitable empty place on the screen  M
4. initiate the window move  M
5. point to the title bar of window B  P
6. hold down the mouse button  B
7. drag the window to the other place  P
8. release the mouse button  B
9. find the icon for the to-be-deleted file  M
10. point to file icon  P
11. press and hold mouse button  B
12. drag file icon to trash can icon  P
13. release mouse button  B
14. point to original window  P
Total time = 5P + 4B + 5M = 11.9 sec
Scenario 2b: expert (power key)
1. initiate the deletion M
2. find the icon for the to-be-deleted file M
3. point to file icon P
4. click mouse button BB
5. move hand to keyboard H
6. hit command key command-T KK
7. move hand back to mouse H
Total time = P + 2B + 2H + 2K + 2M = 5.06 sec
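The M-laden totals on the last few slides can all be checked with the same operator times. A quick arithmetic sketch (values as used throughout this section):

```python
# KLM operator times (seconds): point, button, home, keystroke, mental.
P, B, H, K, M = 1.1, 0.1, 0.4, 0.28, 1.2

totals = {
    "1a novice (verifies every step)": 3*P + 2*B + 7*M,            # 11.9 s
    "1b expert (chunked drag)":        3*P + 2*B + 2*M,            #  5.9 s
    "2a expert (uncover trash can)":   5*P + 4*B + 5*M,            # 11.9 s
    "2b expert (power key)":           P + 2*B + 2*H + 2*K + 2*M,  # 5.06 s
}
```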
Why is KLM Useful?
• Provides model to reason about keystrokes required to implement operation
• Reminds us to think about the time required for basic interface actions
• Mental processing (M) is significant
Considerations
• Keystrokes are fast
• Ms, such as memory retrievals and visual searches, are very slow
• The location of M is not as important as # of M’s.
• Mouse moves are very slow
• Switching hand between mouse and keyboard is moderately slow
HEURISTIC EVALUATION
Heuristics (original)
• H1-1: Simple & natural dialog
• H1-2: Speak the users’ language
• H1-3: Minimize users’ memory load
• H1-4: Consistency
• H1-5: Feedback
• H1-6: Clearly marked exits
• H1-7: Shortcuts
• H1-8: Precise & constructive error messages
• H1-9: Prevent errors
• H1-10: Help and documentation
10/6/98 49
Heuristics (revised)
• H2-1. Visibility of System Status
• H2-2. Match Between the System and the Real World
• H2-3. User Control and Freedom
• H2-4. Consistency and Standards
• H2-5. Error Prevention
• H2-6. Recognition Rather than Recall
• H2-7. Flexibility and Efficiency of Use
• H2-8. Aesthetic and Minimalist Design
• H2-9. Help Users Recognize, Diagnose, and Recover from Errors
• H2-10. Help and Documentation
Heuristics (revised set)
• H2-1: Visibility of system status
– keep users informed about what is going on
– example: pay attention to response time
• 0.1 sec: no special indicators needed
• 1.0 sec: user tends to lose track of data
• 10 sec: max. duration for the user to stay focused on the action
• for longer delays, use percent-done progress bars
[Screenshot: percent-done progress bar labeled “searching database for matches”]
Heuristics (cont.)
• Bad example: Mac desktop
– Dragging disk to trash
• should delete it, not eject it
• H2-2: Match between system & real world
– speak the users’ language
– follow real world conventions
Heuristics (cont.)
• Wizards
– must respond to each question before going to the next
– for infrequent tasks (e.g., modem config.)
– not for common tasks
– good for beginners
• have 2 versions (WinZip)
H2-3: User control & freedom
* “exits” for mistaken choices, undo, redo
* don’t force down fixed paths
Heuristics (cont.)
• H2-4: Consistency & standards
Heuristics (cont.)
• Use selection
• breadcrumbs so users can recognize where they are
H2-5: Error prevention
H2-6: Recognition rather than recall
* make objects, actions, options, and directions visible or easily retrievable
Heuristics (cont.)
• H2-7: Flexibility and efficiency of use
– accelerators for experts (e.g., gestures, kb shortcuts)
– allow users to tailor frequent actions (e.g., macros)
Heuristics (cont.)
• H2-8: Aesthetic and minimalist design
– no irrelevant information in dialogues
Heuristics (cont.)
• H2-9: Help users recognize, diagnose, and recover from errors – error messages in plain language
– precisely indicate the problem
– constructively suggest a solution
Heuristics (cont.)
• H2-10: Help and documentation
– easy to search
– focused on the user’s task
– list concrete steps to carry out
– not too large
– support new users and experts
Phases of Heuristic Evaluation
1) Pre-evaluation training – give evaluators needed domain knowledge and information on the scenario
2) Evaluation – individuals evaluate and then aggregate results
3) Severity rating – determine how severe each problem is (priority)
4) Debriefing – discuss the outcome with design team
How to Perform Evaluation
• At least two passes for each evaluator
– first to get feel for flow and scope of system
– second to focus on specific elements
• If system is walk-up-and-use or evaluators are domain experts, then no assistance needed
– otherwise might supply evaluators with scenarios
• Each evaluator produces list of problems
– explain why with reference to heuristic or other info.
– be specific and list each problem separately
Examples
• Can’t copy info from one window to another – violates “Minimize the users’ memory load” (H1-3)
– fix: allow copying
• Typography uses mix of upper/lower case formats and fonts – violates “Consistency and standards” (H2-4)
– slows users down
– probably wouldn’t be found by user testing
– fix: pick a single format for entire interface
How to Perform Evaluation (cont.)
• Why separate listings for each violation? – risk of repeating problematic aspect
– may not be possible to fix all problems
• Where problems may be found – single location in UI
– two or more locations that need to be compared
– problem with overall structure of UI
– something that is missing • hard w/ paper prototypes so work extra hard on those
Severity Rating
• Used to allocate resources to fix problems
• Estimates of need for more usability efforts
• Combination of
– frequency
– impact
– persistence (one time or repeating)
• Should be calculated after all evals. are in
• Should be done independently by all judges
Severity Ratings (cont.)
0 - don’t agree that this is a usability problem
1 - cosmetic problem
2 - minor usability problem
3 - major usability problem; important to fix
4 - usability catastrophe; imperative to fix
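Since each judge rates severity independently, the ratings must be combined afterwards; averaging them is one simple way to prioritize the worklist. A minimal sketch (the findings and the judges' ratings below are hypothetical):

```python
from statistics import mean

# Hypothetical ratings from three independent judges for two findings,
# on the 0-4 severity scale above.
ratings = {
    "Save vs. Write file wording": [3, 3, 2],
    "No undo after delete":        [4, 3, 4],
}

# Average across judges, then order the worst problems first.
severity = {problem: mean(r) for problem, r in ratings.items()}
worklist = sorted(severity, key=severity.get, reverse=True)
```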
Debriefing
• Conduct with evaluators, observers, and development team members
• Discuss general characteristics of UI
• Suggest potential improvements to address major usability problems
• Dev. team rates how hard things are to fix
• Make it a brainstorming session
– little criticism until end of session
Severity Ratings Example
1. [H1-4 Consistency] [Severity 3][Fix 0] The interface used the string "Save" on the first screen for saving the user's file, but used the string "Write file" on the second screen. Users may be confused by this different terminology for the same function.
HE vs. User Testing
• HE is much faster – 1-2 hours each evaluator vs. days-weeks
• HE doesn’t require interpreting user’s actions
• User testing is far more accurate (by def.) – takes into account actual users and tasks
– HE may miss problems & find “false positives”
• Good to alternate between HE and user testing – find different problems
– don’t waste participants
Results of Using HE
• Discount: benefit-cost ratio of 48 [Nielsen94]
– cost was $10,500 for benefit of $500,000
– value of each problem ~15K (Nielsen & Landauer)
– how might we calculate this value?
• in-house -> productivity; open market -> sales
• Correlation between severity & finding w/ HE
Results of Using HE (cont.)
• Single evaluator achieves poor results
– only finds 35% of usability problems
– 5 evaluators find ~ 75% of usability problems
– why not more evaluators? 10? 20?
• adding evaluators costs more
• many evaluators won’t find many more problems
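These percentages follow the usual Nielsen–Landauer model, found(n) = 1 − (1 − λ)^n, where λ is the probability that a single evaluator finds a given problem. A sketch, with λ = 0.35 chosen here as an assumption to match the 35% single-evaluator figure above (not a fitted value):

```python
def fraction_found(n_evaluators, detect_rate=0.35):
    """Expected fraction of usability problems found by n independent
    evaluators, each finding any given problem with probability
    detect_rate (the 1 - (1 - lambda)^n model)."""
    return 1.0 - (1.0 - detect_rate) ** n_evaluators

one = fraction_found(1)   # 0.35
five = fraction_found(5)  # ~0.88; each extra evaluator adds less
```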
Decreasing Returns
• Caveat: graphs for a specific example
[Graphs: problems found vs. number of evaluators; benefits/cost ratio vs. number of evaluators]
USABILITY EVALUATION
Why Evaluate the Usability?
• Does the Interface Meet the Usability Requirements?
– Effective
– Efficient
– Engaging
– Error tolerant
– Easy to learn
• Exploring Other Concerns in Evaluations
– e.g.:
• Why users are unable to complete tasks easily.
• Is the UI developed for all levels of users?
• Are all design features acceptable to users?
(Usability attributes: learnability, efficiency, memorability, errors, satisfaction)
The Activities of Usability Evaluations
• The Process of Usability Evaluation Is Iterative
• Techniques for Usability Evaluations
– User Observations
– Inspections of the User Interface (heuristic inspection)
• Conform to usability standards?
– Other Evaluation Techniques
• Variations of user observation or inspection
What Happens in a User Observation Evaluation Session?
• Welcome participant, explain purpose, make participant comfortable
• Ask participant to complete tasks while you observe and record
• Following completion of tasks, ask for the participant’s views, or ask them to complete a posttest questionnaire
• Thank participant.
Strategic choices
• What is the purpose of the evaluation?
• Which user will you choose?
• Where will you do the evaluation?
• What data do you need to collect?
• What product, system, or prototype are you testing?
• What tasks will you ask the participants to try?
• Are there any specific concerns or questions that you want to ask the participant about?
Creating an Evaluation Strategy
• What Is the Purpose of This Evaluation?
– Does system meet usability requirements/concerns
– Qualitative Usability Requirements
• Desired features
– “The users on an e-shopping site should be able to order an item easily and without assistance.”
– “Railway clerks work in extremely noisy environments, so any warning messages to them should be visually distinct and highlighted on the screens.”
Creating an Evaluation Strategy
– Quantitative Usability Requirements/Usability Metrics
• Explicit, numeric measures are used: percentages, timings, or counts.
– “It should be possible for the users to load any page of a web site in 10 seconds using a 56K modem.”
– “It should take no more than two minutes for an experienced user (one who has domain knowledge and has undergone the prescribed level of training when the new system is introduced) to enter a customer’s details in the hotel’s database”
Level of usability metric
• current
• best case
• planned
• worst case
Creating an Evaluation Strategy
• Prioritizing Usability Requirements and Concerns
– The usability requirements most important to the success of the system are given priority.
– Assign values to the five dimensions of usability, the Five Es.
Creating an Evaluation Strategy
– What Type of Data Do I Want to Collect?
• Quantitative data
– Numeric content
• Qualitative data
– Non-numeric content
Evaluation Data
Dimension       Possible quantitative data to collect       Possible qualitative data to collect
Effective       Task completed accurately or not            Task finished correctly or not
Efficient       Keystroke/click elapsed time;               Task is easy or difficult
                navigation paths
Engaging        Numeric measure of satisfaction             User satisfaction surveys
Easy to learn   # of false starts; time spent in            Level of confidence
                incorrect routes; time spent to
                complete a task
Error tolerant  Level of accuracy                           Feeling of confidence
Creating an Evaluation Strategy
• What Am I Evaluating?
• What Constraints Do I Have?
– Money
– Timescales
– Availability of usability equipment
– Availability of participants and the costs of recruiting them
– Availability of evaluators
• Documenting the Evaluation Strategy
Choosing Your Users
• Who Is a Real User?
– Users who reflect the different skills, domain knowledge, system experience
• Users Working Alone or in Pairs
• Number of Participants
• Recruiting Extra Participants
How many participants are needed?
Choosing Your Users
• Ideas for Participants
– colleagues, family members, real users
• Offering Incentives
– Thank you letter, pay for out-of-pocket expenses, samples, gifts
• Recruiting Screeners and Pre-test Questionnaires
Creating a Timetable
• Decide the Duration of the Evaluation Session (30-90 minutes)
• Create an Evaluation Timetable – sessions, evaluation, reporting
• Preparing Task Descriptions – the tasks the participant will perform while interacting with the prototype during the evaluation
– Task Cards
– Task Descriptions for Global Warming
Where Will You Do the Evaluation?
• Field Studies – user’s own environment
• Controlled Studies – other than user’s environment
Deciding how to collect data
• Preparing to Collect Evaluation Data
• Timing and Logging Actions
– Automatic Logging of Keystrokes and Mouse Clicks
– Specialist Logging Software for Usability Evaluations
– Choosing a Logging Product
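Automatic logging can be sketched as little more than a list of timestamped events. A minimal illustration (real evaluations usually rely on the specialist logging software mentioned above; the event names here are hypothetical):

```python
import time

class EventLog:
    """Timestamped log of user-interface events for later analysis."""

    def __init__(self):
        self.events = []

    def record(self, kind, detail=""):
        # monotonic() is unaffected by system clock changes mid-session.
        self.events.append((time.monotonic(), kind, detail))

    def elapsed(self):
        """Seconds between the first and last recorded event."""
        if len(self.events) < 2:
            return 0.0
        return self.events[-1][0] - self.events[0][0]

log = EventLog()
log.record("keystroke", "a")
log.record("click", "Save button")
```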
Think-Aloud and Offering Help
• Using Cognitive Walkthrough Questions
– “Is there anything there that tells you what to do next?”
– “Is there a choice on the screen that lines up with what you want to do? If so, which one?”
– “Now that you’ve tried it, has it done what you wanted it to do?”
Taking Notes When Observing Users
Conducting Post-Session Discussions
• Retrospective protocol
• Post-session interview / debrief
Questionnaires
• Advantages
– Can’t forget to ask a question
– All participants see the same questions
– Ability to collect quantitative data
• Disadvantages
– Difficult to design
– Must predict topics the users will need
– Closed questions don’t give reasons why the users answered the way that they have.
Using Technologies to Help with Recording
• Video and Audio Recording
• Eye-Tracking Equipment
• Practitioner’s Choice of Technology
– What to Do If a Participant Does Not Agree to Be Recorded
Roles for Evaluators
• Facilitator
• Note-Taker
• Equipment Operator
• Observer
• Meeter and Greeter
• Recruiter
• The Lone Evaluator
Analysis and interpretation
• How to Analyze and Interpret Data from Your Evaluation
• Collating the Data
• Summarizing the Data
– Extract key comments from the collated data.
Reviewing the Data to Identify Usability Problems
• Usability Defects (characteristics) – Irritates or confuses the user
– Makes a system hard to install, learn, or use
– Causes mental overload for the user
– Causes poor user performance
– Violates design standards or guidelines
– Reduces trust or credibility of the system
– Tends to cause repeated errors
– Could make the system hard to market
Working with Quantitative Data
• Tabulations, charts, and rankings for visual rep.
• Descriptive statistics: mean (average), median (middle value), mode (most common value)
• Inferential statistics: tests of statistical significance yielding probability.
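The three descriptive statistics named above are all in Python's standard library. A quick sketch with hypothetical task-completion times:

```python
from statistics import mean, median, mode

# Hypothetical task-completion times (seconds) for eight participants.
times = [42, 38, 51, 38, 47, 120, 38, 45]

avg = mean(times)    # average, pulled up by the 120 s outlier -> 52.375
mid = median(times)  # middle value, robust to the outlier     -> 43.5
top = mode(times)    # most common value                       -> 38
```

Note how the outlier inflates the mean but leaves the median almost untouched, which is why both are usually reported.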
Working with Qualitative Data
• Making Decisions with Qualitative Data
– Grouping comments or observed problems
– Establish a coding scheme
Experiment Concerns
• Reliability – whether one would get the same result if the test were repeated
  – individual differences between users:
    • the best user may be 10× as fast as the slowest user
    • the best 25% may be 2× as fast as the slowest 25% of users
  – e.g., user A uses interface X 40% faster than user B using interface Y
    • Is interface X better than interface Y, or is user B simply slower in general than user A? The result with users C and D might be the opposite.
  – use standard statistical tests to estimate the confidence intervals of test results
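The last point can be sketched in pure Python: a 95% confidence interval for a mean task time. The sample times are hypothetical, and the Student's t critical value (2.262 for 9 degrees of freedom, two-tailed 95%) is passed in rather than computed:

```python
from math import sqrt

def mean_ci95(samples, t_crit):
    """Two-sided 95% confidence interval for the mean of `samples`,
    given the Student's t critical value for n-1 degrees of freedom."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)  # sample variance
    se = sqrt(var / n)                                     # standard error of the mean
    return mean - t_crit * se, mean + t_crit * se

# Hypothetical task times (seconds); t_crit = 2.262 for 9 d.f.
lo, hi = mean_ci95([10, 12, 11, 13, 9, 11, 12, 10, 14, 8], t_crit=2.262)
```

If the intervals for two interfaces do not overlap, the observed speed difference is unlikely to be due to individual variation alone.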
Experiment Concerns
• Validity – whether the result actually reflects the usability issues being tested
  – requires common sense and an understanding of the test method
  – common problems:
    • using the wrong users
    • giving the users the wrong tasks
    • not including time constraints
  – e.g., testing an MIS with business-school students compared to chemistry students
Some considerations
• The differences between the participants in your evaluation and the real users
• The differences between your test environment and the users’ environment
Hypothesis
• prediction of outcome
  – framed in terms of IV (independent variable) and DV (dependent variable)
  – e.g., "error rate will increase as font size decreases"
• null hypothesis:
  – states no difference between conditions
  – aim is to disprove this
  – e.g., null hypothesis = "no change with font size"
Experimental design
• between-subjects (randomized)
  – each subject performs under only one condition
  – no transfer of learning
  – more users required
  – individual variation can bias results; mitigated by careful selection of participants
• within-subjects (repeated measures)
  – each subject performs the experiment under every condition
  – transfer of learning is possible
    • vary the order of the conditions to counterbalance it
  – less costly and less likely to suffer from user variation
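Varying the order of conditions in a within-subjects design can be done mechanically; a sketch (the function name and condition labels are illustrative, not from the lecture):

```python
from itertools import cycle, permutations

def counterbalanced_orders(participants, conditions):
    """Assign each participant one ordering of the conditions, cycling
    through all orderings so transfer of learning is balanced across
    the group rather than favouring one condition."""
    orders = list(permutations(conditions))
    return {p: order for p, order in zip(participants, cycle(orders))}

# Ten participants, two conditions: five see Natural first, five Abstract first.
plan = counterbalanced_orders(range(1, 11), ["Natural", "Abstract"])
```

With more than two or three conditions the number of orderings grows factorially, which is why Latin-square designs (a balanced subset of the orderings) are often used instead.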
Analysis of data
• Before you start any statistics:
  – look at the data
  – save the original data
• Choice of statistical technique depends on:
  – type of data
  – information required
• Type of data:
  – discrete – finite number of values
  – continuous – any value
Analysis - types of test
• parametric
  – assume a normal distribution
  – give reasonable results even when the data are not precisely normal
• non-parametric
  – do not assume a normal distribution
  – less powerful, but more reliable
• contingency table
  – classify data by discrete attributes
  – count the number of data items in each group
Choosing a technique
Independent variable, dependent variable → suggested test

Parametric:
  – two-valued IV, normal DV: Student's t test on difference of means
  – discrete IV, normal DV: ANOVA (Analysis of Variance)
  – continuous IV, normal DV: linear (or non-linear) regression; factor analysis

Non-parametric:
  – two-valued IV, continuous DV: Wilcoxon (or Mann–Whitney) rank-sum test
  – discrete IV, continuous DV: rank-sum versions of ANOVA
  – continuous IV, continuous DV: Spearman's rank correlation

Contingency tests:
  – two-valued IV, discrete DV: no special test
  – discrete IV, discrete DV: contingency table and chi-squared test
  – continuous IV, discrete DV: (rare) group the independent variable, then as above
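The discrete/discrete case — a contingency table with a chi-squared test — needs no statistics library; a minimal sketch, with hypothetical observed counts:

```python
def chi_squared(table):
    """Pearson chi-squared statistic for an r x c contingency table
    of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: task success/failure under two interface designs.
stat = chi_squared([[30, 10],
                    [20, 20]])  # 5.33 > 3.84 (5% critical value, 1 d.f.)
```

Since 5.33 exceeds the 5% critical value for one degree of freedom (3.84), the association between interface and outcome would be significant in this made-up example.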
Analysis of data
• What information is required?
  – is there a difference?
    • hypothesis testing
  – how big is the difference?
    • point estimation
  – how accurate is the estimate?
    • standard deviation, confidence interval
• Parametric and non-parametric tests mainly address the first of these
Example of non-parametric statistics
• Original data:
– condition A: 33, 42, 25, 79, 52
– condition B: 87, 65, 92, 93, 91, 55
• Sort & rank:
  – 25 → 1, 33 → 2, …, 93 → 11
• Transformed data:
– condition A: 2, 3, 1, 7, 4
– condition B: 8, 6, 10, 11, 9, 5
Example of non-parametric statistics
• Is there any difference between the two conditions? – Wilcoxon test
  – calculate the sum of the ranks for each condition and subtract the least value it could have:
    • 1+2+3+4+5 = 15 for condition A
    • 1+2+3+4+5+6 = 21 for condition B
  – U = rank sum − least possible rank sum:
    • condition A: (2+3+1+7+4) − 15 = 2
    • condition B: (8+6+10+11+9+5) − 21 = 28
    • note: 2 + 28 = 30 = 5 × 6
  – take the smaller U value and compare it with a set of critical values in a book of statistical tables, to see if it is unusually small
Example of non-parametric statistics
• Is there any difference between the two conditions? – Wilcoxon test (continued)
  – take the smaller U value and compare it with a set of critical values in a book of statistical tables, to see if it is unusually small
  – the critical value at the 5% level is 3, and U = 2 < 3
  – so reject the null hypothesis
  – conclusion: there is likely to be a difference between the conditions
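The rank-sum computation above can be reproduced in a few lines of Python (this sketch assumes no tied values, which holds for this data set):

```python
def rank_sum_u(a, b):
    """Wilcoxon/Mann-Whitney U for two independent samples (no ties):
    rank all values together, sum the ranks per condition, then subtract
    the least possible rank sum n*(n+1)/2."""
    rank = {v: i + 1 for i, v in enumerate(sorted(a + b))}
    u_a = sum(rank[v] for v in a) - len(a) * (len(a) + 1) // 2
    u_b = sum(rank[v] for v in b) - len(b) * (len(b) + 1) // 2
    return u_a, u_b

u_a, u_b = rank_sum_u([33, 42, 25, 79, 52],          # condition A
                      [87, 65, 92, 93, 91, 55])      # condition B
# u_a = 2, u_b = 28, and u_a + u_b = 5 * 6 as a sanity check
```

With real data, `scipy.stats.mannwhitneyu` handles ties and also returns the p-value directly instead of requiring a table of critical values.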
Example – icon designs
• Hypothesis: users will remember the natural icons more easily than the abstract ones.
• 10 participants
[Figure: natural vs. abstract icon designs for Copy, Save, and Delete]
Example – icon designs

Completion times (seconds), with counterbalanced presentation order:

Participant  Order  (1) Natural (s)  (2) Abstract (s)  (3) Mean  (4)=(1)−(3)  (5)=(2)−(3)
 1           AN          656              702             679        -23           23
 2           AN          259              339             299        -40           40
 3           AN          612              658             635        -23           23
 4           AN          609              645             627        -18           18
 5           AN         1049             1129            1089        -40           40
 6           NA         1135             1179            1157        -22           22
 7           NA          542              604             573        -31           31
 8           NA          495              551             523        -28           28
 9           NA          905              893             899          6           -6
10           NA          715              803             759        -44           44
mean (µ)                 698              750             724        -26           26
s.d. (σ)                 265              259             262         14           14

Between-subjects, on columns (1) and (2): s.e.d. = 117.15, Student's t = 0.32 (n.s.)
Within-subjects, on columns (4) and (5): s.e. = 4.55, Student's t = 5.78 (p < 1%, two-tailed)
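The significant within-subjects t value in the table can be checked directly from the per-participant differences, without any statistics library:

```python
from math import sqrt

# Completion times (seconds) from the icon-design table above.
natural = [656, 259, 612, 609, 1049, 1135, 542, 495, 905, 715]
abstract = [702, 339, 658, 645, 1129, 1179, 604, 551, 893, 803]

# Paired (within-subjects) t test: working on per-participant differences
# cancels individual speed variation, which is why this test detects the
# effect that the between-subjects comparison of raw means misses.
diffs = [a - n for a, n in zip(abstract, natural)]
n = len(diffs)
mean = sum(diffs) / n
sd = sqrt(sum((d - mean) ** 2 for d in diffs) / (n - 1))
t = mean / (sd / sqrt(n))  # ~5.78, matching the table (p < 1%, two-tailed)
```

The differences here are abstract − natural (twice column (5)), which yields the same t statistic because scaling the differences scales the mean and standard deviation alike.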
Experimental studies on groups
More difficult than single-user experiments
Problems with:
– subject groups
– choice of task
– data gathering
– analysis
Subject groups
larger number of subjects → more expensive
longer time to 'settle down' … even more variation!
difficult to timetable
so … often only three or four groups
The task
must encourage cooperation
perhaps involve multiple channels
options:
– creative task e.g. ‘write a short report on …’
– decision games e.g. desert survival task
– control task e.g. ARKola bottling plant
Data gathering
several video cameras + direct logging of application
problems:
– synchronization
– sheer volume!
one solution:
– record from each perspective
Analysis
• Vast variation between groups
• solutions:
  – within-groups experiments
  – micro-analysis (e.g., gaps in speech)
  – anecdotal and qualitative analysis
    • looking for critical incidents, interesting events, or breakdowns
• look at interactions between group and media
• controlled experiments may 'waste' resources!
Interpretation of User-Observation Data
• Assigning Severities (High / Medium / Low)

Example:
  – Usability observation: the user did not select the right menu item (Options) to initiate the task
  – Evaluator's comments: the user was not sure which menu Options was in
  – Cause: the menu name is inappropriate, as it does not relate to the required action
  – Severity rating: High
Interpretation of User-Observation Data
• Recommending Changes

Example:
  – Usability observation: the user did not select the right menu item (Options) to initiate the task
  – Cause of the usability defect: the menu name is inappropriate, as it does not relate to the required action
  – Severity rating: High
  – Recommended solution: the menu name should be changed to "Group …"
  – Status: make change in next revision
Writing the Evaluation Report
• As a record of what you did.
• To communicate the findings to other stakeholders
• Should You Describe Your Method?
– For academics – yes, for business usually no
• Describing Your Results
– Graphics, screenshots help
Different Purposes of Evaluations
• Exploratory Evaluations
– explore the UI design features
– gather feedback on the preliminary design
– verify assumptions
• Validation Evaluation
• Assessment Evaluation
• Comparison Evaluation
Different Purposes of Evaluations
• Validation Evaluation
– establish a hypothesis
– establish a null hypothesis
– decide the sample size
– ensure randomness
Different Purposes of Evaluations
• Assessment Evaluation
• Comparison Evaluation
– is A better than B?
– between subjects / within subjects