Evaluation: Controlled Experiments

Source: vda.univie.ac.at/Teaching/HCI/15s/LectureNotes/10_LabStudies.pdf


Outline

• Evaluation beyond usability tests
• Controlled experiments
• Other evaluation methods

• CHI 2014/2015 Cool stuff: A glimpse into recent HCI research


Evaluation Beyond Usability Tests


Usability Evaluation (last week)

• Expert tests / walkthroughs
• Usability tests with users

• Main goal: formative
  – identify usability problems
  – improve the tool


Summative Evaluation (focus today)

• How good is it? Is it useful?
• Is it better than other tools?


Formative and Summative: Usually combined

[Figure: evaluation over time, moving from formative to summative]

Evaluation goals (summative)

• Generalizability
  – Results can be applied to other people

• Precision
  – We measured what we wanted to measure (by controlling factors that we did not intend to study)

• Realism
  – The study context is realistic

... there is usually a trade-off between them!

[Figure © McGrath / Carpendale]

The selection of a research method depends on the research question and the object under study!


Controlled Experiments


Controlled experiment

• Also known as:
  – Laboratory experiment
  – Lab study
  – User study
  – A/B testing (used in marketing)
  – …


Focus

• Precision
• Generalizability (?)

• Overall goal
  – Reveal cause-effect relationships
  – e.g., smoking causes cancer


Scenario

[Figure: two design variants, A and B]

Which is better?

[Figure © Carpendale]

Test it with users!


Hypothesis

• A precise problem statement
• Example:
  – H1: Participants will buy more beer when using variant B than when using variant A
  – Null hypothesis H0: there is no difference in beer purchases between variants A and B


Independent Variables

• Factors to be studied
• Typical independent variables (in HCI):
  – Different designs
  – Task type: e.g., searching vs. browsing
  – Participant demographics: e.g., male/female
  – Different technologies: touch pad vs. keyboard
• Control of the independent variables
  – Levels: the number of different values (conditions) of each factor
  – Limited by the length of the study and the number of participants
• How different should the conditions be?
  – Entire interfaces vs. very specific parts
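A rough sketch of how factors and levels multiply into conditions, assuming a full factorial design in which every level of every factor is combined with every other (the factor names and levels are hypothetical, loosely modeled on the examples above):

```python
from __future__ import annotations

from itertools import product

# Hypothetical factors (independent variables) and their levels.
factors = {
    "design": ["A", "B"],
    "task": ["search", "browse"],
    "device": ["touchpad", "keyboard"],
}


def full_factorial(factors: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per condition: every combination of the factors' levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


conditions = full_factorial(factors)
print(len(conditions))  # 2 * 2 * 2 = 8 conditions
for condition in conditions:
    print(condition)
```

Adding a third level to just one factor would raise this to 3 × 2 × 2 = 12 conditions, which is why the number of levels is limited by the length of the study and the number of participants.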

Control Environment

• Make sure nothing else could cause the observed effect

• Control confounding variables
• Randomize!

Different Designs: Between-Subjects

• Divide the participants into groups; each group performs one condition

• Randomize the group assignment
• Potential problem?

[Figure: Group 1 uses variant A, Group 2 uses variant B]
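The randomized group assignment could be implemented along these lines. This is a minimal sketch, assuming participant IDs are known before the study starts and that groups of (nearly) equal size are wanted; the function name and IDs are hypothetical:

```python
import random


def assign_groups(participants, conditions, seed=None):
    """Randomly split participants into equally sized groups, one per condition."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    # Deal the shuffled participants out round-robin, so group sizes differ by at most one.
    for i, participant in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(participant)
    return groups


participants = [f"P{i:02d}" for i in range(1, 31)]  # hypothetical IDs P01..P30
groups = assign_groups(participants, ["A", "B"], seed=42)
print({condition: len(members) for condition, members in groups.items()})  # {'A': 15, 'B': 15}
```

Recording the seed (here the arbitrary value 42) is one way to document the randomization method before the study runs.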

Different Designs: Within-Subjects

• Every participant performs all conditions
• Can account for individual differences and reduce noise (which is why it can be more powerful and requires fewer participants)
• Severely limits the number of conditions, and even the types of tasks tested (this can be worked around with multiple sessions)

• Can lead to ordering effects (e.g., learning or fatigue) → randomize the order
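Instead of drawing a fresh random order for each participant, many studies counterbalance the order systematically. The sketch below builds a balanced Latin square (Williams design), a swapped-in alternative to pure randomization: each condition appears once in every position, and each condition directly follows every other condition equally often. It assumes participants are numbered from 0; with an odd number of conditions, full balance needs twice as many participants as conditions.

```python
def balanced_latin_square(conditions, participant):
    """Return the condition order for one participant (0-based) in a Williams design."""
    n = len(conditions)
    order = []
    ascending, descending = 0, 0  # counters for the two interleaved sequences
    for position in range(n):
        # Base sequence 0, 1, n-1, 2, n-2, 3, ... shifted by the participant number.
        if position < 2 or position % 2 == 1:
            value = ascending
            ascending += 1
        else:
            value = n - descending - 1
            descending += 1
        order.append(conditions[(value + participant) % n])
    # With an odd number of conditions, every second participant gets the reversed order.
    if n % 2 == 1 and participant % 2 == 1:
        order.reverse()
    return order


for p in range(4):
    print(p, balanced_latin_square(["A", "B", "C", "D"], p))
```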

Dependent Variable

• The things that you measure
• Performance indicators:
  – task completion time, error rates, mouse movement, …
  – (number of beers bought)

• Subjective participant feedback:
  – satisfaction ratings, closed-ended questions, interviews, …
  – questionnaires (see last week's HCI lecture)

• Observations:
  – behaviors, signs of frustration, …
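Performance indicators like these are typically derived from per-trial logs. A minimal sketch, assuming each trial records its completion time and error count, and defining the error rate as the share of trials with at least one error (one common definition among several); the record layout and data are hypothetical:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    participant: str
    condition: str
    completion_time_s: float  # task completion time in seconds
    errors: int  # number of errors during the task


def summarize(trials, condition):
    """Aggregate the dependent variables for one condition."""
    selected = [t for t in trials if t.condition == condition]
    return {
        "mean_time_s": mean(t.completion_time_s for t in selected),
        "mean_errors": mean(t.errors for t in selected),
        "error_rate": sum(t.errors > 0 for t in selected) / len(selected),
    }


trials = [  # hypothetical measurements
    Trial("P01", "A", 42.0, 1),
    Trial("P02", "A", 38.0, 0),
    Trial("P03", "B", 30.0, 0),
    Trial("P04", "B", 34.0, 2),
]
print(summarize(trials, "A"))
print(summarize(trials, "B"))
```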

Tasks

• Specifying good tasks for controlled experiments is tricky
  – especially if you are measuring performance

• Task criteria
  – comparable across the different interfaces
  – a clear end point

• Example
  – usability test: "Buy a book for a 4-year-old."
  – controlled experiment: "Find and buy the book ‘The Gruffalo’."

Results: Application of Statistics

• Descriptive statistics
  – Describe the data you gathered (e.g., visually)

• Inferential statistics
  – Make predictions/inferences from your study about the larger population


Descriptive statistics

• Central tendency
  – mean {1, 2, 4, 5} = 3
  – median {15, 19, 22, 29, 33, 45, 50} = 29
  – mode {12, 15, 22, 22, 22, 34, 34} = 22

• Measures of spread
  – range: $x_{\max} - x_{\min}$
  – variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$
  – standard deviation: $\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$

Note: for inferential statistics, the N in the denominator becomes N − 1, which gives an estimate for the sampled population: $s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$
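Python's standard `statistics` module covers these measures. The sketch below reproduces the slide's examples and contrasts the N and N − 1 versions of variance and standard deviation:

```python
import statistics as st

# The example sets from the slide.
print(st.mean([1, 2, 4, 5]))  # 3
print(st.median([15, 19, 22, 29, 33, 45, 50]))  # 29
print(st.mode([12, 15, 22, 22, 22, 34, 34]))  # 22

data = [1, 2, 4, 5]
spread = {
    "range": max(data) - min(data),  # 5 - 1 = 4
    "variance_N": st.pvariance(data),  # divides by N: 10 / 4
    "variance_N-1": st.variance(data),  # divides by N - 1: 10 / 3
    "sd_N": st.pstdev(data),
    "sd_N-1": st.stdev(data),  # estimate for the sampled population
}
print(spread)
```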

Visualization of descriptive statistics

e.g., a boxplot showing:
• Median (some variants additionally mark the mean)
• 25%/75% quartiles (the box)
• Min / max (the whiskers)
• (alternative: whiskers end at the most extreme values within 1.5 × IQR of the box, and values beyond that are plotted as outliers)
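The values a boxplot draws can be computed directly. A sketch, assuming Tukey's convention of whiskers at 1.5 × IQR; note that `statistics.quantiles` (Python 3.8+) implements one of several quartile definitions, so other tools may report slightly different quartiles:

```python
import statistics as st


def box_summary(data):
    """Quartiles, whiskers and outliers as drawn by a Tukey-style boxplot."""
    q1, median, q3 = st.quantiles(data, n=4)  # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),  # most extreme values still within the fences
        "whisker_high": max(inside),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }


print(box_summary([15, 19, 22, 29, 33, 45, 50]))
print(box_summary([15, 19, 22, 29, 33, 45, 50, 120]))  # 120 is flagged as an outlier
```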

Inferential statistics

• Goal: Generalize findings to the larger population

(Figure source: http://www.latrobe.edu.au/psy/research/cognitive-and-developmental-psychology/esci)

Excursus: Tragedy of the error bars


CI = Confidence intervals

SE = Standard Error (SD of the sampling distribution of the sample mean)

SD = Standard Deviation
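To see why these three kinds of error bars differ so much, the sketch below computes all three for one sample. It assumes a small sample of task completion times (the values are hypothetical). By default it uses the normal critical value (about 1.96), which is too small for small samples, so a t critical value can be passed in instead (about 2.776 for 5 participants, i.e., 4 degrees of freedom):

```python
import math
import statistics as st


def error_bars(data, critical=None):
    """Mean plus half-widths of SD, SE and 95% CI error bars for one sample."""
    n = len(data)
    m = st.mean(data)
    sd = st.stdev(data)  # sample standard deviation (divides by N - 1)
    se = sd / math.sqrt(n)  # standard error of the mean
    if critical is None:
        # Large-sample approximation; use a t critical value for small n.
        critical = st.NormalDist().inv_cdf(0.975)
    return {"mean": m, "sd": sd, "se": se, "ci95": critical * se}


times = [12.1, 14.3, 11.8, 13.5, 12.9]  # hypothetical completion times (s)
bars = error_bars(times, critical=2.776)  # t critical value for df = 4
for name in ("sd", "se", "ci95"):
    print(f"{name}: {bars['mean']:.2f} ± {bars[name]:.2f}")
```

SE bars are √N times narrower than SD bars, so the same data can look far more convincing depending on which bar is drawn; stating which one is shown avoids the confusion.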


Excursus: 95% Confidence intervals

• USE THEM!
• Interpretation: if the study were repeated many times, about 95% of the intervals computed this way would contain the true population mean. Informally: we can be 95% confident that the true mean lies within our confidence interval.
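The repeated-sampling interpretation can be checked by simulation. A sketch, assuming normally distributed data with a known (hypothetical) true mean of 50: it runs many simulated studies and counts how often their 95% CI contains the true mean. Because it uses the normal critical value instead of a t value, the coverage comes out slightly below 95%.

```python
import math
import random
import statistics as st


def ci_coverage(true_mean=50.0, true_sd=10.0, n=40, studies=1000, seed=1):
    """Share of simulated studies whose 95% CI contains the true mean."""
    rng = random.Random(seed)
    z = st.NormalDist().inv_cdf(0.975)
    hits = 0
    for _ in range(studies):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        m = st.mean(sample)
        half_width = z * st.stdev(sample) / math.sqrt(n)
        if m - half_width <= true_mean <= m + half_width:
            hits += 1
    return hits / studies


print(ci_coverage())  # close to 0.95; each single interval either contains 50 or not
```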

Null Hypothesis Testing

• Statistically significant results
  – p < .05 (significance level α = .05)
  – α is the probability of incorrectly rejecting the null hypothesis when it is actually true (Type I error)
  – p is the probability of obtaining a result at least as extreme as the observed one, assuming the null hypothesis is true
• Many different tests
  – t-test, ANOVA, …
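The t-test and ANOVA need a statistics library (e.g., SciPy). To show the logic of null hypothesis testing with the standard library alone, the sketch below swaps in a permutation test: if H0 ("no difference between A and B") were true, the group labels would be arbitrary, so the labels are shuffled many times and we count how often a difference at least as large as the observed one appears. The beer counts are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test(a, b, permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns (difference, p)."""
    rng = random.Random(seed)
    observed = abs(mean(b) - mean(a))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(permutations):
        rng.shuffle(pooled)  # relabel participants at random, as if H0 were true
        diff = abs(mean(pooled[len(a):]) - mean(pooled[: len(a)]))
        if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
            at_least_as_extreme += 1
    # Counting the observed labelling as one of the permutations keeps p above 0.
    p = (at_least_as_extreme + 1) / (permutations + 1)
    return observed, p


beers_a = [2, 3, 1, 2, 4, 2, 3, 1]  # hypothetical beers bought with variant A
beers_b = [4, 3, 5, 4, 2, 5, 4, 3]  # hypothetical beers bought with variant B
difference, p = permutation_test(beers_a, beers_b)
print(f"difference = {difference:.2f}, p = {p:.4f}")
print("significant at alpha = .05" if p < 0.05 else "not significant at alpha = .05")
```

This test is two-sided; the directional H1 on the hypothesis slide ("B more than A") would instead compare the signed difference mean(B) − mean(A).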

Validity

• Is there a causal relationship?
• Errors:
  – Type I: false positives
  – Type II: false negatives

• Internal validity
  – Are there alternative causes?

• External validity
  – Can we generalize the results of the study?
  – e.g., generalizable to the larger population of undergraduate students

[Figure: courtroom analogy for Type I and Type II errors (guilty / not guilty)]
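A Type I error rate equal to α can be made tangible with simulated "A/A" studies, in which both groups come from the same distribution, so H0 is true by construction and every significant result is a false positive. A sketch, assuming normally distributed data and using a large-sample z-test (a normal approximation to Welch's t-test) to stay within the standard library:

```python
import math
import random
import statistics as st


def z_test_p(a, b):
    """Two-sided p-value for a difference in means (large-sample normal approximation)."""
    se = math.sqrt(st.variance(a) / len(a) + st.variance(b) / len(b))
    z = (st.mean(b) - st.mean(a)) / se
    return 2 * (1 - st.NormalDist().cdf(abs(z)))


def false_positive_rate(n=50, studies=1000, alpha=0.05, seed=7):
    """Share of A/A studies (no true effect) that come out 'significant'."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(studies):
        a = [rng.gauss(0, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]  # same distribution as a
        if z_test_p(a, b) < alpha:
            false_positives += 1
    return false_positives / studies


print(false_positive_rate())  # roughly 0.05, i.e., about alpha
```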

Internal Validity: Storks deliver babies!?


• R. Matthews, “Storks Deliver Babies (p = 0.008)”. Teaching Statistics, vol. 22, issue 2, pages 36–38, 2000.

• Across European countries, stork populations and human birth rates correlate with r = 0.62 (reasonably high)

• A statistical test shows that this correlation is in fact significant (p = 0.008)

• What are the flaws?
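One way to probe such a correlation is to ask whether a third variable drives both measures. A sketch with purely synthetic data, assuming a hypothetical confounder (land area, standing in for country size) that increases both the number of storks and the number of births: the two outcomes correlate strongly although neither affects the other, and the partial correlation controlling for area drops to about zero.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear influence of z from both."""
    rxy, rxz, ryz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz**2) * (1 - ryz**2))


rng = random.Random(3)
area = [rng.uniform(1, 100) for _ in range(500)]  # hypothetical confounder
storks = [2 * a + rng.gauss(0, 20) for a in area]  # depends only on area
births = [3 * a + rng.gauss(0, 30) for a in area]  # depends only on area
print(f"r(storks, births) = {pearson_r(storks, births):.2f}")
print(f"partial r, controlling for area = {partial_r(storks, births, area):.2f}")
```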


Pragmatically … A step-by-step how-to


Experimental Procedure: Typical example

• Identify the research hypothesis
• Specify the design of the study
• Think about statistics *before* you run the study
• Run a pilot study
• Recruit participants
• Run the actual data collection sessions
• Analyze the data
• Report the results


Run a pilot study

• … to test the study design
• … to test the system
• … to test the study instruments

Recruit participants

• Should the sample reflect the larger population?
  – in the best case, yes
  – in practice, often a pragmatic decision

• How many?
  – Depends on the effect size and the study design (statistical power of the experiment)
  – Usually 15+ per group
  – Note: much higher than for a usability test (~5)
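A rough way to connect effect size and power to a participant count is the normal-approximation formula for comparing two group means. The sketch below assumes a between-subjects design with two equally sized groups, a two-sided test, and an effect size given as Cohen's d (mean difference divided by the standard deviation); the function name is hypothetical:

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate participants per group for a two-sided two-group comparison.

    d is the standardized effect size (Cohen's d: mean difference / SD).
    Uses the normal approximation n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d**2)


for d in (0.5, 0.8, 1.0):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

Under these assumptions, 15 to 16 participants per group suffice only for large effects (d ≈ 1); exact t-based calculations, e.g., in G*Power, give slightly larger numbers.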

Run the actual data collection process

• System and instruments ready?
• Greet participants
• Introduce the purpose of the study and the procedure
  – or deliberately don’t
  – Don’t bias participants (e.g., by saying “compare my interface vs. this other interface”)
• Get the participants’ consent
  – ethics!
• Assign participants to a specific experimental condition
  – according to a pre-defined randomization method
• Introduction to the system(s) and/or training tasks
• Participants complete the actual tasks
  – take measures of the dependent variables
• Participants answer a questionnaire (if any)
• Debriefing session
• Payment (if any)
  – monetary, coupons, chocolate

Report the results

• Introduction / motivation
• Study design
• Results
• Discussion
• Conclusions
• References / appendix

• See, for instance, Saul Greenberg’s recommendations:
  – http://pages.cpsc.ucalgary.ca/~saul/hci_topics/assignments/controlled_expt/ass1_reports.html

Other Evaluation Methods


Field Studies


• Realism

• Can reveal “a richer understanding by using a more holistic approach” (Carpendale, 2008)

Qualitative Methods

• Observation techniques
  – fly-on-the-wall techniques
  – interruptions by the observer

• Interview techniques
  – contextual?

Qualitative Methods as “Add-on”

Often a controlled experiment plus:
• Experimenter observations
• Collecting participants’ opinions
• Think-aloud protocol (be careful: thinking aloud can slow participants down and distort time measurements)

Helpful for...
• Usability improvement (cf. HCI lecture three weeks ago)
• New insights, explanations of unforeseen results, new questions
• Can help to confirm results

Qualitative Methods as Primary

• Pre-design studies
  – Rich understanding of a complex domain
  – Problems, challenges, domain language

• During- and post-design studies
  – Case studies / field studies

Helpful for...
• holistic understanding

Qualitative Methods as Primary

• In situ observations
• Participant observation
• Laboratory observational studies
• Contextual interviews
• Focus groups

Qualitative Challenges

• Sample sizes
  – Is it feasible to do intensive studies with many participants?
  – Time? Amount of data produced?

• Subjectivity
  – Social relationship between researcher and participants?

• Analyzing the data
  – Grounded theory
  – Open and axial coding

New Ways of Evaluation

• Crowdsourcing, e.g., Amazon Mechanical Turk (increasingly popular)
• Measuring brain activity
• ...

Cool stuff from CHI 2015


Affordances++ (CHI 2015)


Fancy Hardware (CHI 2015)


Sustainability (CHI 2015)


And skin again (CHI 2015)


Dance floor (CHI 2015)


Socializing with robots (CHI 2015)


Cool visualization stuff (CHI 2015)


Cool stuff from CHI 2014


Older people (CHI 2014)


Pervasive Design (CHI 2014)


Understanding human factors (CHI 2014)


Visualization (CHI 2014)


Even more videos from CHI 2014


Healthcare Studies at the Healthcare Human Factors (HHF) laboratory in Toronto


https://www.youtube.com/watch?v=WxQLzdLjwp4


Cool Hardware Stuff


Sustainability
