Peter Yeates: You’re Certainly Relatively Competent (AMEE 2012)


Slides to accompany a presentation at AMEE 2012. The presentation describes our recent research showing that assessors' judgements within medical education are importantly influenced by recent experience through a process known as the contrast effect.


peter.yeates@manchester.ac.uk @brainstormpete

You’re Certainly Relatively Competent

Judgmental relativity in performance assessments: the influence of recent experience on Mini-CEX score choices

Peter Yeates, Karen Mann, Paul O’Neill, Kevin Eva


Background

Mini-CEX assessments


Background

• Scores by assessors highly variable
– 40% of observed score variance
– Range 1-6 on 9 point scale

• Novel enquiry to understand judgement processes

• ? Assessors comparing trainees rather than making criterion-referenced judgements

– Yeates, et al, 2012

Background

Recent experience of other trainees

Comparison could cause two possible effects:
• Assimilation
• Contrast

Alternatively: No influence

Research questions

1. Does recent observation of either “good” or “poor” performances influence assessors’ Mini-CEX scores?

– Assimilation / Contrast / No influence

2. If so, do other influences mediate this effect?


Methods

• Internet-based experimental design
• Consultant physicians
– Nationwide recruitment (England & Wales)
• Randomised to groups
• Blinded to intervention

Study design (diagram):
• Intervention phase: Group A scored G1, G2, G3; Group B scored P1, P2, P3
• Comparison phase: both groups scored B1, B2, B3
• Analysis: rANOVA

Results

• 41 participants completed
– 32% female
– 11 out of 14 postgrad deaneries
– 13 different medical specialities

• Groups comparable at baseline on:
– Gender: 35% vs 29% (non-sig)
– Duration of consultancy: 13 yrs vs 8 yrs (p=0.03)

[Figure: Group mean scores for intervention and comparison phases. y-axis: mean scores (1 to 5); series: Group A, Group B]

Group B scored 0.67 higher on the 6-point scale than Group A (F(1, 39) = 12.0, p = 0.001); Cohen’s d = 0.63 (moderate effect)
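For readers less familiar with the effect size quoted above, the following is a minimal sketch of how Cohen's d is commonly computed for two independent groups, using the pooled standard deviation. The slides do not state which variant of d was used, so the pooled-SD form is an assumption, and the function name `cohens_d` and the scores are hypothetical, not the study data.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardised mean difference (group_b minus group_a) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / pooled_sd


# Hypothetical scores for illustration only (not the study data).
scores_a = [2, 3, 3, 4, 3]
scores_b = [3, 4, 4, 5, 4]
d = cohens_d(scores_a, scores_b)
```

If the reported d was computed this way (an assumption), a 0.67-point difference with d = 0.63 would correspond to a pooled SD of roughly 1.06 scale points.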


Methods: Follow-up study

Study design (diagram):
• Group A (descending proficiency): scored G1, G2, B1, B2, P1, P2
• Group B (ascending proficiency): scored P1, P2, B1, B2, G1, G2

Methods: Follow-up study

Additional measures:

• Memory
– “What percentage of trainees would do better?”
– Consider existing memory of trainees

• Insight
– “How confident do you feel about the scores you gave?”
– High confidence, low manipulation = insight

Results: Follow-up study

Group A: Good to Poor
Group B: Poor to Good

[Figure: Group A and B mean scores by level of performance. x-axis: level of performance (Good, Borderline, Poor); y-axis: mean scores (1 to 6); series: Group A, Group B; two points marked with asterisks]

F(1, 47) = 9.80, p = 0.003, Cohen’s d = 0.52

Results: Follow-up study

Group A: Good to Poor
Group B: Poor to Good

[Figure: Participants' "Percent Better" ratings by Level and Group. x-axis: level of performance (Good, Borderline, Poor); y-axis: Percent Better ratings (0 to 100); series: Group A, Group B; points marked with two asterisks and one tilde]

F(1, 46) = 16.0, p < 0.001, Cohen’s d = 0.67

Results: Follow-up study

Confidence ratings:
– Uniformly high (median = 6 out of 7)
– No variation by level
– No significant interaction between group effect and confidence (i.e. high and low confidence assessors just as susceptible)

Discussion

Recent experience caused a Contrast Effect

Theoretical:
• Competence based on relative rather than absolute criteria
• Robust despite appeals to long term memory
• Assessors lack insight into susceptibility

Practical:
• ? Fairness / safety of exams

References:
Azar, et al, 2007. Journal of Socio-Economics 36: 1-14
Gingerich, et al, 2011. Academic Medicine 86: s1-s7
Ginsburg, et al, 2010. Academic Medicine 85(5): 780-786
Govaerts, et al, 2011. Advances in Health Sciences Education 16(2): 151-65
Kogan, et al, 2011. Medical Education 45(10): 1048-60
Mussweiler, et al, 2003. Psychological Review 110(3): 472-489
Wedell, et al, 2005. Basic and Applied Social Psychology 27(3): 213-28
Yeates, et al, 2012. Advances in Health Sciences Education. Online ahead of print

Questions?


Recent experience vs. Hawk / Dove differences

[Study design diagram repeated: intervention phase (Group A: G1, G2, G3; Group B: P1, P2, P3) followed by comparison phase (both groups: B1, B2, B3), with the labels "Predictors" and "Outcome"]

Hawk / Dove Index (HDI):
– Participant z-score
– How far from the middle of the group?

Predictors: HDI, Group
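As an illustration of the Hawk / Dove Index idea, the sketch below expresses each assessor's mean score as a z-score relative to the other assessors. The slides do not specify which scores feed the index or which standard deviation is used, so both are assumptions here; the function name `hawk_dove_index` and the assessor IDs are hypothetical.

```python
import statistics


def hawk_dove_index(mean_scores):
    """Return each assessor's z-score relative to the group of assessors.

    mean_scores maps a (hypothetical) assessor ID to that assessor's mean score.
    """
    values = list(mean_scores.values())
    centre = statistics.mean(values)
    spread = statistics.stdev(values)  # sample SD (assumed; not stated in the slides)
    return {assessor: (score - centre) / spread
            for assessor, score in mean_scores.items()}


# Hypothetical data for illustration only.
assessor_means = {"a1": 2.0, "a2": 3.0, "a3": 4.0}
hdi = hawk_dove_index(assessor_means)
```

Under this sketch, a negative value indicates an assessor who scores below the middle of the group (more "hawkish") and a positive value one who scores above it (more "dovish").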


Hawk / Dove Results

• Overall model explained 50% of observed score variance
– r² = 0.50, p < 0.001

• Hawk / Dove accounted for 18%
– Change in r² = 0.18, p = 0.006

• Group (recent experience) then accounted for a further 24%
– Change in r² = 0.24, p < 0.001

• As a result, Group (recent experience) accounted for more score variation than assessors’ fixed Hawk / Dove differences
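The "change in r²" figures above suggest predictors entered in steps. The slides do not name the exact procedure, so the following is only a sketch under the assumption of a hierarchical ordinary least squares regression: fit the outcome on HDI alone, then on HDI plus Group, and take the difference in r². All data and the helper names `solve` and `r_squared` are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(predictors, outcome):
    """r² of an ordinary least squares fit of outcome on predictors plus an intercept."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(outcome) / len(outcome)
    ss_res = sum((y - f) ** 2 for y, f in zip(outcome, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in outcome)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: HDI z-score, group (0 = A, 1 = B) and outcome score.
hdi = [-1.0, -0.5, 0.0, 0.5, 1.0, -1.0, -0.5, 0.0, 0.5, 1.0]
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
score = [2.0, 2.6, 2.9, 3.4, 4.1, 3.1, 3.4, 4.0, 4.6, 4.9]

r2_step1 = r_squared([[h] for h in hdi], score)                      # step 1: HDI only
r2_step2 = r_squared([[h, g] for h, g in zip(hdi, group)], score)    # step 2: HDI + Group
r2_change = r2_step2 - r2_step1                                      # variance added by Group
```

Entering HDI first means the change in r² at step 2 reflects variance explained by Group over and above assessors' fixed stringency, which matches the order in which the bullets above report the two contributions.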


[Figure: Score (1.00 to 6.00) by Video (1 to 6)]