How does health psychology measure up? A critical look at measurement in health psychology Matthew Hankins 16th September 2011


DESCRIPTION

Slides from a 20 minute presentation given at the Division of Health Psychology Annual Conference (Southampton 2011).


Page 1: How does health psychology measure up

How does health psychology measure up?

A critical look at measurement in health psychology

Matthew Hankins

16th September 2011

Page 2: How does health psychology measure up

The empirical basis of Health Psychology

• Why do Health Psychologists collect data?

– Theory generation, esp. identifying constructs

– Theory corroboration

– Measuring outcomes (trials etc.)

• The value of such activities is therefore critically dependent on the quality of the data

Page 3: How does health psychology measure up

Questionnaire measures

• Majority of data collected by Health Psychologists is generated by questionnaire measures (‘scales’)

• Questionnaires vary in the quality of data that they generate

– Validity: extent to which the questionnaire measures what is intended

– Reliability: extent to which variance in data reflects variance in construct measured

• Index of measurement error

Page 4: How does health psychology measure up

Pragmatic approach

• Validity

– Unidimensionality (factor analysis)

– Associations between measures

– Discrimination between known groups

• Reliability

– Estimated by Cronbach’s Alpha

– Or test-retest correlation
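As a minimal sketch of the reliability estimate named above, Cronbach's alpha can be computed from an item-score matrix with the standard formula alpha = k/(k-1) × (1 - sum of item variances / variance of the summed score). The function name and the toy data are illustrative assumptions, not taken from the slides.

```python
from statistics import variance

def cronbach_alpha(rows):
    """rows: list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed scale score
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 respondents x 3 items scored 0-3
toy = [[0, 0, 1], [1, 1, 1], [2, 1, 2], [2, 3, 2], [3, 3, 3]]
alpha = cronbach_alpha(toy)
print(round(alpha, 2))
```

Note that alpha only summarises how strongly the items covary; it does not test whether the summed score orders people on a single construct.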

Page 5: How does health psychology measure up

Scale development

• Combination of these approaches is derived from ‘Classical Test Theory’ (CTT)

– Originated with Spearman (1904)

– Landmark text: Guilford 2nd ed. (1954)

– Fully developed by Lord & Novick (1968)

• Further developments: ‘item-response theory’ (IRT)

– E.g. Rasch model (1960)

• CTT implicit in most empirical Health Psychology research

Page 6: How does health psychology measure up

CTT vs. IRT

• Argument tends to be that IRT is superior to CTT

• In particular, it is argued that IRT is ‘objective’ measurement

• For large samples, differences more apparent than real:

– Strong correlations between CTT data & IRT data

• And differences tend to be smaller than the margin of error

– If data treated as ordinal, perfect correlation between CTT & Rasch data

Page 7: How does health psychology measure up

What is a scale?

• A scale orders people on the construct of interest

• Both CTT & IRT agree that a person’s position on the dimension can be estimated from the item scores

• Strength of IRT is that it does not assume that a set of correlated items forms a scale

• Implicit in CTT: if items load on same factor, we automatically assume that they form a scale

[Figure: the construct as a continuum from Low to High, with Person A, Person B, Person C and Person D ordered along it]

Page 8: How does health psychology measure up

Scaling problem

• Whether a set of items forms a scale is a hypothesis (Guttman 1950)

– Formally tested whether items formed ‘Guttman scales’

• “In contemporary psychometric practice, it is the rule rather than the exception that two people having the same score on a test will have [endorsed] different items… Such scores are crude empirical devices known to have some predictive efficiency, but they cannot be called measurements in any strict sense” (Loevinger 1948)

• Additionally, there is no rational basis for adding up a set of ordinal Likert scores unless they have been shown to scale

Page 9: How does health psychology measure up

Example: PHQ-9

• Feeling tired + Little interest in doing things + Poor appetite, each on several days in last 2 weeks

– Scale score = +3

• Thoughts of hurting yourself in some way nearly every day in last 2 weeks

– Scale score = +3

• Are these responses really equivalent?
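The arithmetic behind this example can be sketched as follows, using the standard PHQ-9 response scoring (0 = not at all, 1 = several days, 2 = more than half the days, 3 = nearly every day). The item labels are abbreviated as on the slide, and the function name is an illustrative assumption.

```python
# PHQ-9 response options and their 0-3 scores
SCORES = {
    "not at all": 0,
    "several days": 1,
    "more than half the days": 2,
    "nearly every day": 3,
}

def scale_score(responses):
    """Sum the item scores; items not listed are treated as 'not at all'."""
    return sum(SCORES[answer] for answer in responses.values())

# Two very different response patterns
person_a = {
    "feeling tired": "several days",
    "little interest in doing things": "several days",
    "poor appetite": "several days",
}
person_b = {"thoughts of hurting yourself": "nearly every day"}

score_a = scale_score(person_a)
score_b = scale_score(person_b)
print(score_a, score_b)
```

Both patterns receive the same summed score, although they are unlikely to reflect the same position on the construct.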

Page 10: How does health psychology measure up

Implications

• If a set of items is assumed to form a scale, then we cannot be sure that the scale score accurately ranks people on the construct of interest

– People with different positions may be assigned the same score

– People with the same position may be assigned different scores

• Unless we test the hypothesis, assessing reliability & validity is pointless

Page 11: How does health psychology measure up


What we would like: interval scales

What we think we have: ordinal scales

What we probably have: disordered categories

A scale that cannot rank-order people is not a scale

Disordered categories

Page 12: How does health psychology measure up

Item ‘difficulty’ (intensity)

• The problem arises because CTT does not account for item difficulty or intensity

• Some items are endorsed at low levels of the construct

– ‘Low intensity item’

– Endorsement may indicate low or high level of construct

• Some items are endorsed at high levels of the construct

– ‘High intensity item’

– Endorsement indicates high level of construct

Page 13: How does health psychology measure up

Example: PHQ-9

• Feeling tired on several days is a low intensity item

– Endorsed at low level of depression

– But may also be endorsed at higher levels of depression

[Figure: depression continuum from Low to High; the item is endorsed (Yes) at all four marked positions]

Page 14: How does health psychology measure up

Example: PHQ-9

• Thoughts of hurting yourself in some way nearly every day in last 2 weeks is a high intensity item

– Endorsed at high level of depression

– But not endorsed at lower levels of depression

[Figure: depression continuum from Low to High; the item is endorsed only at the highest of the four marked positions (No, No, No, Yes)]

Page 15: How does health psychology measure up

How CTT fails to deal with item intensity

Factor analysis groups items of similar intensity

• Factor analysis of a unidimensional construct will produce more than one ‘factor’

• These ‘factors’ are simply sets of items with similar intensities
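A small deterministic illustration of why this can happen, under stated assumptions: four hypothetical dichotomous items form a perfect Guttman scale on a single dimension, yet their correlations depend on how close their intensities are. The thresholds, item names and the 100 evenly spaced persons are invented for illustration and are not GHQ-12 data.

```python
from math import sqrt

def phi(x, y):
    """Pearson (phi) correlation between two lists of 0/1 item scores."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom

# 100 hypothetical persons spread evenly along ONE construct
levels = range(100)
# An item is endorsed once a person's level reaches the item's threshold
thresholds = {"low1": 20, "low2": 30, "high1": 70, "high2": 80}
items = {name: [1 if lv >= t else 0 for lv in levels]
         for name, t in thresholds.items()}

within_low = phi(items["low1"], items["low2"])
within_high = phi(items["high1"], items["high2"])
across = phi(items["low1"], items["high2"])
print(round(within_low, 2), round(within_high, 2), round(across, 2))
```

Here the two low-intensity items and the two high-intensity items each correlate at about 0.76, while a low and a high item correlate at only 0.25. A factor analysis of such a correlation matrix would tend to separate the items by intensity even though only one dimension generated the data.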

Page 16: How does health psychology measure up

Example: GHQ-12

• Many studies report 2- or 3-factor solutions

• ‘Factors’ simply group items by intensity

[Figure: GHQ-12 item numbers placed along a psychiatric morbidity continuum from Low to High, in three rows as extracted: 7 4 5 2 6 10 11; 1 12 9; 8 3]

Page 17: How does health psychology measure up

How CTT fails to deal with item intensity

Selecting items on basis of factor analysis exacerbates problem, but simultaneously conceals it

• Items are selected on basis of similar intensities, creating scales with limited range but high reliability

[Figure: two panels on a psychiatric morbidity continuum from Low to High. First panel, GHQ-12 item numbers in three rows as extracted: 7 4 5 2 6 10 11; 1 12 9; 8 3. Second panel, a subset of items: 7 4; 1 12; 8 3]

Page 18: How does health psychology measure up

Why Rasch modelling is not the answer

• Rasch modelling explicitly takes into account item intensities

– Stochastic Guttman scale

• Additionally claims to produce interval scaling & ‘objective’ measurement

• Increasingly popular in Health Psychology

Page 19: How does health psychology measure up

Problems

• Rasch models require very large samples to allow estimation of person and item parameters

• Very strong assumptions, e.g. logistic item-response curve

• The data must fit the model, not the other way round

– Discards useful data to fit arbitrary assumptions

• Interval scaling is questionable gain if psychological constructs are not quantitative in the first place
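For reference, the logistic item-response curve assumed by the dichotomous Rasch model is usually written as below, where \theta_v is person v's position on the construct and b_i is the intensity (difficulty) of item i. The symbols follow common convention and are not taken from the slides.

```latex
P(X_{vi} = 1 \mid \theta_v, b_i) = \frac{\exp(\theta_v - b_i)}{1 + \exp(\theta_v - b_i)}
```

Every item is required to follow this same curve shape, differing only in its location b_i, which is what makes the assumption a strong one.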

Page 20: How does health psychology measure up

Non-parametric IRT (NPIRT)

• E.g. Mokken (1971)

• Takes into account item intensities

– Stochastic Guttman scale

• Claims only to rank order people

• Very weak assumptions

– Retains data

• Complements CTT

– Uses simple scale score

Page 21: How does health psychology measure up


Page 22: How does health psychology measure up

PROMIS project

• NIH funded project since 2004 ($100m)

• Establish a domain framework and develop candidate items for adult and paediatric Patient Reported Outcome Measures

• Questionnaires developed using published methodology

• Scaling methods include NPIRT and Graded Response Model (GRM)

Page 23: How does health psychology measure up

Summary

• The credibility of Health Psychology research & practice rests on its empirical evidence base

• This evidence base relies on the quality of questionnaire data

• The quality of questionnaire data may be compromised by the use of inappropriate methods

• We should stop relying on factor analysis & reliability coefficients & test the hypothesis that a set of items constitutes a scale

Page 24: How does health psychology measure up

Examples of NPIRT

Page 25: How does health psychology measure up

• Mokken (1971) proposed two models

– Monotone homogeneity model (MH)

– Doubly monotone model (DM)

• Scales fitting the MH model rank order people on the attribute of interest

• Corollary is that scales not fitting the MH model do not rank order people on the attribute of interest

Page 26: How does health psychology measure up

• Select items for the scale based on homogeneity

• Assess whether the resulting scale fits the MH model

• Scaling procedure and the MH model based on the following minimal assumptions:

– For all items, if person A has a higher degree of X than person B, A’s probability of endorsing an item will be equal to or higher than B’s

– Local independence: item scores are uncorrelated for the same degree of attribute

Page 27: How does health psychology measure up

• If the purpose of the scale is to rank order people on a given attribute then the scale must be monotone homogeneous

• Probability of item being endorsed must be monotone nondecreasing against attribute

• i.e. probability of item endorsement does not decrease with an increase in the measured attribute

* - as estimated from the remaining items of the scale
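One simple way to inspect this requirement is to group respondents by their rest score (the sum of the remaining items, as in the footnote above) and check that the proportion endorsing the item never falls as the rest score rises. The sketch below is a simplified illustration with hypothetical data and function names; dedicated Mokken software additionally merges small groups and tests whether violations are significant.

```python
from collections import defaultdict

def endorsement_by_rest_score(data, item):
    """Proportion endorsing `item` at each rest score (sum of the other items)."""
    groups = defaultdict(list)
    for row in data:
        rest = sum(row) - row[item]
        groups[rest].append(row[item])
    return {rest: sum(v) / len(v) for rest, v in sorted(groups.items())}

def violates_monotonicity(data, item):
    """True if the endorsement proportion ever decreases as rest score rises."""
    props = list(endorsement_by_rest_score(data, item).values())
    return any(later < earlier for earlier, later in zip(props, props[1:]))

# Hypothetical 0/1 responses to three items; item 2 is endorsed less
# often by the respondents with the highest rest scores
toy = [
    [0, 0, 0], [0, 0, 0],
    [1, 0, 1], [1, 0, 1],
    [1, 1, 1], [1, 1, 0], [1, 1, 0], [1, 1, 0],
]
curve = endorsement_by_rest_score(toy, 2)
print(curve, violates_monotonicity(toy, 2))
```

In this toy data the endorsement proportion for item 2 rises and then drops at the highest rest score, so the check flags a violation.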

Page 28: How does health psychology measure up

For this GHQ-12 item the probability of endorsement reaches 50% at a low level of psychological distress

It is therefore a low intensity item: people endorsing this item are signalling a low level of distress

Note that probability (Y-axis) increases with increase in class score (X-axis)

Page 29: How does health psychology measure up

For this GHQ-12 item the probability of endorsement reaches 50% at a high level of psychological distress

It is therefore a high intensity item: people endorsing this item are signalling a high level of distress

Note that probability (Y-axis) also increases with increase in class score (X-axis), but curves:

(a) Do not have the same slope

(b) Are not required to have the same shape

Page 30: How does health psychology measure up

• If two items belong to a unidimensional scale, then:

– Endorsing the more intense item entails that the less intense item also be endorsed

– Endorsing the less intense item does not entail that the more intense item be endorsed

• For a Guttman scale, these are deterministic statements

• For a Mokken scale, these are probabilistic statements

Page 31: How does health psychology measure up

• A Guttman error occurs when the more intense item is endorsed but not the less intense item

• Too many Guttman errors imply that items are not measuring the same attribute

More intense item

Less intense item
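Counting Guttman errors for a pair of dichotomous items can be sketched as follows. As an assumption of this sketch, intensity is inferred from popularity: the less frequently endorsed item is treated as the more intense one. The function name and data are hypothetical.

```python
def guttman_errors(data, i, j):
    """Count respondents who endorse the more intense item of the pair
    but not the less intense one."""
    pop_i = sum(row[i] for row in data)
    pop_j = sum(row[j] for row in data)
    # The item endorsed by fewer people is taken to be the more intense
    hard, easy = (i, j) if pop_i <= pop_j else (j, i)
    return sum(1 for row in data if row[hard] == 1 and row[easy] == 0)

# Hypothetical responses: column 0 is the less intense item
toy = [[0, 0], [1, 0], [1, 0], [1, 1], [0, 1]]
errors = guttman_errors(toy, 0, 1)
print(errors)
```

Only the last respondent, who endorses the more intense item without the less intense one, counts as a Guttman error.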

Page 32: How does health psychology measure up

• This asymmetrical relationship between item pairs can be summarised with Loevinger’s H

– H is the coefficient of homogeneity between two items i and j

• Ranges from 0.0 to 1.0

– 0.0 indicates no association between items

– 1.0 indicates perfect association, given the differences in item intensity

– 1.0 also indicates no Guttman errors

• Mokken (1971) developed H for scale development

– Hij : Homogeneity of pair of items

– Hi : Homogeneity of item i with all items

– H : Homogeneity of scale
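For dichotomous items, these coefficients are commonly computed as 1 minus the ratio of observed Guttman errors to the number expected if the items were independent. The sketch below follows that definition; the function names and toy data are assumptions for illustration, and it presumes every item is endorsed by some but not all respondents (otherwise the expected error count is zero).

```python
from itertools import combinations

def errors_observed_expected(data, i, j):
    """Observed and chance-expected Guttman errors for items i and j."""
    n = len(data)
    pi = sum(r[i] for r in data) / n
    pj = sum(r[j] for r in data) / n
    # The less popular item is treated as the more intense one
    hard, easy = (i, j) if pi <= pj else (j, i)
    observed = sum(1 for r in data if r[hard] == 1 and r[easy] == 0)
    expected = n * min(pi, pj) * (1 - max(pi, pj))
    return observed, expected

def h_pair(data, i, j):
    """Hij: homogeneity of a pair of items."""
    f, e = errors_observed_expected(data, i, j)
    return 1 - f / e

def h_item(data, i):
    """Hi: homogeneity of item i with all other items."""
    k = len(data[0])
    pairs = [errors_observed_expected(data, i, j) for j in range(k) if j != i]
    return 1 - sum(f for f, _ in pairs) / sum(e for _, e in pairs)

def h_scale(data):
    """H: homogeneity of the whole scale."""
    k = len(data[0])
    pairs = [errors_observed_expected(data, i, j)
             for i, j in combinations(range(k), 2)]
    return 1 - sum(f for f, _ in pairs) / sum(e for _, e in pairs)

# Hypothetical perfect Guttman pattern across three items
perfect = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
# Hypothetical pair of statistically independent items
independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(h_scale(perfect), h_pair(independent, 0, 1))
```

The perfect pattern gives H = 1.0 (no Guttman errors), and the independent pair gives Hij = 0.0.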

Page 33: How does health psychology measure up

• All Hij > 0

• Start with item pair with highest Hij

• Select third item to maximise scale H

• Proceed until H reaches threshold value c

• Produces a unidimensional scale

– c = 0.3; weak scale

– c = 0.4; medium scale

– c = 0.5; strong scale

– c = 1.0; perfect Guttman scale
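The steps above can be sketched as a greedy search. This is a simplified illustration under stated assumptions, not the full Mokken procedure: it omits the significance tests and the per-item Hi checks that dedicated software applies, and all function names and data are hypothetical.

```python
from itertools import combinations

def _err(data, i, j):
    """Observed and chance-expected Guttman errors for one item pair."""
    n = len(data)
    pi, pj = (sum(r[x] for r in data) / n for x in (i, j))
    hard, easy = (i, j) if pi <= pj else (j, i)
    observed = sum(1 for r in data if r[hard] == 1 and r[easy] == 0)
    return observed, n * min(pi, pj) * (1 - max(pi, pj))

def scale_h(data, items):
    """Scale H for the given item indices (pairwise Hij when two are given)."""
    pairs = [_err(data, i, j) for i, j in combinations(items, 2)]
    return 1 - sum(f for f, _ in pairs) / sum(e for _, e in pairs)

def select_items(data, c=0.3):
    k = len(data[0])
    # Start with the item pair that has the highest Hij
    scale = list(max(combinations(range(k), 2),
                     key=lambda p: scale_h(data, p)))
    if scale_h(data, scale) < c:
        return []
    while True:
        # Candidates must have positive Hij with every item already selected
        candidates = [x for x in range(k) if x not in scale
                      and all(scale_h(data, (x, y)) > 0 for y in scale)]
        if not candidates:
            return scale
        # Add the item that maximises scale H, unless H would fall below c
        best = max(candidates, key=lambda x: scale_h(data, scale + [x]))
        if scale_h(data, scale + [best]) < c:
            return scale
        scale.append(best)

# Hypothetical data: items 0-2 form a Guttman pattern, item 3 is unrelated
toy = [
    [0, 0, 0, 1], [0, 0, 0, 0],
    [1, 0, 0, 1], [1, 0, 0, 0],
    [1, 1, 0, 1], [1, 1, 0, 0],
    [1, 1, 1, 1], [1, 1, 1, 0],
]
selected = select_items(toy, c=0.3)
print(sorted(selected))
```

In this toy data the three Guttman-patterned items are selected and the unrelated item is left out because its Hij with the selected items is not positive.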

Page 34: How does health psychology measure up

Results for GHQ-12

Step Item Scale H

1 p6d 0.79

1 n4d 0.79

2 n6d 0.73

3 n5d 0.68

4 n2d 0.64

5 n3d 0.61

6 p5d 0.59

7 p3d 0.57

8 p4d 0.55

9 n1d 0.53

10 p2d 0.51

11 p1d 0.50

• => the items of the GHQ-12 form a strong unidimensional scale

Page 35: How does health psychology measure up

Monotone homogeneity model: GHQ-12

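Item H #vi maxvi zmax #zsig

(#vi: number of violations of monotonicity; maxvi: largest violation; zmax: largest z statistic for a violation; #zsig: number of significant violations)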

p1d 0.44 0 0.00 0.00 0

n1d 0.45 0 0.00 0.00 0

p2d 0.43 1 0.06 0.99 0

p3d 0.50 0 0.00 0.00 0

n2d 0.55 0 0.00 0.00 0

n3d 0.51 0 0.00 0.00 0

p4d 0.47 0 0.00 0.00 0

p5d 0.50 1 0.05 0.90 0

n4d 0.56 0 0.00 0.00 0

n5d 0.50 0 0.00 0.00 0

n6d 0.56 1 0.05 0.93 0

p6d 0.53 1 0.04 0.68 0

• Small deviations from MH model but none significant

Page 36: How does health psychology measure up
Page 37: How does health psychology measure up
Page 38: How does health psychology measure up

Conclusion

• The GHQ-12 is a strongly homogeneous unidimensional scale

• Small deviations from monotone homogeneity, none significant

• The GHQ-12 summed score can rank order people by the measured attribute

• i.e. it can serve as an ordinal measure of severity of psychiatric impairment

• Compare to results of EFA/CFA studies

Page 39: How does health psychology measure up

Example: Northwick Park dependency scale

• Item selection from pool of 16 items

Item Scale H

Q8 0.93

Q5 0.93

Q9 0.93

Q2 0.91

Q1 0.88

Q13 0.87

Q7 0.84

Q12 0.82

Q6 0.79

Q14 0.76

Q4 0.74

Q3 0.70

Q11 0.67

Q15 0.62

• 14 items form unidimensional scale

Page 40: How does health psychology measure up

• Two items with serious violations of monotone homogeneity

Item H #vi maxvi zmax #zsig

Q3 0.45 6 0.25 2.88 4

Q11 0.32 5 0.28 3.43 2

Q3: help required using toilet (urination)

Q11: help required with drinking

Page 41: How does health psychology measure up
Page 42: How does health psychology measure up

• These items decrease in probability at the top end of the scale

• With extreme dependency, patients require less help with drinking and emptying bladder

– Because at this extreme, they are more likely to be tube-fed and catheterised

• Hence, for these items, probability of endorsement decreases as dependency increases

– Scale is not monotone homogeneous

• The summed score will not rank order people on the measured attribute