Survey Methodology Reliability & Validity


Page 1: Survey Methodology Reliability  &  Validity

Survey Methodology Reliability & Validity


Reference

The majority of this lecture was taken from How to Measure Survey Reliability & Validity by Mark Litwin, Sage Publications, 1995.


Lecture objectives

• To review the definitions of reliability and validity

• To review methods of evaluating reliability and validity in survey research


Reliability


Definition

• The degree of stability exhibited when a measurement is repeated under identical conditions.

• Lack of reliability may arise from divergences between observers or instruments of measurement, or from instability of the attribute being measured (from Last, Dictionary of Epidemiology).


Assessment of reliability

Reliability is assessed in 3 forms:
  – Test-retest reliability
  – Alternate-form reliability
  – Internal consistency reliability


Test-retest reliability

• Most common form in surveys

• Measured by having the same respondents complete a survey at two different points in time to see how stable the responses are.

• Usually quantified with a correlation coefficient (r value).

• In general, r values are considered good if r ≥ 0.70.
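As a rough illustration of the r value described here, the sketch below computes a Pearson correlation between two administrations of the same scale. The scores are made up for illustration, and the helper name `pearson_r` is an assumption, not something from the lecture.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Undefined (division by zero) if either list has no spread at all.
    return sxy / sqrt(sxx * syy)


# Hypothetical scale scores for the same six respondents at two time points.
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 10]

r = pearson_r(time1, time2)
acceptable = r >= 0.70  # rule of thumb from the slide
print(f"test-retest r = {r:.3f}, acceptable = {acceptable}")
```

The same calculation applies to intra-observer reliability when one observer records the measurement twice.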


Test-retest reliability

• If data are recorded by an observer, you can have the same observer make two separate measurements.

• The comparison between the two measurements is intra-observer reliability.

• What does a difference mean?


Test-retest reliability

• You can test-retest specific questions or the entire survey instrument.

• Be careful about test-retest with items or scales that measure variables likely to change over a short period of time, such as energy, pain, happiness, anxiety.

• If you do it, make sure that you test-retest over very short periods of time.


Test-retest reliability

• A potential problem with test-retest is the practice effect
  – Individuals become familiar with the items and simply answer based on their memory of the last answer

• What effect does this have on your reliability estimates?

• It inflates the reliability estimate.


Alternate-form reliability

• Use differently worded forms to measure the same attribute.

• Questions or responses are reworded or their order is changed to produce two items that are similar but not identical.


Alternate-form reliability

• Be sure that the two items address the same aspect of behavior with the same vocabulary and the same level of difficulty
  – Items should differ in wording only

• It is common to simply change the order of the response alternatives
  – This forces respondents to read the response alternatives carefully and thus reduces practice effect


Example: Assessment of Depression

Version A: During the past 4 weeks, I have felt downhearted:

  Every day   1
  Some days   2
  Never       3

Version B: During the past 4 weeks, I have felt downhearted:

  Never       1
  Some days   2
  Every day   3

Notice the change in the ordinal scaling of the choices.


Alternate-form reliability

• You could also change the wording of the response alternatives without changing the meaning!


Example: Assessment of urinary function

Version A: During the past week, how often did you usually empty your bladder?

  1 to 2 times per day
  3 to 4 times per day
  5 to 8 times per day
  12 times per day
  More than 12 times per day

Version B: During the past week, how often did you usually empty your bladder?

  Every 12 to 24 hours
  Every 6 to 8 hours
  Every 3 to 5 hours
  Every 2 hours
  More than every 2 hours


Alternate-form reliability

• You could also change the actual wording of the question
  – Be careful to make sure that the two items are equivalent
  – Items with different degrees of difficulty do not measure the same attribute
  – What might they measure?
    • Reading comprehension or cognitive function


Example: Assessment of Loneliness

Version A: How often in the past month have you felt alone in the world?

  Every day
  Some days
  Occasionally
  Never

Version B: During the past 4 weeks, how often have you felt a sense of loneliness?

  All of the time
  Sometimes
  From time to time
  Never


Example of nonequivalent item rewording

Version A: When your boss blames you for something you did not do, how often do you stick up for yourself?

  All the time
  Some of the time
  None of the time

Version B: When presented with difficult professional situations where a superior censures you for an act for which you are not responsible, how frequently do you respond in an assertive way?

  All of the time
  Some of the time
  None of the time


Alternate-form reliability

• You can measure alternate-form reliability at the same timepoint or separate timepoints.

• Another method is to split the test in two, with the scores for each half of the test being compared with the other.
  – This is called a split-halves method
  – You could also split into thirds and administer three forms of the item, etc.
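A minimal sketch of the split-halves idea, assuming the test is split into odd-numbered and even-numbered items (the slide does not say how to split) and that the two half-scores are compared with a correlation coefficient. The response data and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_r(responses):
    """Correlate each respondent's total on odd items with their total on even items.

    responses: one row per respondent, one column per item.
    """
    odd_totals = [sum(row[0::2]) for row in responses]   # items 1, 3, ...
    even_totals = [sum(row[1::2]) for row in responses]  # items 2, 4, ...
    return pearson_r(odd_totals, even_totals)


# Hypothetical answers from five respondents to a four-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 3],
    [1, 2, 2, 1],
    [3, 3, 4, 4],
]

half_r = split_half_r(responses)
print(f"split-half r = {half_r:.3f}")
```

Other ways of forming the halves (first half versus second half, random assignment) would use the same comparison with a different slicing rule.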


Internal consistency reliability

• Applied not to one item, but to groups of items that are thought to measure different aspects of the same concept.

• Cronbach’s alpha (α), not to be confused with Type I error
  – Measures internal consistency reliability among a group of items combined to form a single scale
  – It is a reflection of how well the different items complement each other in their measurement of different aspects of the same variable or quality
  – Interpret like a correlation coefficient: α ≥ 0.70 is good.


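Cronbach’s alpha (α)

Let k be the number of items, s_i^2 the variance of the scores on item i, and s^2 the variance of the total “test” scores; then,

\[
\alpha = \left[ 1 - \frac{\sum_{i=1}^{k} s_i^2}{s^2} \right] \left( \frac{k}{k-1} \right)
\]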


Cronbach’s alpha (α)

• The variance of the “test” scores is the most important part of Cronbach’s α.

• The larger the test-score variance s^2, the smaller the ratio of the summed item variances to s^2, which is then subtracted from 1, giving a large α.
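The calculation can be sketched in code, assuming the usual form of Cronbach's alpha, α = (k/(k−1))·(1 − Σ s_i²/s²), with sample variances (n − 1 denominator) used for both the items and the total. The response matrix below is hypothetical and the function name `cronbach_alpha` is chosen for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of numeric scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item (column) across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Variance of each respondent's summed "test" score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical answers from five respondents to a four-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 3],
    [1, 2, 2, 1],
    [3, 3, 4, 4],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

In this made-up data the total-score variance (18.7) is large relative to the summed item variances (5.7), which is exactly the situation the slide describes as producing a large α.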


Cronbach’s alpha (α)

• High alpha is good and high alpha is caused by high “test” variance.

• But why is high test variance good?
  – High variance means you have a wide spread of scores, which means subjects are easier to differentiate.
  – If a test has a low variance, the scores for the subjects are close together. Unless the subjects truly are close in their “ability”, the test is not useful.


McMaster’s Family Assessment Device

Response scale for every question: Strongly Agree = 1, Agree = 2, Disagree = 3, Strongly Disagree = 4

Question

1. Planning family activities is difficult because we misunderstand each other.
2. In times of crisis we can turn to each other for support.
3. We cannot talk to each other about the sadness we feel.
4. Individuals are accepted for what they are.
5. We avoid discussing our fears and concerns.
6. We can express feelings to each other.
7. There are lots of bad feelings in the family.
8. We feel accepted for what we are.
9. Making decisions is a problem in our family.
10. We are able to make decisions about how to solve problems.
11. We do not get along well with each other.
12. We confide in each other.

The odd-numbered questions describe negative traits of family dynamics, so for the purpose of computing Cronbach’s α we need to reverse their scaling so that 1 is strongly disagree and 4 is strongly agree.
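The recoding step can be sketched as follows, assuming raw answers are stored as codes 1 to 4 keyed by question number. On a 1-to-4 scale, reversing a code amounts to replacing x with 5 − x. The names `reverse_score` and `recode` are illustrative only.

```python
# Odd-numbered questions are the negatively worded items on this 12-item form.
NEGATIVE_ITEMS = {1, 3, 5, 7, 9, 11}


def reverse_score(value, low=1, high=4):
    """Flip a code on a low..high scale, e.g. 1 <-> 4 and 2 <-> 3."""
    if not low <= value <= high:
        raise ValueError(f"code {value} is outside {low}..{high}")
    return low + high - value


def recode(answers):
    """Reverse the negatively worded items; leave the others unchanged.

    answers: dict mapping question number -> raw code (1..4).
    """
    return {
        q: reverse_score(v) if q in NEGATIVE_ITEMS else v
        for q, v in answers.items()
    }


# Hypothetical raw answers for a few questions from one respondent.
raw = {1: 1, 2: 1, 3: 4, 12: 2}
recoded = recode(raw)
print(recoded)
```

After recoding, a higher code on a negative item and a lower code on a positive item no longer pull the summed score in opposite directions, which is what the alpha calculation requires.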


McMaster’s Family Assessment Device

All items on the survey are positively correlated, but again it is important to note that the negative traits of family dynamics were recoded so they would be positively correlated with the good family dynamic traits.


McMaster’s Family Assessment Device

The McMaster’s Family Assessment Device has a very high degree of reliability using Cronbach’s α, α = .91. We also see a Cronbach’s α for each question. What do these tell us?


McMaster’s Family Assessment Device

What makes a question “good” or “bad”? This is usually measured by looking at how Cronbach’s α would change if the question were removed from the survey. Here we can see that no one question in the McMaster’s instrument results in a large change in the overall α if it were removed. Removing Question 12 results in the largest change in α, .9100 → .8977, so we might consider it the “best”.

If a question’s deletion gives a higher overall α, then it could/should be removed from the survey.
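The "alpha if item deleted" check can be sketched like this, assuming the usual form of Cronbach's alpha with sample variances. The data are hypothetical (not the McMaster's results), and the function names are chosen for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of numeric scores."""
    k = len(responses[0])
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def alpha_if_deleted(responses):
    """Recompute alpha with each item left out in turn.

    Returns a dict mapping 1-based item number -> alpha without that item.
    """
    k = len(responses[0])
    result = {}
    for drop in range(k):
        reduced = [row[:drop] + row[drop + 1:] for row in responses]
        result[drop + 1] = cronbach_alpha(reduced)
    return result


# Hypothetical answers from five respondents to a four-item scale.
responses = [
    [3, 4, 3, 4],
    [2, 2, 1, 2],
    [4, 4, 4, 3],
    [1, 2, 2, 1],
    [3, 3, 4, 4],
]

overall = cronbach_alpha(responses)
without = alpha_if_deleted(responses)
for item, a in without.items():
    flag = "candidate for removal" if a > overall else "keep"
    print(f"item {item}: alpha without it = {a:.4f} ({flag})")
```

An item whose removal lowers α the most contributes the most to the scale's consistency; an item whose removal raises α is the kind the slide suggests dropping.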


Calculation of Cronbach’s Alpha (α) with Dichotomous Question Items

Example: Assessment of Emotional Health

During the past month:                                                Yes   No
Have you been a very nervous person?                                   1     0
Have you felt downhearted and blue?                                    1     0
Have you felt so down in the dumps that nothing could cheer you up?    1     0

Note: Each question is dichotomous (Y/N) or (T/F) coded as 1 for Yes and 0 for No.


Hypothetical Survey Results

Patient               Item 1    Item 2    Item 3    Summed scale score
1                     0         1         1         2
2                     1         1         1         3
3                     0         0         0         0
4                     1         1         1         3
5                     1         1         0         2
Percentage positive   3/5=.6    4/5=.8    3/5=.6


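Calculations

Mean score = 2

Sample variance:

\[
s^2 = \frac{(2-2)^2 + (3-2)^2 + (0-2)^2 + (3-2)^2 + (2-2)^2}{5-1} = 1.5
\]

\[
\alpha = \left[ 1 - \frac{\sum_i (\%\ \text{positive})_i \, (\%\ \text{negative})_i}{s^2} \right] \left( \frac{k}{k-1} \right)
\]

\[
\alpha = \left[ 1 - \frac{(.6)(.4) + (.8)(.2) + (.6)(.4)}{1.5} \right] \left( \frac{3}{2} \right) = 0.86
\]

We conclude that this scale has good reliability.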
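The worked example can be checked in a few lines. This sketch follows the slide's formula as written: the item term is (proportion positive) × (proportion negative), and s² is the sample variance (n − 1 denominator) of the summed scale scores. Variable names are illustrative.

```python
from statistics import mean, variance

# Hypothetical survey results from the slide: one row per patient, items 1-3.
scores = [
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 1],
    [1, 1, 0],
]

k = len(scores[0])                       # number of items
summed = [sum(row) for row in scores]    # summed scale score per patient
mean_score = mean(summed)
s2 = variance(summed)                    # sample variance of summed scores

# Proportion answering Yes (coded 1) for each item.
p_positive = [mean(col) for col in zip(*scores)]
item_term = sum(p * (1 - p) for p in p_positive)

alpha = (1 - item_term / s2) * (k / (k - 1))
print(f"mean = {mean_score}, s2 = {s2}, alpha = {alpha:.2f}")
```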


Internal consistency reliability

• If internal consistency is low you can add more items or re-examine existing items for clarity


Interobserver reliability

• How well two evaluators agree in their assessment of a variable

• Use correlation coefficient to compare data between observers

• May be used as property of the test or as an outcome variable.

• Cohen’s κ
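The slide only names Cohen's κ, so the following is a sketch under the standard definition, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal frequencies. The ratings are made up and the function name is an assumption.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length rating lists")
    n = len(rater_a)
    # Observed agreement: share of subjects both raters put in the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of the raters' marginal proportions.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in categories) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no judgments by two observers on the same ten subjects.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.3f}")
```

Here the observers agree on 8 of 10 subjects, but because about half of that agreement would be expected by chance, κ comes out well below 0.8.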


Validity


Definition

• How well a survey measures what it sets out to measure.

• Mishel Uncertainty of Illness Survey (MUIS) measures uncertainty associated with illness.

• McMaster’s Family Assessment Device measures family “functioning”.


Assessment of validity

• Validity is measured in four forms
  – Face validity
  – Content validity
  – Criterion validity
  – Construct validity


Face validity

• Cursory review of survey items by untrained judges
  – Ex: Showing the survey to untrained individuals to see whether they think the items look okay
  – Very casual, soft
  – Many don’t really consider this as a measure of validity at all


Content validity

• Subjective measure of how appropriate the items seem to a set of reviewers who have some knowledge of the subject matter.
  – Usually consists of an organized review of the survey’s contents to ensure that it contains everything it should and doesn’t include anything that it shouldn’t
  – Still very qualitative


Content validity

• Who might you include as reviewers?

• How would you incorporate these two assessments of validity (face and content) into your survey instrument design process?


Criterion validity

• Measure of how well one instrument stacks up against another instrument or predictor
  – Concurrent: assess your instrument against a “gold standard”
  – Predictive: assess the ability of your instrument to forecast future events, behavior, attitudes, or outcomes.
  – Assess with correlation coefficient


Construct validity

• Most valuable and most difficult measure of validity.

• Basically, it is a measure of how meaningful the scale or instrument is when it is in practical use.


Construct validity

• Convergent: Implies that several different methods for obtaining the same information about a given trait or concept produce similar results
  – Evaluation is analogous to alternate-form reliability except that it is more theoretical and requires a great deal of work, usually by multiple investigators with different approaches.


Construct validity

• Divergent: The ability of a measure to estimate the underlying truth in a given area; the measure must be shown not to correlate too closely with similar but distinct concepts or traits.


Summary

• Reliability refers to the consistency of the results of a survey. High reliability is important, but only if the test is also valid.

• For example, a bathroom scale that consistently measures your weight but in reality is 10 lbs. off your actual weight is useless (but possibly flattering).


Summary

• Validity refers to whether or not the instrument measures what it is supposed to measure.

• Much harder to establish, and requires scrutinizing the instrument in a number of ways.