
Page 1: Validity and reliability

Validity and Reliability in Research

Page 2: Validity and reliability

Agenda

At the end of this lesson, you should be able to:

1. Discuss validity
2. Discuss reliability
3. Discuss validity in qualitative research
4. Discuss validity in experimental design
5. Discuss how to achieve validity and reliability

Page 3: Validity and reliability

The consistency of scores or answers from one administration of an instrument to another, or from one set of items to another.

A reliable instrument yields similar results if given to a similar population at different times.

Reliability

Page 4: Validity and reliability

Appropriateness, meaningfulness, correctness, and usefulness of inferences a researcher makes.

Validity of what? The instrument? The data?

Validity

Page 5: Validity and reliability

Validity

• Internal validity is the extent to which research findings are free from bias and the effects of extraneous factors.
• External validity is the extent to which the findings can be generalised.

Page 6: Validity and reliability

• Content-related evidence of validity focuses on the content and format of an instrument.
• Is it appropriate?
• Is it comprehensive?
• Is it logical?
• How well do the items or questions represent the content?
• Is the format appropriate?

Validity - Content-related evidence

Page 7: Validity and reliability

This refers to the relationship between the scores obtained using the instrument and the scores obtained using one or more other instruments or measures. For example, are students’ scores on teacher-made tests consistent with their scores on standardized tests in the same subject areas?

Validity - Criterion-related evidence

Page 8: Validity and reliability

Construct validity is defined as “establishing correct operational measures for the concepts being studied” (Yin, 1984).

For example, if one is looking at problem solving in leaders, how well does a particular instrument capture the relationship between the ability to solve problems and effectiveness as a leader?

Validity - Construct-related evidence

Page 9: Validity and reliability

ATTAINING VALIDITY AND RELIABILITY

Page 10: Validity and reliability

Adequacy: the size and scope of the questions must be large enough to cover the topic.

Format of the instrument: Clarity of printing, type size, adequacy of work area, appropriateness of language, clarity of directions, etc.

Elements of content-related evidence

Page 11: Validity and reliability

Consult other experts who rate the items.
Rate the items, eliminating or changing those that do not meet the specified content.
Repeat until all raters agree on the questions and answers.

How to achieve content validity

Page 12: Validity and reliability

To obtain criterion-related validity, researchers identify a characteristic, assess it using one instrument (e.g., IQ test) and compare the score with performance on an external measure, such as GPA or an achievement test.

Criterion-related validity

Page 13: Validity and reliability

A validity coefficient is obtained by correlating a set of scores on one test (a predictor) with a set of scores on another (the criterion).

The degree to which the predictor and the criterion relate is the validity coefficient. A predictor that has a strong relationship to a criterion test would have a high coefficient.

Validity coefficient
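To make the idea concrete, here is a minimal sketch (in Python, with invented scores and a hypothetical helper function) that computes a validity coefficient as the Pearson correlation between a predictor, such as an aptitude test, and a criterion, such as GPA:

```python
# Hypothetical illustration: the validity coefficient is the Pearson
# correlation between predictor scores and criterion scores.
# All data values are invented for demonstration only.
from statistics import mean, stdev

predictor = [52, 61, 70, 58, 66, 75, 49, 68]            # aptitude-test scores
criterion = [2.6, 3.0, 3.4, 2.8, 3.2, 3.7, 2.4, 3.3]    # GPA for the same students

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (stdev(x) * stdev(y))

print(f"Validity coefficient: {pearson_r(predictor, criterion):.2f}")
```

A coefficient near 1 indicates that the predictor relates strongly to the criterion; a coefficient near 0 indicates little relationship.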

Page 14: Validity and reliability

This type of validity is more typically associated with research studies than testing.

It relates to psychological traits, so multiple sources are used to collect evidence. Oftentimes a combination of observation, surveys, focus groups, and other measures is used to identify how much of the trait being measured is possessed by the person being observed.

Construct-related validity

Example of such a construct: proactive coping skills.

Page 15: Validity and reliability

The consistency of scores obtained from one instrument to another, or from the same instrument over different groups.

Reliability

Page 16: Validity and reliability

Every test or instrument has errors of measurement associated with it.

These can be due to a number of things: testing conditions, student health or motivation, test anxiety, etc.

Test developers work hard to ensure that measurement errors are not grounded in flaws in the test itself.

Errors of measurement

Page 17: Validity and reliability

• Test-retest: the same test is given to the same group at different times.
• Equivalent-forms: a different form of the same instrument is given to the same group of individuals.
• Internal consistency: split-half procedure.
• Kuder-Richardson: mathematically computes reliability from the number of items, the mean, and the standard deviation of the test (see the sketch below).

Reliability Methods
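As one concrete illustration of the Kuder-Richardson approach, the sketch below applies the KR-21 formula, which needs only the number of items, the mean, and the variance of total scores. The scores are invented, and using KR-21 (rather than the item-level KR-20) is an assumption made here for simplicity.

```python
# Illustrative sketch of Kuder-Richardson formula 21 (KR-21).
# It estimates reliability from the number of items (k), the mean total
# score (M), and the variance of total scores (s2). Scores are invented.
# KR-21 assumes items of roughly equal difficulty; KR-20 relaxes this
# assumption but requires item-level data.
from statistics import mean, variance

k = 40                                                    # number of test items
total_scores = [28, 31, 22, 35, 30, 26, 33, 24, 29, 32]   # one total score per student

M = mean(total_scores)
s2 = variance(total_scores)

kr21 = (k / (k - 1)) * (1 - (M * (k - M)) / (k * s2))
print(f"KR-21 reliability estimate: {kr21:.2f}")
```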

Page 18: Validity and reliability

• Reliability coefficient: a number that tells us how consistent an instrument is likely to be over repeated administrations.
• Alpha (Cronbach’s alpha): used on instruments where answers aren’t scored “right” or “wrong”. It is often used to test the reliability of survey instruments (see the sketch below).

Reliability coefficient
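A minimal sketch of how Cronbach’s alpha can be computed, assuming a small invented set of Likert-type survey responses (one row per respondent, one column per item):

```python
# Illustrative computation of Cronbach's alpha: compare the sum of the
# item variances with the variance of the total scores.
# The responses are invented for demonstration.
from statistics import variance

responses = [
    [4, 5, 4, 3],
    [3, 4, 3, 3],
    [5, 5, 4, 4],
    [2, 3, 2, 2],
    [4, 4, 5, 4],
    [3, 3, 3, 2],
]

k = len(responses[0])                                    # number of items
item_vars = [variance(col) for col in zip(*responses)]   # variance of each item
total_var = variance([sum(row) for row in responses])    # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")
```

Values closer to 1 indicate greater internal consistency; many texts treat roughly 0.70 or higher as acceptable for survey instruments.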

Page 19: Validity and reliability

This is a calculation that shows the extent to which a measured score would vary if the measurement were repeated under changed circumstances. In other words, it tells you how much an observed score is expected to vary because of measurement error.

Standard error of the measurement
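A common way to estimate it (sketched below with assumed numbers) combines the standard deviation of the observed scores with the test’s reliability coefficient:

```python
# Illustrative standard error of measurement (SEM).
# sd_scores and reliability are assumed values for demonstration.
import math

sd_scores = 10.0      # standard deviation of observed test scores
reliability = 0.84    # reliability coefficient of the test

sem = sd_scores * math.sqrt(1 - reliability)
print(f"Standard error of measurement: {sem:.1f} points")
# Roughly two-thirds of the time, an observed score is expected to fall
# within one SEM (here, 4 points) of the test taker's true score.
```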

Page 20: Validity and reliability

INTERNAL VALIDITY

Page 21: Validity and reliability

Validity

Validity can be used in three ways:
• Instrument or measurement validity
• External or generalization validity
• Internal validity, which means that the relationship a researcher observes between two variables should be clear in its meaning, rather than due to something unclear (“something else”)

Page 22: Validity and reliability

Any one (or more) of these conditions:
• Age or ability of subjects
• Conditions under which the study was conducted
• Type of materials used in the study
Technically, the “something else” is called a threat to internal validity.

What is “something else”?

Page 23: Validity and reliability

• Subject characteristics
• Loss of subjects
• Location
• Instrumentation
• Testing
• History
• Maturation
• Attitude of subjects
• Implementation

Threats to internal validity

Page 24: Validity and reliability

Subject characteristics

Subject characteristics can pose a threat if there is selection bias, or if there are unintended factors present within or among the groups selected for a study. For example, in group studies, members may differ on the basis of age, gender, ability, socioeconomic background, etc. These factors must be controlled for in order to ensure that the key variables in the study, not these, explain any differences.

Page 25: Validity and reliability

Subject characteristics include: age, intelligence, strength, vocabulary, maturity, reading ability, gender, fluency, ethnicity, manual dexterity, coordination, socioeconomic status, speed, and religious/political beliefs.

Page 26: Validity and reliability

Loss of subjects (mortality)

Loss of subjects limits generalizability, but it can also affect internal validity if the subjects who don’t respond or participate are overrepresented in a group.

Page 27: Validity and reliability

Location

The place where data collection occurs (the “location”) might pose a threat. For example, hot, noisy, unpleasant conditions might affect test scores; situations where privacy is important for the results, but where people are streaming in and out of the room, might also pose a threat.

Page 28: Validity and reliability

• Decay: If the nature of the instrument or the scoring procedure is changed in some way, instrument decay occurs.

• Data Collector Characteristics: The person collecting data can affect the outcome.

• Data Collector Bias: The data collector might hold an opinion that is at odds with respondents and it affects the administration.

Instrumentation

Page 29: Validity and reliability

• In longitudinal studies, data are often collected through more than one administration of a test.

• If the previous test influences subsequent ones by getting the subject to engage in learning or some other behavior that he or she might not otherwise have done, there is a testing threat.

Testing

Page 30: Validity and reliability

• If an unanticipated or unplanned event occurs during a study or intervention, there might be a history threat.

History

Page 31: Validity and reliability

• Sometimes the very fact of being studied influences subjects. The best known example of this is the Hawthorne Effect.

Attitude of subjects

Page 32: Validity and reliability

• This threat can be caused by various things; different data collectors, teachers, conditions in treatment, method bias, etc.

Implementation

Page 33: Validity and reliability

• Standardize the conditions of the study
• Obtain more information on subjects
• Obtain as much information as possible on the details of the study: location, history, instrumentation, subject attitude, implementation
• Choose an appropriate design
• Train data collectors

Minimizing Threats

Page 34: Validity and reliability

Qualitative Research: Validity and reliability?

Page 35: Validity and reliability

• Many qualitative researchers contend that validity and reliability are irrelevant to their work because they study one phenomenon and don’t seek to generalize

• Fraenkel and Wallen - any instrument or design used to collect data should be credible and backed by evidence consistent with quantitative studies.

• Trustworthiness

Qualitative research

Page 36: Validity and reliability

Quantitative vs. Qualitative

Traditional criteria for judging quantitative research, and the alternative criteria for judging qualitative research:

• Internal validity → Credibility
• External validity → Transferability
• Reliability → Dependability
• Objectivity → Confirmability

Page 37: Validity and reliability

In qualitative research
• Reliability pertains to the extent to which the study is replicable and to how accurately the research methods and techniques produce data.

• Objectivity of the researcher: the researcher must examine her biases and preconceived notions of what she will find before she begins her research.

• Objectivity of the interviewee

Page 38: Validity and reliability

• Triangulation
• Member check
• Audit trail

In qualitative research

Page 39: Validity and reliability

Let’s look at one particular design: validity in experimental research

Page 40: Validity and reliability

Experimental Designs Should be Developed to Ensure Internal and External Validity of the Study

Page 41: Validity and reliability

Internal Validity:

• Are the results of the study (DV) caused by the factors included in the study (IV) or are they caused by other factors (EV) which were not part of the study?

Page 42: Validity and reliability

(Selection Bias/Differential Selection) -- The groups may have been different from the start. If you were testing instructional strategies to improve reading and one group enjoyed reading more than the other group, they may improve more in their reading because they enjoy it, rather than because of the instructional strategy you used.

Subject Characteristics

Threats to Internal Validity

Page 43: Validity and reliability

(Mortality) -- All of the high- or low-scoring subjects may have dropped out or been missing from one of the groups. If we collected posttest data on a day when the debate society was on a field trip, the mean for the treatment group would probably be much lower than it really should have been.

Loss of Subjects

Threats to Internal Validity

Page 44: Validity and reliability

Perhaps one group was at a disadvantage because of its location. The city may have been demolishing a building next to one of the schools in our study, creating constant distractions that interfered with our treatment.

Location

Threats to Internal Validity

Page 45: Validity and reliability

The testing instruments may not be scored similarly. Perhaps the person grading the posttest is fatigued and pays less attention to the last set of papers reviewed. It may be that those papers are from one of our groups and will receive different scores than the earlier group's papers.

Threats to Internal Validity

Instrumentation Instrument Decay

Page 46: Validity and reliability

The subjects of one group may react differently to the data collector than the other group. A male interviewing males and females about their attitudes toward a type of math instruction may not receive the same responses from females as a female interviewing females would.

Threats to Internal Validity

Data Collector Characteristics

Page 47: Validity and reliability

The person collecting data may favor one group, or some characteristic some subjects possess, over another. A principal who favors strict classroom management may rate students' attention under different teaching conditions with a bias toward one of the teaching conditions.

Threats to Internal Validity

Data Collector Bias

Page 48: Validity and reliability

The act of taking a pretest or posttest may influence the results of the experiment. Suppose we were conducting a unit to increase student sensitivity to racial prejudice. As a pretest we have the control and treatment groups watch a movie on racism and write a reaction essay.

The pretest may have actually increased both groups' sensitivity, and we find that our treatment group didn't score any higher on a posttest given later than the control group did. If we hadn't given the pretest, we might have seen differences in the groups at the end of the study.

Threats to Internal Validity

Testing

Page 49: Validity and reliability

Something may happen at one site during our study that influences the results. Perhaps a classmate was injured in a car accident at the control site for a study teaching children bike safety. The control group may actually demonstrate more concern about bike safety than the treatment group.

Threats to Internal Validity

History

Page 50: Validity and reliability

There may be natural changes in the subjects that can account for the changes found in a study. A critical thinking unit may appear more effective if it is taught during a time when children are developing abstract reasoning.

Threats to Internal Validity

Maturation

Page 51: Validity and reliability

The subjects may respond differently just because they are being studied. The name comes from a classic study in which researchers were studying the effect of lighting on worker productivity. As the intensity of the factory lights increased, so did the worker productivity. One researcher suggested that they reverse the treatment and lower the lights. The productivity of the workers continued to increase. It appears that being observed by the researchers was increasing productivity, not the intensity of the lights.

Threats to Internal Validity

Hawthorne Effect

Page 52: Validity and reliability

One group may view that it is in competition with the other group and may work harder than they would under normal circumstances. This generally is applied to the control group "taking on" the treatment group.

Threats to Internal Validity

John Henry Effect

Page 53: Validity and reliability

The control group may become discouraged because it is not receiving the special attention that is given to the treatment group. They may perform lower than usual because of this.

Threats to Internal Validity

Resentful Demoralization of the Control Group

Page 54: Validity and reliability

(Statistical Regression) -- A class that scores particularly low can be expected to score slightly higher just by chance. Likewise, a class that scores particularly high will tend to score slightly lower by chance. The change in these scores may have nothing to do with the treatment.

Threats to Internal Validity

Regression

Page 55: Validity and reliability

The treatment may not be implemented as intended. A study where teachers are asked to use student modeling techniques may not show positive results, not because modeling techniques don't work, but because the teacher didn't implement them or didn't implement them as they were designed.

Threats to Internal Validity

Implementation

Page 56: Validity and reliability

Threats to Internal Validity

Compensatory Equalization of Treatment

Someone may feel sorry for the control group because they are not receiving much attention and give them special treatment. For example, a researcher could be studying the effect of laptop computers on students' attitudes toward math. The teacher feels sorry for the class that doesn't have computers and sponsors a popcorn party during math class. The control group begins to develop a more positive attitude about mathematics.

Page 57: Validity and reliability

Experimental Treatment Diffusion

Threats to Internal Validity

Sometimes the control group actually implements the treatment. If two different techniques are being tested in two different third grades in the same building, the teachers may share what they are doing. Unconsciously, the control teacher may use some of the techniques she or he learned from the treatment teacher.

Page 58: Validity and reliability

Once the researchers are confident that the outcome (dependent variable) of the experiment they are designing is the result of their treatment (independent variable) [internal validity], they determine for which people or situations the results of their study apply [external validity].

Page 59: Validity and reliability

External Validity:

• Are the results of the study generalizable to other populations and settings?

• Population
• Ecological

Page 60: Validity and reliability

Population validity is the extent to which the results of a study can be generalized from the specific sample that was studied to a larger group of subjects. It involves the extent to which one can generalize from the study sample to a defined population. If the sample is drawn from an accessible population, rather than the target population, generalizing the research results from the accessible population to the target population is risky.

Threats to External Validity (Population)

Page 61: Validity and reliability

Ecological Validity is the extent to which the results of an experiment can be generalized from the set of environmental conditions created by the researcher to other environmental conditions (settings and conditions).

Threats to External Validity (Ecological)

There are 10 common threats to external validity.

Page 62: Validity and reliability

(not sufficiently described for others to replicate) If the researcher fails to adequately describe how he or she conducted a study, it is difficult to determine whether the results are applicable to other settings.

Threats to External Validity (Ecological)

Explicit description of the experimental treatment

Page 63: Validity and reliability

(catalyst effect) If a researcher were to apply several treatments, it is difficult to determine how well each of the treatments would work individually. It might be that only the combination of the treatments is effective.

Threats to External Validity (Ecological)

Multiple-treatment interference

Page 64: Validity and reliability

(attention causes differences) Subjects perform differently because they know they are being studied. "...External validity of the experiment is jeopardized because the findings might not generalize to a situation in which researchers or others who were involved in the research are not present" (Gall, Borg, & Gall, 1996, p. 475).

Threats to External Validity (Ecological)

Hawthorne effect

Page 65: Validity and reliability

Threats to External Validity (Ecological)

(anything different makes a difference) A treatment may work because it is novel and the subjects respond to the uniqueness, rather than the actual treatment. The opposite may also occur: the treatment may not work because it is unique, but given time for the subjects to adjust to it, it might have worked.

Novelty and disruption effect

Page 66: Validity and reliability

(it only works with this experimenter) The treatment might have worked because of the person implementing it. Given a different person, the treatment might not work at all.

Threats to External Validity (Ecological)

Experimenter effect

Page 67: Validity and reliability

(pretest sets the stage) A treatment might only work if a pretest is given. Because they have taken a pretest, the subjects may be more sensitive to the treatment. Had they not taken a pretest, the treatment would not have worked.

Threats to External Validity (Ecological)

Pretest sensitization

Page 68: Validity and reliability

(posttest helps treatment "fall into place") The posttest can become a learning experience. "For example, the posttest might cause certain ideas presented during the treatment to 'fall into place'." If the subjects had not taken a posttest, the treatment would not have worked.

Threats to External Validity (Ecological)

Posttest sensitization

Page 69: Validity and reliability

Interaction of history and treatment effect

Threats to External Validity (Ecological)

(...to everything there is a time...) Not only should researchers be cautious about generalizing to other populations; caution should also be taken in generalizing to a different time period. As time passes, the conditions under which treatments work change.

Page 70: Validity and reliability

(maybe only works with M/C tests) A treatment effect may only be evident with certain types of measurements. A teaching method may produce superior results when its effectiveness is tested with an essay test, but show no differences when its effectiveness is measured with a multiple-choice test.

Threats to External Validity (Ecological)

Measurement of the dependent variable

Page 71: Validity and reliability

Interaction of time of measurement and treatment effect

Threats to External Validity (Ecological)

(it takes a while for the treatment to kick in) It may be that the treatment effect does not occur until several weeks after the end of the treatment. In this situation, a posttest at the end of the treatment would show no impact, but a posttest a month later might show an impact.

Page 72: Validity and reliability

NEXT WEEK: Consultation