8/9/2019 PPT Valid Reliable

VALIDITY AND RELIABILITY

Delivered by: Assoc. Prof. Dr. Othman Md. Johan


Primary Criteria of Evaluation

Two primary criteria of evaluation in any measurement or observation are:

- Whether we are measuring what we intend to measure.
- Whether the same measurement process yields the same results.

These two concepts are validity and reliability.


Reliability

Reliability is concerned with questions of stability and consistency.

- Does the same measurement tool yield stable and consistent results when repeated over time?
- E.g., a tape measure is a highly reliable measuring instrument.


Validity

Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring).

E.g., a tape measure that has been created with accurate spacing for inches, feet, etc. should yield valid results as well.

That is, measuring this piece of wood with a "good" tape measure should produce a correct measurement of the wood's length.


To apply these concepts to psychological assessment and evaluation, we want to use measurement tools that are both reliable and valid.

We want questions that yield consistent responses when asked multiple times; this is reliability.

Similarly, we want questions that get accurate responses from respondents; this is validity.


Reliability

Reliability refers to a condition where a measurement process yields consistent scores (given an unchanged measured phenomenon) over repeated measurements.

Three criteria of reliability:

- Test-retest reliability
- Inter-item reliability
- Interobserver reliability


Test-retest Reliability

A measure has test-retest reliability when a researcher administers the same measurement tool multiple times (asks the same question, follows the same research procedures, etc.), assuming that there has been no change in whatever is being measured, and obtains consistent results.

E.g., when a researcher asks the same person the same question twice ("What's your name?") and gets back the same result both times, the measure has test-retest reliability.

Measurement of the piece of wood discussed earlier has high test-retest reliability.
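
One common way to put a number on test-retest reliability is to correlate the scores from two administrations of the same tool. The sketch below assumes a Pearson correlation, which the slides do not prescribe; the scores and the names `pearson_r`, `first_administration`, and `second_administration` are hypothetical and for illustration only.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five people, measured twice with the same tool.
first_administration = [12, 15, 9, 20, 17]
second_administration = [13, 14, 10, 19, 18]

test_retest_r = pearson_r(first_administration, second_administration)
print(f"test-retest r = {test_retest_r:.2f}")
```

A value close to 1 would suggest that people keep roughly the same relative standing across the two administrations, provided the thing being measured has not changed in between.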


Inter-item reliability

This is a dimension that applies to cases where multiple items are used to measure a single concept.

In such cases, answers to a set of questions designed to measure some single concept (e.g., altruism) should be associated with each other.
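
The slides do not name a statistic for inter-item reliability. A widely used one is Cronbach's alpha, which compares the variance of the individual items with the variance of the total score. The sketch below assumes that choice; the function name `cronbach_alpha` and the three-item altruism responses are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha for a table with one row per respondent
    and one column per item."""
    k = len(responses[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*responses))  # transpose: one tuple per item
    item_variance_sum = sum(pvariance(item) for item in items)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical answers (1-5 scale) of five respondents to three altruism items.
altruism_responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(altruism_responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

Alpha rises as the items vary together, so a high value is consistent with the items tapping a single concept. It does not by itself show that the concept is the intended one.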


Interobserver reliability

The extent to which different interviewers or observers using the same measure get equivalent results.

If different observers or interviewers use the same instrument to score the same thing, their scores should match.

- E.g., the interobserver reliability of an observational assessment of parent-child interaction is often evaluated by showing two observers a videotape of a parent and child at play.

If the instrument has high interobserver reliability, the scores of the two observers should match.
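
When two observers assign categories rather than numeric scores, agreement is often summarized with Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The slides do not specify this statistic, so treat the sketch below as one possible approach. The category labels, the ten coded video segments, and the name `cohen_kappa` are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who coded the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes given by two observers to ten segments of the same video.
observer_1 = ["warm", "warm", "neutral", "hostile", "warm",
              "neutral", "neutral", "hostile", "warm", "neutral"]
observer_2 = ["warm", "warm", "neutral", "hostile", "neutral",
              "neutral", "neutral", "hostile", "warm", "warm"]

kappa = cohen_kappa(observer_1, observer_2)
print(f"Cohen's kappa = {kappa:.4f}")
```

Here the observers agree on 8 of 10 segments, and kappa discounts the share of that agreement that would be expected by chance alone.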


Validity

Validity refers to the extent to which we are measuring what we hope to measure (and what we think we are measuring).

A valid measure should satisfy four criteria:

- Face validity
- Content validity
- Criterion-related validity
- Construct validity


Face Validity

An assessment of whether a measure appears, on the face of it, to measure the concept it is intended to measure.

This is a very minimal assessment: if a measure cannot satisfy this criterion, then the other criteria are inconsequential.

Examples of observational measures of behavior and their face validity:

- Striking out at another person would have face validity as an indicator of aggression.
- Offering assistance to a stranger would meet the criterion of face validity for helping.
- However, asking people about their favorite movie to measure racial prejudice has little face validity.


Content Validity

The extent to which a measure adequately represents all facets of a concept.

Consider a series of questions that serve as indicators of depression (don't feel like eating, lost interest in things usually enjoyed, etc.).

If there were other kinds of common behaviors that mark a person as depressed that were not included in the index, then the index would have low content validity, since it would not adequately represent all facets of the concept.


Criterion-related validity

Criterion-related validity applies to instruments that have been developed for usefulness as an indicator of a specific trait or behavior, either now or in the future.

E.g., a driving test is a social measurement that has pretty good predictive validity.

That is to say, an individual's performance on a driving test correlates well with his/her driving ability.
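
The phrase "correlates well" can be made concrete by correlating test scores with a separate criterion measure collected at the same time or later. The sketch below assumes a Pearson correlation as the validity coefficient, which is a common choice but not one the slides specify. The driving-test scores, the later instructor ratings, and the name `validity_coefficient` are all hypothetical.

```python
from statistics import mean, pstdev


def validity_coefficient(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion measure."""
    if len(test_scores) != len(criterion_scores) or len(test_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_t = mean(test_scores)
    mean_c = mean(criterion_scores)
    # Population covariance, matched with population standard deviations below.
    covariance = mean(
        [(t - mean_t) * (c - mean_c) for t, c in zip(test_scores, criterion_scores)]
    )
    return covariance / (pstdev(test_scores) * pstdev(criterion_scores))


# Hypothetical data: driving-test scores and on-road ratings (1-5) given
# to the same six drivers some months later.
driving_test_scores = [55, 62, 70, 78, 85, 91]
later_road_ratings = [2, 3, 3, 4, 4, 5]

r_predictive = validity_coefficient(driving_test_scores, later_road_ratings)
print(f"predictive validity r = {r_predictive:.2f}")
```

Because the criterion here is collected after the test, the coefficient speaks to predictive validity; a criterion collected at the same time would speak to validity "now" instead.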


Construct Validity

There is not necessarily a pertinent criterion available for many things we want to measure, so we relate the measure to other measures as specified by theory or previous research.

Construct validity therefore concerns the extent to which a measure is related to other measures as specified by theory or previous research.

Does a measure stack up with other variables the way we expect it to?

E.g., self-esteem refers to a person's sense of self-worth or self-respect.


Construct Validity

Clinical observations in psychology had shown that people who had low self-esteem often had depression.

Therefore, to establish the construct validity of the self-esteem measure, the researchers showed that those with higher scores on the self-esteem measure had lower depression scores, while those with low self-esteem had higher rates of depression.


Group Work

Factors affecting test validity

Factors affecting test reliability

State the method/methods used to determine:

- Face validity
- Content validity
- Criterion-related validity
- Construct validity

State the method/methods used to determine:

- Test-retest reliability
- Inter-item reliability
- Interobserver reliability