Louzel M Linejan, Presenter

Louzel Report - Reliability & validity


Page 1: Louzel Report - Reliability & validity

Louzel M Linejan, Presenter

Page 2: Louzel Report - Reliability & validity

● Reliability: definition

● Methods of estimating reliability

● Validity: definition

● Forms of validity

● Factors affecting validity

● Reliability vis-à-vis validity

Page 3: Louzel Report - Reliability & validity
Page 4: Louzel Report - Reliability & validity

● refers to how consistently data are collected (Lee, 2004).

● the degree to which a test consistently measures whatever it measures; indicates the consistency of the scores produced (Raagas, 2009).

● the extent to which results are consistent over time and an accurate representation of the total population under study (Joppe, 2000).

● concerns the replicability and consistency of the methods, conditions, and results (Wiersma and Jurs, 2005).

Page 5: Louzel Report - Reliability & validity

● expressed numerically, usually as a coefficient ranging from 0.0 to 1.0.

● If a test is perfectly reliable, the reliability coefficient is 1.0, meaning each respondent's score perfectly reflects his or her true status with respect to the variable being measured.

● no test is perfectly reliable, however, and scores are invariably affected by errors of measurement resulting from a variety of causes.

Page 6: Louzel Report - Reliability & validity

1. Stability (also called Test-Retest Reliability)

- the degree to which results/scores on the same test are consistent over time. The more similar the scores on the test over time, the more stable or consistent are the scores.

- indicates score variation that occurs from one testing session to another.

- provides evidence that scores obtained on a test at one time (test), are the same or close to the same when the test is readministered some other time (retest).

Page 7: Louzel Report - Reliability & validity

1. Stability (also called Test-Retest Reliability)

The procedure for determining test-retest reliability is basically simple:

1. Administer the test to an appropriate group.
2. After some time has passed, administer the same test to the same group.
3. Correlate the two sets of scores.
4. Evaluate the results.

Page 8: Louzel Report - Reliability & validity

1. Stability (also called Test-Retest Reliability)

Disadvantage:

difficulty of knowing how much time should elapse between the two testing sessions

- If the interval is too short, the chances of the subject’s remembering responses made on the first test are increased, and the estimate of reliability tends to be artificially high.

- If the interval is too long, the respondents’ test performance may increase due to the intervening learning or maturation, and the estimate of the reliability tends to be artificially low.

Page 9: Louzel Report - Reliability & validity

2. Equivalence (or Equivalent Forms)

- Two tests that are identical, except for the actual items included.

- The two forms measure the same variable, have the same number of items, the same structure, the same difficulty level, and the same directions for administration, scoring, and interpretation.

- If there is equivalence, the two tests can be used interchangeably. The correlation between scores on the two forms will yield an estimate of their reliability.

Page 10: Louzel Report - Reliability & validity

2. Equivalence (or Equivalent Forms)

The procedure for determining equivalent-forms reliability is similar to that for determining test-retest reliability:

1. Administer one form of the test to an appropriate group.
2. After some time, or shortly thereafter, administer the second form of the test to the same group.
3. Correlate the two sets of scores.
4. Evaluate the results.

Page 11: Louzel Report - Reliability & validity

3. Internal Consistency Reliability (Methods of Internal Analysis)

- a commonly used form of reliability which deals with one test at a time. It is obtained through Split-Half, Kuder-Richardson, and Cronbach Coefficient Alpha; each provides information about the consistency among the items in a single test.

- applicable to instruments that have more than one item, as it refers to how homogeneous the items of a test are, or how well they measure a single construct.

Page 12: Louzel Report - Reliability & validity

3. Internal Consistency Reliability a. Split-Half Reliability

- A common approach is to split a test into two reasonably equivalent halves. These independent subtests are then used as the source of the two independent scores needed to estimate reliability.

- simplest statistical technique; randomly splits the questionnaire items into 2 groups. A score for each participant is then calculated based on each half of the scale.

Page 13: Louzel Report - Reliability & validity

a. Split-Half Reliability

This procedure requires only 1 administration of the test. Test items are divided into 2 halves, and the 2 halves are then scored independently.

The problem with this method is that there are several ways in which a set of data can be split into two and so the results might stem from the way in which the data were split.

Page 14: Louzel Report - Reliability & validity

3. Internal Consistency Reliability b. Kuder-Richardson

Kuder and Richardson developed two of the most widely accepted methods for estimating reliability: K-R20 and K-R21. These estimate internal consistency reliability by determining how all items in a test relate to all other test items and to the whole test. They are useful for true-false and multiple-choice items.

Page 15: Louzel Report - Reliability & validity

3. Internal Consistency Reliability b. Kuder-Richardson

K-R20 = most advisable if the "proportion of correct responses to a particular item" (the item p value) varies a lot; provides the mean of all possible split-half coefficients.

K-R21 = most advisable if the items do not vary much in difficulty, i.e., the p values are more or less similar; may be substituted for K-R20 if it can be assumed that item difficulty levels are similar.
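A sketch of the K-R20 computation for dichotomous (0/1) items, using an invented score matrix (conventions differ on sample vs. population variance; population variance is used here):

```python
from statistics import pvariance

def kr20(item_scores):
    """K-R20: internal consistency for items scored 0/1.

    item_scores: one row per respondent, one column per item.
    K-R20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores),
    where p is the proportion of correct responses to an item and q = 1 - p.
    """
    n = len(item_scores)      # respondents
    k = len(item_scores[0])   # items
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    pq_sum = sum(pi * (1 - pi) for pi in p)
    totals = [sum(row) for row in item_scores]
    return (k / (k - 1)) * (1 - pq_sum / pvariance(totals))

# Hypothetical 0/1 scores: five respondents, six items.
scores = [
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]
estimate = kr20(scores)
```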

Page 16: Louzel Report - Reliability & validity

3. Internal Consistency Reliability c. Cronbach Coefficient Alpha

used when the item scores are other than 0 and 1. This is advisable for essay items, problem solving, and 5-point-scale items; based on 2 or more parts of the test and requires only one administration of the test.
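A sketch of the coefficient alpha computation for scaled items (the 5-point ratings below are invented; population variance is used, a common convention):

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's coefficient alpha for multi-point (e.g. 5-scaled) items.

    item_scores: one row per respondent, one column per item.
    alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    totals = [sum(row) for row in item_scores]
    return (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))

# Hypothetical 5-point ratings: four respondents, three items.
ratings = [
    [4, 5, 4],
    [3, 4, 3],
    [5, 5, 4],
    [2, 3, 2],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 3))  # prints 0.975 for this data
```

The high value here reflects how consistently the three invented items rank the four respondents.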

Page 17: Louzel Report - Reliability & validity
Page 18: Louzel Report - Reliability & validity

● the degree to which a test measures what it is supposed to measure and, consequently, permits appropriate interpretations of test scores (Raagas, 2009).

● determines whether the research truly measures what it was intended to measure, or how truthful the research results are (Joppe, 2000).

● refers to the ability of the survey questions to accurately measure what they claim to measure (Lee, 2004).

● answers the question: Are we measuring what we want to measure? (Muijs, 2004)

Page 19: Louzel Report - Reliability & validity

● Validity is commonly described in three degrees:

- highly valid

- moderately valid

- generally invalid

● The validation process begins with an understanding of the interpretation to be made from the tests or instruments.

Page 20: Louzel Report - Reliability & validity

● CONTENT VALIDITY

● CONSTRUCT VALIDITY

● CRITERION-RELATED VALIDITY

Page 21: Louzel Report - Reliability & validity

the degree to which a test measures an intended content area

the degree to which adequate data are collected as they relate to the construct being measured.

Establishing the representativeness of the items with respect to whatever is being measured

Content validity is also invoked when the argument is made that the measurement self-evidently reflects or represents the various aspects of the phenomenon being researched.

Page 22: Louzel Report - Reliability & validity

Requires Item Validity and Sampling Validity

Item Validity – concerned with whether the test items are relevant to the intended content area

Sampling Validity – concerned with how well the test sample represents the total content area

Page 23: Louzel Report - Reliability & validity

Establishes validity through comparison with a criterion – a standard by which the validity of the test is judged. If the scores on the measure being validated relate highly to the criterion, the measure is valid; if not, the measure is not valid for the purpose for which the criterion measure is used.

Measures the ability of survey instrument or question to predict or estimate.

In short, this asks whether the questionnaire is measuring what it claims to measure.

Page 24: Louzel Report - Reliability & validity

Concurrent Validity
- degree to which the scores on a test are related to scores on another test administered at the same time.
- whether or not the test scores estimate a specified present performance.
- based on establishing an existing situation – "What is?"

Predictive Validity
- degree to which scores on a test are related to scores on another test administered in the future.
- whether or not the test scores predict a specified future performance.
- based on establishing – "What is likely to happen?"

Page 25: Louzel Report - Reliability & validity

seeks to determine whether the construct underlying a variable is actually measured.

determined by a series of validation studies that can include content and criterion-related approaches.

both confirmatory and disconfirmatory evidence are used

The extreme difficulty of this kind of validation lies in the unobservable nature of many of the constructs (such as social class, personality, attitudes, etc.) used to explain behavior.

One way to assess construct validity is to test whether or not the measure confirms hypotheses generated from the theory behind the concepts.

Page 26: Louzel Report - Reliability & validity

Threats arise when researchers draw incorrect inferences from the sample to people, settings or situations not sufficiently related to the sample such as a different racial, ethnic or socioeconomic group (Creswell, 2003).

Group threats – if our experimental and control groups have wide and extensive differences.

Regression to the mean – if the participants produce extreme scores on a pre-test (either very high or very low).

Time threats – with the passage of time, events may occur which produce changes in our participants’ behavior.

Page 27: Louzel Report - Reliability & validity

Respondents' History – events in the participants' lives which are entirely unrelated to our manipulation of the variables.

Maturation – participants, especially young ones, may change simply as a consequence of development.

Reactivity and Experiment Effects – measuring a person's behavior may affect that behavior, for a variety of reasons. People's reaction to having their behavior measured may cause them to change their behavior.

Page 28: Louzel Report - Reliability & validity

Instrumentation – instruments may introduce systematic error if they are not carefully planned. Lack of specificity in assessing certain variables could lead to varying interpretations by respondents.

Characteristics of Subjects / Respondents of the study – respondents in surveys may bear basic personal characteristics that could influence their responses to particular factors or variables of the study.

Researcher's personal characteristics – researchers may influence the way subjects in experiments or respondents in a survey respond.

Page 29: Louzel Report - Reliability & validity

Suppose the reported reliability coefficient for a test was 0.24 – definitely not good. Would this tell something about the validity of the test?

Yes, it would. It would show that the validity is not high, because if it were, the reliability would be higher.

What if a test is so hard that no respondent could answer even a single item? Scores would still be consistent, but not valid. If a test measures what it is supposed to measure, it is reliable; but a reliable test can consistently measure the wrong thing and be invalid.

Page 30: Louzel Report - Reliability & validity

Reliability is necessary but not sufficient for establishing validity.

A valid test is always reliable but a reliable test is not always valid.

What if the reported reliability was 0.92, which is definitely high. Would this tell anything about validity?

Not really. It would only indicate that the validity might also be high, because the reliability is high – but not necessarily; the test could be consistently measuring the wrong thing.

Page 31: Louzel Report - Reliability & validity
Page 32: Louzel Report - Reliability & validity

- ensure that the questions we ask are clear and unambiguous. Clear, unambiguous questions are likely to be more reliable, and the same goes for items on a rating scale for observers.

- Another way to make an instrument more reliable is to measure the construct with more than one item.

- ensure that the dependent variable is measured as precisely as possible.

Page 33: Louzel Report - Reliability & validity

a. Split-Half Reliability

When we need to predict the reliability of a test twice as long as a given test, as in the split-half method, the Spearman-Brown formula is used:

r(full test) = 2r(half) / (1 + r(half))

where r(half) is the correlation between the scores on the two halves.

The problem with this method is that there are several ways in which a set of data can be split into two and so the results might stem from the way in which the data were split.

Page 34: Louzel Report - Reliability & validity

b. Kuder-Richardson

K-R20 = most advisable if the p values vary a lot

K-R21 = most advisable if the items do not vary much in difficulty, i.e., the p values are more or less similar
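Because K-R21 needs only each respondent's total score, the number of items, and the assumption of similar item difficulty, it is easy to compute by hand or in code. A sketch with invented totals (population variance is used, as is conventional for the K-R formulas):

```python
from statistics import mean, pvariance

def kr21(total_scores, k):
    """K-R21: a shortcut to K-R20 that assumes similar item difficulties.

    total_scores: one total score per respondent; k: number of items.
    K-R21 = (k / (k - 1)) * (1 - M * (k - M) / (k * variance)),
    where M is the mean total score.
    """
    m = mean(total_scores)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * pvariance(total_scores)))

# Hypothetical total scores of five respondents on a 6-item test.
value = kr21([4, 5, 1, 6, 2], k=6)
```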

Page 35: Louzel Report - Reliability & validity

c. Cronbach Coefficient Alpha

used when the item scores are other than 0 and 1. This is advisable for essay items, problem solving, and 5-point-scale items.

α = [k / (k – 1)] × [1 – (Σsᵢ² / S²)]

where k = the number of items, sᵢ = standard deviation of a single test item, and S = standard deviation of the total score of each examinee.

Page 36: Louzel Report - Reliability & validity

Internal Reliability – consistency in the research process. This can be addressed in a number of ways: much of qualitative research involves observation by multiple observers as part of data gathering, and it relies on the logical analysis of the results.

External Reliability – asks the question: "Are the findings generalizable?"

Page 37: Louzel Report - Reliability & validity

Internal Validity – the cause-and-effect inference linking the independent variable and the dependent variable. To make this inference, the researcher must be confident that factors such as extraneous variables have been controlled and are not producing an effect that is mistaken for the experimental treatment effect.

External Validity – deals with the generalizability of the results of the study. To what populations, variables, situations, and so forth do the results generalize? Generally, the more extensively the results can be generalized, the more useful the research, given that there is adequate internal validity.