Item Analysis, Test Validity and Reliability
Prepared by: Rovel A. Aparicio, Mathematics Teacher
There is always a better way.
Stages in Test Construction
I. Planning the Test
A. Determining the Objectives
B. Preparing the Table of Specifications
C. Selecting the Appropriate Item Format
D. Writing the Test Items
E. Editing the Test Items
Stages in Test Construction
II. Trying Out the Test
A. Administering the Test
B. Item Analysis
C. Preparing the Final Form of the Test
Stages in Test Construction
III. Establishing Test Validity
IV. Establishing Test Reliability
V. Interpreting the Test Scores
DISCRIMINATION INDEX
Refers to the degree to which success or failure on an item indicates possession of the achievement being measured.
DIFFICULTY INDEX
The percentage of pupils who got the item right; interpreted as how easy or how difficult an item is.
Item Analysis
GOAL: Improve the test.
IMPORTANCE: Measures the effectiveness of each individual test item.
ACTIVITY NO. 1
COMPUTE THE DIFFICULTY INDEX AND DISCRIMINATION INDEX OF A PERIODICAL TEST.
U-L INDEX METHOD (STEPS)
1. Score the papers and rank them from highest to lowest according to the total score.
2. Separate the top 27% and the bottom 27% of the papers.
3. Tally the responses made to each test item by each individual in the upper 27% group.
4. Tally the responses made to each test item by each individual in the lower 27% group.
U-L INDEX METHOD (STEPS)
5. Compute the difficulty index: d = (U + L) / (nu + nl), where U and L are the numbers of pupils answering the item correctly in the upper and lower groups, and nu and nl are the sizes of those groups.
6. Compute the discrimination index: D = (U - L) / nu, or equivalently D = (U - L) / nl, since nu = nl.
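Steps 5 and 6 can be sketched in Python (a minimal illustration; the function and variable names are ours, not part of the method):

```python
# Difficulty and discrimination indices from the U-L index method.
# U, L: correct responses in the upper and lower 27% groups;
# nu, nl: sizes of those groups (equal under the 27% rule).

def difficulty_index(u, l, nu, nl):
    """d = (U + L) / (nu + nl)"""
    return (u + l) / (nu + nl)

def discrimination_index(u, l, n):
    """D = (U - L) / n, with n the size of one 27% group (nu = nl)."""
    return (u - l) / n

# Example: 60 pupils tested, so each 27% group holds 16 papers.
nu = nl = 16
u, l = 14, 12                          # correct responses on one item
d = difficulty_index(u, l, nu, nl)     # 26/32 = 0.8125 (reported as 0.81)
D = discrimination_index(u, l, nu)     # 2/16  = 0.125  (reported as 0.13)
```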
No. of pupils tested: 60 (each 27% group: 16 papers)

Item no. | Upper 27% | Lower 27% | Difficulty Index | Discrimination Index | Remarks
1        | 14        | 12        | 0.81             |  0.13                | Revised
2        | 10        |  6        | 0.50             |  0.25                | Revised
3        | 11        |  7        | 0.56             |  0.25                | Revised
4        |  9        |  2        | 0.34             |  0.44                | Retained
5        |  5        | 13        | 0.56             | -0.50                | Rejected
6        | 13        |  7        | 0.63             |  0.38                | Revised
7        | 13        |  4        | 0.53             |  0.56                | Retained
8        |  3        | 10        | 0.41             | -0.44                | Rejected
9        | 13        | 12        | 0.78             |  0.06                | Rejected
10       |  8        |  6        | 0.44             |  0.13                | Revised
DISCRIMINATION INDEX
0.09 and below: Poor items (Reject)
0.10 - 0.39: Reasonably good items (Revise)
0.40 - 1.00: Very good items (Retain)
DIFFICULTY INDEX
0.00 - 0.20: Very difficult
0.21 - 0.80: Moderately difficult
0.81 - 1.00: Very easy
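The two interpretation tables above can be applied mechanically; a small sketch (labels and thresholds follow the tables, the function name is ours):

```python
# Classify an item from its difficulty and discrimination indices,
# using the two interpretation tables above.
def interpret_item(difficulty, discrimination):
    if discrimination >= 0.40:
        action = "Retain (very good)"
    elif discrimination >= 0.10:
        action = "Revise (reasonably good)"
    else:
        action = "Reject (poor)"
    if difficulty <= 0.20:
        level = "very difficult"
    elif difficulty <= 0.80:
        level = "moderately difficult"
    else:
        level = "very easy"
    return action, level
```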
Establishing Test Validity
Types of Test Validity:
1. Content Validity
2. Construct Validity
3. Criterion-related Validity
Types of Validity: 1. Content Validity
Meaning: How well the sample of test tasks represents the domain of tasks to be measured.
Procedure: Compare the test tasks with the test specifications describing the task domain under consideration. (Non-statistical)
Establishing Test Validity
Types of Validity: 2. Construct Validity
Meaning: How test performance can be described psychologically.
Procedure: Experimentally determine what factors influence scores on the test. The procedure may be logical and statistical, using correlations and other statistical methods.
Establishing Test Validity
Types of Validity: 3. Criterion-related Validity
Meaning: How well test performance predicts future performance, or estimates current performance, on some valued measure other than the test itself.
Procedure: Compare test scores with a measure of performance (e.g., grades) obtained at a later date (for prediction), or with another measure of performance obtained concurrently (for estimating present status). Primarily statistical: correlate test results with an outside criterion.
Establishing Test Reliability
Types of Reliability Measures:
1. Measure of Stability
2. Measure of Equivalence
3. Measure of Stability and Equivalence
4. Measure of Internal Consistency
Methods of Estimating Reliability
1. Measure of Stability (Test-retest method)
Procedure: Give a test twice to the same group, with any time interval between tests, from several minutes to several years. (Pearson r)
Establishing Test Reliability
2. Measure of Equivalence (Equivalent-forms method)
Procedure: Give two forms of a test to the same group in close succession. (Pearson r)
Establishing Test Reliability
3. Measure of Stability and Equivalence (Test-retest with equivalent forms)
Procedure: Give two forms of a test to the same group, with an increased time interval between the forms. (Pearson r)
Establishing Test Reliability
4. Measure of Internal Consistency (Kuder-Richardson method)
Procedure: Give the test once. Score the total test and apply the Kuder-Richardson formula.
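The slides do not say which Kuder-Richardson formula is meant; KR-20, the usual choice for items scored 0/1, is sketched below as an assumption:

```python
# KR-20 internal-consistency estimate (assumed; the slides only say
# "Kuder-Richardson formula") for dichotomously scored items.
def kr20(item_scores):
    """item_scores: one list of 0/1 item scores per pupil."""
    n = len(item_scores)       # number of pupils
    k = len(item_scores[0])    # number of items
    totals = [sum(pupil) for pupil in item_scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # variance of total scores
    sum_pq = 0.0
    for i in range(k):
        p = sum(pupil[i] for pupil in item_scores) / n   # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var)
```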
Establishing Test Reliability
4. Measure of Internal Consistency (Split-half method)
Procedure: Give the test once. Score equivalent halves of the test (e.g., odd- and even-numbered items). (Pearson r and the Spearman-Brown formula)
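A sketch of the split-half procedure using the odd/even split suggested above, with the half-test correlation stepped up to full length by the Spearman-Brown formula (function names are ours):

```python
# Split-half reliability: correlate odd- and even-item half scores,
# then apply the Spearman-Brown correction.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per pupil."""
    odd = [sum(pupil[0::2]) for pupil in item_scores]    # items 1, 3, 5, ...
    even = [sum(pupil[1::2]) for pupil in item_scores]   # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)   # Spearman-Brown: full-test reliability
```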
ACTIVITY NO. 2
TEST THE RELIABILITY OF A PERIODICAL TEST.
Pearson r, Standard-Score Method (Directions)
1. Begin by writing the pairs of scores to be studied in two columns. Be sure that the pair of scores for each pupil is in the same row. Label one set of scores X, the other Y.
2. Get the sum (∑) of the scores in each column. Divide each sum by the number of scores (N) to get the mean of each column.
3. Subtract the mean of X from each score in column X. Write the difference in column x, keeping its algebraic sign.
Pearson r, Standard-Score Method (Directions)
4. Subtract the mean of Y from each score in column Y. Write the difference in column y, keeping the sign.
5. Square each entry in column x and enter the result under x².
6. Square each entry in column y and enter the result under y².
7. Compute the standard deviations of X and Y and enter the results under SDx and SDy respectively.
Pearson r, Standard-Score Method (Directions)
8. Divide each entry in columns x and y by SDx and SDy respectively, and enter the results under Zx and Zy.
9. Multiply Zx by Zy and enter each product under ZxZy.
10. Get the sum (∑) of the ZxZy column.
11. Apply the formula r = ∑ZxZy / N.
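Steps 1-11 above amount to the following compact sketch (population standard deviations are used, so that r = ∑ZxZy / N holds exactly; the function name is ours):

```python
# Pearson r by the standard-score method described above.
def pearson_r_z(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                       # step 2: means
    sdx = (sum((a - mx) ** 2 for a in x) / n) ** 0.5      # step 7: SDx
    sdy = (sum((b - my) ** 2 for b in y) / n) ** 0.5      # step 7: SDy
    zx = [(a - mx) / sdx for a in x]                      # step 8: Zx
    zy = [(b - my) / sdy for b in y]                      # step 8: Zy
    return sum(p * q for p, q in zip(zx, zy)) / n         # steps 9-11
```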
Interpretation of the Coefficient of Correlation
Correlation is a measure of the relationship between two variables.
Direction of Relationship
A positive coefficient means that as one variable increases, the other also increases.
A negative coefficient means that as one variable increases, the other decreases.
Magnitude or Size of Relationship
0.8 and above: high correlation
Around 0.5: moderate correlation
0.3 and below: low correlation
Interpretation of the Coefficient of Variation
The coefficient of variation (c.v.) is the ratio of the standard deviation to the mean, usually expressed in percent:
c.v. = (s.d. / mean) × 100
Criteria:
Less than 10%: Homogeneous
Greater than 10%: Heterogeneous
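A minimal sketch of the coefficient of variation and the 10% homogeneity rule above (function names are ours):

```python
# Coefficient of variation of a set of scores, in percent,
# and the 10% homogeneity criterion.
def coefficient_of_variation(scores):
    n = len(scores)
    mean = sum(scores) / n
    sd = (sum((s - mean) ** 2 for s in scores) / n) ** 0.5   # population s.d.
    return sd / mean * 100                                   # c.v. in percent

def interpret_cv(cv):
    return "Homogeneous" if cv < 10 else "Heterogeneous"
```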
REMEMBER:
1. Use item analysis procedures to check the quality of the test. Item analysis results should be interpreted with care and caution.
2. A test is valid when it measures what it is supposed to measure.
3. A test is reliable when it is consistent.