1 CHAPTER 5 Test Scores as Composites This chapter is about the quality of items in a test.
Slide 2
2 Test Scores as Composites What is a Composite Test Score? A
composite test score is a total test score created by summing two
or more subtest scores, e.g., the WAIS-IV Full Scale IQ consists of
the 1-Verbal Comprehension Index, 2-Perceptual Reasoning Index,
3-Working Memory Index, and 4-Processing Speed Index. Qualifying
Examinations and the EPPP Exam are also composite test scores.
Slide 3
3 Item Scoring Schemes/Systems We have 2 different scoring
systems: 1. Dichotomous Scores: restricted to 0 and 1, such as
scores on true/false and multiple-choice questions. 2.
Non-dichotomous Scores: not restricted to 0 and 1; they can have a
range of possible points (1, 2, 3, 4, 5, ...), such as in essays.
Slide 4
4 Dichotomous Scheme Examples 1. The space between nerve cell
endings is called the a. Dendrite b. Axon c. Synapse d. Neutron
(In this item, responses a, b, and d are scored 0; response c is
scored 1.) 2. Teachers in public school systems should have the
right to strike. a. Agree b. Disagree (In this item, a response of
Agree is scored 1; Disagree is scored 0.) Or, you can use True or
False.
Slide 5
5 Practical Implications for Test Construction Variance and
covariance measure the quality of items in a test. Reliability and
validity measure the quality of the entire test. Variance = SS/N,
used for one set of data. Variance is the degree of variability of
scores from the mean.
Slide 6
6 Practical Implications for Test Construction Correlation is
based on a statistic called covariance (COVxy or Sxy).
COVxy = SP/(N - 1), used for 2 sets of data. Covariance is a number
that reflects the degree to which 2 variables vary together.
r = SP/√(SSx · SSy)
Slide 7
7 Variance σ² = SS/N (population); s² = SS/(n - 1) or SS/df
(sample). Two ways to calculate SS: SS = Σx² - (Σx)²/N, or
SS = Σ(x - x̄)². SS = Sum of Squared Deviations from the Mean.
Slide 8
8 Covariance Covariance is a number that reflects the degree to
which 2 variables vary together. Original data:
X  Y
1  3
2  6
4  4
5  7
Slide 9
9 Covariance COVxy = SP/(N - 1). 2 ways to calculate the SP:
SP = Σxy - (Σx)(Σy)/N, or SP = Σ(x - x̄)(y - ȳ). SP requires 2 sets
of data; SS requires only one set of data.
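The SS, SP, covariance, and correlation formulas can be verified against the data on slide 8. A minimal Python sketch (the helper names ss and sp are mine, not from the text):

```python
# Worked example with the slide's data: X = 1, 2, 4, 5 and Y = 3, 6, 4, 7.
# SS needs one set of data; SP needs two.

def ss(values):
    """Sum of squared deviations from the mean: SS = sum((x - mean)^2)."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def sp(xs, ys):
    """Sum of cross-products: SP = sum((x - mean_x) * (y - mean_y))."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))

X = [1, 2, 4, 5]
Y = [3, 6, 4, 7]

SS_x, SS_y, SP_xy = ss(X), ss(Y), sp(X, Y)
cov_xy = SP_xy / (len(X) - 1)         # COVxy = SP / (N - 1)
r_xy = SP_xy / (SS_x * SS_y) ** 0.5   # r = SP / sqrt(SSx * SSy)

print(SS_x, SS_y, SP_xy)  # 10.0 10.0 6.0
print(cov_xy, r_xy)       # 2.0 0.6
```

So for this data the covariance is 2.0 and the correlation is 0.6.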
Slide 10
10 Descriptive Statistics for Dichotomous Data
Slide 11
11 Descriptive Statistics for Dichotomous Data Item Variance
& Covariance
Slide 12
12 Descriptive Statistics for Dichotomous Data P = Item
Difficulty: P = (# of examinees who answered an item correctly) /
(total # of examinees), or P = f/N. See handout. The higher the P
value, the easier the item.
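A quick sketch of P = f/N, using a made-up vector of dichotomous item scores (1 = correct, 0 = incorrect):

```python
# Hypothetical responses of 10 examinees to one item; 7 answered correctly.
responses = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

P = sum(responses) / len(responses)  # P = f / N
print(P)  # 0.7 -- a fairly easy item
```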
Slide 13
13 Relationship between Item Difficulty (P) and Variance
[Figure: item variance (quality) plotted against item difficulty P,
from 0 (difficult) to 1 (easy); variance peaks at P = 0.5.]
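The relationship in this slide follows from the fact that a dichotomous item's variance is pq = p(1 - p). A short Python illustration:

```python
# For a dichotomous item, item variance = p * q, where q = 1 - p.
# Variance is largest at p = 0.5 and shrinks as items get very easy or very hard.
variances = {p: p * (1 - p) for p in [0.1, 0.3, 0.5, 0.7, 0.9]}
for p, v in variances.items():
    print(p, round(v, 2))  # 0.09, 0.21, 0.25, 0.21, 0.09 -- peak at p = 0.5
```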
Slide 14
14 Non-dichotomous Scores Examples 1. Write a grammatically
correct German sentence using the first person singular form of the
verb verstehen. (A maximum of 3 points may be awarded, and partial
credit may be given.) 2. An intellectually disabled person is a
nonproductive member of society. 5. Strongly agree 4. Agree 3. No
opinion 2. Disagree 1. Strongly disagree (Scores can range from 1
to 5 points, with high scores indicating a positive attitude toward
intellectually disabled citizens.)
Slide 15
15 Descriptive Statistics for Non-dichotomous Variables
Slide 16
16 Descriptive Statistics for Non-dichotomous Variables
Slide 17
17 Variance of a Composite C Variance of composite C = SS/N;
variance of subtest a = SSa/Na; variance of subtest b = SSb/Nb.
Variance of C = variance of a + variance of b. Ex. from the
WAIS-III: FSIQ = VIQ + PIQ. If there are more than 2 subtests,
variance of C = variance of a + variance of b + variance of c.
Calculate the variance for each subtest and add them up.
Slide 18
18 Variance of a Composite C What is the Composite Test Score?
Ex. the WAIS-IV Full Scale IQ, which consists of the a-Verbal
Comprehension Index, b-Perceptual Reasoning Index, c-Working Memory
Index, and d-Processing Speed Index. With more than 2 subtests,
variance of C = variance of a + variance of b + variance of c +
variance of d.
Slide 19
19 *Suggestions to Increase the Total Score Variance of a Test
1-Increase the number of items in a test. 2-Keep item difficulties
(p) in the medium range. 3-Items with similar content have higher
correlations & higher covariance. 4-Item score & total score
variances alone are not indices of test quality (reliability and
validity).
Slide 20
20 *1-Increase the Number of Items in a Test (how to calculate
the test variance) The variance for a test of 25 items is higher
than the variance for a test of 20 items. Test variance =
N(average item variance) + N(N - 1)(average item covariance).
Ex. If COVx = average item covariance = 0.10, average item
variance = 0.20, and N = # of items in a test: first try N = 20,
test variance = 20(0.20) + 20(19)(0.10) = 42; then try N = 25,
test variance = 25(0.20) + 25(24)(0.10) = 65.
Slide 21
21 2-Item Difficulties Item difficulties should be almost equal
for all of the items and difficulty levels should be in the medium
range.
Slide 22
22 3-Items with Similar Content have Higher Correlations &
Higher Covariance
Slide 23
23 4-Item Score & Total Score Variances Alone are not
Indices of Test Quality Variance and covariance are important and
necessary; however, they are not sufficient to determine test
quality. To determine a higher level of test quality we use
reliability and validity.
Slide 24
UNIT II RELIABILITY CHAP 6: RELIABILITY AND THE CLASSICAL TRUE
SCORE MODEL CHAP 7: PROCEDURES FOR ESTIMATING RELIABILITY CHAP 8:
INTRODUCTION TO GENERALIZABILITY THEORY CHAP 9: RELIABILITY
COEFFICIENTS FOR CRITERION-REFERENCED TESTS 24
Slide 25
25 CHAPTER 6 Reliability and the Classical True Score Model
Reliability (ρ) is a measure of consistency/dependability: when a
test measures the same thing more than once, it results in the same
outcome. Reliability refers to the consistency of examinees'
performance over repeated administrations of the same test or
parallel forms of the test (Linda Crocker text).
Slide 26
THE MODERN MODELS 26
Slide 27
27 *TYPES OF RELIABILITY
Test-Retest (2 administrations; stability): a measure of
stability. Administer the same test/measure at two different times
to the same group of participants. Coefficient: r test1.test2.
Ex. IQ test.
Parallel/Alternate (Equivalent) Forms (2 administrations;
equivalence): a measure of equivalence. Administer two different
forms of the same test to the same group of participants.
Coefficient: r testA.testB. Ex. stats test.
Test-Retest with Alternate Forms (2 administrations; stability and
equivalence): a measure of stability and equivalence. On Monday,
administer form A to the 1st half of the group and form B to the
2nd half. On Friday, administer form B to the 1st half and form A
to the 2nd half.
Inter-Rater (1 administration; agreement): a measure of agreement.
Have two raters rate behaviors and then determine the amount of
agreement between them. Coefficient: percentage of agreement.
Internal Consistency (1 administration; consistency): a measure of
how consistently each item measures the same underlying construct.
Correlate performance on each item with overall performance across
participants. Coefficients: Cronbach's alpha, Kuder-Richardson,
split-half, and Hoyt's methods.
Slide 28
28 Test-Retest: Class IQ Scores
Students  X (1st time, Mon)  Y (2nd time, Fri)
John      125                120
Jo        110                112
Mary      130                128
Kathy     122                120
David     115                120
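Using the slide's data, the test-retest coefficient r test1.test2 can be computed as a Pearson correlation. A Python sketch:

```python
# Monday (X) and Friday (Y) IQ scores for the same five students.
X = [125, 110, 130, 122, 115]  # John, Jo, Mary, Kathy, David (Mon)
Y = [120, 112, 128, 120, 120]  # same students (Fri)

mx, my = sum(X) / len(X), sum(Y) / len(Y)
sp = sum((x - mx) * (y - my) for x, y in zip(X, Y))   # SP
ss_x = sum((x - mx) ** 2 for x in X)                  # SSx
ss_y = sum((y - my) ** 2 for y in Y)                  # SSy

r = sp / (ss_x * ss_y) ** 0.5  # r = SP / sqrt(SSx * SSy)
print(round(r, 2))  # 0.89 -- high test-retest reliability
```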
Slide 29
29 Parallel/Alternate Forms: Scores on 2 Forms of a Stats Test
Students  Form A  Form B
John      95      92
Jo        84      82
Mary      90      88
Kathy     76      80
David     81      78
Slide 30
30 Test-Retest with Alternate Forms On Monday, you administer form
A to the 1st half of the group and form B to the 2nd half. On
Friday, you administer form B to the 1st half and form A to the
2nd half.
Students (1st group)  Form A (Mon)   Students (2nd group)  Form B (Mon)
David                 85             Mark                  82
Mary                  94             Jane                  95
Jo                    78             George                80
John                  81             Mona                  80
Kathy                 67             Maria                 70
Next slide
Slide 31
31 Test-Retest with Alternate Forms On Friday, you administer form
B to the 1st half of the group and form A to the 2nd half.
Students (1st group)  Form B (Fri)   Students (2nd group)  Form A (Fri)
David                 85             Mark                  82
Mary                  94             Jane                  95
Jo                    78             George                80
John                  81             Mona                  80
Kathy                 67             Maria                 70
Slide 32
32 HOW RELIABILITY IS MEASURED Reliability is measured by using
a correlation coefficient: r test1.test2 or r x.y. Reliability
coefficients indicate how scores on one test change relative to
scores on a second test. They can range from 0.00 to 1.00:
1.00 = perfect reliability; 0.00 = no reliability.
Slide 33
THE CLASSICAL MODEL 33
Slide 34
34 A CONCEPTUAL DEFINITION OF RELIABILITY THE CLASSICAL MODEL
Observed Score = True Score + Error Score (error = method error +
trait error). X = T + E
Slide 35
35 Classical Test Theory The Observed Score: X = T + E. X is the
score you actually record or observe on a test. The True Score:
T = X - E, or the difference between the observed score and the
error score is the true score. T reflects the examinee's true
knowledge. The Error Score: E = X - T, or the difference between
the observed score and the true score is the error score. E
comprises the factors that cause the true score and the observed
score to differ.
Slide 36
36 A CONCEPTUAL DEFINITION OF RELIABILITY (X) Observed Score
X = T + E The score that is actually observed. It consists of two
components: the True Score and the Error Score (method error +
trait error).
Slide 37
37 A CONCEPTUAL DEFINITION OF RELIABILITY True Score T = X - E
A perfect reflection of the true value for the individual; a
theoretical score.
Slide 38
38 A CONCEPTUAL DEFINITION OF RELIABILITY Method error is due to
characteristics of the test or testing situation; trait error is
due to individual characteristics. Conceptually, Reliability =
True Score/Observed Score = True Score/(True Score + Error Score).
The reliability of the observed score becomes higher as error is
reduced!
Slide 39
39 A CONCEPTUAL DEFINITION OF RELIABILITY Error Score E = X - T
The difference between the observed and true scores. X = T ± E:
95 = 90 + 5 or 85 = 90 - 5; the difference between T and X is 5
points, or E = 5.
Slide 40
40 The Classical True Score Model X = T + E X = the observed test
score T = the individual's true score (true knowledge) E = the
random error component
Slide 41
41 Classical Test Theory What Makes up the Error Score? E = X - T
The error score consists of 1-method error and 2-trait error.
1-Method Error: the difference between true & observed scores
resulting from the test or testing situation. 2-Trait Error: the
difference between true & observed scores resulting from the
characteristics of the examinees. See next slide.
Slide 42
42 What Makes up the Error Score?
Slide 43
43 Expected Value of the True Score Definition of the True Score:
the true score is defined as the expected value of the examinee's
test scores (the mean of the observed scores) over many repeated
testings with the same test.
Slide 44
44 Error Score Definition of the Error Score: the error scores for
an examinee over many repeated testings should average to zero.
E(Ej) = Tj - Tj = 0, where E(Ej) = the expected value of the error
and Tj = the examinee's true score. Ex. next.
Slide 45
45 Error Score X - E = T, or the difference between the observed
score and the error score is the true score (all scores are from
the same examinee, whose true score is 90):
98 - 8 = 90
88 + 2 = 90
80 + 10 = 90
100 - 10 = 90
95 - 5 = 90
81 + 9 = 90
88 + 2 = 90
90 - 0 = 90
Errors (E = X - T): +8 - 2 - 10 + 10 + 5 - 9 - 2 + 0 = 0
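The cancellation on this slide can be checked in a few lines of Python (coding each error as E = X - T, consistent with the model):

```python
# Observed scores from the slide, all for an examinee whose true score T is 90.
# Each error is E = X - T; over repeated testings the errors cancel to zero.
T = 90
observed = [98, 88, 80, 100, 95, 81, 88, 90]
errors = [x - T for x in observed]
print(errors)       # [8, -2, -10, 10, 5, -9, -2, 0]
print(sum(errors))  # 0
```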
Slide 46
46 *INCREASING THE RELIABILITY OF A TEST (Meaning: Decreasing
Error) 7 Steps 1. Increase the sample size (n) 2. Eliminate
unclear questions 3. Standardize testing conditions 4. Moderate
the degree of difficulty of the tests (P) 5. Minimize the effects
of external events 6. Standardize instructions (directions)
7. Maintain consistent scoring procedures (use a rubric)
Slide 47
47 *Increasing Reliability of your Items in a Test
Slide 48
48 *Increasing Reliability Cont..
Slide 49
49 How Reliability (ρ) is Measured for an Item/Score
ρ = True Score/(True Score + Error Score), or ρ = T/(T + E), with
0 ≤ ρ ≤ 1. Note: in this formula you always add the error (the
difference between T and X) to the true score in the denominator,
whether the error is positive or negative; that is, use its
absolute value: ρ = T/(T + |E|).
Slide 50
Which Item has the Highest Reliability? The maximum number of
points for this question is 10. Using ρ = T/(T + |E|):
E = +2, T = 8:  8/10  = 0.80
E = -3, T = 6:  6/9   = 0.667
E = +7, T = 1:  1/8   = 0.125
E = -1, T = 9:  9/10  = 0.90
E = +4, T = 6:  6/10  = 0.60
E = -4, T = 6:  6/10  = 0.60
E = +1, T = 7:  7/8   = 0.875
E = 0,  T = 10: 10/10 = 1.00
E = -5, T = 4:  4/9   = 0.444
E = +6, T = 3:  3/9   = 0.333
The more error, the lower the reliability.
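These item reliabilities can be reproduced with a short Python sketch using ρ = T/(T + |E|):

```python
# (T, E) pairs taken from the slide; the absolute error goes in the
# denominator regardless of its sign.
items = [(8, +2), (6, -3), (1, +7), (9, -1), (6, +4),
         (6, -4), (7, +1), (10, 0), (4, -5), (3, +6)]

rhos = [T / (T + abs(E)) for T, E in items]
for (T, E), rho in zip(items, rhos):
    print(T, E, round(rho, 3))

print(max(rhos), min(rhos))  # 1.0 0.125
```

The most reliable item is the one with no error (T = 10, E = 0, ρ = 1.0); the least reliable has the largest error relative to its true score (T = 1, E = +7, ρ = 0.125).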