The measurement model: what does it mean and what you can do with it? Presented by Michael Nering, Ph.D.

Slide 3: Goals of this session. Present commonly used measurement models (IRT). Show how these models form the backbone of any large-scale assessment program (equating, scaling). Discuss the meaning of the measurement models (ability estimation, item characteristics).
Slide 4: Now, really... This session is an introduction to the world of psychometrics. My goal is for you to understand that psychometrics is not a black box; it is really just a set of procedures.
Slide 5: A little about me. Yes, I'm a psychometrician. B.A. in psychology, Kent State; Ph.D. in psychology, University of Minnesota; working at Measured Progress since 1999. Research areas of interest: IRT, equating, scaling, person fit, adaptive testing.
Slide 6: [image]
Slide 7: Why psychology? Psychometricians typically come from educational measurement programs, psychometric programs, or I/O programs. Ultimately, we are all in pursuit of understanding people by way of quantification.
Slide 8: Psychometrics defined. Psychological measurement: psycho + metrics, the business of measuring psychological things.
Slide 9: What are psychological things?
Any latent trait: any characteristic that is not directly observable. Examples: depression, bipolar disorder, personality disorders; math, reading, writing, and science abilities. We don't care, let's use
Slide 10: Counterparts to psychometrics. Econometrics: measurement of economic things. Sociometrics: measurement of social things.
Slide 11: All metrics are ultimately a blend of things. [Figure: psychometrics as a blend of psychology, statistics, and math.]
Slide 12: Quantification in psychology. It has deep roots that came originally from philosophy. In the 1500s, philosophy branched into several disciplines because of the need to quantify certain things to better understand human beings.
Slide 13: Philosophy's many branches. This desire to better understand humans led to two primary areas of study: physiology (1543: a Belgian physiologist practices the dissection of cadavers) and psychology (1524: Marko Marulić publishes The Psychology of Human Thought).
Slide 14: Yes, I did just use the word "cadaver," but trust me, it's okay.
Slide 15: The last 100 years of psychometrics. Classical test theory and Spearman's 1904 contribution: true score theory; reliability theory, p-values, point-biserial coefficients. Item response theory.
Slide 16: Let's talk about IRT. When I say "measurement model," I really do mean some sort of IRT model. There have been lots of historical developments, including the Lord & Novick text of 1968. IRT has many advantages over CTT.
Slide 17: So, what is IRT?
A family of mathematical models that describe the interaction between examinees and test items. Examinee performance can be predicted in terms of the underlying trait. IRT provides a means for estimating scores for people and characteristics of items, and a common framework for describing both.
Slide 18: The ogive. A naturally occurring form that describes something about people. It is used throughout science, engineering, and the social sciences, and also in architecture, carpentry, photography, art, and so forth.
Slide 19: The ogive. [Figure]
Slide 20: [image]
Slide 21: A little jargon. The item characteristic curve (ICC), also called the item response function, trace line, etc. Stochastic: 1) involving a random variable, or 2) involving chance or probability.
Slide 22: The ICC. Does this one little function really do everything? Scale items and people onto a common metric? Help in standard setting? Serve as the foundation of equating? Carry some meaning in terms of student ability?
Slide 23: Does this one little function really do everything? Let's talk more about the ICC.
Slide 24: The ICC. Any curve in a Cartesian coordinate system can be defined by a formula. The simplest formula for the ogive is the logistic function:
$$P_{ij}(\theta_j) = \frac{e^{\theta_j - b_i}}{1 + e^{\theta_j - b_i}}$$
Slide 25: The ICC. Here $b_i$ is the item parameter and $\theta_j$ is the person parameter. The function gives the probability of responding correctly to item $i$ given the ability of person $j$.
Slide 26: $b_i$ is the inflection point. [Figure: ICC for item $i$ with $b_i = 0.125$.]
Slide 27: We can now use the item parameter to calculate $p$. Let's assume we have a student with $\theta_j = 1.0$, and our item has $b_i = 0.125$. Then we can simply plug the numbers into our formula.
Slide 28: Using the item parameters to calculate $p$: for $\theta_j = 1.00$, $p = e^{0.875}/(1 + e^{0.875}) \approx 0.706$.
Slide 29: Wait a minute! What do you mean, a student with an ability of 1.0? Does an ability of 0.0 mean that a student has NO ability? What if my student has a reading ability estimate of -1.2? What in the world does that mean?
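The calculation on Slides 24-28 can be checked with a minimal Python sketch. It assumes the one-parameter logistic form reconstructed above (difficulty $b_i$ only, no slope or guessing term); the function name `icc_1pl` is hypothetical.

```python
import math


def icc_1pl(theta: float, b: float) -> float:
    """Logistic ICC from Slide 24: probability that a person with ability
    theta answers an item with difficulty b correctly."""
    return math.exp(theta - b) / (1.0 + math.exp(theta - b))


# Worked example from Slides 27-28: theta_j = 1.0, b_i = 0.125
p = icc_1pl(theta=1.0, b=0.125)
print(f"p = {p:.3f}")  # p = 0.706

# At theta == b (the inflection point) the probability is exactly 0.5
print(icc_1pl(theta=0.125, b=0.125))  # 0.5
```

The second call shows why $b_i$ is the inflection point: a student whose ability equals the item's difficulty has a 50/50 chance of answering correctly.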
Slide 30: The ability scale. Ability is on an arbitrary scale that just so happens to be centered around 0.0. We use arbitrary scales all the time: Fahrenheit, Celsius, decibels, the DJIA.
Slide 31: Scaled scores. Although ability estimates are centered around zero, reported scores are not. Instead, scaled scores are typically a linear transformation of ability estimates, for example: (Ability × Slope) + Intercept.
Slide 32: The need for scaled scores: otherwise, some kids will have negative ability estimates.
Slide 33: Scaled scores. [Figure]
Slide 34: Use of scaled scores: student/parent-level reports, school/district reports, cross-year comparisons, performance-level categorization.
Slide 35: There's a lot here. Scaled scores are surface-level information. Behind the scenes, we use fancy formulas to depict the interaction between students and test items: there's a probabilistic relationship between students and test items.
Slide 36: Unfortunately, life can get a lot worse. Items vary from one another in a variety of ways: difficulty, discrimination, guessing, and item type (MC vs. CR).
Slide 37: Items can vary in terms of difficulty. [Figure: ICCs for an easier item and a harder item, shown against the ability of a student.]
Slide 38: Items can vary in terms of discrimination. Discrimination is reflected by the steepness of the ICC, so we allow the ICCs to vary in terms of their slope.
Slide 39: Good item discrimination. [Figure: at two close ability levels, there is a noticeable difference in $p$.]
Slide 40: Poor item discrimination. [Figure: at the same two ability levels, the difference in $p$ is smaller.]
Slide 41: Guessing. [Figure: this item's ICC approaches a lower asymptote of 0.25.]
Slide 42: Polytomous items. [Figure]
Slide 43: I'm sure by now you might be having a couple of thoughts: "How can I get up, open the door, and walk out without anybody noticing?" "I'm stuck in a psychometric prison, help me!"
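The item differences on Slides 36-41 can be illustrated with the common three-parameter logistic (3PL) form, which adds a slope $a$ (discrimination) and a lower asymptote $c$ (guessing) to the logistic curve from Slide 24. This is a minimal sketch: the function name `icc_3pl` and every parameter value below are hypothetical, and the scaling constant $D \approx 1.7$ that some programs multiply into the slope is omitted.

```python
import math


def icc_3pl(theta: float, a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Three-parameter logistic (3PL) ICC.

    a: discrimination (how steep the curve is)
    b: difficulty (where the curve sits on the ability scale)
    c: lower asymptote, often read as "guessing"
    With a=1 and c=0 this is the same logistic curve as Slide 24.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Difficulty (Slide 37): at theta = 0, the easier item is more likely correct.
easy = icc_3pl(0.0, b=-1.0)
hard = icc_3pl(0.0, b=1.0)
print(f"easier item p = {easy:.3f}, harder item p = {hard:.3f}")  # 0.731, 0.269

# Discrimination (Slides 39-40): same two close ability levels, two slopes.
low, high = -0.25, 0.25
good = icc_3pl(high, a=2.0) - icc_3pl(low, a=2.0)
poor = icc_3pl(high, a=0.5) - icc_3pl(low, a=0.5)
print(f"difference in p: good item {good:.3f}, poor item {poor:.3f}")  # 0.245, 0.062

# Guessing (Slide 41): even very low ability keeps about a 0.25 chance.
print(f"p at theta = -10 with c = 0.25: {icc_3pl(-10.0, c=0.25):.3f}")  # 0.250
```

The steeper item ($a = 2$) separates the two nearby students about four times as well as the flatter one, which is the contrast Slides 39 and 40 draw.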
Slide 44: But, trust me, I'm really trying to make a simple point.
Slide 45: Items and people interact in a variety of ways. We can use IRT to show that there exists a nice little S-shaped curve that describes this interaction: as ability increases, the probability of a correct response increases.
Slide 46: Advantages of IRT. Because of the stochastic nature of IRT, there are many statistical principles we can take advantage of. A test is the sum of its parts.
Slide 47: The test characteristic curve (TCC). A test is made up of many items, and the TCC summarizes across all of them. The TCC is simply the sum of the ICCs along the ability continuum, so for any ability level we can use it to estimate an examinee's expected total test score.
Slide 48: A bunch of ICCs are on a test. [Figure]
Slide 49: The test characteristic curve. [Figure]
Slide 50: The test characteristic curve. From an observed test score (i.e., a student's total test score), we can estimate ability. The TCC is used in standard setting to establish performance levels, and it can also be used to equate tests from one year to the next.
Slide 51: Estimating ability. [Figure: on the TCC, a total score of 3 corresponds to an ability of about 0.175.]
Slide 52: Standard setting. [Figure: TCC with cut points; performance-level labels include Advanced, Proficient, Basic, Below Basic, and Failing.]
Slide 53: Equating. [Figure: Year 1 TCC.]
Slide 54: Equating. [Figure: Year 2 TCC and its ability scale (-3 to 3) added.]
Slide 55: Equating. Our second scale goes away, and our TCCs are closer together.
Slide 56: Equating. The remaining differences are due to non-common items.
Slide 57: Equating. The adjustment to the TCC can be done in a variety of ways. Let's take a look at one commonly used method of equating, namely the mean shift method.
Slide 58: Mean shift method of equating. [Figure]
Slide 59: These items are common between the two years. [Figure]
Slide 60: Mean shift method of equating. [Figure]
Slide 61: Mean shift method of equating. [Figure]
Slide 62: Mean shift method of equating. The difference between $\mu_1$ and $\mu_2$, the mean difficulties of the common items in Year 1 and Year 2, is our scaling constant. It is used to adjust all the items administered in Year 2 so that they are on the same scale as Year 1.
Slide 63: Example: $\mu_1 = 0.20$ and $\mu_2 = -0.10$. We need to add 0.30 ($\mu_1 - \mu_2 = 0.20 + 0.10 = 0.30$) for the equating items to have the same mean. This 0.30 difference is due to an arbitrary scaling difference and NOT to any differences in ability.
Slide 64: Mean shift method of equating. 0.30 is then added to all the Year 2 item difficulty values.
Slide 65: Mean shift method of equating. By shifting all our item difficulties to last year's scale, we are ultimately putting this year's TCC onto last year's scale.
Slide 66: Equating. The example we just saw is merely one equating method; several others are available (Kolen & Brennan).
Slide 67: What have we learned? IRT: used to model the interaction between items and people. Item characteristics: items vary in terms of difficulty, discrimination, guessing, etc. Equating: used to relate tests from one year to the next. Scaling: used to represent student ability.
Slide 68: The assessment cycle: administration; ICCs & TCCs; equating; ability estimates & scaling; reporting.
Slide 69: So, how is this all done?
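The TCC, ability-estimation, mean-shift, and scaling steps from Slides 47-68 can be tied together in a minimal Python sketch under stated assumptions: every item follows the 1PL logistic ICC from Slide 24; ability is found by inverting the TCC with bisection (one simple approach, not necessarily what an operational program uses); the item difficulties are hypothetical, with the three common items chosen so their means match Slide 63; and the scaled-score slope and intercept are made up. All function names are hypothetical.

```python
import math
from typing import Sequence


def icc(theta: float, b: float) -> float:
    """Logistic (1PL) ICC, the same form as on Slide 24."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def tcc(theta: float, difficulties: Sequence[float]) -> float:
    """Test characteristic curve: sum of the ICCs at theta (expected total score)."""
    return sum(icc(theta, b) for b in difficulties)


def ability_from_total_score(score: float, difficulties: Sequence[float],
                             tol: float = 1e-8) -> float:
    """Invert the TCC: find theta whose expected total score equals `score`.

    The TCC rises monotonically with theta, so bisection is enough.
    A zero or perfect score has no finite solution under this model.
    """
    if not 0 < score < len(difficulties):
        raise ValueError("score must be strictly between 0 and the number of items")
    lo, hi = -1.0, 1.0
    while tcc(lo, difficulties) > score:  # widen the bracket downward
        lo *= 2
    while tcc(hi, difficulties) < score:  # widen the bracket upward
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tcc(mid, difficulties) < score:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_shift_constant(common_year1: Sequence[float],
                        common_year2: Sequence[float]) -> float:
    """Mean Year 1 difficulty of the common items minus their mean Year 2 difficulty."""
    mean1 = sum(common_year1) / len(common_year1)
    mean2 = sum(common_year2) / len(common_year2)
    return mean1 - mean2


def scaled_score(theta: float, slope: float, intercept: float) -> float:
    """Linear transformation of ability into a reported scale (Slide 31)."""
    return theta * slope + intercept


# Hypothetical difficulties; the first three items are common to both years.
year1 = [0.0, 0.2, 0.4, -1.0, 1.5]
year2 = [-0.3, -0.1, 0.1, 0.8, -0.6]

k = mean_shift_constant(year1[:3], year2[:3])
print(f"scaling constant = {k:.2f}")  # scaling constant = 0.30

# Put every Year 2 difficulty onto the Year 1 scale, then score a student.
year2_equated = [b + k for b in year2]
theta = ability_from_total_score(3, year2_equated)
print(f"ability for a total score of 3: {theta:.3f}")
print(f"scaled score: {scaled_score(theta, slope=10, intercept=50):.1f}")
```

Because the shift is added to every Year 2 difficulty before the TCC is inverted, a given total score maps to an ability on last year's scale, which is the point of Slide 65.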
Slide 70: Psychometricians often play the role of the magical wizard of assessments.
Slide 71: But, really... This session has served as your training in psychometric methods. For career opportunities, please send a copy of your vita to: Measured Progress, Attn: Psychometric Department, 171 Watson Road, Dover, NH 03820.
