Developing A Knowledge Base About Testing, Measurement, and Evaluation

Testing

• A good testing program tells part of a student's educational story.

• Such a program includes a variety of measurements such as:

• student self-assessment,
• teacher-made tests,
• criterion-referenced standardized tests,
• norm-referenced standardized tests.

Testing

• Reading is a complex behavior.
• Diagnosis is as complex as the reading process itself.
• A good testing program takes time, but it is time well spent.
• The goal is to help all children reach their full reading potential.
• This is what a reading diagnosis and improvement program is all about.
• A sound testing program that uses a variety of measures should help teachers focus on individual improvement.

Testing

• One of the unfortunate byproducts of the high-accountability era has been that the terms "testing" and even "assessment" now feel synonymous with standardized tests.

• But in reading diagnosis, assessment and testing must encompass broader concepts that are more interesting and vital than those addressed by state tests.

Terms

• The terms assessment, measurement, evaluation, and test are often treated as synonyms.

• But they are not.
• These four terms form a hierarchy, with assessment as the primary category.
• Measurement and evaluation sit underneath, with testing connecting to both measurement and evaluation.
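As an illustration only (the data structure and names are my own, not from the text), the hierarchy just described might be sketched as a small nested mapping, with testing appearing under both branches:

```python
# A sketch of the assessment hierarchy: assessment is the primary
# category; measurement and evaluation sit beneath it, and testing
# connects to both.
hierarchy = {
    "assessment": {
        "measurement": {"testing"},
        "evaluation": {"testing"},
    }
}

# Testing's dual role shows up as the intersection of the two branches.
branches = hierarchy["assessment"]
shared = branches["measurement"] & branches["evaluation"]
print(shared)  # {'testing'}
```

The point of the sketch is simply that testing is not a synonym for assessment; it is one node that feeds both measurement and evaluation.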

Assessment

• Having good assessment questions is the starting point.

• They lead to measurement, or collection of evidence.

• Tests are common collection instruments.
• Teachers must evaluate test results and check their interpretation against their original assessment questions.

Assessment

• Assessment is a term with powerful potential. In everyday language it means figuring out what is going on.

• For example, when we walk into a room full of people, we assess the situation.

• When we are looking at buying a home, we might assess its condition or location.

• Assessment in school must preserve some of this everyday meaning.

Assessment

• Assessment helps teachers figure out what students know, what skills they have, what they can do, and whether they are learning anything.

• Because knowing someone else's mind can be tricky, instruments of measurement and evaluation exist to help teachers gain confidence in what they know about students' minds.

Assessment

• There are three basic questions in assessment.

• What do I want to know?
• Why do I want to know it?
• How can I discover this information with confidence?

Assessment

• What makes answering the third question so difficult is that assessment uses a variety of ways to measure and evaluate knowledge.

• Tests are a part of this larger process, all of which should lead us to confident statements about our students' minds.

Measurement

• Measurement is how educators obtain evidence to evaluate.

• It is parallel to evaluation in the hierarchy because without measurement, there would be no evidence to interpret (i.e., evaluate).

• The educational community leans heavily on the metaphors of scales and rulers.

• These are literally "instruments" of measurement.

Measurement

• Once we have the evidence from putting something on a scale or laying down a tape measure, we can begin to evaluate what its weight or length means to us.
• Usually when we measure something, we have a desired "fit" in mind, a plan or purpose; that is, we don't usually walk around weighing and measuring things for no apparent reason.
• It should be so with educational assessment.

Measurement

• First, we must decide what our goals are.
• Second, we must consider why we value those goals.
• Third, we must decide how we might proceed toward attaining those goals.
• Any means we use to achieve these goals is an "instrument" of measurement.

Measurement

• Measurement instruments are our answer to the question of the "how" of assessment.

• In tests, one type of measurement instrument, the emphasis is often on question/answer sequences as a means of figuring out what someone else knows.

• We find this narrow emphasis unfortunate.

Measurement

• The positive values of measurement outweigh the negative connotations often associated with it.

• Measurement is useful for diagnostic, review, and predictive purposes.

• It can be used as a motivating technique for students, as well as a basis for discussing achievement with parents and other community members.

• Through ongoing measurement, teachers are also able to reevaluate their own teaching methods.

Measurement

• Smart diagnosis means we puzzle out what data are needed and then figure out how to gather them.

• When planning to teach an individual student, tests are only a part of the assessment picture because they may not provide the kind of evidence we want or need.

• For example, if we want to know whether a past traumatic experience is still affecting a child's reading performance, we might use test data, but we would also use our own observations and reports from counseling and parent conferences, which are less easy to quantify.

Measurement

• In order for measurement to be an effective part of the evaluative process, teachers must know varied instruments and be able to select, administer, and interpret them.

• Such instruments include standardized tests and teacher-made tests.

• Direct observation of student behavior is also necessary in order to collect data for valid evaluations.

Evaluation

• Evaluation is the interpretation of evidence gathered through measurement.

• When we have gathered evidence with measurement instruments, we return to the assessment question of "what do we want to know?" and see whether the evidence provides a reasonable answer.

• The scores on a test are one type of evidence, but they mean nothing in and of themselves.

Evaluation

• They have to be interpreted with respect to why we used the test as a measurement in the first place and what we hoped to learn about our students.

• When we write our own tests, we can track student responses back to individual items and also look for trends across items.

Evaluation

• By contrast, most standardized test authors keep individual test items and participant responses private.

• Teachers rely on the testing company to score and provide a written interpretation of participant responses.

• Evaluation involves passing personal judgment on the truthfulness, consistency, and validity of the evidence.

Evaluation

• Basic principles of test design are supposed to give us confidence when we finish gathering evidence and begin to interpret it.

• The teacher is the one who makes the diagnosis; no test can.

• Once again, it is the knowledgeable professional who is at the core of instruction that will lead to reading improvement.

Test Qualities

• 1. Suitability: In selecting or preparing a test, the teacher must determine not only whether it will yield the type of data desired but also whether the test is suitable for the age and type of students and for the locality in which they reside.

Test Qualities

• 2. Validity: Educators often talk about the validity of a test and generally define validity as the degree to which a test measures what we intend it to measure.

• We can question a test's validity (i.e., whether the items relate well to the purposes we are measuring).

Test Qualities

• But we can also question the validity of inferences people make from the test.

• Individual student factors can affect the validity of evaluation, or the inferences educators make from tests.

Test Qualities

• Validity is among the main concerns diagnostic teachers have when the balance of assessment tips toward standardized testing.

• First, the standardized testing organization has almost 100 percent of the responsibility for ensuring the validity of the structure of the test and the individual items on the test.

• Teachers usually cannot examine the actual tests and items at any length.

Test Qualities

• Second, teachers seem to have few opportunities to explain or provide a rationale for individual student scores.

• These frustrations have surfaced in larger patterns of score manipulation.

• In the past, school administrators were known to purposefully exclude English Language Learners and students with diagnosed disabilities from standardized testing.

Test Qualities

• They knew the standardized system would not allow them to explain the special circumstances of these individuals, so they met this frustration by taking those students out of the scoring pool.

• Special education students are now included as a result of litigation.

Test Qualities

• According to the Standards for Educational and Psychological Testing, "Validity is the most important consideration in test evaluation.

• The concept refers to the appropriateness, meaningfulness, and usefulness of the specific inferences made from test scores.

Test Qualities

• Test validation is the process of accumulating evidence to support such inferences."

• If a test or test items are not valid, then one of two things has happened:

• (a) we measured something else, or
• (b) nothing really got measured.

Test Qualities

• 3. Objectivity: The ways of giving an answer are controlled (such as true/false, multiple-choice, multiple-response, and matching questions).

• The range of answers is limited, usually to one acceptable response.

• The same score must result regardless of who grades the test.
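A minimal sketch of why objective formats meet this requirement (the function and items are hypothetical, not from the text): with a fixed answer key, scoring reduces to a mechanical comparison, so the score cannot depend on who does the grading.

```python
# Objective scoring: responses are compared against a fixed answer key,
# so the resulting score is the same regardless of the grader.
def score(responses, key):
    return sum(1 for r, k in zip(responses, key) if r == k)

answer_key = ["T", "F", "C", "B"]   # true/false and multiple-choice items
student = ["T", "F", "A", "B"]      # one student's responses

# Two different "graders" applying the same key must agree.
grader_1 = score(student, answer_key)
grader_2 = score(student, answer_key)
assert grader_1 == grader_2
print(grader_1)  # 3
```

Essay scoring lacks exactly this property: there is no fixed key to compare against, which is why the next slide turns to scorer training and explicit prompts.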

Test Qualities

• Since essay questions allow for a variety of ways to express an answer in language, each scorer is likely to interpret essay responses differently.

• Developers of less controlled tests should give specific training for scorers and should make the essay questions as explicit and as plain as possible.

Test Qualities

• Objective testing puts strict limits on student responses.

• This constrains our ability to learn what they might know.

• So why would we want an objective test?
• The key reason we administer objective tests is to eliminate our own biases.

Test Qualities

• Many of our biases are hidden from us.
• For example, many teachers are surprised the first time they observe themselves on video and learn that they tend to favor one side of their classroom when calling on students.

• Similar biases are likely to influence our scoring (measurement) and the inferences we make from scores (evaluation).

Test Qualities

• For example, sometimes teachers change their pattern of scoring essays from the beginning of a pile of papers to the end.

• Reading the early essays can affect the teacher's interpretation of those that follow.

• The scorer might also just be more tired at the end.

• Objectivity is an important principle to help us rule out biases in assessing student knowledge.

Test Qualities

• 4. Reliability: Reliability is about consistency.
• In testing reading, we want to ensure that differences in individual scores actually come from students' reading skill and not from how they received the instructions.

Test Qualities

• If teachers want reliability for a teacher-made test, they need to follow this same principle and write a script for delivery.

• These scripts should ensure that they do not have one way of giving the instructions on one day and another way the next.

Page 37: Developing A Knowledge Base About Testing, Measurement, and Evaluation