Objective # 08
Identify Different Methods of Evaluation during Teaching
Liaquat University of Medical and Health Sciences
Jamshoro Sindh
College of Nursing JPMC Karachi
Senior Elective Practicum in Nursing Education
Mahmood Ahmed
Preceptor: Associate Professor Dr. M. Iqbal Afridi
Assessment
Assessment is only useful if it contributes to sound educational decision-making. In order to use this data
effectively, beginning teachers must have knowledge and skills in the areas of measurement fundamentals,
standardized tests and their interpretation, validity and reliability, constructing and using the results of formal
and informal assessments, and utilizing assessment information for the purposes of grading. Additionally,
educators in Minnesota must be knowledgeable about the Minnesota Academic Standards and how to measure
student progress toward meeting them. The paragraphs that follow describe the knowledge and skills we believe
our students must obtain in order to make effective assessment decisions.
Measurement Fundamentals: Educators must understand the basic terminology and concepts related to
assessment, evaluation, and measurement. The following concepts and definitions are of central importance to
this understanding:
Assessment vs. Evaluation: Though the terms assessment and evaluation are often used interchangeably
(Cooper, 1999), the Minnesota Department of Education differentiates between the two terms. They define
assessment as gathering information or evidence while evaluation involves using that information or
evidence to make judgments (Aune, 1999)
Teacher-Made vs. Standardized Assessments: In the broadest sense, assessments may be classified into
two categories: teacher-made assessments and standardized assessments. Teacher-made assessments are
constructed by an individual teacher or a group of teachers in order to measure the outcome of classroom
instruction. Standardized assessments, on the other hand, are commercially prepared and have uniform
procedures for administration and scoring. They are meant for gathering information on large groups of
students in multiple settings (Karmel and Karmel, 1978)
Criterion-Referenced vs. Norm-Referenced Assessment: Standardized assessments may be norm-referenced
or criterion-referenced. Norm-referenced assessments compare individual students’ scores to
those of a norm-reference group, generally students of the same grade or age. They are designed to
demonstrate "differences between and among students to produce a dependable rank order" (Bond, 1996)
and are often used to classify students for ability grouping or to help identify them for placement in special
programs. They are also used to provide information to report to parents. Criterion-referenced tests, on the
other hand, determine the specific knowledge and skills possessed by a student. Thus, this form of testing
"uses as its interpretive frame of reference a specified content domain, rather than a specified population of
persons" (Anastasi, 1976). Competency tests, such as the Minnesota Basic Skills Tests, represent a specific
sub-category of criterion-referenced assessment and are used to ensure that students possess minimal basic
skills (Biehler and Snowman, 1997)
Formative vs. Summative Evaluation: Formative evaluation involves "collecting, synthesizing, and
interpreting data for the purpose of improving learning or teaching" (Airasian, 1997, p. 402). Thus, formative
assessment is used to provide feedback and not for grading. It typically occurs while instruction is ongoing.
Summative evaluation, on the other hand, involves "collecting, synthesizing, and interpreting information
for the purpose of determining pupil learning and assigning grades" (Airasian, 1997, p. 404). It typically
occurs at the end of instruction
Types of Standardized Tests: A description of all forms of standardized tests is far beyond the scope of this
document. Therefore, this brief summary includes only those standardized tests most central to educational
decision-making: individual and group tests designed to measure intelligence and academic achievement.
Intelligence Tests: Intelligence tests are often classified into two categories: individual and group.
Individual intelligence tests, such as the Stanford-Binet and Wechsler Scales, are given in a one-to-one
setting by a trained examiner. They are most frequently given as part of an overall psychological
evaluation, often to determine if a student is eligible for special education. Though group intelligence tests,
such as the Otis-Lennon Mental Ability Tests, are "no longer administered to all students everywhere"
(Lefrancois, 1999), some school districts still include them in their annual testing programs (Biehler and
Snowman, 1997). Results of intelligence tests, whether group or individual, are often reported as
intelligence quotients or IQ scores. Though IQ scores used to be determined by calculating a ratio (mental
age divided by chronological age multiplied by 100, thus the term intelligence "quotient"), they are now
typically calculated as standard scores with a mean of 100 and a standard deviation of about 15. Though the
results of individual intelligence tests tend to be more valid and reliable than those of group tests, teachers
must be aware of the following limitations of intelligence tests noted by Biehler and Snowman (1997):
They generally only sample abilities that relate to classroom achievement rather than overall intellectual
functioning. Therefore, many educators prefer the term scholastic aptitude test
Intelligence test results provide only an estimate of a child’s abilities to deal with "certain kinds of
problems at a particular point in time". Because of this, their results can vary over multiple
administrations
The tests may not provide a valid estimate of the abilities of minority and low income children.
Therefore, teachers must exercise caution when making educational decisions based on the results of
these tests
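The two ways of computing an IQ score mentioned above can be sketched as follows. This is a minimal illustration, not a scoring procedure from any test manual: the function names and the example ages and scores are hypothetical, and the deviation form assumes the mean of 100 and standard deviation of 15 given in the text.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age / chronological_age * 100


def deviation_iq(raw_score: float, norm_mean: float, norm_sd: float,
                 iq_mean: float = 100.0, iq_sd: float = 15.0) -> float:
    """Deviation IQ: a standard score rescaled to mean 100 and SD 15."""
    z = (raw_score - norm_mean) / norm_sd   # distance from the norm-group mean in SD units
    return iq_mean + iq_sd * z


# Hypothetical child with a mental age of 10 and a chronological age of 8.
print(ratio_iq(10, 8))            # 125.0

# Hypothetical raw score one standard deviation above the norm-group mean.
print(deviation_iq(60, 50, 10))   # 115.0
```

The deviation form depends only on where the score falls relative to the norm group, which is why it has replaced the age-based ratio.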
Traditional Tests of Academic Achievement: As is the case for intelligence tests, individual achievement
tests such as the Wide Range Achievement Test and Peabody Individual Achievement Test, are most
commonly administered to students who have been referred for possible placement in special education or
remedial programs. Group achievement tests, such as the California Test of Basic Skills, on the other hand,
are administered either annually or at planned intervals as part of the district-wide testing program in order
to certify students’ achievement and provide information to parents (Lefrancois, 1999)
Assessment Data Interpretation: To understand and appropriately use the results of standardized
assessments, educators must have a working knowledge of descriptive statistics, including measures of
central tendency, measures of dispersion, norms, and standard scores
Measures of central tendency include the mean, median, and mode. The mean is the arithmetic average
and is obtained by adding all scores and dividing by the total number of scores. It is especially important
because it is a necessary statistic for the calculation of standard scores. The median is the middle score of a
distribution. In cases where there are extreme scores (or outliers), the median may be a better measure of
central tendency than the mean. The mode is the most frequent score in a distribution. In large, normally-
distributed populations, the mode does not differ greatly from the mean and median. However, in
distributions with small numbers, it is typically the least useful of these three statistics (Glass and Stanley,
1970)
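As an illustration of the three measures, the sketch below uses Python's standard `statistics` module on a small set of hypothetical scores, then adds one extreme score to show why the median can be the better summary when outliers are present.

```python
import statistics

scores = [70, 72, 75, 75, 78, 80, 82]   # hypothetical class scores

print(statistics.mean(scores))     # 76  (sum of 532 divided by 7 scores)
print(statistics.median(scores))   # 75  (middle score of the ordered list)
print(statistics.mode(scores))     # 75  (most frequent score)

# Add one extreme low score (an outlier) and compare.
with_outlier = scores + [10]
print(statistics.mean(with_outlier))     # 67.75, pulled down by the outlier
print(statistics.median(with_outlier))   # 75.0, unchanged
```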
Measures of dispersion include the range and standard deviation. The range is the spread of scores in a
distribution (Vogt, 1993). It may be calculated by subtracting the lowest score from the highest and adding
one. It is, at best, a rather crude statistic because it is based on only the two most extreme scores in the
distribution. The standard deviation is a more precise and useful measure of dispersion. It is calculated by
taking the square root of the average squared distance of scores from the mean. Calculating the standard
deviation is essential for determining standard scores such as those described below (Anastasi, 1988)
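A short worked example may help. The scores below are hypothetical; the range uses the "highest minus lowest plus one" rule given above, and the standard deviation uses the usual root-mean-square definition, treating the scores as the whole group rather than a sample.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical raw scores

# Range: highest score minus lowest score, plus one.
inclusive_range = max(scores) - min(scores) + 1
print(inclusive_range)               # 8

# Standard deviation, step by step.
mean = statistics.mean(scores)                           # 5
squared_deviations = [(x - mean) ** 2 for x in scores]   # 9, 1, 1, 1, 0, 0, 4, 16
variance = sum(squared_deviations) / len(scores)         # 32 / 8 = 4.0
sd = variance ** 0.5
print(sd)                            # 2.0

# The library function gives the same result.
print(statistics.pstdev(scores))     # 2.0
```

The range changes if only the highest or lowest score changes, whereas the standard deviation reflects every score in the distribution.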
Norms: The results of standardized tests are often reported as norms, which are statistics that allow one to
interpret the score of an individual student in comparison to others of the same age or grade level. They
include percentiles (the percentage of scores in the norm-reference group that fall at or below that of a
particular student), grade equivalents (which describe a student’s performance in terms of school grade
levels), and standard scores such as Z-scores, T-scores, stanines, and I.Q. scores (all of which indicate how
far a student’s score varies from the mean in standard deviation units). It is important for teachers to
understand the meaning of norms so that they are able to use the results of standardized tests to interpret
them to parents and effectively plan instruction
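The norms listed above can be sketched in a few lines. The norm-group scores, mean, and standard deviation are hypothetical. The T-score uses the conventional mean of 50 and standard deviation of 10, and the stanine conversion (mean 5, standard deviation 2, limited to 1 through 9) is a common approximation that published tests may refine with their own tables.

```python
def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores at or below the given score."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


def z_score(score, mean, sd):
    """Distance of a score from the mean in standard deviation units."""
    return (score - mean) / sd


def t_score(z):
    """T-score: standard score with mean 50 and SD 10."""
    return 50 + 10 * z


def stanine(z):
    """Approximate stanine: mean 5, SD 2, limited to the range 1 to 9."""
    return max(1, min(9, round(2 * z + 5)))


norm_group = [40, 45, 50, 50, 55, 60, 65, 70, 75, 90]   # hypothetical norm group
print(percentile_rank(65, norm_group))   # 70.0

z = z_score(65, mean=50, sd=10)          # hypothetical published mean and SD
print(z)            # 1.5
print(t_score(z))   # 65.0
print(stanine(z))   # 8
```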
Validity and Reliability: In order to make effective decisions based on assessment data, the instruments
used to collect that data must be valid and reliable
Validity is defined as "the extent to which a test measures what it intends to measure" (Lefrancois,
1999). The validity of a test may be determined in different ways including carefully examining the
test’s content in regard to the curriculum (content validity), determining the degree to which test results
reflect a theory or construct (construct validity), and examining the extent to which assessment results
can be used to predict student performance (predictive validity). Of the various types of validity,
content validity is by far the most important for classroom assessments, as it is essential that the
content of teacher-made assessments accurately reflects the instructional outcomes being assessed
Reliability refers to the consistency or stability of scores yielded by a test (Airasian, 1997). In other
words, a reliable test yields consistent scores over repeated administrations (repeated-measures
reliability). A concrete estimate of a test’s reliability is provided by the standard error of measurement,
which is an index of the amount of error or unreliability in the scores yielded by a test. Though
reliability and standard error of measurement are important considerations in choosing standardized
tests, it is often difficult to determine the reliability and standard error of a teacher-made test. However,
teachers should be knowledgeable regarding ways to increase the reliability of classroom assessments
such as those described by Airasian (1997)
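The text does not give a formula for the standard error of measurement. The sketch below assumes the classical estimate, SEM = SD x sqrt(1 - reliability), and uses a hypothetical test with a standard deviation of 15 and a reliability coefficient of 0.91.

```python
import math


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Classical estimate: SEM = SD * sqrt(1 - reliability coefficient)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


# Hypothetical test: score SD of 15 points, reliability coefficient of 0.91.
sem = standard_error_of_measurement(15, 0.91)
print(round(sem, 2))                      # 4.5

# Band of one SEM on either side of a hypothetical observed score of 100.
observed = 100
band = (observed - sem, observed + sem)
print(tuple(round(x, 1) for x in band))   # (95.5, 104.5)
```

The closer the reliability coefficient is to 1, the smaller the band around an observed score, which is why the SEM is a concrete index of unreliability.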
Constructing Formal Assessments: Formal, teacher-made assessments can be classified as traditional or
alternative. Traditional assessments are typically paper-and-pencil tests and are often categorized as objective
or essay. Alternative assessments are most often performance-based. The type of assessment one chooses is
dependent on many factors including grade level, content, and time constraints. Regardless of the assessment
format, good assessments have three features in common:
The assessment exercises and questions are related to the teacher’s objectives and instruction
The exercises and questions cover a representative sample of what students were taught
The items, directions, and scoring procedures are clear and appropriate (Airasian, 1997)
Additionally, as noted by Aune (2000), effective teachers typically employ multiple assessments, including both
traditional and alternative
Traditional Assessments: Objective assessments are traditional assessments on which students are
expected to provide the one, correct answer. Typical objective assessment formats include multiple choice,
true-false, matching, completion, and short answer. All of these assessments have the advantage of
sampling students’ knowledge of a wide range of content (and, therefore, have the potential for good
content validity) in a minimal amount of time. A major disadvantage of objective test items is that they
typically measure only student learning at the knowledge and comprehension levels of Bloom’s
Taxonomy. However, multiple choice items can be constructed to sample higher cognitive levels (Airasian,
1997). In order to construct effective objective assessments, teachers must be familiar with and apply
guidelines for effective test construction such as those described by Nitko (1996), Oosterhof (1999), and
Airasian (1997)
The Essay Test is another form of traditional, paper-and-pencil assessment. It is an excellent format for
assessing students’ abilities to communicate ideas in writing. Other advantages of essay tests include
measuring higher cognitive levels and directly measuring behaviors specified by performance objectives
(Oosterhof, 1999). However, essay tests have disadvantages as well in that they sample less content than
objective tests and, therefore, may have poorer content validity. Additionally, essay tests are time-
consuming to score, and their scoring tends to be less reliable (Oosterhof, 1999). Because of these potential
disadvantages, it is especially important for teachers to follow appropriate guidelines for constructing and
scoring essay tests such as those described by Nitko (1996), Oosterhof (1999), and Airasian (1997)
Alternative Assessments: The most common form of alternative assessment is performance assessment,
which can be defined as an assessment activity requiring students to use their knowledge and skills to
perform complex tasks or solve problems (Biehler and Snowman, 1997). Though the term authentic
assessment is often used interchangeably with performance assessment, Oosterhof (1999) differentiates
between the two, defining authentic assessments as tasks that require "a real application of a life skill
beyond the instructional context." Therefore, according to Oosterhof, "All authentic assessments are
performance assessments, but the inverse is not true". A major advantage of performance assessments is
that they allow evaluation of skills that cannot be easily measured by paper-and-pencil tests. Additionally,
they allow for evaluation of the process as well as the product. They are, however, time-consuming to
administer and, therefore, typically do not allow one to sample a wide range of outcomes. Consistency of
scoring (reliability) is also problematic (Oosterhof, 1999). Like essay tests, performance assessments can
be scored analytically or holistically. Analytical scoring involves breaking the desired response into its
component parts and using checklists or rating scales to calculate a score (Oosterhof, 1999). Holistic
scoring is typically done using a scoring rubric which allows for comparisons of students’ work to
"descriptions of performances that range from higher to lower". As Wiggins has noted, "scoring rubrics must
be based on a careful analysis of existing performances of varying quality"
Portfolios: According to Arter (1995), "a portfolio is a purposeful collection of student work that tells the
story of student achievement or growth". Though portfolios are often considered to be a form of authentic
or performance assessment, as Oosterhof (1999) noted, they typically include materials that are not
authentic or performance-based. Portfolios are particularly useful for tracking change or growth in student
performance over time and are most effective when they "are characterized by a clear vision of the student
skills to be addressed, student involvement in selecting what goes into the portfolio, use of criteria to
define quality performance and provide a basis for communication, and self-reflection through which
students share what they think about their work, their learning environment, and themselves" (Arter, 1995)
Informal Assessment: As noted by Oosterhof (1999), the majority of classroom assessments are informal
in nature. They typically take place during instruction and allow the teacher to monitor student learning
and make any necessary adjustments. These informal assessments most often take the form of observations
and questions. Though these techniques are efficient and adaptable, their technical quality "tends to be
inferior to techniques associated with formal assessments". However, teachers can improve their use of
informal questions by basing them on instructional goals, allowing sufficient wait-time, and recognizing
the importance of teacher reactions to student answers. The effectiveness of observations can be improved
by keeping anecdotal records, using informal checklists and recognizing one’s own bias (Good and
Brophy, 1984)
Reporting Grades: Grades serve the important purpose of indicating the extent to which learners
have achieved classroom goals and communicating this information to students and their parents. Tombari
and Borich (1999) suggest eight steps for teachers to follow when constructing a grading system:
Identify district policy
Determine the meaning of each grading symbol
Distinguish between reporting and grading factors
Identify grade components
Decide on component weights
Determine how components will be combined
Choose a method for calculating grades
Decide how to deal with borderline grades
Methods for calculating grades include norm-referenced grading, criterion-referenced grading, and
individual-referenced grading
Norm-referenced (or relative) grading involves comparing "a pupil’s performance to that of other pupils in
the class" (Airasian, 1997, p. 301). Grading "on the curve" is an example of norm-referenced grading
Criterion-referenced grading involves comparing a "pupil’s performance to a predetermined standard
of mastery" (Airasian, 1997). Calculating grades based on fixed ranges of cumulative scores is an
example of criterion-referenced grading (Tombari and Borich, 1999)
Individual-referenced grading involves comparing pupils’ performance to their perceived abilities.
Though this form of grading is sometimes used for students with disabilities, Airasian (1997)
recommends that other grading approaches such as contract grading, IEP-based grading, or narrative
grading be used for determining these students’ grades
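The steps of weighting components, combining them, and converting the result to a grade can be sketched as follows. The component names, weights, cut-off scores, and student names are all hypothetical and would in practice come from district policy and the teacher's own grading plan.

```python
# Hypothetical component weights, in percent of the final grade.
WEIGHTS = {"tests": 50, "projects": 30, "homework": 20}

# Hypothetical fixed ranges for criterion-referenced grading.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]


def composite(scores: dict) -> float:
    """Combine component percentages into one weighted composite score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def criterion_grade(percent: float) -> str:
    """Compare the composite to a predetermined standard of mastery."""
    for cutoff, letter in CUTOFFS:
        if percent >= cutoff:
            return letter
    return "F"


def norm_referenced_ranks(composites: dict) -> dict:
    """Rank pupils against each other, with 1 as the highest composite."""
    ordered = sorted(composites, key=composites.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


student = {"tests": 80, "projects": 90, "homework": 100}
total = composite(student)
print(total)                    # 87.0
print(criterion_grade(total))   # B

class_composites = {"Ana": 87.0, "Ben": 72.5, "Cy": 91.0}
print(norm_referenced_ranks(class_composites))   # {'Cy': 1, 'Ana': 2, 'Ben': 3}
```

Under criterion-referenced grading every pupil could earn the same letter, whereas the rank order used in norm-referenced grading always depends on how the rest of the class performed.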
Assessment of Performance Toward education Standards: Teachers must be knowledgeable about the
Standards so that they can facilitate and assess student progress toward their attainment. The graduation
requirements consist of two parts: the Basic Standards and the Academic Standards
Basic Standards: The Basic Standards specify the minimum skills students must possess in order to
graduate from a public high school. These skills are assessed by the Basic Standards Tests, which are
competency tests designed to assure that students have basic skills in mathematics, reading, and
composition. Students take the math and reading tests during 8th grade and the writing test during 10th
grade. They must pass these tests in order to obtain a high school diploma. Those who do not pass a test on
their first attempt may retake it annually until they achieve a passing score
The Academic Standards define a new core of five academic content standards areas: language arts,
mathematics, science, social studies and the arts. Standards for Mathematics, Language Arts, and Arts.
Each of the academic standards will be supplemented by grade-level benchmarks. These benchmarks will
specify the academic knowledge and skills that students must achieve to complete a state standard and will
determine the content of Comprehensive Assessments, which are used to assess compliance with the
standards of education
Measuring scales:
The application of statistics in research is well documented. Before choosing a statistical method for
your own research project, knowledge regarding scales of measurement is a prerequisite. Scales of measurement
have to do with the allocation of numerical values to characteristics according to certain rules. Measurement can
thus either be quantitative or qualitative. The qualitative level of measurement includes, among other things,
aspects such as interpretation and paragraph analysis, whilst the quantitative level of measurement focuses on
measures such as nominal, ordinal, interval and ratio levels of measurement. The latter are basic scales of
measurement and will be briefly outlined.
Nominal measurement: Nominal measurement involves assigning a numerical value to a specific
characteristic purely as a label. It is the most basic form of measurement, because it operates at the
lowest level that can be measured, and is therefore considered a scale of measurement with limitations. The
following serves as an example of nominal measurement: A researcher wants to determine the profile of the
academic background of his or her students. For this, he or she might need information regarding the specific
level (HG, SD, LG) at which the students passed their matriculation examination
Ordinal measurement: Ordinal measurement is applicable in cases where a numerical value is assigned to a
criterion or characteristic in terms of a specific order. The ordinal scale implies that the entity being
measured is quantified in terms of higher or lower, greater or lesser, without specifying the size of the
intervals (Leedy 1993). For example, the numeral 1 can be the highest, whilst 3 could be the lowest
Interval measurement: The interval scale of measurement is characterized by two features, namely: equal
units of measurement (equal intervals); and a zero point which has been established arbitrarily (Leedy
1993). The latter indicates that there is no absolute zero point. There is therefore a specific relationship
between the distances between numerical values and the different sizes of a characteristic. Because of the
aforementioned characteristics, this scale is considered to be a more advanced type of measuring scale.
An increase or decrease of the one goes hand in hand with an increase or decrease of the other.
The interval level of measurement enables the researcher to compare aspects and to indicate
clearly how much more the one has of a characteristic than the other. The interval scale of measurement is
therefore suitable for calculating arithmetic means and standard deviations and for correlation
studies, provided that the researcher takes care that the preconditions set for each scale of measurement are
abided by
Ratio measurement: This is considered the highest order of measurement that exists, because of the fixed
proportions (ratio) between the number and the amount of the characteristic that it represents. What should
be mentioned is that, when ratio levels are measured, a fixed (absolute) zero point exists. The ratio level of
measurement thus enables researchers to determine whether aspects possess something of a characteristic or
not
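The four scales can be illustrated by the kind of summary each one supports. In the sketch below, the matriculation levels come from the nominal example above, while the rankings, temperature readings, and times are hypothetical data chosen only to show the difference between the scales.

```python
import statistics

# Nominal: labels only, so counting and the mode are the meaningful summaries.
exam_levels = ["HG", "SD", "HG", "LG", "HG", "SD"]
print(statistics.mode(exam_levels))             # HG

# Ordinal: ranks have order but unknown spacing, so the median is appropriate.
rankings = [1, 3, 2, 2, 1, 3, 2]
print(statistics.median(rankings))              # 2

# Interval: equal units but an arbitrary zero (hypothetical Celsius readings).
temperatures_c = [10, 20, 30]
print(statistics.mean(temperatures_c))          # 20
print(temperatures_c[2] - temperatures_c[1])    # 10, a meaningful difference

# Ratio: an absolute zero exists, so "twice as long" is a meaningful statement.
minutes_on_task = [15, 30]
print(minutes_on_task[1] / minutes_on_task[0])  # 2.0
```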
Characteristics of measuring scales: With any type of measurement, two considerations are important -
validity on the one hand and reliability on the other hand
Reliability is the term used to deal with consistency. A measuring scale is considered reliable if it yields
consistent results. A further refinement of the term is that, when a test is repeated by
the same researcher with a different group representing the original group, the same results should be
obtained
Validity is concerned with the soundness and the effectiveness of the measuring instrument (Leedy 1993),
that is, with whether it measures that which it is supposed to measure. Four types of validity stand out,
namely: content validity, prognostic validity, simultaneous validity, and construct validity
Classification of statistical methods: Statistical methods in the broadest sense are classified into two main
groups, namely descriptive and inferential statistics
Descriptive statistics: Smit (1983) sees descriptive statistics as the formulation of rules and procedures
according to which data can be placed in useful and significant order. Landman (1988) states that
descriptive statistics deals with the central tendency, variability (variation) and relationships (correlations)
in data that are readily at hand. The basic principle for using descriptive statistics is the requirement for
absolute representation of data. The most important and general methods used are: ratios, percentages,
frequency tables, and distribution tendency
The histogram. The histogram is a graphic representation of a frequency distribution and is used to
represent simple frequency distributions. It is characterized by a vertical line (the y axis or ordinate) at the
left side of the figure and a horizontal line (the x axis) at the bottom. The two lines meet at a 90-degree angle.
Because frequencies are grouped into class intervals, the benefit of graphic presentation is that the data
can be grasped immediately
Frequency polygon. The frequency polygon does not differ fundamentally from the histogram, but is only used
for continuous data. Instead of drawing bars as in the histogram, a dot marking the frequency of each class is
placed in the middle of the class interval. When the dots are linked up, the frequency polygon is formed.
Usually an additional class is added to the end of the line in order to form an anchor
Cumulative frequency curve. The frequencies in the frequency table are added, starting from the bottom
class interval and adding class by class. The cumulative frequency in a specific class interval then
clearly indicates how many persons or measurements fall below or above that class interval
Percentile curve. The cumulative frequency can also be converted into percentages or proportions of
distribution
Line graph. In the graphic presentations above, the horizontal line (X axis) indicates the scale of measurement,
whilst the vertical line (Y axis) indicates the frequency. In the case of a line graph, both axes (X and Y)
are used to indicate a scale of measurement, with the aim of showing a comparison between two
comparable variables (Smit 1983)
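The frequency table behind the histogram, the cumulative frequency curve, and the percentile curve can be built as in the sketch below. The scores, the class width of 10, and the starting point of 50 are hypothetical, and the code assumes whole-number scores so that class intervals such as 50-59 and 60-69 do not overlap.

```python
def frequency_table(scores, start, width):
    """Return rows of (lower, upper, frequency, cumulative frequency, cumulative percent)."""
    rows = []
    cumulative = 0
    lower = start
    while lower <= max(scores):
        upper = lower + width - 1
        freq = sum(1 for s in scores if lower <= s <= upper)
        cumulative += freq                       # add class by class from the bottom
        rows.append((lower, upper, freq, cumulative, 100 * cumulative / len(scores)))
        lower += width
    return rows


scores = [52, 55, 58, 61, 63, 64, 67, 68, 70, 71, 73, 75, 78, 82, 88]   # hypothetical
table = frequency_table(scores, start=50, width=10)

# A text histogram: one '#' per score in each class interval.
for lower, upper, freq, cum, pct in table:
    print(f"{lower}-{upper}  {'#' * freq:<6} f={freq}  cf={cum}  {pct:.0f}%")
```

The `f` column is what a histogram or frequency polygon plots, the `cf` column gives the cumulative frequency curve, and the last column gives the same curve expressed in percentages.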
Central tendency is defined as the central point around which data revolve. The following techniques can
be employed
The mode is defined as the score (value or category) of the variable which is observed most frequently
Median indicates the middle value of a series of sequentially ordered scores. Because the median divides
frequencies into two equal parts, it can also be described as being the fiftieth percentile
Arithmetic mean refers to a measure of central tendency found by adding all scores and dividing the sum by
the number of scores
Standard deviation is a measure of the spread or dispersion of a distribution of scores. The deviation of
each score from the mean is squared; the squared deviations are then summed, the result divided by N-1,
and the square root taken (Landman 1988)
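Written as formulas, the mean and the standard deviation described above take the following form. The symbols are chosen here for illustration: x_i is an individual score, N the number of scores, and the worked line uses the hypothetical scores 2, 4, and 6.

```latex
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i
\qquad
s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}

% Worked example with the scores 2, 4, 6:
% \bar{x} = (2 + 4 + 6)/3 = 4
% s = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (6-4)^2}{3 - 1}} = \sqrt{\frac{8}{2}} = 2
```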
Inferential statistics: Apart from descriptive statistics that deal with central tendencies, statistical methods
enabling researchers to go from the known to the unknown data also exist. That is to say, they allow deductions
or statements regarding the broad population to be made on the basis of the samples from which the 'known' data
are drawn. These methods, according to the literature, are called inferential or inductive statistics (Landman
1989). These methods include estimation, prediction, hypothesis testing and so forth
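As a minimal sketch of estimation, the example below goes from a known sample to a statement about the unknown population mean. The sample is hypothetical, and the interval relies on a normal approximation with a multiplier of about 1.96. For a sample this small a t-based multiplier would usually be preferred, so the result should be read as illustrative only.

```python
import math
import statistics

# Hypothetical sample of 12 test scores drawn from a larger population.
sample = [62, 65, 68, 70, 71, 73, 74, 76, 78, 80, 83, 88]

n = len(sample)
mean = statistics.mean(sample)
s = statistics.stdev(sample)              # sample standard deviation (divides by n - 1)
standard_error = s / math.sqrt(n)         # how much the sample mean is expected to vary

# Normal-approximation multiplier for 95% confidence (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
low = mean - z * standard_error
high = mean + z * standard_error

print(f"sample mean = {mean:.1f}")
print(f"approximate 95% interval for the population mean: {low:.1f} to {high:.1f}")
```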
In conclusion the role of statistical methods in research is to enable the researcher to accurately utilize the
gathered information and to be more specific in describing his findings. For more details on statistical
calculations you are referred to Huysamen (1976)
Objective types of questions:
The most distinguishing characteristic of objective-style questions is that they have highly specific,
predetermined answers requiring a very brief response. Evaluation should be a learning experience for both
students and teachers. Considerable effort needs to be made to develop and use objective-style questions in a
more versatile manner, including questions which demand higher-level thinking. Objective items come in a
variety of different types, but they can be classified into two groups: supply type (which require the pupil to
supply the answer) and select type (which require the pupil to select the answer from a given number of alternatives).
Objective Type Questions
Supply type questions: short answer, completion
Select type questions: true-false, matching items, multiple choice questions
Factors affecting objective types of questions:
Objective types of questions are often used for the wrong reasons, that is, because they are faster and easier
to mark than other types of evaluation
Well-designed objective types of questions can quickly and efficiently evaluate a wide variety of skills and
deal with a large amount of subject matter
The most effective use of well-designed objective types of questions involves the evaluation of students’
ability to use a wide variety of skills, including skills of higher levels of thinking
The teacher should be aware that objective types of questions may be inappropriate in certain
situations, for example in Grades 1 to 4
Important considerations in writing objective items:
Test for important facts and knowledge
Tailor the questions to fit the examinees’ age and ability levels as well as the purpose of the test
Write the items as clearly as possible
Clarity can be improved by using good grammar and sentence structure
If the purpose of the test item is to measure understanding of a principle rather than computational
skill, use simple numbers and have the answer come out as a whole number
Avoid lifting statements verbatim from the text
Avoid using interrelated items
There should be only one correct (best) answer
Avoid negative questions whenever possible
Don’t give the answer away
Get an independent review of your test item
Construction of objective test items:
The construction of good test items is an art. It requires a thorough grasp of subject matter, a clear conception
of the desired learning outcomes, a psychological understanding of pupils, sound judgment, persistence, and a
touch of creativity. The only additional requisite for constructing good test items is the skillful application of an
array of simple but important rules and suggestions.
Checklist for reviewing objective test items
S. No. Review Questions Yes No
1. Are the instructional objectives clearly defined?
2. Did you prepare a test blueprint? Did you follow it?
3. Did you formulate well-defined, clear test items?
4. Did you employ "correct" English in writing the items?
5. Is it the most appropriate type of item to use for the intended learning outcomes?
6. Has the textbook language been avoided?
7. Did you avoid giving clues to the correct answer? For example, grammatical clues, length of correct response clues?
8. Did you test the important ideas rather than the trivial?
9. Did you adapt the test’s difficulty to your students?
10. Did you cast the items in positive form?
11. If negative items were used, did you draw the students’ attention to them?
12. Did you prepare a scoring key? Does each and every item have a single correct answer?
13. Did you review your items? Yourself? Another teacher?
14. Have the units been indicated when numerical answers are expressed in units?
15. Has the degree of precision been indicated for numerical answers?
16. Have the items been phrased so as to minimize spelling errors?
17. If revised, are the items still relevant to the intended learning outcomes?
18. Have the items been set aside for a time before reviewing them?
Comparative advantages of objective and essay tests
Learning outcomes measured
Objective test: Efficient for measuring knowledge of facts. Some types (e.g., multiple choice) can also measure
understanding, thinking skills, and other complex outcomes. Inefficient or inappropriate for measuring the ability
to select and organize ideas, writing abilities, and some types of problem-solving skills.
Essay test: Inefficient for measuring knowledge of facts. Can measure understanding, thinking skills, and other
complex outcomes (especially useful where originality of response is desired). Appropriate for measuring the
ability to select and organize ideas, writing abilities, and problem-solving skills requiring originality.
Preparation of questions
Objective test: A relatively large number of questions is needed for a test. Preparation is difficult and
time-consuming.
Essay test: Only a few questions are needed for a test. Preparation is relatively easy (but more difficult than
generally assumed).
Sampling of course content
Objective test: Provides an extensive sampling of course content because of the large number of questions that
can be included in a test.
Essay test: Sampling of course content is usually limited because of the small number of questions that can be
included in a test.
Control of pupil’s response
Objective test: Complete structuring of the task limits the pupil to the type of response called for. Prevents
bluffing and avoids the influence of writing skill, though selection-type items are subject to guessing.
Essay test: Freedom to respond in own words enables bluffing and writing skill to influence the score, though
guessing is minimized.
Scoring
Objective test: Objective scoring that is quick, easy, and consistent.
Essay test: Subjective scoring that is slow, difficult, and inconsistent.
Influence on learning
Objective test: Usually encourages the pupil to develop a comprehensive knowledge of specific facts and the
ability to make fine discriminations among them. Can encourage the development of understanding, thinking
skills, and other complex outcomes if properly constructed.
Essay test: Encourages pupils to concentrate on larger units of subject matter, with special emphasis on the
ability to organize, integrate, and express ideas effectively. May encourage poor writing habits if time pressure
is a factor (it almost always is).
Reliability
Objective test: High reliability is possible and is typically obtained with well-constructed tests.
Essay test: Reliability is typically low, primarily because of inconsistent scoring.
Advantages of objective type of evaluation
A well-designed test can quickly evaluate a wide variety of skills and deal with a large amount of subject
matter
Objective-type questions present tasks for which the solutions are predetermined and presented among
pairs or groups of responses
Because objective-type questions don’t require students to write out the answers, writing skills,
including grammar, spelling, and neatness, don’t influence the grades students receive
The results of the test can be analyzed so that a teacher can look at the response patterns to individual
questions, groups of questions, and the test as a whole to determine the suitability and clarity of the questions
and to identify the topics, content, or skills which proved too demanding for the students
The marking of the answers is unbiased; a teacher’s opinion or preconception of a student’s work will not be a
factor
They avoid the discrepancies often found in the evaluation of essay-style answers
They are effective for testing large groups when time is limited
Disadvantages of objective type of evaluation:
Frequently, objective-type tests require only the recall of facts
Overemphasis on objective-type tests does not allow students to practice and demonstrate writing
skills
Sometimes objective type tests evaluate something other than what is intended
Objective-type tests often require a disproportionate amount of reading time
Objective-type questions are inappropriate for students in Grades 1 to 4
Great care must be taken in composing these questions and their expected responses, because a specific
predetermined answer is expected
Preparation is very time-consuming
Objective type questions often promote uneducated guessing
Many students may be able to select the correct answer without really understanding the response
They should be used only rarely to evaluate students’ language skills, because most objective-type questions
don’t measure language acquisition or development as effectively as other forms of evaluation outlined in
this document
Some types of objective-style questions fail to provide clues to the thinking processes of individual students
True-false or alternative-response items:
The true-false or alternative-response test item consists of a declarative statement that the pupil is asked
to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, agree or disagree, and
the like. In each case there are only two possible answers. The true-false item is essentially a two-response
multiple-choice item in which only one of the propositions (answers) is presented and the student judges the
truth or falsity of the statement by selecting one of the two possible answers. Some variations, however, deviate
considerably from the simple true-false pattern and have their own characteristics. For this reason, some prefer
the more general category, alternative-response item. We shall retain the more commonly used true-false
designation. This type of item is used for measuring simple knowledge outcomes when only two alternatives are
possible, or the ability to identify the correctness of statements of important fact. It is also adaptable to
measuring the ability to distinguish fact from opinion and the ability to recognize cause-and-effect relationships.
Types of true-false questions:
True and false
Cluster variety
Correction variety
Versatility of the true-false item:
Testing for factual knowledge
Testing for comprehension or understanding
Testing for application
Testing for deductive skill
Testing for problem-solving ability
Criticisms of true-false items:
They tend to test almost exclusively the ability to memorize rather than to apply more complex thinking
skills
The results may be unreliable, because an uninformed student with weak skills or a lack of knowledge has a
50/50 chance of guessing the correct response to each item
Absolutely true or false statements, with no qualifications or exceptions, are difficult to design. Students
know that most statements need to be qualified and may, therefore, suspect a trick or a trap
Limitations:
Susceptibility to guessing
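The effect of guessing can be made concrete with a little arithmetic. The following Python sketch is an illustration only and is not taken from the cited texts. It assumes that a pupil guesses blindly on every item, that each guess is independent, and that the pass mark is 60% of the items (a hypothetical figure). The function name chance_of_passing is likewise invented for this example.

```python
from math import comb


def chance_of_passing(num_items: int, pass_mark: int, p_guess: float = 0.5) -> float:
    """Probability that blind guessing yields at least pass_mark correct answers.

    Uses the binomial distribution: each item is an independent guess that is
    correct with probability p_guess (0.5 for a true-false item).
    """
    return sum(
        comb(num_items, k) * p_guess**k * (1 - p_guess) ** (num_items - k)
        for k in range(pass_mark, num_items + 1)
    )


# Hypothetical tests of 10, 20, and 50 true-false items, pass mark at 60%.
for n in (10, 20, 50):
    pass_mark = (6 * n) // 10  # integer arithmetic avoids rounding surprises
    print(n, "items:", round(chance_of_passing(n, pass_mark), 3))
```

With 10 items, the chance of reaching 6 correct by guessing alone is about 0.38. Under these assumptions the chance falls as items are added, which is one reason a longer true-false test is less affected by guessing than a short one.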
Guidelines for constructing true-false questions:
Word the statements simply and clearly. Vague or ambiguous wording may confuse the students
Avoid over-generalizing, because generalizations are seldom unqualifiedly true
Avoid trick questions. They promote mistrust and resentment
Don’t use trivial statements in order to “pad out” the number of questions and marks to arrive at a
predetermined total
Don’t take exact statements from texts or notes
Each statement should be entirely true or entirely false
Avoid universal descriptors such as “never, none, always, and all”
Avoid negative words, because they are often overlooked by students. Double negatives often increase
confusion and really test logic rather than knowledge
Don’t use long, complicated sentences or unnecessarily difficult words
Don’t include two ideas in one statement unless you are evaluating students’ understanding of a
cause-and-effect relationship
Provide a “T” and an “F” beside each statement and ask the students to circle the one they consider to be
correct
Students can be asked not only to indicate whether a statement is true or false, but also to correct a false
statement by changing words, phrases, or clauses in it
Suggestions for constructing true-false items:
Avoid broad general statements if they are to be judged true-false
Avoid trivial statements
Avoid the use of negative statements, especially double negatives
Avoid long, complex sentences
Avoid including two ideas in one statement, unless cause-effect relationships are being measured
If opinion is used, attribute it to some source, unless the ability to identify opinion is being specifically
measured
True statements and false statements should be approximately equal in length
The number of true and false statements should be approximately equal
Avoid ambiguous words and sentences
True-false items must be based on statements that are clearly true or false
Avoid trick questions
When the true-false item is used to test for cause-and-effect relationship, we strongly recommend that the
first proposition in the statement always be true, with the subordinate clause being written as either true or
false
Word the item so that superficial knowledge suggests a wrong answer
Make the wrong answer consistent with a popular misconception that is totally irrelevant to the item
Avoid specific determiners
Avoid making true statements consistently longer than false statements
For the correction type of true-false item, underline the word(s) to be corrected
Uses of the true-false items:
The use of true-false items is limited by the difficulty of constructing clue-free items that measure
significant learning outcomes, the susceptibility of this type to guessing, the low reliability of each item, and the
general lack of diagnostic value. They may well be restricted to those areas for which other types of items are
inappropriate. When used, special efforts must be made to formulate statements that are free from ambiguity,
specific determiners, and clues
The most common use is in measuring the ability to identify the correctness of statements of fact, definitions
of terms, statements of principles, and the like
In measuring the ability of pupils to distinguish fact from opinion, superstition from scientific belief,
relevant from irrelevant information, and valid from invalid conclusions
In measuring pupils’ ability to identify cause-and-effect relationships
In measuring knowledge and understanding
To measure simple aspects of logic
Checklist for reviewing true-false items
S. No. Review Questions Yes No
1. Is this the most appropriate type of item to use?
2. Has the text book language been avoided?
3. Can each statement be clearly judged as true or false?
4. Have the trivial statements been avoided?
5. Have negative statements (especially double negatives) been avoided?
6. Have the items been stated in simple, clear language?
7. Are opinion statements attributed to some source?
8. Are the true and false items approximately equal in length?
9. Are there an approximately equal number of true and false items?
10. Has a detectable pattern of answer (e.g., T, F, T, F) been avoided?
11. If revised, are the items still relevant to the intended learning outcome?
12. Have the items been set aside for a time before reviewing them?
13. Have the specific determiners (e.g., usually, always) been avoided?
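Some of the checklist questions above lend themselves to a mechanical first pass before a colleague reviews the items. The Python sketch below is only an illustration under stated assumptions: the function review_true_false, the list of determiners, and the tolerance used for "approximately equal" are all hypothetical choices, not rules from the cited texts. It looks at checklist items 9 (balance of true and false), 10 (detectable pattern), and 13 (specific determiners).

```python
import re

# Assumed word list: the text gives "never, none, always, all" and "usually"
# as examples of specific determiners; extend it as needed.
SPECIFIC_DETERMINERS = {"never", "none", "always", "all", "usually"}


def review_true_false(items):
    """Return warnings for a list of (statement, answer) pairs.

    answer is True or False. An empty list means nothing was flagged.
    """
    warnings = []
    key = [answer for _, answer in items]
    n_true = sum(key)
    n_false = len(key) - n_true

    # Checklist item 9: roughly equal numbers of true and false items.
    # The tolerance (one fifth of the test, at least 1) is an arbitrary assumption.
    if abs(n_true - n_false) > max(1, len(key) // 5):
        warnings.append(f"Unbalanced key: {n_true} true vs {n_false} false")

    # Checklist item 10: a strictly alternating key (T, F, T, F ...) is a pattern.
    if len(key) >= 4 and all(a != b for a, b in zip(key, key[1:])):
        warnings.append("Answer key alternates strictly between true and false")

    # Checklist item 13: flag statements containing specific determiners.
    for number, (statement, _) in enumerate(items, start=1):
        words = set(re.findall(r"[a-z]+", statement.lower()))
        found = sorted(words & SPECIFIC_DETERMINERS)
        if found:
            warnings.append(
                f"Item {number} uses specific determiner(s): {', '.join(found)}"
            )
    return warnings


# Hypothetical four-item test used only to exercise the checks.
sample = [
    ("All matching exercises use two columns.", False),
    ("A true-false item has two possible answers.", True),
    ("Essay tests are scored objectively.", False),
    ("The stem of a multiple-choice item states the problem.", True),
]
for warning in review_true_false(sample):
    print(warning)
```

For the sample above the sketch flags the alternating key and the word "all" in item 1. A check of this kind cannot judge ambiguity, triviality, or relevance to the learning outcomes, so it supplements the checklist rather than replacing the independent review.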
Advantages of true-false items:
There is an apparent ease of construction
A greater breadth of course material can be tested and scored in a given time
period than with other types of objective items. A student can answer about three true-false items for every two
multiple-choice items
Skills that call for a two-way judgment, such as distinguishing between
fact and opinion and identifying cause-and-effect relationships, lend themselves to true-false questions
A perceived advantage is that the use of this type of question is a way of modifying and differentiating
evaluation for some students
True-false questions can be scored quickly and objectively
True-false items are good for young children and / or pupils who are poor readers
True-false questions provide high reliability per unit of testing time
True-false questions can be scored quickly, reliably, and objectively by clerks
They are suitable for testing beliefs in popular misconceptions and superstitions
True-false questions are adaptable to most content areas
True-false questions can measure the higher mental processes of understanding, application, and
interpretation
Disadvantages of true-false items:
Pupils’ scores on true-false tests may be influenced by good or bad luck in guessing
They lend themselves most easily to cheating
They tend to be less discriminating, item for item, than multiple-choice tests
They are susceptible to an acquiescence set: subjects tend to develop a pattern of responding in a
somewhat automatic form without really giving thought to the item
There are many instances when statements are not unequivocally true or false; rather there are degrees of
correctness
Specific determiners are more prevalent in true-false items than in any other type of objective item
The apparent ease of construction is based on the frequent practice of lifting statements from the text and
changing some of them to false statements. The correct statements, and those that have been made incorrect
statements, then become the test items. Such a practice can cause ambiguity and promotes guessing
Large areas in all subjects can’t be phrased in absolutely true or false statements. In most cases only the
most trivial statements can be reduced to absolute terms
For the most part, this type of question tests only simple knowledge recall. Even distinguishing between fact
and opinion, and identifying cause-and-effect relationships, can be measured more effectively by other means
They create problems for students, for example:
The test creates a more stressful atmosphere for already insecure students because it concentrates on
very specific items
The way the items are worded makes the test more a reading test than a test of knowledge
They don’t provide diagnostic information and may be answered correctly on the basis of
misinformation or answered incorrectly as a result of the student’s misreading or misinterpretation
This form of questioning promotes random guessing
Matching Exercises:
In its traditional form, the matching exercise consists of two parallel columns. Matching questions are made up
of a list of words, phrases, statements, symbols, or numbers which are to be matched to another list. Items
which the student is asked to match usually appear in the left-hand column and are called premises. Items from
which the selection is to be made usually appear in the right-hand column and are called responses.
Uses of matching Exercises:
The typical matching exercise is limited to measuring factual information based on simple associations
In measuring the ability to identify the relationship between two things
The matching exercise has also been used with pictorial materials in relating pictures and words and to
identify positions on maps, charts, and diagrams
Limitations:
It is difficult to find homogeneous material that is significant from the viewpoint of our objectives and
learning outcomes
It is restricted to the measurement of factual information based on rote learning, and it is highly susceptible
to the presence of irrelevant clues
Constructing good matching items requires a high degree of skill
Suggestions for constructing matching items:
Use only homogeneous material in a matching exercise
Include an unequal number of responses and premises, and instruct the pupil that responses may be used
once, more than once, or not at all
Keep the list of items to be matched brief, and place the shorter responses on the right
Arrange the list of responses in logical order
Indicate in the directions the basis for matching the responses and premises
Place all the items for one matching exercise on the same page
The responses list should consist of short phrases, single words, or numbers
Each matching exercise should consist of homogeneous items
Keep each list relatively short
Avoid having an equal number of premises and responses
Arrange the answers in some systematic fashion
Avoid giving extraneous irrelevant clues
Explain clearly the basis on which the match is to be made
Advantages of matching questions:
• Matching questions can evaluate a large amount of related factual material quickly due to their compact
form
• These questions seem to be easy to construct
• These questions are easy to score
• Because they contain the correct answers, they assist the student in answering correctly
• They require relatively little reading time for the solution of many questions
• Like other objective types, they are amenable to machine scoring. Even when they are hand-scored, they can be
scored more easily than essay and short-answer questions
• Their apparent ease of construction is a mixed blessing: it leads to the excessive use of matching exercises
and a corresponding overemphasis on the memorization of simple relationships
Disadvantages of matching questions:
• The matching lists may encourage serial memorization rather than association, if sufficient care is not taken
• It is difficult to get clusters of questions that are sufficiently alike that a common set of responses can be
used
• They are usually limited to the evaluation of factual recall
• They often require more reading, organizing, and thinking skills than assumed and may not be appropriate
for students experiencing difficulties
• The format of these questions can be confusing to students
• If matching questions have too many items, they may provide more of an exercise in searching than an
evaluation of knowledge
• They emphasize rote memorization rather than thinking
• It is difficult to develop a list of items in which all the choices are plausible matches but only one is
correct
• It is difficult to find enough significant and homogeneous material to construct a suitable question
Guidelines for constructing matching questions:
• Provide clear instructions on how to indicate the correct answers
• Indicate whether the same response may be used more than once
• Maintain a grammatical consistency within and between columns
• Ensure that any matching question appears entirely on one page
• Provide an unequal number of premises and responses. In general the number of responses will be greater
than the number of premises
• Avoid designing questions in which the students are asked to draw connecting lines or arrows from the
premise to the response
• Make sure that the lists are homogeneous
• Make the wording of the premises longer than the wording of the responses, so that students read the
longer premise first and then scan the shorter responses quickly
• When using two lists, it is helpful to identify the items in one list with numbers and those in the second list
with letters
• Consider designing three-column matching questions in order to encourage higher levels of thinking. This
type of question is best suited for students in grades 11 and 12
Checklist for reviewing matching items
S. No. Review Questions Yes No
1. Is this the most appropriate type of item to use?
2. Is the material in the two lists homogeneous?
3. Is the list of responses longer or shorter than the list of premises?
4. Are the responses brief and on the right-hand side?
5. Have the responses been placed in alphabetical or numerical order?
6. Do the directions indicate the basis for matching?
7. Do the directions indicate that each response may be used more than
once for matching?
8. Is all of each matching item on the same page?
9. Has the text book language been avoided?
10. Can each statement be clearly judged as a matching item?
11. Have the trivial statements been avoided?
12. Have the items been stated in simple, clear language?
13. Is there an unequal number of premises and responses?
14. Has a detectable pattern of answer (e.g., 1=A, 2=B,3=C) been avoided?
15. If revised, are the items still relevant to the intended learning outcomes?
16. Have the items been set aside for a time before reviewing them?
17. Can each item be answered in a word, a phrase, with a symbol, formula,
or short sentence?
18. Are all irrelevant clues avoided? Grammatical? Length of blank?
19. Do computational problems indicate the degree of precision required?
20. Do the blanks occur near the end of the sentence?
21. Have only key words been omitted?
22. Was excessive mutilation kept to a minimum?
23. Are the items technically correct?
24. Has the scoring key been prepared?
25. Is this format most efficient for testing the instructional objectives?
26. Are both lists between 5 and 12 entries?
Multiple-choice questions:
A multiple-choice item consists of a problem and a list of suggested solutions. The problem may be
stated as a direct question or an incomplete statement and is called the stem of the item. The list of suggested
solutions may include words, numbers, and symbols; these are called alternatives (also called choices or options).
The pupil is typically requested to read the stem and the list of alternatives and to select the one correct, or best,
alternative. The correct alternative in each item is called the answer, and the remaining alternatives are called
distracters. These incorrect alternatives receive their name from their intended function: to distract those pupils
who are in doubt about the correct answer.
The multiple-choice item is generally recognized as the most widely applicable and useful type of
objective test item. It can more effectively measure many of the simple learning outcomes measured by the
short-answer item, the true-false item, and the matching exercise. In addition, it can measure a variety of the more
complex outcomes in the knowledge, understanding, and application areas. This flexibility, plus the higher quality
items usually found in the multiple-choice form, has led to its extensive use in achievement testing.
Types of multiple-choice questions:
Direct question form / one correct answer
Incomplete statement form
Best answer form / reverse multiple-choice type
Analogy
Uses of multiple-choice questions:
Measuring knowledge outcomes:
The knowledge of terminology
The knowledge of specific facts
The knowledge of principles
The knowledge of methods and procedures
Measuring outcomes at the understanding and application levels:
Ability to identify applications of facts and principles
Ability to interpret cause-and-effect relationships
Ability to justify methods and procedures
Suggestions for constructing multiple-choice questions:
The stem of the item should be meaningful and should present a definite problem
Use a negatively stated item stem only when significant learning outcomes require it
All of the alternatives should be grammatically consistent with the stem of the item
An item should contain only one correct or clearly best answer
Items used to measure understanding should contain some novelty
All distracters should be plausible
Verbal associations between the stem and the answer should be avoided
The relative length of the alternatives should not provide a clue to the answer
Use sparingly special alternatives such as “none of the above” or “all of the above”
Don’t use multiple-choice items when other item types are more appropriate
Break any of these rules when you have a good reason for doing so
How to write multiple-choice items:
The essence of the problem should be in the stem
Avoid repetition of words in the options
Avoid superfluous wording
When the incomplete statement format is used, the options should come at the end of the statement
Arrange the alternatives as simply as possible
Avoid highly technical distracters
Avoid using true-false distracters in them, as it will affect the test’s reliability
Avoid making the correct answer consistently longer than the incorrect ones
Avoid giving irrelevant clues to the correct answer
Consider providing an “I don’t know” option
The number of distracters to be used should be governed by such factors as the age of the children and the
nature of the material
Limitations:
Like other selection-type paper-and-pencil tests, the multiple-choice item measures problem-solving behavior at
the verbal level only. Because it requires selection of the correct answer, it is inappropriate for measuring
learning outcomes requiring the ability to recall, organize, or present ideas.
Checklist for reviewing multiple-choice items
S. No. Review Questions Yes No
1. Is this the most appropriate type of item to use?
2. Does each item stem present a meaningful problem?
3. Are the item stems free of irrelevant material?
4. Are the item stems stated in positive terms (if possible)?
5. If used, has negative wording been given special emphasis (e.g., capitalized)?
6. Are the alternatives grammatically consistent with the item stem?
7. Are the alternative answers brief and free of unnecessary words?
8. Are the alternatives similar in length and form?
9. Is there only one correct or clearly best answer?
10. Are the distracters plausible to non-achievers?
11. Are the items free of verbal clues to the answer?
12. Are verbal alternatives in alphabetical order?
13. Are numerical alternatives in numerical order?
14. Have “none of the above” and “all of the above” been avoided (or used
sparingly and appropriately)?
15. If revised, are the items still relevant to the intended learning outcomes?
16. Have the items been set aside for a time before reviewing them?
17. Do the directions indicate the basis for answering the multiple-choice items?
18. Has the text book language been avoided?
19. Have the trivial statements been avoided?
20. Have the items been stated in simple, clear language?
21. Has a detectable pattern of answer been avoided?
Advantages of multiple-choice items:
• Possibly the outstanding advantage is their versatility
• They can be scored quickly and accurately by machines, clerks, teacher aides, and even students themselves
• The degree of difficulty of the test can be controlled by changing the degree of homogeneity of the
responses
• Compared to true-false items, multiple-choice questions have a relatively small susceptibility to score
variations due to guessing, because the probability of guessing a correct answer depends upon the number
of options
• They can provide the teacher with valuable diagnostic information, especially if all the responses vary only
in their degree of correctness
• They are easier to respond to and are better liked by students than true-false items
• The multiple-choice item is relatively free of response sets, that is, the tendency for pupils to give a
different answer when the same content is presented in a different form
• They have a directness which some other types of questions lack
• The form of multiple-choice questions, in which the question is followed by a number of responses, may
help the students to determine the appropriate response
• They can be analyzed and marked quickly and easily
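The point above about guessing and the number of options can be illustrated with a short calculation. This Python sketch is an assumption-laden illustration, not a method from the cited texts: it supposes a pupil guesses blindly on every item and that every option is equally likely to be chosen. The function name chance_score and the 40-item test are hypothetical.

```python
from fractions import Fraction


def chance_score(num_items: int, num_options: int) -> Fraction:
    """Expected number of items answered correctly by blind guessing alone.

    Assumes each of the num_options alternatives is equally likely to be
    picked, so the chance of a correct guess on one item is 1 / num_options.
    """
    return Fraction(num_items, num_options)


# Hypothetical 40-item test with 2 (true-false), 3, 4, and 5 options per item.
for options in (2, 3, 4, 5):
    expected = chance_score(40, options)
    print(f"{options} options: about {float(expected):.1f} of 40 correct by chance")
```

Under these assumptions a pupil who guesses throughout would expect about 20 of 40 correct on true-false items but only about 8 of 40 on five-option items. In practice distracters are rarely equally attractive, so the real chance score can differ.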
Disadvantages of multiple-choice items:
• They are very difficult to construct
• There is a tendency for teachers to write multiple-choice items demanding only factual recall
• Of all the selection-type objective items, the multiple-choice item requires the most time for the students to
respond, especially when very fine discriminations have to be made
• Research has shown that test-wise students perform better on multiple-choice items than do non-test-wise
students, and that multiple-choice tests favor the high-risk-taking student (Rowley, 1974)
• It takes considerable time to construct good questions, especially those that test higher levels of thinking
• They can test a student’s reading ability more than any other skill
• Wording multiple-choice questions clearly is a demanding task
Guidelines for constructing multiple-choice questions:
• Put as much of the pertinent information in the stem as possible in order to clarify the problem and reduce
the reading time for the choices
• When marking, always be prepared for unforeseen but valid student interpretations of questions and
responses
• Avoid repetition of key words in the correct response
• Avoid giving away the correct response by building too obvious a connection between the stem and the
correct response
• Multiple-choice questions usually include either four or five choices
• Don’t use “all of the above” or “none of the above” as throw-away distracters, only to get a fourth or fifth
possible choice
• Before administering the test to students, try out questions on an informed colleague to ascertain if they are
clear, accurate and fair
• By composing a few questions after each lesson or two, teachers can reduce the time spent putting the test
together when it is required, and can facilitate the construction of good questions because the material is
fresh
• Avoid designing questions containing complicated choices within choices
• Don’t give a series of responses in which two or more choices are correct
• Don’t use distracters which are obviously wrong or frivolous
• Don’t include responses which are not parallel in grammatical structure
• Don’t vary appreciably the length of the responses
• Don’t base a question merely on a trivial piece of information
• Don’t use a correct but incidental detail as a required response
• Don’t use a negative in the stem; if one is used, emphasize it with capital letters
• Do use a complete and positive statement as the stem
• Do provide one response which is absolutely correct
• Do provide responses which are similar in grammatical structure and in length
Short-answer items:
The short-answer item and the completion item are both supply-type test items that can be answered by
a word, phrase, number, or symbol. They are essentially the same, differing only in the method of presenting the
problem. The short-answer item uses a direct question, whereas the completion item consists of an incomplete
statement. More complex interpretations can be made when the short-answer item is used to measure the ability
to interpret diagrams, charts, graphs, and pictorial data. To obtain the correct answer, pupils must actually solve
problems, manipulate mathematical symbols, and complete and balance equations.
Uses of short answer items:
When short-answer items are used, the question must be stated clearly and concisely, be free from irrelevant
clues, and require an answer that is both brief and definite. Common uses are:
• Knowledge of terminology
• Knowledge of specific facts
• Knowledge of principles
• Knowledge of method or procedure
• Knowledge of simple interpretation of data
• The ability to solve a problem
Suggestions for constructing short answer items.
• Word the item so that the required answer is both brief and specific
• Don’t take statements directly from textbooks to use as a basis for short answer items
• A direct question is generally more desirable than an incomplete statement
• If the answer is to be expressed in numerical units, indicate the type of answer wanted
• Blanks for answers should be equal in length and placed in a column to the right of the question
• When completion items are used, do not include too many blanks, and omit only important words
• For computational problems, the teacher should specify the degree of precision and the units of expression
expected in the answer
• Use a direct question in which the term is given and a definition is asked for, to test the knowledge of
definitions and comprehension of technical terms
• Don’t skimp on the answer space provided
• Avoid giving irrelevant clues
Checklist for reviewing short answer items
S. No. Review Questions Yes No
1. Is this the most appropriate type of item to use for the intended learning
outcomes?
2. Has the text book language been avoided?
3. Can the items be answered with a number, symbol, word, or brief phrase?
4. Are the answer blanks equal in length?
5. Have the items been stated so that only one response is correct?
6. Are the answer blanks at the end of the items?
7. Are the items free of clues (such as a or an)?
8. Have the units been indicated when numerical answers are expected?
9. Has the degree of precision been indicated for numerical answers?
10. Have the items been phrased so as to minimize spelling errors?
11. If revised, are the items still relevant to the intended learning outcomes?
12. Have the items been set aside for a time before reviewing them?
Checklist for reviewing short answer (supply-type) items
S. No. Review Questions Yes No
1. Can each item be answered in a word, a phrase, with a symbol, formula,
or short sentence?
2. Do the items avoid the use of verbatim textbook language?
3. Is each item specific, clear, and unambiguous?
4. Are all irrelevant clues avoided? Grammatical? Length of the blank?
5. Do computational problems indicate the degree of precision required?
6. Do the blanks occur near the end of the sentence?
7. Have only key words been omitted?
8. Was excessive mutilation kept to a minimum?
9. Have direct questions been used where feasible?
10. Are the items technically correct?
11. Is there one correct or agreed-upon correct answer?
12. Has a scoring key been prepared?
13. Has the test been reviewed independently?
14. Is this format most efficient for testing the instructional objectives?
References:
• William A. Mehrens. (1994). Measurement and Evaluation in Education. 3rd Edition. The Dryden Press.
USA.
• Norman E. Gronlund. (1985). Measurement and Evaluation in Teaching. 5th Edition. Macmillan Publishing
Company. USA.
• Lois White. (2001). Foundations of Nursing. Delmar. USA.
• Nursing and Midwifery Council (NMC). (2002). Guidelines for Evaluation. NMC, London.
• Ellen Thomas E. (1994). Methods of Evaluation in Nursing. Lippincott. USA.
• http://www.aft.org/topics/nclb/MN.htm
• http://www.ericdigests.org/1996-3/portfolios.htm
• http://www.ericdigests.org/1998-1/norm.htm
• http://education.state.mn.us/html/intro_acad_prof_grad.htm
Objective Test vs. Essay Test
Learning outcomes measured
• Objective test: Efficient for measuring knowledge of facts. Some types (e.g., multiple choice) can also
measure understanding, thinking skills, and other complex outcomes. Inefficient or inappropriate for
measuring the ability to select and organize ideas, writing abilities, and some types of problem-solving skills.
• Essay test: Inefficient for measuring knowledge of facts. Can measure understanding, thinking skills, and
other complex outcomes (especially useful where originality of response is desired). Appropriate for
measuring the ability to select and organize ideas, writing abilities, and problem-solving skills requiring
originality.
Preparation of questions
• Objective test: A relatively large number of questions is needed for a test. Preparation is difficult and
time-consuming.
• Essay test: Only a few questions are needed for a test. Preparation is relatively easy (but more difficult
than generally assumed).
Sampling of course content
• Objective test: Provides an extensive sampling of course content because of the large number of questions
that can be included in a test.
• Essay test: Sampling of course content is usually limited because of the small number of questions that can
be included in a test.
Control of pupil's response
• Objective test: Complete structuring of the task limits the pupil to the type of response called for. Prevents
bluffing and avoids the influence of writing skill, though selection-type items are subject to guessing.
• Essay test: Freedom to respond in one's own words enables bluffing and writing skill to influence the score,
though guessing is minimized.
Scoring
• Objective test: Objective scoring that is quick, easy, and consistent.
• Essay test: Subjective scoring that is slow, difficult, and inconsistent.
Influence on learning
• Objective test: Usually encourages the pupil to develop a comprehensive knowledge of specific facts and the
ability to make fine discriminations among them. Can encourage the development of understanding, thinking
skills, and other complex outcomes if properly constructed.
• Essay test: Encourages pupils to concentrate on larger units of subject matter, with special emphasis on the
ability to organize, integrate, and express ideas effectively. May encourage poor writing habits if time
pressure is a factor (it almost always is).
Reliability
• Objective test: High reliability is possible and is typically obtained with well-constructed tests.
• Essay test: Reliability is typically low, primarily because of inconsistent scoring.