Assessment and Evaluation in Nursing Education System

Objective # 08

Identify Different Methods of Evaluation during Teaching

Liaquat University of Medical and Health Sciences

Jamshoro Sindh

College of Nursing JPMC Karachi

Senior Elective Practicum in Nursing Education


Mahmood Ahmed

Preceptor: Associate Professor Dr. M. Iqbal Afridi

Assessment

Assessment is useful only if it contributes to sound educational decision-making. To use assessment data effectively, beginning teachers must have knowledge and skills in measurement fundamentals, standardized tests and their interpretation, validity and reliability, constructing and using the results of formal and informal assessments, and using assessment information for grading. Additionally, educators in Minnesota must be knowledgeable about the Minnesota Academic Standards and how to measure student progress toward meeting them. The paragraphs that follow describe the knowledge and skills we believe our students must obtain in order to make effective assessment decisions.

Measurement Fundamentals: Educators must understand the basic terminology and concepts related to assessment, evaluation, and measurement. The following concepts and definitions are of central importance to this understanding:

Assessment vs. Evaluation: Though the terms assessment and evaluation are often used interchangeably (Cooper, 1999), the Minnesota Department of Education differentiates between them. It defines assessment as gathering information or evidence, while evaluation involves using that information or evidence to make judgments (Aune, 1999).

Teacher-Made vs. Standardized Assessments: In the broadest sense, assessments may be classified into two categories: teacher-made assessments and standardized assessments. Teacher-made assessments are constructed by an individual teacher or a group of teachers in order to measure the outcome of classroom instruction. Standardized assessments, on the other hand, are commercially prepared and have uniform procedures for administration and scoring. They are meant for gathering information on large groups of students in multiple settings (Karmel and Karmel, 1978).

Criterion-Referenced vs. Norm-Referenced Assessment: Standardized assessments may be norm-referenced or criterion-referenced. Norm-referenced assessments compare individual students’ scores to those of a norm-reference group, generally students of the same grade or age. They are designed to demonstrate "differences between and among students to produce a dependable rank order" (Bond, 1996) and are often used to classify students for ability grouping or to help identify them for placement in special programs. They also provide information to report to parents. Criterion-referenced tests, on the other hand, determine the specific knowledge and skills possessed by a student. Thus, this form of testing "uses as its interpretive frame of reference a specified content domain, rather than a specified population of persons" (Anastasi, 1976). Competency tests, such as the Minnesota Basic Skills Tests, represent a specific sub-category of criterion-referenced assessment and are used to ensure that students possess minimal basic skills (Biehler and Snowman, 1997).

Formative vs. Summative Evaluation: Formative evaluation involves "collecting, synthesizing, and interpreting data for the purpose of improving learning or teaching" (Airasian, 1997, p. 402). Thus, formative assessment is used to provide feedback rather than for grading. It typically occurs while instruction is ongoing. Summative evaluation, on the other hand, involves "collecting, synthesizing, and interpreting information for the purpose of determining pupil learning and assigning grades" (Airasian, 1997, p. 404). It typically occurs at the end of instruction.

Types of Standardized Tests: A description of all forms of standardized tests is far beyond the scope of this document. Therefore, only the standardized tests most central to educational decision-making, namely individual and group tests designed to measure intelligence and academic achievement, are included in this brief summary:

Intelligence Tests: Intelligence tests are often classified into two categories: individual and group. Individual intelligence tests, such as the Stanford-Binet and the Wechsler Scales, are given in a one-to-one setting by a trained examiner. They are most frequently given as part of an overall psychological evaluation, often to determine whether a student is eligible for special education. Though group intelligence tests, such as the Otis-Lennon Mental Ability Tests, are "no longer administered to all students everywhere" (LeFrancois, 1999), some school districts still include them in their annual testing programs (Biehler and Snowman, 1997). Results of intelligence tests, whether group or individual, are often reported as intelligence quotients, or IQ scores. Though IQ scores used to be determined by calculating a ratio (mental age divided by chronological age, multiplied by 100; hence the term intelligence "quotient"), they are now typically calculated as standard scores with a mean of 100 and a standard deviation of about 15. Though the results of individual intelligence tests tend to be more valid and reliable than those of group tests, teachers must be aware of the following limitations of intelligence tests noted by Biehler and Snowman (1997):

They generally sample only abilities that relate to classroom achievement rather than overall intellectual functioning. Therefore, many educators prefer the term scholastic aptitude test.

Intelligence test results provide only an estimate of a child’s ability to deal with "certain kinds of problems at a particular point in time." Because of this, their results can vary over multiple administrations.

The tests may not provide a valid estimate of the abilities of minority and low-income children. Therefore, teachers must exercise caution when making educational decisions based on the results of these tests.
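To make the difference between the older ratio IQ and the modern standard-score (deviation) IQ concrete, the two calculations can be sketched in Python. The function names, the 15-point standard deviation default, and the norm-group mean and standard deviation in the example are illustrative assumptions; real tests derive these values from their own large norm samples.

```python
def ratio_iq(mental_age, chronological_age):
    """Historical ratio IQ: mental age divided by chronological age, times 100."""
    return mental_age * 100 / chronological_age


def deviation_iq(raw_score, norm_mean, norm_sd, iq_sd=15):
    """Deviation IQ: a standard score with a mean of 100 and an SD of iq_sd.

    norm_mean and norm_sd are the mean and standard deviation of raw scores
    in the child's age group (hypothetical values are used below).
    """
    z = (raw_score - norm_mean) / norm_sd  # distance from the mean in SD units
    return 100 + iq_sd * z


# A 10-year-old who performs like a typical 12-year-old:
print(ratio_iq(12, 10))         # 120.0
# A raw score one standard deviation above the age-group mean:
print(deviation_iq(58, 50, 8))  # 115.0
```

Both functions give 100 for an exactly average child; the deviation form is preferred because its spread stays the same across age groups.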

Traditional Tests of Academic Achievement: As is the case for intelligence tests, individual achievement tests, such as the Wide Range Achievement Test and the Peabody Individual Achievement Test, are most commonly administered to students who have been referred for possible placement in special education or remedial programs. Group achievement tests, such as the California Test of Basic Skills, on the other hand, are administered either annually or at planned intervals as part of the district-wide testing program to certify students’ achievement and provide information to parents (LeFrancois, 1999).

Assessment Data Interpretation: To understand and appropriately use the results of standardized assessments, educators must have a working knowledge of descriptive statistics, including measures of central tendency, measures of dispersion, norms, and standard scores.

Measures of central tendency include the mean, median, and mode. The mean is the arithmetic average and is obtained by adding all scores and dividing by the total number of scores. It is especially important because it is a necessary statistic for the calculation of standard scores. The median is the middle score of a distribution. In cases where there are extreme scores (outliers), the median may be a better measure of central tendency than the mean. The mode is the most frequent score in a distribution. In large, normally distributed populations, the mode does not differ greatly from the mean and median. However, in distributions with few scores, it is typically the least useful of these three statistics (Glass and Stanley, 1970).
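The effect of an extreme score on the mean and the median can be checked with Python's standard `statistics` module. The quiz scores are hypothetical; `multimode` is used because it returns every most-frequent score rather than just one.

```python
from statistics import mean, median, multimode

# Hypothetical quiz scores (out of 20) for a small class.
scores = [14, 15, 16, 16, 17, 18]
with_outlier = scores + [2]  # one student scored far below everyone else

print(mean(scores), median(scores), multimode(scores))
print(mean(with_outlier), median(with_outlier), multimode(with_outlier))
# The single outlier pulls the mean down from 16 to 14,
# while the median and the mode both stay at 16.
```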

Measures of dispersion include the range and the standard deviation. The range is the spread of scores in a distribution (Vogt, 1993). It may be calculated by subtracting the lowest score from the highest and adding one. It is, at best, a rather crude statistic because it is based on only the two most extreme scores in the distribution. The standard deviation is a more precise and useful measure of dispersion. It is calculated as the square root of the average of the squared distances of the scores from the mean. Calculating the standard deviation is essential for determining standard scores such as those described below (Anastasi, 1988).
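The two measures of dispersion can be written out directly. This sketch follows the definitions in the paragraph above: the range adds one to the difference between the highest and lowest whole-number scores (some texts omit the added one), and the standard deviation is the square root of the average squared deviation from the mean. The score list and function names are hypothetical.

```python
import math


def inclusive_range(scores):
    """Highest score minus lowest score, plus one (whole-number scores)."""
    return max(scores) - min(scores) + 1


def standard_deviation(scores):
    """Square root of the average squared deviation from the mean."""
    m = sum(scores) / len(scores)
    return math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))


scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(inclusive_range(scores))     # 8
print(standard_deviation(scores))  # 2.0
```

The range uses only the scores 2 and 9, whereas every one of the eight scores contributes to the standard deviation.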

Norms: The results of standardized tests are often reported as norms, which are statistics that allow one to interpret the score of an individual student in comparison to others of the same age or grade level. They include percentiles (the percentage of scores in the norm-reference group that fall at or below that of a particular student), grade equivalents (which describe a student’s performance in terms of school grade levels), and standard scores such as z-scores, T-scores, stanines, and IQ scores (all of which indicate how far a student’s score varies from the mean in standard deviation units). It is important for teachers to understand the meaning of norms so that they can interpret standardized test results for parents and use them to plan instruction effectively.
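The norm-referenced scores listed above can all be computed from a norm group's mean and standard deviation. This is a minimal sketch under stated assumptions: the eight-score norm group is hypothetical (real norm groups are large), the stanine uses the common approximation of 5 + 2z rounded and limited to 1 through 9, and the IQ-style score assumes a standard deviation of 15.

```python
import math
from statistics import mean, pstdev


def percentile_rank(score, norm_scores):
    """Percentage of norm-group scores that fall at or below `score`."""
    at_or_below = sum(1 for s in norm_scores if s <= score)
    return 100 * at_or_below / len(norm_scores)


def standard_scores(score, norm_scores):
    """Express a raw score on several standard-score scales."""
    z = (score - mean(norm_scores)) / pstdev(norm_scores)  # SD units from the mean
    return {
        "z": z,
        "T": 50 + 10 * z,                                         # mean 50, SD 10
        "stanine": min(9, max(1, math.floor(5 + 2 * z + 0.5))),  # mean 5, SD 2, 1..9
        "IQ": 100 + 15 * z,                                       # mean 100, SD 15
    }


norm_group = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical norm-group raw scores
print(percentile_rank(7, norm_group))  # 87.5
print(standard_scores(7, norm_group))  # {'z': 1.0, 'T': 60.0, 'stanine': 7, 'IQ': 115.0}
```

Because every scale is a linear transformation of z, a student one standard deviation above the mean lands at T = 60, stanine 7, and IQ 115 at the same time.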

Validity and Reliability: In order to make effective decisions based on assessment data, the instruments used to collect that data must be valid and reliable.

Validity is defined as "the extent to which a test measures what it intends to measure" (LeFrancois, 1999). The validity of a test may be determined in different ways, including carefully examining the test’s content with regard to the curriculum (content validity), determining the degree to which test results reflect a theory or construct (construct validity), and examining the extent to which assessment results can be used to predict student performance (predictive validity). Of the various types of validity, content validity is by far the most important for classroom assessments, as it is essential that the content of teacher-made assessments accurately reflects the instructional outcomes being assessed.

Reliability refers to the consistency or stability of scores yielded by a test (Airasian, 1997). In other words, a reliable test yields consistent scores over repeated administrations (test-retest, or repeated-measures, reliability). A concrete estimate of a test’s reliability is provided by the standard error of measurement, which is an index of the amount of error, or unreliability, in the scores yielded by a test. Though reliability and standard error of measurement are important considerations in choosing standardized tests, it is often difficult to determine the reliability and standard error of a teacher-made test. However, teachers should be knowledgeable about ways to increase the reliability of classroom assessments, such as those described by Airasian (1997).
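One common way to estimate test-retest reliability is to correlate students' scores from two administrations of the same test; the standard error of measurement then follows from the classical test theory formula SEM = SD x sqrt(1 - r). The sketch below assumes hypothetical scores for five students and uses a hand-written Pearson correlation so that it runs on any Python 3 version.

```python
import math
from statistics import pstdev


def pearson_r(x, y):
    """Pearson correlation between two lists of paired scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def standard_error_of_measurement(sd, reliability):
    """Classical test theory: SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)


# Hypothetical scores of the same five students on two administrations.
first = [12, 15, 11, 18, 14]
second = [13, 15, 10, 17, 15]

r = pearson_r(first, second)  # test-retest reliability estimate
sem = standard_error_of_measurement(pstdev(first), r)
print(round(r, 2), round(sem, 2))  # 0.93 0.64
```

The higher the reliability, the smaller the SEM; a perfectly reliable test (r = 1) would have an SEM of zero.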

Constructing Formal Assessments: Formal, teacher-made assessments can be classified as traditional or alternative. Traditional assessments are typically paper-and-pencil tests and are often categorized as objective or essay. Alternative assessments are most often performance-based. The type of assessment one chooses depends on many factors, including grade level, content, and time constraints. Regardless of the assessment format, good assessments have three features in common:

The assessment exercises and questions are related to the teacher’s objectives and instruction

The exercises and questions cover a representative sample of what students were taught

The items, directions, and scoring procedures are clear and appropriate (Airasian, 1997)

Additionally, as noted by Aune (2000), effective teachers typically employ multiple assessments, including both traditional and alternative assessments.

Traditional Assessments: Objective assessments are traditional assessments on which students are expected to provide the one correct answer. Typical objective assessment formats include multiple choice, true-false, matching, completion, and short answer. All of these formats have the advantage of sampling students’ knowledge of a wide range of content (and, therefore, have the potential for good content validity) in a minimal amount of time. A major disadvantage of objective test items is that they typically measure student learning only at the knowledge and comprehension levels of Bloom’s Taxonomy. However, multiple choice items can be constructed to sample higher cognitive levels (Airasian, 1997). To construct effective objective assessments, teachers must be familiar with and apply guidelines for effective test construction, such as those described by Nitko (1996), Oosterhof (1999), and Airasian (1997).

The essay test is another form of traditional, paper-and-pencil assessment. It is an excellent format for assessing students’ abilities to communicate ideas in writing. Other advantages of essay tests include measuring higher cognitive levels and directly measuring behaviors specified by performance objectives (Oosterhof, 1999). However, essay tests have disadvantages as well: they sample less content than objective tests and, therefore, may have poorer content validity. Additionally, essay tests are time-consuming to score, and their scoring tends to be less reliable (Oosterhof, 1999). Because of these potential disadvantages, it is especially important for teachers to follow appropriate guidelines for constructing and scoring essay tests, such as those described by Nitko (1996), Oosterhof (1999), and Airasian (1997).

Alternative Assessments: The most common form of alternative assessment is performance assessment, which can be defined as an assessment activity requiring students to use their knowledge and skills to perform complex tasks or solve problems (Biehler and Snowman, 1997). Though the term authentic assessment is often used interchangeably with performance assessment, Oosterhof (1999) differentiates between the two, defining authentic assessments as tasks that require "a real application of a life skill beyond the instructional context." Therefore, according to Oosterhof, "All authentic assessments are performance assessments, but the inverse is not true." A major advantage of performance assessments is that they allow evaluation of skills that cannot be easily measured by paper-and-pencil tests. Additionally, they allow for evaluation of the process as well as the product. They are, however, time-consuming to administer and, therefore, typically do not allow one to sample a wide range of outcomes. Consistency of scoring (reliability) is also problematic (Oosterhof, 1999). Like essay tests, performance assessments can be scored analytically or holistically. Analytical scoring involves breaking the desired response into its component parts and using checklists or rating scales to calculate a score (Oosterhof, 1999). Holistic scoring is typically done using a scoring rubric, which allows for comparisons of students’ work to "descriptions of performances that range from higher to lower." As Wiggins has noted, "scoring rubrics must be based on a careful analysis of existing performances of varying quality."
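Analytical scoring as described above can be sketched as summing the ratings given to each component of a performance. The task components, the 0 to 4 rating scale, and the function name are hypothetical; a yes/no checklist is the special case in which each component is rated 0 or 1.

```python
def analytic_score(ratings, max_per_component=4):
    """Sum component ratings and express the total as a percentage."""
    for component, rating in ratings.items():
        if not 0 <= rating <= max_per_component:
            raise ValueError(f"rating for {component!r} is out of range: {rating}")
    total = sum(ratings.values())
    possible = max_per_component * len(ratings)
    return total, possible, 100 * total / possible


# Hypothetical rating-scale results for one student's clinical skill demonstration.
ratings = {
    "prepares equipment": 4,
    "explains procedure to client": 3,
    "performs hand hygiene": 4,
    "documents findings": 2,
}
print(analytic_score(ratings))  # (13, 16, 81.25)

# A checklist is the same calculation with a maximum of 1 per item.
checklist = {"identifies client": 1, "checks allergies": 0, "explains procedure": 1}
print(analytic_score(checklist, max_per_component=1))
```

Keeping the component ratings, rather than only the total, also shows the student which part of the performance needs work.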

Portfolios: According to Arter (1995), "a portfolio is a purposeful collection of student work that tells the story of student achievement or growth." Though portfolios are often considered to be a form of authentic or performance assessment, as Oosterhof (1999) noted, they typically include materials that are not authentic or performance-based. Portfolios are particularly useful for tracking change or growth in student performance over time and are most effective when they "are characterized by a clear vision of the student skills to be addressed, student involvement in selecting what goes into the portfolio, use of criteria to define quality performance and provide a basis for communication, and self-reflection through which students share what they think about their work, their learning environment, and themselves" (Arter, 1995).

Informal Assessment: As noted by Oosterhof (1999), the majority of classroom assessments are informal in nature. They typically take place during instruction and allow the teacher to monitor student learning and make any necessary adjustments. These informal assessments most often take the form of observations and questions. Though these techniques are efficient and adaptable, their technical quality "tends to be inferior to techniques associated with formal assessments." However, teachers can improve their use of informal questions by basing them on instructional goals, allowing sufficient wait-time, and recognizing the importance of teacher reactions to student answers. The effectiveness of observations can be improved by keeping anecdotal records, using informal checklists, and recognizing one’s own bias (Good and Brophy, 1984).

Reporting Grades: Grades serve the important purpose of communicating to students and their parents the extent to which learners have achieved classroom goals. Tombari and Borich (1999) suggest eight steps for teachers to follow when constructing a grading system:

Identify district policy

Determine the meaning of each grading symbol

Distinguish between reporting and grading factors

Identify grade components

Decide on component weights

Determine how components will be combined

Choose a method for calculating grades

Decide how to deal with borderline grades

Methods for calculating grades include norm-referenced grading, criterion-referenced grading, and individual-referenced grading.

Norm-referenced (or relative) grading involves comparing "a pupil’s performance to that of other pupils in the class" (Airasian, 1997, p. 301). Grading "on the curve" is an example of norm-referenced grading.

Criterion-referenced grading involves comparing a "pupil’s performance to a predetermined standard of mastery" (Airasian, 1997). Calculating grades based on fixed ranges of cumulative scores is an example of criterion-referenced grading (Tombari and Borich, 1999).

Individual-referenced grading involves comparing pupils’ performance to their perceived abilities. Though this form of grading is sometimes used for students with disabilities, Airasian (1997) recommends that other grading approaches, such as contract grading, IEP-based grading, or narrative grading, be used for determining these students’ grades.
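The grading steps and the two main calculation methods can be illustrated with a short sketch. Everything here is a hypothetical example rather than a recommended policy: the component names and weights, the letter-grade cut-offs for criterion-referenced grading, the proportions of each letter used for norm-referenced (curve) grading, and the student names.

```python
# Hypothetical component weights (must sum to 1.0).
WEIGHTS = {"quizzes": 0.2, "clinical": 0.4, "final_exam": 0.4}

# Hypothetical criterion-referenced cut-offs: fixed ranges of the composite score.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

# Hypothetical curve: share of the class receiving each letter, best first.
CURVE = [("A", 0.2), ("B", 0.3), ("C", 0.3), ("D", 0.2)]


def composite(component_scores, weights=WEIGHTS):
    """Combine component percentages (0-100) into one weighted score."""
    return sum(component_scores[name] * w for name, w in weights.items())


def criterion_grade(score, cutoffs=CUTOFFS):
    """Compare the score with a predetermined standard, ignoring classmates."""
    for minimum, letter in cutoffs:
        if score >= minimum:
            return letter
    raise ValueError("score is below every cut-off")


def norm_grades(scores_by_student, curve=CURVE):
    """Grade 'on the curve': each letter depends only on rank within the class.

    Ties keep their original order, so tied students may receive different
    letters; a real policy would need a rule for this.
    """
    ranked = sorted(scores_by_student, key=scores_by_student.get, reverse=True)
    grades, start, cumulative = {}, 0, 0.0
    for letter, share in curve:
        cumulative += share
        end = round(cumulative * len(ranked))
        for student in ranked[start:end]:
            grades[student] = letter
        start = end
    for student in ranked[start:]:  # any rounding remainder gets the last letter
        grades[student] = curve[-1][0]
    return grades


scores = {"Asma": 95, "Bilal": 90, "Farah": 80, "Imran": 70, "Sana": 60}
print({name: criterion_grade(s) for name, s in scores.items()})
print(norm_grades(scores))
```

The same class receives different letters under the two methods: with fixed cut-offs Bilal earns an A, but on the curve he receives a B because only one of five students fits in the top 20 percent.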

Assessment of Performance Toward Education Standards: Teachers must be knowledgeable about the Standards so that they can facilitate and assess student progress toward their attainment. The graduation requirements consist of two parts: the Basic Standards and the Academic Standards.

Basic Standards: The Basic Standards specify the minimum skills students must possess in order to graduate from a public high school. These skills are assessed by the Basic Standards Tests, which are competency tests designed to ensure that students have basic skills in mathematics, reading, and composition. Students take the math and reading tests during 8th grade and the writing test during 10th grade. They must pass these tests in order to obtain a high school diploma. Those who do not pass a test on their first attempt may retake it annually until they achieve a passing score.

The Academic Standards define a new core of five academic content areas: language arts, mathematics, science, social studies, and the arts. Each of the academic standards will be supplemented by grade-level benchmarks. These benchmarks will specify the academic knowledge and skills that students must achieve to complete a state standard and will determine the content of the Comprehensive Assessments, which are used to assess compliance with the standards.

Measuring scales:

The application of statistics in research is well documented. Before choosing a statistical method for your own research project, knowledge of scales of measurement is a prerequisite. Scales of measurement have to do with the allocation of numerical values to characteristics according to certain rules. Measurement can thus be either quantitative or qualitative. The qualitative level of measurement includes, among other things, aspects such as interpretation and paragraph analysis, whilst the quantitative level of measurement focuses on the nominal, ordinal, interval, and ratio levels of measurement. The latter are the basic scales of measurement and are briefly outlined below.


Nominal measurement: Nominal measurement involves assigning a numerical value to a specific characteristic. It is the most basic form of measurement because it represents the lowest level that can be measured, and it is therefore considered a scale of measurement with limitations. The following serves as an example of nominal measurement: a researcher wants to determine the profile of the academic background of his or her students. For this, the researcher might need information regarding the specific level (HG, SG, LG) at which the students passed their matriculation examination.

Ordinal measurement: Ordinal measurement is applicable in cases where a criterion or characteristic is assigned a numerical value according to a specific order. The ordinal scale implies that the entity being measured is quantified in terms of higher or lower, greater or lesser, without specifying the size of the intervals (Leedy 1993). For example, the number 1 can denote the highest rank, whilst 3 denotes the lowest.

Interval measurement: The interval scale of measurement is characterized by two features: equal units of measurement (equal intervals) and a zero point that has been established arbitrarily (Leedy 1993). The latter indicates that there is no absolute zero point. There is therefore a specific relationship between the distance between numerical values and the differences in the size of a characteristic: an increase or decrease in the numerical value goes hand in hand with a corresponding increase or decrease in the characteristic. Because of these characteristics, this scale is considered a more advanced type of measuring scale. The interval level of measurement enables the researcher to compare aspects and to indicate clearly how much more of a characteristic one has than the other. The interval scale of measurement is therefore suitable for calculating arithmetic means and standard deviations and for correlation studies, provided that the researcher takes care that the preconditions set for each scale of measurement are met.

Ratio measurement: This is considered the highest order of measurement that exists, because of the fixed proportion (ratio) between the number and the amount of the characteristic that it represents. It should be noted that when ratio levels are measured, a fixed (absolute) zero point exists. The ratio level of measurement thus enables researchers to determine whether aspects possess a characteristic at all, a value of zero indicating its complete absence.
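The practical consequence of the four levels is which summary statistics are meaningful for a given set of numbers. The sketch below encodes one common textbook convention (the mode for nominal data, the median added for ordinal data, the mean added for interval and ratio data, and "twice as much" statements only for ratio data). The mapping, the function names, and the example values, including the hypothetical pass-level categories, are assumptions for illustration.

```python
from statistics import mean, median, multimode

# Assumed convention: measures of central tendency that are meaningful at
# each level of measurement (each level inherits those of the level below).
CENTRAL_TENDENCY = {
    "nominal": ("mode",),
    "ordinal": ("mode", "median"),
    "interval": ("mode", "median", "mean"),
    "ratio": ("mode", "median", "mean"),
}

MEASURES = {"mode": multimode, "median": median, "mean": mean}


def summarize(values, scale):
    """Return only the central-tendency measures appropriate for `scale`."""
    return {name: MEASURES[name](values) for name in CENTRAL_TENDENCY[scale]}


def ratio_statements_allowed(scale):
    """Statements such as 'twice as much' need an absolute zero point."""
    return scale == "ratio"


print(summarize(["HG", "LG", "HG", "LG", "HG"], "nominal"))  # {'mode': ['HG']}
print(summarize([1, 2, 2, 3], "ordinal"))                    # {'mode': [2], 'median': 2.0}
print(summarize([36.5, 37.0, 37.0, 38.5], "interval"))       # body temperatures in degrees C
print(ratio_statements_allowed("interval"))                  # False
```

Celsius temperature is used for the interval example because its zero point is arbitrary, so 40 degrees is not "twice as hot" as 20 degrees.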

Characteristics of measuring scales: With any type of measurement, two considerations are important: validity on the one hand and reliability on the other.

Reliability is the term used to deal with consistency. A measuring scale is considered reliable if it yields consistent results: when a test is repeated by the same researcher with a different group representing the original group, the same results should be obtained.

Validity is concerned with the soundness and effectiveness of the measuring instrument (Leedy 1993). Four types of validity stand out: content validity, prognostic (predictive) validity, simultaneous (concurrent) validity, and construct validity.

Classification of statistical methods: Statistical methods in the broadest sense are classified into two main groups, namely descriptive and inferential statistics.

Descriptive statistics: Smit (1983) sees descriptive statistics as the formulation of rules and procedures according to which data can be placed in a useful and meaningful order. Landman (1988) states that descriptive statistics deals with the central tendency, variability (variation), and relationships (correlations) in data that are readily at hand. The basic principle for using descriptive statistics is the requirement for absolute representation of data. The most important and general methods used are ratios, percentages, frequency tables, and distribution tendency.

The histogram. The histogram is a graphic representation of a frequency distribution and is used to represent a simple frequency distribution. It is characterized by a vertical line (the y-axis, or ordinate) on the left side of the figure and a horizontal line (the x-axis) at the bottom. The two lines meet at a 90-degree angle. Frequencies are grouped into class intervals, and the benefit of this graphic presentation is that the data can be observed immediately.

Frequency polygon. The frequency polygon does not differ fundamentally from the histogram, but it is used only for continuous data. Instead of drawing bars as in the histogram, a dot indicating the frequency of each class interval is placed at the midpoint of that interval. When the dots are joined, the frequency polygon is formed. Usually an additional class with a frequency of zero is added at each end of the line to anchor the polygon to the horizontal axis.

Cumulative frequency curve. The frequencies in the frequency table are added, starting from the lowest class interval and adding class by class. The cumulative frequency of a specific class interval then clearly indicates how many persons or measurements fall below or above that class interval.

Percentile curve. The cumulative frequency can also be converted into percentages or proportions of the distribution.
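The frequency-table columns behind these graphs (class intervals, midpoints for the polygon, cumulative frequencies, and cumulative percentages for the percentile curve) can be built with a few lines of Python. The scores, interval width, and function names are hypothetical; the sketch computes the numbers a graph would plot rather than drawing the graph itself.

```python
from collections import Counter


def frequency_table(scores, lowest, width, n_classes):
    """Group whole-number scores into equal class intervals with cumulative columns."""
    highest = lowest + width * n_classes - 1
    if any(s < lowest or s > highest for s in scores):
        raise ValueError("a score falls outside the class intervals")
    counts = Counter((s - lowest) // width for s in scores)
    rows, cumulative = [], 0
    for i in range(n_classes):
        lower = lowest + i * width
        upper = lower + width - 1
        f = counts[i]  # Counter returns 0 for empty intervals
        cumulative += f
        rows.append({
            "interval": f"{lower}-{upper}",
            "midpoint": (lower + upper) / 2,            # x-position of the polygon's dot
            "f": f,
            "cum_f": cumulative,                        # scores in this interval or below
            "cum_pct": 100 * cumulative / len(scores),  # point on the percentile curve
        })
    return rows


def polygon_points(rows, width):
    """(midpoint, frequency) pairs anchored by an empty class at each end."""
    points = [(r["midpoint"], r["f"]) for r in rows]
    return [(points[0][0] - width, 0)] + points + [(points[-1][0] + width, 0)]


scores = [41, 45, 52, 55, 57, 60, 63, 64, 68, 72]  # hypothetical test scores
table = frequency_table(scores, lowest=40, width=10, n_classes=4)
for row in table:
    print(row)
print(polygon_points(table, width=10))
```

Plotting `f` as bars gives the histogram, joining the `polygon_points` gives the frequency polygon, and plotting `cum_pct` against each interval's upper limit gives the percentile curve.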

Line graphic. In the graphic presentations above, the horizontal line (X axis) indicates the scale of measurement, whilst the vertical line (Y axis) indicates the frequency. In the case of a line graphic, both axes (X and Y) are used to indicate the scale of measurement, with the aim of comparing two comparable variables (Smit 1983).

Central tendency is defined as the central point around which data revolve. The following techniques can be employed:

The mode is defined as the score (value or category) of the variable that is observed most frequently.

The median indicates the middle value of a series of sequentially ordered scores. Because the median divides the frequencies into two equal parts, it can also be described as the fiftieth percentile.

The arithmetic mean is a measure of central tendency found by adding all scores and dividing the sum by the number of scores.

The standard deviation is a measure of the spread, or dispersion, of a distribution of scores. The deviation of each score from the mean is squared; the squared deviations are then summed, the result is divided by N-1, and the square root is taken (Landman 1988).
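The steps in the standard deviation definition above translate directly into code. In this sketch the hand-written function is checked against Python's `statistics.stdev`, which also divides by N - 1; `statistics.pstdev` divides by N instead, which is why it gives a slightly smaller value. Dividing by N - 1 is the usual choice when the scores are a sample used to estimate the spread of a larger population. The scores and the function name are hypothetical.

```python
import math
import statistics


def sample_sd(scores):
    """Standard deviation computed step by step, dividing by N - 1."""
    m = sum(scores) / len(scores)                # arithmetic mean
    squared = [(x - m) ** 2 for x in scores]     # square each deviation
    variance = sum(squared) / (len(scores) - 1)  # sum, then divide by N - 1
    return math.sqrt(variance)                   # take the square root


scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(scores), 3))         # 2.138
print(round(statistics.stdev(scores), 3))  # 2.138 (also divides by N - 1)
print(statistics.pstdev(scores))           # 2.0 (divides by N)
```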

Inferential statistics: Apart from descriptive statistics, which deal with central tendencies, there are also statistical methods that enable researchers to go from known to unknown data, that is, to make deductions or statements about the broader population from which the samples containing the 'known' data are drawn. According to the literature, these methods are called inferential or inductive statistics (Landman 1989). They include estimation, prediction, hypothesis testing, and so forth.

In conclusion, the role of statistical methods in research is to enable researchers to use the gathered information accurately and to be more specific in describing their findings. For more details on statistical calculations, refer to Huysamen (1976).

Objective-type questions:

The most distinguishing characteristic of objective-type questions is that they have highly specific, predetermined answers requiring a very brief response. Evaluation should be a learning experience for both students and teachers. Considerable effort needs to be made to develop and use objective-type questions in a more versatile manner, including questions that demand higher-level thinking. Objective items come in a variety of forms, but they can be classified into two types: supply items (which require the pupil to supply the answer) and select items (which require the pupil to select the answer from a given number of alternatives).

Objective Type Questions

Supply type questions: short answer; completion

Select type questions: true-false; matching items; multiple choice questions

Factors affecting objective-type questions:

Objective-type questions are often used for the wrong reasons, that is, because they are faster and easier to mark than other types of evaluation

Well-designed objective-type questions can quickly and efficiently evaluate a wide variety of skills and deal with a large amount of subject matter

The most effective use of well-designed objective-type questions involves evaluating students’ ability to use a wide variety of skills, including higher-level thinking skills

The teacher should be aware that objective-type questions may be inappropriate in certain situations; for example, they are inappropriate in Grades 1 to 4

Important considerations in writing objective items:

Test for important facts and knowledge

Tailor the questions to fit the examinees’ age and ability levels as well as the purpose of the test

Write the items as clearly as possible

Clarity can be improved by using good grammar and sentence structure

If the purpose of the test item is to measure understanding of a principle rather than computational

skill, use simple numbers and have the answer come out a whole number.

Avoid lifting statements verbatim from the text

Avoid using interrelated items

There should be only one correct (best) answer

Avoid negative questions whenever possible

Don’t give the answer away

Get an independent review of your test item

Construction of Objective Test Items:

The construction of good test items is an art. It requires a thorough grasp of subject matter, a clear conception of the desired learning outcomes, a psychological understanding of pupils, sound judgment, persistence, and a touch of creativity. The only additional requisite for constructing good test items is the skillful application of an array of simple but important rules and suggestions.

Checklist for reviewing objective test items (answer each question Yes or No)

1. Are the instructional objectives clearly defined?
2. Did you prepare a test blueprint? Did you follow it?
3. Did you formulate well-defined, clear test items?
4. Did you employ “correct” English in writing the items?
5. Is the item type the most appropriate one for the intended learning outcomes?
6. Has textbook language been avoided?
7. Did you avoid giving clues to the correct answer (for example, grammatical clues or clues from the length of the correct response)?
8. Did you test the important ideas rather than the trivial?
9. Did you adapt the test’s difficulty to your students?
10. Did you cast the items in positive form?
11. If negative items were used, did you draw the students’ attention to them?
12. Did you prepare a scoring key? Does each and every item have a single correct answer?
13. Did you review your items? Yourself? Another teacher?
14. Have the units been indicated when numerical answers are expressed in units?
15. Has the degree of precision been indicated for numerical answers?
16. Have the items been phrased so as to minimize spelling errors?
17. If revised, are the items still relevant to the intended learning outcomes?
18. Have the items been set aside for a time before reviewing them?
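Checklist items 12, 14, and 15 (a scoring key, a single correct answer per item, and stated units and precision for numerical answers) are what make objective scoring quick and consistent. The sketch below shows one way to apply such a key; the items, answers, tolerance, and function names are hypothetical, and the case-insensitive comparison of text answers is a design choice a teacher might not want for every item.

```python
# Hypothetical scoring key. Text items have one correct answer; the numerical
# item stores (value, unit, allowed deviation) so units and precision are explicit.
ANSWER_KEY = {
    1: "B",                # multiple choice
    2: "True",             # true-false
    3: "insulin",          # completion
    4: (37.0, "°C", 0.5),  # numerical answer with unit and tolerance
}


def score_item(key, response):
    """Return True if a single response matches its key entry."""
    if response is None:  # unanswered item
        return False
    if isinstance(key, tuple):
        value, unit, tolerance = key
        answer, given_unit = response
        return given_unit == unit and abs(answer - value) <= tolerance
    return str(response).strip().lower() == key.strip().lower()


def score_test(responses, key=ANSWER_KEY):
    """Number of items answered correctly."""
    return sum(score_item(key[item], responses.get(item)) for item in key)


student_a = {1: "B", 2: "true", 3: "Insulin ", 4: (37.2, "°C")}
student_b = {1: "C", 2: "False", 4: (38.0, "°C")}  # item 3 left blank
print(score_test(student_a))  # 4
print(score_test(student_b))  # 0
```

Because the key fixes both the answer and the acceptable tolerance in advance, any two markers applying it to the same paper obtain the same score.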

Comparative Advantages of Objective and Essay Tests

Learning outcomes measured

Objective test: Efficient for measuring knowledge of facts. Some types (e.g., multiple choice) can also measure understanding, thinking skills, and other complex outcomes. Inefficient or inappropriate for measuring the ability to select and organize ideas, writing abilities, and some types of problem-solving skills.

Essay test: Inefficient for measuring knowledge of facts. Can measure understanding, thinking skills, and other complex outcomes (especially useful where originality of response is desired). Appropriate for measuring the ability to select and organize ideas, writing abilities, and problem-solving skills requiring originality.

Preparation of questions

Objective test: A relatively large number of questions is needed for a test. Preparation is difficult and time-consuming.

Essay test: Only a few questions are needed for a test. Preparation is relatively easy (but more difficult than generally assumed).

Sampling of course content

Objective test: Provides an extensive sampling of course content because of the large number of questions that can be included in a test.

Essay test: Sampling of course content is usually limited because of the small number of questions that can be included in a test.

Control of pupil’s response

Objective test: Complete structuring of the task limits the pupil to the type of response called for. Prevents bluffing and avoids the influence of writing skill, though selection-type items are subject to guessing.

Essay test: Freedom to respond in own words enables bluffing and writing skill to influence the score, though guessing is minimized.

Scoring

Objective test: Objective scoring that is quick, easy, and consistent.

Essay test: Subjective scoring that is slow, difficult, and inconsistent.

Influence on learning

Objective test: Usually encourages the pupil to develop a comprehensive knowledge of specific facts and the ability to make fine discriminations among them. Can encourage the development of understanding, thinking skill, and other complex outcomes if properly constructed.

Essay test: Encourages pupils to concentrate on larger units of subject matter, with special emphasis on the ability to organize, integrate, and express ideas effectively. May encourage poor writing habits if time pressure is a factor (it almost always is).

Reliability

Objective test: High reliability is possible and is typically obtained with well-constructed tests.

Essay test: Reliability is typically low, primarily because of inconsistent scoring.
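The low reliability of essay scoring comes mainly from inconsistency between scorers. As a minimal sketch, assuming two teachers have independently marked the same essays on the same point scale, the Python function below reports how often their marks agree exactly or within a chosen tolerance. The function name and the sample marks are hypothetical. An objective test scored against a fixed key would give the same result for every scorer.

```python
def scorer_agreement(marks_a, marks_b, tolerance=0):
    """Proportion of essays on which two scorers' marks differ by at most `tolerance` points."""
    if len(marks_a) != len(marks_b) or not marks_a:
        raise ValueError("both scorers must mark the same, non-empty set of essays")
    agreeing = sum(1 for a, b in zip(marks_a, marks_b) if abs(a - b) <= tolerance)
    return agreeing / len(marks_a)


# Hypothetical marks out of 10 given by two teachers to the same six essays.
teacher_1 = [7, 5, 8, 6, 9, 4]
teacher_2 = [6, 5, 8, 4, 9, 6]

print(scorer_agreement(teacher_1, teacher_2))               # exact agreement: 0.5
print(scorer_agreement(teacher_1, teacher_2, tolerance=1))  # within one mark: about 0.67
```

Low agreement on a small sample like this is a signal to tighten the scoring guide before the whole class is marked.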

Advantages of objective type of evaluation

A well-designed objective test can quickly evaluate a wide variety of skills and deal with a large amount of subject matter

Objective-type questions present tasks whose solutions are predetermined and presented among pairs or groups of responses

Because objective-type questions don’t require students to write out their answers, students’ writing skills (including grammar and spelling) and neatness don’t influence the grade they receive

The results of the test can be analyzed so that a teacher can examine the response patterns for individual questions, groups of questions, and the test as a whole, in order to judge the suitability and clarity of the questions and to identify topics, content, or skills that proved too demanding for the students

The marking of answers is unbiased; a teacher’s opinion or preconception of a student’s work is not a factor

They avoid the discrepancies often found in the evaluation of essay-style answers

They are effective for testing large groups when time is limited
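The response-pattern analysis described in this list can begin with two simple figures: the proportion of students who answered each item correctly (commonly called the item difficulty index) and the number of students who chose each option. The Python sketch below assumes responses are recorded as one list of chosen options per student alongside an answer key. The data, the function names, and the 0.3 cut-off for flagging a demanding item are hypothetical choices for illustration.

```python
from collections import Counter


def item_difficulty(responses, key):
    """Proportion of students who chose the keyed answer, for each item."""
    n_students = len(responses)
    return [
        sum(1 for student in responses if student[item] == correct) / n_students
        for item, correct in enumerate(key)
    ]


def option_counts(responses, item):
    """How many students chose each option on one item."""
    return Counter(student[item] for student in responses)


def demanding_items(difficulties, cutoff=0.3):
    """Indices of items answered correctly by fewer than `cutoff` of the students."""
    return [item for item, p in enumerate(difficulties) if p < cutoff]


key = ["B", "D", "A", "C"]
responses = [  # hypothetical answers from five students
    ["B", "D", "A", "A"],
    ["B", "C", "A", "B"],
    ["A", "C", "A", "A"],
    ["B", "D", "C", "A"],
    ["B", "B", "A", "C"],
]

difficulties = item_difficulty(responses, key)
print(difficulties)                   # [0.8, 0.4, 0.8, 0.2]
print(demanding_items(difficulties))  # [3]
print(option_counts(responses, 3))    # most students chose "A" rather than the keyed "C"
```

Here the fourth item (index 3) was answered correctly by only one student, and three of the five chose the same wrong option. That suggests checking the item for ambiguity, or the topic for reteaching.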

Disadvantages of objective type of evaluation:

Objective-type tests frequently require only the recall of facts

Overemphasis on objective-type tests does not allow students to practice and demonstrate writing skills

Sometimes objective-type tests evaluate something other than what is intended

Objective-type tests often require a disproportionate amount of reading time

Objective-type questions are inappropriate for students in Grades 1 to 4

Great care must be taken in composing these questions and their expected responses, because a specific predetermined answer is expected

Preparation is very time-consuming

Objective-type questions often promote uneducated guessing

Many students may be able to select the correct answer without really understanding it

They should be used only rarely to evaluate students’ language skills, because most objective-type questions don’t measure language acquisition or development as effectively as the other forms of evaluation outlined in this document

Some types of objective-style questions fail to provide clues to the thinking processes of individual students

True-false or alternative-response items:

True-false or alternative-response test items consist of a declarative statement that the pupil is asked to mark true or false, right or wrong, correct or incorrect, yes or no, fact or opinion, agree or disagree, and the like. In each case there are only two possible answers. The true-false item is essentially a two-response multiple-choice item in which only one of the propositions (answers) is presented; the student judges the truth or falsity of the statement and selects one of the two possible answers. Variations, however, deviate considerably from the simple true-false pattern and have their own characteristics. For this reason, some prefer the more general category, alternative-response item. We shall retain the more commonly used true-false designation. This type of item is used for measuring simple knowledge outcomes when only two alternatives are possible, or the ability to identify the correctness of statements of important fact. It is also adaptable to measuring the ability to distinguish fact from opinion and the ability to recognize cause-and-effect relationships.

Types of true-false questions:

True and false

Cluster variety

Correction variety

Versatility of the true-false item:

Testing for factual knowledge

Testing for comprehension or understanding

Testing for application

Testing for deductive skill

Testing for problem-solving ability

Criticisms of true-false items:

They tend to test almost exclusively the ability to memorize rather than the ability to apply more complex thinking skills

The results may be unreliable, because an uninformed student with weak skills or a lack of knowledge has a 50/50 chance of guessing the correct response

Absolutely true or false statements, with no qualifications or exceptions, are difficult to design. Statements therefore tend to be qualified, and students then suspect a trick or a trap
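The 50/50 criticism can be made concrete with the binomial distribution. Assuming a student knows nothing and guesses every item independently, each answer is right with probability 0.5, so the chance of reaching a given score by luck alone can be computed directly. The Python sketch below is illustrative; the 60% pass mark and the test lengths are hypothetical.

```python
from math import comb


def prob_at_least(n_items, n_correct, p_guess=0.5):
    """Probability that pure guessing gives at least `n_correct` right answers out of `n_items`."""
    return sum(
        comb(n_items, k) * p_guess**k * (1 - p_guess) ** (n_items - k)
        for k in range(n_correct, n_items + 1)
    )


# Chance of reaching a 60% pass mark on a true-false test by guessing alone.
print(prob_at_least(10, 6))   # 0.376953125 (about 38%)
print(prob_at_least(20, 12))  # about 0.25
```

Doubling the length of the test lowers the chance that a blind guesser reaches the pass mark. This illustrates why a true-false test needs a fairly large number of items before its scores can be trusted.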

Limitations:

Susceptibility to guessing

Guidelines for constructing true-false questions:

Word the statements simply and clearly. Vague or ambiguous wording may confuse the students

Avoid over-generalizing, because generalizations are seldom true without qualification

Avoid trick questions. They promote mistrust and resentment

Don’t use trivial statements to “pad out” the number of questions and marks to reach a predetermined total

Don’t take exact statements from texts or notes

Each statement should be entirely true or entirely false

Avoid universal descriptors such as “never,” “none,” “always,” and “all”

Avoid negative words, because students often overlook them. Double negatives often increase confusion and test logic rather than knowledge

Don’t use long, complicated sentences or unnecessarily difficult words

Don’t include two ideas in one statement unless you are evaluating students’ understanding of cause-and-effect relationships

Provide a “T” and an “F” beside each statement and ask the students to circle the one they consider correct

Students can be asked not only to indicate whether a statement is true or false, but also to correct false statements by changing words, phrases, or clauses in them

Suggestions for constructing true-false items:

Avoid broad general statements if they are to be judged true-false

Avoid trivial statements

Avoid the use of negative statements, especially double negatives

Avoid long, complex sentences

Avoid including two ideas in one statement, unless cause-and-effect relationships are being measured

If an opinion is used, attribute it to some source, unless the ability to identify opinion is being specifically measured

True statements and false statements should be approximately equal in length

The number of true and false statements should be approximately equal

Avoid ambiguous words and sentences

True-false items must be based on statements that are clearly true or false

Avoid trick questions

When the true-false item is used to test for cause-and-effect relationships, we strongly recommend that the first proposition in the statement always be true, with the subordinate clause written as either true or false

Word the item so that superficial knowledge suggests a wrong answer

Make the wrong answer consistent with a popular misconception that is totally irrelevant to the item

Avoid specific determiners

Avoid making true statements consistently longer than false statements

For the correction type of true-false item, underline the word(s) to be corrected

Uses of the true-false items:

The use of true-false items is limited by the difficulty of constructing clue-free items that measure significant learning outcomes, the susceptibility of this item type to guessing, the low reliability of each item, and the general lack of diagnostic value. They may well be restricted to those areas for which other types of items are inappropriate. When they are used, special efforts must be made to formulate statements that are free from ambiguity, specific determiners, and clues

The most common use is in measuring the ability to identify the correctness of statements of fact, definitions of terms, statements of principles, and the like

To measure pupils’ ability to distinguish fact from opinion, superstition from scientific belief, relevant from irrelevant information, and valid from invalid conclusions

To measure pupils’ ability to identify cause-and-effect relationships

To measure knowledge and understanding

To measure simple aspects of logic

Checklist for reviewing true-false items


1. Is this the most appropriate type of item to use?


3. Can each statement be clearly judged as true or false?

4. Have the trivial statements been avoided?

5. Have negative statements (especially double negative) been avoided?

6. Have the items been stated in simple, clear language?

7. Are opinion statements attributed to some source?

8. Are the true and false items approximately equal in length?

9. Is there an approximately equal number of true and false items?

10. Has a detectable pattern of answer (e.g., T, F, T, F) been avoided?

11. If revised, are the items still relevant to the intended learning outcome?


13. Have the specific determiners (e.g., usually, always) been avoided?
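Items 9 and 10 of this checklist (the balance of true and false answers and the absence of a detectable answer pattern) can be checked mechanically once the answer key is written. The Python sketch below assumes the key is a string of "T" and "F" characters; the function name and the choice to look for repeating cycles of up to four answers are hypothetical.

```python
def review_tf_key(key, max_period=4):
    """Count T and F answers and report any short cycle that the whole key repeats."""
    counts = {"T": key.count("T"), "F": key.count("F")}
    repeating_cycle = None
    for period in range(1, max_period + 1):
        # Require at least two full repetitions before calling it a pattern.
        if len(key) >= 2 * period and all(key[i] == key[i % period] for i in range(len(key))):
            repeating_cycle = key[:period]
            break
    return counts, repeating_cycle


print(review_tf_key("TFTFTFTF"))    # ({'T': 4, 'F': 4}, 'TF'): balanced but predictable
print(review_tf_key("TTFTFFTFFT"))  # ({'T': 5, 'F': 5}, None): balanced, no simple cycle
```

A key can pass this check and still contain local runs that students might notice, so a final read-through of the key by the teacher is still worthwhile.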

Advantages of true-false items:

There is an apparent ease of construction

A greater breadth of course or subject material can be tested and scored in a given time period than with other types of objective items. A student can answer about three true-false items for every two multiple-choice items

Two-way (binary) thinking skills, such as distinguishing between fact and opinion and identifying cause-and-effect relationships, lend themselves to true-false questions

A perceived advantage is that this type of question offers a way of modifying and differentiating evaluation for some students

True-false questions can be scored quickly and objectively

True-false items are good for young children and / or pupils who are poor readers

True-false questions provide high reliability per unit of testing time

True-false questions can be scored quickly, reliably, and objectively by clerks

They are suitable for testing beliefs in popular misconception and superstition

True-false questions are adaptable to most content areas

True-false questions can measure the higher mental processes of understanding, application, and

interpretation

Disadvantages of true-false items:

Pupils’ scores on true-false tests may be influenced by good or bad luck in guessing

They lend themselves most easily to cheating

They tend to be less discriminating, item for item, than multiple-choice tests

They are susceptible to an acquiescence set: subjects tend to develop a pattern of responding in a somewhat automatic way without really giving thought to the item

There are many instances when statements are not unequivocally true or false; rather, there are degrees of correctness

Specific determiners are more prevalent in true-false items than in any other type of objective item

The apparent ease of construction is based on the frequent practice of lifting statements from the text and changing some of them into false statements. The original correct statements and those that have been made incorrect then become the test items. Such a practice can cause ambiguity and promotes guessing

Large areas in all subjects can’t be phrased as absolutely true or false statements. In most cases only the most trivial statements can be reduced to absolute terms

For the most part, this type of question tests only simple knowledge recall. Even distinguishing between fact and opinion and identifying cause-and-effect relationships can be measured more effectively by other means

They create problems for students in the following ways:

The test creates a more stressful atmosphere for already insecure students because it concentrates on very specific items

The way the items are worded makes the test more a reading test than a test of knowledge

They don’t provide diagnostic information, and items may be answered correctly on the basis of misinformation or answered incorrectly as a result of the student misreading or misinterpreting them

This form of questioning promotes random guessing

Matching Exercises:

In its traditional form, the matching exercise consists of two parallel columns. Matching questions are made up of a list of words, phrases, statements, symbols, or numbers that are to be matched to another list. The items that the student is asked to match are usually in the left-hand column and are called premises. The items from which the selection is to be made are usually in the right-hand column and are called responses.

Uses of matching Exercises:

The typical matching exercise is limited to measuring factual information based on simple associations

To measure the ability to identify the relationship between two things

The matching exercise has also been used with pictorial materials, to relate pictures and words and to identify positions on maps, charts, and diagrams

Limitations:

It is difficult to find homogeneous material that is significant from the viewpoint of our objectives and learning outcomes

It is restricted to the measurement of factual information based on rote learning, and it is highly susceptible to irrelevant clues

Constructing good matching items requires a high degree of skill

Suggestions for constructing matching items:

Use only homogeneous material in a matching exercise

Include an unequal number of responses and premises, and instruct the pupil that responses may be used once, more than once, or not at all

Keep the list of items to be matched brief, and place the shorter responses on the right

Arrange the list of responses in logical order

Indicate in the directions the basis for matching the responses and premises

Place all the items for one matching exercise on the same page

The responses list should consist of short phrases, single words, or numbers

Each matching exercise should consist of homogeneous items

Keep each list relatively short

Avoid having an equal number of premises and responses

Arrange the answers in some systematic fashion

Avoid giving extraneous irrelevant clues

Explain clearly the basis on which the match is to be made

Advantages of matching questions:

• Matching questions can evaluate a large amount of related factual material quickly because of their compact form

• These questions seem to be easy to construct

• These questions are easy to score

• They contain the correct answers, which assists the student in answering correctly

• They require relatively little reading time for the solution of many questions

• Like other objective types, they are amenable to machine scoring. Even when they are hand-scored, they can be scored more easily than essay and short-answer items

• Their ease of construction is a mixed blessing, because it leads to the excessive use of matching exercises and a corresponding overemphasis on the memorization of simple relationships

Disadvantages of matching questions:

• The matching lists may encourage serial memorization rather than association, if sufficient care is not taken

• It is difficult to get clusters of questions that are sufficiently alike for a common set of responses to be used

• They are usually limited to the evaluation of factual recall

• They often require more reading, organizing, and thinking skills than assumed and may not be appropriate for students experiencing difficulties

• The format of these questions can be confusing to students

• If matching questions have too many items, they may become more an exercise in searching than an evaluation of knowledge

• They emphasize rote memorization rather than thinking

• It is difficult to develop a list of items in which all the choices are plausible matches but only one is correct

• It is difficult to find enough significant and homogeneous material to construct a suitable question

Guidelines for constructing matching questions:

• Provide clear instructions on how to indicate the correct answers

• Indicate whether the same response may be used more than once

• Maintain a grammatical consistency within and between columns

• Ensure that any matching question appears entirely on one page

• Provide an unequal number of premises and responses. In general, the number of responses should be greater than the number of premises

• Avoid designing questions in which the students are asked to draw connecting lines or arrows from the premise to the response

• Make sure that the lists are homogeneous

• Make the wording of the premises longer than the wording of the responses, so that students read each longer premise first and then scan the short responses quickly

• When using two lists, it is helpful to identify the items in one list with numbers and those in the second list with letters

• Consider designing three-column matching questions to encourage higher levels of thinking. This type of question is best suited for students in Grades 11 and 12

Checklist for reviewing matching items



2. Is the material in the two lists homogeneous?

3. Is the list of responses longer or shorter than the list of premises?

4. Are the responses brief and on the right-hand side?

5. Have the responses been placed in alphabetical or numerical order?

6. Do the directions indicate the basis for matching?

7. Do the directions indicate that each response may be used more than once for matching?

8. Is all of each matching item on the same page?


10. Can each statement be clearly judged as matching item?



13. Has an equal number of premises and responses been avoided?

14. Has a detectable pattern of answer (e.g., 1=A, 2=B,3=C) been avoided?



17. Can each item be answered in a word, a phrase, with a symbol, formula, or short sentence?

18. Are all irrelevant clues avoided? Grammatical? Length of blank?

19. Do computational problems indicate the degree of precision required?

20. Do the blanks occur near the end of the sentence?

21. Have only key words been omitted?

22. Was excessive mutilation kept to a minimum?

23. Are the items technically correct?

24. Has the scoring key been prepared?

25. Is this format most efficient for testing the instructional objectives?

26. Are both lists between 5 and 12 entries long?
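Some of the mechanical rules for matching exercises can be checked automatically before the exercise is printed: both lists should have 5 to 12 entries, the number of responses should differ from (and usually exceed) the number of premises, and the responses should be in alphabetical or numerical order. The Python sketch below is one way to do this. The function name and the sample lists are hypothetical, and it assumes numerical responses are stored as numbers (stored as text, "10" would sort before "9"). Homogeneity and plausibility still need a teacher’s judgment.

```python
def check_matching_exercise(premises, responses):
    """Return warnings for the mechanical matching-exercise rules; an empty list means none were broken."""
    warnings = []
    for name, items in (("premises", premises), ("responses", responses)):
        if not 5 <= len(items) <= 12:
            warnings.append(f"{name} list has {len(items)} entries; aim for 5 to 12")
    if len(responses) == len(premises):
        warnings.append("equal numbers of premises and responses let students answer the last item by elimination")
    if list(responses) != sorted(responses):
        warnings.append("responses are not in alphabetical or numerical order")
    return warnings


premises = ["Scurvy", "Rickets", "Night blindness", "Beriberi", "Pernicious anaemia"]
responses = ["Vitamin A", "Vitamin B1", "Vitamin B12", "Vitamin C", "Vitamin D", "Vitamin K"]
print(check_matching_exercise(premises, responses))  # []

print(check_matching_exercise(["a", "b", "c", "d"], ["d", "c", "b", "a"]))  # four warnings
```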

Multiple-choice questions:

A multiple-choice item consists of a problem and a list of suggested solutions. The problem may be stated as a direct question or an incomplete statement and is called the stem of the item. The list of suggested solutions may include words, numbers, and symbols, and its members are called alternatives (also called choices or options). The pupil is typically asked to read the stem and the list of alternatives and to select the one correct, or best, alternative. The correct alternative in each item is called the answer, and the remaining alternatives are called distracters. These incorrect alternatives receive their name from their intended function: to distract those pupils who are in doubt about the correct answer.

The multiple-choice item is generally recognized as the most widely applicable and useful type of objective test item. It can measure many of the simple learning outcomes measured by the short-answer item, the true-false item, and the matching exercise, and measure them more effectively. In addition, it can measure a variety of more complex outcomes in the knowledge, understanding, and application areas. This flexibility, plus the higher-quality items usually found in the multiple-choice form, has led to its extensive use in achievement testing.

Types of multiple-choice questions:

Direct question form / one correct answer

Incomplete statement form

Best answer form / reverse multiple-choice type

Analogy

Uses of multiple-choice questions:

Measuring knowledge outcomes:

The knowledge of terminology

The knowledge of specific facts

The knowledge of principles

The knowledge of methods and procedures

Measuring outcomes at the understanding and application levels:

Ability to identify applications of facts and principles

Ability to interpret cause-and-effect relationships

Ability to justify methods and procedures

Suggestions for constructing multiple-choice questions:

The stem of the item should be meaningful and should present a definite problem

Use a negatively stated item stem only when significant learning outcomes require it

All of the alternatives should be grammatically consistent with the stem of the item

An item should contain only one correct or clearly best answer

Items used to measure understanding should contain some novelty

All distracters should be plausible

Verbal associations between the stem and the answer should be avoided

The relative length of the alternatives should not provide a clue to the answer

Use sparingly special alternatives such as “none of the above” or “all of the above”

Don’t use multiple-choice items when other item types are more appropriate

Break any of these rules when you have a good reason for doing so

How to write multiple-choice items:

The essence of the problem should be in the stem

Avoid repetition of words in the options

Avoid superfluous wording

When the incomplete statement format is used, the options should come at the end of the statement

Arrange the alternatives as simply as possible

Avoid highly technical distracters

Avoid using true-false distracters, as they affect the test’s reliability

Avoid making the correct answer consistently longer than the incorrect ones

Avoid giving irrelevant clues to the correct answer

Consider providing an “I don’t know” option

The number of distracters to be used should be governed by such factors as the age of the children and the nature of the material

Limitations:

The multiple-choice item shares the limitations of other selection-type paper-and-pencil tests: it measures problem-solving behavior at the verbal level only. Because it requires selection of the correct answer, it is inappropriate for measuring learning outcomes that require the ability to recall, organize, or present ideas.

Checklist for reviewing multiple-choice items



2. Does each item stem present a meaningful problem?

3. Are the item stems free of irrelevant material?

4. Are the item stems stated in positive form (if possible)?

5. If negative wording is used, has it been given special emphasis (e.g., capitalized)?

6. Are the alternatives grammatically consistent with the item stem?

7. Are the alternative answers brief and free of unnecessary words?

8. Are the alternatives similar in length and form?

9. Is there only one correct or clearly best answer?

10. Are the distracters plausible to non-achievers?

11. Are the items free of verbal clues to the answer?

12. Are verbal alternatives in alphabetical order?

13. Are numerical alternatives in numerical order?

14. Have “none of the above” and “all of the above” been avoided (or used sparingly and appropriately)?



17. Do the directions indicate the basis for answering the multiple-choice items?




21. Has a detectable pattern of answer been avoided?

Advantages of multiple-choice items:

• Possibly the outstanding advantage is their versatility

• They can be scored quickly and accurately by machines, clerks, teacher aides, and even students themselves

• The degree of difficulty of the test can be controlled by changing the degree of homogeneity of the responses

• Compared with true-false items, multiple-choice questions are relatively less susceptible to score variations due to guessing, because the probability of guessing a correct answer depends on the number of options

• They can provide the teacher with valuable diagnostic information, especially if all the responses vary only in their degree of correctness

• They are easier to respond to and are better liked by students than true-false items

• The multiple-choice item has the tendency for pupils to give a different answer when the same content is presented in a different form

• They have a directness which some other types of questions lack

• The form of multiple-choice questions, in which the question is followed by a number of responses, may help the students to determine the appropriate response

• They can be analyzed and marked quickly and easily
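The point about guessing in this list can be illustrated with a chance score. A student who guesses blindly on an item with k options is right with probability 1/k, so the expected number of correct answers from guessing alone is the number of items divided by k. The short Python sketch below assumes a hypothetical 40-item test; the two-option case corresponds to a true-false item.

```python
def chance_score(n_items, n_options):
    """Expected number of correct answers from blind guessing on items with `n_options` choices."""
    return n_items / n_options


for options in (2, 3, 4, 5):
    print(f"{options} options: expected chance score on 40 items = {chance_score(40, options):.1f}")
```

Moving from two options to four or five cuts the expected chance score from half the test to a quarter or a fifth. This is also why plausible distracters matter: an option that every student can eliminate at a glance effectively reduces the number of options.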

Disadvantages of multiple-choice items:

• They are very difficult to construct

• There is a tendency for teachers to write multiple-choice items demanding only factual recall

• Of all the selection-type objective items, the multiple-choice item requires the most time for students to respond, especially when very fine discriminations have to be made

• Research has shown that test-wise students perform better on multiple-choice items than non-test-wise students do, and that multiple-choice tests favor the high-risk-taking student (Rowley, 1974)

• It takes considerable time to construct good questions, especially those that test higher levels of thinking

• They can test a student’s reading ability more than any other skill

• Wording multiple-choice questions clearly is a demanding task

Guidelines for constructing multiple-choice questions:

• Put as much of the pertinent information in the stem as possible in order to clarify the problem and reduce the reading time for the choices

• When marking, always be prepared for unforeseen but valid student interpretations of questions and responses

• Avoid repetition of key words in the correct response

• Avoid giving away the correct response by building too obvious a connection between the stem and the correct response

• Multiple-choice questions usually include either four or five choices

• Don’t use “all of the above” or “none of the above” as throw-away distracters, only to get a fourth or fifth possible choice

• Before administering the test to students, try out the questions on an informed colleague to ascertain whether they are clear, accurate, and fair

• By composing a few questions after each lesson or two, teachers can reduce the time spent putting the test together when it is required, and can more easily construct good questions because the material is fresh

• Avoid designing questions containing complicated choices within choices

• Don’t give a series of responses in which two or more choices are correct

• Don’t use distracters which are obviously wrong or frivolous

• Don’t include responses which are not parallel in grammatical structure

• Don’t vary the length of the responses appreciably

• Don’t base a question merely on a trivial piece of information

• Don’t use a correct but incidental detail as a required response

• Avoid using a negative in the stem; if one is used, emphasize it with capital letters

• Do use a complete and positive statement as the stem

• Do provide one response which is absolutely correct

• Do provide responses which are similar in grammatical structure and in length

Short-answer items:

The short-answer item and the completion item are both supply-type test items that can be answered by a word, phrase, number, or symbol. They are essentially the same, differing only in the method of presenting the problem: the short-answer item uses a direct question, whereas the completion item consists of an incomplete statement. More complex interpretations can be made when the short-answer item is used to measure the ability to interpret diagrams, charts, graphs, and pictorial data. To obtain the correct answer, pupils must actually solve problems, manipulate mathematical symbols, and complete and balance equations.

Uses of short-answer items:

When short-answer items are used, the question must be stated clearly and concisely, be free from irrelevant clues, and require an answer that is both brief and definite. Common uses are:

• Knowledge of terminology

• Knowledge of specific facts

• Knowledge of principles

• Knowledge of methods or procedures

• Knowledge of simple interpretation of data

• The ability to solve a problem

Suggestions for constructing short-answer items:

• Word the item so that the required answer is both brief and specific

• Don’t take statements directly from textbooks to use as a basis for short-answer items

• A direct question is generally more desirable than an incomplete statement

• If the answer is to be expressed in numerical units, indicate the type of answer wanted; blanks for answers should be equal in length and placed to the right of the question

• When completion items are used, do not include too many blanks, and place the blanks near the end of the statement

• For computational problems, specify the degree of precision and the units of expression expected in the answer

• In completion items, omit only important words

• Use a direct question in which the term is given and a definition is asked for, to test knowledge of definitions and comprehension of technical terms

• Don’t skimp on the answer space provided

• Avoid giving irrelevant clues
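A scoring key that states the accepted answers and, for computational items, the required precision makes short-answer scoring more consistent. The Python sketch below is a hypothetical illustration of such a key: word answers are compared ignoring case and surrounding spaces, and numerical answers are accepted within a stated tolerance. It assumes the item tells students which unit to use, so only the number is written in the blank; the function name and the example items are made up.

```python
def score_short_answer(response, accepted=None, value=None, tolerance=0.0):
    """Return 1 if the response matches the key, otherwise 0.

    Numerical items: pass `value` and `tolerance` (the degree of precision stated in the item).
    Word items: pass `accepted`, a collection of acceptable answers.
    """
    if value is not None:
        try:
            return int(abs(float(response) - value) <= tolerance)
        except ValueError:  # not a plain number, e.g. a unit was written in the blank
            return 0
    accepted_forms = {answer.strip().lower() for answer in accepted}
    return int(response.strip().lower() in accepted_forms)


# Hypothetical item: "1000 mL is to be infused over 8 hours. What is the rate in mL per hour,
# to the nearest whole number?"
print(score_short_answer("125", value=125, tolerance=0.5))  # 1
print(score_short_answer("120", value=125, tolerance=0.5))  # 0

# Hypothetical item: "What is the term for an abnormally fast heart rate?"
print(score_short_answer(" Tachycardia ", accepted={"tachycardia"}))  # 1
```

Whether a misspelled but recognizable answer should earn credit is a policy decision for the teacher; this sketch accepts only the listed forms.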

Checklist for reviewing short answer items


1. Is this the most appropriate type of item for the intended learning outcomes?


3. Can the items be answered with a number, symbol, word, or brief phrase?

4. Are the answer blanks equal in length?

5. Have the items been stated so that only one response is correct?

6. Are the answer blanks at the end of the items?

7. Are the items free of clues (such as a or an)?

8. Have the units been indicated when numerical answers are expected?

9. Has the degree of precision been indicated for numerical answers?

10. Have the items been phrased so as to minimize spelling errors?



Checklist for reviewing short answer (supply-type) items


1. Can each item be answered in a word, a phrase, with a symbol, formula, or short sentence?

2. Do the items avoid the use of verbatim textbook language?

3. Is each item specific, clear, and unambiguous?

4. Are all irrelevant clues avoided? Grammatical? Length of the blank?

5. Do computational problems indicate the degree of precision required?

6. Do the blanks occur near the end of the sentence?

7. Have only key words been omitted?

8. Was excessive mutilation kept to a minimum?

9. Have direct questions been used where feasible?

10. Are the items technically correct?

11. Is there one correct or agreed-upon correct answer?

12. Has a scoring key been prepared?

13. Has the test been reviewed independently?

14. Is this format most efficient for testing the instructional objectives?

References:

• William A. Mehrens. (1994). Measurement and Evaluation in Education. 3rd Edition. The Dryden Press. USA.

• Norman E. Gronlund. (1985). Measurement and Evaluation in Teaching. 5th Edition. Macmillan Publishing Company. USA.

• Lois White. (2001). Foundations of Nursing. Delmar. USA.

• Nursing and Midwifery Council (NMC). (2002). Guidelines for Evaluation. NMC, London.

• Ellen Thomas E. (1994). Methods of Evaluation in Nursing. Lippincott. USA.

• http://www.aft.org/topics/nclb/MN.htm

• http://www.ericdigests.org/1996-3/portfolios.htm

• http://www.ericdigests.org/1998-1/norm.htm

• http://education.state.mn.us/html/intro_acad_prof_grad.htm


Objective Test vs. Essay Test

Learning outcomes measured

• Objective test: Efficient for measuring knowledge of facts. Some types (e.g., multiple choice) can also measure understanding, thinking skills, and other complex outcomes. Inefficient or inappropriate for measuring the ability to select and organize ideas, writing abilities, and some types of problem-solving skills.

• Essay test: Inefficient for measuring knowledge of facts. Can measure understanding, thinking skills, and other complex outcomes (especially useful where originality of response is desired). Appropriate for measuring the ability to select and organize ideas, writing abilities, and problem-solving skills requiring originality.

Preparation of questions

• Objective test: A relatively large number of questions is needed for a test. Preparation is difficult and time-consuming.

• Essay test: Only a few questions are needed for a test. Preparation is relatively easy (though more difficult than generally assumed).

Sampling of course content

• Objective test: Provides an extensive sampling of course content because of the large number of questions that can be included in a test.

• Essay test: Sampling of course content is usually limited because of the small number of questions that can be included in a test.

Control of pupil's response

• Objective test: Complete structuring of the task limits the pupil to the type of response called for. Prevents bluffing and avoids the influence of writing skill, though selection-type items are subject to guessing.

• Essay test: Freedom to respond in one's own words enables bluffing and writing skill to influence the score, though guessing is minimized.

Scoring

• Objective test: Objective scoring that is quick, easy, and consistent.

• Essay test: Subjective scoring that is slow, difficult, and inconsistent.

Influence on learning

• Objective test: Usually encourages pupils to develop a comprehensive knowledge of specific facts and the ability to make fine discriminations among them. Can encourage the development of understanding, thinking skills, and other complex outcomes if properly constructed.

• Essay test: Encourages pupils to concentrate on larger units of subject matter, with special emphasis on the ability to organize, integrate, and express ideas effectively. May encourage poor writing habits if time pressure is a factor (it almost always is).

Reliability

• Objective test: High reliability is possible and is typically obtained with well-constructed tests.

• Essay test: Reliability is typically low, primarily because of inconsistent scoring.
