
Measuring what we want to measure, Liz Norman ANZCVS 2013


DESCRIPTION

Measuring what we want to measure: writing excellent questions for College examinations. Plenary lecture at the Australian and New Zealand College of Veterinary Scientists Science Week meeting, 2013. One of the challenges of any examination system is measuring the knowledge, skills and judgements that we think are important indicators of achievement. This session will focus on designing and communicating tasks for candidates that let them demonstrate their knowledge, skills and judgement. It will look at different types of questions, including where MCQs fit in, and what to think about when writing them. Liz Norman is a graduate of the University of Sydney who worked in private small animal practice for several years before moving to practice at the University of Melbourne and then the University of Glasgow. She took up an academic position at Massey University in 2001 and is Director of the distance Master of Veterinary Medicine programme. Liz received a national Tertiary Teaching Excellence Award for sustained excellence in 2012 and is currently a Doctoral candidate in Education, researching assessment practices. She has held a position on the Board of Examiners of the Australian College of Veterinary Scientists for 9 years, including 5 years as Assistant Chief Examiner, and has been involved in all aspects of the College examination system.


Liz Norman Massey University, ANZCVS Science Week Plenary 2013 page 1

Measuring what we want to measure

Writing excellent questions for College examinations

Liz Norman, Massey University

Why is question writing important?

So that we can have the best chance of measuring what we want to measure.

The process of assessment is a judgement process, and those of you who have ever examined anyone will know that sometimes that judgement seems easy, and at other times it is very challenging. We make inferences based on evidence during that judgement process. The questions are there to collect evidence. We need to be sure they are collecting evidence of the right thing. So today I am going to first discuss how we go about deciding what it is we are trying to measure. Then we will look at the advantages and disadvantages of different question types for that purpose. We will look at some aspects of designing long answer questions and some of the traps. And finally we will look at some aspects of designing MCQs and some of the traps.


What we want to measure

The candidate will have a detailed knowledge of:

The aetiology, pathogenesis and pathophysiology of cardiac, renal, respiratory, alimentary, musculoskeletal, endocrine, ophthalmological and neurological organ dysfunction in the cat and the dog.

The candidate will be able to, with a detailed level of expertise:

Analyse complex clinical problems and make sound clinical judgements.

The subject guidelines for each subject specify the scope for both knowledge and skills. Some skills are technical and assessed through credentialing. Some are cognitive skills such as this one and are assessed in our examinations.


Scope - breadth

Columns: Pathophysiology | Investigation and diagnosis | Treatment and management

Gastrointestinal P1Q1 P1Q1, P2Q4

Cardiovascular P1Q4 P2Q2 P2Q2

Nervous P1Q3, P2Q1

Endocrine P1Q3 P2Q3

Musculoskeletal P2Q5

So these form the topics that will be covered by the questions. Blueprinting is a good way to ensure that the whole subject is sampled representatively across the 3-4 components of the exam. It is why the whole exam (all 3-4 components) needs to be designed at once. Note that questions often span more than one category.
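A blueprint can also be checked mechanically. The sketch below is a minimal Python illustration, assuming a hypothetical mapping of question IDs to blueprint cells (the placements are invented for illustration and are not those on the slide). It lists the cells that no question samples and the questions that span more than one cell.

```python
from collections import defaultdict
from itertools import product

# Blueprint axes taken from the slide.
SYSTEMS = ["Gastrointestinal", "Cardiovascular", "Nervous", "Endocrine", "Musculoskeletal"]
CATEGORIES = ["Pathophysiology", "Investigation and diagnosis", "Treatment and management"]

# Hypothetical placements (IDs in the slide's "P1Q1" = paper 1, question 1 style),
# invented for illustration only.
blueprint = {
    ("Gastrointestinal", "Pathophysiology"): ["P1Q1"],
    ("Gastrointestinal", "Investigation and diagnosis"): ["P1Q1", "P2Q4"],
    ("Cardiovascular", "Pathophysiology"): ["P1Q4"],
    ("Cardiovascular", "Investigation and diagnosis"): ["P2Q2"],
    ("Cardiovascular", "Treatment and management"): ["P2Q2"],
    ("Nervous", "Investigation and diagnosis"): ["P1Q3", "P2Q1"],
    ("Endocrine", "Pathophysiology"): ["P1Q3"],
    ("Endocrine", "Treatment and management"): ["P2Q3"],
    ("Musculoskeletal", "Investigation and diagnosis"): ["P2Q5"],
}


def empty_cells(bp):
    """Return the (system, category) cells that no question samples."""
    return [cell for cell in product(SYSTEMS, CATEGORIES) if not bp.get(cell)]


def spanning_questions(bp):
    """Return the IDs of questions that sample more than one cell, sorted."""
    cells_per_question = defaultdict(int)
    for questions in bp.values():
        for q in set(questions):
            cells_per_question[q] += 1
    return sorted(q for q, n in cells_per_question.items() if n > 1)


if __name__ == "__main__":
    for system, category in empty_cells(blueprint):
        print(f"Not sampled: {system} / {category}")
    print("Questions spanning several cells:", spanning_questions(blueprint))
```

An empty cell is not automatically a fault, since no single sitting can sample everything; the point is that the gaps become visible when the whole exam is designed at once.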

Knowledge levels:

Detailed knowledge — candidates must be able to demonstrate an in-depth knowledge of the topic including differing points of view and published literature. The highest level of knowledge.

Sound knowledge — candidate must know all of the principles of the topic including some of the finer detail, and be able to identify areas where opinions may diverge. A middle level of knowledge.

Basic knowledge — candidate must know the main points of the topic and the core literature.

Currently the College templates for subject guidelines specify the level of knowledge in this way. In a way this is only showing the level of detail required. Knowledge isn’t all about recalling a level of detail though. Experts are able to use their knowledge in appropriate ways, recall the information in appropriate situations and apply it to those situations to solve problems. It isn’t just about knowledge, but what can be done with the knowledge.


Skill levels:

Detailed expertise — the candidate must be able to perform the technique with a high degree of skill, and have extensive experience in its application. The highest level of proficiency.

Sound expertise — the candidate must be able to perform the technique with a moderate degree of skill, and have moderate experience in its application. A middle level of proficiency.

Basic expertise — the candidate must be able to perform the technique competently in uncomplicated circumstances

These skill levels don’t just apply to technical skills (psychomotor skills) but also to the way knowledge is used.

The candidate will have a detailed knowledge of:

The aetiology, pathogenesis and pathophysiology of cardiac, renal, respiratory, alimentary, musculoskeletal, endocrine, ophthalmological and neurological organ dysfunction in the cat and the dog.

The candidate will be able to, with a detailed level of expertise:

Analyse complex clinical problems and make sound clinical judgements.

Candidates need base knowledge in order to “analyse complex clinical problems and make sound clinical judgements”, so if you aim to assess the cognitive skills you will also be assessing the knowledge base of the candidate.


Fact recall vs applied

Fact recall: Questions capable of being answered by reference to one paragraph in a text or notes (or several paragraphs for questions requiring recall of several facts)

Applied (higher order): Questions that require the use of facts or concepts, the solution of a diagnostic or physiologic problem, the perception of a relationship, or other process beyond recalling discrete fact

From: Peitzman et al. (1990). Academic Medicine, 65(9), S59-60.

Questions that assess cognitive skills are applied or higher order questions, as opposed to fact recall questions. This is a useful operational definition, used in a research paper, which I find helpful for working out whether a question is higher order or fact recall. If the answer can be looked up and appears in one paragraph or page of a textbook, then really it’s fact recall. Note that even complex judgements can become fact recall for candidates once someone writes a review paper or textbook chapter that sums up the complexity.

Level - depth

Columns: Pathophysiology (recall | higher order) | Investigation and diagnosis (recall | higher order) | Treatment and management (recall | higher order)

Gastrointestinal P1Q1 P1Q1, P2Q4

Cardiovascular P1Q4 P2Q2 P2Q2

Nervous P1Q2, P2Q1

Endocrine P1Q3 P2Q3

Musculoskeletal P2Q5

You can also categorise your questions into recall and higher order on your blueprint to check that you mostly have higher order (as is appropriate for membership and fellowship) across the examinations.
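The recall versus higher order check can be done by weighting each question by its marks. The Python sketch below assumes a hypothetical paper (the IDs, marks and levels are invented), and it reads “mostly higher order” as more than half of the available marks; that 0.5 threshold is an assumption, not a College rule.

```python
# Hypothetical paper: (question ID, marks, level). IDs, marks and levels are invented.
questions = [
    ("P1Q1", 20, "higher order"),
    ("P1Q2", 20, "recall"),
    ("P1Q3", 20, "higher order"),
    ("P1Q4", 20, "higher order"),
    ("P2Q1", 25, "higher order"),
    ("P2Q2", 25, "recall"),
]


def share_of_marks(qs, level):
    """Fraction of the total available marks carried by questions at `level`."""
    total = sum(marks for _, marks, _ in qs)
    return sum(marks for _, marks, lvl in qs if lvl == level) / total


def mostly_higher_order(qs, threshold=0.5):
    """True if higher order questions carry more than `threshold` of the marks.
    The 0.5 default is an assumed reading of "mostly", not a College rule."""
    return share_of_marks(qs, "higher order") > threshold


if __name__ == "__main__":
    print(f"Higher order share of marks: {share_of_marks(questions, 'higher order'):.0%}")
    print("Mostly higher order:", mostly_higher_order(questions))
```

Weighting by marks rather than by question count is a design choice: a single heavily weighted recall question can otherwise hide behind several small higher order ones.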


What we don’t want to measure

• Ability to take tests
• Ability to write legibly and fast
• Ability to rote learn whole pages of textbooks or review articles - prewriting
• Ability to write down a huge series of unconnected facts in no particular order
• Ability to research examiner's fields of interest and rote learn impressive aspects of that
• Ability to interpret what examiners are thinking

There are a whole lot of things we don’t want to measure, and these are just some.

Ability to take tests

Ability to write clearly and fast: this is why it is important to pace examinations so that candidates have time to write legibly.

Ability to rote learn whole pages of textbooks or review articles – prewriting (where candidates predict and then prelearn answers): this is why we need to avoid fact recall questions and instead get candidates to use their knowledge in examinations.

Ability to write down a huge series of unconnected facts in no particular order: we need to ensure we don’t reward this in our marking schemes. Importantly, as we move away from fact recall Qs to higher order Qs, the quality of the answer becomes more important than the quantity of facts the candidate writes down.

Ability to research examiners’ fields of interest and rote learn impressive aspects of that: this is why examinations need to be blueprinted against the subject guidelines, not examiners’ interests.

Ability to interpret what examiners are thinking: this is why we need to give clear instructions in our questions, which is what we will talk about in the next section.


Types of questions

Recall knowledge

Apply knowledge

Stimulus formats

Question types can be categorised broadly by two aspects. The first is the stimulus format: fact recall or applied knowledge.


Recall knowledge

Apply knowledge

Selected response

Constructed response

Stimulus formats

Response formats

The second is the response format: what the candidate does to indicate their response.
Selected response: e.g. MCQs, where candidates select their response from prespecified options.
Constructed response: e.g. long and short answer Qs, where candidates generate their own response.

Columns (stimulus formats): Recall knowledge | Apply knowledge

Selected response: Selected recall | Selected applied

Constructed response: Constructed recall | Constructed applied

Both selected response and constructed response questions can be fact recall, and both can be applied knowledge types. The literature is clear that it is the stimulus format (fact recall vs applied) that is the most important determiner of what is measured; the response format is of less importance.


Advantages of constructed response Qs (long answer)

• Non-cued writing – can measure what candidates spontaneously think of
• Easy to create
• Logic, reasoning, steps in problem solving
• Ease of partial credit scoring
• In-depth assessment

Non-cued writing – can measure what candidates spontaneously think of. Cueing means that a candidate can answer a multiple-choice question correctly by recognising the correct option, rather than by generating the answer spontaneously. Cueing clearly exists, as you will recognise if you think about how you are thinking when you look at an MCQ. Your strategy is often to look at the options first and try to recognise the correct answer rather than working it out.

Easy to create. Logic, reasoning, steps in problem solving. Ease of partial credit scoring. In-depth assessment.

A good long answer question asks the candidate to process information or knowledge rather than to reproduce it, by, for example, requiring candidates to set up a reasoning process or summarise information, or asking them to apply a known principle in different contexts, etc


Limitations of constructed response Qs (long answer)

• Subjective scoring
• Reproducibility issues
• Limited breadth of content
• Inefficient?
  – Marking time
  – Testing time
• Quality control tends to be qualitative

Subjective scoring This is a frequently levelled criticism, but I do not see it as a big problem. Assessment is a process of judgement, similar to diagnosis. Rather than trying to take the judgement out of the equation, we need to ensure that the judgement is made on good, relevant and sufficient evidence, by appropriately qualified judges.

Reproducibility issues This is referring to the agreement between different judges or the same judge looking at it at a different time. This can be a problem, but as above, is one we should acknowledge is inherent in complex judgement, and we need to take other approaches to quality control.

Limited breadth of content This is a definite limitation we need to be aware of. Because the questions take longer than say MCQ questions, we can ask far fewer and therefore we are taking a much smaller sample of the topics in the subject. We need to remember that the smaller the sample, the less able we are to generalise the performance of the candidate to the whole subject area, which is actually the aim of the examinations.
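The effect of sample size on generalisation can be illustrated with a deliberately simplified model. The Python sketch below assumes each question is an independent pass/fail sample of the subject and that the candidate truly “knows” 70% of it (both assumptions, invented for illustration). Real long answer questions award partial credit and are not independent draws, so the numbers show the trend rather than the College’s actual precision.

```python
from math import sqrt


def standard_error(p, n):
    """Binomial standard error of a proportion-correct score over n questions,
    treating each question as an independent pass/fail sample of the subject."""
    return sqrt(p * (1 - p) / n)


if __name__ == "__main__":
    true_proportion = 0.7  # hypothetical candidate who "knows" 70% of the subject
    for n in (6, 30, 120):
        print(f"{n:>3} questions: standard error = {standard_error(true_proportion, n):.3f}")
```

Under these assumptions the standard error is roughly 0.19 with 6 questions and about 0.04 with 120, which is the sense in which a small sample limits how far a score can be generalised to the whole subject.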

Inefficient? This is often said, but it is only true when you are talking about large numbers of candidates (thousands). Long answer questions definitely take longer to mark than short answer questions, and you also need longer testing time in order to ask a reasonable number of questions to sample the topics. However, because good MCQs take at least as long to write as good long answer Qs, and you need more of them, the time trade-off only breaks even when you have a large number of candidates, so this is not relevant for the College.

Quality control tends to be qualitative Rather than using statistical quantitative methods of quality assurance, we need to use more qualitative methods, which are aimed at ensuring and documenting the trustworthiness, credibility and dependability of the judgements. This includes having more than one examiner, checking for agreement in decisions, ensuring the expertise of examiners and triangulating evidence across all examination components and credentialing.


Advantages of MCQ

• Can test a wide breadth of subjects in a time-efficient manner
• Sampling across the subject may therefore be more representative
• Less predictability - fosters a deep approach to learning
• Reliability
• Can construct examinations of known difficulty (assuming psychometric analysis carried out)
• Efficient and cost effective for large numbers of candidates
• Possibility of automated development

Many of these we have already discussed in comparison to long answer questions:

• Can test a wide breadth of subjects in a time-efficient manner
• Sampling across the subject may therefore be more representative
• Less predictability - fosters a deep approach to learning. Good MCQs (that test higher order thinking) are not as easily predicted by students, and studies have demonstrated that they foster deep learning.
• Reliability: very reliable, since there is no judgement involved in scoring.
• Can construct examinations of known difficulty (assuming psychometric analysis carried out)
• Efficient and cost effective for large numbers of candidates
• Possibility of automated development

See next slide

In this study, experts developed a schema for decision making in a particular scenario (diagnosis of wound infection), and this was then used to automatically generate 1248 different MCQs. Developing the schema is obviously still resource intensive, but the approach offers the possibility of future automated generation, which is exciting.
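The combinatorial part of template-based generation is simple to sketch. The Python example below uses a hypothetical item model (the stem wording, slot names and values are invented, not taken from the study) and enumerates every combination of slot values. In the study the options and answer key came from the experts’ decision-making schema; this sketch only generates stems and makes no clinical claims about the correct answer.

```python
from itertools import product

# Hypothetical item model: a stem with slots, and the values each slot may take.
# Options and the answer key, which would come from the expert schema, are omitted.
STEM = ("A {age} {species} is presented {days} days after surgery with a fever. "
        "What is the most likely cause?")
SLOTS = {
    "age": ["3-year-old", "6-year-old", "10-year-old"],
    "species": ["dog", "cat"],
    "days": [2, 4, 7],
}


def generate_stems(stem, slots):
    """Yield one stem for every combination of slot values."""
    names = list(slots)
    for values in product(*(slots[name] for name in names)):
        yield stem.format(**dict(zip(names, values)))


if __name__ == "__main__":
    stems = list(generate_stems(STEM, SLOTS))
    print(len(stems), "stems")  # 3 ages x 2 species x 3 time points = 18
    print(stems[0])
```

The number of items grows multiplicatively with each slot, which is how a single well-specified model can yield hundreds or thousands of variants; the expensive part remains writing a schema that assigns a defensible key to every combination.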


Disadvantages of MCQ

Realising the advantages requires procedures which make them resource intensive and expensive:

– Creation of a large question bank
– Pretesting and statistical analysis of Qs
– Post examination statistical analysis

Realising the advantages requires procedures which make them resource intensive and expensive.

• Creation of a large question bank

If you want to draw questions from a bank rather than create new ones each year, you need the bank to contain 10 times the number of questions you will be drawing. Therefore, for a 2-hour, 120-question MCQ exam you need a bank of at least 1200 written questions.

• Pretesting and statistical analysis of Qs: Ideally, MCQs should be pretested on a sample similar to candidates, e.g. existing members or fellows. You need to do this to detect problems with questions, because such problems are common: some studies have found flaws in 36-65% of questions, of which 10-15% are serious enough to influence pass-fail decisions.

• Post examination statistical analysis: Sophisticated statistical analysis should be performed to set the passing cut point and check the performance of questions.
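One common way to check question performance, both at pretesting and after the examination, is classical item analysis. The Python sketch below is an illustration under assumptions, not the College’s procedure: it uses a small invented response matrix, computes each item’s difficulty (proportion correct) and discrimination (correlation of the item with each candidate’s score on the remaining items), flags items below a 0.2 discrimination cut-off (a commonly quoted rule of thumb, assumed here), and estimates internal-consistency reliability with KR-20. Setting the passing cut point is a separate standard-setting task and is not shown.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical pretest data: one row per candidate, one column per item,
# 1 = correct, 0 = incorrect.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no spread."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def difficulty(data, j):
    """Classical difficulty (p-value): proportion of candidates correct on item j."""
    return mean(row[j] for row in data)


def discrimination(data, j):
    """Corrected item-total correlation: item j against the total on the other items."""
    item = [row[j] for row in data]
    rest = [sum(row) - row[j] for row in data]
    return pearson(item, rest)


def kr20(data):
    """Kuder-Richardson 20 reliability for 0/1-scored items (needs non-zero total variance)."""
    k = len(data[0])
    pq = sum(difficulty(data, j) * (1 - difficulty(data, j)) for j in range(k))
    total_variance = pvariance([sum(row) for row in data])
    return k / (k - 1) * (1 - pq / total_variance)


def flag_items(data, min_discrimination=0.2):
    """Indices of items whose discrimination falls below the assumed cut-off."""
    return [j for j in range(len(data[0])) if discrimination(data, j) < min_discrimination]


if __name__ == "__main__":
    for j in range(len(responses[0])):
        print(f"Item {j}: difficulty {difficulty(responses, j):.2f}, "
              f"discrimination {discrimination(responses, j):.2f}")
    print(f"KR-20: {kr20(responses):.2f}")
    print("Items to review:", flag_items(responses))
```

A flagged item is a prompt for expert review rather than an automatic deletion; a negative discrimination, for example, often points to a miskeyed or ambiguous question. With real pretest samples the matrix would be far larger than this toy one.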


Problems that can occur with MCQs

• Candidates can’t indicate their interpretation of the Q
• Fact recall Qs are easier to write and therefore tend to dominate
• Some topics are particularly difficult to write MCQs for
• Identification of a correct response requires a different type of thinking from candidates than generation of a response
• Guessing can be rewarded
• What is correct is still a subjective decision
• Circulating recall papers may reduce even higher order Qs to recall Qs

Candidates can’t indicate their interpretation of the Q. If a question turns out not to be well worded or is unclear, it is possible for examiners to see this in a written answer, or for candidates to say so in their answer, whereas they are not able to do this in MCQs (which is why it is more critical for MCQs to be pretested).

Fact recall Qs are easier to write and therefore tend to dominate.

Some topics are particularly difficult to write MCQs for.

Identification of a correct response requires a different type of thinking from candidates than generation of a response.

Guessing can be rewarded.

This is often of concern but shouldn’t be. Even with 3-option MCQs (where there is a 33% chance of getting each answer right by guessing), the probability of scoring exactly 70% (21 of 30) on a 30-question test by guessing alone is 0.0000356, and the probability of scoring 70% or more is only about 0.000044. In any case, candidates with any degree of preparation will use partial knowledge to select answers rather than guessing strategies.
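The guessing figure can be checked with the binomial distribution. This is a minimal Python sketch, assuming each of 30 three-option questions is answered by pure random guessing, independently, with a 1/3 chance of being right each time.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k correct out of n, each correct with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_questions = 30
p_guess = 1 / 3   # 3-option MCQs answered by pure guessing
k_70 = 21         # 70% of 30 questions

p_exactly_70 = binomial_pmf(k_70, n_questions, p_guess)
p_at_least_70 = sum(binomial_pmf(k, n_questions, p_guess)
                    for k in range(k_70, n_questions + 1))

print(f"P(exactly 21/30)  = {p_exactly_70:.7f}")
print(f"P(at least 21/30) = {p_at_least_70:.7f}")
```

Running it reproduces the 0.0000356 figure for exactly 21 correct; summing over 21 to 30 correct, which is what matters for reaching a 70% pass mark, gives about 0.0000443, still negligible.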

What is correct is still a subjective decision See the next slide


[Figure: the answer options placed along a continuum from least correct to most correct, in the order D, E, A, C, B]

In biological systems we all know that nothing is ALWAYS true and you can never say never. Therefore, unless questions are about trivial content (which they should not be), both the correct option and the distractors are likely to have some degree of truth to them. This is especially the case for the type of questions we really want to design, which aim to test candidates’ knowledge and decision making in complex situations that require judgement. While it is important that the answer keyed as “correct” is much more correct than the other possible answers, there will always be an element of judgement in this decision.

Problems that can occur with MCQs

• Candidates can’t indicate their interpretation of the Q
• Fact recall Qs are easier to write and therefore tend to dominate
• Some topics are particularly difficult to write MCQs for
• Identification of a correct response requires a different type of thinking from candidates than generation of a response
• Guessing can be rewarded
• What is correct is still a subjective decision
• Circulating recall papers may reduce even higher order Qs to recall Qs

Circulating recall papers may reduce even higher order Qs to recall Qs This is just as much a problem with long answer Qs as with MCQs. But MCQs are so expensive to produce that they tend to be held for reuse – hence the problem.


This shows a screenshot from a website called PasstheFRACP.com, where FRACP refers to the Fellowship of the Royal Australasian College of Physicians. Several past recall papers are available on this site, and undoubtedly there are many others for all sorts of examinations that are not so freely available.

General issues with question writing


Communication

The examination questions are the question setter’s expression of the question setter’s task.

The candidate’s answer represents the candidate’s expression of the candidate’s interpretation of the questions.

The marker evaluates the marker’s interpretation of the candidate’s expression of the candidate’s answer.

The marker uses the marker’s interpretation of the setter’s expression of the setter’s task to evaluate the candidate’s answer.

Modified from Pollitt & Ahmed (1999)

Exams are a communication between 3 people: the candidate, the marker and the question setter. In only some cases is the question setter the same person as the marker. Each of these people has their own interpretation of the communication.

• The question setter has a task in mind and expresses this in writing. Anyone reading the question is interpreting the words in order to arrive at their own understanding of the task the question setter had in mind. As you can imagine, if the task is not specified very precisely and clearly there is plenty of room for things to go wrong at this step.

• The candidate is one of the people interpreting the question. They formulate an answer, and then have to express that in words on the page. Their expression may or may not represent their answer very well. You can imagine that many factors can interfere with this process, not all of which are things we are trying to differentiate candidates on.

• The marker has to interpret the candidate’s expression of their answer on the page and from it, they are making inferences about what the candidate knows and can do. The inferences may be well founded, or perhaps more tenuous. The candidate’s expression can contribute a lot to the interpretations made by the marker.

• In order to make an evaluation of the candidate’s performance, the marker must also interpret the question, using the setter’s expression of the task the setter had in mind. The marker’s evaluation of the task may differ from the candidate’s evaluation of it.


Expectations and stereotypes

Examples:
• male animal case
• differential diagnoses candidate would consider
• expectation of hard questions
• expectation that Qs will ask about what something is rather than what it is not

All of us develop schemas or stereotypes which help us categorise and process complex information quickly. Particular features of questions trigger certain schemas and hence expectations. Anxiety can make us “close” on certain schemas too quickly and not look for others, so exams can end up measuring a propensity to anxiety rather than what we want to measure. In addition, since having well developed schemas is a mark of expertise, very good candidates may be expected to make much use of schemas. Question writers need to be aware of the existence of such schemas and ensure that they very clearly signal if they want candidates to step out of them. For example, a scenario about a male animal will likely trigger schemas to do with differential diagnoses of male animals. If you want candidates to discuss differential diagnoses of both male and female animals, you will need to recognise that the candidate may already have closed on the male animal schema, and make it very, very clear that you want them to move out of it. Or, perhaps better still, redesign the question to account for the likely use of this schema.

Other examples of schemas that might be elicited by questions include:
• All differential diagnoses for a clinical sign vs those only applicable in a particular case
• The expectation that questions should be hard, which may prevent candidates from seeing the easy solution to a question
• The expectation that Qs will ask about what something is rather than what it is not

Question writers can reduce the negative effects of expectations by:
• using clear language
• including only relevant and authentic scenarios
• being clear about the kind of answer and level of answer required
• being aware of the kind of implicit expectations that come into play in reading comprehension processes
• using very, very clear signalling if questions contradict expectations (e.g. using bold font)


Contextualising Qs

• Context is good because it brings relevance and authenticity

• Allows assessment of concrete or specific examples not abstract concepts or generalisations

• Allows assessment of applied learning (doing not just knowing)

• All these carry with them a potential for bias.

Context is good because it brings relevance and authenticity. It allows assessment of concrete or specific examples, not abstract concepts or generalisations, and it allows assessment of applied learning (doing, not just knowing). All these carry with them a potential for bias.

Relevance is personal. Concrete examples will be more familiar to some candidates than others. Application of knowledge may actually just be recall if candidates have considered that example or a similar one in their learning.

Context activates concepts in the mind and therefore may activate the wrong schemata (as we saw in the last slide).

Writing long answer questions


Q parts – the task

• For MCQs the task is given at the beginning of the examination or MCQ section:

Choose one BEST answer

• For long answer Qs you need to specify the task

Don’t write questions; write tasks

What is your diagnosis?

State the most likely diagnosis, or
State the most likely diagnosis and explain your reasoning, or
Discuss the differential diagnoses you would consider in this case, or
…

Tell candidates what you want them to do rather than asking them a question. They need to know whether you intend them to write a one-word answer, or to explain or justify it, and so on. If you do not make this clear they will give you a “just in case” answer, and their answer may seem to you to be unfocused or off topic. In addition, they will waste time on this instead of concentrating on other questions in the paper.


Instructional verb examples

Compare: to find similarities between things, or to look for characteristics and features that resemble each other.

Contrast: to find differences or to distinguish between things.

Discuss: to present a detailed argument or account of the subject matter, including all the main points, essential details, and pros and cons of the problem, to show your complete understanding of the subject.

Define: to provide a concise explanation of the meaning of a word or phrase; or to describe the essential qualities of something.

Explain: to clarify, interpret, give reasons for differences of opinions or results, or analyse causes.

Illustrate: to use a picture, diagram or example to clarify a point.

You need to provide an instructional verb for all questions

Specify boundaries of the answer

Species: e.g. “in both dogs and cats…”

Quantities and amounts: e.g. “Provide 5 reasons why…”

With reference to: e.g. “With reference to the published research from ..”

You also need to specify the boundaries of the answer required.


List the clinical signs of hypothyroidism in dogs.

List the three most common owner-observed clinical signs of hypothyroidism in dogs and explain how thyroid hormone deficiency leads to each of these signs.

Q parts – the scope

An example of specifying the boundaries or scope

Examples of problems….

Name two (2) diagnostic tests you would run next to investigate the cause of this dog’s current illness.

This type of question might come after a scenario. While it seems like a perfectly reasonable question, think about the words “diagnostic tests”, what they really mean and what sort of schema they elicit. Potential problems for candidates include:
• The words “diagnostic tests” might only elicit schemas containing a narrow set of types of investigation, such as laboratory tests, and not include things like imaging or taking an animal’s temperature.
• The specification of 2 diagnostic tests is unclear because different candidates and examiners may interpret what is one test differently. For example, is a biochemical panel one test or 17? Is a PCV and TPP, commonly performed together, one test or two?

Even if a candidate recognises these issues, they may have trouble deciding what to do and waste time worrying about it, and anxiety may affect their performance, so we end up not measuring what we want to measure.


Examples of problems

Outline your approach to confirming the initial clinical diagnosis and a management and prevention plan for this problem. This discussion should include an outline on further observations taken about ….

Here the examiner’s instructions request an outline (a description of the main features, a sketch in general terms, or a summary), but the wording then suggests that a discussion should have been completed (an examination of the argument, a sifting of considerations for and against, a debate). Because “outline” appears first, the candidate may have already “closed” on outline and may not even notice the word “discussion”. (Did you notice it when you read the question?)

Examples of problems

…list in dot point form: the gross pathological features, the characteristic histopathological changes, and the clinical pathology changes. In your discussion, list one antemortem test/procedure that can be used to aid in the diagnosis …

Similarly, in this question the first instruction to list the answer is then contradicted by the suggestion that a discussion should actually have been completed.


Examples of problems

A veterinarian asks you for assistance in designing a protocol for the delivery of a vaccine for cats in their practice. What factors would you take into consideration in designing this protocol?

Here is an example where a technical term that is also used in everyday language, “delivery”, could elicit everyday schemas rather than technical schemas. For example, does the question refer to delivery of the vaccine from the manufacturer to the practice, or the process of injecting the cat, or the recommended intervals for administering vaccinations?

Examples of question problems

Are there any clinical features which can help you determine a patient’s prognosis?

This question appeared after a scenario, and its wording suggests that only a yes/no answer is required. Presumably the answer “yes” would be 100% correct. Candidates will usually supply more detail anyway, not because the task is clear, but because the schema they invoke suggests to them that more explanation is required. However, if a simple answer is all the examiner requires, the extra detail wastes the candidates’ time. It is therefore important to state the task very clearly.

Examples of question problems

Describe and discuss the following:
a) preparedness

Here is an example where a word can have different meanings for different groups of people. For example, in NZ, “preparedness” has an everyday meaning that is very different to what is likely required of candidates in a veterinary behaviour examination.

Examples of question problems

State what you believe is your most likely diagnosis.

Here the question asks candidates to say what they believe. There is no wrong answer, since the candidate’s belief is their belief whether it is true or not. What the examiner really wants to know is whether the candidate’s belief matches the examiner’s own belief about the answer, or is justifiable in some other manner.

Examples of question problems

Discuss commonly found tumours and tumour-like disorders associated with the oral cavity and dental tissues of the horse.

Here is an example of a question where the scope is potentially endless. For example, you could discuss the clinical signs, the histopathological diagnosis including special immunological tests, the treatment, or any number of other aspects. The scope needs to be clearly defined for candidates.

Examples of question problems

How would you localise the site of the lesion?

This question appeared after a neurological scenario was presented. The wording suggests that the required answer would describe the diagnostic methods you would use to arrive at the site of the lesion. However, this was the marking scheme provided by the question writer:

Spinal lesion between T3 and L3

Clearly the question does not actually ask for the answer the examiner wants. Note that it is impossible to detect this fault without also seeing the marking scheme, hence the need to always evaluate the question and the marking scheme together when checking the wording of questions.

Writing MCQs

Focus on a single important concept

• Test application of knowledge not recall
• Don’t test “trivial” knowledge
• Focus on real life problems
• Clinical vignettes are a good basis for a Q

Placing Qs in vignettes does not increase the difficulty for high-performing candidates but does increase the difficulty for low-performing candidates.

A 7-year-old mare has had intermittent signs of moderately severe colic for the past 48 hours. Heart rate is 56 beats/min. Hydration, acid-base balance, and electrolyte parameters are near normal. On rectal examination, the left dorsal and ventral colon feels distended and is felt coursing in a dorsocranial direction. The spleen is displaced caudomedially. Which of the following is the most likely diagnosis?

A. Cecocolic intussusception or cecal inversion
B. Displacement of the left colon over the nephrosplenic ligament
C. Ileocecal intussusception
D. Infarction of the large colon
E. Volvulus of the large colon and the cecum

An example of a clinical vignette that assesses the ability to diagnose (a deep question). The stem is relatively long and each option is relatively short; this is the overall structure to aim for.

Keep options short

Iris prolapse is a common sequel to penetrating corneal wounds or ruptured corneal ulcers. Which of the following steps is NOT appropriate for the treatment of iris prolapse?

A. primary closure of the corneal laceration with 8-0 vicryl and treatment with topical antibiotics to control infection.

B. placement of a nictitans flap and treatment with systemic antibiotics to control infection.

C. placement of a corneal graft with an overlying conjunctival pedicle graft and treatment with systemic antibiotics to control infection.

Avoid long, complex options like the ones in this question; try to put as much as possible into the stem and keep the options short.

Pose a clear task

Chyle is :-

A. The semifluid mass which empties from the stomach into the duodenum.

B. The lymph containing fat droplets found in the lacteals of the small intestine.

C. The contents of the gall bladder.

D. The rounded piece of chewed food which passes down the oesophagus when the animal swallows.

This is not a good question. You need to pose a clear task, and this question has none: if you covered up the options, you would not be able to predict the answer by reading the stem.

Make all distractors plausible and homogeneous

Which of the following statements regarding hepatic encephalopathy is true?

A. Patients typically present with asymmetrical neurological deficits

B. The most effective and appropriate anticonvulsant to use for a patient that is seizuring due to hepatic encephalopathy is phenobarbital

C. Abdominal radiographs of dogs with portosystemic shunts will often show an enlarged liver

D. Cats with portosystemic shunts often exhibit ptyalism as a clinical sign

E. An appropriate treatment for hepatic encephalopathy is intravenous neomycin

This question also has a problem: the correct and incorrect options are about different things. Some are about clinical signs, some about diagnostic investigation and some about treatment. Instead, aim to make all the options homogeneous: all about diagnosis, all about treatment, and so on. As mentioned before, in biological systems only trivial facts tend to be 100% true or 100% false. The concepts we really want to examine are more complex and subject to exceptions, so they are not always true. In this sort of question, therefore, the candidate is trying to work out which option is more true than the others on a scale of trueness. If the options are not all on the same scale of trueness because they are about different things, the question becomes irrelevantly difficult or even impossible to answer. See the next slide for an example that will help explain this concept.

Include only ONE best answer

Which of the following is true?

A. Bananas are green
B. Motorbikes are faster than cars
C. Boys are taller than girls

Here is an extreme example to illustrate the issue raised in the last slide. This question is impossible to answer because every option is true sometimes and false at other times. To try to answer it, you have to work out which option is the most true, but you are comparing completely different things. Is “motorbikes are faster than cars” more or less true than “boys are taller than girls”? Impossible to say.

Include only ONE best answer

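[Figure: a single scale of trueness running from “Least correct” to “Most correct”, with the options placed along it in the order D, E, A, C, B]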

This figure shows the concept diagrammatically. All the options in a multiple choice question should lie on a single scale of trueness. While we expect that in non-trivial questions the options will not all be 100% true or 100% false, there must be one option that is well separated on the trueness scale from the distractors (incorrect options).

Avoid “none of the above”

Which of the following is true regarding ligament injuries?

A. Ligament injuries are appropriately referred to as “strains” or “sprains”.

B. Surgical intervention is indicated for treatment of second-degree sprains with demonstrable instability.

C. The elastic nature of ligaments allows 30% elongation before permanent deformation.

D. Following surgical repair of ligaments, immobilization via ESF or external coaptation is contraindicated, as range of motion is critical to successful repair.

E. None of the above.

This question has several problems, one of which is the use of “none of the above” as an option. This option is problematic in questions where judgement is involved (which should be all questions in College exams!) and where the options are not absolutely true or false. Either remove this option altogether or replace it with a more specific option. For example, if the options are a list of possible drugs to prescribe, an option of “no drug should be given at this time” would be better than “none of the above.”

Avoid negative framing

Which of the following statements is false regarding arthrotomies?

A. When detachment of a ligament is necessary, this should be performed by osteotomy of the bony origin rather than transection of the ligament.

B. Complete closure of the synovium is necessary to prevent synovial fluid leakage into subcutaneous tissue.

C. Surgical removal of osteophytes is often followed by their relatively rapid regrowth, and has questionable value.

D. Monofilament absorbable suture material has a lower risk for long term infection than does braided nonabsorbable suture.

E. None of the above.

You should avoid questions framed in the negative sense (“Which of the following is false…”) completely. If you do include them, you must never include “none of the above” as an option. The question above illustrates why. If you choose option E, are you saying that the statement “none of the above are false” is true, or that it is false? If it is true, then A, B, C and D must all be true, so there is no false answer and the question cannot be answered. If it is false, then at least one of A, B, C and D must be false, but isn’t that a given in a question asking which statement is false? So these questions pose irrelevant difficulty: irrelevant because the difficulty has nothing to do with the learning outcomes we are trying to assess.

Avoid double options

2-year-old male neutered Border Collie is presented with the following history and neurologic signs: …….. Which one of the following neuroanatomic localizations and diagnoses is CORRECT?

A. Left C1-C5 myelopathy, intervertebral disk rupture
B. Right C6-T2 myelopathy, fibrocartilaginous embolism
C. Right C1-C5 myelopathy, intervertebral disk rupture
D. Left C1-C5 myelopathy, fibrocartilaginous embolism

This question has “double options”, in that each option contains two different types of fact. It is better to focus on one fact for each MCQ and split this type of Q into two questions.

3 options is enough

A horse suffering from an acute intestinal accident is MOST likely to have

A. primary respiratory acidosis
B. primary respiratory alkalosis
C. primary metabolic alkalosis
D. primary metabolic acidosis

Three options for MCQs have been shown to be sufficient, so there is no need to force a question to have 4 or 5 options if there is no natural list of 4 or 5 plausible options. Sometimes, however, you do need 4 options, as in the question above, where the four options form a complete set of pairs (respiratory or metabolic, acidosis or alkalosis). It is fine for MCQs within one exam to have different numbers of options. If you worry about candidates guessing in 3-option questions, remember that the probability of scoring exactly 70% (21 of 30) through random guessing on 30 three-option MCQ items is about 0.0000356, and the probability of scoring 70% or more is still only about 0.0000443.
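The guessing figure above is a binomial calculation: each blind guess on a three-option item succeeds with probability 1/3, independently of every other item. The Python sketch below is not part of the lecture, and the function names `p_exactly` and `p_at_least` are hypothetical. Assuming every item is answered by a uniform random guess, it uses exact fractions to compute both the chance of exactly 21 of 30 correct and the chance of 21 or more.

```python
from fractions import Fraction
from math import comb


def p_exactly(n_items: int, n_options: int, k: int) -> Fraction:
    """Probability of exactly k correct when each of n_items is a blind guess
    among n_options equally likely options (binomial probability mass)."""
    p = Fraction(1, n_options)  # chance of guessing a single item correctly
    return comb(n_items, k) * p**k * (1 - p) ** (n_items - k)


def p_at_least(n_items: int, n_options: int, k_min: int) -> Fraction:
    """Probability of k_min or more correct: the upper tail of the binomial."""
    return sum(p_exactly(n_items, n_options, k) for k in range(k_min, n_items + 1))


if __name__ == "__main__":
    n_items, target_correct = 30, 21  # 21 of 30 items = 70%
    print(f"exactly 70%:   {float(p_exactly(n_items, 3, target_correct)):.7f}")
    print(f"70% or better: {float(p_at_least(n_items, 3, target_correct)):.7f}")
```

For three options this prints about 0.0000356 for exactly 21 correct and about 0.0000443 for 21 or more. Changing `n_options` shows how the risk from guessing shifts for 4- or 5-option items.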

Avoid technical item flaws

• Grammatical cues
• Logical cues: a subset of the options are collectively exhaustive
• Absolute terms: terms such as “always”, “never” or “all” used in options
• Vague terms: terms such as “usually” and “frequently” used in options
• Long correct answer: the correct answer is longer, more specific, or more complete than the other options
• Word repeats: a word or phrase is included in both the stem and the correct answer
• Convergence: the correct answer includes the most elements in common with the other options
• Numeric data not stated consistently
• Language in the options is not parallel; options are in an illogical order
• Stems (lead-ins) are tricky or unnecessarily complicated

As well as the things we have just gone through, there is a whole range of technical item flaws listed in guides for MCQ writing. These flaws can allow clued-in candidates to see the right answer because of faults in the question structure rather than because of their knowledge. I am not going to go through them all in detail, because advice about them is widely available elsewhere and these flaws are easy to avoid.
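Some of these surface cues are simple enough to screen for mechanically before an item goes to human review. The Python sketch below is not part of the lecture and is not an established tool: `screen_item`, its word lists and the option format are all hypothetical. It checks only four of the flaws listed above (absolute terms, vague terms, a long correct answer, and word repeats between the stem and the key). The deeper problems discussed earlier, such as an unclear task or options that do not lie on one scale of trueness, still need an examiner’s judgement.

```python
import re

# Hypothetical, deliberately short word lists; a real checklist would be longer.
ABSOLUTE_TERMS = {"always", "never", "all"}
VAGUE_TERMS = {"usually", "frequently"}
# Common function words ignored when looking for stem/key word repeats.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "is", "are", "to", "in",
             "for", "with", "which", "following", "most", "likely"}


def words(text: str) -> set[str]:
    """Return the set of lower-case alphabetic words in `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def screen_item(stem: str, options: dict[str, str], key: str) -> list[str]:
    """Flag a few mechanically detectable technical item flaws in one MCQ.

    `options` maps option labels (e.g. "A") to option text; `key` is the
    label of the correct answer.
    """
    warnings = []
    for label, text in options.items():
        absolute = words(text) & ABSOLUTE_TERMS
        vague = words(text) & VAGUE_TERMS
        if absolute:
            warnings.append(f"{label}: absolute term(s) {sorted(absolute)}")
        if vague:
            warnings.append(f"{label}: vague term(s) {sorted(vague)}")

    distractors = [text for label, text in options.items() if label != key]

    # Long correct answer: the key is longer than every distractor.
    if all(len(options[key]) > len(text) for text in distractors):
        warnings.append(f"{key}: correct answer is the longest option")

    # Word repeats: a content word from the stem appears in the key
    # but in none of the distractors.
    distractor_words = set().union(*(words(text) for text in distractors))
    repeats = (words(stem) & words(options[key])) - STOPWORDS - distractor_words
    if repeats:
        warnings.append(f"{key}: stem word(s) repeated only in the key {sorted(repeats)}")

    return warnings


if __name__ == "__main__":
    # A made-up item with deliberate flaws, for illustration only.
    stem = "Which treatment is used for seizures?"
    options = {
        "A": "Drug X is always effective",
        "B": "Drug Y",
        "C": "Drug Z given intravenously for seizures",
    }
    for warning in screen_item(stem, options, key="C"):
        print(warning)
```

On the made-up item, the sketch flags the absolute term in A, C as the longest option, and “seizures” appearing in the stem and only in the key. A clean result only means none of these four cues was found; it says nothing about whether the item measures what we want to measure.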

Key points

• It’s important to think about what you are looking for evidence of when designing Qs.

• Check that the Qs are going to be collecting evidence of that, and not something else.

• Concentrate on designing Qs that test application rather than fact recall.

“Effective item writers are trained, not born … “

Downing and Haladyna 2006, Handbook of Test Development, p. 11

So I hope you all learned something today that will help you be more effective question writers.