
Soft Comput
DOI 10.1007/s00500-013-1141-4

METHODOLOGIES AND APPLICATION

Automatic generation of multiple choice questions using dependency-based semantic relations

Naveed Afzal · Ruslan Mitkov

© Springer-Verlag Berlin Heidelberg 2013

Abstract In this paper, we present an unsupervised dependency-based approach to extract semantic relations to be applied in the context of automatic generation of multiple choice questions (MCQs). MCQs, also known as multiple choice tests, provide a popular solution for large-scale assessments as they make it much easier for test-takers to take tests and for examiners to interpret their results. Manual generation of MCQs is a very expensive and time-consuming task, and yet they often need to be produced on a large scale and within short iterative cycles. We approach the problem of automated MCQ generation with the help of unsupervised relation extraction, a technique used in a number of related natural language processing problems. The goal of unsupervised relation extraction is to identify the most important named entities and terminology in a document and then recognise semantic relations between them, without any prior knowledge as to the semantic types of the relations or their specific linguistic realisation. We use these techniques to process instructional texts and identify those facts (terminology, entities, and semantic relations between them) that are likely to be important for assessing test-takers' familiarity with the instructional material. We investigate an approach to learn semantic relations between named entities by employing a dependency tree model. Our findings show that an optimised configuration of our MCQ generation system is capable of attaining high precision rates, which are much more important than recall in the automatic generation of MCQs. We also carried out a user-centric evaluation of the system, where subject domain experts evaluated automatically generated MCQ items in terms of readability, usefulness of semantic relations, relevance, acceptability of questions and distractors, and overall MCQ usability. The results of this evaluation make it possible for us to draw conclusions about the utility of the approach in practical e-Learning applications.

Communicated by V. Loia.

N. Afzal (B)
Faculty of Computing and Information Technology (FCIT), King Abdulaziz University, North Branch Jeddah, Jeddah, Saudi Arabia
e-mail: [email protected]

R. Mitkov
Research Institute for Information and Language Processing (RIILP), University of Wolverhampton, Wolverhampton, UK


Keywords E-Learning · Automatic assessment · Natural language processing · Information extraction · Dependency tree · Unsupervised relation extraction · Multiple choice questions generation · Biomedical domain

1 Introduction

In the modern era of information technology many organisations and institutions offer diverse forms of training to their employees or learners, and most of these training options utilise e-Learning. In the last two decades, e-Learning has seen exponential growth mainly due to the progress of the internet, which has made online materials accessible to more people than ever, allowing many corporations, educational institutes, governments and other organisations to profit. E-learning has also been referred to by different terms such as online learning, web-based training and computer-based training. The global market for e-Learning is growing at a rapid rate as many business organisations and educational institutes are seeking to deliver their learning in a smarter and more cost-effective way. E-learning is quite adaptive and has broad potential. E-learning products have a huge market world-wide: the UK e-learning market alone was


estimated at between £500m and £700m in 2009.¹ The future of e-Learning depends on the development of IT technologies. In this paper, we present an alternative to the lengthy and time-consuming activity of developing multiple choice tests manually by proposing a natural language processing (NLP)-based approach that relies on semantic relations extracted using information extraction to automatically generate multiple choice tests.

Multiple choice questions (MCQs) are a popular form of objective assessment in which a user selects one answer from a set of alternative choices for a given question. MCQs are straightforward to conduct, instantly offer an effective measure of the test-takers' performance and feed test results back to the learner. The emergence of e-Learning has created even higher demand for multiple choice tests (MCTs), as they are one of the most effective ways for an e-learner to get feedback. The fast development of e-Learning technologies has stimulated research into methods for automatic generation of MCQs, in particular in the NLP field. Still, the work done in the area of automatic generation of MCQs does not have a long history (see Sect. 2 for more details). Most of the NLP approaches to this task take an instructional text as a starting point and identify important topical expressions in it using either manually written rules or statistical techniques. Assuming such expressions refer to important facts conveyed by the texts, questions about these expressions are generated by the syntactic transformation of statements containing these expressions. After that, several other semantically or linguistically similar phrases are selected as distractor answers to produce a complete MCQ item.

In this paper, we present a new approach to automatic MCQ generation, where we first identify important concepts and the relationships between them in the input texts using a dependency tree model. In order to achieve this, we study unsupervised information extraction methods with the purpose of discovering the most significant concepts and relations in the domain text, without any prior knowledge of their types or their exemplar instances (seeds). Information extraction (IE) is a vital problem in many information access applications. The aim is to identify instances of specific semantic relations between named entities of interest in the text. Named entities (NEs) are generally noun phrases in the unstructured text, e.g. names of persons, posts, locations and organisations, while relationships between two or more NEs are described in a pre-defined way, e.g. "interact with" is a relationship between two biological objects (proteins). We will employ this approach for the automatic MCQ generation task, where it will be used to find relations and NEs in educational texts that are essential for testing students' familiarity with key facts contained in the texts. In order to

¹ http://www.e-learningcentre.co.uk/Reviews_and_resources/Market_Size_Reports_/The_UK_e_learning_market_2009.

achieve this, we need an IE method that has high precision and at the same time works with unrestricted semantic types of relations (i.e. without reliance on seeds), while recall is of secondary importance to precision.

The specific challenge of the problem we address is that the relevant types of semantic relations cannot be known in advance: we would like to generate MCQs that cover a potentially unrestricted range of semantic relations between notions contained in instructional text. To achieve this, we investigate unsupervised IE methods, which aim to recognise entities and semantic relations between them without any manually encoded prior knowledge as to their types or their annotated exemplar instances.

The use of unsupervised IE for MCQ generation offers a number of important advantages. First, because the approach finds significant semantic relations between concepts, rather than just individual concepts, it is able to capture a wider range of important facts contained in instructional texts, do so with greater accuracy, and eventually achieve greater quality of MCQs. Second, in contrast to approaches that make use of manually encoded extraction rules, seed patterns or annotated examples, our approach has potentially unrestricted coverage, as it does not target any pre-defined types of semantic relations. Third, since unsupervised IE learns extraction patterns from unannotated text, our approach to MCQ generation is suitable in situations where manually annotated text is unavailable or very expensive to create, which is a common scenario in many e-Learning applications.

To validate this approach we employed two modes of evaluation. In the intrinsic evaluation we examined the ability of the method to extract the most relevant semantic relations from text by comparing automatically extracted relations with a gold standard: manually annotated relations contained in a publicly available domain corpus. Having experimented with different configurations of the IE method, we were able to establish that the approach is capable of achieving high precision rates, albeit at the expense of recall, which is particularly suitable for automatic assessment within e-Learning settings: generating high-quality questions is usually more important than ensuring that all significant facts are covered by the test, as tests are commonly concerned only with a small subset of facts that test-takers are supposed to be familiar with. In the extrinsic evaluation, the overall MCQ generation system was evaluated in settings that simulated actual use of the system by an expert designing MCQ items. Domain experts were asked to judge the quality of the final MCQ items that our system generated in terms of such factors as readability, relevance, and overall usability of questions and distractors. The results of the extrinsic evaluation make it possible for us to draw conclusions about the practical utility of unsupervised IE methods for MCQ generation.


The main advantage of our approach is that it can cover a potentially unrestricted range of semantic relations, while most supervised and semi-supervised approaches can learn to extract only those relations that have been exemplified in annotated text or seed patterns. Moreover, our approach is suitable in situations where a lot of unannotated text is available, as it does not require manually annotated text or seeds. These properties of the method can be useful, specifically, in such applications as MCQ generation or a pre-emptive approach in which viable IE patterns are created in advance without human intervention.

2 Related work

Even though NLP has made significant progress in recent years, NLP methods, and the area of automatic generation of MCT items in particular, have started being used in e-Learning applications only very recently.

The most significant study in this area was published by Mitkov et al. (2003, 2006), who presented a computer-aided system for the automatic generation of MCQ items. Their system mainly consists of three phases: term extraction, stem generation and distractor selection. In the term extraction phase, the source text is parsed and the parser labels each word with its part-of-speech and syntactic category. After part-of-speech identification, nouns are sorted by their frequencies. The system uses certain rules and frequency thresholds, and if any noun exceeds its threshold then that noun is regarded as a key term. In the stem generation phase, stems are generated from the eligible clauses of sentences from the source text. A clause is considered eligible if it is finite and has SVO (Subject-Verb-Object) or SV (Subject-Verb) structure. The system makes use of several rules in order to generate a stem. In order to produce plausible distractors, the system retrieves hypernyms and coordinates of key terms from WordNet. The system was applied to a linguistics textbook in order to generate MCT items, and 57 % of the automatically generated MCT items were judged worthy of keeping as test items, of which 94 % required some level of post-editing. The main advantage of this approach is that it offered a completely new alternative to the time-consuming and laborious activity of manually constructing MCT items, which is at present the most extensively used method for evaluating students' knowledge. The main disadvantage of this system is its reliance on the syntactic structure of sentences, as it produces questions only from sentences which have SVO or SV structure. Moreover, the identification of key terms in a sentence is also an issue, as identification of irrelevant concepts (key terms) results in unusable stem generation.

Sumita et al. (2005) presented a system which automatically generates questions in order to measure test-takers' proficiency in English. Their method generates Fill-in-the-Blank Questions (FBQs) using a corpus, a thesaurus and the Web. The FBQs are created by replacing verbs with gaps in an input sentence. Possible distractors are retrieved from a thesaurus, and new sentences are created by replacing each gap in the input sentence with a distractor. They conducted their experiments on non-native speakers of English and found that their method is quite effective in measuring the English proficiency of non-native speakers. The major shortcoming of this approach is that the selection of wrong input sentences results in FBQs which even native speakers are unable to answer. Moreover, the quality of the generated FBQs was evaluated by a single native speaker of English and needs to be evaluated further.

Brown et al. (2005) used an approach that tests the knowledge of students by automatically generating test items for vocabulary assessment. Their system produced six different types of questions for vocabulary assessment by making use of a lexical database, WordNet: definition, synonym, antonym, hypernym, hyponym and cloze questions. In order to produce the definition question, the system made use of the WordNet glosses to select the first definition which did not include the target word. A synonym question requires the matching of a target word to its synonym, which is extracted from WordNet. An antonym question requires a word to be matched to its antonym, also obtained from WordNet, while hypernym and hyponym questions require the matching of a word to its hypernym and hyponym respectively. The cloze question requires the use of a target word in a specific context; in order to produce cloze questions the system again made use of the WordNet glosses. The experimental results suggested that questions produced using this approach provide an efficient way to automatically assess word knowledge. The approach relies heavily on WordNet and is unable to produce any questions for words which are not present in WordNet.

Chen et al. (2006) presented an approach for the semi-automatic generation of grammar test items by employing NLP techniques. Their approach was based on manually designed patterns which were used to find authentic sentences on the Web that were then transformed into grammatical test items. Distractors were also obtained from the Web by applying some modifications to the manually designed patterns, e.g. changing part of speech, adding, deleting, replacing or reordering words. The experimental results of this approach revealed that 77 % of the generated MCQs were regarded as worthy (i.e. could be used directly or needed only minor revision). The disadvantage of this approach is that it requires a substantial amount of effort and knowledge to


manually design the patterns which can later be employed by the system to generate grammatical test items.

A semi-automatic system to assist teachers in producing cloze tests based on online news articles was presented by Hoshino and Nakagawa (2007). In cloze tests, questions are generated by removing one or more words from a passage and the test-takers have to fill in the missing words. According to this paper, one of the reasons for selecting newspaper articles is that they are usually grammatically correct and suitable for English education. The system focuses on multiple-choice fill-in-the-blank tests and generates two types of distractors: vocabulary distractors and grammar distractors. For vocabulary distractors the system employs a frequency-based method, while for grammar distractors it makes use of ten grammar targets based on the research of Tateno et al. (2005). The system mainly consists of two components: a pre-processing component and a graphical user interface (GUI). The input documents are first pre-processed and then go through various sub-processes: text extraction, sentence splitting, tagging and lemmatisation, synonym lookup, frequency annotation, inflection generation, grammar target mark-up, grammar distractor generation and selection of vocabulary distractors. The GUI allows the user to interact with the system. User evaluation revealed that 80 % of the generated items were deemed to be suitable.

A system for automatic generation of MCT items which makes use of domain ontologies was presented by Papasalouros et al. (2008). Ontologies contain the domain knowledge of important concepts and relationships among these concepts, including knowledge which can be inferred, i.e. facts which are not explicitly defined. In order to generate MCTs, this paper utilised three different strategies: class-based strategies (based on hierarchies), property-based strategies (based on roles between individuals) and terminology-based strategies. The MCTs generated by this approach were evaluated in terms of quality and syntactic correctness, and a number of questions were produced for different domain-specific ontologies. The experimental results revealed that not all questions produced are syntactically correct, and in order to overcome this problem more sophisticated natural language generation (NLG) techniques are required. Moreover, property-based strategies produced a greater number of questions than class-based and terminology-based strategies, but the questions produced by the property-based strategies are difficult to manipulate syntactically. Soft computing techniques can further enhance the classical version of ontology and can prove more beneficial in practical decision making (see De Maio et al. 2009; Carlsson et al. 2012 for further details).

Most of the previous approaches to automatically generating MCTs have been used for vocabulary and grammatical assessments of English. Most of these approaches generate questions by replacing some words from the input text

and mostly rely on syntactic transformations (e.g. Mitkov et al. 2003, 2006), generating questions by transforming declarative sentences into questions. The main drawback of these approaches is that the generated MCTs are mostly based on recalling facts, so the major challenge is to automatically generate MCTs which will allow the examiner/instructor to evaluate test-takers not only on superficial memorisation of facts but also on higher levels of cognition.

3 Our approach

This paper addresses this problem by extracting semantic rather than surface-level or syntactic relations between key concepts in a text via IE methodologies and then generating questions from such semantic relations. The research in this paper is mainly focused on automatic generation of MCQs in the biomedical domain, but the presented approach is quite flexible and can easily be adapted to generate MCQs for other domains as well. Many NLP technologies which deliver promising results in the newswire or business domain do not yield good results in the biomedical domain due to its inherent complexity (Cohen and Hersh 2005). Moreover, there is a lot of interest in techniques which can identify, extract, manage, integrate and discover new hidden knowledge from the biomedical domain. Karamanis et al. (2006) conducted a pilot study using the Mitkov et al. (2006) system in a medical domain, and their results revealed that some questions were simply too vague or too basic to be employed as MCQs in a medical domain. They concluded that further research is needed regarding question quality and usability criteria.

There is a large body of research devoted to the problem of extracting relations from texts of diverse domains. Most previous work focused on supervised methods and tried both to extract relations and to assign labels describing their semantic types. As a rule, these approaches required a manually annotated corpus, which is very lengthy and time-consuming to produce.

Semi-supervised and unsupervised approaches rely on seed patterns and/or examples of specific types of relations (Agichtein and Gravano 2000; Stevenson and Greenwood 2005, 2009). An unsupervised approach based on clustering of candidate patterns for the discovery of the most important relation types among NEs in the newspaper domain was presented by Hasegawa et al. (2004). In the biomedical domain, most approaches were supervised and relied on regular expressions to learn patterns (Corney et al. 2004), while semi-supervised approaches exploited pre-defined seed patterns and cue words (Huang et al. 2004; Martin et al. 2004).

All of the aforementioned approaches mostly rely on pattern matching and require a large number of patterns in order to extract the desired information.


[Fig. 1 System architecture: an unannotated corpus is processed by named entity recognition; candidate patterns expressing semantic relations are extracted, ranked and evaluated; rule-based question generation and distributional-similarity-based distractor generation then produce the output MCQs]

Overall, there has been little work on fully unsupervised approaches to relation extraction, ones that would be able to locate significant relations in a particular collection of texts. Semi-supervised approaches, while offering substantial savings on the preparation of training data, are still restricted to pre-defined types of relations that have to be instantiated in either seed extraction patterns, seed pairs of related named entities, or annotated examples. Relation extraction in the biomedical domain has been addressed primarily with either supervised approaches or those based on manually written extraction rules, which are rather inadequate in scenarios where relation types of interest are not known in advance.

Our assumption for relation extraction is that relations hold between NEs stated in the same sentence and that the presence or absence of a relation is independent of the text preceding or following the sentence. According to the system architecture shown in Fig. 1, our system consists of three main components: IE, question generation and distractor generation. In the IE component, unannotated text is first processed by NER and then candidate patterns are extracted from the text. The candidate patterns are ranked according to their domain relevance, and we then intrinsically evaluate them in terms of precision, recall and F-score. In the automatic question generation component, the extracted semantic relations are automatically transformed into questions by traversing the dependency tree of a sentence, while in automatic distractor generation, distractors are generated using a distributional similarity measure.

3.1 Information extraction component

The IE component of our system relies on an adapted version of the linked chain pattern model for the unsupervised extraction of semantic patterns. In the IE component, we treat every NE as a chain in a dependency tree if it is less than 5 dependencies away from the verb root and the words linking the NE to the verb root are content words (nouns, verbs, adverbs and adjectives) or prepositions. We consider only those chains in the dependency tree of a sentence which contain NEs, as this allows us to extract more meaningful patterns from the dependency tree. The extracted semantic patterns are then ranked based on their significance in the domain corpus. In order to score semantic patterns for domain relevance, we measure the strength of association of a semantic pattern with the domain corpus as opposed to a general corpus. We used various information-theoretic concepts as well as statistical tests of association for semantic pattern ranking. The IE component of the system is discussed in detail in Afzal et al. (2011).
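The following is a minimal sketch of this chain-extraction idea (our illustration, not the authors' released implementation). It assumes a dependency parse is available as a list of Token records from some parser; the Token fields, the CONTENT_TAGS set and the depth limit of 5 mirror the description above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Content words (noun, verb, adverb, adjective) plus prepositions, as above.
CONTENT_TAGS = {"NOUN", "VERB", "ADV", "ADJ", "ADP"}

@dataclass
class Token:
    idx: int      # position in the sentence
    word: str
    pos: str      # coarse part-of-speech tag
    head: int     # index of the governing token (-1 for the root)
    is_ne: bool   # True if the token heads a named entity

def chain_to_root(tokens: List[Token], start: int) -> Optional[List[Token]]:
    """Follow head links from an NE towards the verb root; keep the chain
    only if it is at most 5 dependencies long and every linking word is a
    content word or preposition."""
    chain, cur = [], start
    for _ in range(6):                      # NE must be < 6 links from the root
        tok = tokens[cur]
        chain.append(tok)
        if tok.head == -1:                  # reached the root
            return chain if tok.pos == "VERB" else None
        parent = tokens[tok.head]
        if parent.head != -1 and parent.pos not in CONTENT_TAGS:
            return None                     # non-content link: discard the chain
        cur = tok.head
    return None                             # NE too far from the verb root

def candidate_patterns(tokens: List[Token]) -> List[List[Token]]:
    """Candidate patterns are exactly the chains that start at an NE."""
    chains = (chain_to_root(tokens, t.idx) for t in tokens if t.is_ne)
    return [c for c in chains if c]
```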

3.2 Question generation

Automatic question generation is an important and emerging area of research in NLP. It has the potential to be employed in various areas such as intelligent tutoring systems, dialogue systems (Walker et al. 2001) and educational technologies (Graesser et al. 2005). It is well known that generating/asking good questions is a complicated task (Graesser and Person 1994). Vanderwende (2007, 2008) emphasised the need for generating important questions from a given text.

Ruminator (Ureel et al. 2005) is a computer system which generates questions from simplified input sentences, but it relies heavily on such simplified input and produces quite a large number of obvious or easy questions. As a result, the quality of the generated questions is not particularly good and, moreover, the generated questions are not informative enough. Another question generation system, presented by Schwartz et al. (2004), generates questions in order to help the learning process. This system depends on


summarisation as a pre-processing step for the identification of important questions in a given text. The authors noted that question selections created by the system can be difficult to process.

Gates (2008) presented an approach that can automatically generate fact-based reading comprehension questions by using a look-back strategy, i.e. re-reading the text to find the answer to a given question. The system makes use of several existing NLP resources: BBN's IdentiFinder (Bikel et al. 1998) for recognising named entities, and specific PropBank (Palmer et al. 2005) semantic arguments (e.g. ARG0, ARG1) identified using ASSERT (Pradhan et al. 2005). The system uses the CBC4Kids corpus (news texts for children) and produces a reading passage along with 5 randomly selected questions and clickable answers in the text. The system measures the accuracy of reading comprehension questions in terms of grammaticality, semantic correctness and practicality of the questions produced from the text, and was able to generate 81 % acceptable questions from reading comprehensions. The drawback of this system is that most of the questions are quite obvious and too easy to answer.

Chen et al. (2009) presented an approach to generate self-questioning instructions automatically from any given informational text, especially focusing on children's text (children in grades 1-3). Previous work (Mostow and Chen 2009) automatically generated self-questioning instructions from narrative text by first generating questions from the text and then augmenting the questions into strategy questions. Narrative text focuses on characters, their behaviours and their mental states (e.g. happy, sad, think, regret), while informational text places emphasis on descriptions and rationalisations of certain objective phenomena. Due to the different nature of narrative and informational text, the same approach cannot be applied to both. Informational text does not contain many mental states, so the system has to make use of discourse markers which indicate causal relationships (conditional and temporal contexts such as if, after), modality (i.e. possibility and necessity) and inference rules to generate questions from informational text. The generated questions were evaluated in terms of their grammatical correctness and how well they made sense in the context of the text. From a total of 444 sentences in the test corpus, the system generated 180 questions: 15 questions about conditional contexts (86.7 % acceptable), 88 questions about temporal information (65.9 % acceptable) and 77 questions about modality (87.0 % acceptable).

Kalady et al. (2010) presented an approach to automatically generate questions based on syntactic and keyword modelling. Their approach mainly relied on parse tree manipulation, named entity recognition and Up-keys (significant phrases in a document) to automatically generate factoid and definitional questions from input documents. The factoid questions are generated from a single sentence and are very simple (e.g. yes/no questions and wh-questions about the subject, object, adverbials and prepositional phrases in the sentence). The process of generating definitional questions is quite different, as they have descriptive answers; here the authors used the concept of Up-keys, keywords relating to the input document (Das and Elikkottil 2010). The authors only evaluated the factoid questions, by preparing a gold standard of questions from a set of documents and comparing the automatically generated questions with them. They reported the results in terms of precision, recall and F-score; their system achieved a precision of 0.46, recall of 0.68 and F-score of 0.55. The main drawback of this approach is its inability to handle lengthy and complex sentences, as well as the fact that the automatically generated questions are very simple and easy to answer.

It remains a great challenge in the field of NLP to decide which part of a given text is important, as the identification of key concepts present in a text is a critical subtask of automatic question generation (Nielsen 2008). Moreover, it is also important for the automatically generated questions to be semantically well-formed. Our research enables us to generate questions about the important concepts present in a domain. This is done by relying on the unsupervised relation extraction approach, as the extracted semantic relations allow us to identify key information in a sentence. In this section, we describe the way we automatically transform those extracted semantic relations (patterns) into questions. The questions automatically generated by our approach are more accurate, as they are generated from important concepts present in the given domain by relying on the semantic relations. Our approach to automatic question generation depends upon accurate output of the NE tagger and the parser.

In order to automatically generate questions from dependency-based patterns, we first assume that the user has supplied a set of documents on which students will be tested. We will refer to this set of documents as the "evaluation corpus" (in this research, we used a small subset of the GENIA Event Annotation corpus² as the evaluation corpus). As we found in Afzal et al. (2011) that NMI and CHI-score are the best-performing ranking methods, we select semantic patterns attaining higher precision/higher F-score at certain score thresholds using the score-thresholding method. We match a learned relevance-ranked dependency-based pattern (from the GENIA corpus) against the dependency-based patterns of the evaluation corpus, and the corresponding sentence is

² http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=Event+Annotation.


then extracted from the evaluation corpus. The extracted sentence is then automatically transformed into a question.
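A rough sketch of this selection-and-matching step follows (our illustration with hypothetical data structures: `ranked` maps a normalised pattern key to its domain-relevance score, and `eval_index` maps pattern keys of the evaluation corpus to the sentences they were extracted from; the 0.01 cut-off is the NMI threshold quoted in Sect. 4):

```python
def sentences_for_questions(ranked: dict, eval_index: dict, threshold: float = 0.01):
    """Yield (pattern, sentence) pairs from the evaluation corpus whose
    learned pattern scores above the chosen score threshold."""
    selected = {p for p, score in ranked.items() if score > threshold}
    for pattern, sentences in eval_index.items():
        if pattern in selected:
            for sentence in sentences:
                yield pattern, sentence
```

The automatic question generation process can be explained by the following example.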

Consider the following pattern expressing a semantic relation between two types of proteins:

<NE ID="0" func="SUBJ" Dep="3">PROTEIN</NE>
<W ID="1" func="PREP" Dep="0">of</W>
<NE ID="2" func="P" Dep="1">PROTEIN</NE>
<W ID="3" func="+FMAINV" Dep="none">contain</W>

This pattern is matched with the following sentence, which contains its instantiation:

The predicted periplasmic domain of the PhoQ protein contained a markedly anionic domain that could interact with cationic proteins and that could be responsible for resistance to defensin.

As our dependency-based patterns always include a main verb, in order to automatically generate questions we traverse the whole dependency tree of the extracted sentence and extract all of the words which depend on the main verb present in the dependency pattern, as shown in Figs. 2 and 3.

So from Fig. 2, we extracted the following part of the sentence, based on the presence of the main verb from the dependency pattern:

The predicted periplasmic domain of the PhoQ protein contained a markedly anionic domain.

This part of the sentence is then transformed into the following question by removing unnecessary information:

Which protein of the PhoQ protein contained a markedlyanionic domain?
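A minimal sketch of this transformation is given below (our illustration, reusing the hypothetical Token records from Sect. 3.1; `verb_idx` is the main verb from the matched pattern and `subject_span` holds the token indices of the subject NE, both assumed to be supplied by the pattern matcher; the pruning of unnecessary material such as relative clauses is omitted for brevity):

```python
def governed_by(tokens, verb_idx):
    """Indices of the main verb and everything transitively depending on it."""
    keep, changed = {verb_idx}, True
    while changed:                                 # fixed point over head links
        changed = False
        for t in tokens:
            if t.head in keep and t.idx not in keep:
                keep.add(t.idx)
                changed = True
    return keep

def make_question(tokens, verb_idx, subject_span, ne_class="protein"):
    """Keep the clause governed by the main verb and replace the subject
    NE with a question word derived from its semantic class."""
    words = []
    for i in sorted(governed_by(tokens, verb_idx)):
        if i == subject_span[0]:
            words.append(f"Which {ne_class}")      # question word replaces the NE
        elif i in subject_span:
            continue                               # drop the rest of the NE span
        else:
            words.append(tokens[i].word)
    return " ".join(words).rstrip(".") + "?"
```

Applied to the example above, replacing the subject NE "The predicted periplasmic domain" with "Which protein" yields the question shown.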

3.3 Distractor generation

Distractors play a vital role in an MCQ, as good-quality distractors ensure credible development of the learners' knowledge. The automatic generation of plausible distractors is a very important task in the automatic generation of MCQs. During the process of automatic distractor generation, the purpose is to find words which are semantically similar to the correct answer but incorrect in the given context.

In order to generate distractors, our approach relies on a distributional similarity measure. Distributional similarity is based on the distributional hypothesis, which states that words occurring in similar contexts tend to have similar meanings (Harris 1954; Firth 1957; Harshman 1970). Distributional similarity is a useful measure and is used in many NLP applications such as language modelling, information retrieval, automatic thesaurus generation (e.g. Grefenstette 1994; Hatzivassiloglou 1996; Lin 1998; Caraballo 1999) and word sense disambiguation. We prefer distributional similarity measures for automatically generating distractors over taxonomic similarity measures (such as those based on WordNet), as the latter require a detailed manually compiled ontology or a resource containing high-quality definitions of all possible terms.

Fig. 2 Automatic question generation process from dependency trees


Fig. 3 Screenshot of extrinsic evaluation interface


Table 1 Examples of automatically generated distractors

Correct answer              Distractors
K562 cells                  M1 cells; Yin-Yang 1; Alpha-tubulin; NGF
STAT1                       JAK3; NF-kappa B; transcription factor; STAT3
CD40                        IL-2; IL-4; T lymphocytes; TCR
Monocytes                   IFN-gamma; IL-2; NF-kappa B; IL-4
LMP1                        HIV-1 Tat; T lymphocytes; NF-kappa B-mediated gene; Fas ligand
ETS transcription factors   Beta-promoter; Gammac basal promoter; Human alpha-globin promoter; Transgenic thymocytes

Another drawback of these taxonomic similarity measures is their limited coverage: they require all candidate NEs and terms found in the instructional material to be recorded in the ontology, which is itself a time-consuming and labour-intensive task. Once created, updating the ontology is again an expensive and time-consuming task. Moreover, for these manually built lexical resources, matching the measure to the resource is a research problem in itself, as highlighted by Weeds (2003).

In order to produce distractors from a corpus, we carry out linguistic processing using the GENIA tagger,³ which provides us with tokenised text along with part-of-speech (PoS) information. In order to handle the data sparseness issue, we build a pool of various biomedical corpora, including GENIA, GENIA EVENT, BioInfer,⁴ YPD (Hodges et al. 1999), Yapex,⁵ MIPS,⁶ the WEB⁷ corpus and the BioMed⁸ corpus, and generate distractors from these corpora. After the linguistic processing, we build a frequency matrix by scanning sequential semantic classes (NEs) along with a notional word (noun, verb, adverb or adjective) in the corpus and recording their frequencies in a database. In this way, we are able to construct distributional models of all candidate NEs found in the text. Once an accurate and informative contextual representation of each semantic class has been extracted along with its frequencies, semantic classes are compared using the distributional hypothesis that similar words appear in similar contexts. The distractors for a given correct answer are then automatically generated by measuring its similarity to all candidate named entities. At the end, we select the top 4 most similar candidate named entities as the distractors. We used the Jensen-Shannon divergence (Rao 1983; Lin 1991), also known as information radius, to measure the distributional similarity between two NEs.

³ http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/tagger/.
⁴ http://mars.cs.utu.fi/BioInfer/.
⁵ http://www.sics.se/humle/projects/prothalt/#data.
⁶ http://www.ncbi.nlm.nih.gov/pmc/articles/PMC146421/.
⁷ http://www.ncbi.nlm.nih.gov/.
⁸ http://www.biomedcentral.com/info/about/datamining/.

It is a popular distributional similarity measure based on a smoothed version of the Kullback-Leibler divergence (Kullback and Leibler 1951; Cover and Thomas 1991; Pereira et al. 1993) and has frequently been employed in word clustering and nearest-neighbour techniques (e.g. Dagan et al. 1999; Lapata et al. 2001; Dhillon et al. 2002). Dagan et al. (1997) performed a comparative study of various distributional similarity measures and found that Jensen-Shannon consistently performs better than the other measures.
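For reference, the standard textbook definition of the Jensen-Shannon divergence between the context distributions P and Q of two NEs is (our addition, with D_KL the Kullback-Leibler divergence):

```latex
\mathrm{JSD}(P \parallel Q) = \tfrac{1}{2}\, D_{\mathrm{KL}}(P \parallel M)
                            + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \parallel M),
\qquad M = \tfrac{1}{2}(P + Q),
\qquad D_{\mathrm{KL}}(P \parallel M) = \sum_{x} P(x) \log \frac{P(x)}{M(x)}
```

Averaging against the mixture M makes the measure symmetric and always finite, which is why it behaves better than the raw Kullback-Leibler divergence on sparse context counts.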

Table 1 shows some examples of correct answers and distractors automatically generated by our approach. As our aim is to automatically generate plausible distractors, if the correct answer is a protein then our approach automatically generates distractors that are proteins involved in similar processes or belonging to the same biological category.
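The whole distractor-generation step can be sketched as follows (our illustration under the stated assumptions: `contexts[ne]` is the frequency profile of notional words co-occurring with the named entity `ne`, built from the pooled corpora described above):

```python
import math
from collections import Counter

def _kl(p, q):
    # Kullback-Leibler divergence for aligned probability vectors.
    return sum(pi * math.log(pi / qi, 2) for pi, qi in zip(p, q) if pi > 0)

def jsd(c1: Counter, c2: Counter) -> float:
    """Jensen-Shannon divergence between two context-frequency profiles."""
    vocab = sorted(set(c1) | set(c2))
    n1, n2 = sum(c1.values()), sum(c2.values())
    p = [c1[w] / n1 for w in vocab]
    q = [c2[w] / n2 for w in vocab]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture is non-zero wherever p or q is
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)

def distractors(answer: str, contexts: dict, k: int = 4):
    """Return the k candidate NEs distributionally closest to the correct
    answer (smallest divergence), excluding the answer itself."""
    candidates = (ne for ne in contexts if ne != answer)
    return sorted(candidates, key=lambda ne: jsd(contexts[answer], contexts[ne]))[:k]
```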

4 Extrinsic evaluation

In the extrinsic/user-centred evaluation process the real application users have a crucial role to play, and the involvement of real users in the evaluation process may differ depending upon the nature of the application. According to Paroubek et al. (2007), user-centred evaluation is a paradigm in which the goal is the analysis of the utilisation of the NLP application and its various functionalities by users in their environment. User-centred evaluation is quite frequently employed by the information retrieval (IR), machine translation (MT), natural language generation (NLG) and automatic summarisation research communities (Hirschman and Mani 2003; Paroubek et al. 2007). We intrinsically evaluated the information extraction component of our MCQ system using automatic/gold-standard evaluation (Afzal et al. 2011). Here, we evaluate the MCQ system as a whole in a user-centred fashion. The quality of automatically generated MCQs is generally evaluated by human evaluators.


Table 2 Extrinsic evaluation results

             QR (1–3)  DR (1–3)  USR (1–3)  QRelv (1–3)  DRelv (1–3)  QA (0–5)  DA (0–5)  MCQ usability (1–4)
Evaluator 1  2.42      2.98      2.38       2.37         2.31         3.25      3.73      3.37
Evaluator 2  2.25      2.15      2.46       2.23         2.06         3.27      3.15      2.79
Average      2.34      2.57      2.42       2.30         2.19         3.26      3.44      3.08

Both evaluators were highly experienced: one evaluator's main area of research is the isolation, characterisation and growth of stem cells from Keloid and Dupuytren's disease, currently working in Plastics and Reconstructive Surgery Research, while the other biomedical expert is a bio-curator with a PhD in molecular biology, currently working for the HUGO Gene Nomenclature Committee (HGNC). The evaluation used in our approach is mainly concerned with the adequate and appropriate generation of MCQs as well as the amount of human intervention required. In other words, we want to evaluate our system in terms of its robustness and efficiency.

The extrinsic evaluation of the MCQ system follows similar criteria to those used by Farzindar and Lapalme (2004) for the evaluation of LetSum (an automatic legal text summariser). In LetSum, extrinsic evaluations were based on legal expert judgement. They defined a series of specific questions for the judgement, which cover the main topics of the document. If a user is able to answer the questions correctly by reading only the summary, it means that the summary contains all of the necessary information from the source judgement. Extrinsic evaluation can measure to what extent a specific NLP application benefits from employing a certain method or measure.

We generated 52 MCQs using a small subset of the GENIA Event Annotation corpus with an NMI score > 0.01. In order to evaluate the quality of the automatically generated MCQs, we used the following criteria:

Readability: automatically generated questions and distractors are evaluated by asking whether each is clear, rather clear or incomprehensible.

Usefulness of semantic relation: as questions are automatically generated by relying on semantic relations, it is important to evaluate the usefulness of the semantic relation present in a question by asking whether it is clear, rather clear or incomprehensible.

Relevance: automatically generated questions should be relevant to the extracted sentence from which they are generated; similarly, automatically generated distractors should be relevant to the automatically generated question and its answer. Both questions and distractors are evaluated in terms of relevance by asking whether each is very relevant, rather relevant or not relevant.

Acceptability: in order to evaluate the acceptability of automatically generated questions and distractors, the evaluators are asked to rate them on a scale of 0 to 5 (where 0 means unacceptable and 5 means totally acceptable).

Overall MCQ usability: at the end of the evaluation the evaluators are asked to evaluate the overall usability of each automatically generated MCQ by selecting one option from directly usable, needs minor revision, needs major revision or unusable.

In the extrinsic evaluation, two biomedical experts (both post-doctoral) were asked to evaluate the MCQs according to the aforementioned criteria. Both evaluators were asked to give a score for the readability of questions and distractors from 1 (incomprehensible) to 3 (clear), usefulness of semantic relation from 1 (incomprehensible) to 3 (clear), question and distractor relevance from 1 (not relevant) to 3 (very relevant), question and distractor acceptability from 0 (unacceptable) to 5 (acceptable) and overall MCQ usability from 1 (unusable) to 4 (directly usable).

Table 2 shows the results obtained for the dependency-based MCQ system, where QR, DR, USR, QRelv, DRelv, QA, DA and MCQ Usability represent Question Readability, Distractor Readability, Usefulness of Semantic Relation, Question Relevance, Distractor Relevance, Question Acceptability, Distractor Acceptability and Overall MCQ Usability respectively.

We used weighted kappa (Cohen 1968) to measure agreement across major sub-categories between which there is a meaningful difference. K = 1 when there is complete agreement among the evaluators, while K = 0 when there is no agreement. For example, for question readability we had three sub-categories: 'Clear', 'Rather Clear' and 'Incomprehensible'. In this case we may not care whether one evaluator judges the readability of a question as 'Clear' while the other chooses 'Rather Clear' for the same question; we might care if one evaluator chooses 'Clear' while the other chooses 'Incomprehensible'. In the weighted kappa, we assigned a score of 1 when both evaluators agree, while a score of 0.5 is assigned when one evaluator chooses 'Clear' and the other 'Rather Clear'. We used similar criteria for distractor readability, usefulness of semantic relation, question relevance and distractor relevance. For question and distractor acceptability, we assigned an agreement score of 1 when both evaluators agree completely, while a score of 0.5 was assigned when both evaluators choose question


and distractor acceptability between '0' and '2'. A score of 0.5 was also assigned when both evaluators choose question and distractor acceptability between '3' and '5'. For overall MCQ usability, we assigned a score of 1 when both evaluators agreed, and a score of 0.5 when one evaluator marked an MCQ as 'Directly Usable' while the other marked the same MCQ as 'Needs Minor Revision'. An agreement score of 0.5 was also assigned when an MCQ was marked by one evaluator as 'Needs Major Revision' and by the other as 'Unusable'. We were able to attain moderate agreement between the two evaluators.
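A compact sketch of this computation (our illustration; `a` and `b` hold the two evaluators' category indices per MCQ item, and `weight` encodes the partial-credit scheme above, e.g. 1 for exact agreement and 0.5 for adjacent categories):

```python
from collections import Counter

def weighted_kappa(a, b, weight):
    """Cohen's weighted kappa with an agreement-weight function."""
    n = len(a)
    cats = sorted(set(a) | set(b))
    observed = sum(weight(x, y) for x, y in zip(a, b)) / n
    fa, fb = Counter(a), Counter(b)
    expected = sum(weight(i, j) * fa[i] * fb[j]
                   for i in cats for j in cats) / (n * n)
    return (observed - expected) / (1 - expected)

# e.g. for the three readability categories coded 1..3:
readability_weight = lambda x, y: 1.0 if x == y else (0.5 if abs(x - y) == 1 else 0.0)
```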

5 Conclusions and future work

In this paper, we have presented an approach for automatic generation of MCQs based on unsupervised dependency-based semantic relations. Our approach consists of three main components: in the first component we used IE methodologies to extract semantic relations; in the second component we automatically generated questions using these semantic relations; and in the third component distractors were automatically generated using a distributional similarity measure.

In our previous work, we explored different information-theoretic and statistical measures to rank candidate semantic patterns by domain relevance, as well as meta-ranking (a method that combines multiple pattern-ranking methods). The domain ranking methods were used to select those patterns that capture the most important semantic relations between key notions discussed in domain text. The experimental results revealed that the CHI and NMI ranking methods obtained higher precision than the other ranking methods. We employed two techniques to select patterns, rank-thresholding and score-thresholding, and found that the score-thresholding method performs better.

These extracted semantic relations allowed us to automatically generate better-quality questions by focusing on the important concepts present in a given text. As dependency-based patterns always include a main verb, we traversed the whole dependency tree of the extracted sentence and extracted all words which depend on the main verb present in the dependency-based pattern in order to automatically generate questions.

The plausible distractors were automatically generated using a distributional similarity measure. Distributional similarity is known to adequately model the semantic similarity between lexical expressions and is used quite frequently in many NLP applications. Several distributional similarity measures exist, and previous studies suggest that information radius is one of the best-performing ones. Distributional similarity measures are corpus-driven and have broad coverage, compared with thesaurus-based methods, whose coverage is limited.

We extrinsically evaluated the whole MCQ system in terms of question and distractor readability, usefulness of semantic relation, relevance, acceptability of questions and distractors, and overall MCQ usability. Two domain experts evaluated the system according to the aforementioned criteria, and the results revealed that our approach is able to automatically generate good-quality MCQs; moreover, our approach is quite portable and can easily be extended to other domains.

To the best of our knowledge, nobody has so far used semantic relations based on information extraction methodologies in the context of automatic generation of multiple-choice questions, so no direct comparison with other approaches is possible. We were unable to carry out extrinsic evaluation of our system on a broader scale due to the unavailability of resources and time restrictions, but in future we are planning to carry out extrinsic evaluation using item response theory (Gronlund 1982), as conducted by Mitkov et al. (2006), and compare our results with their approach on the same dataset.

In the future, we would like to extend our approach to other domains. A further direction of research is to demonstrate its portability to other specialist domains and to study its dependence on the amount and quality of the corpora from which IE patterns are learned. The Web, the biggest corpus available to the research community, is quite frequently used in many NLP applications today, so it would be interesting to investigate the use of the Web as a source for automatic distractor generation. Wikipedia is another useful resource that could also be employed in automatic distractor generation.

References

Afzal N, Mitkov R, Farzindar A (2011) Unsupervised relation extraction using dependency trees for automatic generation of multiple-choice questions. In: Butz C, Lingras P (eds) Proceedings of the Canadian AI 2011, LNAI 6657. Springer, Heidelberg, pp 32–43

Agichtein E, Gravano L (2000) Snowball: extracting relations from large plain-text collections. In: Proceedings of the 5th ACM international conference on digital libraries

Bikel DM, Miller S, Schwartz R, Weischedel R (1998) Nymble: a high-performance learning name-finder. In: Proceedings of the conference on applied natural language processing

Brown J, Frishkoff G, Eskenazi M (2005) Automatic question generation for vocabulary assessment. In: Proceedings of HLT/EMNLP, Vancouver

Caraballo SA (1999) Automatic construction of a hypernym-labeled noun hierarchy from text. In: Proceedings of the 37th annual meeting of the association for computational linguistics, pp 120–126

Carlsson C, Brunelli M, Mezei J (2012) Decision making with a fuzzy ontology. Soft Comput 16(7):1143–1152

Chen C-Y, Liou H-C, Chang JS (2006) FAST—an automatic generation system for grammar tests. In: Proceedings of the COLING/ACL interactive presentation sessions, Sydney


Chen W, Aist G, Mostow J (2009) Generating questions automatically from informational text. In: Proceedings of the 2nd workshop on question generation, Brighton

Cohen AM, Hersh WR (2005) A survey of current work in biomedical text mining. Brief Bioinform 6(1):57–71

Cohen J (1968) Weighted kappa: nominal scale agreement with provision for scaled disagreement or partial credit. Psychol Bull

Corney DP, Jones D, Buxton B, Langdon W (2004) BioRAT: extracting biological information from full-length papers. Bioinformatics 20:3206–3213

Cover T, Thomas J (1991) Elements of information theory. Wiley, New York

Dagan I, Lee L, Pereira F (1997) Similarity-based methods for word sense disambiguation. In: Proceedings of the 35th annual meeting of the association for computational linguistics, Madrid, pp 56–63

Dagan I, Lee L, Pereira F (1999) Similarity-based models of word co-occurrence probabilities. Mach Learn J 34(1–3):43–69

Das R, Elikkottil A (2010) Auto-summarizer to aid a Q/A system. Int J Comput Appl 1(1):113–117

De Maio C, Fenza G, Loia V, Senatore S (2009) Towards an automatic fuzzy ontology generation. In: Proceedings of the IEEE international conference on fuzzy systems, pp 1044–1049

Dhillon IS, Mallela S, Kumar R (2002) Enhanced word clustering for hierarchical text classification. Tech. Rep. TR-02-17, Department of Computer Sciences, University of Texas, Austin

Farzindar A, Lapalme G (2004) LetSum, an automatic legal text summarizing system. In: Gordon TF (ed) Legal knowledge and information systems, Jurix 2004: the 7th annual conference. IOS Press, Berlin, pp 11–18

Firth JR (1957) A synopsis of linguistic theory 1930–1955. In: Studies in linguistic analysis. Blackwell, Oxford, pp 1–32

Gates D (2008) Generating look-back strategy questions from expository texts. In: Workshop on the question generation shared task and evaluation challenge, NSF, Arlington

Graesser A, Person N (1994) Question asking during tutoring. Am Educ Res J 31:104–137

Graesser AC, Chipman P, Haynes BC, Olney A (2005) AutoTutor: an intelligent tutoring system with mixed-initiative dialogue. IEEE Trans Educ 48(4):612–618

Grefenstette G (1994) Explorations in automatic thesaurus discovery. Kluwer international series in engineering and computer science, vol 278. Kluwer, Boston

Gronlund N (1982) Constructing achievement tests. Prentice Hall, New York

Harris Z (1954) Distributional structure. Word 10(23):146–162

Harshman R (1970) Foundations of the PARAFAC procedure: models and conditions for an "explanatory" multi-modal factor analysis. UCLA working papers in phonetics, vol 16

Hasegawa T, Sekine S, Grishman R (2004) Discovering relations among named entities from large corpora. In: Proceedings of ACL'04

Hatzivassiloglou V (1996) Do we need linguistics when we have statistics? A comparative analysis of the contributions of linguistic cues to a statistical word grouping system. In: Klavans J, Resnik P (eds) The balancing act: combining symbolic and statistical approaches to language, chapter 4. MIT Press, Cambridge, pp 67–94

Hirschman L, Mani I (2003) Evaluation. In: Mitkov R (ed) The Oxford handbook of computational linguistics. Oxford University Press, UK, pp 414–429

Hodges PE, McKee AH, Davis BP, Payne WE, Garrels JI (1999) The Yeast Proteome Database (YPD): a model for the organization and presentation of genome-wide functional data. Nucleic Acids Res 27(1):69–73

Hoshino A, Nakagawa H (2007) Assisting cloze test making with a web application. In: Proceedings of the society for information technology and teacher education international conference, Chesapeake

Huang M, Zhu X, Payan GD, Qu K, Li M (2004) Discovering patterns to extract protein-protein interactions from full biomedical texts. Bioinformatics, pp 3604–3612

Kalady S, Elikkottil A, Das R (2010) Natural language question generation using syntax and keywords. In: Proceedings of the 3rd workshop on question generation

Karamanis N, Ha LA, Mitkov R (2006) Generating multiple-choice test items from medical text: a pilot study. In: Proceedings of the 4th international natural language generation conference, pp 111–113

Kullback S, Leibler R (1951) On information and sufficiency. Ann Math Stat 22:79–86

Lapata M, Keller F, McDonald S (2001) Evaluating smoothing algorithms against plausibility judgements. In: Proceedings of the 39th annual meeting of the association for computational linguistics (ACL-2001), Toulouse, pp 346–353

Lin D (1998) Automatic retrieval and clustering of similar words. In: Proceedings of the international conference on computational linguistics and the annual meeting of the association for computational linguistics

Lin J (1991) Divergence measures based on the Shannon entropy. IEEE Trans Inform Theory 37(1):145–151

Martin EP, Bremer E, Guerin G, DeSesa M-C, Jouve O (2004) Analysis of protein/protein interactions through biomedical literature: text mining of abstracts vs. text mining of full text articles. Springer, Berlin, pp 96–108

Mitkov R, Ha LA (2003) Computer-aided generation of multiple-choice tests. In: Proceedings of the HLT/NAACL 2003 workshop on building educational applications using natural language processing, Edmonton, pp 17–22

Mitkov R, Ha LA, Karamanis N (2006) A computer-aided environment for generating multiple-choice test items. Nat Lang Eng 12(2):177–194

Mostow J, Chen W (2009) Generating instruction automatically for the reading strategy of self-questioning. In: Proceedings of the 14th international conference on artificial intelligence in education, Brighton

Nielsen R (2008) Question generation: proposed challenge tasks and their evaluation. In: Proceedings of the workshop on the question generation shared task and evaluation challenge

Palmer M, Kingsbury P, Gildea D (2005) The proposition bank: an annotated corpus of semantic roles. Comput Linguist 31(1):71–106

Papasalouros A, Kanaris K, Kotis K (2008) Automatic generation of multiple choice questions from domain ontologies. In: Proceedings of the IADIS international conference e-Learning

Paroubek P, Chaudiron S, Hirschman L (2007) Principles of evaluation in natural language processing. TAL 48(1):7–31

Pereira F, Tishby N, Lee L (1993) Distributional clustering of English words. In: Proceedings of the 31st annual meeting of the association for computational linguistics (ACL-1993), Columbus, pp 183–190

Pradhan S, Hacioglu K, Krugler V, Ward W, Martin JH, Jurafsky D (2005) Support vector learning for semantic argument classification. Mach Learn 60(1):11–39

Rao CR (1983) Diversity: its measurement, decomposition, apportionment and analysis. Indian J Stat 44(A):1–22

Schwartz L, Aikawa T, Pahud M (2004) Dynamic language learning tools. In: Proceedings of the 2004 InSTIL/ICALL symposium

Stevenson M, Greenwood M (2005) A semantic approach to IE pattern induction. In: Proceedings of ACL'05, pp 379–386

Stevenson M, Greenwood M (2009) Dependency pattern models for information extraction. Res Lang Comput

Sumita E, Sugaya F, Yamamoto S (2005) Measuring non-native speakers' proficiency of English using a test with automatically-generated fill-in-the-blank questions. In: Proceedings of the 2nd workshop on building educational applications using NLP, pp 61–68


Tateno J, Sano H, Aizawa H, Nakamura T, Morita Y (2005) Producing English educational materials from the BNC and releasing them on the Web. IEICE technical report, TL2005-1826, Tokyo, pp 7–12

Ureel L, Forbus K, Riesbeck C, Birnbaum L (2005) Question generation for learning by reading. In: Proceedings of the AAAI workshop on textual question answering, Pittsburgh

Vanderwende L (2007) Answering and questioning for machine reading. In: Proceedings of the 2007 AAAI spring symposium on machine reading, Stanford

Vanderwende L (2008) The importance of being important: question generation. In: Proceedings of the workshop on the question generation shared task and evaluation challenge, Arlington

Walker MA, Rambow O, Rogati M (2001) SPoT: a trainable sentence planner. In: Proceedings of NAACL

Weeds J (2003) Measures and applications of lexical distributional similarity. Ph.D. thesis, University of Sussex
