

Computers & Education 64 (2013) 183–193


Problem solving learning environments and assessment: A knowledge space theory approach

Peter Reimann a,*, Michael Kickmeier-Rust b, Dietrich Albert b

a University of Sydney, Centre for Research on COmputer-supported Learning and COgnition – CoCo, Faculty of Education and Social Work, Education Building A35, Sydney, NSW 2006, Australia
b Technical University Graz, Knowledge Technologies Institute, Inffeldgasse 13, 8010 Graz, Austria

Article info

Article history:
Received 12 May 2012
Received in revised form 22 November 2012
Accepted 29 November 2012

Keywords:
Architectures for educational technology systems
Intelligent tutoring systems
Evaluation methodologies

* Corresponding author. Tel.: +61 2 9351 6365. E-mail address: [email protected] (P. Reimann).

0360-1315/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compedu.2012.11.024

Abstract

This paper explores the relation between problem solving learning environments (PSLEs) and assessment concepts. The general framework of evidence-centered assessment design is used to describe PSLEs in terms of assessment concepts, and to identify similarities between the process of assessment design and of PSLE design. We use a recently developed PSLE, the ProNIFA system, to illustrate the concepts of student model, evidence model, and task model, concepts that provide for a close link between problem solving teaching software and assessment concepts. We also introduce ProNIFA because it uses a mathematical method developed in psychometric test theory – Competency-based Knowledge Space Theory (CbKST) – for building a student model based on observations on students' problem solving performance. The experiences made with methods such as CbKST lead us to the conclusion that the time has come to more frequently integrate assessment components into PSLEs, and to use problem solving and simulation environments as part of assessment environments. This will contribute to making assessment more authentic and less obtrusive, and making PSLEs more relevant in formal educational settings.

© 2012 Elsevier Ltd. All rights reserved.

"... assessment is the weakest link in learning to solve problems ..." (Jonassen, 2011, p. 353).

1. Introduction

Problem solving has always played a central role in Jonassen's work, as the opening sentence of his recently published book Learning to Solve Problems makes clear: "I argue that the only legitimate cognitive goal of education (formal, informal, or other) in every educational context (public schools, universities and (especially) corporate training) is problem solving" (Jonassen, 2011, p. xvii). Problem solving is considered as an important, if not the most essential, feature of learning by many instructional models like Cognitive Apprenticeship (Collins, Brown, & Newman, 1989), Constructivist Learning Environments (Jonassen, 1999), or Problem-based Learning (Albanese, 1993). According to Merrill (2002), problem-based learning involves (i) the activation of prior knowledge or experiences regarding a certain topic as foundation for new knowledge, (ii) the demonstration of concepts in order to provide learners with mental models that allow the solving of novel problems, (iii) application of newly developed knowledge or competencies, and finally (iv) the integration of new knowledge into existing knowledge and skills. The importance placed on problem solving can be justified not only empirically – that problem solving leads to more learning – but also theoretically by general theories of problem solving (Anderson, 1982; Newell, 1990; Newell & Simon, 1972). These foundational theories support the view that in order to become part of a person's cognitive repertoire, new information needs to be used to solve problems.

Given the importance placed on problem solving for education, Jonassen's statement on assessment as the weakest link must be sobering, particularly in light of the strong influence assessment has on the educational process: "Irrespective of stated goals, objectives, missions, curricula, or any other description of learning outcomes, what is 'on the test' is what is important to students" (Jonassen, 2011, p. 353). Why is it, then, that problem solving is often not appropriately assessed?



Jonassen provides the answer on the same pages: the meaningful assessment of complex (problem solving and other) skills is hard work that requires specialized knowledge and skills. Most educators do not have the requisite competencies, and even if they had, most of them would not have the time to develop authentic, yet rigorous assessments. Jonassen argues further that the main reason why assessment development is difficult and requires huge effort is that assessing meaningful learning, such as through problem solving, requires multiple forms of assessment. Using just one form of assessment inevitably flattens the multi-dimensional nature of problem solving, thus reducing learners' ways of knowing and understanding. This holds to the extent that their learning is oriented toward assessment, which we sadly have to take as a given. Jonassen elaborates on four different forms of assessing problem solving competence: assessment of (1) knowledge about problem schemas, (2) problem-solving performance, (3) the component cognitive skills (e.g., problem representation, causal reasoning), and (4) the ability to construct arguments in support of the solutions to problems (Jonassen, 2011, p. 354). In this paper we focus on assessment form (2) from Jonassen's list. Our goal is to introduce a framework that can be used to describe Problem Solving Learning Environments (PSLEs) in terms of assessment concepts, capitalizing on the fact that in computer-based PSLEs detailed information on learners' problem solving performance is recorded. A second goal is to describe a new computer-based PSLE in terms of this framework, in order to illustrate the concepts and their scope.

With this paper, we attempt to make two contributions to the research on computers in education. The first is conceptual: clarifying to what extent computer-based PSLEs solve the assessment challenge raised by Jonassen regarding how to authentically and validly assess problem solving competence. The second contribution is introducing a new computer-based PSLE that employs a diagnostic algorithm grounded in psychometric assessment theory. This PSLE can hence be seen as providing a direct connection between psychometric research on assessment on the one side, and PSLE design on the other.

2. The relation between PSLEs and assessment concepts

Competencies can be assessed in multiple forms, and for multiple purposes. When the purpose is classification or selection, then competency is usually conceptualized as a uni-dimensional construct: as a single variable, or a small set of variables, to characterize overall proficiency. For instance, the computer-based version of the Graduate Record Exam (GRE, www.ets.org/gre/) test, which is used in the US to select for graduate and business school, has a student model consisting of just three variables (Verbal skills, Quantitative skills, and Analytic Writing skills), each with a probability distribution that is updated whenever the student finishes a test item.

When the purpose is to monitor and help with learning, however, single-variable competency models are less appropriate. Jonassen (2011), for instance, speaks of 'assessing problem solving' in the sense of assessing the knowledge gained about problems and how to solve them in a specific domain, such as physics, history, or business. For this purpose, a general problem solving proficiency score cannot express what learners can do and know, and what they cannot do and do not know. Characterizing achievement or proficiency by a single variable might suffice only for basic fail/pass decisions, as argued by Mislevy, Steinberg, and Almond (2003) and Mislevy, Steinberg, Almond, Haertel, and Penuel (2003). Another example of the limitations – for pedagogical purposes – of this psychometric approach is the I.Q., which attempts to characterize all the various abilities, strengths and weaknesses of a person in many categories and disciplines (math, language, cognition, memory, etc.) with a single numerical value. While perhaps useful for selection purposes, such a value does not provide a teacher with any information as to how to help students with their learning, that is, with formative assessment.

Problem-solving environments, on the other hand, naturally lend themselves to being used formatively (to foster learning by providing feedback and guidance). While to some extent the distinction between formative and summative assessment (Black & Wiliam, 1998) concerns how the information gained from students is used, rather than demarcating a strong difference in assessment design (Wiliam, 2010), it cannot be ignored that some forms of assessment yield more information than others. While summative assessment has received extensive methodological attention because of the high-stakes decisions that are often coupled with it (e.g., Downing & Haladyna, 2006), we would argue that any assessment that is used for decision making should adhere to quality standards such as validity, reliability, and fairness, independent of whether the assessment is used for formative or summative purposes (Brookhart, 2003).

Computer-based PSLEs contain many of the elements needed for the formative assessment of problem solving. In cases where a PSLE maintains an explicit model of students' knowledge, the relation to assessment is almost one to one: the diagnosis the PSLE performs to maintain the student model is a kind of (formative) assessment. Examples are the Cognitive Tutors developed at Carnegie Mellon University (Koedinger & Corbett, 2006), which monitor and guide a student through (e.g., algebra) problems step by step, and use students' performance in solving these problems (number and kind of mistakes, level of guidance needed, etc.) to maintain an overall model in the form of numeric values on a number of higher-order skills. If a PSLE supports students in solving individual problems (or cases), but does not relate this local problem solving performance to a more general characterization of mastery (proficiency, competency), "all" that is needed to achieve that second function is to aggregate across the set of problems tackled by the student. Of course, a simple 'score' – e.g., counting the number of correct solutions, perhaps taking into account the number of attempts, need for help, or time parameters – will not suffice as a measure of students' knowledge or competence; one cannot equate performance with competence. However, as we will show, chances are that in most PSLEs the information for rigorous assessment is available, and only the addition of a measurement method is needed to achieve sound assessment.

2.1. Evidence-centered assessment design

One of the challenges with test-based educational assessment is that it tends to get over-rated and trivialized at the same time. Both reactions have to do with an overly restricted view of assessment as psychometric testing. Psychometrically constructed tests become over-rated when the information they provide is taken as "all there is to know" about students' learning; they get trivialized when the business of educational assessment is seen as "just doing the number work". A test or other form of assessment can produce a (seemingly precise) measurement value for a competency, but that does not mean that this value is valid, or useful, even if the measurement is highly reliable. And developing an assessment cannot be reduced to determining a scoring procedure plus a measurement model to link performance to conclusions about student variables. In order to clarify the relation between PSLEs and assessment, we need a comprehensive framework for assessment design.


Evidence-centered assessment design (ECD, Mislevy & Riscontente, 2006) will be used as the framework for assessment here because it is well grounded in assessment methodology, yet broad enough to encompass computer-based learning and problem solving environments. ECD sees assessment as an evidentiary argument that connects observations of students (evidence) to hypotheses about their knowledge, skills, and aptitudes (KSAs). In order to validly assess KSAs, the assessment developer needs to have a clear understanding of the nature of the knowledge to be assessed, how students learn it, and how they use their knowledge (Mislevy, Steinberg, Almond, Haertel, et al., 2003, p. 2). ECD specifies a number of steps assessment design goes through, and associated design products, depicted in the left column of Table 1 (see Mislevy, 2011, for a recent description of these steps). The right column of this table shows the corresponding elements of the PSLE design process. What is worth taking note of is the high degree of correspondence between the two design processes.

We want to make these assessment concepts more concrete in the context of a real PSLE called ProNIFA. Before we introduce ProNIFA itself, however, we need to sketch the theory behind the measurement model it employs.

2.2. Competency-based Knowledge Space Theory (CbKST)

We introduce in this section a formal, set-theoretic approach based on Knowledge Space Theory (KST), founded by Doignon and Falmagne (1985, 1999), and extensions such as Competence-based Knowledge Space Theory (CbKST) that provide a link to adaptive tutoring and problem solving systems. While KST was originally developed for test-based assessment, and hence took the test item as the basic unit, extensions such as CbKST allow for applications to situations where the student response is not (only) an answer to an item, but can also be a step in a problem solving process, a "move" in an educational game, etc. Hence, the approach offers not only a theoretical underpinning for assessment but also well-elaborated technical solutions for integration with computer-based learning environments, in particular PSLEs. As such, it is an alternative to the frequently used Bayesian Belief Networks (Millán, Loboda, & Perez-de-la-Cruz, 2010).

Competency-based Knowledge Space Theory (CbKST) has its origins in psychometric test development, namely the (performance-based) Knowledge Space Theory (KST) developed by Doignon and Falmagne (1985, 1999). The idea behind KST was to broaden the ideas of linear Item Response Theory scaling, where a number of items are arranged on a single, linear dimension of "difficulty" (Van der Linden & Hambleton, 1997). In essence, KST provided a basis for structuring a domain of knowledge and for representing the knowledge based on prerequisite relations (see Fig. 1 for an example). This means that the focus of scaling is not on the estimated difficulty, but on the relationships among the test items. More concretely, KST makes use of a more or less natural prerequisite structure: from mastering one test item, one can surmise that another, perhaps simpler, test item can also be solved correctly. This establishes a surmise relation or, equally, a prerequisite relation, which states that being able to master one test item is the prerequisite of mastering another. The early theory focused on performance (for example, solving a test item) in a deterministic way. Advancements of the theory accounted for a probabilistic view of test performance and introduced a separation of observable performance and the underlying abilities and knowledge of a person. Such developments led to a variety of theoretical, competence-based approaches (e.g., Albert & Lukas, 1999; Doignon, 1994; Düntsch & Gediga, 1995, 1998). An empirically well-validated approach to CbKST was introduced by Korossy (1997, 1999). Basically, the idea of the Competence-Performance Approach is to assume a finite set of more or less atomic competencies (in the sense of some well-defined, small-scale descriptions of some sort of aptitude, ability, knowledge, or skill) and a prerequisite relation between those competencies.

In a first step, CbKST requires the assessment designer to develop a model of the learning domain, e.g., algebra. This model essentially consists of a set of competencies C = {a, b, c, d}. Examples of such competencies might be the knowledge of what an integer is, or the ability to add two positive integers, and so on. The level of granularity to which a domain is broken down depends on the envisaged application and might range from a very coarse-grained level on the basis of lessons (for example, to plan a school term) to a very fine-grained level of atomic entities of knowledge/ability (for example, as the basis of an intelligent problem solving support application).

In a second step, the designer attempts to identify a natural course of learning and development and logical dependencies between competencies. Usually, learning and the development of new abilities, as well as the stabilization of skills, occur along developmental trajectories (learning progressions). On this basis, a prerequisite relation a ≤ b claims that a competency a (e.g., to multiply two positive integers) is a prerequisite for acquiring another competency b (e.g., to divide two positive integers). Vice versa, if a learner has competency b, one can assume (surmise) that this person also has competency a. To account for the fact that more than one set of competencies can be a prerequisite for another competency (e.g., competency a or competency b is a prerequisite for acquiring competency c), prerequisite functions have been introduced, relying on and/or-type relations.

Table 1
Correspondences between assessment design steps and PSLE design steps.

Assessment design layers | Corresponding elements in PSLE design
Domain analysis: identification of central concepts and skills. | Largely identical.
Domain model: representations of key aspects of the domain for making claims about students' competencies. | Representations of key aspects of the domain for the purpose of learning.
Student model: representation of claims (and of the strength of belief in these claims) about a student's knowledge and skills. | Not included if the purpose of the PSLE is only to help with individual problem solving steps. Realized in other cases in a variety of formats, from simple incremental models to sophisticated quantitative (e.g., Bayesian networks) or symbolic models.
Task model: description of the environment in which students say, do, or make something to produce evidence; determines how students' performances will be captured. | Corresponds to a model of the problem solving environment.
Evidence model, evaluation component ("scoring method"): description of how to identify and evaluate assessment-relevant aspects of the work products. | Corresponds to deciding on the correctness of a solution (step).
Evidence model, measurement model component: method used to relate work product evaluations to values in the student model, such as classical test theory, item response theory, Bayesian updating. | Either not included, or realized as described under "Student model" above.
Assessment implementation | Has a partial correspondence in the problem authoring step.
Assessment delivery | Corresponds to the PSLE software.


Fig. 1. (From left to right) A set of competencies of a domain, a Hasse diagram showing the prerequisite relation between those competencies, and a Hasse diagram illustrating the resulting competence structure. See text for more explanations.



On the basis of a set of competencies and a set of prerequisite relationships between them, we can formally derive a collection of so-called competence states. Fig. 1 gives an example. In the left panel, the competencies of the domain are shown, and in the middle panel a so-called Hasse diagram visualizes a prerequisite relation among those competencies. A Hasse diagram reads from bottom to top, where the nodes depict conceptual entities and the edges indicate relationships (in our case prerequisite relations). In the example, competency a (knowing integers) is the direct prerequisite for competencies b (adding) and c (subtracting). A competence state is a meaningful combination of single competencies. It is, for example, the state {a, b, c}, which means a person in this competence state would have the knowledge of what an integer is and would have the ability to add numbers and to subtract numbers. In turn, a competence state {a, c} would not be possible, since it would lack competency b as a prerequisite. Finally, deriving all the admissible competence states results in a so-called competence structure (Fig. 1, right panel).

Due to the prerequisite relations between the competencies, not all subsets of competencies (the power set) are possible competence states, which is a significant advantage. Considering the power set of five competencies (2^5), one would end up with 32 competence states; due to the logical structure, however, we have only ten. A competence structure also singles out different learning paths for moving from the naïve state {} (having no competencies of a domain) to the state of having all of a domain's competencies C. Accordingly, a person's level of knowledge/ability/proficiency is described by exactly one competence state (at least theoretically).
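To make this derivation concrete, here is a minimal sketch (in Python, with a hypothetical five-competency prerequisite relation that is not necessarily the structure shown in Fig. 1) of how the admissible competence states can be enumerated: a subset of competencies is admissible exactly when it contains all prerequisites of every competency it contains.

```python
from itertools import combinations

# Illustrative prerequisite relation (hypothetical example, not taken from Fig. 1):
# each competency maps to the set of its direct prerequisites.
prerequisites = {
    "a": set(),   # e.g., knowing what an integer is
    "b": {"a"},   # e.g., adding integers
    "c": {"a"},   # e.g., subtracting integers
    "d": {"b"},   # further competency (illustrative)
    "e": {"c"},   # further competency (illustrative)
}

def admissible_states(prereq):
    """All subsets of competencies that are closed under the prerequisite relation."""
    competencies = list(prereq)
    states = []
    for r in range(len(competencies) + 1):
        for subset in combinations(competencies, r):
            state = set(subset)
            # admissible: every competency in the state has all its prerequisites in it
            if all(prereq[c] <= state for c in state):
                states.append(state)
    return states

structure = admissible_states(prerequisites)
print(len(structure))  # 10 admissible states instead of 2**5 = 32 subsets
```

With this particular (invented) relation, only 10 of the 32 subsets survive, mirroring the reduction from the power set discussed above.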

So far, the structural model focuses on latent, unobservable competencies. By utilizing interpretation and representation functions, the latent competencies are mapped to some sort of evidence or indicators Q = {p, q, r, s, ...} relevant for a given domain. Such indicators might be test items but can refer to all sorts of performance or behavior (e.g., the concrete steps when working with a spreadsheet application). The interpretation function assigns to each of the indicators in Q the set of competencies required to solve it. Vice versa, by utilizing a representation function, a set of indicators is assigned to each competence state. This assignment induces a performance structure, which is the collection of all possible performance states (analogous to the competence structure). Due to these functions, latent competencies and observable performance can be linked in a broad form where no one-to-one correspondence is required. This means that an entire series of indicators can be linked to underlying competencies/competence states.
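As a sketch of how an interpretation function can be used computationally (the indicators and their required competency sets below are invented for illustration and are not ProNIFA's actual encoding), one can assign to each indicator the competencies it requires and derive, via the representation function, which indicators a given competence state predicts the learner can master:

```python
# Hypothetical interpretation function: indicator -> competencies required to master it.
interpretation = {
    "p": {"a"},            # e.g., enter a number into a cell
    "q": {"a", "b"},       # e.g., add two values with a formula
    "r": {"a", "b", "d"},  # e.g., sum a column
    "s": {"a", "c", "e"},  # e.g., compute a difference-based quantity
}

def representation(competence_state, interpretation):
    """Representation function: the indicators a learner in this state should master."""
    return {q for q, required in interpretation.items() if required <= competence_state}

# A learner in the (hypothetical) state {a, b, d} is expected to master p, q and r, but not s.
print(representation({"a", "b", "d"}, interpretation))
```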

CbKST accounts for the fact that indicators such as problem solving steps cannot be perfect evidence for the latent knowledge or ability. There is always the possibility that a student makes a lucky guess or exhibits a correct behaviour/activity just by chance. In turn, a person might fail to solve a problem although the necessary knowledge/ability is actually available, for example due to being inattentive or careless. As a consequence, CbKST considers indicators to be related to the underlying competency/competencies with a certain probability.
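A minimal sketch of this probabilistic link is given below: every observed indicator updates a probability distribution over the competence states via Bayes' rule, using a "careless slip" and a "lucky guess" parameter. The parameter values and the update scheme are illustrative assumptions and stand in for, rather than reproduce, ProNIFA's configurable heuristics.

```python
def update_distribution(distribution, interpretation, indicator, solved,
                        slip=0.1, guess=0.1):
    """Bayes-rule update of P(competence state) after observing one indicator.

    distribution:   dict mapping frozenset (competence state) -> probability
    interpretation: dict mapping indicator -> set of required competencies
    solved:         True if the indicator was mastered, False otherwise
    slip, guess:    illustrative error probabilities (assumed values)
    """
    required = interpretation[indicator]
    posterior = {}
    for state, prior in distribution.items():
        can_solve = required <= state
        p_solved = (1 - slip) if can_solve else guess
        likelihood = p_solved if solved else (1 - p_solved)
        posterior[state] = prior * likelihood
    total = sum(posterior.values())
    return {state: p / total for state, p in posterior.items()}
```

Starting from, say, a uniform distribution over the admissible states computed earlier, repeated updates of this kind concentrate probability mass on those competence states that best explain the observed problem solving behaviour.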

A further significant advantage of such an approach is that learning is not considered only as a one-dimensional course on a linear trajectory, equal for all learners. Learning and development rather occur along one of an entire range of possible learning paths. Fig. 2 shows the possible learning paths (edges in the Hasse diagram); in our example there are five admissible paths.
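For completeness, the admissible learning paths can be read off a competence structure directly: a path starts at the empty state and repeatedly moves to a state that adds exactly one competency. The sketch below enumerates them for a given structure; with the hypothetical structure from the earlier snippets, the resulting count need not match the five paths of Fig. 2.

```python
def learning_paths(states):
    """All learning paths from the empty state to the full competency set.

    states: collection of admissible competence states (sets or frozensets).
    Each step of a path adds exactly one competency and stays inside the structure.
    """
    structure = {frozenset(s) for s in states}
    full = max(structure, key=len)

    def extend(path):
        current = path[-1]
        if current == full:
            yield path
            return
        for candidate in structure:
            if len(candidate) == len(current) + 1 and current < candidate:
                yield from extend(path + [candidate])

    return list(extend([frozenset()]))

# Example (with the earlier sketch in scope):
# print(len(learning_paths(admissible_states(prerequisites))))
```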

Recent advancements of CbKST primarily concern the integration of theories of human problem solving (given that most indicators can be interpreted as solving some sort of problem). This work is driven by the design of adaptive computer games for learning.

Fig. 2. Admissible learning paths derived from a competence structure. See text for more explanations.


3. Using psychometric assessment methods in a computer-based PSLE

In the context of the NEXT-TELL project (www.next-tell.eu), we are working on ways to provide teachers and students with information about progress in learning and problem solving in order to inform their (pedagogical) decision making (Reimann, Hesse, Avramides, Cierniak, & Vatrapu, 2012; Reimann, Kickmeier-Rust, Meissl-Egghart, Moe, & Utz, 2011). We employ ECD as the methodological framework for developing computer-based formative assessment methods that teachers can use to provide feedback and guidance (feed-forward) to their students. One tool we have developed, called ProNIFA (for PRObabilistic Non-Invasive Formative Assessment), performs on-line diagnosis of students' knowledge for solving problems in the area of uni- and multivariate data analysis and data visualization. A typical problem in this domain would be to create a table for measurement data (from a chemistry experiment, say), to answer certain questions about the data (involving calculating mean and variance, say), and to visualize these data in two-dimensional graphs appropriately.

3.1. The task model

One of the goals in NEXT-TELL is to provide assessment functionality as a service to users, rather than as a software program. This is in reaction to the fact that students nowadays have access to many software tools and environments, and that these are increasingly provided on the Internet as a web application (to be used through a browser) and/or as a service (i.e., they can be integrated easily into web and even desktop applications). The days when the only option to use software was to install it on one's computer are definitely over. Therefore, we have developed an implementation of CbKST-based assessment where the problem solving environment that students use is "in the Cloud" (we employ the spreadsheet component of Google Docs, see Fig. 3), while the diagnostic/assessment software runs on a server different from Google's. The information from the spreadsheets students work with is read off from log files using Google Docs' Application Programming Interface (API), and all information needed for the CbKST assessment is stored on a local server (Reimann, Bull, Halb, & Johnson, 2011).

De-coupling the problem-solving environment in this manner from the assessment engine has a number of advantages, amongst them that teachers and students can use familiar tools and interfaces. Also, Cloud applications such as Google's (although perpetually in "beta") are provided very reliably and are highly scalable. Teachers hence need to worry less that an activity cannot be conducted because the technology might break down.

Since ProNIFA is not a tutorial system, but confined to supporting formative assessment, the granularity of tracing students' actions is not as fine-grained as in adaptive tutorial systems, such as Andes (VanLehn et al., 2005). Essentially, for updating the student model, for each problem the student attempts to solve, the software needs to know only a few aspects of the solution, which can largely be read off the log file that Google Spreadsheets supplies for document versioning purposes.

Fig. 3. ProNIFA's task environment is provided by the Google Docs Spreadsheet web application. The task for the students (here in German) is to visualize a table with results (medals) from the Olympic Games 2010. The students have to calculate the rightmost column of the table by entering the appropriate spreadsheet formula, and to visualize the number of medals by category and nation in an appropriate format. A student's solution to this is shown on the right side of the figure.


Feedback is currently not provided directly to students through a computer interface; instead, it is the teacher who has access (through a web interface) to the ProNIFA student model. Various visualizations are possible, such as in the form of Hasse diagrams, at various levels of aggregation (individual students, pairs, groups, class level), as described next.

3.2. Student and evidence model

To create a formative assessment environment with ProNIFA, a CbKST-based authoring process needs to be conducted. As a first step, the assessment designer must specify the competencies of a domain and the prerequisite relations between the competencies. This can be performed with an authoring tool integrated in ProNIFA, or be done off-line and imported in a specific file format. In any case, the analysis yields a data structure that can be represented as a prerequisite graph. An initial task analysis of the tabular data analysis and visualization domain yielded 101 sub-skills, which can be combined into 154 knowledge states.

The most crucial aspect of ProNIFA, however, concerns the aggregation and visualization of data. On the basis of the specified set of rules and heuristics, ProNIFA computes and analyses the sources of evidence, in the current case log files. On this basis, the probabilities of competencies are updated and the probability distribution over the competence structure is altered. A highly informative way to represent the student model is the Hasse diagram, as introduced before.

Very briefly, seen as a student model, a Hasse diagram (such as displayed in Fig. 4) shows all possible (admissible) competence or knowledge states. By the logic of CbKST, each learner is, with a certain likelihood, in one of these competence states. This allows coding the state likelihoods, for example by colors, and thereby visualizing areas and sets of states with high (or low) probabilities. The simplest approach would be highlighting the competence state with the highest probability for a specific learner. The same coding principle can be used for multiple learners. This allows for identifying various sub-groups in a class, outliers, the best learners, and so on.

The edges of the graph can also be interpreted from a pedagogic perspective. Since the diagram reads from bottom to top, the edges indicate the "learning path" of a learner. Depending on the domain, we can monitor and represent each learning step from a first initial competence state to the current state.

Finally, a Hasse diagram offers the visualization of two very distinct concepts, the inner and outer fringes. The inner fringe indicates what a learner can do/knows at the moment. This yields a clear hypothesis about which test/assessment items this learner can master with a certain probability. Such information may be used to generate effective and individualized items or problems. The outer fringe indicates what competency should or can reasonably be taught to a specific learner as a next step. This provides a teacher with recommendations about future teaching on an individualized basis.
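The sketch below shows how both fringes fall out of the structure, following the standard KST definitions (inner fringe: competencies whose removal still leaves an admissible state; outer fringe: competencies whose addition yields an admissible state). The structure and the state are the hypothetical ones used in the earlier snippets, not ProNIFA's actual domain model.

```python
def all_competencies(structure):
    """Every competency that occurs anywhere in the structure."""
    return frozenset().union(*structure)

def fringes(state, structure):
    """Inner and outer fringe of a competence state (standard KST definitions)."""
    inner = {c for c in state if state - {c} in structure}
    outer = {c for c in all_competencies(structure) - state if state | {c} in structure}
    return inner, outer

# Hypothetical competence structure from the earlier sketch:
structure = {frozenset(), frozenset("a"), frozenset("ab"), frozenset("ac"),
             frozenset("abc"), frozenset("abd"), frozenset("ace"),
             frozenset("abcd"), frozenset("abce"), frozenset("abcde")}

inner, outer = fringes(frozenset("ab"), structure)
print(inner)  # {'b'}: what can be probed or confirmed right now
print(outer)  # {'c', 'd'}: what can reasonably be taught next
```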

Hasse diagrams are not the only way to depict a student model created by CbKST algorithms. Pixel clouds are a similar concept for representing ability on an individual or group level. In principle, a pixel cloud depicts each competence state (or single competency) as a single pixel. Each of the competence states is assigned a color-coded probability value: the brighter a pixel is, the higher the corresponding probability; conversely, the darker a pixel is, the lower the corresponding probability. The difficulty (or, in other terms, the structural location) of a competency or competence state is given by its position in the Euclidean space, ordered from left to right. One major advantage of this type of visualization is that huge competency spaces can be grasped at a single glance. Furthermore, even for huge competency spaces, important information for teachers can be displayed on a single screen without the need for zooming. As shown in Fig. 5, temporal information can also be illustrated easily.

Fig. 4. Hasse diagram visualization of knowledge and learning. The left panel illustrates the learning progress of an individual learner; the lower box indicates the competence state at the beginning of a learning episode and the upper box the current state. The orange line illustrates the learning path. The right panel is a zoom into the entire structure. The yellow area indicates those competence states that 75% of the students in a class may have reached. The Hasse diagram allows an intuitive comparison of students – the green circle, as an example, indicates a high-performing student who has already reached a superior competence state. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)



3.3. How teachers can use ProNIFA

A teacher's or instructor's role comprises setting up a subsumption hierarchy, connecting it to the evidence layer, and analyzing performance data with the CbKST algorithms. These tasks can currently be performed from a desktop application, shown in Fig. 6. Authoring a subsumption structure in the interface is rather straightforward, and is performed with a text file (not shown). For any "node", one specifies the link to the predecessor node and, optionally, the initial probability that the competence represented by the node is mastered. While the technical side of authoring this aspect is straightforward, the real work will need to be accomplished before: identifying relevant competences/skills and their relations. Sometimes teachers may be in the position to perform such a domain analysis, at least for closely circumscribed domains and a high level of granularity, but for more complex competences and/or a fine-grained level of analysis, it is a task better performed by somebody with experience in instructional design.

In a second step, the nodes in the subsumption structure need to be related to aspects of students' performance. In the case of ProNIFA, performance means changes to cells in a Google Spreadsheet. Creating such connections is usually a highly technical task, one that we would not expect teachers or end users in general to perform. Teachers or instructors will normally only be required to specify what an event in the learning environment means in terms of updates of one or more probabilities. One approach to move the end user closer to authoring the mapping from performance to competence structure is to abstract away from the technical details. The mappings can usually be expressed in a rule format: IF this or that happens in the learning environment THEN increase/decrease the probability of nodes {a, b, ...}. Such rule formulations are easy to grasp for the end user, and require only expressing changes to individual nodes. Whether such a rule language is made available to the end user or not (in a text file) depends on how important it is that the end users can perform the mapping themselves.
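To illustrate what such IF-THEN mappings could look like in executable form (the event names, node identifiers, and adjustment sizes are invented for illustration; ProNIFA's own rule syntax and update mechanics may differ), consider:

```python
# Hypothetical evidence rules: IF an event is observed in the learning environment
# THEN nudge the probabilities of the affected competency nodes.
rules = [
    {"if_event": "correct_sum_formula", "then_adjust": {"b": +0.15, "d": +0.10}},
    {"if_event": "wrong_chart_type",    "then_adjust": {"e": -0.10}},
    {"if_event": "deleted_formula",     "then_adjust": {"b": -0.05}},
]

def apply_rules(node_probabilities, event, rules):
    """Apply every rule matching the observed event, keeping probabilities in [0, 1]."""
    updated = dict(node_probabilities)
    for rule in rules:
        if rule["if_event"] == event:
            for node, delta in rule["then_adjust"].items():
                updated[node] = min(1.0, max(0.0, updated.get(node, 0.5) + delta))
    return updated

probabilities = {"a": 0.9, "b": 0.5, "c": 0.5, "d": 0.3, "e": 0.4}
print(apply_rules(probabilities, "correct_sum_formula", rules))
```

In ProNIFA, such adjustments ultimately feed into the probability distribution over competence states rather than into isolated node values; the additive nudges here are only meant to convey the rule idea.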

Once the subsumption structure is in place and the mappings to the performance level have been created, updates to the probabilities can be computed within a cycle ranging from real-time to batch processing. This depends entirely on the nature of the interface between the CbKST software (which usually runs on a server) and the learning environment. In the case of ProNIFA, the frequency of updates is determined by the teacher: whenever the teacher wants to have the diagnosis computed for a student, data with the history of changes to the student's spreadsheet are fetched via the API from the Google server and made available in the analysis tool (Fig. 6). From the analysis tool, data for single students or groups of students are sent (transparently for the user) to the analysis server, and the results of the diagnostic computations are rendered in the analysis tool interface.

The analysis tool is currently a desktop application that runs on Windows PCs. Users can request the analysis of single students or groups of students (Fig. 6, upper left), inspect data in tabular (upper middle) and graphical form (lower left, probabilities for competences plotted for multiple students in a bar chart), and see the results in terms of Hasse diagrams for individual students (right side of Fig. 6).

4. Discussion

Let us look at ProNIFA again in terms of Evidence-Centered Assessment Design, and also compare the CbKST method to the currently most frequently used student modeling method, Bayesian Belief Networks.

4.1. Student model

In the case of dedicated assessment development, the process usually starts from the student model – from the question of what needs assessing and the nature of the constructs to be assessed. Like learning environments, assessment environments should be informed by knowledge about the nature of the knowledge to be acquired, and the nature of learning.

Using ProNIFA, and CbKST in general, learning environment designers will devote a lot of attention to developing the student model, by performing a domain analysis and setting up a subsumption structure (as in Fig. 4). As is the case with Bayesian modeling (Conati, Gertner, & vanLehn, 2002), a student model in ProNIFA takes the form of a graph rather than of a single number ("analytical reasoning") or an array. (Array representations are appropriate for multi-dimensional student models, for instance for representing "multiple intelligences".)

Fig. 5. Pixel cloud visualization of a sequence of a probability-based evidence-centered assessment cycle: in the left panel there is a broad basis of competencies with an almost uniform distribution; the red area shows a range where the probability is slightly higher. When more evidence is added to the model, some areas become more and more unlikely (the black areas); other areas, however, become more likely. Finally, with further evidence added, some competencies reveal a very high probability. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


Fig. 6. The teacher’s graphical interface for conducting analysis tasks in ProNIFA. See text for explanations.


The level of detail that needs to be realized for a student model of the graph kind depends on the scope and depth of the learning domain, of course, but also on the purpose of the educational software. For purely diagnostic purposes, as in ProNIFA, the student model can be less detailed than for systems that are designed to also provide tutorial explanations and/or advice.

For instance, the Andes tutor, developed by Kurt VanLehn and co-workers (Conati et al., 2002; VanLehn et al., 2005), has an extremely detailed student model. It contains representations for all the knowledge elements that are required to solve each problem that Andes presents to students, in the same manner as a physics teacher would like to see such problems solved by a good student. (This is of course relative to the instructional context, which in this case is college-level physics teaching in a military academy.) The knowledge available to Andes is all represented in rule form and comprises knowledge of physics and mathematical principles, and methods for applying them. The student model that Andes maintains is the subset of these hundreds of rules that it believes the student has mastered at any point in time. This level of detail is needed because Andes gives feedback to students at the level of each problem solving step, and can make suggestions about which problem solving step to take next, not only which problem to tackle next. Student models that are used only for summative assessment purposes can be much leaner. Most student models would lie somewhere between these extremes.

4.2. Evidence model

An evidence model describes which behaviors provide diagnostic information for the targeted student attributes, and how to transform the information on behavior into 'values' in the student model. Creating an evidence model involves answering three questions: (i) What from the observations we have on students (see Task model below) counts as evidence? (ii) How to evaluate students' work products? (iii) How to update the student model based on (usually multiple) evaluations?

For problem-solving diagnosers such as ProNIFA, every action taken by the student in the interface is usually diagnostically relevant, and will lead to an update of the student model. This is because we treat as "diagnostic" not only what is done "right" or "wrong", but also redundant steps during problem solving. In other words, in ProNIFA the decision of what counts as evidence for the student model was answered once the granularity of skill representation in the student model was decided. The problem solving interface the students interact with was accordingly designed to capture each problem solving step that can be interpreted in terms of the student model.

However, in general there will be differences between student and evidence model decisions. Indeed, the effort that goes into formulating evidence models for psychometric tests like the GRE is very extensive, since it is not 'evident' how a high-level latent construct such as "Analytic Writing" can be identified by looking at students' behavior. Hence, the extensive methodological concern for test construction in the psychometric approach almost exclusively focuses on item construction, i.e., the evidence model (Downing & Haladyna, 2006). For the case of assessing problem solving skills, the item identification aspect will usually not be so hard, as problem solving is related to observable behavior rather directly. However, as Jonassen reminds us, problem solving competencies, too, should be assessed with multiple assessment forms, and comprise aspects – such as problem perception – that are not directly observable (Jonassen, 2011, Chap. 22).



In general, the usually extensive task analysis that designers of PSLEs have to perform to build software that can effectively help with learning about a domain by solving problems from that domain is an excellent basis for addressing the substantive part of both the student and the evidence model: what constitutes knowledge in a domain, how it is learned, and how it can be elicited. In the course of the development process, designers of PSLEs will also address the evaluation component (question (ii) from above), because the PSLE will minimally need to know when a problem was solved correctly, and ideally be able to perform evaluations at the level of individual problem solving steps, as in the case of Andes.

However, there is another aspect of the evidence model that needs to be addressed, the "evidentiary-reasoning" aspect, as Mislevy, Steinberg, and Almond (2003) and Mislevy, Steinberg, Almond, Haertel, et al. (2003) call it, corresponding to question (iii), the measurement model specification. The main technical challenge in this respect is how to deal with guessing and with "noise" in the problem solving behavior. The "noise" is due to human behavior not being deterministic: even when a student has complete knowledge of how to perform a certain problem solving step, or how to solve a certain kind of problem, every so often a mistake will be made, for instance due to a lack of attention. Hence, a specific behavior cannot conclusively be linked to a construct: the observation that after a number of successful solutions a student gets it wrong in the nth instance needs to be interpreted conservatively, as more likely indicating a slip in attention than the student having lost the capacity to solve the problem.

While for psychometric tests this problem is addressed with concepts from the analysis of variance ("error variance"), for non-numeric student models stochastic methods can be utilized. In ProNIFA, this is CbKST. In Andes, as now in many other educational software systems, Bayesian networks are employed to address guessing and noise (see Millán et al., 2010 for a readable introduction). The principle is that observations (evidence) are linked to latent student constructs (causes) by a probabilistic relation, and that these probabilities get updated in light of new evidence, using Bayes' theorem. This approach also offers a way to handle the attribution problem: how should evidence be interpreted when multiple constructs can account for the same evidence?
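As a minimal, single-construct illustration of this principle (a full Bayesian network, as used in Andes, factors such updates over many interrelated nodes; the guess and slip values here are invented), the posterior probability that a skill has been mastered can be recomputed after each observed step:

```python
def bayes_update(p_mastered, observed_correct, guess=0.2, slip=0.1):
    """One Bayes'-theorem update of P(skill mastered) from a single observed step.

    guess: probability of a correct step without mastery (illustrative value)
    slip:  probability of an incorrect step despite mastery (illustrative value)
    """
    if observed_correct:
        likelihood_mastered, likelihood_unmastered = 1 - slip, guess
    else:
        likelihood_mastered, likelihood_unmastered = slip, 1 - guess
    numerator = likelihood_mastered * p_mastered
    evidence = numerator + likelihood_unmastered * (1 - p_mastered)
    return numerator / evidence

p = 0.5
for correct in [True, True, False, True]:  # one slip among several successes
    p = bayes_update(p, correct)
print(round(p, 3))  # the single error lowers, but does not erase, the estimate
```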

4.3. Task model

The task model describes the concrete student behaviors to record and the context in which these are elicited. For psychometric tests, this is usually a more or less simple rendering of an item, either on paper or, increasingly, on computer screens, with very limited (re-)action options for the student: mark/do not mark an option in a multiple choice item, for instance. In PSLEs, the task environment tends to be considerably more complex, and more extensive. In ProNIFA, the task environment is formed by the Google Spreadsheet interface. In the Andes system, students are provided with an interface that allows them to draw vector diagrams, enter text, and enter equations, and are given additional point-and-click options.

A very important aspect of the task model is the granularity level, or resolution, at which student actions get recorded. This will affect how fine-grained the student model can be. For assessment-oriented systems, the design direction is usually from the student model to the task model, so that the task model is "just right" for the assessment goal. PSLEs will in general elicit rather too much than too little information from the learner relative to what is needed for assessment purposes, and hence will need to specify in the evidence model how the information relevant for assessment gets selected. But this is clearly advantageous over having too little information provided through the user interface.

5. Conclusions

In conclusion, ProNIFA, as seen through the lens of ECD, provides us with a good example of assessment that does not rely on variables as the object of measurement. Instead, it uses the formalism of (mathematical) graphs of knowledge elements as the language for describing assessment results. While these are mathematical structures, they are not numeric in the same sense as 'variables' are. As the ECD framework explicates, assessment must be based on two fundamental bodies of knowledge: substantive features, which concern the characteristics of the learning domain and the learning process, and the evidentiary-reasoning aspect, which concerns the information we can draw from the learners' behaviors (Mislevy, Steinberg, & Almond, 2003). Formal frameworks that can link both aspects soundly are often better based on stochastic methods, such as Item Response Theory (Van der Linden & Hambleton, 1997), Latent Class Models (Collins & Lanza, 2010), or Bayesian inference networks (Jensen, 1996), rather than on variable-based "classical" test theory.

We have argued that computer-based PSLEs by design contain most of the elements required for an assessment system, namely a Student Model, an Evidence Model, and a Task Model. What is almost always provided with a PSLE is the substantive information that an assessment system needs to contain, in the form of answers to these questions: What constitutes problem-solving skill/knowledge in a domain, how can it be elicited, and how can learning be supported? Not necessarily part of a PSLE is the second main component of an assessment system, the evidentiary-reasoning component: how can hypotheses about a learner's knowledge state be conclusively grounded in observations of problem solving performance? To address this second component, systematic methods for evidentiary reasoning need to be employed. We have described one of these – Competency-based Knowledge Space Theory (CbKST) – by way of ProNIFA, and referred also to the Andes tutor to illustrate the use of Bayesian Belief Networks (BBN).

In this paper, we have talked about assessment mainly in the sense of assessment of learning (and for learning, when combined with a tutorial component, as in Andes). For the assessment of learning, it is a huge advantage when it can be embedded in the activity of problem solving, and can hence be unobtrusive. The still more common practice, in which teachers engage students with IT to practice a skill or practice the application of knowledge and afterwards let students do a test or quiz to gauge what has been learned, is problematic from a number of perspectives, amongst them the fact that such forms of multiple-choice assessment take away time from learning (Feng, Heffernan, & Koedinger, 2009) and that they lack face-validity (Williamson, Mislevy, & Bejar, 2006).


Applying the framework of evidence-centered assessment design (Mislevy, Steinberg, & Almond, 2003) to both assessment systems and PSLEs helped us to identify the many similarities regarding design steps and design artifacts (see Table 1 above). This encourages us to suggest to the educational technology community to more frequently integrate assessment components into problem solving and simulation environments (Mislevy, 2011), and to the educational assessment community to more frequently use problem solving and simulation environments as the assessment environment (Mislevy, Steinberg, Almond, Haertel, et al., 2003). Such a strategy would contribute considerably to moving 21st Century Learning forward, which is arguably largely held back by assessment concerns (Wilson et al., 2010). The strategy is also realistic since measurement methods, such as the ones introduced here with BBN and CbKST, have shareable algorithmic solutions (e.g., Millán et al., 2010). Furthermore, they are applicable to a wide range of learning and assessment domains, beyond well-structured domains such as mathematics and science learning. For instance, BBN has been applied to collaborative diagnostic medical reasoning (Suebnukarn & Haddawy, 2006), and CbKST to game-based learning (Kickmeier-Rust, Mattheiss, Steiner, & Albert, 2011).

The fact that the forms of diagnosis/assessment described here, as afforded by PSLEs, have the important advantage that they can be embedded in the PSLE (be conducted unobtrusively) and that they yield very specific information on students' learning does not imply that such assessments will be widely used by educators. For instance, teachers may find non-numeric student models, and student models that are multi-faceted instead of providing a single 'score', hard to interpret and difficult to relate to the single, numeric scores of student proficiency that come from high-stakes testing (McMillan, 2003; Parr & Timperley, 2008). More research on how teachers and other educational decision makers interpret and utilize information on students' learning that is multi-faceted and combines various notational systems in addition to numbers is urgently needed (Reimann & Bull, 2011). This is just one – but a quite important – aspect of the general discourse on integrated assessment systems, on consequential assessment, and on notions of test validity tied to the quality of decisions based on the test (Griffin, McGaw, & Esther, 2012).

In conclusion, while Jonassen's verdict of "assessment as the weakest link in learning to solve problems" is regrettably still valid, it is also a link where we can make rapid progress, by embedding computer-based diagnosis/assessment methods into computer-based problem solving learning environments. We hence feel encouraged to end on an optimistic note, also because the two methodological approaches introduced (BBN and CbKST) can help us to tackle the larger challenge Jonassen put forward: the need to assess problem solving in multiple formats, and on multiple dimensions.

Acknowledgements

The research reported here has in part been funded by the European Commission, Framework 7 Program.

References

Albanese, M. A., & Mitchell, S. (1993). Problem-based learning: a review of literature on its outcomes and implementation issues. Academic Medicine, 68(1), 52–81.
Albert, D., & Lukas, J. (Eds.). (1999). Knowledge spaces: Theories, empirical research and applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Anderson, J. R. (1982). Acquisition of cognitive skill. Psychological Review, 89, 369–406.
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy & Practice, 5(1), 7–73.
Brookhart, S. M. (2003). Developing measurement theory for classroom assessment purposes and uses. Educational Measurement: Issues and Practice, 22(4), 5–12.
Collins, A., Brown, J. S., & Newman, S. E. (1989). Cognitive apprenticeship: teaching the craft of reading, writing and mathematics. In L. B. Resnick (Ed.), Knowing, learning and instruction: Essays in honor of Robert Glaser (pp. 453–494). Hillsdale, NJ: Erlbaum.
Collins, L. M., & Lanza, S. T. (2010). Latent class and latent transition analysis for the social, behavioral and health sciences. New York: Wiley.
Conati, C., Gertner, A., & VanLehn, K. (2002). Using Bayesian networks to manage uncertainty in student modelling. User Modeling and User-Adapted Interaction, 12(4), 371–417.
Doignon, J. P. (1994). Probabilistic assessment of knowledge. In D. Albert (Ed.), Knowledge structures (pp. 1–56). Berlin: Springer.
Doignon, J. P., & Falmagne, J. C. (1985). Spaces for the assessment of knowledge. International Journal of Man-Machine Studies, 23, 175–196.
Doignon, J. P., & Falmagne, J. C. (1999). Knowledge spaces. Berlin: Springer.
Downing, S. M., & Haladyna, T. M. (Eds.). (2006). Handbook of test development. Mahwah, NJ: Lawrence Erlbaum.
Düntsch, I., & Gediga, G. (1995). Skills and knowledge spaces. British Journal of Mathematical and Statistical Psychology, 48, 9–27.
Düntsch, I., & Gediga, G. (1998). Knowledge spaces and their applications in CALL. In S. Jager, J. Nerbonne, & A. van Essen (Eds.), Language teaching and language technology (pp. 177–186). Lisse: Swets and Zeitlinger.
Feng, M., Heffernan, N., & Koedinger, K. R. (2009). Addressing the assessment challenge with an online system that tutors as it assesses. User Modeling and User-Adapted Interaction, 19(3), 243–266.
Griffin, P., McGaw, B., & Care, E. (Eds.). (2012). Assessment and teaching of 21st century skills. Heidelberg: Springer.
Jensen, F. V. (1996). An introduction to Bayesian networks. Berlin: Springer.
Jonassen, D. H. (1999). Designing constructivist learning environments. In C. M. Reigeluth (Ed.), Instructional design theories and models: A new paradigm of instructional theory, Vol. II (pp. 215–239). Mahwah, NJ: Lawrence Erlbaum Associates.
Jonassen, D. H. (2011). Learning to solve problems. New York: Routledge.
Kickmeier-Rust, M. D., Mattheiss, E., Steiner, C. M., & Albert, D. (2011). A psycho-pedagogical framework for multi-adaptive educational games. International Journal of Game-Based Learning, 1(1), 45–58.
Koedinger, K. R., & Corbett, A. (2006). Cognitive tutors. In R. K. Sawyer (Ed.), The Cambridge handbook of the learning sciences (pp. 61–77). New York: Cambridge University Press.
Korossy, K. (1997). Extending the theory of knowledge spaces: a competence-performance approach. Zeitschrift für Psychologie, 205, 53–82.
Korossy, K. (1999). Modelling knowledge as competence and performance. In D. Albert, & J. Lukas (Eds.), Knowledge spaces: Theories, empirical research and applications (pp. 103–132). Mahwah, NJ: Lawrence Erlbaum Associates.
McMillan, J. H. (2003). Understanding and improving teachers' classroom assessment decision making: implications for theory and practice. Educational Measurement: Issues and Practice, 34–43.
Merrill, M. D. (2002). First principles of instruction. Educational Technology Research and Development, 50(3), 43–59.
Millán, E., Loboda, T., & Pérez-de-la-Cruz, J. L. (2010). Bayesian networks for student model engineering. Computers & Education, 55, 1663–1683.
Mislevy, R. J. (2011). Evidence-centered design for simulation-based assessment (CRESST report 800). Los Angeles: The National Center for Research on Evaluation, Standards, and Student Testing, University of California.
Mislevy, R. J., & Riconscente, M. M. (2006). Evidence-centered assessment design. In S. M. Downing, & T. M. Haladyna (Eds.), Handbook of test development (pp. 61–90). Mahwah, NJ: Lawrence Erlbaum.
Mislevy, R. J., Steinberg, L., & Almond, R. G. (2003). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3–67.
Mislevy, R. J., Steinberg, L., Almond, R. G., Haertel, G. D., & Penuel, W. R. (2003). Leverage points for improving educational assessment (PADI technical report 2). Stanford, CA: SRI.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.
Parr, J. M., & Timperley, H. S. (2008). Teachers, schools and using evidence: considerations of preparedness. Assessment in Education: Principles, Policy & Practice, 15(1), 57–71.


Reimann, P., & Bull, S. (2011). Facilitating communication with open learner models: a semiotic engineering perspective. In G. Biswas, S. Bull, J. Kay, & A. Mitrovic (Eds.), Artificial intelligence in education (pp. 531–533). Heidelberg: Springer.

Reimann, P., Bull, S., Halb, W., & Johnson, M. (2011). Design of a computer-assisted assessment system for classroom formative assessment. Paper presented at the 4th special track on computer-based knowledge skill assessment and feedback in learning settings (CAF 2011) at the 14th International Conference on Interactive Collaborative Learning (ICL 2011), Piestany, Slovakia.

Reimann, P., Hesse, F., Avramides, K., Cierniak, G., & Vatrapu, R. (2012). Supporting teachers in capturing and analyzing learning data in the technology-rich classroom. In J. van Aalst, K. Thompson, M. J. Jacobson, & P. Reimann (Eds.), The future of learning: Proceedings of the 10th International Conference of the Learning Sciences (ICLS 2012), Vol. 2 (pp. 33–40). Sydney, NSW, Australia: International Society of the Learning Sciences, Short Papers, Symposia, and Abstracts.

Reimann, P., Kickmeier-Rust, M., Meissl-Egghart, G., Moe, E., & Utz, W. Specification of ECAAD methodology. Retrieved 10 September 2012 from http://www.next-tell.eu/wp-content/uploads/2011/04/NEXT-TELL-D2.1-MTO-Specification-of-ECAAD-Methodology.pdf

Suebnukarn, S., & Haddawy, P. (2006). Modeling individual and collaborative problem-solving in medical problem-based learning. User Modeling and User-Adapted Interaction, 16, 211–248.

Van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
VanLehn, K., Lynch, C., Schulz, K., Shapiro, J. A., Shelby, R., Taylor, L., et al. (2005). The Andes physics tutoring system: lessons learned. International Journal of Artificial Intelligence in Education, 15, 147–204.
Wiliam, D. (2010). An integrated summary of the research literature and implications for a new theory of formative assessment. In H. L. Andrade, & G. J. Cizek (Eds.), Handbook of formative assessment (pp. 18–40). New York: Routledge.
Williamson, D. M., Mislevy, R. J., & Bejar, I. I. (2006). Automated scoring of complex tasks in computer-based testing: an introduction. In D. M. Williamson, R. J. Mislevy, & I. I. Bejar (Eds.), Automated scoring of complex tasks in computer-based testing (pp. 1–14). Mahwah, NJ: Lawrence Erlbaum.
Wilson, M., Bejar, I. I., Scalise, K., Templin, J., Wiliam, D., & Irribardi, D. T. (2010). Perspectives on methodological issues (draft white paper, Assessment & Teaching of 21st Century Skills). Retrieved 10 September 2012 from http://atc21s.org/wp-content/uploads/2011/11/2-Methodological-Issues.pdf