
Using linear programming to solve clustered oversubscription planning problems for designing e-courses




Expert Systems with Applications 39 (2012) 5178–5188



Susana Fernández, Daniel Borrajo
Departamento de Informática, Universidad Carlos III de Madrid, Avda. de la Universidad, 30 Leganés, Madrid, Spain


Keywords: Automated planning; Application; e-Learning; Linear programming; Oversubscription

0957-4174/$ - see front matter © 2010 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2011.11.021

⇑ Corresponding author.
E-mail addresses: [email protected] (S. Fernández), [email protected] (D. Borrajo).
1 http://www.icaps-conference.org/index.php/Main/Competitions.

Abstract

The automatic generation of individualized plans in specific domains is an open problem that combines aspects related to automated planning, machine learning or recommendation systems technology. In this paper, we focus on a specific instance of that task: generating e-learning courses adapted to students' profiles within the automated planning paradigm. One of the open problems in this type of automated planning application relates to what is known as oversubscription: given a set of goals, each one with a utility, obtain a plan that achieves some (or all) of the goals, maximizing the utility as well as minimizing the cost of achieving those goals. In the generation of e-learning designs there is only one goal: generating a course design for a given student. However, in order to achieve the goal, the course design can include many different kinds of activities, each one with a utility (that depends on the student profile) and a cost. Furthermore, these activities are usually grouped into clusters, so that at least one of the activities in each cluster is needed, though many more can be used. Finally, there is also an overall cost threshold (usually in terms of student time). In this paper, we present our work on building an individualized e-learning design. We pose each course design as a variation of the oversubscription problem, which we call the clustered-oversubscription problem, and we use linear programming for assisting a planner to generate the design that best adapts to the student.

© 2010 Elsevier Ltd. All rights reserved.

1. Introduction

Automated planning is the branch of artificial intelligence that studies the computational synthesis of ordered sets of actions that perform a given task (Ghallab, Nau, & Traverso, 2004). A planner receives as input a collection of actions (that indicate how to modify the current state), a collection of goals to achieve, and an initial state. It then outputs a sequence of actions that achieves the goals from the initial state. Given that each action transforms the current state, planners may be viewed as searching for paths through the state space defined by the given actions. However, the search spaces can quickly become intractably large; indeed, the general problem of automated planning is PSPACE-complete (Bylander, 1994).

The evolution of automated planning can be seen from three different interrelated perspectives: planning techniques, expressiveness of planning languages, and applications. In relation to the first one, it is one of the most active areas, since very powerful domain-independent techniques have been devised in the last two decades. However, there is no such counterpart in the other two. The International Planning Competition (IPC1) has been one of the major drivers of this advancement, and the standard language, PDDL (Gerevini & Long, 2005), one of its bases. Although PDDL is gaining in expressiveness, very few planners can handle the full set of features that it defines. This is due mainly to the competition: if the competition does not focus on specific aspects of the planning task, then it is hard to see which technique is better for solving each aspect. Unfortunately, most real-world applications require the use of some features of PDDL that are not handled by most planners, as well as the use of some representation features that are not present in PDDL. So, in relation to the third perspective, when trying to solve different real-world applications, one is very restricted in the planners that can be used. Another obstacle to using current planners directly in applications is that most planners are still in an advanced prototype status. So, as we will see later, they either crash or fail on solving problems in domains that do not belong to the competition. This is probably due to small implementation errors, but it makes them unfeasible to use ‘‘as is’’ in current applications. Thus, there is still a gap between state-of-the-art planners and their direct application. This can be easily explained by the different emphases of the two communities: generating powerful search techniques that can handle some expressiveness, versus the polished implementations needed for ‘‘off-the-shelf’’ components in applications.

Despite these problems, the application of planning techniques to real problems sometimes requires solving interesting associated



problems that can be useful in more general contexts. This is the case of the work presented in this paper. The application area is the generation of learning designs adapted to students' profiles. And the associated problem is a variation of the oversubscription problem (OSP) that we have called clustered oversubscription. OSP was first introduced by Smith (2004); the objective of the planning process in the presence of goal oversubscription is not to find a feasible plan accomplishing all the goals, but to find a plan which reaches some of them, maximizing the sum of utilities of achieved goals. As Smith pointed out, many NASA planning problems are over-subscription problems. For example, space and airborne telescopes, such as Hubble, SIRTF, and SOFIA, receive many more observation requests than can be accommodated. As a result, only a small subset of the desirable requests can be accomplished during any given planning horizon. For a Mars rover mission, there are many science targets that the planetary geologists would like to visit. However, the rover can only visit a few such targets in any given command cycle because of time and energy limitations, and limitations on the rover's ability to track targets.

In this paper, we describe our work on a task that also presents a kind of oversubscription: automatic generation of e-learning courses. An e-learning task can be posed as: given a specific student, and a course defined by a pool of diverse learning activities, automatically generate a learning design for that student, using the given learning activities, that best suits the student profile. This domain has some interesting characteristics that resemble OSP: (1) each activity has a utility, and a time (cost) that it takes to complete it; (2) the utility that a given learning activity will provide to a student depends on some student features; (3) as a hard constraint, the time to execute all learning activities in the final plan must be bounded by a threshold (the total amount of time a student can dedicate to that course); (4) and, as with usual planning tasks, the activities present causal relationships among them (the student should follow some learning activities before others). Besides, it has an extra characteristic that gives rise to the clustered oversubscription problem: (5) each learning activity belongs to a given cluster, and the final design should have at least one activity of each cluster, potentially more. In clustered oversubscription, we perform some analysis before planning that selects a subset of the learning activities that is worth pursuing, given that they maximize utility (or have a high utility) and fulfill the constraint that the sum of their costs (time) is less than the cost limit (time threshold). The approach for performing this selection consists of posing the problem as a clustered-knapsack problem and using linear programming. Then, we translate this subset in two different ways, as preferences or as a metric.
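To make the selection step concrete, the following sketch solves a toy clustered-knapsack instance by exhaustive search. The activity names, utilities and times are invented for illustration, and the brute-force search stands in for the linear programming formulation that the paper actually uses:

```python
from itertools import chain, combinations

# Toy instance: each activity maps to (cluster, utility, time in minutes).
# Names and numbers are invented for illustration only.
activities = {
    "slides":   ("intro", 40, 20),
    "textbook": ("intro", 60, 50),
    "exercise": ("strips", 50, 30),
    "simulate": ("strips", 70, 60),
}
TIME_THRESHOLD = 100  # total time the student can devote to the course

def best_selection(acts, limit):
    """Max-utility subset that covers every cluster and fits the time limit."""
    clusters = {cluster for cluster, _, _ in acts.values()}
    names = list(acts)
    best, best_util = None, -1
    all_subsets = chain.from_iterable(
        combinations(names, k) for k in range(1, len(names) + 1))
    for subset in all_subsets:
        covered = {acts[n][0] for n in subset}         # clusters touched
        total_time = sum(acts[n][2] for n in subset)
        total_util = sum(acts[n][1] for n in subset)
        # at least one activity per cluster, within the threshold, max utility
        if covered == clusters and total_time <= limit and total_util > best_util:
            best, best_util = set(subset), total_util
    return best, best_util

selection, utility = best_selection(activities, TIME_THRESHOLD)
print(selection, utility)  # selects slides, textbook and exercise; utility 150
```

For realistic course sizes the number of subsets grows exponentially, which is why an LP/ILP formulation of the same constraints is the practical choice.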

Our work builds on previous pedagogical research that tested a valid way of measuring the utility each learning activity provides to the students (Baldiris et al., 2008). So, we focus on the problem of sequencing learning activities in a course adapted to every student's profile by trying to maximize the total utility. Other relevant work concerning the use of AI planning and scheduling techniques for sequencing learning activities, according to different student profiles and pedagogical theories, is reported in Castillo et al. (2009) and Ullrich and Melis (2009). However, those works did not focus on the goal of finding a feasible plan that maximizes some measure of utility by reaching a subset of the goals.

Our aim is that the proposed solution be general enough that it can be extrapolated to other domains. An example of a domain that has similar clustered-oversubscription problems is a variation of the Rovers domain. The rovers need to take several samples, each sample of a specific kind (cluster) of rock, and there are several rocks of each kind. Also, taking those samples takes some time, and the rovers have a specific time threshold to perform the activities. Other examples are earth-observing satellites (clusters are areas with the same information, such as big forests, mountains, or oceans) or the planning and optimization of resources in bus transportation networks (clusters are the different lines that compose the transportation network and the goal is optimizing their resources and minimizing costs).

The contribution of the current paper is twofold: modelling e-learning tasks as clustered-oversubscription planning problems; and providing a generic approach for solving the resulting problems, applicable in contexts other than e-learning applications as well. The remainder of the paper proceeds as follows. The next section introduces automated planning and its representation language, PDDL. Section 3 describes how we have modelled the e-learning application in PDDL. Section 4 explains the approach that we have devised for solving the clustered-oversubscription problem by performing an action-selection pre-processing, using linear programming, to help the planning task. Section 5 reports the experiments performed to test the validity of the approach. Section 6 describes the related work. Finally, the last section draws the conclusions and outlines our future work.

2. Automated planning

In general, two elements typically define a planning task: a domain definition comprised of a set of states and a set of actions; and a problem description composed of the initial state and a set of goals that the planner must achieve. Here, a state description includes a collection of grounded predicates and functions, while an action includes the parameters or typed elements involved in its execution, a set of preconditions (a list of predicates describing the facts that must hold for the action to apply), and a set of effects (a list of changes to the state that follow from the action). The planning task consists of obtaining a partially ordered sequence of actions that achieve all of the goals when executed from the initial state. Each action modifies the current state by adding and deleting the predicates represented in the domain-action effects. For example, the Rovers domain (inspired by planetary rovers problems) requires that a collection of rovers navigate a planet surface, finding samples and communicating them back to a lander. There are three types of samples, named soil, rock and image. The domain includes nine actions:

- (navigate ?x - rover ?y - waypoint ?z - waypoint) navigates a rover from one point to another. Both points must be visible and traversable.
- (sample_soil ?x - rover ?s - store ?p - waypoint) stores a soil sample in a rover that is located in a specified point.
- (sample_rock ?x - rover ?s - store ?p - waypoint) stores a rock sample in a rover that is located in a specified point.
- (drop ?x - rover ?y - store) empties a rover so that it can collect a new sample.
- (calibrate ?r - rover ?i - camera ?t - objective ?w - waypoint) calibrates a rover's camera using a specified calibration target which is visible from the current rover position.
- (take_image ?r - rover ?p - waypoint ?o - objective ?i - camera ?m - mode) takes an image of an objective in a specified mode. The objective is located in a point visible from the current rover position.
- (communicate_soil_data ?r - rover ?l - lander ?p - waypoint ?x - waypoint ?y - waypoint) communicates soil data to the lander.
- (communicate_rock_data ?r - rover ?l - lander ?p - waypoint ?x - waypoint ?y - waypoint) communicates rock data to the lander.
- (communicate_image_data ?r - rover ?l - lander ?o - objective ?m - mode ?x - waypoint ?y - waypoint) communicates an image to the lander.

2 The full standard can be found in the IMS Global Learning Consortium web page: http://www.imsglobal.org.
3 http://www.reload.ac.uk/.

Here, the question marks indicate the predicate variables, which the planner must bind to domain constants to create specific action instances. An example of a domain action is:

    (:action communicate_image_data
      :parameters (?r - rover ?l - lander ?o - objective
                   ?m - mode ?x - waypoint ?y - waypoint)
      :precondition (and (at ?r ?x) (at_lander ?l ?y)
                         (have_image ?r ?o ?m) (visible ?x ?y)
                         (available ?r) (channel_free ?l))
      :effect (and (not (available ?r)) (not (channel_free ?l))
                   (channel_free ?l)
                   (communicated_image_data ?o ?m)
                   (available ?r)))

A problem example could be given by the following initial state and goals:

    (:init (available rover0) (on_board camera0 rover0)
           (calibration_target camera0 objective1)
           (supports camera0 colour)
           (equipped_for_imaging rover0)
           (channel_free general) (at rover0 waypoint3)
           (visible_from objective1 waypoint2)
           (at_lander general waypoint0)
           (visible waypoint3 waypoint0)
           (visible waypoint2 waypoint3)
           (visible waypoint3 waypoint2)
           (can_traverse rover0 waypoint3 waypoint2)
           (can_traverse rover0 waypoint2 waypoint3))

    (:goal (and (communicated_image_data objective1 colour)))

And a solution plan for that problem is: (navigate rover0 waypoint3 waypoint2) (calibrate rover0 camera0 objective1) (take_image rover0 waypoint2 objective1 camera0 colour) (navigate rover0 waypoint2 waypoint3) (communicate_image_data rover0 general objective1 colour waypoint3 waypoint0).

Automated planning tasks are usually described in predicate logic (although other languages, like description logic or temporal logic, have also been used). Currently, the Planning Domain Definition Language (PDDL) is the standard representation language of automated planning (Gerevini & Long, 2005). PDDL was created as the planning input language for the planning competition with the aim of standardizing the planning representation language and facilitating comparative analysis of the diverse planning systems. The representation language of the examples above is PDDL.

The PDDL language has evolved from its initial STRIPS formulation of planning problems through several language extensions. Next, we only explain the PDDL enhancements relevant for our work. The first ones, defined in the ADL extension (Pednault, 1989), are for expressing actions with negative preconditions and conditional effects. Negative preconditions are represented by adding not before the precondition, while a conditional effect has the form: (when <condition_formula> <effect_formula>). The semantics of conditional effects is that the specified effect takes place only if the specified condition formula is true in the state where the action is executed. The second enhancement, defined in PDDL2.1 (Fox & Long, 2002), is for handling numeric values and plan metrics that allow reasoning about plan quality. It allows users to define functions (fluents) that can be used in actions' preconditions or effects (as if they were predicates). Their initial values are given in the problem file and actions can increase or decrease them. For example, we can define a fluent for representing the total cost of a plan. Its value is initially set to zero and actions increment it according to the cost of executing the action. Planners then try to find a plan that includes the actions that minimize the overall cost. Finally, PDDL3.0 extends PDDL to express (among other things) soft goals, which are desired goals that a valid plan does not have to necessarily achieve (Gerevini & Long, 2005). They can be used to express preferences that affect the plan quality without restricting the set of valid plans. In general, not all the specified preferences can be satisfied, and identifying the best subset of preferences that can be achieved is an extra difficulty to deal with in the planning process. The problem file contains the soft goals within the usual goals. A soft goal or preference has the form: (preference <id> <goal>), where <id> is the preference identifier and <goal> is an instantiated domain predicate.
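For concreteness, a problem fragment combining a fluent-based metric (PDDL2.1) and a soft goal (PDDL3.0) might look as follows; this fragment is illustrative and not taken from the paper's domains:

```pddl
;; Illustrative problem fragment (names are hypothetical).
(:init (= (total-cost) 0))
(:goal (and (communicated_image_data objective1 colour)))
(:constraints
  ;; soft goal: a valid plan need not achieve it
  (preference p1 (communicated_rock_data waypoint2)))
;; penalize violating the preference while minimizing accumulated cost
(:metric minimize (+ (total-cost) (* 10 (is-violated p1))))
```

Here (is-violated p1) evaluates to 1 if the preference is not satisfied at the end of the plan, so the metric trades off plan cost against the weight assigned to the preference.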

3. Modelling the planning task in PDDL

The first step for building an e-learning planning application consists of modelling the courses. There are many theories, methodologies and tools that help in doing it (Cristea & Garzotto, 2004; Jochems, Koper, & Merrienboer, 2003). The most extended approach uses e-learning standards, such as IMS-MD, where the designer of the course defines each different learning activity as an XML schema named learning object.2 Besides, there are tools, such as Reload,3 that help in creating these schemas. For example, in the definition of an AI course, there may be a generic task for reading the introduction to planning. And there could be several learning objects to accomplish it, such as viewing a slide presentation, reading the introduction text from the text book, searching for the concept of automated planning in the Web and reading a couple of pages, or seeing a graph about automated planning history. All the activities that achieve the task can be seen as belonging to the task cluster. Usually, it is enough for the student to follow only one of them to accomplish the task, but more than one can be performed as well, improving the overall utility (learning the subject). As an instance of the application of planning to this task, the work reported in Garrido et al. (in press) proposes three approaches for automatic compilations that interpret and translate IMS-MD courses to planning models.

Also, each activity can be more or less appropriate to each student depending on the student profile (activity utility for a specific student). So, the second modelling step is to determine this profile, given that it will define part of the knowledge of the planning problem. Again, there are many theories about it, such as Felder's learning styles (Felder, 1996). The learning styles model developed by Richard Felder incorporates four dimensions: the Perception dimension (sensitive/intuitive), the Processing dimension (active/reflective), the Input dimension (visual/verbal) and the Understanding dimension (sequential/global). From the planning point of view, the relevant issue is that each learning activity can be labelled with a utility value that depends on the particular characteristics of each student. Each activity also takes a time to fulfill it. And learning activities can have dependency relations among them. For instance, before reading about PDDL, the student should have some knowledge of predicate logic.

Once someone has defined a course in some e-learning standard, we use a compiler that translates it into PDDL. We define one domain per course. The input to the planner in the e-learning



Table 1
Classification of the learning styles according to the learning source type. A value of 2 means the learning source type is very good for the learning style, 1 is good and 0 indifferent. Source: Baldiris et al. (2008).

L. source type     | Processing         | Perception          | Understanding      | Input
                   | Active  Reflective | Intuitive Sensitive | Sequential  Global | Visual  Verbal
Lecture            |   0        2       |    2         0      |    0          0    |   0       2
Narrative text     |   1        2       |    2         0      |    0          0    |   0       2
Slide              |   0        1       |    1         1      |    1          2    |   2       1
Table              |   0        0       |    0         1      |    1          1    |   1       1
Index              |   0        0       |    0         0      |    1          2    |   0       1
Diagram            |   0        1       |    1         1      |    1          2    |   2       0
Figure             |   0        1       |    1         1      |    1          2    |   2       0
Graph              |   0        1       |    1         1      |    1          2    |   2       0
Exercise           |   1        1       |    2         1      |    1          0    |   1       2
Simulation         |   2        0       |    0         2      |    0          1    |   2       0
Experiment         |   2        1       |    1         2      |    1          0    |   2       0
Questionnaire      |   1        0       |    0         0      |    1          0    |   0       0
Problem statement  |   2        1       |    1         2      |    0          0    |   1       2
Selfassessment     |   1        0       |    0         0      |    1          0    |   0       0
Exam               |   1        0       |    0         0      |    1          0    |   0       0


application is a set of learning activities defined in the IMS-MD standard (domain) and a student profile according to Felder's learning styles (problem). The compiler translates the IMS-MD set of activities into a PDDL domain, defining one action per learning activity. And the student profile is represented as predicates included in the PDDL problem. We use one predicate for each of Felder's learning styles. For example, (sequential ?s - student ?p - <profile_level_type>). The <profile_level_type> can take the value strong, moderate or balanced. If the system determines that the student is, for example, strong sensitive and strong active, we would add in the :init of the PDDL problem the predicates (active student1 strong) and (sensitive student1 strong).

The following sections describe in detail how to obtain learning designs adapted to the student profile, the compilation of an IMS-MD course into a PDDL domain and the planning task for generating the designs.

3.1. Student profile adaptation

Each learning object defined in an IMS-MD course belongs to a learning resource type, such as a lecture, narrative text, diagram, etc. According to education experts, the learning resource type highly interacts with the student profile. For instance, a lecture is very recommendable for Felder verbal students but not for visual ones; and just the opposite holds for a diagram. So, by modelling both the learning resource type of each learning object and the student learning style, and including a reward function (that represents the activity utility), we can offer full support for learning object adaptation to the students. Particularly, we use conditional effects in the domain actions (Section 2 explains the conditional effects in PDDL) to encode in one action all the profile adaptation options. For each learning style we add a conditional effect that increases the reward function in a given amount.

In order to define these conditional effects, we used the table defined in Baldiris et al. (2008), where rows represent learning source types, columns represent Felder's learning styles, and intersections can take the values: very good, good or indifferent. Table 1 shows this table, where 2 is equivalent to very good, 1 to good and 0 to indifferent. We take the row of the learning source type defined in the learning object and add a conditional effect for each column with a value different from indifferent. So, we add (when (<style> ?s strong) (increase (reward_student ?s) <v>)), where <style> is the learning style represented in the column and <v> is the reward increment. The fluent (reward_student ?s) represents the reward function. To compute these increments, we assume each activity can have a maximum reward of 120, and that a very good value earns double the reward of a good value, while an indifferent value has zero reward.

For each learning source type in a row, we sum the number of columns with a value different from indifferent, d. The reward increase for each learning style in that row will be 120/d for very good and half of that for good. For example, the index row in the table takes the very good value for the global column, good for sequential and verbal, and indifferent for the rest of the columns (i.e. there are three columns with a value different from indifferent). Thus, the reward when the student is strong global will be 120/3 = 40 and half of it when the student is strong sequential or verbal. So the conditional effects of the corresponding PDDL action would be:

    (when (global ?s strong) (increase (reward_student ?s) 40))
    (when (sequential ?s strong) (increase (reward_student ?s) 20))
    (when (verbal ?s strong) (increase (reward_student ?s) 20))
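The 120/d computation just described can be captured in a few lines of Python; the function name and data layout here are illustrative, not taken from the paper's implementation:

```python
MAX_REWARD = 120  # maximum reward any single activity can provide (per the text)

def reward_increments(row):
    """Reward increment per learning style for one Table 1 row.

    Scores: 2 = very good, 1 = good, 0 = indifferent.
    """
    d = sum(1 for score in row.values() if score != 0)  # non-indifferent columns
    if d == 0:
        return {}
    very_good = MAX_REWARD / d  # a 'good' cell earns half of a 'very good' cell
    return {style: very_good if score == 2 else very_good / 2
            for style, score in row.items() if score != 0}

# The "index" row of Table 1: very good for global, good for sequential and verbal.
index_row = {"active": 0, "reflective": 0, "intuitive": 0, "sensitive": 0,
             "sequential": 1, "global": 2, "visual": 0, "verbal": 1}
print(reward_increments(index_row))  # {'sequential': 20.0, 'global': 40.0, 'verbal': 20.0}
```

These values are exactly the increments used in the conditional effects above (40 for strong global, 20 for strong sequential or verbal).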

3.2. Domain compilation

As we said, an IMS-MD course is composed of a set of learning objects. Each learning object represents a learning activity that a compiler translates into a PDDL action in the following way:

- The XML label <title> represents the action name.
- We define a predicate with the same action name, but adjoining the prefix task_ and the suffix _done. The action effects include this literal, which represents the fact that the student has performed such activity and prevents him/her from repeating it.
- The XML label <typicallearningtime> represents the activity duration. We use a fluent, (total_time_student ?s), that the action effects increase in the amount of this label. This fluent represents the total_time function that computes the overall time the student requires to fulfill the generated learning design.
- The XML label <learningsourcetype> represents the activity source type. As explained above, we have used a fluent, (reward_student ?s), to represent the reward function (action utility), and conditional effects to compress in the action all the effects that depend on the student learning style.
- The XML label <relation>. We only use in this paper two of the several types of relations (among learning objects) defined in IMS-MD: Requires and IsBasedOn. Requires represents a causal dependency between two activities and IsBasedOn represents that the activity can be performed by doing one of the activities in the relation. In the case of the Requires elements, all of them (conjunctively) have to be completed before initiating a new learning object: if 'A Requires B' and 'A Requires C', both B and C need to be finished before doing A. And the learning object that represents A will have a Requires relation with two elements, B and C. In the case of the IsBasedOn elements, at least one of them (disjunctively) has to be completed: if 'A IsBasedOn B' and 'A IsBasedOn C', only B or C must be completed to complete A. We translate each activity in a Requires relation into an action precondition, while activities in an IsBasedOn relation are translated as or preconditions. In fact, we consider a learning object with an IsBasedOn relation as a fictitious action, because the student has to perform only one of the actions in the or condition. So it neither consumes time nor reports extra reward to the student (the actions in the or condition will do that).

Figs. 1 and 2 show PDDL actions translated from learning objects with relations of type Requires and IsBasedOn, respectively. The first action describes the activity simulates-strips-problem. It requires that the student has already performed the activity reads-classical-planning, it takes 30 min, and it adds the corresponding rewards. The learning source type is simulation, which is very good for strong active, sensitive and visual students and good for strong global students, generating the conditional effects (the when predicates). We assume all activities provide some positive reward to the student, so we add a minimum of five to the total reward (effect (increase (reward_student ?s) 5)). We add the precondition (not (task_simulates-strips-problem_done ?s)) to avoid including the same activity twice in the plan. The second action represents that a student could perform the activity simulates-strips-problem or experiments-strips-problem to accomplish the task task_strips_done. Again, we add the precondition (not (task_strips_done ?s)) to avoid including the same activity twice in the plan. We translate the activities in the IsBasedOn relation into or preconditions (adjoining the prefix task_ and the suffix _done). Finally, both the reward and the total_time functions remain the same, because the actions in the or preconditions will increase their values.

The last learning object defined in the IMS-MD design is named finish-course, and it contains, as a Requires relation, the tasks required to fulfill the course. We translate this learning object into a fictitious PDDL action with one effect, (task_fictitious-finish-course_done ?s), and its preconditions are the tasks required to complete the course, plus the predicates (< (total_time_student ?s) (time_threshold_student ?s)), to prevent the plan from exceeding the time limit, and (> (total_reward_student ?s) (reward_threshold_student ?s)). The problem file defines both thresholds: (time_threshold_student ?s) is the total time the student can devote to the course (it is an input datum), and the activities-selection pre-processing allows our system to compute a lower bound of the reward threshold, as explained in Section 4. The problem file has only the goal (task_fictitious-finish-course_done ?s). Fig. 3 displays an example of fictitious-finish-course for an AI course. In this case, the tasks required to fulfill the course are the ones displayed in the precondition. Notice that only a subset of the tasks in the course is encoded here. The other ones are included as elements of Requires relations representing the causal dependencies among tasks. For example, before learning multi-agent search the student must know heuristic search, which previously requires knowing non-informed search, and so on. In the same way, the action includes the last task to fulfill each topic, which is usually performing a questionnaire to evaluate the student's knowledge of the subject.

Fig. 1. Example of PDDL action translated from a learning object with a Requires relation.

3.3. Planning task

This representation allows any planner supporting the full ADL extension (including conditional effects) and fluents to find a solution. However, there is no guarantee of obtaining the best solution, i.e. the sequence of actions that reports the maximum reward to the student. We could use minimizing the total_time as the plan metric, but then the planners would not maximize the reward. On the other hand, a metric for maximizing the reward is unfeasible for planners based on the enhanced relaxed-plan heuristic introduced in Metric-FF (Hoffmann, 2003), as they only work with minimization tasks. We cannot easily transform the metric for maximizing the reward into a metric for minimizing the inverse using state-independent fluents, due to how rewards are computed in this domain. The recommendation table contains information on whether the learning resource type is good, very good or indifferent for each learning style. But we cannot assert anything about the inverse. For instance, it is not true that if a lecture is very good for a reflective student it is very bad for a non-reflective student; it could be that for a non-reflective student the lecture is indifferent. So, we do not know which values to assign to those cases not in the table. Other equivalent transformations would require state-dependent fluents for computing the metric. Thus, instead of giving the planners the task of minimizing a function of time and the inverse of reward, we first run an activity (action) selection algorithm to select the most promising activities, O = {ai}, and then use that information when planning.


Fig. 2. Example of PDDL action translated from a learning object with an IsBasedOn relation.

Fig. 3. Example of PDDL action translated from a learning object finish-course representing an AI course.


Although there is no exact transformation of the metric for maximizing the reward into a metric for minimizing its inverse, we can use an approximation. We define a penalty_student fluent that increases by two units each time the plan includes an indifferent action, by one unit when it includes a good action, and that carries no penalty for very good actions. This way, we guide the planner to include very good actions in the plan, i.e. actions that provide higher rewards to the student. Conditional effects, similar to the ones used to increase the reward_student, can model this metric. We have performed some experiments to compare both approximations, as reported in Section 5.

4. Activities selection

This section describes the method for obtaining the set of actions that maximizes the student reward within a time limit, using linear programming. First, we formalize the problem, and then we describe the approach in detail.

4.1. Problem formalization

The problem is similar to the well-known knapsack problem4 in combinatorial optimization, but with the addition of clusters. We have a set of activities, A, each with a time t (duration, cost) and a reward r (utility): ∀a ∈ A, a = <t, r>. The goal is to determine the set of activities to include in the output, O = {a1, ..., an}, ai ∈ A, so that the total time is less than a given limit T:

∑_{ai = <ti, ri> ∈ O} ti ≤ T

and the total reward is maximized. Besides, activities are grouped into a set of clusters, C = {c1, ..., cm}, ci = {a1, ..., a_ci}, that can perform the same learning task. The solution must contain at least one activity of each cluster: ∀ci ∈ C, at least one aj ∈ ci should be in O. We automatically extract this maximization task from a PDDL domain and problem file. Domain actions translated from learning objects with an IsBasedOn relation define the clusters. The activity reward is computed from the problem file (extracting the student learning styles) and the conditional effects concerning the fluent

(reward_student). The activity time is the increase of the fluent (total_time_student) in the corresponding PDDL action.

4 Given a set of items, each with a weight and a value, determine the items to include in a collection so that the total weight is less than a given limit and the total value is as large as possible.

For example, consider three tasks or clusters c1, c2 and c3, a time limit of 140, and the following learning activities: (1) a1 ∈ c1, with duration 10 and reward 30; (2) a2 ∈ c1, with duration 10 and reward 10; (3) a3 ∈ c1, with duration 10 and reward 25; (4) a4 ∈ c2, with duration 60 and reward 5; and (5) a5 ∈ c3, with duration 60 and reward 5. A feasible solution for that problem could include activities a1, a4 and a5: there is at least one activity belonging to each cluster, the overall duration is 130 (less than or equal to the time limit), and the total reward is 40. However, the optimal solution includes activities a1, a3, a4 and a5, because the overall utility is higher (65) and the overall duration (140) is still less than or equal to the time limit.
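This toy instance is small enough to check by exhaustive enumeration. The sketch below is for illustration only (the encoding of activities as tuples is ours; the paper solves the selection with linear programming, as described next):

```python
from itertools import combinations

# Activities of the example: name -> (cluster, duration, reward).
activities = {
    "a1": ("c1", 10, 30), "a2": ("c1", 10, 10), "a3": ("c1", 10, 25),
    "a4": ("c2", 60, 5),  "a5": ("c3", 60, 5),
}
clusters = {c for c, _, _ in activities.values()}
T = 140  # time limit

def feasible(subset):
    """A subset is feasible if it covers every cluster within the limit."""
    covered = {activities[a][0] for a in subset}
    duration = sum(activities[a][1] for a in subset)
    return covered == clusters and duration <= T

# Enumerate all subsets and keep the feasible one with maximum reward.
best = max(
    (s for k in range(len(activities) + 1)
       for s in combinations(activities, k) if feasible(s)),
    key=lambda s: sum(activities[a][2] for a in s),
)
print(sorted(best))                         # ['a1', 'a3', 'a4', 'a5']
print(sum(activities[a][2] for a in best))  # 65
```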

4.2. Linear programming

The linear programming (LP) task takes as inputs: the set of activities A; the set of clusters or tasks C; the time ti and reward ri of each activity; a binary matrix c_{i,j} representing whether activity i belongs to task j; the time limit T; and one binary variable xi for each activity representing whether activity i is in the solution set, O. The objective function consists of maximizing

∑_i xi · ri

subject to two linear inequality constraints:

∑_i xi · ti ≤ T

for the time limit, and, ∀j,

∑_i c_{i,j} · xi ≥ 1

to ensure that at least one activity is present per cluster (task) j. This last constraint is the new one with respect to standard knapsack problems. An important property is that LP algorithms guarantee optimality. Another alternative would have been modelling the action preconditions as linear equality and inequality constraints. However, this is still an open issue, and it is not trivial due to the disjunctive relations among some preconditions. Therefore, a complete formulation of the whole problem in LP or CSP (Constraint Satisfaction Problems) is difficult. Here, we opt for a hybrid approach in which LP solves the LP component of the task, and planning solves the causal component of the task.

4.3. Using the selection when planning

We have devised two approaches for using the knowledge about the most promising activities when planning: including the actions in O as PDDL3 preference-goals (see soft goals in Section 2), or including those actions in the plan metric. In the former, we add


Fig. 4. Total reward (y-axis) vs. time increments in % (x-axis) obtained by the EHC, BFS Time, BFS Penalty, BFS LP and BFS LP-TH configurations for the Artificial Intelligence course; one panel each for an active student, an active and global student, and a reflective student.


the goal (preference pi (task_ai_done student1)) for each ai ∈ O (pi is the preference identifier). In the latter, we define a new predicate (action-in-plan ?s - student ?a - plan-action) and a new fluent (penalty ?s - student). We update each domain action ai with the following conditional effect (<action-name> being its name):

(when (not (action-in-plan ?s <action-name>)) (increase (penalty ?s) 1))

Thus, every time the planner adds an activity to the plan that does not belong to the selected set, it increases the penalty value by one unit. Then, we add a predicate (action-in-plan student1 ai-name) in the problem file for each ai ∈ O, and we use the metric (minimize (penalty student1)). Thus, the planner will only try to minimize the number of activities in the plan that were not previously selected. So, O contains the knowledge for maximizing the student reward, assuming there are no causal dependencies between activities.
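The net effect of this metric can be paraphrased in a few lines. This is an illustrative sketch only (the helper name penalty is ours); during search, the planner accumulates the same count through the conditional effects shown above:

```python
def penalty(plan, selected):
    """Number of actions in the plan that the pre-process did not
    select; this is the quantity the (penalty ?s) fluent accumulates
    through the conditional effects added to each domain action."""
    return sum(1 for action in plan if action not in selected)

# Hypothetical selected set O and candidate plan:
selected = {"reads-classical-planning", "simulates-strips-problem"}
plan = ["reads-classical-planning", "experiments-strips-problem",
        "simulates-strips-problem"]
print(penalty(plan, selected))  # 1: experiments-strips-problem was not in O
```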

As a useful side-effect, the action selection method also computes the maximum reward, MaxR, that could be obtained for the given time limit in case there were no activity dependencies:

MaxR = ∑_{ai = <ti, ri> ∈ O} ri.   (1)

On the other hand, the penalty metric represents the number of activities that appear in the plan but were not selected by the selection mechanism (due to the causal dependencies). Therefore, we can compute a lower bound value for the reward threshold necessary for the precondition (> (total_reward_student ?s) (reward_threshold_student ?s)) of the last domain action. We use Eq. (2):

reward_threshold_student = MaxR − (120 × penalty),   (2)

where 120 is the highest individual activity reward.
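Eqs. (1) and (2) amount to the following computation, sketched here with illustrative values (the helper name and the (time, reward) encoding are our own; 120 is the highest individual activity reward in these courses):

```python
def reward_threshold(selected, penalty, max_activity_reward=120):
    """Lower bound for (reward_threshold_student ?s).

    selected: (time, reward) pairs of the activities chosen by the
    pre-process (the set O). MaxR (Eq. 1) is the reward achievable if
    there were no causal dependencies; each of the `penalty` unselected
    actions forced into the plan can displace at most
    max_activity_reward reward, hence the slack in Eq. 2.
    """
    max_r = sum(r for _, r in selected)           # Eq. (1)
    return max_r - max_activity_reward * penalty  # Eq. (2)

# Selected activities of the Section 4.1 example, no forced extras:
print(reward_threshold([(10, 30), (10, 25), (60, 5), (60, 5)], penalty=0))
```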

5. Experiments

We have carried out the experiments with a domain representing a real Artificial Intelligence course. The domain has 126 actions, and there are 53 tasks or clusters. Each task includes between one and six activities. There are activities covering all the typical tasks in an AI course, such as reading a subject, practising, programming, performing exams, etc. We have also defined another three courses, subsets of the AI course, of different sizes: the Search course, with 76 actions and 26 clusters; the Representation course, with 24 actions and 10 clusters; and the Others subjects in AI course, with 20 actions and 14 clusters. Finally, we have considered two different student profiles, active and reflective, according to Felder's taxonomy, and a third profile mixing an active and global student to obtain a higher variation within the generated plans.

After compiling the IMS-MD courses, we have defined the corresponding planning problems. All the compilation processes took only a few seconds. We have used the CBP planner (Fuentetaja, Borrajo, & Linares, 2009) to incorporate the obtained knowledge as the plan metric. The CBP planner is based on Metric-FF, but with some enhancements (a specific cost-based heuristic function, lookahead search) that allow it to solve our


Fig. 5. Total reward (y-axis) vs. time increments in % (x-axis) obtained by the EHC, BFS Time, BFS Penalty, BFS LP and BFS LP-TH configurations for the Search course; one panel each for an active student, an active and global student, and a reflective student.


problems using plan metrics. Metric-FF was able to solve the problems, but only with the Enforced Hill-Climbing (EHC) algorithm. The Best First Search (BFS) algorithm, which was needed for using plan metrics, exhausted computer memory. The other planners that competed in IPC-08 could not solve our problems. We believe that this might be due to the domain size and the PDDL features included in the domain (fluents, numerical expressions in the preconditions, preconditions with or and not expressions, and conditional effects).

We have used SGPlan5.2.2 (Hsu & Wah, 2008) to incorporate the knowledge obtained by the LP as PDDL3 preference-goals. Again, SGPlan is the only planner (from IPC-08) that supports PDDL3 and was able to solve our problems. However, the solutions generated by SGPlan ignored the time-limit constraint, producing unfeasible plans.

Therefore, we compared the following configurations:

• EHC: the original Enforced Hill-Climbing algorithm in Metric-FF,
• BFS TIME: the CBP planner with the Best First Search algorithm, minimizing total_time_student as plan metric,
• BFS PENALTY: the CBP planner with the Best First Search algorithm, minimizing penalty_student as plan metric,
• BFS LP: the CBP planner with the BFS algorithm, incorporating the knowledge obtained by the LP algorithm and minimizing penalty as plan metric,
• BFS LP-TH: the same as above, but also using the reward_threshold.

CBP is able to find multiple solutions. So, we let it run for one second in multiple-solutions mode, and it usually found one or two solutions. We have experimentally seen that increasing this running time does not make it find more solutions, but it exhausts computer memory. Similarly, we let simplex run for up to 1800 s to solve each LP problem.

We varied the student time limit by decreasing/increasing it by 20, 15, 10 and 5% of the sum of the times of the highest-time activity in each cluster; that is, we decrease/increase it from −20% to +20%. Figs. 4–7 show the reward results for the different domains. We have included only the best (highest-reward) solution for each configuration when the planner found multiple solutions. The Simplex algorithm solved all the LP problems in less than one second, except in some isolated cases where it took longer (exactly, 1501.8, 296.6, 258.9, 90.2, 50.6, 48.2, 7.9, 4, 1.8 and 1.3 s). There was only one case where it did not find the solution after the 1800 s: the AI course, reflective student, and time limit −15%.

The results show that the combination of the two kinds of information that the pre-process offers us (computing the most promising activities and the reward threshold) brings a significant benefit to the solution plans, especially when the time threshold is high. Without the reward threshold, the planner stops the search when the plan includes one activity per cluster, even though there is enough time for adding more learning activities that could reinforce the student's knowledge. The greater the variety of the action utilities, the higher the benefit of the pre-processing, as the active and global student graphs show. The size of the course also has an influence: the biggest differences among the tested systems appear in the first courses, which have more activities and clusters. In the Others course, which only has 20 actions and 14 clusters, the pre-processing does not improve the solutions of the penalty_student metric. In general, the base planners found reasonable solutions


Fig. 6. Total reward (y-axis) vs. time increments in % (x-axis) obtained by the EHC, BFS Time, BFS Penalty, BFS LP and BFS LP-TH configurations for the Representation course; one panel each for an active student, an active and global student, and a reflective student.


except in some problems when the time limit was small. The penalty_student metric obtained solutions similar to those of the BFS LP approach, but there were quite a few problems where it could not find a solution.

6. Related work

There are several classes of oversubscription problems, depending on the cause preventing all the goals from being accomplished. Our oversubscription problem is caused by having limited resources (time): plans achieving all goals would need more resources than the available ones. It is known as a Max-cost problem, focusing on maximizing plan utility while keeping the cost lower than a maximum cost (García-Olaya, de la Rosa, & Borrajo, 2008). This is opposite to Net-benefit problems, which focus on maximizing the net benefit, i.e. the cumulative utility of the achieved goals minus the cumulative cost of the actions used in the plan (Nigenda et al., 2005). In our case, time is the resource, and performing all the activities defined in the course would achieve the problem goals. But not all goals can be achieved because of lack of time. The plan must select only those goals that report the maximum utility (reward) to the student without exceeding the given time threshold. However, our problem differs from other approaches to OSP in the following aspects:

(i) In OSP problems, the utilities are associated with goals, while in our case they are associated with actions. Since each domain action in our approach only adds one literal, it is possible to transfer utility values from goals to actions.

(ii) OSP problems are further complicated by the fact that the goals have both cost and utility interactions, because the achievement of one goal may make it cheaper or more expensive to achieve another goal, and may also make it more or less useful to achieve another goal (Yoon, Benton, & Kambhampati, 2008); our problem lacks this interaction.

(iii) Our problem adds the clustering constraint, whereby a feasible plan must contain at least one activity of each cluster.

OSP approaches can be classified into those that solve a relaxed version of the original OSP problem at each search node (Benton, van den Briel, & Kambhampati, 2007; Do, Benton, van den Briel, & Kambhampati, 2007) and the planners that first select goals up-front and then solve for those goals as a classical planning problem (Nigenda et al., 2005; Smith, 2004). The first ones develop effective heuristics that take cost and utility interaction into account: Benton et al. (2007) provide an admissible heuristic based on linear programming, and Do et al. (2007) an informative heuristic based on integer programming. They also implement a planner, IPUD, that compiles partial satisfaction planning (PSP) with utility dependencies to integer programming. Goal selection procedures vary: in Smith (2004) an orienteering problem (a generalization of the travelling salesperson problem (Feillet, Dejax, & Gendreau, 2005)) is constructed and solved, obtaining an ordered subset of goals to plan for. A different approach is taken by Nigenda et al. (2005), which uses relaxed-plan heuristics to estimate the net benefit of including a new goal in the set of currently selected goals. We also select goals


Fig. 7. Total reward (y-axis) vs. time increments in % (x-axis) obtained by the EHC, BFS Time, BFS Penalty, BFS LP and BFS LP-TH configurations for the Others course; one panel each for an active student, an active and global student, and a reflective student.


up-front, but due to the differences (i) and (ii) pointed out above, instead of constructing an orienteering problem we construct a clustered-knapsack problem. In Smith (2004) a beam search algorithm is used to solve the orienteering problem, while we use linear programming to solve the clustered-knapsack problem.

7. Conclusions and future work

This paper presents our work on building an e-learning planning application for generating learning designs adapted to different students' profiles. An e-learning course is defined by a set of different learning activities that use learning objects. Learning activities, with their duration, their dependence on the student profile, and the relations defined in their metadata, bear a strong resemblance to actions as traditionally used in AI planning domains. In particular, each learning activity can simply be modelled as an action, its dependency relations as preconditions, and its outcomes as effects. This way, P&S techniques can be very powerful to automatically generate sequences of learning activities fully adapted to the students. However, current planners are good at sequencing actions, but they are not so good at optimizing. Optimizing requires using fluents and plan metrics, which notably increases planning complexity. We are interested in finding the sequence that maximizes the student's net benefit. Therefore, we have modelled the problem as a clustered-oversubscription problem, performing an action-selection pre-processing step to help the planning task. We have proposed an action-selection scheme based on linear programming that partially solves the optimization problem and whose output is compiled into the planning task. It turns out that, for this task, LP provides optimal solutions in a more than reasonable time. PDDL provides expressive power to model the problem, and to include the optimization knowledge in two different formats. The resulting PDDL domain is apparently simple, because there are no delete effects, but the complexity arises from the number of actions, the conditional effects, and the plan metric. Only the SGPlan and CBP planners were able to solve the problem, but SGPlan gave unfeasible solutions because it violated the time threshold constraint.

Results show that the learning designs obtained with our approach report greater benefits to the students with regard to the overall utility of the proposed learning activities. The benefits are especially relevant when the time threshold is high and the course size is big. A base planner, without the information provided by our clustered-oversubscription pre-processing, stops the search when the plan includes one activity per cluster, even though there was enough time for adding more learning activities that could reinforce the student's knowledge. Our approach, however, continues including activities in the design until the reward threshold is exceeded. On the other hand, when the time limit is small, a base planner has difficulties finding a solution, which our pre-processing can overcome.

Another alternative would have been modelling the action preconditions as linear equality and inequality constraints. However, this is still an open issue, and it is not trivial how to do it, due to the disjunctive relations among some preconditions. Therefore, a complete formulation of the whole problem as LP or CSP tasks is difficult. Here, we opt for a hybrid approach in which LP solves the LP component of the task, and planning solves the causality-related component of the task.



In the future, we would like to test our approach in other domains, such as the clustered Rovers and Satellite ones, as well as in other application areas; for example, the planning and optimization of resources in bus transportation networks.

Acknowledgement

This work has been partially supported by the Spanish MICINN under projects TIN2008-06701-C03-03 and TIN2005-08945-C06-05.

References

Baldiris, S., Santos, O., Barrera, C., Gonzalez, J., Velez, J., & Fabregat, R. (2008). Integration of educational specifications and standards to support adaptive learning scenarios in AdaptaPlan. Special Issue on New Trends on AI Techniques for Educational Technologies. International Journal of Computer Science and Applications (IJCSA).

Benton, J., van den Briel, M., & Kambhampati, S. (2007). A hybrid linear programming and relaxed plan heuristic for partial satisfaction planning problems. In Proceedings of the 17th international conference on Automated Planning & Scheduling (ICAPS-07) (pp. 34–41).

Bylander, T. (1994). The computational complexity of propositional STRIPS planning. Artificial Intelligence, 69(1–2), 165–204.

Castillo, L., Morales, L., González-Ferrer, A., Fdez-Olivares, J., Borrajo, D., & Onaindía, E. (2009). Automatic generation of temporal planning domains for e-learning problems. Journal of Scheduling, 13(4), 347–362.

Cristea, A., & Garzotto, F. (2004). ADAPT major design dimensions for educational adaptive hypermedia. In ED-Media '04 conference on educational multimedia, hypermedia & telecommunications. Association for the Advancement of Computing in Education.

Do, M. B., Benton, J., van den Briel, M., & Kambhampati, S. (2007). Planning with goal utility dependencies. In Proceedings of the 20th international joint conference on Artificial Intelligence (IJCAI-07) (pp. 1872–1878). Hyderabad, India.

Feillet, D., Dejax, P., & Gendreau, M. (2005). Traveling salesman problems with profits. Transportation Science, 39(2), 188–205.

Felder, R. M. (1996). Matters of style. ASEE Prism, 6(4), 18–23.

Fox, M., & Long, D. (2002). PDDL2.1: An extension to PDDL for expressing temporal planning domains. Durham, UK: University of Durham.

Fuentetaja, R., Borrajo, D., & Linares, C. (2009). A look-ahead B&B search for cost-based planning. In Proceedings of CAEPIA'09 (pp. 105–114).

García-Olaya, A., de la Rosa, T., & Borrajo, D. (2008). A distance measure between goals for oversubscription planning. In Proceedings of ICAPS'08 workshop on Oversubscribed Planning & Scheduling. Sydney, Australia.

Garrido, A., Fernández, S., Morales, L., Onaindía, E., Borrajo, D., & Castillo, L. (2010). On the automatic compilation of e-learning models to planning. The Knowledge Engineering Review, in press.

Gerevini, A., & Long, D. (2005). Plan constraints and preferences in PDDL3 – The language of the fifth international planning competition. Tech. rep.

Ghallab, M., Nau, D., & Traverso, P. (2004). Automated planning – Theory and practice. San Francisco, CA: Morgan Kaufmann.

Hoffmann, J. (2003). The Metric-FF planning system: Translating "ignoring delete lists" to numeric state variables. JAIR, 20, 291–341.

Hsu, C.-W., & Wah, B. W. (2008). The SGPlan planning system in IPC-6. In Working notes of the ICAPS'08 International Planning Competition.

Jochems, W., Koper, R., & van Merrienboer, J. (Eds.). (2003). Integrated e-learning: Pedagogy, technology and organization. London: Kogan Page.

Pednault, E. (1989). ADL: Exploring the middle ground between STRIPS and the situation calculus. In Proceedings of the conference on Knowledge Representation and Reasoning (pp. 324–332).

Sanchez Nigenda, R., & Kambhampati, S. (2005). Planning graph heuristics for selecting objectives in over-subscription planning problems. In Proceedings of ICAPS-05 (pp. 192–201).

Smith, D. (2004). Choosing objectives in over-subscription planning. In Proceedings of ICAPS-04 (pp. 393–401).

Ullrich, C., & Melis, E. (2009). Pedagogically founded courseware generation based on HTN-planning. Expert Systems With Applications, 36(5), 9319–9332.

Yoon, S. W., Benton, J., & Kambhampati, S. (2008). An online learning method for improving over-subscription planning. In ICAPS (pp. 404–411).