Problem Order Implications for Learning Transfer


Nan Li, William Cohen, and Kenneth Koedinger
School of Computer Science, Carnegie Mellon University

Order of Problems
- One of the most important variables that affects learning effectiveness
- Blocked order vs. interleaved order

- Most existing textbooks use a blocked order
- Numerous previous studies: interleaved is better! Why?

Notes: Provide a theoretical explanation for why interleaving enhances learning.

- Practice: most existing textbooks organize problems in a blocked order
- Theory: interleaved order > blocked order
- Why? A computational model that demonstrates such behavior

Need for Better Theory
- Studies
  - Contextual interference (CI) effect (Shea and Morgan, 1979)
  - Mixed results on complex tasks or with novices
- Hypotheses
  - Elaboration hypothesis (Shea and Morgan, 1979)
  - Forgetting or reconstruction hypothesis (Lee and Magill, 1983)
- Proposed approach
  - A controlled simulation study using a machine-learning agent, SimStudent
  - Given problems in blocked or interleaved orders
  - A precise implementation

- Easier to inspect SimStudent's learning processes and outcomes
- Existing hypotheses lack the precision of a computational theory

Notes: According to the elaboration hypothesis, random practice leads to more distinctive and elaborate memory representations than does blocked practice because participants use multiple and variable information-processing strategies. Since the different tasks to be learned reside together in working memory, they can be compared during practice (which is not possible under blocked conditions), increasing the level of distinctiveness. Also, the use of different encoding strategies supposedly leads to a more elaborate memorial representation than does the impoverished encoding under blocked conditions. The more distinctive and elaborate representation of the skill after random practice is assumed to be responsible for the more effective retention and transfer performance.

According to the reconstruction hypothesis (Lee & Magill, 1983, 1985), on the other hand, the CI created by random practice leads to forgetting of the action plan, or motor program (Magill & Hall, 1990), owing to the interference of the interspersed tasks. Random practice, therefore, necessitates repeated reconstructions of the motor program that are not necessary under blocked practice conditions, since the motor program is already in working memory. The repeated action-plan reconstructions in random practice are supposed to be responsible for the learning advantages, as compared with blocked practice.

A Brief Review of SimStudent
- A learning agent that acquires production rules from examples and problem-solving experience
- Given a perceptual representation, a set of feature predicates, and a set of operator functions

(Matsuda et al., CogSci-09)

Notes: The image shows one of the current applications of SimStudent, as a Teachable Agent whereby algebra students learn by teaching SimStudent. It has also been applied to other domains.

http://www.youtube.com/watch?v=LbjLBRjzTsI&feature=channel_video_title

SimStudent Learns Production Rules
Skill "divide" (e.g., -3x = 6)

- Retrieval path: left side (-3x), right side (6)
- Precondition: left side (-3x) does not have a constant term

- => Function sequence: get the coefficient (-3) of the left side (-3x), then divide both sides by the coefficient

Notes: The actual production rule is in LISP format (SimStudent with strong operators). Perhaps better to indicate that just has-constant is the prior knowledge given to SimStudent; the -3x and the "not" are part of what gets learned, not the prior knowledge.

SimStudent Learns Production Rules (continued)
Skill "divide" (e.g., -3x = 6), annotated with the prior knowledge involved:

- Retrieval path: left side (-3x), right side (6)
- Precondition: left side (-3x) does not have a constant term

- => Function sequence: get the coefficient (-3) of the left side (-3x), then divide both sides by the coefficient

Prior knowledge behind each part:

- Retrieval path: perceptual representation
- Precondition: feature predicates, e.g., (not (has-constant -3x))
- => Function sequence: operator functions, e.g., (coefficient -3x), (divide -3)
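The anatomy of the learned rule can be pictured with a small data sketch. The following is an illustrative Python rendering, not SimStudent's actual LISP production rule; the two helper functions merely stand in for the has-constant feature predicate and the coefficient operator function.

```python
# Hypothetical Python rendering of the "divide" production rule. SimStudent's
# real rules are LISP productions; the helpers below merely stand in for the
# has-constant feature predicate and the coefficient operator function.
import re

def has_constant(side: str) -> bool:
    """Feature predicate: True if the side has a constant term, e.g. '2x+5'."""
    return bool(re.search(r"[+-]\s*\d+$", side))

def coefficient(term: str) -> int:
    """Operator function: extract the coefficient of a term, e.g. '-3x' -> -3."""
    head = re.match(r"([+-]?\d*)x", term).group(1)
    return int(head) if head not in ("", "+", "-") else int(head + "1")

divide_rule = {
    "skill": "divide",
    "retrieval_path": ["left side", "right side"],            # percepts: -3x and 6
    "precondition": lambda left: not has_constant(left),      # (not (has-constant ?left))
    "function_sequence": lambda left: f"divide {coefficient(left)}",
}

left, right = "-3x", "6"
if divide_rule["precondition"](left):
    print(divide_rule["function_sequence"](left))   # -> "divide -3", the next step
```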

Retrieval Path Learner
- A perceptual learner: finds paths to identify useful information (percepts) in the GUI
- Generalizes from specific to general, e.g., Cell 21 -> Cell 2? -> Cell ??
- Keeps the most specific path that covers all of the training percepts (sketched below)

- Example retrieval path: left side (-3x), right side (6)
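A minimal sketch of the specific-to-general generalization just described. The character-wise wildcarding scheme below is an assumption made for illustration and is not SimStudent's actual path representation.

```python
# Hedged sketch of specific-to-general path generalization: keep the most
# specific pattern that still covers every training percept. Percept names
# like "Cell 21" follow the slide's example; the '?' wildcarding is invented.
def generalize(paths: list[str]) -> str:
    """Positions where all paths agree stay specific; differing positions become '?'."""
    pattern = list(paths[0])
    for path in paths[1:]:
        for i, ch in enumerate(path):
            if i < len(pattern) and pattern[i] != ch:
                pattern[i] = "?"
    return "".join(pattern)

print(generalize(["Cell 21"]))              # 'Cell 21'  (fully specific)
print(generalize(["Cell 21", "Cell 23"]))   # 'Cell 2?'  (column generalized)
print(generalize(["Cell 21", "Cell 35"]))   # 'Cell ??'  (row and column generalized)
```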

Notes: Any row is also possible. You should point out how the rule learned from the first example will not generalize to the second example, but after the second example it will generalize to ??the cells just above the target cell?? [or whatever it is]. Say also: in other circumstances, SimStudent might need to look more than one row up (e.g., in a geometry proof).

Precondition Learner
- A feature test learner: acquires the precondition of the production rule
- Given a set of feature predicates: boolean functions that describe relations among objects, e.g., (has-coefficient -3x), (has-constant 2x+5)
- Utilizes FOIL (Quinlan, 1990)
- Input: positive and negative examples based on the percepts

  - E.g., positive: …, negative: …
- Output: a set of feature tests that describe the desired situation to fire the production rule, e.g., (not (has-constant ?percept1))
- Different problem orders -> different intermediate production rules -> incorrect rule applications -> different negative feedback
- Resulting precondition: left side (-3x) does not have a constant term

Notes: This is the learning mechanism that will be involved in the theoretical explanation for interleaving. Different problem sequences may yield different negative examples. Not the deep features.
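The sketch below imitates the precondition learner in toy form: given positive and negative example states and a couple of hand-written feature predicates, it greedily keeps (possibly negated) feature tests that hold for all positives and rule out negatives. It is a simplification in the spirit of FOIL, not Quinlan's algorithm or SimStudent's implementation.

```python
# Toy, FOIL-flavored precondition learning over hand-written feature predicates.
def has_constant(side):     # e.g. '2x+5' -> True, '-3x' -> False
    return any(c in "+-" for c in side[1:])

def has_coefficient(side):  # e.g. '-3x' -> True, 'x+5' -> False
    return side.lstrip("+-")[0].isdigit()

PREDICATES = {"has-constant": has_constant, "has-coefficient": has_coefficient}

def learn_precondition(positives, negatives):
    """Greedily collect literals true for all positives and false for some negatives."""
    tests, remaining_neg = [], list(negatives)
    for name, pred in PREDICATES.items():
        for sign in (True, False):                     # the literal or its negation
            lit = lambda x, p=pred, s=sign: p(x) == s
            if all(lit(p) for p in positives) and any(not lit(n) for n in remaining_neg):
                tests.append(name if sign else f"(not ({name}))")
                remaining_neg = [n for n in remaining_neg if lit(n)]
    return tests

# '-3x' is a state where firing "divide" is correct; '2x+5' is one where it is not.
print(learn_precondition(positives=["-3x"], negatives=["2x+5"]))
# -> ['(not (has-constant))']   (cf. (not (has-constant ?percept1)) on the slide)
```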

Function Sequence Learner
- An operator function sequence learner: acquires a sequence of operator functions to apply in producing the next step
- Given a set of operator functions, e.g., (coefficient -3x), (add-term 5x-5 5)
- Input: a set of records, Ri = …
- Output: a sequence of operator functions, op = (op1, op2, ..., opk), that explains all records
- E.g.:

  (bind ?coef (coefficient ?percepts1)), (bind ?step (divide ?coef))
- Learned function sequence: get the coefficient (-3) of the left side (-3x), then divide both sides by the coefficient
  (Figure: operator trace (coefficient -3x) -> -3, then (divide -3))
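Below is a hedged, brute-force illustration of the function-sequence learner: it enumerates short chains of operator functions and keeps one whose composition reproduces the observed next step for every record. The two operator implementations and the record format are assumptions made for the example, not SimStudent's search procedure.

```python
# Brute-force search for an operator-function chain that explains all records.
from itertools import product

def coefficient(term):          # (coefficient -3x) -> -3
    head = term[: term.index("x")]
    return int(head) if head not in ("", "+", "-") else int(head + "1")

def divide(value):              # (divide -3) -> the step "divide -3"
    return f"divide {value}"

OPERATORS = {"coefficient": coefficient, "divide": divide}

def explains(op_names, records):
    """True if composing the operators maps each record's input to its output."""
    for inputs, expected in records:
        value = inputs
        try:
            for name in op_names:
                value = OPERATORS[name](value)
        except Exception:
            return False
        if value != expected:
            return False
    return True

def learn_sequence(records, max_len=3):
    for length in range(1, max_len + 1):
        for op_names in product(OPERATORS, repeat=length):
            if explains(op_names, records):
                return list(op_names)
    return None

records = [("-3x", "divide -3"), ("4x", "divide 4")]
print(learn_sequence(records))   # -> ['coefficient', 'divide']
```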

Domain-specific Prior Knowledge
Skill "divide" (e.g., -3x = 6):

- Retrieval path: left side (-3x), right side (6)
- Precondition: left side (-3x) does not have a constant term
- => Function sequence: get the coefficient (-3) of the left side (-3x), then divide both sides by the coefficient

Operator functions:
- Domain-general: basic skills used across multiple domains, often known by human students, e.g., (add 1 2), (copy -3x)
- Domain-specific: more complicated skills that may not be known by human students, e.g., (coefficient -3x)
- Can we reduce the need for domain-specific operator functions?

Notes: Add-term subsumes add. Getting the coefficient is extracting deep features. Can we automatically learn those features?

An Example in Algebra

Notes: We can model this as: fast learner extracts -3 (the deep feature), slow learner extracts 3.

Underlying Structural Knowledge
- Fast learner: signed number; slow learner: minus sign + number

Representation Learning as Induction of a Probabilistic Context-Free Grammar (pCFG)
- Underlying structure in the problem -> grammar
- Feature -> non-terminal symbol in a grammar rule, e.g., Expression -> SignedNumber Variable (probability 1.0)
- Representation -> parse tree
- Representation learning -> grammar induction
- Student errors -> incorrect parsing
(Li, Cohen & Koedinger, ITS-10; cf. Langley and Stromsten, 2000; Stolcke, 1994; VanLehn and Ball, 1987; Wolff, 1982)

Learning Problem
- Input: a set of feature recognition records, each consisting of
  - an original problem (e.g., -3x)
  - the feature to be recognized (e.g., -3 in -3x)
- Output:
  - a probabilistic context-free grammar (pCFG)
  - a non-terminal symbol in a grammar rule that represents the target feature
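A small data sketch of that input/output pairing. The specific grammar rules and probabilities below are invented for illustration; in the actual system they are learned from the records rather than hand-written.

```python
# Input: feature recognition records (problem text plus the feature within it).
feature_recognition_records = [
    {"problem": "-3x", "feature": "-3"},   # recognize the coefficient inside -3x
    {"problem": "4x",  "feature": "4"},
]

# Output: a pCFG, written here as {LHS: [(probability, RHS), ...]}, together
# with the non-terminal that represents the target feature. These rules and
# probabilities are hand-written stand-ins for what the learner would produce.
pcfg = {
    "Expression":   [(1.0, ["SignedNumber", "Variable"])],
    "SignedNumber": [(0.5, ["MinusSign", "Number"]), (0.5, ["Number"])],
    "Variable":     [(1.0, ["x"])],
    "MinusSign":    [(1.0, ["-"])],
    "Number":       [(0.5, ["3"]), (0.5, ["4"])],
}
feature_symbol = "SignedNumber"   # the non-terminal that represents the feature
```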

A Two-Step pCFG Learning Algorithm
- Greedy Structure Hypothesizer:
  - hypothesizes grammar rules in a bottom-up fashion
  - creates non-terminal symbols for frequently occurring sequences, e.g., "-" and 3, SignedNumber and Variable
- Viterbi Training Phase:
  - refines rule probabilities
  - sequences that occur more frequently -> higher probabilities
  - generalizes the Inside-Outside Algorithm (Lari & Young, 1990)
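A deliberately simplified sketch of the first step: here the Greedy Structure Hypothesizer is approximated by repeatedly merging the most frequent adjacent symbol pair into a new non-terminal (a byte-pair-encoding-style stand-in). The Viterbi training phase, which re-estimates rule probabilities from the most probable parses, is omitted.

```python
# Approximation of bottom-up structure hypothesizing via frequent-pair merging.
from collections import Counter

def greedy_structure_hypothesizer(sequences, n_merges=2):
    """Repeatedly merge the most frequent adjacent symbol pair into a new
    non-terminal and record the corresponding grammar rule."""
    rules = []
    seqs = [list(s) for s in sequences]
    for k in range(n_merges):
        pairs = Counter((a, b) for s in seqs for a, b in zip(s, s[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        new_sym = f"N{k}"                          # e.g. N0 ~ SignedNumber, N1 ~ Expression
        rules.append((new_sym, [a, b]))
        for i, s in enumerate(seqs):               # rewrite sequences with the new symbol
            out, j = [], 0
            while j < len(s):
                if j + 1 < len(s) and (s[j], s[j + 1]) == (a, b):
                    out.append(new_sym)
                    j += 2
                else:
                    out.append(s[j])
                    j += 1
            seqs[i] = out
    return rules

print(greedy_structure_hypothesizer(["-3x", "-3", "-4x"]))
# -> [('N0', ['-', '3']), ('N1', ['N0', 'x'])]: '-' and '3' are joined first,
#    then the signed number and the variable, mirroring the slide's example.
```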

Notes: Sequences of symbols that occur more often together (e.g., "-" and 3) are more likely to be joined under a non-terminal (a higher-level node) in the grammar (like SignedNumber in the grammar tree image). Then, with time, non-terminals that occur more often together are more likely to be joined under a higher-level non-terminal (like Expression in the image).

Feature Learning
- Build the most probable parse trees for all observation sequences
- Select the non-terminal symbol that matches the most training records as the target feature
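A toy rendering of that selection step. The per-record parse spans below are hand-written stand-ins for the spans that a learned grammar's most probable parses would yield.

```python
# Pick the non-terminal whose parse span matches the annotated feature most often.
from collections import Counter

records = [
    {"problem": "-3x", "feature": "-3",
     "parse_spans": {"SignedNumber": "-3", "Variable": "x", "Expression": "-3x"}},
    {"problem": "-4x", "feature": "-4",
     "parse_spans": {"SignedNumber": "-4", "Variable": "x", "Expression": "-4x"}},
]

votes = Counter(sym for r in records
                for sym, text in r["parse_spans"].items() if text == r["feature"])
print(votes.most_common(1)[0][0])   # -> 'SignedNumber' is chosen as the feature symbol
```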

Transfer Learning Using Prior Knowledge
- GSH phase: build parse trees based on previously acquired grammar rules, then call the original GSH
- Viterbi training: add rule frequencies from the previous task to the current task
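A minimal sketch of the frequency carry-over, with invented counts: rule-use counts from the previous task are added to those from the current task before probabilities are re-normalized.

```python
# Combine rule-use counts across tasks, then renormalize into probabilities.
from collections import Counter

previous_task = Counter({("SignedNumber", ("-", "Number")): 8,
                         ("SignedNumber", ("Number",)): 4})
current_task  = Counter({("SignedNumber", ("-", "Number")): 1,
                         ("SignedNumber", ("Number",)): 2})

combined = previous_task + current_task
total = sum(combined.values())
probabilities = {rule: count / total for rule, count in combined.items()}
print(probabilities)
# Prior experience keeps '- Number' the more probable expansion of SignedNumber.
```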

(Figure: example rule probabilities 0.66, 0.33, 0.5, 0.5)

Effective Learning Using a Bracketing Constraint
- Force the grammar to generate a feature symbol
- Learn a subgrammar for the feature
- Learn a grammar for the whole trace
- Combine the two grammars
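One way to picture the bracketing constraint, assuming the feature span is marked with brackets in the input trace (the bracket notation and the helper are illustrative, not the system's actual interface): the bracketed part feeds the feature subgrammar, the trace with that part collapsed to the feature symbol feeds the grammar for the whole trace, and the two rule sets are then combined.

```python
# Split a bracketed trace into the feature span and the collapsed outer trace.
def split_bracketed(trace, feature_symbol="SignedNumber"):
    start, end = trace.index("["), trace.index("]")
    feature_part = trace[start + 1:end]                          # subgrammar input: '-3'
    outer_part = list(trace[:start]) + [feature_symbol] + list(trace[end + 1:])
    return feature_part, outer_part

print(split_bracketed("[-3]x"))   # -> ('-3', ['SignedNumber', 'x'])
```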

Integrating Representation Learning into SimStudent
- How could representation learning affect the performance of an intelligent agent?
- Can we reduce the need for domain-specific operator functions?
- Will we get a better model of human students?
- How can we integrate the acquired representation into SimStudent?
(Li, Cohen & Koedinger, ITS-12)

Example of Production Rules Before and After Integration
Extend the retrieval path in the production rule.
Original: skill "divide" (e.g., -3x = 6)
- Retrieval path: left side (-3x), right side (6)
- Precondition: left side (-3x) does not have a constant term