A Machine Learning Approach for Automatic Student Model Discovery

Nan Li, Noboru Matsuda, William Cohen, and Kenneth Koedinger. Computer Science Department, Carnegie Mellon University.

Student Model

  • A set of knowledge components (KCs)
  • Encoded in intelligent tutors to model how students solve problems
  • Example: what to do next on problems like 3x=12
  • A key factor behind instructional decisions in automated tutoring systems

Student Model Construction

  • Traditional methods
      • Structured interviews
      • Think-aloud protocols
      • Rational analysis
      • Require expert input. Highly subjective.
  • Previous automated methods
      • Learning factor analysis (LFA)
      • Within the search space of human-provided factors.
  • Proposed approach
      • Use a machine-learning agent, SimStudent, to acquire knowledge
      • 1 production rule acquired => 1 KC in student model (Q matrix)
      • Independent of human-provided factors.

A Brief Review of SimStudent

  • A machine-learning agent that acquires production rules from examples and problem-solving experience, given a set of feature predicates and functions

Note: The image shows one of the current applications of SimStudent, as a Teachable Agent whereby algebra students learn by teaching SimStudent.

Production Rules

Skill divide (e.g. -3x = 6)
  What:
    Left side (-3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Get-coefficient (-3) of left side (-3x)
    Divide both sides with the coefficient

  • Each production rule is associated with one KC
  • Each step (-3x = 6) is labeled with one KC, decided by the production applied to that step
  • The original model required strong domain-specific operators, like Get-coefficient
  • It does not capture important distinctions in learning (e.g., -x=3 vs -3x = 6)

Deep Feature Learning

  • Expert vs novice (Chi et al., 1981)
  • Example: What's the coefficient of -3x?
      • Expert uses deep functional features to reply -3
      • Novice may use shallow perceptual features to reply 3
  • Model deep feature learning using machine learning techniques
  • Integrate acquired knowledge into SimStudent learning
  • Remove dependence on strong operators and split KCs into finer grain sizes

Feature Recognition as PCFG Induction

  • Underlying structure in the problem -> Grammar
  • Feature -> Non-terminal symbol in a grammar rule
  • Feature learning task -> Grammar induction
  • Student errors -> Incorrect parsing

Learning Problem

  • Input is a set of feature recognition records consisting of
      • An original problem (e.g. -3x)
      • The feature to be recognized (e.g. -3 in -3x)
  • Output
      • A probabilistic context-free grammar (PCFG)
      • A non-terminal symbol in a grammar rule that represents the target feature

A Two-Step PCFG Learning Algorithm

  • Greedy Structure Hypothesizer:
      • Hypothesizes grammar rules in a bottom-up fashion
      • Creates non-terminal symbols for frequently occurring sequences
      • E.g. - and 3, SignedNumber and Variable
  • Viterbi Training Phase:
      • Refines rule probabilities
      • Occur more frequently -> higher probabilities
  • Generalizes Inside-Outside Algorithm (Lari & Young, 1990)
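The bottom-up step of the greedy structure hypothesizer can be illustrated with a minimal Python sketch. It assumes the simplest possible variant: repeatedly find the most frequent adjacent pair of symbols and replace it with a new non-terminal. The function name `hypothesize_rules`, the `min_count` threshold, and the generated names `NT1`, `NT2` are hypothetical; the slides do not give the actual algorithm's details, so this is not the authors' implementation.

```python
from collections import Counter

def hypothesize_rules(sequences, min_count=2):
    """Greedily merge the most frequent adjacent symbol pair into a new non-terminal.

    Simplified illustration only. Returns (rules, rewritten_sequences).
    """
    seqs = [list(s) for s in sequences]
    rules = {}  # new non-terminal -> (left symbol, right symbol)
    while True:
        # Count every adjacent pair across all sequences.
        pairs = Counter()
        for s in seqs:
            pairs.update(zip(s, s[1:]))
        if not pairs:
            break
        (left, right), count = pairs.most_common(1)[0]
        if count < min_count:
            break
        name = f"NT{len(rules) + 1}"
        rules[name] = (left, right)
        # Rewrite every sequence, replacing the pair with the new symbol.
        for s in seqs:
            i = 0
            while i < len(s) - 1:
                if s[i] == left and s[i + 1] == right:
                    s[i:i + 2] = [name]
                i += 1
    return rules, seqs

# Toy token sequences (made up for illustration).
records = [["-", "3", "x"], ["-", "3", "x"], ["-", "3", "y"], ["5", "x"]]
rules, rewritten = hypothesize_rules(records)
print(rules)
print(rewritten)
```

In this toy run the pair `-`, `3` is merged first, so `NT1` plays the role that SignedNumber plays on the slide; the second merge joins `NT1` with `x` into a higher-level symbol.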

Note: Sequences of symbols that occur together more often (e.g., - and 3) are more likely to be joined under a non-terminal (a higher-level node) in the grammar (like SignedNumber in the grammar tree image). Then, over time, non-terminals that occur together more often are more likely to be joined under a higher-level non-terminal (like Expression in the image).

Example of Production Rules Before and After Integration

  • Extend the "What" part in the production rule

Original:
Skill divide (e.g. -3x = 6)
  What:
    Left side (-3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Get coefficient (-3) of left side (-3x)
    Divide both sides with the coefficient (-3)

Extended:
Skill divide (e.g. -3x = 6)
  What:
    Left side (-3, -3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  =>
  How:
    Get coefficient (-3) of left side (-3x)
    Divide both sides with the coefficient (-3)

  • Fewer operators
  • Eliminate need for domain-specific operators

Experiment Method

  • SimStudent vs. human-generated model
  • Code real student data
      • 71 students used a Carnegie Learning Algebra I Tutor on equation solving
  • SimStudent:
      • Tutored by a Carnegie Learning Algebra I Tutor
      • Coded each step by the applicable production rule
      • Used human-generated coding in case of no applicable production
  • Human-generated model:
      • Coded manually based on expertise

Human-generated vs SimStudent KCs

                                           Human-generated   SimStudent   Comment
                                           Model
  Total # of KCs                           12                21
  # of Basic Arithmetic Operation KCs      4                 13           Split into finer grain sizes based on different problem forms
  # of Typein KCs                          4                 4            Approximately the same
  # of Other Transformation Operation KCs  4                 4            Approximately the same
    (e.g. combine like terms)

How well two models fit with real student data

  • Used Additive Factor Model (AFM)
  • An instance of logistic regression that
      • Uses each student, each KC, and the KC-by-opportunity interaction as independent variables
      • To predict the probability of a student making an error on a specific step
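The AFM described above can be sketched in a few lines of Python. This assumes the commonly published AFM form, in which the logit of a correct response is the student intercept plus, for each KC on the step, a KC intercept and a KC slope times the number of prior practice opportunities; the error probability is then one minus the probability of a correct response. The function name, parameter names, and numeric values are made up for illustration and are not fitted values from the study.

```python
import math

def afm_error_probability(theta, betas, gammas, step_kcs, opportunities):
    """Probability of an error on a step under an AFM-style logistic model.

    theta          intercept for this student
    betas[kc]      intercept (easiness) of each KC
    gammas[kc]     slope (learning rate) of each KC
    step_kcs       KCs the step is labeled with (the step's Q-matrix row)
    opportunities  opportunities[kc] = prior practice opportunities on that KC
    """
    logit = theta + sum(
        betas[kc] + gammas[kc] * opportunities.get(kc, 0) for kc in step_kcs
    )
    p_correct = 1.0 / (1.0 + math.exp(-logit))
    return 1.0 - p_correct

# Hypothetical parameter values, chosen only to show the shape of the model.
betas = {"simSt-divide": 0.5}
gammas = {"simSt-divide": 0.2}
first = afm_error_probability(0.0, betas, gammas, ["simSt-divide"], {"simSt-divide": 0})
later = afm_error_probability(0.0, betas, gammas, ["simSt-divide"], {"simSt-divide": 5})
print(first, later)
```

With a positive slope, the predicted error probability falls as the student accumulates opportunities on the KC, which is why the choice of KC labels changes how well the model fits.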

An Example of Split in Division

  • Human-generated model
      • divide: Ax=B & -x=A
  • SimStudent
      • simSt-divide: Ax=B
      • simSt-divide-1: -x=A

                    Ax=B             -x=A
  divide            1 1 1 1 1 1 1    1 1 1
  simSt-divide      1 1 1 1 1 1 1    0 0 0
  simSt-divide-1    0 0 0 0 0 0 0    1 1 1

Production Rules for Division

Skill simSt-divide (e.g. -3x = 6)
  What:
    Left side (-3, -3x)
    Right side (6)
  When:
    Left side (-3x) does not have constant term
  How:
    Divide both sides with the coefficient (-3)

Skill simSt-divide-1 (e.g. -x = 3)
  What:
    Left side (-x)
    Right side (3)
  When:
    Left side (-x) is of the form -v
  How:
    Generate one (1)
    Divide both sides with -1
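The step coding in the division example can be reproduced with a small Python sketch. It assumes ten division steps, seven of the form Ax=B followed by three of the form -x=A, which is one reading of the 0/1 rows on the slide. The function names `human_label`, `simstudent_label`, and `q_matrix` are hypothetical.

```python
def human_label(step_form):
    # Human-generated model: a single KC covers both problem forms.
    return "divide"

def simstudent_label(step_form):
    # SimStudent model: the label follows the production rule that applies.
    return "simSt-divide-1" if step_form == "-x=A" else "simSt-divide"

def q_matrix(step_forms, label_fn):
    """Return {kc: [0/1 per step]} built from a step-labeling function."""
    labels = [label_fn(form) for form in step_forms]
    return {kc: [int(label == kc) for label in labels] for kc in sorted(set(labels))}

# Assumed step order: seven Ax=B steps, then three -x=A steps.
steps = ["Ax=B"] * 7 + ["-x=A"] * 3
human_q = q_matrix(steps, human_label)
sim_q = q_matrix(steps, simstudent_label)
print(human_q)
print(sim_q)
```

Each row of the resulting dictionaries corresponds to one row of the 0/1 table: the human-generated model has one all-ones row, while the SimStudent model splits the same steps across two KCs.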

An Example without Split in Divide Typein

  • Human-generated model
      • divide-typein
  • SimStudent
      • simSt-divide-typein

  divide-typein          1 1 1 1 1 1 1 1 1
  simSt-divide-typein    1 1 1 1 1 1 1 1 1

SimStudent vs SimStudent + Feature Learning

  • SimStudent
      • Needs strong operators
      • Constructs student models similar to the human-generated model
  • Extended SimStudent
      • Only requires weak operators
      • Splits KCs into finer grain sizes based on different parse trees
  • Does Extended SimStudent produce a KC model that better fits student learning data?

Results

                                    Human-generated Model   SimStudent
  AIC                               6529                    6448
  3-Fold Cross Validation RMSE      0.4034                  0.3997

Significance Test

  • SimStudent outperforms the human-generated model in 4260 out of 6494 steps (p < 0.001)
  • SimStudent outperforms the human-generated model across 20 runs of cross validation (p < 0.001)

Note: Human-generated + division split stats are AIC 6509.92 and Cross Validation RMSE* 0.401899
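For readers unfamiliar with the two fit measures in the results, here is a minimal Python sketch of how they are usually computed. It uses the standard definitions: AIC is twice the number of parameters minus twice the log-likelihood, and RMSE is the square root of the mean squared difference between predicted probabilities and observed 0/1 outcomes. The function names and toy numbers are illustrative and are not taken from the study's data.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood

def rmse(predicted, observed):
    """Root mean squared error between predicted probabilities and 0/1 outcomes."""
    squared = [(p - y) ** 2 for p, y in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Toy values, made up for illustration.
toy_aic = aic(log_likelihood=-100.0, n_params=5)
toy_rmse = rmse([0.2, 0.7, 0.9], [0, 1, 1])
print(toy_aic, toy_rmse)
```

Because AIC penalizes the parameter count, a model with more KCs (and so more parameters) only scores better if the finer split improves the likelihood enough to pay for the extra parameters.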

Summary

  • Presented an innovative application of a machine-learning agent, SimStudent, for automatic discovery of student models.
  • Showed that a SimStudent-generated student model was a better predictor of real student learning behavior than a human-generated model.

Future Studies

  • Test generality in other datasets in DataShop
  • Apply this proposed approach in other domains
      • Stoichiometry
      • Fraction addition

Thank you!

An Example in Algebra

Note: we can model this as ...

  • Fast learner: -3 (deep feature), Slow learner: 3
  • Underlying Structural Knowledge
      • Fast learner: signed number, Slow learner: minus sign + number

A Computational Model of Deep Feature Learning

  • Extended a PCFG Learning Algorithm (Li et al., 2009)
      • Feature Learning
      • Stronger Prior Knowledge:
          • Transfer Learning Using Prior Knowledge

Feature Learning

  • Build most probable parse trees
      • For all observation sequences
  • Select a non-terminal symbol that
      • Matches the most training records as the target feature
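The selection step above can be sketched in Python under a simplifying assumption: the most probable parse trees are already available (here they are written by hand rather than produced by a parser). Each tree is a nested tuple `(label, child, ...)` with plain strings as leaves. The function names `collect_spans` and `select_target_feature` and the tree encoding are hypothetical.

```python
from collections import Counter

def collect_spans(tree):
    """Return (text, spans), where spans lists (non_terminal, covered_text) pairs.

    A tree is (label, child, child, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):
        return tree, []
    label, *children = tree
    text, spans = "", []
    for child in children:
        child_text, child_spans = collect_spans(child)
        text += child_text
        spans.extend(child_spans)
    spans.append((label, text))
    return text, spans

def select_target_feature(records):
    """records: list of (parse_tree, feature_text). Return the best-matching non-terminal."""
    matches = Counter()
    for tree, feature in records:
        _, spans = collect_spans(tree)
        # Count each non-terminal at most once per training record.
        for label in {lab for lab, text in spans if text == feature}:
            matches[label] += 1
    return matches.most_common(1)[0][0] if matches else None

# Hand-written parse trees for -3x and 5y (assumed, for illustration).
tree_a = ("Expression",
          ("SignedNumber", ("MinusSign", "-"), ("Number", "3")),
          ("Variable", "x"))
tree_b = ("Expression", ("SignedNumber", ("Number", "5")), ("Variable", "y"))
target = select_target_feature([(tree_a, "-3"), (tree_b, "5")])
print(target)
```

In this toy data `SignedNumber` covers the requested feature in both records while `Number` covers it in only one, so `SignedNumber` is selected as the target feature.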

Transfer Learning Using Prior Knowledge

  • GSH phase:
      • Build parse trees based on some previously acquired grammar rules
      • Then call the original GSH
  • Viterbi training:
      • Add rule frequency in previous task to the current task
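The Viterbi-training part of the transfer step can be illustrated with a short Python sketch: rule counts from a previous task are added to the counts from the current task, and the sums are normalized into rule probabilities per left-hand side. The rule encoding `(lhs, rhs_tuple)`, the function names, and the counts are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def combine_rule_counts(previous_counts, current_counts):
    """Add rule frequencies from a previous task to those of the current task."""
    return Counter(previous_counts) + Counter(current_counts)

def rule_probabilities(counts):
    """Normalize counts so that rules sharing a left-hand side sum to 1."""
    totals = defaultdict(int)
    for (lhs, _rhs), n in counts.items():
        totals[lhs] += n
    return {rule: n / totals[rule[0]] for rule, n in counts.items()}

# Hypothetical counts: rules are keyed as (left-hand side, right-hand side).
signed = ("SignedNumber", ("MinusSign", "Number"))
unsigned = ("SignedNumber", ("Number",))
previous = {signed: 4, unsigned: 4}
current = {signed: 2}
probs = rule_probabilities(combine_rule_counts(previous, current))
print(probs)
```

Carrying counts over means a rule that was well supported in the earlier task starts the new task with a non-zero probability even if the new data has not yet shown it.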

[Figure: grammar parse trees with node labels MinusSign, Number, SignedNumber, Expression, and Variable; rule probabilities shown: 0.66, 0.33, 0.5, 0.5]

[Figure: error rate (y-axis, 0.2 to 1) by problem abstraction (x-axis; forms Nv=N, N=Nv, v=N, N=v) for Real Student, Human-generated Model, and SimStudent Model]

[Figure: error rate (y-axis, 0 to 1) by problem abstraction (x-axis; forms Nv/N=N/N, N/N=Nv/N, v=N/N) for Real Student, Human-generated Model, and SimStudent Model]
