
Concept Graph Learning from Educational Data

Yiming Yang, Hanxiao Liu, Jaime Carbonell and Wanli Ma

School of Computer Science, Carnegie Mellon University

February 3, 2015

WSDM 2015

Outline of the Talk

- Motivation
- Concept Representation Schemes
- Concept Graph Learning
- Experiments & Empirical Results
- Future Work


Introduction: Motivation

Scenario: massive amounts of course material are available online from many different course providers

- Universities, Coursera, edX, MIT OpenCourseWare ...

Challenge: How do we integrate the scattered information?

A CMU graduate: “After completing courses A and B on Coursera, what course shall I take next at CMU?”

We lack a method to measure course overlap and course prerequisite relations across institutions.

We address this by describing cross-institutional courses in a canonical language: concepts.


Introduction: Concept Graph Learning

[Figure: courses in University 1 and University 2 (E&M, Differential Eq, Algorithms, Num Analysis, Matrix A, Quantum, Calculus, Mechanics, Java Prog, Topology, Scalable Algs) linked through a shared layer of Universal Concepts (e.g. Wikipedia Topics)]

Goal: learning a universal graph of concepts based on
1. Course-level prerequisite relations
2. Concept representation of courses


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Representation Schemes: Word-based Representation

⇓ crawling and parsing

<course>
  <id>6.080</id>
  <name>Great Ideas in Theoretical Computer Science</name>
  <tag>Electrical Engineering and Computer Science</tag>
  <description>This course provides a challenging introduction to some of the central ideas of theoretical computer science. It attempts ... </description>
  <keywords>computer science, theoretical computer science, logic, turing machines, computability, finite automata, godel, complexity, polynomial time, efficient algorithms ... </keywords>
  <calendar>Introduction # Logic # Circuits and finite automata # Turing machines # Reducibility and Godel # Minds and machines # Complexity # Polynomial time # P and NP # NP-completeness # NP-completeness in practice # Space complexity and more ... </calendar>
</course>

Concepts := Words
- Vocabularies are not controlled

  e.g. CMU 10-715 says “shattering coefficient” where MIT 15.097 says “growth function” for the same concept

- Words come at multiple granularities ⇒ interpretability ↓
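A minimal sketch (not from the talk) of how the word-based course representation X (n courses by p words) could be built; the toy course texts and the use of scikit-learn's CountVectorizer are illustrative assumptions.

# Sketch: build an n-by-p word-count matrix X from course descriptions.
# The two toy course texts below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer

course_texts = [
    "turing machines computability complexity polynomial time",  # e.g. a theory course
    "shattering coefficient growth function learning theory",    # e.g. an ML course
]
vectorizer = CountVectorizer()               # vocabulary is NOT controlled
X = vectorizer.fit_transform(course_texts)   # sparse n-by-p count matrix
print(X.shape, sorted(vectorizer.vocabulary_))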

Representation Schemes: Category-based Representation

Bag of Words ⇒ (Wikipedia Classifier) ⇒ Bag of Wikipedia Categories

Concepts := Wikipedia Categories
- Improved interpretability
- Controlled vocabulary at the right granularity

  e.g. “shattering number”, “growth function” ⇒ (Wikipedia Classifier) ⇒ Computational_Learning_Theory

- Leverages the wealth of knowledge in Wikipedia
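A toy illustration (not the authors' classifier) of this idea: institution-specific terms are mapped onto a shared Wikipedia category; the dictionary below is a hypothetical stand-in for the Wikipedia classifier.

# Sketch: map institution-specific terms to shared Wikipedia categories.
# 'toy_classifier' is a hypothetical stand-in for the real Wikipedia classifier.
toy_classifier = {
    "shattering coefficient": "Computational_learning_theory",
    "growth function": "Computational_learning_theory",
    "turing machine": "Theory_of_computation",
}

def to_categories(terms):
    return {toy_classifier[t] for t in terms if t in toy_classifier}

print(to_categories(["shattering coefficient", "growth function"]))
# both terms collapse to {'Computational_learning_theory'}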


Representation Schemes: Latent Space Representation

Schemes based on dimensionality reduction (a small stand-in sketch follows below)
- Sparse Coding of Words: trained on the given courses (purely unsupervised)
- Distributed Word Embedding: trained on Wikipedia articles (leverages external information)

Concepts := Dimensionality-reduced vectors
- Controlled “vocabulary”: words are mapped onto a unified latent space
- Concept granularity can be controlled by the latent dimensionality
- Hard to interpret
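A minimal sketch of a latent-space course representation; TruncatedSVD is used here only as a generic stand-in for sparse coding or word embeddings, which is an assumption rather than the exact method from the talk.

# Sketch: project word-count vectors into a k-dimensional latent space.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_words = rng.poisson(0.3, size=(50, 2000)).astype(float)   # toy n-by-p counts
svd = TruncatedSVD(n_components=20, random_state=0)          # stand-in reducer
X_latent = svd.fit_transform(X_words)   # n-by-k; k controls concept granularity
print(X_latent.shape)                   # (50, 20)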


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Problem Formulation

[Figure (repeated from the introduction): courses in University 1 and University 2 linked through a shared layer of Universal Concepts (e.g. Wikipedia Topics)]

- Observed course-level relations: O
- Concept representation of courses: X (n by p)
- Concept graph: A (p by p)

How to evaluate A?
1. Map A to an estimated course graph Θ (n by n) via X.
2. Evaluate the quality of Θ against O.


Problem Formulation

How do we map the concept graph A to an estimated course graph Θ? Through a bilinear form:

Θ := X A Xᵀ

Explanation:

θ_ij := ∑_{u,v} a_uv · x_iu · x_jv

- θ_ij: strength of the prerequisite link from course j to course i
- a_uv: strength of the prerequisite link from concept v to concept u

Each course-level prerequisite θ_ij is the cumulative effect ∑_{u,v} a_uv x_iu x_jv of multiple concept-level prerequisites, weighted by how strongly each course involves each concept.
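A small numpy sketch of the bilinear mapping; the shapes and random data are illustrative, not from the datasets.

# Sketch: map a concept graph A (p-by-p) to an estimated course graph
# Theta (n-by-n) via the bilinear form Theta = X A X^T.
import numpy as np

n, p = 5, 8
rng = np.random.default_rng(0)
X = rng.random((n, p))     # concept representation of courses
A = rng.random((p, p))     # concept-to-concept prerequisite strengths
Theta = X @ A @ X.T        # course-to-course prerequisite strengths

# theta_ij is the cumulative effect of all concept-level links:
i, j = 2, 3
theta_ij = sum(A[u, v] * X[i, u] * X[j, v] for u in range(p) for v in range(p))
assert np.isclose(Theta[i, j], theta_ij)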


Problem Formulation

How do we evaluate the estimated course graph Θ against our observed course-level relations O?

i.e., how should we define the loss function over Θ w.r.t. O?

Problem
- Only positive examples are available
- Treating unobserved course relations as negative examples leads to a highly skewed label set

Solution: ranking
- We want θ_ij > θ_ik whenever j ∈ prereq(i) and k ∉ prereq(i)
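A sketch of how the ranking triplets could be enumerated from the observed prerequisite lists; the toy prerequisite dictionary is an assumption for illustration.

# Sketch: build triplets (i, j, k) for which we want theta_ij > theta_ik,
# i.e. j is an observed prerequisite of course i and k is not.
prereq = {0: [1, 2], 3: [0]}   # toy observed course-level relations O
n = 4
triplets = [(i, j, k)
            for i, J in prereq.items()
            for j in J
            for k in range(n)
            if k != i and k not in J]
print(triplets)   # [(0, 1, 3), (0, 2, 3), (3, 0, 1), (3, 0, 2)]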


Optimization: Objective

CGL objective with p² variables:

min_{A ∈ ℝ^{p×p}}  C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ ‖A‖²_F
s.t. Θ = X A Xᵀ

Problem
- A can be a huge dense matrix (e.g. 15,396 by 15,396 for the word-based concept representation)
- Dual space? The number of dual variables is O(n³)

Solution: derive an equivalent optimization problem with only n² (n² ≪ p²) variables.


Optimization: Variable Reduction

Introduce a slack variable S ∈ ℝ^{n×n} for the constraint Θ = X A Xᵀ. The Lagrangian is

L = C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ ‖A‖²_F + ⟨S, Θ − X A Xᵀ⟩

∂L/∂A must vanish at the stationary point

⇒ A* = Xᵀ S*ᵀ X

A* ∈ ℝ^{p×p} only has n² (n² ≪ p²) degrees of freedom!

Equivalent CGL objective with n² variables:

min_{S ∈ ℝ^{n×n}}  C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ tr(Θ Sᵀ)
s.t. Θ = K S K   (where K := X Xᵀ is the n×n Gram matrix of courses)
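A numpy check of the variable reduction (illustrative shapes and data): with A = Xᵀ S X, the course graph Θ = X A Xᵀ depends on S only through the Gram matrix K = X Xᵀ; whether S or Sᵀ appears is immaterial, since S is a free variable.

# Sketch: the n^2-variable reparameterization behind the equivalent objective.
import numpy as np

n, p = 5, 8
rng = np.random.default_rng(1)
X = rng.random((n, p))
S = rng.random((n, n))

K = X @ X.T                 # n-by-n Gram matrix of courses
A = X.T @ S @ X             # p-by-p, but only n^2 degrees of freedom
Theta_from_A = X @ A @ X.T
Theta_from_S = K @ S @ K    # same course graph, expressed via S
assert np.allclose(Theta_from_A, Theta_from_S)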


Optimization: Loss Function & Optimization Solver

We choose the squared hinge loss ℓ(x) = (max(1 − x, 0))²
- Large-margin property: strong generalization ability
- Smoothness: allows Nesterov's accelerated gradient descent (GD), sketched below

- GD: 37.3 min & 1,490 iterations on MIT
- Accelerated GD: 3.08 min & 103 iterations on MIT

An alternative to GD: the inexact Newton method
- To avoid forming the huge Hessian, use a matrix-free Conjugate Gradient to compute the Newton direction
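A minimal sketch of the squared hinge loss and its derivative (the smoothness that makes Nesterov's accelerated gradient applicable); this is an illustration, not the authors' solver.

# Sketch: squared hinge loss l(x) = max(1 - x, 0)^2 and its derivative.
import numpy as np

def sq_hinge(x):
    return np.maximum(1.0 - x, 0.0) ** 2

def sq_hinge_grad(x):
    # derivative w.r.t. x; zero once the margin x >= 1 is satisfied
    return -2.0 * np.maximum(1.0 - x, 0.0)

margins = np.array([-0.5, 0.3, 1.2])   # theta_ij - theta_ik for some triplets
print(sq_hinge(margins))               # [2.25 0.49 0.  ]
print(sq_hinge_grad(margins))          # [-3.  -1.4  0. ]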


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Experiments: Datasets & Evaluation

Table: Dataset statistics¹

University   Department   # Courses   # Prerequisites   # Words
MIT²         *            2,322       1,173             15,396
Caltech      *            1,048       761               5,617
CMU          CS, STATS    83          150               1,955
Princeton    MATH         56          90                454

Metrics for evaluation: MAP and AUC

¹ Available at http://nyc.lti.cs.cmu.edu/teacher/dataset/
² MIT OpenCourseWare, http://ocw.mit.edu/index.htm
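A sketch of how AUC could be computed for prerequisite prediction; scikit-learn's roc_auc_score and the toy scores/labels are illustrative assumptions, not the evaluation code used for the paper.

# Sketch: evaluate predicted prerequisite scores against observed links with AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.array([0.9, 0.2, 0.4, 0.8, 0.1])   # entries of Theta for course pairs
labels = np.array([1, 0, 0, 1, 0])             # 1 = observed prerequisite in O
print("AUC:", roc_auc_score(labels, scores))   # AUC: 1.0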


Experiments: Comparison among Concept Representation Schemes

[Bar charts: AUC and MAP of CGL.Rank with Word, Cat. (Category), SCW (Sparse Coding of Words), and DWE (Distributed Word Embedding) representations on (a) MIT, (b) Caltech, (c) CMU, and (d) Princeton data]

Words ≻ Categories ≻ Sparse Coding ≻ Distributed Word Embedding


Experiments: Cross-institutional Prerequisite Prediction

A good concept graph should be universal, and thus transferable across different institutions.

[Bar chart: MAP for each <Training Set>.<Test Set> pair: MIT.MIT, Caltech.MIT, Caltech.Caltech, MIT.Caltech, CMU.CMU, MIT.CMU, Caltech.CMU, Princeton.Princeton, MIT.Princeton, Caltech.Princeton]

There is always some performance loss when we go across institutions, but we still get good transfer.


Empirical Results: Concept Graph for MIT

[Figure: a fragment of the learned concept graph for MIT, with nodes including Randomness, Integral_calculus, Functional_analysis, Probability_theory, Linear_algebra, Numerical_analysis, Applied_mathematics_stubs, Cybernetics, Mathematical_analysis_stubs, Mathematical_optimization, Differential_geometry, Computational_science, Dynamical_systems, Signal_processing, Fourier_analysis, Operations_research]


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Future Work

Deploying the induced concept graph for personalized curriculum planning (ongoing work)

- Student’s academic background/goal := bag of concepts
- Find an optimal sequence of courses?

Cross-language transfer learning by using Wikipedia categories (concepts) as the interlingua.


The End

[email protected]
