
Concept Graph Learning from Educational Data

Yiming Yang, Hanxiao Liu, Jaime Carbonell and Wanli Ma

School of Computer Science, Carnegie Mellon University

February 3, 2015

WSDM 2015

Outline of the Talk

- Motivation
- Concept Representation Schemes
- Concept Graph Learning
- Experiments & Empirical Results
- Future Work


Introduction: Motivation

Scenario: massive amounts of course material are available online from many different course providers

- Universities, Coursera, edX, MIT OpenCourseWare ...

Challenge: How do we integrate the scattered information?

A CMU graduate: “After completing courses A and B on Coursera, what course shall I take next at CMU?”

We lack a method to measure course overlap and course prerequisite relations across institutions.

We address this by describing cross-institutional courses in a canonical language: concepts.


Introduction: Concept Graph Learning

[Figure: courses in University 1 and University 2 (E&M, Differential Eq, Algorithms, Num Analysis, Matrix A, Quantum, Calculus, Mechanics, Java Prog, Topology, Scalable Algs) linked through a shared layer of Universal Concepts (e.g. Wikipedia Topics)]

Goal: learning a universal graph of concepts based on
1. Course-level prerequisite relations
2. Concept representation of courses


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Representation Schemes: Word-based Representation

⇓ crawling and parsing

<course>
  <id>6.080</id>
  <name>Great Ideas in Theoretical Computer Science</name>
  <tag>Electrical Engineering and Computer Science</tag>
  <description>This course provides a challenging introduction to some of the central ideas of theoretical computer science. It attempts ... </description>
  <keywords>computer science, theoretical computer science, logic, turing machines, computability, finite automata, godel, complexity, polynomial time, efficient algorithms ... </keywords>
  <calendar>Introduction # Logic # Circuits and finite automata # Turing machines # Reducibility and Godel # Minds and machines # Complexity # Polynomial time # P and NP # NP-completeness # NP-completeness in practice # Space complexity and more ... </calendar>
</course>

Concepts := Words
- Vocabularies are not controlled

  e.g. CMU 10-715 says “shattering coefficient” where MIT 15.097 says “growth function” for the same concept

- Words come at multiple granularities ⇒ interpretability ↓
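A minimal sketch (not from the talk) of how the word-based course representation X (n courses by p words) could be built; the toy course texts and the use of scikit-learn's CountVectorizer are illustrative assumptions.

# Sketch: build an n-by-p word-count matrix X from course descriptions.
# The two toy course texts below are made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer

course_texts = [
    "turing machines computability complexity polynomial time",  # e.g. a theory course
    "shattering coefficient growth function learning theory",    # e.g. an ML course
]
vectorizer = CountVectorizer()               # vocabulary is NOT controlled
X = vectorizer.fit_transform(course_texts)   # sparse n-by-p count matrix
print(X.shape, sorted(vectorizer.vocabulary_))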

Representation Schemes: Category-based Representation

Bag of Words ⇒ (Wikipedia Classifier) ⇒ Bag of Wikipedia Categories

Concepts := Wikipedia Categories
- Improved interpretability
- Controlled vocabulary at the right granularity

  e.g. “shattering number”, “growth function” ⇒ (Wikipedia Classifier) ⇒ Computational_Learning_Theory

- Leverages the wealth of knowledge in Wikipedia
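A toy illustration (not the authors' classifier) of this idea: institution-specific terms are mapped onto a shared Wikipedia category; the dictionary below is a hypothetical stand-in for the Wikipedia classifier.

# Sketch: map institution-specific terms to shared Wikipedia categories.
# 'toy_classifier' is a hypothetical stand-in for the real Wikipedia classifier.
toy_classifier = {
    "shattering coefficient": "Computational_learning_theory",
    "growth function": "Computational_learning_theory",
    "turing machine": "Theory_of_computation",
}

def to_categories(terms):
    return {toy_classifier[t] for t in terms if t in toy_classifier}

print(to_categories(["shattering coefficient", "growth function"]))
# both terms collapse to {'Computational_learning_theory'}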


Representation Schemes: Latent Space Representation

Schemes based on dimensionality reduction (a small stand-in sketch follows below)
- Sparse Coding of Words: trained on the given courses (purely unsupervised)
- Distributed Word Embedding: trained on Wikipedia articles (leverages external information)

Concepts := Dimensionality-reduced vectors
- Controlled “vocabulary”: words are mapped onto a unified latent space
- Concept granularity can be controlled by the latent dimensionality
- Hard to interpret
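A minimal sketch of a latent-space course representation; TruncatedSVD is used here only as a generic stand-in for sparse coding or word embeddings, which is an assumption rather than the exact method from the talk.

# Sketch: project word-count vectors into a k-dimensional latent space.
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X_words = rng.poisson(0.3, size=(50, 2000)).astype(float)   # toy n-by-p counts
svd = TruncatedSVD(n_components=20, random_state=0)          # stand-in reducer
X_latent = svd.fit_transform(X_words)   # n-by-k; k controls concept granularity
print(X_latent.shape)                   # (50, 20)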


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Problem Formulation

[Figure (repeated from the introduction): courses in University 1 and University 2 linked through a shared layer of Universal Concepts (e.g. Wikipedia Topics)]

- Observed course-level relations: O
- Concept representation of courses: X (n by p)
- Concept graph: A (p by p)

How to evaluate A?
1. Map A to an estimated course graph Θ (n by n) via X.
2. Evaluate the quality of Θ against O.


Problem Formulation

How do we map the concept graph A to an estimated course graph Θ? Through a bilinear form:

Θ := X A Xᵀ

Explanation:

θ_ij := ∑_{u,v} a_uv · x_iu · x_jv

- θ_ij: strength of the prerequisite link from course j to course i
- a_uv: strength of the prerequisite link from concept v to concept u

Each course-level prerequisite θ_ij is the cumulative effect ∑_{u,v} a_uv x_iu x_jv of multiple concept-level prerequisites, weighted by how strongly each course involves each concept.
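A small numpy sketch of the bilinear mapping; the shapes and random data are illustrative, not from the datasets.

# Sketch: map a concept graph A (p-by-p) to an estimated course graph
# Theta (n-by-n) via the bilinear form Theta = X A X^T.
import numpy as np

n, p = 5, 8
rng = np.random.default_rng(0)
X = rng.random((n, p))     # concept representation of courses
A = rng.random((p, p))     # concept-to-concept prerequisite strengths
Theta = X @ A @ X.T        # course-to-course prerequisite strengths

# theta_ij is the cumulative effect of all concept-level links:
i, j = 2, 3
theta_ij = sum(A[u, v] * X[i, u] * X[j, v] for u in range(p) for v in range(p))
assert np.isclose(Theta[i, j], theta_ij)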


Problem Formulation

How do we evaluate the estimated course graph Θ against our observed course-level relations O?

i.e., how should we define the loss function over Θ w.r.t. O?

Problem
- Only positive examples are available
- Treating unobserved course relations as negative examples leads to a highly skewed label set

Solution: ranking
- We want θ_ij > θ_ik whenever j ∈ prereq(i) and k ∉ prereq(i)
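A sketch of how the ranking triplets could be enumerated from the observed prerequisite lists; the toy prerequisite dictionary is an assumption for illustration.

# Sketch: build triplets (i, j, k) for which we want theta_ij > theta_ik,
# i.e. j is an observed prerequisite of course i and k is not.
prereq = {0: [1, 2], 3: [0]}   # toy observed course-level relations O
n = 4
triplets = [(i, j, k)
            for i, J in prereq.items()
            for j in J
            for k in range(n)
            if k != i and k not in J]
print(triplets)   # [(0, 1, 3), (0, 2, 3), (3, 0, 1), (3, 0, 2)]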


Optimization: Objective

CGL objective with p² variables:

min_{A ∈ ℝ^{p×p}}  C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ ‖A‖²_F
s.t. Θ = X A Xᵀ

Problem
- A can be a huge dense matrix (e.g. 15,396 by 15,396 for the word-based concept representation)
- Dual space? The number of dual variables is O(n³)

Solution: derive an equivalent optimization problem with only n² (n² ≪ p²) variables.


Optimization: Variable Reduction

Introduce a slack variable S ∈ ℝ^{n×n} for the constraint Θ = X A Xᵀ. The Lagrangian is

L = C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ ‖A‖²_F + ⟨S, Θ − X A Xᵀ⟩

∂L/∂A must vanish at the stationary point

⇒ A* = Xᵀ S*ᵀ X

A* ∈ ℝ^{p×p} only has n² (n² ≪ p²) degrees of freedom!

Equivalent CGL objective with n² variables:

min_{S ∈ ℝ^{n×n}}  C ∑_{(i,j,k)} ℓ(θ_ij − θ_ik) + ½ tr(Θ Sᵀ)
s.t. Θ = K S K   (where K := X Xᵀ is the n×n Gram matrix of courses)
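A numpy check of the variable reduction (illustrative shapes and data): with A = Xᵀ S X, the course graph Θ = X A Xᵀ depends on S only through the Gram matrix K = X Xᵀ; whether S or Sᵀ appears is immaterial, since S is a free variable.

# Sketch: the n^2-variable reparameterization behind the equivalent objective.
import numpy as np

n, p = 5, 8
rng = np.random.default_rng(1)
X = rng.random((n, p))
S = rng.random((n, n))

K = X @ X.T                 # n-by-n Gram matrix of courses
A = X.T @ S @ X             # p-by-p, but only n^2 degrees of freedom
Theta_from_A = X @ A @ X.T
Theta_from_S = K @ S @ K    # same course graph, expressed via S
assert np.allclose(Theta_from_A, Theta_from_S)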


Optimization: Loss Function & Optimization Solver

We choose the squared hinge loss ℓ(x) = (max(1 − x, 0))²
- Large-margin property: strong generalization ability
- Smoothness: allows Nesterov's accelerated gradient descent (GD), sketched below

- GD: 37.3 min & 1,490 iterations on MIT
- Accelerated GD: 3.08 min & 103 iterations on MIT

An alternative to GD: the inexact Newton method
- To avoid forming the huge Hessian, use a matrix-free Conjugate Gradient to compute the Newton direction
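A minimal sketch of the squared hinge loss and its derivative (the smoothness that makes Nesterov's accelerated gradient applicable); this is an illustration, not the authors' solver.

# Sketch: squared hinge loss l(x) = max(1 - x, 0)^2 and its derivative.
import numpy as np

def sq_hinge(x):
    return np.maximum(1.0 - x, 0.0) ** 2

def sq_hinge_grad(x):
    # derivative w.r.t. x; zero once the margin x >= 1 is satisfied
    return -2.0 * np.maximum(1.0 - x, 0.0)

margins = np.array([-0.5, 0.3, 1.2])   # theta_ij - theta_ik for some triplets
print(sq_hinge(margins))               # [2.25 0.49 0.  ]
print(sq_hinge_grad(margins))          # [-3.  -1.4  0. ]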


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Experiments: Datasets & Evaluation

Table: Dataset statistics¹

University   Department   # Courses   # Prerequisites   # Words
MIT²         *            2,322       1,173             15,396
Caltech      *            1,048       761               5,617
CMU          CS, STATS    83          150               1,955
Princeton    MATH         56          90                454

Metrics for evaluation: MAP and AUC

¹ Available at http://nyc.lti.cs.cmu.edu/teacher/dataset/
² MIT OpenCourseWare, http://ocw.mit.edu/index.htm
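A sketch of how AUC could be computed for prerequisite prediction; scikit-learn's roc_auc_score and the toy scores/labels are illustrative assumptions, not the evaluation code used for the paper.

# Sketch: evaluate predicted prerequisite scores against observed links with AUC.
import numpy as np
from sklearn.metrics import roc_auc_score

scores = np.array([0.9, 0.2, 0.4, 0.8, 0.1])   # entries of Theta for course pairs
labels = np.array([1, 0, 0, 1, 0])             # 1 = observed prerequisite in O
print("AUC:", roc_auc_score(labels, scores))   # AUC: 1.0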


Experiments: Comparison among Concept Representation Schemes

[Bar charts: AUC and MAP of CGL.Rank with Word, Cat. (Category), SCW (Sparse Coding of Words), and DWE (Distributed Word Embedding) representations on (a) MIT, (b) Caltech, (c) CMU, and (d) Princeton data]

Words ≻ Categories ≻ Sparse Coding ≻ Distributed Word Embedding


Experiments: Cross-institutional Prerequisite Prediction

A good concept graph should be universal, and thus transferable across different institutions.

[Bar chart: MAP for each <Training Set>.<Test Set> pair: MIT.MIT, Caltech.MIT, Caltech.Caltech, MIT.Caltech, CMU.CMU, MIT.CMU, Caltech.CMU, Princeton.Princeton, MIT.Princeton, Caltech.Princeton]

There is always some performance loss when we go across institutions, but we still get good transfer.


Empirical Results: Concept Graph for MIT

[Figure: a fragment of the learned concept graph for MIT, with nodes including Randomness, Integral_calculus, Functional_analysis, Probability_theory, Linear_algebra, Numerical_analysis, Applied_mathematics_stubs, Cybernetics, Mathematical_analysis_stubs, Mathematical_optimization, Differential_geometry, Computational_science, Dynamical_systems, Signal_processing, Fourier_analysis, Operations_research]


Outline of the Talk

- Motivation
- Concept Representation Schemes
- Learning the Concept Graph
- Experiments & Empirical Results
- Future Work


Future Work

Deploying the induced concept graph for personalized curriculum planning (ongoing work)

- Student’s academic background/goal := bag of concepts
- Find an optimal sequence of courses?

Cross-language transfer learning by using Wikipedia categories (concepts) as the interlingua.


The End

[email protected]
