Feature Discovery in the Context of Educational Data Mining: An Inductive Approach
Andrew Arnold, Joseph E. Beck and Richard Scheines
Machine Learning Department, Carnegie Mellon University
July 6, 2006




Contributions

• Formulation of feature discovery problem in educational data mining domain

• Introduction and evaluation of an algorithm that:
  – Discovers useful, complex features
  – Incorporates prior knowledge
  – Promotes the scientific process
  – Balances between predictiveness and interpretability


Outline
• The Big Problem
  – Examples
  – Why is it hard?
• An Investigation
  – Details of Our Experimental Environment
  – Lessons Learned from Investigational Experiments
• Our Solution
  – Algorithm
  – Results
• Conclusions and Next Steps


Problem: Features (Conceptual)
• Many spaces are too complex to deal with directly
• Features are ways of simplifying these spaces by adding useful structure
  – Domain: raw data → features
  – Vision: pixels → 3-D objects
  – Speech: frequencies + waves → phonemes
  – Chess: board layout → king is protected / exposed


Problem: Features (Example)
• A poker hand consists of 5 cards drawn from a deck of 52 unique cards. This is the raw data.
  – This yields (52 choose 5) = 2,598,960 unique hands
• onePair is a possible feature of this space.
  – There are 1,098,240 different ways to have exactly one pair
  – Thus, with a single feature, we have reduced the size of our space by over 40%
• But onePair is only one of many, many possible features:
  – twoPair (123,552), fullHouse (3,744), fourOfaKind (624)
• Not all features are useful; most are not:
  – 3spades, oneAce_and_OneNine, 3primeCards
• We need an efficient way to find useful features


Problem: Models (Example)

• Given features, we still need a model
  – For poker, the model is simple because it is explicit:
    • Features are ranked. Better features → better chance of winning
• Educational example:
  – Given features SATmathScore, preTestScore, and curiosity
  – Want to predict finalExamScore

[Diagram: SATmathScore, preTestScore, curiosity → Model (linear regression, neural net, etc.) → finalExamScore]
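The "Model" box can be made concrete with a small least-squares fit. This is a sketch on synthetic data: the feature names come from the slide, but all numbers and coefficients below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Invented stand-ins for the slide's features.
sat = rng.normal(600, 80, n)   # SATmathScore
pre = rng.normal(70, 10, n)    # preTestScore
cur = rng.normal(0, 1, n)      # some operationalization of curiosity

# Invented "true" relationship plus noise.
final = 0.05 * sat + 0.4 * pre + 3.0 * cur + rng.normal(0, 2, n)

# Fit a linear regression (with intercept) by ordinary least squares.
X = np.column_stack([np.ones(n), sat, pre, cur])
beta, *_ = np.linalg.lstsq(X, final, rcond=None)
print(np.round(beta[1:], 2))  # close to the invented [0.05, 0.4, 3.0]
```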


Problem: Operationalizing Features
• But how do we operationalize the feature curiosity?
• Each possible mapping of raw data into curiosity (e.g., curiosity_1, curiosity_2, curiosity_3) increases the space of models to search.

[Diagram: one copy of the model per operationalization: SATmathScore, preTestScore, curiosity_i → Model (linear regression, neural net, etc.) → finalExamScore, for i = 1, 2, 3, …]


Our Research Problem

[Diagram: raw data feeds candidate operationalizations curiosity_1, curiosity_2, curiosity_3; each joins SATmathScore and preTestScore in a model (linear regression, neural net, etc.) predicting finalExamScore]


Details of Our Environment I: Data & Course
• On-line course teaching causal reasoning skills
  – Consists of sixteen modules, about an hour per module
• The course was tooled to record certain events:
  – Logins, page requests, self-assessments, quiz attempts, logouts
• Each event was associated with certain attributes:
  – time
  – student-id
  – session-id


What We’d Like to Be Able to Do

• Raw data
  – At 17:51:23 student jd555 requested page causality_17
  – At 17:51:41 student rp22 began simulation sim_3
  – At 17:51:47 student ap29 finished quiz_17 with score 82%
• Feature
  – Student jd555 is five times as curious as ap29
• Model
  – For every 5% increase in curiosity, student quiz performance increases by 3%.


Details of Our Environment II: Models & Experiments
• Wanted to find features associated with engagement and learning.
• For engagement, used the amount of time students spent looking at pages in a module.
• For learning, looked at quiz scores.
• For all experiments, only looked at linear regression models.


Lesson 1: Obvious Ideas Don’t Always Work

• To measure engagement, examined the amount of time a user spent on a page

• To predict this time, used three features:
  – student: mean time this student spent per page
  – session: mean time spent on pages during this session
  – page: mean time spent by all students on this page

• Which features would you guess would be most significant for predicting time spent on a page?

  – Our belief was: page > student >> session


Turns Out Session Trumps User

• In fact, given session, student had no effect.

• R-squared of a linear model using:
  – student = 4.8%
  – page = 16.6%
  – session = 19.9%
  – session + student = 19.9%
  – session + page = 31.4%
  – session + page + student = 31.5%
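Two things in these numbers generalize: training R² never decreases as regressors are added, and a feature can contribute nothing once a correlated one is in the model. A synthetic sketch (the data below only mimic this structure; they are not the course logs):

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary-least-squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

rng = np.random.default_rng(1)
n = 500
student = rng.normal(size=n)
session = student + rng.normal(size=n)   # session signal subsumes student
page = rng.normal(size=n)
time_on_page = session + page + rng.normal(size=n)

for name, cols in [("student", (student,)),
                   ("session", (session,)),
                   ("session + student", (session, student)),
                   ("session + page + student", (session, page, student))]:
    print(name, round(r_squared(np.column_stack(cols), time_on_page), 3))
```

Here, given session, student adds essentially nothing, mirroring the slide's finding.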


Lesson 2: Small Differences in Features Can Have Big Impact

• self_assessments measures the number of optional self-assessment questions a student attempted.
• How well would this feature predict learning?
• To measure this, we needed an outcome feature that measured performance
• Our idea was to look at quiz scores.
• But what, exactly, is a quiz score?
  – Students can take a quiz up to three times in a module.
  – Should we look at the maximum of these scores? The mean?


Only Max Score Mattered

• Max is significant (p = .036), but mean is not (p = .504).
• Yet max and mean are both encompassed by the term "quiz score"
  – Researchers should not be expected to make such fine distinctions

[Figure: quiz score vs. self_assessments (normed); two panels, one for mean score (p = .504) and one for max score (p = .036)]


Automation

• Given these lessons, how can we automate the process?
  – Enumeration
    • Costly; curse of dimensionality
  – Principal component analysis, kernels
    • Results are hard to interpret


Challenges

• Defining and searching the feature space
  – Expressive enough to discover new features
• Constraining and biasing
  – Avoid nonsensical or hard-to-interpret features


Algorithm

• Start with small set of core features

• Iteratively grow and prune this set
  – Increase predictiveness
  – Preserve scientific and semantic interpretability


Architecture


Experiment

• Can we predict a student's quiz score using features that are:
  – Automatically discovered
  – Complex
  – Predictive
  – Interpretable


Raw Data

NAME | DESCRIPTION
User_id (Nominal) | Unique user identifier
Module_id (Nominal) | Unique module identifier
Assess_quiz (Ordinal) | Number of self-assessment quizzes taken by this user in this module
Assess_quest (Ordinal) | Number of self-assessment questions taken by this user in this module. Each self-assessment quiz contains multiple self-assessment questions.
Quiz_score (Ordinal) | (Dependent variable) % of quiz questions answered correctly by this student in this module. In each module, students were given the chance to take the quiz up to three times. The max of these trials was taken as quiz_score.


Sample Data

User_id | Module_id | Assess_quiz | Assess_quest | Quiz_score
Alice | module_1 | 12 | 27 | 86
Bob | module_1 | 14 | 31 | 74
Alice | module_2 | 18 | 35 | 92
Bob | module_2 | 13 | 25 | 87


Predicates

• A logical statement applied to each row of data
  – Selects the subset of data which satisfies it
• Examples:
  – User_id = Alice
  – Module_id = module_1


Calculators

• A function applied to a subset of data
  – Calculated over a certain field in the data
• Incorporates bias and prior knowledge:
  – E.g., timing effects are on a log scale
• Examples:
  – Mean(Assess_quiz)
  – Log(Quiz_score)


Candidate Features

• Predicate + Calculator = New Feature
  – Predicates: User_id = Alice, User_id = Bob
  – Calculator: Mean(Assess_quiz)
  – Feature: mean Assess_quiz for each user

X1: User_id | X2: Module_id | X3: Assess_quiz | F: Mean_Assess_quiz | Y: Quiz_score
Alice | module_1 | 12 | 15 | 86
Bob | module_1 | 14 | 13.5 | 74
Alice | module_2 | 18 | 15 | 92
Bob | module_2 | 13 | 13.5 | 87
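The predicate-plus-calculator construction can be sketched in a few lines of plain Python. The names and encoding below are illustrative, not the authors' implementation:

```python
from statistics import mean

# Toy version of the slide's sample data: one dict per row.
rows = [
    {"User_id": "Alice", "Module_id": "module_1", "Assess_quiz": 12, "Quiz_score": 86},
    {"User_id": "Bob",   "Module_id": "module_1", "Assess_quiz": 14, "Quiz_score": 74},
    {"User_id": "Alice", "Module_id": "module_2", "Assess_quiz": 18, "Quiz_score": 92},
    {"User_id": "Bob",   "Module_id": "module_2", "Assess_quiz": 13, "Quiz_score": 87},
]

# A predicate selects the subset of rows that satisfies it.
def predicate(field, value):
    return lambda row: row[field] == value

# A calculator is a function applied to one field of a subset.
def calculator(func, field):
    return lambda subset: func(r[field] for r in subset)

def make_feature(rows, predicates, calc):
    """Predicate + calculator = new feature: each row gets the
    calculator's value over the subset its predicate selects."""
    values = {}
    for i, row in enumerate(rows):
        for pred in predicates:
            if pred(row):
                subset = [r for r in rows if pred(r)]
                values[i] = calc(subset)
    return [values[i] for i in range(len(rows))]

preds = [predicate("User_id", u) for u in ("Alice", "Bob")]
calc = calculator(mean, "Assess_quiz")
feature = make_feature(rows, preds, calc)
print(feature)  # per-user means: Alice -> 15, Bob -> 13.5
```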


Models

• Complexity is in the feature space
  – Put complicated features into a simple model
• Linear and logistic regression


Scoring & Pruning I

• Exhaustive search impractical

• Partition predicates and calculators semantically
  – Allows independent, greedy search

• Fast Correlation-Based Filtering [Yu 2003]

– Prevents unlikely features: mean_social_security_number

• Select b best from each category, and pool
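A minimal sketch of the filtering idea, using Pearson correlation as a stand-in for the symmetrical-uncertainty measure that the real FCBF algorithm [Yu 2003] uses; the thresholds and names below are invented:

```python
import numpy as np

def corr_filter(X, y, relevance=0.2, redundancy=0.8):
    """Simplified stand-in for Fast Correlation-Based Filtering:
    keep features well correlated with the target, then drop any
    feature more strongly correlated with an already-kept feature
    than the redundancy threshold allows."""
    r = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
    order = [j for j in np.argsort(-r) if r[j] >= relevance]
    kept = []
    for j in order:
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < redundancy for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
n = 300
f0 = rng.normal(size=n)
f1 = f0 + 0.1 * rng.normal(size=n)    # nearly a copy of f0: redundant
f2 = rng.normal(size=n)               # irrelevant to the target
y = f0 + 0.5 * rng.normal(size=n)
X = np.column_stack([f0, f1, f2])
print(corr_filter(X, y))  # keeps one of the f0/f1 pair, drops the rest
```

This is how a nonsense feature like mean_social_security_number would be screened out: it fails the relevance threshold.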


Scoring & Pruning II

• Features graded on:
  – Predictiveness: R²
  – Interpretability: heuristics based on experts and literature
    • Depth of nesting
    • Assigned "interpretability score" of predicates and calculators
      – E.g., Sum is more interpretable than SquareRoot
• Select the k best features to continue
  – k regularizes run-time, memory and depth of search


Iteration & Stopping Criteria

• The full model is evaluated after each iteration
• Stopping conditions:
  – Cross-validation performance
  – Hard cap on processor time or iterations
• If met:
  – Stop and return discovered features and model
• If not:
  – Iterate again, using current features as seeds for the next step


Results

• Two main goals:
  – Machine learning: discover features predictive of student performance
  – Scientific discovery: discover interpretable features incorporating prior scientific knowledge


Machine Learning

• 24 students, 15 modules, 203 quiz scores:
  – Predict: quiz_score
  – Given initial features: user_id, assess_quiz, assess_quest
• Learn features and regression coefficients on training data, test on held-out data
• 38% improvement in R² of discovered features over baseline regression on initial features


Summary of Features


Scientific Discovery

• Interpretation of features and model
  – mean_assess_quizzes_per_user
    • Introspectiveness of the student
    • Intuitively negatively correlated with quiz score
      – Less mastery → insecurity → self-assessment → poor quiz score
  – mean_assess_quest_per_user should be similarly correlated
    • In fact, the regression coefficients have opposite signs
• Discovered features reaffirm certain intuitions and contradict others


Generality

• Applied the same framework to entirely different data and domain:
  – Effect of tutor interventions on reading comprehension
• Achieved similarly significant results with no substantial changes to the algorithm


Limitations

• Looked at a small number of initial features
• To increase feature capacity:
  – Better partitioning of features, predicates and calculators
  – Less greedy search
  – More expressive, biased interpretability scores
    • E.g., time of day and day of week: doing homework on Sunday night vs. Friday night


Better & Faster Search

• Want to discover more complicated features
  – Search more broadly: prune fewer features
  – Search more deeply: run more iterations
• Decomposable feature scores:
  – Reuse computation
• Smoother feature-space parameterization:
  – Efficient, gradient-like search


Conclusions

• The algorithm discovers useful, complex features
  – Elucidates underlying structure
  – Hides complexity
• Promotes the scientific process
  – Tests hypotheses and suggests new experiments
  – Incorporates prior scientific knowledge [Pazzani 2001]
  – Results are interpretable and explainable, and still predictive
• Balances between predictiveness and interpretability
  – Careful definition and partitioning of the feature space
  – Search balances biased, score-based pruning with exploration


References

Arnold, A., Beck, J. E., & Scheines, R. (2006). Feature Discovery in the Context of Educational Data Mining: An Inductive Approach. In Proceedings of the AAAI 2006 Workshop on Educational Data Mining, Boston, MA.

Pazzani, M. J., Mani, S., & Shankle, W. R. (2001). Acceptance of Rules Generated by Machine Learning among Medical Experts. Methods of Information in Medicine, 40:380–385.

Yu, L., & Liu, H. (2003). Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proceedings of the Twentieth International Conference on Machine Learning, 856–863.

Thank You

Questions?