Edith Cowan University

Application of Predictive Analytics to Higher Degree Research Course Completion Times

Application of Decision Theory to PhD Course Completions (2006 – 2013)

Rachna Dhand, Senior Strategic Information Analyst



Application of Predictive Analytics to Higher Degree Research Course Completion Times

Higher Degree Research (HDR) students carry significant costs for universities. When students fail to complete on time, or at all, resources are used suboptimally and government grant allocations and ratings are affected.

Objective

The objective of this project was to analyse the historical completion times of ECU PhD candidates and identify their primary determinants, so that intervention strategies can be implemented to help future students complete on time.

Methodology

Classification (decision tree) models are used to predict completion times for HDR candidates. The shortlisted models are CHAID, QUEST and C5.0.

Predicting Higher Degree Research (HDR) Course Completion Time


What is Predictive Analytics?


Data Mining Introduction, Modelling and Prediction Accuracy

Using data mining to identify students who are at risk of dropping out, or who are likely to take longer to finish their degree, is not a new subject in Institutional Research (IR). Explanatory models based on regression and path analysis have contributed substantially to our understanding of student retention (Adam and Gaither, 2005; Pascarella and Terenzini, 2005; Braxton, 2000). However, few published studies address the use and prediction accuracy of data-mining approaches in IR. Luan (2002) explained the application of neural-network and decision tree analysis to predicting the transfer of college students to four-year institutions. Byers Gonzalez and DesJardins (2002) showed that a neural-network model predicts with better accuracy than binary logistic regression. Prediction accuracy depends not only on the type of model chosen but also on the independent variables chosen, their measurement levels and the data size.


Predicting Completion Status for Higher Degree Research Candidates

Is it possible to predict the completion status of an HDR student given the following variable sets?

1. Demographic and course information
2. Research experience and nature of the research project
3. Faculty and school information
4. Supervisor information


HDR Completion Status Prediction Analysis and Modeling Approach

1. Objective: The main objective of the model is to predict the completion time for future applicants. Completion time is defined as the period from the candidate's commencement of the HDR degree to the completion of the degree.

2. Analysis: The historical dataset covers PhD students from all ECU faculties from 2006 to 2013, together with their research performance and demographic information. The completion time is estimated from this data.

3. Information Value Analysis (IVA): The research performance and demographic variables are passed through IVA to select the variables most correlated with completion time. These variables make up the final modelling dataset.

4. Methodology: The dataset and estimated completion times are modelled with decision-based predictive modelling. The tested models are C5.0, CHAID and QUEST, and all follow a classification-based paradigm for modelling and scoring.


HDR Completion Status Prediction

Completion Time Prediction Modelling

1. Historical data analysis filters PhD candidacy outcomes at ECU for the reference years 2006 to 2012.

2. Information Value Analysis identifies the primary determinants correlated with completion time, which is the outcome targeted by model building.

3. The target is defined from historical outcomes, and model building starts with the classification models CHAID, C5.0 and QUEST.

4. The model with the best results is chosen to score future candidates. This is the model that most accurately emulates the actual target.


Predicting Completion Status for Higher Degree Research Candidates

Historical Analysis and Understanding the Data Used for Prediction


HDR Cohort Analysis Count by Citizenship (2006 – 2012)


Note: The small size of the HDR enrolment cohort constrains the modelling process, which makes classification models more suitable for building and training.


HDR Candidacy Time Distribution Domestic Enrolled Candidates (2006 – 2013)


Candidacy time is the time from the student's commencement of the PhD degree until he or she reaches a final state: completion, discontinuation, or remaining enrolled for a longer duration.


HDR Course Attempt Distribution by Final Outcome: Domestic vs International


HDR Completion Status Estimation Scope of Modeling


Data sources:
• Student Demographic and Course Information (Student Course Details: set of 215 variables)
• Research Data (milestone status, ABS Research Classifications and scholarship status data)
• HDR Research Report Data (set of 45 variables related to student research experience and candidature progress)

Data preparation:
• Domestic PhD + International PhD candidates only
• Discard Inactive and Intermittent status
• Target definition for modelling (completion outcome)

Modelling dataset:
• Training dataset (90%) (2006 – 2012)
• Testing & scoring dataset (10%) (2013)
• C5.0, CHAID or QUEST decision tree model
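The scope above can be sketched as a filter-and-split step. This is a minimal illustration only. It assumes a hypothetical flattened record per candidate (`Candidate`) and made-up status codes such as "INACTIVE" and "INTERMITTED"; the real dataset joins 215 course variables and 45 research-report variables. The sketch splits by reference year (2006 to 2012 for training, 2013 for testing and scoring). The 90%/10% proportions on the slide are therefore presumably what that cut produced for ECU's data, not a parameter.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Candidate:
    # Hypothetical minimal record; the real modelling dataset has far more fields.
    student_id: str
    course_type: str      # hypothetical code, e.g. "PHD"
    citizenship: str      # "DOMESTIC" or "INTERNATIONAL" (both are kept)
    attempt_status: str   # hypothetical codes, e.g. "ENROLLED", "COMPLETED", "INACTIVE"
    reference_year: int

# Statuses discarded before modelling (the status codes are assumptions)
DISCARDED_STATUSES = {"INACTIVE", "INTERMITTED"}

def build_modelling_split(records: List[Candidate]) -> Tuple[List[Candidate], List[Candidate]]:
    """Keep PhD candidates with usable statuses, then split by reference year."""
    kept = [
        r for r in records
        if r.course_type == "PHD" and r.attempt_status not in DISCARDED_STATUSES
    ]
    training = [r for r in kept if 2006 <= r.reference_year <= 2012]
    testing_scoring = [r for r in kept if r.reference_year == 2013]
    return training, testing_scoring
```

Splitting on year rather than sampling at random keeps the 2013 cohort unseen during training, which mirrors scoring future candidates.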


HDR Completion Status Estimation Target Definition


1. WILL_COMPLETE (candidacy <= 4 years)

2. WILL_COMPLETE_LATE (candidacy > 4 years)

3. STILL_ENROLLED (candidacy > 3.5 years)

4. WILL_DISCONTINUE (attrition flags set for all teaching periods)

5. IMMATURE VINTAGE (candidacy < 3.5 years) [Discarded]

Actual candidacy status target: T_ATTEMPT_STATUS. Completion time (candidacy) is calculated in years:
• Completed: date_years_difference(D_COURSE_COMMENCEMENT_DT, D_COURSE_COMPLETION_DT)
• Enrolled: if T_COURSE_ATTEMPT_STATUS matches "ENROL*" then 2013 - D_INTAKE_PERIOD
• Discontinued: date_years_difference(D_COURSE_COMMENCEMENT_DT, T_COURSE_DISCONTINUED_DT)
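The target definition can be expressed as two small functions: one computes candidacy years and the other maps them to a T_ATTEMPT_STATUS class. This is a sketch under stated assumptions:

- `years_between` is a hypothetical stand-in for `date_years_difference`, using days / 365.25.
- The status strings ("ENROLLED", "COMPLETED", "DISCONTINUED") are made up.
- A candidacy of exactly 3.5 years is not covered by the rules above, so this sketch treats it as immature.

```python
from datetime import date
from typing import Optional, Sequence

CENSUS_YEAR = 2013  # enrolled candidacy is measured as 2013 - D_INTAKE_PERIOD

def years_between(start: date, end: date) -> float:
    """Hypothetical stand-in for date_years_difference(): elapsed days / 365.25."""
    return (end - start).days / 365.25

def candidacy_years(status: str, commencement: date, intake_year: int,
                    completion: Optional[date] = None,
                    discontinued: Optional[date] = None) -> float:
    """Candidacy time in years, following the three calculation rules."""
    if status.startswith("ENROL"):          # T_COURSE_ATTEMPT_STATUS matches "ENROL*"
        return float(CENSUS_YEAR - intake_year)
    if completion is not None:              # completed candidates
        return years_between(commencement, completion)
    if discontinued is not None:            # discontinued candidates
        return years_between(commencement, discontinued)
    raise ValueError("a non-enrolled candidate needs a completion or discontinuation date")

def target_label(status: str, years: float,
                 attrition_flags: Sequence[bool] = ()) -> Optional[str]:
    """T_ATTEMPT_STATUS class for one candidate; None means the record is discarded."""
    if attrition_flags and all(attrition_flags):   # attrition flag set for every teaching period
        return "WILL_DISCONTINUE"
    if status.startswith("ENROL"):
        # Assumption: exactly 3.5 years is treated as immature (IMMATURE VINTAGE, discarded)
        return "STILL_ENROLLED" if years > 3.5 else None
    if status.startswith("COMPLET"):               # hypothetical status code for completions
        return "WILL_COMPLETE" if years <= 4 else "WILL_COMPLETE_LATE"
    return None  # other statuses are not labelled in this sketch
```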


Predicting Completion Status for Higher Degree Research Candidates

Using Classification Models for Prediction


Decision Tree Models

Decision tree models were selected to predict completion-time outcomes for currently enrolled domestic and international students. Two reasons drove this choice: the indicators have non-linear relationships with the target, and the target outcome is nominal. The model outputs are:

1) Target prediction (STILL_ENROLLED, WILL_COMPLETE, WILL_DISCONTINUE, WILL_COMPLETE_LATE).
2) A confidence score for each enrolled student, ranging between 0 and 1.

The rule induction (decision tree) algorithms considered are C5.0, Chi-square Automatic Interaction Detection (CHAID), QUEST and Classification and Regression (C&R) Tree. The C5.0 model handles nominal or flag targets with predictors of any type (nominal, continuous or flag).
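One common way a decision tree produces both outputs is sketched below:

- each record is routed to a leaf;
- the predicted class is the leaf's majority class;
- the confidence is that class's share of the training records in the leaf, which is why it lies between 0 and 1.

This is a generic illustration, not necessarily the exact confidence formula used by the tool behind these slides. Some implementations adjust the raw proportion, for example with a Laplace correction.

```python
from collections import Counter
from typing import Sequence, Tuple

def leaf_prediction(leaf_labels: Sequence[str]) -> Tuple[str, float]:
    """Majority class in a tree leaf and its share of the leaf's training records."""
    if not leaf_labels:
        raise ValueError("empty leaf")
    label, count = Counter(leaf_labels).most_common(1)[0]
    return label, count / len(leaf_labels)

# A hypothetical leaf holding 8 training candidates
print(leaf_prediction(["WILL_DISCONTINUE"] * 6 + ["WILL_COMPLETE"] * 2))
# ('WILL_DISCONTINUE', 0.75)
```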


Decision Tree Models

Model criterion: C5.0 / CHAID / QUEST
• Type of split for categorical targets: Multiple / Multiple / Binary
• Continuous target: No / Yes / No
• Continuous predictors: Yes / No / Yes
• Criterion for predictor selection: Information measure / Chi-square (F-test for continuous targets) / Statistical
• Supports bagging/boosting: Yes / Yes / Yes


CHAID

CHAID performs chi-square tests for predictor importance and variable reduction. The test preferentially gives higher importance to continuous variables than to nominal or categorical variables.

Predicts the target with the best accuracy.

Predictor importance (F-test association with target):
• Milestones Achieved
• School Name
• Load Completed
• Field of Education
• Course Fraction Completed
• Meeting Frequency
• Basis of Admission
• Research Literature
• Funding Category Changed
• Annual Leaves Availed
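The chi-square test that CHAID relies on starts from a predictor-by-target contingency table. The sketch below computes the Pearson chi-square statistic and its degrees of freedom for one categorical predictor. Full CHAID goes further, and those steps are omitted here:

- it merges predictor categories that do not differ significantly;
- it converts the statistic to a Bonferroni-adjusted p-value before choosing a split.

The example data is made up.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple

def pearson_chi_square(predictor: Sequence[Hashable],
                       target: Sequence[Hashable]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for a predictor-by-target table."""
    if len(predictor) != len(target) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(target)
    cells = Counter(zip(predictor, target))   # observed counts per (category, class)
    row_totals = Counter(predictor)
    col_totals = Counter(target)
    stat = 0.0
    for r, r_tot in row_totals.items():
        for c, c_tot in col_totals.items():
            expected = r_tot * c_tot / n      # count expected if predictor and target were independent
            observed = cells.get((r, c), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Made-up example: a two-category predictor against two outcomes
predictor = ["A"] * 30 + ["B"] * 30
target = (["WILL_COMPLETE"] * 10 + ["WILL_DISCONTINUE"] * 20
          + ["WILL_COMPLETE"] * 20 + ["WILL_DISCONTINUE"] * 10)
stat, dof = pearson_chi_square(predictor, target)
print(round(stat, 3), dof)  # 6.667 1
```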


C5.0

C5.0 uses the Information Value (IV) and Weight of Evidence (WoE) methods for variable reduction. WoE analyses the predictive power of a variable in relation to the targeted outcome, while IV assesses the overall predictive power of the variable being considered.

Predictor importance (Information Value Analysis):
• Course Fraction Completed
• Literature Review Feedback
• Field of Education
• NESB Indicator
• Age at Enrolment
• Mode of Attendance
• Milestone Achieved
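WoE and IV are defined for a binary outcome. The sketch below therefore assumes the four-class target is reduced one-vs-rest, for example WILL_COMPLETE versus every other outcome. For each category:

- WoE is the log of the category's share of events over its share of non-events.
- IV sums each category's share difference weighted by its WoE.

The one-vs-rest reduction, the 0.5 smoothing count (used to avoid taking the log of zero) and the example data are all assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

def woe_iv(predictor: Sequence[Hashable], is_event: Sequence[bool],
           smoothing: float = 0.5) -> Tuple[Dict[Hashable, float], float]:
    """Weight of Evidence per category and total Information Value for a binary target."""
    if len(predictor) != len(is_event):
        raise ValueError("predictor and is_event must have the same length")
    events = Counter(p for p, e in zip(predictor, is_event) if e)
    non_events = Counter(p for p, e in zip(predictor, is_event) if not e)
    categories = list(dict.fromkeys(predictor))   # distinct categories, first-seen order
    k = len(categories)
    total_events = sum(events.values()) + smoothing * k
    total_non = sum(non_events.values()) + smoothing * k
    woe: Dict[Hashable, float] = {}
    iv = 0.0
    for cat in categories:
        pct_event = (events[cat] + smoothing) / total_events
        pct_non = (non_events[cat] + smoothing) / total_non
        woe[cat] = math.log(pct_event / pct_non)
        iv += (pct_event - pct_non) * woe[cat]
    return woe, iv

# Made-up example: category A mostly completes on time, category B mostly does not
predictor = ["A"] * 10 + ["B"] * 10
on_time = [True] * 8 + [False] * 2 + [True] * 2 + [False] * 8
woe, iv = woe_iv(predictor, on_time)
print({k: round(v, 3) for k, v in woe.items()}, round(iv, 3))
```

A predictor with no relationship to the outcome gets WoE near 0 in every category and an IV near 0. Larger IV values indicate stronger overall predictive power.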


Completion Status Modeling: Decision Tree Models Used

Completion Status Estimation: C5.0, CHAID and QUEST decision tree models


C5.0 Decision Tree Model

Target following: actual vs predicted candidature status
• STILL_ENROLLED: actual 18.0% (101), predicted 33.51% (188)
• WILL_COMPLETE: actual 14.08% (79), predicted 8.91% (50)
• WILL_COMPLETE_LATE: actual 14.8% (83), predicted 4.46% (25)
• WILL_DISCONTINUE: actual 53.12% (298), predicted 53.12% (298)
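The percentages in the target-following comparison are each class's share of the scored records. The sketch below recomputes them from label lists rebuilt from the C5.0 counts (561 records). It only checks the arithmetic and is not part of the modelling.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

def class_distribution(labels: Sequence[str]) -> Dict[str, Tuple[int, float]]:
    """Count and percentage share (2 d.p.) of each class label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: (n, round(100 * n / total, 2)) for label, n in counts.items()}

def expand(counts: Dict[str, int]) -> List[str]:
    """Rebuild a label list from per-class counts."""
    return [label for label, n in counts.items() for _ in range(n)]

actual = expand({"STILL_ENROLLED": 101, "WILL_COMPLETE": 79,
                 "WILL_COMPLETE_LATE": 83, "WILL_DISCONTINUE": 298})
predicted = expand({"STILL_ENROLLED": 188, "WILL_COMPLETE": 50,
                    "WILL_COMPLETE_LATE": 25, "WILL_DISCONTINUE": 298})
for name, labels in (("Actual", actual), ("Predicted", predicted)):
    print(name, class_distribution(labels))
```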


C5.0 Decision Tree Model

Model Evaluation & Analysis

Results for output field T_ATTEMPT_STATUS, comparing $C-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS:
• Correct: 409 (72.91%)
• Wrong: 152 (27.09%)
• Total: 561

Performance evaluation:
• STILL_ENROLLED: 1.0
• WILL_COMPLETE: 1.545
• WILL_COMPLETE_LATE: 1.636
• WILL_DISCONTINUE: 0.515

Confidence values report for $CC-T_ATTEMPT_STATUS:
• Range: 0.35 to 0.906
• Mean correct: 0.728
• Mean incorrect: 0.553
• Always correct above: 0.906 (0% of cases)
• Always incorrect below: 0.35 (0% of cases)
• 85.56% accuracy above: 0.478
• 2.0-fold correct above: 0.86 (53.57% of cases)
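The report's summary figures can be recomputed from scored records. This sketch assumes three parallel sequences:

- the actual T_ATTEMPT_STATUS;
- the predicted class ($C-T_ATTEMPT_STATUS);
- its confidence ($CC-T_ATTEMPT_STATUS).

It illustrates what "Correct", "Mean correct"/"Mean incorrect" and "accuracy above" measure. Treating "above" as a strict comparison is an assumption, not the modelling tool's documented behaviour.

```python
import math
from statistics import mean
from typing import Sequence, Tuple

def accuracy(actual: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of records whose predicted class equals the actual class."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def mean_confidence(actual: Sequence[str], predicted: Sequence[str],
                    confidence: Sequence[float]) -> Tuple[float, float]:
    """Mean confidence of correct and of incorrect predictions (NaN if a group is empty)."""
    right = [c for a, p, c in zip(actual, predicted, confidence) if a == p]
    wrong = [c for a, p, c in zip(actual, predicted, confidence) if a != p]
    return (mean(right) if right else math.nan, mean(wrong) if wrong else math.nan)

def accuracy_above(actual: Sequence[str], predicted: Sequence[str],
                   confidence: Sequence[float], threshold: float) -> Tuple[float, float]:
    """Accuracy among records with confidence above threshold, and the share of records kept."""
    kept = [(a, p) for a, p, c in zip(actual, predicted, confidence) if c > threshold]
    if not kept:
        return math.nan, 0.0
    return sum(a == p for a, p in kept) / len(kept), len(kept) / len(actual)

# Overall accuracy reported for C5.0: 409 correct out of 561
actual = ["X"] * 561
predicted = ["X"] * 409 + ["Y"] * 152
print(f"{accuracy(actual, predicted):.2%}")  # 72.91%
```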


CHAID Decision Tree Model

Target following: actual vs predicted candidature status
• STILL_ENROLLED: actual 18.0% (101), predicted 11.59% (65)
• WILL_COMPLETE: actual 14.08% (79), predicted 14.26% (80)
• WILL_COMPLETE_LATE: actual 14.8% (83), predicted 13.9% (78)
• WILL_DISCONTINUE: actual 53.12% (298), predicted 60.25% (338)


CHAID Decision Tree Model

Model Evaluation & Analysis

Results for output field T_ATTEMPT_STATUS, comparing $R-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS:
• Correct: 412 (73.44%)
• Wrong: 149 (26.56%)
• Total: 561

Performance evaluation:
• STILL_ENROLLED: 1.566
• WILL_COMPLETE: 1.242
• WILL_COMPLETE_LATE: 1.192
• WILL_DISCONTINUE: 0.441

Confidence values report for $RC-T_ATTEMPT_STATUS:
• Range: 0.3 to 0.978
• Mean correct: 0.775
• Mean incorrect: 0.459
• Always correct above: 0.978 (0% of cases)
• Always incorrect below: 0.3 (0% of cases)
• 76.05% accuracy above: 0.379
• 2.0-fold correct above: 0.875 (42.4% of cases)


QUEST Decision Tree Model

Target following: actual vs predicted candidature status
• STILL_ENROLLED: actual 18.0% (101), predicted 25.18% (141)
• WILL_COMPLETE: actual 14.08% (79), predicted 12.32% (69)
• WILL_COMPLETE_LATE: actual 14.8% (83), predicted 2.5% (14)
• WILL_DISCONTINUE: actual 53.12% (298), predicted 60.0% (336)


QUEST Decision Tree Model

Model Evaluation & Analysis

Results for output field T_ATTEMPT_STATUS, comparing $R-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS:
• Correct: 374 (66.79%)
• Wrong: 186 (33.21%)
• Total: 560

Performance evaluation:
• STILL_ENROLLED: 1.082
• WILL_COMPLETE: 0.902
• WILL_COMPLETE_LATE: 1.349
• WILL_DISCONTINUE: 0.404

Confidence values report for $RC-T_ATTEMPT_STATUS:
• Range: 0.321 to 0.808
• Mean correct: 0.687
• Mean incorrect: 0.542
• Always correct above: 0.808 (0% of cases)
• Always incorrect below: 0.321 (0% of cases)
• 77% accuracy above: 0.482


Predicting Completion Status for Higher Degree Research Candidates

Validating Prediction Accuracy


Model Comparison: Confidence Level Distributions

• CHAID (strong): predicts the target with the best accuracy.
• QUEST (weak): predicts the target with the weakest accuracy of the three models used.
• C5.0 (average): predicts the target with average accuracy.


Completion Status Estimation Prediction Accuracy by Target Values

Models compared: CHAID, QUEST, C5.0

Field: STILL_ENROLLED* / WILL_COMPLETE* / WILL_COMPLETE_LATE* / WILL_DISCONTINUE* / Importance
• $RC-T_ATTEMPT_STATUS: 0.666 / 0.472 / 0.460 / 0.822 / 1.000 (Important)
• $CC-T_ATTEMPT_STATUS: 0.492 / 0.584 / 0.543 / 0.809 / 1.000 (Important)
• $RC-T_ATTEMPT_STATUS: 0.521 / 0.527 / 0.524 / 0.740 / 1.000 (Important)


Completion Status Modeling


Conclusion

HDR data quality (DQ) standards need to be raised. The data has good predictive strength, but it must be populated consistently over the time span used for prediction.

The model has good prediction accuracy, although ECU’s HDR cohort is very small (approximately 700 students).

A limitation of the modelling process was that only classification models could be used, because of the limited size of the cohort. Neural network and logistic regression models could not be applied.

The next phase will be to design reporting standards and intervention strategies, so that the modelling outcomes can be used effectively to reduce completion times for future students.


References

1. Adam, J., and Gaither, G. H. “Retention in Higher Education: A Selective Resource Guide.” In G. H. Gaither (ed.), Minority Retention: What Works? New Directions for Institutional Research, no. 125. San Francisco: Jossey-Bass, 2005.

2. Pascarella, E., and Terenzini, P. How College Affects Students. San Francisco: Jossey-Bass, 2005.

3. Braxton, J. Reworking the Student Departure Puzzle. Nashville, Tenn.: Vanderbilt University Press, 2000.

4. Luan, J. “Data Mining and its Applications in Higher Education.” In A. M. Serban and J. Luan (eds.), Knowledge Management: Building a Competitive Advantage in Higher Education. New Directions for Institutional Research, no. 113. San Francisco: Jossey-Bass, 2002.

5. Byers Gonzalez, J., and DesJardins, S. “Artificial Neural Networks: A New Approach for Predicting Application Behaviour.” Research in Higher Education, 2002, 43 (2), 235–258.


Questions
