Edith Cowan University
Rachna Dhand, Senior Strategic Information Analyst
Application of Predictive Analytics to Higher Degree Research Course Completion Times
Application of Decision Theory to PhD Course Completions (2006 – 2013)
Higher Degree Research (HDR) students carry significant costs for universities. When students fail to complete on time, or at all, resources are used sub-optimally and government grant allocations and ratings are affected.
Objective
The objective of this project was to analyse historical completion times for ECU PhD candidates and identify their primary determinants, so that intervention strategies can be put in place to help future students complete on time.
Methodology
Classification models from decision science are used to predict completion times for HDR candidates. The shortlisted models include CHAID, QUEST and C5.0.
Predicting Higher Degree Research (HDR) Course Completion Time
What is Predictive Analytics?
Data Mining: Introduction, Modelling and Prediction Accuracy
Using data mining to identify which students are at risk of dropping out, or are likely to take longer to finish their degree, is not a new subject in Institutional Research (IR). Explanatory models based on regression and path analysis have contributed substantially to our understanding of student retention (Adam and Gaither, 2005; Pascarella and Terenzini, 2005; Braxton, 2000). However, few published studies address the use and prediction accuracy of data mining approaches in IR. Luan (2002) explained how neural-network and decision tree analysis can be applied to predict the transfer of college students to four-year institutions. Byers Gonzalez and DesJardins (2002) showed that a neural-network model predicted with better accuracy than binary logistic regression. Prediction accuracy depends not only on the type of model chosen but also on the independent variables selected, their levels of measurement and the size of the data set.
Predicting Completion Status for Higher Degree Research Candidates
Is it possible to predict the completion status of an HDR student given:
1) Demographic and course information
2) Research experience and the nature of the research project
3) Faculty and school information
4) Supervisor information
HDR Completion Status Prediction: Analysis and Modelling Approach
1. Objective
The main objective of the model is to predict the completion time for future applicants, where completion time is the period from the candidate's commencement of the HDR degree to its completion.
2. Analysis
The historical data set covers PhD students from all ECU faculties between 2006 and 2013, together with their research performance and demographic information. The completion time is estimated for each candidate.
3. Information Value Analysis (IVA)
The research performance and demographic variables are passed through IVA to filter out those most strongly associated with completion time, which are then kept for the modelling dataset.
4. Methodology
The dataset and the estimated completion times are modelled using decision tree-based predictive modelling. The models tested are C5.0, CHAID and QUEST, following a classification-based paradigm for modelling and scoring.
HDR Completion Status Prediction
Completion Time Prediction Modelling
1. Historical data analysis filters ECU PhD candidacy outcomes for the reference years 2006 to 2012.
2. Information Value Analysis screens for the primary determinants associated with completion time, which are then targeted in model building.
3. The target is defined from historical outcomes, and model building is initiated using the classification models CHAID, C5.0 and QUEST.
4. The model with the best results, i.e. the one that most accurately reproduces the actual target, is chosen to score future candidates.
Predicting Completion Status for Higher Degree Research Candidates
Historical Analysis and Understanding the Data Used for Prediction
HDR Cohort Analysis: Count by Citizenship (2006 – 2012)
Note: The small size of the HDR enrolment cohort constrains the modelling process, which makes classification models more suitable for building and training.
HDR Candidacy Time Distribution: Domestic Enrolled Candidates (2006 – 2013)
Candidacy time is the time a student spends from commencement of the PhD degree until they reach a final state (completion or discontinuation of the degree), or until the end of the observation period if they remain enrolled for a longer duration.
HDR Course Attempt Distribution by Final Outcome: Domestic vs International
HDR Completion Status Estimation: Scope of Modelling
Scope: Domestic PhD + International PhD candidates only
Input data:
• Student Course Details (set of 215 variables)
• Student Demographic and Course Information
• HDR Research Report Data (set of 45 variables related to student research experience and candidature progress)
• Research Data (milestone status, ABS Research Classifications and scholarship status data)
Preparation:
• Discard Inactive and Intermittent statuses
• Target definition for modelling (completion outcome)
Modelling dataset:
• Training dataset (90%) (2006 – 2012)
• Testing & scoring dataset (10%) (2013)
Model: C5.0, CHAID or QUEST decision tree
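The scoping steps above (discard Inactive and Intermittent statuses, then separate 2006 – 2012 training records from 2013 testing and scoring records) can be sketched in Python as below. This is a minimal sketch, not the ECU implementation: the record layout, the field names `status` and `reference_year`, and the status codes `INACTIVE` and `INTERMIT` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical status codes standing in for the "Inactive" and "Intermittent" statuses
EXCLUDED_STATUSES = {"INACTIVE", "INTERMIT"}
TRAIN_YEARS = range(2006, 2013)  # 2006 to 2012 inclusive
TEST_YEAR = 2013

Record = Dict[str, object]


def split_modelling_dataset(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Drop excluded statuses, then split by the (hypothetical) reference_year field."""
    kept = [r for r in records if r["status"] not in EXCLUDED_STATUSES]
    train = [r for r in kept if r["reference_year"] in TRAIN_YEARS]
    test = [r for r in kept if r["reference_year"] == TEST_YEAR]
    return train, test


if __name__ == "__main__":
    sample = [
        {"id": 1, "status": "COMPLETED", "reference_year": 2008},
        {"id": 2, "status": "INACTIVE", "reference_year": 2010},
        {"id": 3, "status": "ENROLLED", "reference_year": 2013},
    ]
    train, test = split_modelling_dataset(sample)
    print(len(train), len(test))  # → 1 1
```

In this sketch the split is by year, so the 90%/10% proportions follow from how many records fall into each period rather than being set directly.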
HDR Completion Status Estimation: Target Definition
Actual candidacy status target: T_ATTEMPT_STATUS
1. WILL_COMPLETE (Candidacy <= 4 years)
2. WILL_COMPLETE_LATE (Candidacy > 4 years)
3. STILL_ENROLLED (Candidacy > 3.5 years)
4. WILL_DISCONTINUE (Attrition flags set for all teaching periods)
5. IMMATURE VINTAGE (Candidacy < 3.5 years) [Discarded]
Completion time (calculated in years):
• Completed candidates: date_years_difference(D_COURSE_COMMENCEMENT_DT, D_COURSE_COMPLETION_DT)
• Enrolled candidates: if T_COURSE_ATTEMPT_STATUS matches "ENROL*" then 2013 - D_INTAKE_PERIOD
• Discontinued candidates: date_years_difference(D_COURSE_COMMENCEMENT_DT, T_COURSE_DISCONTINUED_DT)
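The target definition above can be expressed as a small labelling function. This is a minimal sketch under stated assumptions: it approximates `date_years_difference` with a 365.25-day year, uses hypothetical status strings such as `COMPLETED`, and treats any non-enrolled record without attrition flags as a completion. The function names are illustrative only.

```python
from datetime import date
from typing import Optional

OBSERVATION_YEAR = 2013  # end of the observation window, as in "2013 - D_INTAKE_PERIOD"


def years_between(start: date, end: date) -> float:
    """Fractional years between two dates (assumption: 365.25-day years)."""
    return (end - start).days / 365.25


def candidacy_years(attempt_status: str, commenced: date, intake_year: int,
                    end_date: Optional[date] = None) -> float:
    """Candidacy time in years; end_date is the completion or discontinuation date."""
    if attempt_status.startswith("ENROL"):  # mirrors: matches "ENROL*"
        return float(OBSERVATION_YEAR - intake_year)
    if end_date is None:
        raise ValueError("a completed or discontinued record needs an end date")
    return years_between(commenced, end_date)


def target_label(attempt_status: str, years: float,
                 attrition_all_periods: bool) -> Optional[str]:
    """Map one record to a target class; None means IMMATURE VINTAGE (discarded)."""
    if attrition_all_periods:
        return "WILL_DISCONTINUE"
    if attempt_status.startswith("ENROL"):
        return "STILL_ENROLLED" if years > 3.5 else None
    # Assumption: any other record without attrition flags is a completion
    return "WILL_COMPLETE" if years <= 4 else "WILL_COMPLETE_LATE"


if __name__ == "__main__":
    years = candidacy_years("COMPLETED", date(2006, 3, 1), 2006, date(2010, 3, 1))
    print(round(years, 2), target_label("COMPLETED", years, False))  # → 4.0 WILL_COMPLETE
```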
Predicting Completion Status for Higher Degree Research Candidates
Using Classification Models for Prediction
Decision Tree Models
Because the indicators have non-linear relationships with the target and the target outcome is nominal, decision tree models were selected to predict completion time outcomes for currently enrolled domestic and international students. The model produces:
1) A target prediction (STILL_ENROLLED, WILL_COMPLETE, WILL_DISCONTINUE or WILL_COMPLETE_LATE)
2) A confidence score for each enrolled student (between 0 and 1)
Rule induction (decision tree) algorithms include C5.0, Chi-Square Automatic Interaction Detection (CHAID), QUEST and Classification and Regression (C&R) Tree. The C5.0 model handles nominal or flag targets with predictors of any type (nominal, continuous or flag).
Predicting Higher Degree Research (HDR) Course Completion Time
Decision Tree Models: Model Criteria
Criterion | C5.0 | CHAID | QUEST
Type of split for categorical predictors | Multiple | Multiple | Binary
Continuous target | No | Yes | No
Continuous predictors | Yes | No | Yes
Criterion for predictor selection | Information measure | Chi-square (F-test for continuous targets) | Statistical
Supports bagging/boosting | Yes | Yes | Yes
CHAID
CHAID performs chi-square tests for predictor importance and variable reduction. The test tends to give higher importance to continuous variables than to nominal or categorical ones.
Predicts the target with the best accuracy.
Predictor importance (F-test association with target):
Milestones Achieved
School Name
Load Completed
Field of Education
Course Fraction Completed
Meeting Frequency
Basis of Admission
Research Literature
Funding Category Changed
Annual Leaves Availed
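The chi-square screening that CHAID relies on can be illustrated with a small pure-Python sketch. It computes Pearson's chi-square test of independence between one categorical predictor and the nominal target, and ranks predictors by p-value. This is a simplification, not CHAID itself: full CHAID also merges similar predictor categories, bins continuous predictors and uses Bonferroni-adjusted p-values, none of which is shown. The predictor names and values in the usage example are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail chi-square probability via the regularised lower gamma series.
    Adequate for moderate statistics; very large values could overflow."""
    if dof <= 0 or x <= 0:
        return 1.0
    a, z = dof / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def chi_square_test(predictor: Sequence[str], target: Sequence[str]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence: (statistic, degrees of freedom, p-value)."""
    n = len(target)
    observed = Counter(zip(predictor, target))
    row_totals, col_totals = Counter(predictor), Counter(target)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof, chi2_sf(stat, dof)


def rank_predictors(predictors: Dict[str, Sequence[str]],
                    target: Sequence[str]) -> List[Tuple[str, float]]:
    """Order predictors from strongest to weakest association (smallest p-value first)."""
    p_values = [(name, chi_square_test(values, target)[2]) for name, values in predictors.items()]
    return sorted(p_values, key=lambda item: item[1])


if __name__ == "__main__":
    target = ["WILL_COMPLETE", "WILL_COMPLETE", "WILL_DISCONTINUE", "WILL_DISCONTINUE"] * 5
    predictors = {
        "milestones_achieved": ["yes", "yes", "no", "no"] * 5,  # hypothetical
        "mode_of_attendance": ["full", "part"] * 10,            # hypothetical
    }
    print([name for name, _ in rank_predictors(predictors, target)])
    # → ['milestones_achieved', 'mode_of_attendance']
```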
C5.0
For C5.0, variable reduction uses the Information Value (IV) and Weight of Evidence (WoE) method. WoE measures the predictive power of each category of a variable in relation to the target outcome, while IV summarises the overall predictive power of the variable being considered.
Predictor importance (Information Value Analysis):
Course Fraction Completed
Literature Review Feedback
Field of Education
NESB Indicator
Age at Enrolment
Mode of Attendance
Milestone Achieved
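The WoE and IV calculations described above can be sketched for one categorical predictor, treating one target class (for example WILL_COMPLETE_LATE) as the event and all other classes as the non-event. This is a minimal illustration under those assumptions, with a small smoothing count added to avoid log(0); the function name and example data are hypothetical. The sign convention for WoE varies between sources, although IV is unaffected by it. Note also that C5.0's own split criterion is the entropy-based information gain ratio, so this sketch corresponds to the Information Value Analysis screening step rather than to the tree algorithm itself.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple


def woe_iv(predictor: Sequence[str], target: Sequence[str], event: str,
           smoothing: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Return (WoE per category, Information Value) for one predictor.

    WoE_i = ln(share of events in category i / share of non-events in category i)
    IV    = sum over i of (event share_i - non-event share_i) * WoE_i
    With smoothing=0, an empty cell makes the logarithm undefined.
    """
    events = Counter(p for p, t in zip(predictor, target) if t == event)
    non_events = Counter(p for p, t in zip(predictor, target) if t != event)
    categories = sorted(set(predictor))
    k = len(categories)
    total_events = sum(events.values()) + smoothing * k
    total_non_events = sum(non_events.values()) + smoothing * k
    woe: Dict[str, float] = {}
    iv = 0.0
    for cat in categories:
        # Smoothing keeps both shares positive when a cell is empty
        event_share = (events[cat] + smoothing) / total_events
        non_event_share = (non_events[cat] + smoothing) / total_non_events
        woe[cat] = math.log(event_share / non_event_share)
        iv += (event_share - non_event_share) * woe[cat]
    return woe, iv


if __name__ == "__main__":
    attendance = ["full"] * 10 + ["part"] * 10  # hypothetical predictor values
    outcome = (["WILL_COMPLETE"] * 8 + ["WILL_COMPLETE_LATE"] * 2
               + ["WILL_COMPLETE"] * 2 + ["WILL_COMPLETE_LATE"] * 8)
    woe, iv = woe_iv(attendance, outcome, event="WILL_COMPLETE_LATE")
    print(woe, round(iv, 3))
```

For the four-class target, the same function can be run once per class in a one-vs-rest fashion.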
Completion Status Modelling: Decision Tree Models Used
Completion status estimation using three decision tree models: C5.0, QUEST and CHAID.
C5.0 Decision Tree Model
Target Following: Actual vs Predicted Candidature Status
Status | Actual | Predicted
STILL_ENROLLED | 18.0% (101) | 33.51% (188)
WILL_COMPLETE | 14.08% (79) | 8.91% (50)
WILL_COMPLETE_LATE | 14.8% (83) | 4.46% (25)
WILL_DISCONTINUE | 53.12% (298) | 53.12% (298)
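The "Target Following" comparison is a side-by-side class distribution of actual and predicted labels. A minimal sketch, assuming the labels are available as two Python lists (the helper names `target_following` and `expand` are hypothetical), which reproduces the C5.0 percentages above from the reported counts:

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def target_following(actual: Sequence[str],
                     predicted: Sequence[str]) -> Dict[str, Tuple[int, float, int, float]]:
    """Per class: (actual count, actual %, predicted count, predicted %)."""
    a_counts, p_counts = Counter(actual), Counter(predicted)
    n_a, n_p = len(actual), len(predicted)
    classes = sorted(set(a_counts) | set(p_counts))
    return {
        c: (a_counts[c], round(100 * a_counts[c] / n_a, 2),
            p_counts[c], round(100 * p_counts[c] / n_p, 2))
        for c in classes
    }


def expand(counts: Dict[str, int]) -> List[str]:
    """Turn {label: count} into a flat list of labels."""
    return [label for label, k in counts.items() for _ in range(k)]


if __name__ == "__main__":
    actual = expand({"STILL_ENROLLED": 101, "WILL_COMPLETE": 79,
                     "WILL_COMPLETE_LATE": 83, "WILL_DISCONTINUE": 298})
    predicted = expand({"STILL_ENROLLED": 188, "WILL_COMPLETE": 50,
                        "WILL_COMPLETE_LATE": 25, "WILL_DISCONTINUE": 298})
    for label, row in target_following(actual, predicted).items():
        print(label, row)
```

A large gap between the actual and predicted shares for a class (here WILL_COMPLETE_LATE) shows which outcomes the model under-predicts, even when overall accuracy looks reasonable.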
C5.0 Decision Tree Model
Model Evaluation & Analysis
Results for output field T_ATTEMPT_STATUS
Comparing $C-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS
Correct: 409 (72.91%)
Wrong: 152 (27.09%)
Total: 561
Performance Evaluation
STILL_ENROLLED: 1.0
WILL_COMPLETE: 1.545
WILL_COMPLETE_LATE: 1.636
WILL_DISCONTINUE: 0.515
Confidence Values Report for $CC-T_ATTEMPT_STATUS
Range: 0.35 - 0.906
Mean Correct: 0.728
Mean Incorrect: 0.553
Always Correct Above: 0.906 (0% of cases)
Always Incorrect Below: 0.35 (0% of cases)
85.56% Accuracy Above: 0.478
2.0 Fold Correct Above: 0.86 (53.57% of cases)
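The headline figures in this report (Correct and Wrong counts and percentages, Mean Correct and Mean Incorrect confidence) can be recomputed from a scored dataset with a short sketch like the one below. It assumes actual labels, predicted labels and confidence scores are available as parallel lists; the function names are hypothetical, and `accuracy_at_or_above` is only a simple reading of the "Accuracy Above" line, not a reproduction of how the modelling tool chooses its reported thresholds.

```python
from statistics import mean
from typing import Dict, Sequence


def evaluation_summary(actual: Sequence[str], predicted: Sequence[str],
                       confidence: Sequence[float]) -> Dict[str, float]:
    """Correct/wrong counts plus mean confidence for correct and incorrect predictions."""
    hits = [a == p for a, p in zip(actual, predicted)]
    right = [c for c, ok in zip(confidence, hits) if ok]
    wrong = [c for c, ok in zip(confidence, hits) if not ok]
    return {
        "correct": sum(hits),
        "wrong": len(hits) - sum(hits),
        "accuracy_pct": round(100 * sum(hits) / len(hits), 2),
        "mean_correct": mean(right) if right else float("nan"),
        "mean_incorrect": mean(wrong) if wrong else float("nan"),
    }


def accuracy_at_or_above(threshold: float, actual: Sequence[str],
                         predicted: Sequence[str], confidence: Sequence[float]) -> float:
    """Accuracy (%) among records whose confidence is at least `threshold`."""
    kept = [a == p for a, p, c in zip(actual, predicted, confidence) if c >= threshold]
    return round(100 * sum(kept) / len(kept), 2) if kept else float("nan")


if __name__ == "__main__":
    y_true = ["WILL_COMPLETE", "WILL_DISCONTINUE", "WILL_COMPLETE", "WILL_DISCONTINUE"]
    y_pred = ["WILL_COMPLETE", "WILL_DISCONTINUE", "WILL_DISCONTINUE", "WILL_DISCONTINUE"]
    conf = [0.9, 0.8, 0.4, 0.6]
    print(evaluation_summary(y_true, y_pred, conf))
    print(accuracy_at_or_above(0.5, y_true, y_pred, conf))  # → 100.0
```

As a check on the report itself, 409 correct out of 561 records gives round(100 * 409 / 561, 2) = 72.91.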
CHAID Decision Tree Model
Target Following: Actual vs Predicted Candidature Status
Status | Actual | Predicted
STILL_ENROLLED | 18.0% (101) | 11.59% (65)
WILL_COMPLETE | 14.08% (79) | 14.26% (80)
WILL_COMPLETE_LATE | 14.8% (83) | 13.9% (78)
WILL_DISCONTINUE | 53.12% (298) | 60.25% (338)
CHAID Decision Tree Model
Model Evaluation & Analysis
Results for output field T_ATTEMPT_STATUS
Comparing $R-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS
Correct: 412 (73.44%)
Wrong: 149 (26.56%)
Total: 561
Performance Evaluation
STILL_ENROLLED: 1.566
WILL_COMPLETE: 1.242
WILL_COMPLETE_LATE: 1.192
WILL_DISCONTINUE: 0.441
Confidence Values Report for $RC-T_ATTEMPT_STATUS
Range: 0.3 - 0.978
Mean Correct: 0.775
Mean Incorrect: 0.459
Always Correct Above: 0.978 (0% of cases)
Always Incorrect Below: 0.3 (0% of cases)
76.05% Accuracy Above: 0.379
2.0 Fold Correct Above: 0.875 (42.4% of cases)
QUEST Decision Tree Model
Target Following: Actual vs Predicted Candidature Status
Status | Actual | Predicted
STILL_ENROLLED | 18.0% (101) | 25.18% (141)
WILL_COMPLETE | 14.08% (79) | 12.32% (69)
WILL_COMPLETE_LATE | 14.8% (83) | 2.5% (14)
WILL_DISCONTINUE | 53.12% (298) | 60.0% (336)
QUEST Decision Tree Model
Model Evaluation & Analysis
Results for output field T_ATTEMPT_STATUS
Comparing $R-T_ATTEMPT_STATUS with T_ATTEMPT_STATUS
Correct: 374 (66.79%)
Wrong: 186 (33.21%)
Total: 560
Performance Evaluation
STILL_ENROLLED: 1.082
WILL_COMPLETE: 0.902
WILL_COMPLETE_LATE: 1.349
WILL_DISCONTINUE: 0.404
Confidence Values Report for $RC-T_ATTEMPT_STATUS
Range: 0.321 - 0.808
Mean Correct: 0.687
Mean Incorrect: 0.542
Always Correct Above: 0.808 (0% of cases)
Always Incorrect Below: 0.321 (0% of cases)
77% Accuracy Above: 0.482
Predicting Completion Status for Higher Degree Research Candidates
Validating Prediction Accuracy
Model Comparison: Confidence Level Distributions
CHAID (Strong): predicts the target with the best accuracy.
QUEST (Weak): predicts the target with the weakest accuracy of the three models used.
C5.0 (Average): predicts the target with average accuracy.
Completion Status Estimation: Prediction Accuracy by Target Values
Models: CHAID, QUEST, C5.0
Field | STILL_ENROLLED* | WILL_COMPLETE* | WILL_COMPLETE_LATE* | WILL_DISCONTINUE* | Importance
$RC-T_ATTEMPT_STATUS | 0.666 | 0.472 | 0.460 | 0.822 | 1.000 (Important)
$CC-T_ATTEMPT_STATUS | 0.492 | 0.584 | 0.543 | 0.809 | 1.000 (Important)
$RC-T_ATTEMPT_STATUS | 0.521 | 0.527 | 0.524 | 0.740 | 1.000 (Important)
Completion Status Modeling
Conclusion
HDR data quality (DQ) standards need to be raised. The data has good predictive strength, but it must be populated consistently over the time span used for prediction.
The model has good prediction accuracy, even though ECU’s HDR cohort is very small (approximately 700 students).
A limitation of the modelling process was that, because of the limited size of the cohort, only decision tree classification models could be used; neural network and logistic regression models could not be applied.
The next phase will be to design reporting standards and intervention strategies so that the modelling outcomes can be used effectively to reduce completion times for future students.
References
1. Adam, J., and Gaither, G. H. “Retention in Higher Education: A Selective Resource Guide.” In G. H. Gaither (ed.), Minority Retention: What Works? New Directions for Institutional Research, no. 125. San Francisco: Jossey-Bass, 2005.
2. Pascarella, E., and Terenzini, P. How College Affects Students. San Francisco: Jossey-Bass, 2005.
3. Braxton, J. Reworking the Student Departure Puzzle. Nashville, Tenn.: Vanderbilt University Press, 2000.
4. Luan, J. “Data Mining and its Applications in Higher Education.” In A. M. Serban and J. Luan (eds.), Knowledge Management: Building a Competitive Advantage in Higher Education. New Directions for Institutional Research, no. 113. San Francisco: Jossey-Bass, 2002.
5. Byers Gonzalez, J., and DesJardins, S. “Artificial Neural Networks: A New Approach for Predicting Application Behaviour.” Research in Higher Education, 2002, 43 (2), 235–258.
Questions