Predicting Real-Time Percent Enrollment Increase

Mark Hamner
Texas Woman’s University, Department of Mathematics and Computer Science

Preet Ahluwalia
Credit Risk Analyst, AmeriCredit

Texas Woman’s University

Denton . Dallas . Houston

Year 2005 Facts

Total Enrollment – 11,344

Undergrad – 6,266

Graduate (Masters) – 4,369

Doctoral – 709

Campus Enrollment

Denton – 9,157

Dallas – 921

Houston – 1,266

59 academic programs

(19 doctoral)

Female – 10,368

Male – 976

Outline

Problem Definition

Predicting Student Enrollment at Time ‘t’ Using Historical Data

1. Enrollment Process - For Newly Enrolled

2. The Predictive Problem

3. Logistic Prediction Model

a. Data Issues and Programming Solutions

4. Quadratic Prediction Model

a. Exploratory Analysis to Identify Patterns

5. Combine for Overall Prediction: Results

Enrollment

• Enrollment predictions can be broken into two fundamental pieces:

Newly Enrolled Students

Re-Enrolling/Continuing Students

• The focus of this paper is the prediction of Newly Enrolled students.

New Students: Enrollment Process

[Flow diagram: All Prospective Students → Applicants (FTIC, Transfer, Graduate, Others) → Admitted to TWU → New 12th Day Enrolled]


Idea Behind Enrollment Prediction at Time = t

Enrollment Prediction at Time ‘t’

Let Time = t denote the prediction date

For applicants before t, we will have data

For applicants after time t (denoted by t’), we will not have data

Total Enrollment = Enroll_t + Enroll_t’

[Timeline diagram: Begin Prediction, Time t, Fall 12th Day, with two "Predict" labels]

Weekly Partition of Prediction Interval

The prediction interval will be broken up into weekly intervals

The diagram below illustrates prediction at Week = 5

[Timeline diagram: Week 0, Week 5, Week 17, with two "Predict" labels]

At Week = 5 we have 35 more days of applicant data than at Week = 0

Total Enroll = Enroll_t + Enroll_t’

Enroll_t

$P_t = \{1, 2, \dots, N_t\}$: finite set of applicants at week = t, with $k \in P_t$

Enrollment is a dichotomous response variable, $y_k$:

$y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll)

Enrollment of all applicants at week = t:

$$\text{Enroll\_t} = \sum_{k=1}^{N_t} y_k$$

Model Dichotomous Variable

For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$

There exists applicant information for each individual:

$\mathbf{x}_k = (x_{1k}, x_{2k}, \dots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \dots, \text{Major\_Ratio}_k)$

Use Logistic Regression to model $\theta_k$

Logistic Regression Model

• The probability of student k enrolling is

$$\theta_k(\mathbf{x}_k) = \frac{e^{L_k}}{1 + e^{L_k}}$$

where

$$L_k = \beta_0 + \beta_1\,\text{Distance}_k + \beta_2\,\text{SAT}_k + \dots + \beta_p\,\text{Major\_Ratio}_k$$

$\text{Distance}_k$, $\text{SAT}_k$, …, $\text{Major\_Ratio}_k$ are the predictor variables
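As a rough illustration of the logistic formula, the Python sketch below computes the enrollment probability for a single applicant. The coefficient values, the applicant's values and the function name `enroll_probability` are made up for illustration; the slides do not report fitted coefficients.

```python
import math

def enroll_probability(betas, x):
    """Return theta_k = e^L / (1 + e^L) for one applicant.

    betas: [beta_0, beta_1, ..., beta_p], intercept first
    x:     [x_1, ..., x_p], predictor values for applicant k
    """
    if len(betas) != len(x) + 1:
        raise ValueError("need one coefficient per predictor plus an intercept")
    # Linear predictor L_k = beta_0 + sum_j beta_j * x_jk
    linear = betas[0] + sum(b * v for b, v in zip(betas[1:], x))
    # Two algebraically equal forms, chosen to avoid overflow in exp()
    if linear >= 0:
        return 1.0 / (1.0 + math.exp(-linear))
    e = math.exp(linear)
    return e / (1.0 + e)

# Hypothetical coefficients and applicant (Distance, SAT, Major_Ratio)
betas = [-1.0, -0.01, 0.001, 2.0]
applicant = [50.0, 1000.0, 0.25]
theta = enroll_probability(betas, applicant)
```

With these made-up numbers the linear predictor is zero, so the probability comes out at one half.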

Predict Enroll_t

Let $\mathbf{Y}$ be the random vector of responses:

$$\mathbf{Y} = (y_1, y_2, \dots, y_{N_t})^{\top}$$

Thus,

$$\text{Enroll\_t} = \mathbf{1}^{\top}\mathbf{Y} = \sum_{k=1}^{N_t} y_k$$

Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones

Estimated Enroll_t is …

$$\hat{T}_t = E(\text{Enroll\_t}) = \mathbf{1}^{\top}E(\mathbf{Y}) = \mathbf{1}^{\top}\begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_{N_t}) \end{pmatrix} = \sum_{k=1}^{N_t} \theta_k(\mathbf{x}_k)$$
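Because each response is 0 or 1, its expectation is its enrollment probability, so the estimate of Enroll_t reduces to a sum of fitted probabilities. A minimal Python sketch, using made-up probabilities and a hypothetical function name:

```python
def expected_enrollment(thetas):
    """Estimated Enroll_t: the sum of the fitted probabilities theta_k."""
    for p in thetas:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each theta_k must be a probability")
    return sum(thetas)

# Hypothetical fitted probabilities for five applicants
thetas = [0.9, 0.6, 0.3, 0.2, 0.5]
estimate = expected_enrollment(thetas)  # about 2.5 expected enrollees
```

Summing the probabilities is not the same as counting applicants whose probability exceeds one half; the expectation calls for the sum.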

Logistic Model

• Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc.

• What variables will get picked for model building?

[Diagram: Year Prior Applicant Data → Current Year Prediction]

Programming and Variable Selection

SAS Programming: Exploratory and Variable Creation

Use SAS to create possibly significant variables and dummy code categorical variables

Example: Major_Ratio, Ethnic, etc.

Backward Selection

[Flowchart: Start (Saturated Model) → Drop Predictor? → Yes: drop and repeat; No: Stop (Fitted Model)]

Slightly different variables are selected for: FTIC, Transfer, and Graduate.
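The backward selection loop can be sketched in Python as follows. This is only an outline of the general technique, not the authors' SAS program: the stopping rule (a p-value threshold `alpha`), the function names and the p-values are all assumptions, and the model fit is replaced by a stand-in that returns fixed numbers.

```python
def backward_select(predictors, p_values_for, alpha=0.05):
    """Backward elimination.

    predictors:   names in the saturated (full) model
    p_values_for: callable that fits a model on the given names and
                  returns {name: p_value}; supplied by the caller
    alpha:        assumed rule: keep a predictor only if p-value <= alpha
    """
    current = list(predictors)
    while current:
        p_values = p_values_for(current)
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] <= alpha:
            break              # nothing left to drop: this is the fitted model
        current.remove(worst)  # drop the least significant predictor, then refit
    return current

# Stand-in for a real model fit: fixed, made-up p-values
FAKE_P = {"Distance": 0.001, "SAT_M": 0.40, "Gender": 0.03, "DOB": 0.20}

def fake_fit(names):
    return {name: FAKE_P[name] for name in names}

selected = backward_select(["Distance", "SAT_M", "Gender", "DOB"], fake_fit)
```

In a real fit the p-values change each time a predictor is dropped, which is why the loop refits before every decision.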

FTIC Variable Selection

| Variable Name | Variable Type | Variable Description |
|---|---|---|
| Twelve | Response | 1 if enrolled; 0 otherwise |
| Distance♦ | Explanatory | Continuous variable |
| SAT_M, SAT_V, ACT | Explanatory | Continuous variables; SAT Math score, SAT Verbal score, ACT score |
| Give ACT♦ | Explanatory | 1 if score provided; 0 otherwise |
| Program Ratio♦ | Explanatory | Continuous variable |
| Major Ratio♦ | Explanatory | Continuous variable |
| Date of Birth | Explanatory | Continuous variable |
| Gender♦ | Explanatory | 1 if female; 0 if male |
| Apply Early♦ | Explanatory | 1 if applied before January 1; 0 otherwise |
| E1, E2, E3, E4, E5, E6, E7 | Explanatory | Dummy variables for Ethnicity |
| Personal♦ | Explanatory | Discrete variable; number of key pieces of information available for a student |
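The 0/1 variables in this table could be derived from raw applicant records along the following lines. The record layout (`act`, `gender`, `applied_on`), the output names and the cutoff year are hypothetical; the slides only give the coding rules.

```python
from datetime import date

def derive_indicators(applicant, cutoff):
    """Build 0/1 explanatory variables from one raw applicant record."""
    return {
        # 1 if an ACT score was provided; 0 otherwise
        "Give_ACT": 1 if applicant.get("act") is not None else 0,
        # 1 if female; 0 for male
        "Gender": 1 if applicant["gender"] == "F" else 0,
        # 1 if the application arrived before the cutoff date; 0 otherwise
        "Apply_Early": 1 if applicant["applied_on"] < cutoff else 0,
    }

# Hypothetical fall 2004 applicant; "before January 1" is read here
# as before 2004-01-01, which is an assumption
raw = {"act": None, "gender": "F", "applied_on": date(2003, 11, 15)}
row = derive_indicators(raw, cutoff=date(2004, 1, 1))
```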

Case Study: Logistic Model Prediction

Applicant data for 2003 to predict 2004 FTIC by weekly time intervals

FTIC 2004 Enrollment

| Week | Total Apply | Predict Enroll_t | Actual Enroll | Off | % Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 578 | 30 | 1.6% |
| 1 | 1,896 | 615 | 584 | 31 | 1.6% |
| 2 | 1,930 | 623 | 594 | 29 | 1.5% |
| 3 | 1,951 | 632 | 606 | 26 | 1.3% |
| 4 | 1,975 | 638 | 613 | 25 | 1.3% |
| 5 | 1,994 | 644 | 620 | 24 | 1.2% |
| 6 | 2,005 | 647 | 623 | 24 | 1.2% |
| 7 | 2,026 | 655 | 634 | 21 | 1.0% |
| 8 | 2,039 | 659 | 638 | 21 | 1.0% |
| 9 | 2,058 | 665 | 647 | 18 | 0.9% |
| 10 | 2,065 | 667 | 650 | 17 | 0.8% |
| 11 | 2,081 | 669 | 653 | 16 | 0.8% |
| 12 | 2,097 | 673 | 661 | 12 | 0.6% |
| 13 | 2,111 | 678 | 668 | 10 | 0.5% |
| 14 | 2,118 | 680 | 671 | 9 | 0.4% |
| 15 | 2,122 | 681 | 673 | 8 | 0.4% |
| 16 | 2,123 | 681 | 674 | 7 | 0.3% |
| 17 | 2,146 | 690 | 687 | 3 | 0.1% |

• The Logistic Model does not predict after week = t
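The slides do not state how Off and % Off are computed. The table values are consistent with Off = Predict Enroll_t minus Actual Enroll and with % Off taken relative to Total Apply, which the sketch below checks on three rows. The function name is hypothetical.

```python
def prediction_error(total_apply, predicted, actual):
    """Return (Off, % Off), with % Off taken relative to Total Apply."""
    off = predicted - actual
    pct_off = 100.0 * off / total_apply
    return off, round(pct_off, 1)

# Rows copied from the table: (week, Total Apply, Predict Enroll_t, Actual Enroll)
rows = [(0, 1877, 608, 578), (9, 2058, 665, 647), (17, 2146, 690, 687)]
results = {week: prediction_error(apply_n, pred, act)
           for week, apply_n, pred, act in rows}
```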


Enrollment after Week = t

• Total Enrollment = Enroll_t + Enroll_t’

• At any week = t, we need to predict Enroll_t’

• Identify historical relationships that may be helpful

Applicant Versus Enrolled by Year

• Both applications and enrollment have been increasing

• Notice enrollment yield is decreasing

[Chart: Apply and Enroll (left axis, Total, 0 to 9,000) and %Enroll (right axis, Percent, 30% to 80%) by Year, 1999 to 2005. %Enroll data labels in order: 66.4%, 48.1%, 46.7%, 43.5%, 41.7%, 39.4%, 34.6%]

Is the % increase in enrollment matching the % increase in apply?

Applicant Yield By Strata

[Chart: % Applicants Enroll (20% to 80%) by Year, 1999 to 2005, with one line each for Graduate, FTIC, and Transfer]

Enrollment yield from applicant data is decreasing for each stratum

How does this affect yearly increase in enrollment?

Percent Increase Applicant Vs. Enrolled

• Applicant increase is not a viable indicator of enrollment increase

[Chart: % Increase Apply and % Increase Enroll (Percent, -20% to 120%) by Year, 2000 to 2005. Data labels in order: 102.7%, 6.9%, 31.7%, 16.4%, 9.2%, 13.4%, then 46.8%, 3.9%, 22.7%, 11.5%, 3.3%, -0.4%]

• What patterns are reliable to model?
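Since yield is enrollment divided by applications, the two growth rates and the falling yield are tied together: new yield = old yield × (1 + enrollment growth) / (1 + applicant growth). The sketch below checks this, assuming the chart's data labels read in year order as % Increase Apply = 102.7, 6.9, 31.7, 16.4, 9.2, 13.4 and % Increase Enroll = 46.8, 3.9, 22.7, 11.5, 3.3, -0.4, and taking the yields from the "Applicant Versus Enrolled by Year" slide. That pairing of labels with series is an assumption.

```python
def next_yield(prev_yield, apply_increase, enroll_increase):
    """Yield implied for the next year; all arguments are percentages."""
    growth_ratio = (1 + enroll_increase / 100.0) / (1 + apply_increase / 100.0)
    return prev_yield * growth_ratio

years = [2000, 2001, 2002, 2003, 2004, 2005]
apply_inc = [102.7, 6.9, 31.7, 16.4, 9.2, 13.4]       # assumed series
enroll_inc = [46.8, 3.9, 22.7, 11.5, 3.3, -0.4]       # assumed series
yields = [66.4, 48.1, 46.7, 43.5, 41.7, 39.4, 34.6]   # 1999 to 2005

# Implied yield for each year, starting from the previous year's chart value
implied = [next_yield(yields[i], apply_inc[i], enroll_inc[i])
           for i in range(len(years))]
```

Under that reading, each implied yield lands within a tenth of a percentage point of the charted one, which illustrates why applications growing faster than enrollment shows up as a declining yield.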

Cumulative FTIC Enrollment by Week

[Chart: Cumulative Enroll (0 to 900) by Week (0 to 18), one line per year, 1999 to 2005]

• Notice the parallel lines, which imply equal slopes!

• At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17)

• Thus, (Total Enroll – Enroll_t) should be very similar from year to year

Relationship Between Enrollment & Total Enrollment

[Chart: Enroll_t’ (0 to 140) by Week (0 to 17)]

• By definition, (Total Enroll – Enroll_t) = Enroll_t’

• Model Enroll_t’ and smooth out the consistent patterns by week
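The definition can be applied directly to a year's cumulative enrollment. The sketch below uses the 2004 FTIC "Actual Enroll" column that appears later in the deck; the function name is hypothetical.

```python
def remaining_enrollment(cumulative_by_week):
    """Enroll_t' for each week: final total minus cumulative enrollment at week t."""
    total = cumulative_by_week[-1]
    return [total - enrolled for enrolled in cumulative_by_week]

# 2004 FTIC Actual Enroll, weeks 0 to 17
actual_2004 = [578, 584, 594, 606, 613, 620, 623, 634, 638,
               647, 650, 653, 661, 668, 671, 673, 674, 687]
remaining = remaining_enrollment(actual_2004)
```

The result reproduces the "Actual Enroll_t'" column of the 2004 table, from 109 at Week 0 down to 0 at Week 17.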

Enroll_t’ Model

[Chart: 2003 FTIC Enroll_t’ (0 to 140) by Week (0 to 18)]

Estimate of Enroll_t’ ($R^2 = 0.9857$):

$$\hat{T}_{t'} = 0.1961\,\text{week}^2 - 10.514\,\text{week} + 130.7$$

• Use 2003 Enroll_t’ Model to predict Enroll_t’ for 2004
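Evaluating the fitted quadratic week by week is straightforward. The sketch below uses the coefficients from the slide and rounds to whole students; the function name is hypothetical.

```python
def predict_enroll_after(week):
    """Quadratic estimate of Enroll_t' (coefficients from the 2003 FTIC fit)."""
    return 0.1961 * week ** 2 - 10.514 * week + 130.7

# Predictions for weeks 0 to 17, rounded to whole students
predicted = [round(predict_enroll_after(week)) for week in range(18)]
```

The rounded values match the predicted Enroll_t’ column used for 2004 in the tables that follow (131 at Week 0 down to 9 at Week 17). The curve does not reach zero at Week 17, so the model still predicts a few late enrollees on the final week.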

Predict 2004 Enroll_t’

FTIC 2004 Enrollment

| Week | Total Apply | Actual Enroll | Actual Enroll_t' | Predicted Enroll_t' | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 578 | 109 | 131 | 22 |
| 1 | 1,896 | 584 | 103 | 120 | 17 |
| 2 | 1,930 | 594 | 93 | 110 | 17 |
| 3 | 1,951 | 606 | 81 | 101 | 20 |
| 4 | 1,975 | 613 | 74 | 92 | 18 |
| 5 | 1,994 | 620 | 67 | 83 | 16 |
| 6 | 2,005 | 623 | 64 | 75 | 11 |
| 7 | 2,026 | 634 | 53 | 67 | 14 |
| 8 | 2,039 | 638 | 49 | 59 | 10 |
| 9 | 2,058 | 647 | 40 | 52 | 12 |
| 10 | 2,065 | 650 | 37 | 45 | 8 |
| 11 | 2,081 | 653 | 34 | 39 | 5 |
| 12 | 2,097 | 661 | 26 | 33 | 7 |
| 13 | 2,111 | 668 | 19 | 27 | 8 |
| 14 | 2,118 | 671 | 16 | 22 | 6 |
| 15 | 2,122 | 673 | 14 | 17 | 3 |
| 16 | 2,123 | 674 | 13 | 13 | 0 |
| 17 | 2,146 | 687 | 0 | 9 | 9 |

Predict 2004 FTIC Total Enroll

Total Enrollment = Enroll_t + Enroll_t’

Note: 2004 FTIC Actual Total is 687

2004 FTIC Predict

| Week | Total Apply | Enroll_t | Enroll_t' | Total | Off |
|---|---|---|---|---|---|
| 0 | 1,877 | 608 | 131 | 739 | 52 |
| 1 | 1,896 | 615 | 120 | 736 | 49 |
| 2 | 1,930 | 623 | 110 | 733 | 46 |
| 3 | 1,951 | 632 | 101 | 733 | 46 |
| 4 | 1,975 | 638 | 92 | 729 | 42 |
| 5 | 1,994 | 644 | 83 | 727 | 40 |
| 6 | 2,005 | 647 | 75 | 721 | 34 |
| 7 | 2,026 | 655 | 67 | 722 | 35 |
| 8 | 2,039 | 659 | 59 | 718 | 31 |
| 9 | 2,058 | 665 | 52 | 717 | 30 |
| 10 | 2,065 | 667 | 45 | 712 | 25 |
| 11 | 2,081 | 669 | 39 | 708 | 21 |
| 12 | 2,097 | 673 | 33 | 706 | 19 |
| 13 | 2,111 | 678 | 27 | 705 | 18 |
| 14 | 2,118 | 680 | 22 | 702 | 15 |
| 15 | 2,122 | 681 | 17 | 698 | 11 |
| 16 | 2,123 | 681 | 13 | 694 | 7 |
| 17 | 2,146 | 690 | 9 | 698 | 11 |
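Combining the two pieces is a single addition, and Off is the predicted total minus the actual total of 687. A minimal sketch on four rows of the 2004 table, with a hypothetical function name:

```python
def total_prediction(enroll_t, enroll_after, actual_total):
    """Return (predicted total, Off), where Off = predicted total - actual total."""
    total = enroll_t + enroll_after
    return total, total - actual_total

ACTUAL_2004_FTIC = 687  # stated on the slide

# (week, Enroll_t, Enroll_t') copied from the table
rows = [(0, 608, 131), (5, 644, 83), (10, 667, 45), (16, 681, 13)]
combined = {week: total_prediction(e_t, e_after, ACTUAL_2004_FTIC)
            for week, e_t, e_after in rows}
```

These rows reproduce the table's Total and Off columns. A few other rows (Weeks 1, 4, 6 and 17) differ from the sum of the displayed components by one, which would be consistent with the totals having been computed from unrounded values; that explanation is a guess.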

Predict 2005 FTIC Total Enroll

Total Enrollment = Enroll_t + Enroll_t’

Note: 2005 FTIC Actual Total is 765

2005 FTIC Predict

| Week | Total Apply | Enroll_t | Enroll_t' | Total | Off |
|---|---|---|---|---|---|
| 0 | 2,328 | 668 | 109 | 777 | 12 |
| 1 | 2,357 | 675 | 100 | 775 | 10 |
| 2 | 2,390 | 687 | 92 | 779 | 14 |
| 3 | 2,409 | 690 | 84 | 774 | 9 |
| 4 | 2,432 | 696 | 76 | 772 | 7 |
| 5 | 2,444 | 697 | 69 | 766 | 1 |
| 6 | 2,480 | 707 | 61 | 768 | 3 |
| 7 | 2,497 | 712 | 55 | 767 | 2 |
| 8 | 2,521 | 716 | 48 | 764 | -1 |
| 9 | 2,534 | 719 | 42 | 761 | -4 |
| 10 | 2,549 | 722 | 36 | 758 | -7 |
| 11 | 2,564 | 727 | 31 | 758 | -7 |
| 12 | 2,583 | 732 | 26 | 757 | -8 |
| 13 | 2,595 | 736 | 21 | 756 | -9 |
| 14 | 2,606 | 739 | 16 | 755 | -10 |
| 15 | 2,611 | 740 | 12 | 752 | -13 |
| 16 | 2,617 | 742 | 8 | 750 | -15 |
| 17 | 2,652 | 755 | 5 | 760 | -5 |

- END -

Thank you!

Any Questions?