Funding Education through Donors Choose General Assembly 2016 Fernando Hidalgo


Page 1: Donors Choose Project (1)


Page 2: Donors Choose Project (1)

Problem Description

Task: Predict whether a Donors Choose project will get funded.

Experience: Donors Choose data from September 2002 to the present.

Performance: Classification accuracy, the number of correct predictions out of all predictions made.

Page 3: Donors Choose Project (1)

The Data

Page 4: Donors Choose Project (1)

Labels

Completed: 592,757

Expired: 261,536

Class Skewness: Use the F1 score to keep both recall and precision in check.

Baseline: 0.69
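The 0.69 baseline matches the share of the majority class in the label counts. The sketch below assumes the baseline was obtained by always predicting "completed"; the slide does not say how it was computed, so treat this as one plausible reading.

```python
# Label counts as reported on the slide.
completed = 592_757
expired = 261_536

total = completed + expired

# Assumption: the baseline is the accuracy of a classifier that always
# predicts the majority class ("completed").
majority_baseline = completed / total

print(f"total projects: {total}")
print(f"majority-class share: {majority_baseline:.2f}")
```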

Page 5: Donors Choose Project (1)

Original Features

Feature Abbreviation: Description

total_price_excluding_optional_support: Total price of the project in dollars (integer)
students_reached: Number of students the project reaches (integer)
school_type: Type of school: Charter, magnet, year_round, nlns, kipp, Charter_ready_promise (categorical)
date_posted: Day the project was posted (categorical)
resource_type: Type of resources the project asks for (categorical)
grade_level: Grade level of the project (categorical)
poverty_level: Poverty level (categorical)
school_state: State from which the project was posted (categorical)
Eligible_double_your_impact_match: Whether the project was eligible to be matched (categorical)
teacher_prefix: Prefix of the teacher posting the project (categorical)
primary_focus_area: The project's primary area of focus (categorical)
primary_focus_subject: The project's primary subject of focus (categorical)

Page 6: Donors Choose Project (1)

Feature Engineering

New Feature: Description

price_per_student: total_price / students_reached
project_length: Date_expiration - date_posted
month_posted: Extracted from date_posted
day_posted: Extracted from date_posted
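The four derived features can be sketched for a single project record as follows. This is an illustration only: the record is a plain dict with made-up values, `engineer_features` is a hypothetical helper, `date_expiration` is assumed to be a date field, and `day_posted` is assumed to mean the day of the month (the slide does not say whether it is day of month or day of week).

```python
from datetime import date


def engineer_features(project):
    """Return the four derived features for one project record (a dict)."""
    posted = project["date_posted"]
    expires = project["date_expiration"]
    students = project["students_reached"]
    price = project["total_price_excluding_optional_support"]
    return {
        # Guard against division by zero when no student count is recorded.
        "price_per_student": price / students if students else None,
        # Number of days between posting and expiration.
        "project_length": (expires - posted).days,
        "month_posted": posted.month,
        # Assumption: day of the month, not day of the week.
        "day_posted": posted.day,
    }


# Made-up example record, not taken from the data set.
example = {
    "total_price_excluding_optional_support": 500.0,
    "students_reached": 25,
    "date_posted": date(2015, 9, 1),
    "date_expiration": date(2016, 1, 1),
}
features = engineer_features(example)
print(features)
```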

Page 7: Donors Choose Project (1)

Visualizations

Page 8: Donors Choose Project (1)
Page 9: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Resource

Page 10: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Month

Page 11: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Grades

Page 12: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Primary Focus Area

Page 13: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Teacher Prefix

Page 14: Donors Choose Project (1)

Rate of Projects Funded to Total Projects per Poverty Level
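The "rate of projects funded to total projects" shown in these charts can be computed per category roughly as below. This is a minimal sketch on made-up records; the `funding_status` field name and the `funded_rate_by` helper are assumptions for illustration, not taken from the slides.

```python
from collections import defaultdict


def funded_rate_by(projects, key):
    """Share of completed (funded) projects within each value of `key`."""
    totals = defaultdict(int)
    funded = defaultdict(int)
    for p in projects:
        totals[p[key]] += 1
        if p["funding_status"] == "completed":
            funded[p[key]] += 1
    return {value: funded[value] / totals[value] for value in totals}


# Made-up records for illustration only.
projects = [
    {"resource_type": "Books", "funding_status": "completed"},
    {"resource_type": "Books", "funding_status": "completed"},
    {"resource_type": "Books", "funding_status": "expired"},
    {"resource_type": "Technology", "funding_status": "completed"},
    {"resource_type": "Technology", "funding_status": "expired"},
]
rates = funded_rate_by(projects, "resource_type")
print(rates)
```

The same helper applies to any of the categorical features, for example `funded_rate_by(projects, "poverty_level")` when that field is present in the records.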

Page 15: Donors Choose Project (1)

Relationship Between Project Length and Funding

Page 16: Donors Choose Project (1)

Relationship Between Project Price and Funding

Page 17: Donors Choose Project (1)

Relationship Between Price per Student and Funding

Page 18: Donors Choose Project (1)

Predictive Model

Page 19: Donors Choose Project (1)

The 3 Models:

1. AdaBoost

2. Random Forest

3. Logistic Regression

Page 20: Donors Choose Project (1)

GridSearch Accuracy Scores (using the F1 score metric)

Model                 Accuracy   Best Parameter
Random Forest         0.759      Criterion: Entropy
AdaBoost              0.7676     N_estimators: 60
Logistic Regression   0.811      Penalty: L2

Simplest Model with Best Score: Logistic Regression
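For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from scratch on made-up labels, where 1 is assumed to stand for a funded project; the `f1` helper is illustrative and is not the code used to produce the scores on this slide.

```python
def f1(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 1 = funded, 0 = expired.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 1]
score = f1(y_true, y_pred)
print(f"F1: {score:.2f}")
```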

Page 21: Donors Choose Project (1)

Checking Feature Significance:

Using Random Forest Classifier

The top 5 Features Seem to Have Most of the Predictive Power

Page 22: Donors Choose Project (1)

Using Only the 5 Most Significant Features

1. Total_price_excluding_optional_support

2. Eligible_double_your_impact_match

3. Resource_Type_Books

4. Resource_Type_Technology

5. price_per_student

New Score with Logistic Regression: 0.8171
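Feature names such as Resource_Type_Books and Resource_Type_Technology suggest that the categorical resource_type column was one-hot encoded before modeling. The sketch below shows that encoding under this assumption; the `one_hot` helper and the "Supplies" category are hypothetical and included only for illustration.

```python
def one_hot(value, categories, prefix):
    """Expand one categorical value into 0/1 indicator columns."""
    return {f"{prefix}_{c}": int(value == c) for c in categories}


# "Books" and "Technology" appear on the slide; "Supplies" is a
# hypothetical third category added for illustration.
categories = ["Books", "Technology", "Supplies"]
row = one_hot("Books", categories, "Resource_Type")
print(row)
```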

Page 23: Donors Choose Project (1)

Overview

● Model improvement of 0.1271 over the baseline using Logistic Regression with F1 score.

● Most of the predictive power lies in 5 features.

● Ethical implications:
○ The features with the most predictive power are not ones that can be changed without fabrication.

Page 24: Donors Choose Project (1)

Model Improvements

Add Prescriptive Data: Project Essays, Project Materials

Use Data Based on Location: Census

Skewed Data: Find Reasons, Methods