I’m not Allowed to Reveal My Yellow
Number for a meeting Up Front After my Talk
The Difficulty of Knowing and The “E” Word
Dr. Eric P. Bettinger
Stanford University and NBER
April 19, 2012
What is the “E” Word?
• The “E” word today is “Evaluation”
• Why is “evaluation” a frightening word?
  • Typically external and out of your control.
  • Judges your work.
  • Expensive and time-consuming.
  • Small look at a larger, more intensive service.
  • “Evaluation” courses in graduate school were not your favorite.
• My goal today is to suggest that evaluation is integral to the life cycle of knowledge and the continued success of access and success programs.
Experimentation and The Learning Organization: A Virtuous Cycle
Evaluate
Innovate
Experiment
Economist’s View of Higher Education
Human Capital Model
ATTENDANCE DECISION: Net benefit of attendance vs. net benefit of alternative
Includes:
• Monetary Benefits
• Monetary Costs
• Non-monetary Costs/Benefits
Why Does the Process Fail?
Human Capital Model
ATTENDANCE DECISION: Net benefit of attending vs. net benefit of not attending
Includes:
• Cost of Completing Applications
• Procrastination
• Bad Information
• Students’ Guesses on Financial Aid
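The decision rule on these two slides can be sketched in a few lines. This is only an illustration: the `Option` class, the `attends` function, the two friction parameters, and every number are hypothetical, chosen to show how a small application hassle or an underestimate of aid can flip a decision that would otherwise favor attending.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One side of the attendance decision (all values hypothetical)."""
    monetary_benefits: float
    monetary_costs: float
    non_monetary_net: float  # non-monetary benefits minus non-monetary costs

    def net_benefit(self) -> float:
        return self.monetary_benefits - self.monetary_costs + self.non_monetary_net


def attends(college: Option, alternative: Option,
            application_hassle: float = 0.0,
            aid_underestimate: float = 0.0) -> bool:
    """Attend when the *perceived* net benefit of college beats the alternative.

    application_hassle: assumed cost of completing forms (and of procrastinating).
    aid_underestimate: assumed amount by which the student guesses aid too low.
    """
    perceived = college.net_benefit() - application_hassle - aid_underestimate
    return perceived > alternative.net_benefit()


college = Option(monetary_benefits=100, monetary_costs=60, non_monetary_net=5)
work = Option(monetary_benefits=40, monetary_costs=0, non_monetary_net=0)

no_frictions = attends(college, work)
with_frictions = attends(college, work, application_hassle=3, aid_underestimate=4)
```

With these made-up numbers the student attends when there are no frictions and does not attend once the hassle and the bad aid guess are included, even though the true net benefit of college is unchanged.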
Innovate: How to Improve Information and Application
Partnership with H&R Block:
• Families reveal their finances to tax professionals each year. 2/3 of FAFSA information comes from the taxes.
• 60 percent of clientele are Pell eligible
• H&R Block tax professionals have expertise in complicated income information
• They have the ability to process highly-accurate forms to meet deadlines in a timely fashion
• Scalable: could replicate in any tax preparation setting
• About 2/3 of welfare recipients use tax preparers, like H&R Block, to complete their taxes
Then the Experiment…
1. HRB completes regular tax services
2. Software screens to see if likely eligible
3. Complete consent & basic background questions
4. RANDOMIZATION into one of three groups:
• Treatment #1: Financial Aid Application Help & Information
• Treatment #2: Information Only
• Control Group
Over 40,000 individuals participated in Ohio and North Carolina between 2007 and 2009.
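The randomization step can be sketched as follows. This is a minimal illustration, not the study's actual procedure: the arm labels, the `assign_arms` helper, the seed, and the equal split across arms are all assumptions (the slides do not state the allocation ratios).

```python
import random
from collections import Counter

# Hypothetical labels for the three groups named on the slide.
ARMS = ("fafsa_help_and_information", "information_only", "control")


def assign_arms(participant_ids, seed=0):
    """Shuffle participants with a fixed seed, then deal them out to the arms.

    Equal-sized arms are an assumption made for this sketch.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: ARMS[i % len(ARMS)] for i, pid in enumerate(ids)}


assignment = assign_arms(range(9))
counts = Counter(assignment.values())
```

Because assignment depends only on the shuffle, nothing about a family (income, motivation, which tax office they visited) can influence which group they land in. Fixing the seed makes the assignment reproducible for later audit.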
Summary: Impact on College Enrollment & Aid Receipt
The FAFSA Treatment significantly increased enrollment among graduating HS seniors
• Substantial increase of 7 percentage points in college going (34% compared to 27% for the control group)
• Effect continues into students’ third year of college
Among older, independent students who had not previously attended college, there was also an effect
• Enrollment effect was 21% (near significant)
• The effect seems to be concentrated among those with incomes less than $22,000
For other independents, there was an effect on aid receipt (addressing problem of eligible college students not getting aid)
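The headline result for graduating seniors mixes two units that are easy to confuse: percentage points and percent. A short calculation, using only the 34% and 27% rates reported on this slide (the variable names are illustrative), shows both ways of stating the same effect.

```python
treated_rate = 0.34   # college-going rate, FAFSA treatment group (from the slide)
control_rate = 0.27   # college-going rate, control group (from the slide)

# Absolute difference, in percentage points.
effect_points = (treated_rate - control_rate) * 100

# Relative difference, as a percent of the control group's rate.
effect_relative = (treated_rate - control_rate) / control_rate * 100

print(f"{effect_points:.0f} percentage points, {effect_relative:.0f}% relative increase")
```

The same comparison is a 7-point gain in absolute terms and roughly a one-quarter increase relative to the control group, so a reported effect should always say which unit it uses.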
Do Such Cycles Happen?
• Perhaps, but if so, the knowledge is not being passed on.
• What Works Clearinghouse Guide to College Access (2009)
  • Academic preparation: evidence = low
  • Communicate with students about their academic preparation: evidence = low
  • Surround with peers and adults to build aspirations: evidence = low
  • Assist with critical steps for college entry: evidence = moderate
    • 6 Studies: 3 positive, 3 with no effect.
  • Increase awareness of financial aid with help with FAFSA: evidence = moderate
    • 2 Studies: Both positive
• Why so little?
Why Would We Want Evaluation?
• Assess Overall Effectiveness
  • Provides information for stakeholders
  • Without evaluation, there is only “conjecture and criticism” (Phipps 1998)
• Policy Preservation
  • Social Security Student Benefit Program (US)
  • Example of Colombia PACES Program
• Alignment and Modifications of Policies
  • Georgia Hope Example
  • Unexpected benefits and consequences
  • Identifying specific programmatic elements that could lead to the impacts
Social Security Benefit Program Aid by Year
Percentage of Students Attending College

                                      Father Not Deceased   Father Deceased
Finished Secondary School 1979-1981           54%                 63%
Finished Secondary School 1982-83             49%                 32%
Evidence Came too Late. Program was Cancelled.
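One way to read the table is as a difference-in-differences: compare the change for students with a deceased father against the change for everyone else over the same period. The sketch below makes two assumptions: that the first column of figures belongs to "Father Not Deceased", as the header order suggests, and that difference-in-differences is the comparison the table is meant to support (the slide does not name it).

```python
# Attendance rates in percent, as printed on the slide.
# Column assignment follows the header order and is an assumption.
rates = {
    ("1979-1981", "father_not_deceased"): 54,
    ("1979-1981", "father_deceased"): 63,
    ("1982-83", "father_not_deceased"): 49,
    ("1982-83", "father_deceased"): 32,
}

# Change over time for students with a deceased father.
change_deceased = rates[("1982-83", "father_deceased")] - rates[("1979-1981", "father_deceased")]

# Change over time for the comparison group.
change_comparison = rates[("1982-83", "father_not_deceased")] - rates[("1979-1981", "father_not_deceased")]

# Difference-in-differences, in percentage points.
did_estimate = change_deceased - change_comparison
```

Under these assumptions, attendance fell 31 points for students with a deceased father and 5 points for the comparison group, a difference of 26 percentage points between the two changes.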
Example of Georgia HOPE Scholarship
• Georgia HOPE Scholarship
  • Provided full tuition scholarships to Georgia students who stay in Georgia
  • Students had to have a 3.0 GPA in secondary school
  • Stated goal of the program: Increase access to higher education among low-income families
• Evaluation Results
  • Student enrollment increased in general (Cornwell, Mustard, Sridhar 2002)
  • Low-income, especially minority, enrollments did not increase (Dynarski 2000)
Georgia HOPE Scholarship (cont.)
• Gap Between Goal and Impact
  • Goal: Increase access for low-income families
  • Impact: Increased access for middle- and upper-income families but not lower-income families
• Why the Failure?
  • HOPE rewarded academic performance
  • HOPE required complex forms
  • Higher-income families have
    • Better secondary school performance
    • Greater access to college information
• How did Evaluation Impact Policy?
  • Academic performance requirement was reduced
  • Application process simplified
What Makes a Good Evaluation?
1. Comparison Strategy (“Identification Strategy”)
  • Research is about comparing what happened to what might have happened
2. Data
  • Detailed data on program implementation and use
  • Data on student outcomes
Comparison Strategy
• Core of Evaluation is Comparison
• Program effect is the difference between the observed outcome and the outcome that would have happened without the program
• Counterfactual outcome is never observed
  • We cannot observe the same student with and without assistance
• Comparison group represents the counterfactual
• Not all comparison groups are created equal
Who is the Comparison?
• Suppose your strategy is to compare Student X who receives help to similar students. There may be some unobservable differences.
• I’ll use myself as an example here.
  • My high school career was extremely good: Valedictorian, Near Perfect ACT, Student Body President, All-State Football.
  • My counselors were energized to work with me.
• Now to the counselors, I’m a success story. They worked with me. I succeeded.
• Perhaps there is some truth to it, but I kept a little list of my goals I set at the start of high school. I made these goals at home by myself. The top of that list: a full ride to college.
• Was my success the result of advisers or some underlying drive?
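A small simulation makes the problem concrete. Everything here is made up for illustration: the `simulate` function, the "drive" variable, the true effect of 5 percentage points, and the probabilities. The point is only to show how an unobserved trait that raises both the chance of being advised and the chance of success inflates a simple comparison, while random assignment does not.

```python
import random
from statistics import mean


def simulate(n=20000, true_effect=0.05, seed=1):
    """Return (naive estimate, randomized estimate) of an advising effect."""
    rng = random.Random(seed)
    naive_helped, naive_not, rand_helped, rand_not = [], [], [], []
    for _ in range(n):
        drive = rng.random()  # unobserved motivation between 0 and 1

        # Self-selected world: more driven students are more likely to get advising.
        advised = rng.random() < drive
        p_success = 0.2 + 0.5 * drive + (true_effect if advised else 0.0)
        outcome = 1.0 if rng.random() < p_success else 0.0
        (naive_helped if advised else naive_not).append(outcome)

        # Randomized world: a coin flip decides who gets advising.
        assigned = rng.random() < 0.5
        p_success = 0.2 + 0.5 * drive + (true_effect if assigned else 0.0)
        outcome = 1.0 if rng.random() < p_success else 0.0
        (rand_helped if assigned else rand_not).append(outcome)

    naive = mean(naive_helped) - mean(naive_not)
    randomized = mean(rand_helped) - mean(rand_not)
    return naive, randomized


naive_estimate, randomized_estimate = simulate()
```

In this setup the naive comparison credits advising with the students' own drive and lands far above the true 0.05, while the randomized comparison stays close to it.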
Criticisms of Research on Access/Success
• Our anecdotes are compelling, but our numbers are often doubted.
• Often we base our numbers on simple comparisons of “similar” students, but our comparisons can be debated.
  • WWC rarely recognizes our comparisons as meeting evidence standards. Only seven studies meet their definition of rigor, and only four of those find positive results.
• In an era of increased accountability, more rigorous evidence is required.
  • Increased demands for “return on investment” data.
So How Do We Make It Better?
• We have to think more carefully about evaluation as we expand.
  • This is going to require talking to evaluators earlier in the process.
  • Evaluation rarely meets rigorous standards when planned after the fact.
• We may have to modify some expansion to accommodate evaluation.
• We need evaluation to be part of the culture rather than the “E” word that we avoid.
Evaluators Have to Improve, Too
• Results are time sensitive and important for continued and future funding.
• Complex research designs are difficult to enact. Evaluators have to be creative.
  • Often there are tensions between programmatic goals and good evaluation design.
  • Both sides have to compromise.
• We need greater communication on results and research design.
Why do Some Evaluations Show No Impact?
• How can we often find no impacts when the stories and anecdotes emerging have so much salience?
• The key is the counterfactual. To have impact, we need to change what would have happened in the absence of the program.
• Consider my case: what was the counterfactual? Would I not have gone to college without their aid?
• This presents an enormous Catch-22 for advising.
  • You have to be the judge of what situations would succeed without your help.
  • You have to have confidence that some would make it through the potholes.
  • We have to find the ones that cannot make it without help.
Example of Rigorous Evaluation
• Angrist, Lang & Oreopoulos (2006)
  • Large Canadian university
  • Multiple Services
    • Program providing support services to new college students (e.g. tutoring)
    • Financial incentive for maintaining a certain grade point average in college
• 700 students applied. There was only funding for half of the students.
• Program managers used random lottery to assign students to level of treatment. They considered it a “fair” way to determine who received services.
Pre-Lottery Similarities in High School Grades
[Figure: distribution of High School Grade Average Used for University Admission (horizontal axis, 65 to 95) for two groups: Control and SFP/SFSP]
Post-Lottery Differences in Grade Point Average (Women)
[Figure: distribution of First Term Grade Average (horizontal axis, 30 to 95) for two groups: Control and SFP/SFSP]
Other Examples with Mixed Evidence?
Hanushek’s (1996) Summary of Evidence on the Effects of Inputs on Student Outcomes
(Figures other than the number of studies are percentages; each row sums to 100.)

                        Number of   Statistically Significant   Statistically Insignificant
Type of Study           Studies     Positive    Negative        Positive   Negative   Unknown
Teacher-pupil Ratio       277          15          13              27         25         20
Teacher Education         171           9           5              33         27         26
Teacher Experience        207          29           5              30         24         12
Expenditure per Pupil     163          27           7              34         19         13
Matching Strategies: Example
• Classic debate on class size in secondary school
• Hanushek (1986, 1989, 1996, 1997, 1998)
  • Surveyed research based on matching students to “comparable” students.
  • Finds no consistent effect of class size on student achievement.
• Krueger (2003)
  • Uses randomization in Tennessee
  • Finds large positive effects of smaller class sizes
Why the Difference?
• Krueger:
  • “not all estimates are created equal”
• Krueger quoting Galileo:
  • ‘I say that the testimony of many has little more value than that of few, since the number of people who reason well in complicated matters is much smaller than that of those who reason badly. If reasoning were like hauling I should agree that several reasoners would be worth more than one, just as several horses can haul more sacks of grain than one can. But reasoning is like racing and not like hauling, and a single Barbary steed can outrun a hundred dray horses.’
  • “Tennessee’s Project STAR is the single Barbary steed in the class size literature”
“Bottom Line”
• Research depends on the quality of comparisons
  • Not all comparisons are equal
  • Some comparisons provide information
    • BUT, may hide confounding factors
• If we want better results and more secure funding streams, we need better evidence.
  • We need to use strategies which are not susceptible to confounding factors.
Additional Considerations in Evaluation
• Timing of Evaluation
  • Gap between start and production of evidence
• Cost of Evaluation
  • Cost of program, evaluation, data collection
• Ethical Considerations
  • Provision of the service
  • Right of privacy
• Political Feasibility of Evaluation
  • Lant Pritchett: “No advocate would want to engage in research that potentially undermines support for his/her program. Endless, but less than compelling, controversy is preferred to knowing for sure.”
So Where Do We Start?
1. Plan Ahead
  • Impossible to use randomization after the fact
  • Creating and developing data collection instruments takes time
2. Consult People Who Know Research
  • NCAC’s integration of research into their expansion.
3. Take a Risk
  • Evaluation is risky. It may be that the program does not work, but knowing a policy’s strengths can lead to even better policies.