Bettinger Keynote: The Difficulty of Knowing and The "E" Word

I’m not Allowed to Reveal My Yellow Number for a meeting Up Front After my Talk

The Difficulty of Knowing and The “E” Word

Dr. Eric P. Bettinger

Stanford University and NBER

April 19, 2012

What is the “E” Word?

• The “E” word today is “Evaluation”
• Why is “evaluation” a frightening word?
  • Typically external and out of your control.
  • Judges your work.
  • Expensive and time-consuming.
  • Small look at a larger, more intensive service.
  • “Evaluation” courses in graduate school were not your favorite.

• My goal today is to suggest that evaluation is integral to the life cycle of knowledge and the continued success of access and success programs.

Experimentation and The Learning Organization: A Virtuous Cycle

Evaluate

Innovate

Experiment

Economist’s View of Higher Education

Human Capital Model

ATTENDANCE DECISION: Net benefit of attendance vs. net benefit of alternative

Includes:
• Monetary Benefits
• Monetary Costs
• Non-monetary Costs/Benefits
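The attendance decision in the human capital model can be sketched in a few lines of Python. This is only an illustration of the comparison on the slide, not a model estimated in any study: the class name `Option`, the function `attends`, and every dollar figure are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Option:
    """One choice facing a student; all amounts are hypothetical dollars."""
    monetary_benefits: float
    monetary_costs: float
    non_monetary_net: float  # non-monetary benefits minus non-monetary costs

    def net_benefit(self) -> float:
        return self.monetary_benefits - self.monetary_costs + self.non_monetary_net


def attends(college: Option, alternative: Option) -> bool:
    """Attend only if the net benefit of attendance exceeds that of the alternative."""
    return college.net_benefit() > alternative.net_benefit()


# Hypothetical numbers chosen only to illustrate the comparison.
college = Option(monetary_benefits=60_000, monetary_costs=35_000, non_monetary_net=2_000)
work = Option(monetary_benefits=25_000, monetary_costs=0, non_monetary_net=0)
print(attends(college, work))
```

Raising `monetary_costs` or lowering `non_monetary_net` for the college option (for example, to represent a burdensome application) can flip the decision even when the monetary benefits are unchanged.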

Why Does the Process Fail?

Human Capital Model

ATTENDANCE DECISION: Net benefit of attending vs. net benefit of not attending

Includes:
• Cost of Completing Applications
• Procrastination
• Bad Information
  • Students’ Guesses on Financial Aid



Innovate: How to Improve Information and Application

Partnership with H&R Block:
• Families reveal their finances to tax professionals each year. 2/3 of FAFSA information comes from the taxes.
• 60 percent of clientele are Pell eligible
• H&R Block tax professionals have expertise in complicated income information
• They have the ability to process highly accurate forms to meet deadlines in a timely fashion
• Scalable: could replicate in any tax preparation setting
• About 2/3 of welfare recipients use tax preparers, like H&R Block, to complete their taxes



Then the Experiment…

1. HRB completes regular tax services
2. Software screens to see if likely eligible
3. Complete consent & basic background questions
4. RANDOMIZATION into one of three groups:
  • Treatment #1: Financial Aid Application Help & Information
  • Treatment #2: Information Only
  • Control Group

Over 40,000 individuals participated in Ohio and North Carolina between 2007 and 2009.
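The randomization step can be sketched as follows. This is a minimal illustration, not the software H&R Block used: the arm labels, the equal assignment probabilities, and the round sample size of 40,000 are assumptions made for the example.

```python
import random
from collections import Counter

# Hypothetical labels for the three groups named on the slide.
ARMS = ["fafsa_help_and_information", "information_only", "control"]


def assign_arm(rng: random.Random) -> str:
    """Assign one consenting participant to an arm with equal probability (an assumption)."""
    return rng.choice(ARMS)


rng = random.Random(2007)  # fixed seed so the sketch is reproducible
assignments = [assign_arm(rng) for _ in range(40_000)]
counts = Counter(assignments)
for arm in ARMS:
    print(arm, counts[arm])
```

Because chance alone decides who lands in each group, the groups should be similar on average in everything else, including traits nobody measured.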



The FAFSA Treatment significantly increased enrollment among graduating HS seniors
• Substantial increase of 7 percentage points in college going (34% compared to 27% for the control group)
• Effect continues into students’ third year of college

Among older, independent students who had not previously attended college, there was also an effect
• Enrollment effect was 21% (near significant)
• The effect seems to be concentrated among those with incomes less than $22,000

For other independents, there was an effect on aid receipt (addressing problem of eligible college students not getting aid)

Summary: Impact on College Enrollment & Aid Receipt
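The headline result for graduating seniors can be restated as simple arithmetic. The two rates come from the slide; the variable names are illustrative, and the relative increase is a derived figure that the slide itself does not report.

```python
treatment_rate = 0.34  # college-going rate with FAFSA help, from the slide
control_rate = 0.27    # college-going rate in the control group, from the slide

# Absolute gap in percentage points, and the gap relative to the control rate.
difference_pp = (treatment_rate - control_rate) * 100
relative_increase = (treatment_rate - control_rate) / control_rate

print(f"Difference: {difference_pp:.0f} percentage points")
print(f"Relative increase: {relative_increase:.1%}")
```

A 7 percentage point gap on a 27% base is a relative increase of roughly 26 percent, which is why the distinction between "percentage points" and "percent" matters when reporting results.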



Do Such Cycles Happen?

• Perhaps, but if so, the knowledge is not being passed on.
• What Works Clearinghouse Guide to College Access (2009)
  • Academic preparation: evidence = low
  • Communicate with students about their academic preparation: evidence = low
  • Surround with peers and adults to build aspirations: evidence = low
  • Assist with critical steps for college entry: evidence = moderate
    • 6 Studies: 3 positive, 3 with no effect.
  • Increase awareness of financial aid with help with FAFSA: evidence = moderate
    • 2 Studies: Both positive

• Why so little?

Why Would We Want Evaluation?

• Assess Overall Effectiveness
  • Provides information for stakeholders
  • Without evaluation, there is only “conjecture and criticism” (Phipps 1998)
• Policy Preservation
  • Social Security Student Benefit Program (US)
  • Example of Colombia PACES Program
• Alignment and Modifications of Policies
  • Georgia Hope Example
  • Unexpected benefits and consequences
  • Identifying specific programmatic elements that could lead to the impacts

Social Security Benefit Program Aid by Year

Percentage of Students Attending College

                                       Father Not Deceased    Father Deceased
Finished Secondary School 1979-1981    54%                    63%
Finished Secondary School 1982-1983    49%                    32%

Evidence Came too Late. Program was Cancelled.
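One way to read these four numbers is as a difference-in-differences: the change over time for students with a deceased father, minus the change for everyone else. The sketch below assumes that the first column of figures is "Father Not Deceased" and the second is "Father Deceased" (the header order on the slide), and that the deceased-father group stands in for students eligible for the benefit. The dictionary keys are illustrative names.

```python
# Percent attending college, as read from the slide (column order is an assumption).
attendance = {
    ("1979-1981", "father_not_deceased"): 54,
    ("1979-1981", "father_deceased"): 63,
    ("1982-1983", "father_not_deceased"): 49,
    ("1982-1983", "father_deceased"): 32,
}

# Change over time within each group, in percentage points.
change_deceased = (
    attendance[("1982-1983", "father_deceased")]
    - attendance[("1979-1981", "father_deceased")]
)
change_not_deceased = (
    attendance[("1982-1983", "father_not_deceased")]
    - attendance[("1979-1981", "father_not_deceased")]
)

# Difference-in-differences: the extra drop for the deceased-father group.
did = change_deceased - change_not_deceased
print(change_deceased, change_not_deceased, did)
```

Under these assumptions, attendance fell 31 points for students with a deceased father and 5 points for other students, a gap of 26 percentage points between the two changes.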

Example of Georgia Hope Scholarship

• Georgia Hope Scholarship
  • Provided full tuition scholarships to Georgia students who stay in Georgia
  • Students had to have a 3.0 GPA in secondary school
  • Stated goal of the program: Increase access to higher education among low-income families
• Evaluation Results
  • Student enrollment increased in general (Cornwell, Mustard, Sridhar 2002)
  • Low-Income, especially minority, enrollments did not increase (Dynarski 2000)

Georgia Hope Scholarship (cont.)

• Gap Between Goal and Impact
  • Goal: Increase access for low-income families
  • Impact: Increase access for middle- and upper-income families but not lower-income
• Why the Failure?
  • Hope rewarded academic performance
  • HOPE required complex forms
  • Higher income families have
    • Better secondary school performance
    • Greater access to college information
• How did Evaluation Impact Policy?
  • Academic performance requirement was reduced
  • Application process simplified

What Makes a Good Evaluation?

1. Comparison Strategy (“Identification Strategy”)
  • Research is about comparing what happened to what might have happened
2. Data
  • Detailed data on program implementation and use
  • Data on student outcomes

Comparison Strategy

• Core of Evaluation is Comparison

• Program effect is difference between observed outcome and outcome that would have happened without the program

• Counterfactual outcome is never observed
  • We cannot observe the same student with and without assistance

• Comparison group represents the counterfactual

• Not all comparison groups are created equal

Who is the Comparison?

• Suppose your strategy is to compare Student X who receives help to similar students. There may be some unobservable differences.

• I’ll use myself as an example here.
  • My high school career was extremely good: Valedictorian, Near Perfect ACT, Student Body President, All-State Football.
  • My counselors were energized to work with me.

• Now to the counselors, I’m a success story. They worked with me. I succeeded.

• Perhaps there is some truth to it, but I kept a little list of my goals I set at the start of high school. I made these goals at home by myself. The top of that list – a full ride to college.

• Was my success the result of advisers or some underlying drive?
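The question can be made concrete with a small simulation. In this hypothetical setup, advising has no effect at all, yet counselors are more likely to work with students who have more unobserved drive. Every number here (the enrollment probabilities, the sample size, the way help is allocated) is an assumption chosen for illustration.

```python
import random

rng = random.Random(0)
TRUE_EFFECT = 0.0  # assumption: advising changes nothing in this simulation

helped, not_helped = [], []
for _ in range(10_000):
    drive = rng.random()              # unobserved motivation, between 0 and 1
    gets_help = rng.random() < drive  # counselors gravitate to driven students
    p_college = 0.2 + 0.6 * drive + (TRUE_EFFECT if gets_help else 0.0)
    enrolled = rng.random() < p_college
    (helped if gets_help else not_helped).append(enrolled)

# Naive comparison: enrollment rate of helped students minus that of the rest.
naive_gap = sum(helped) / len(helped) - sum(not_helped) / len(not_helped)
print(f"Naive helped-vs-not gap: {naive_gap:.2f} (true effect: {TRUE_EFFECT})")
```

The naive comparison shows helped students enrolling at a noticeably higher rate even though the true effect is zero by construction. The gap comes entirely from drive, which the comparison never observes.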

Criticisms of Research on Access/Success

• Our anecdotes are compelling, but our numbers are often doubted.

• Often we base our numbers on simple comparisons of “similar” students, but our comparisons can be debated.
  • WWC rarely recognizes our comparisons as meeting evidence standards. Only seven studies meet their definition of rigor, and only four of those find positive results.

• In an era of increased accountability, more rigorous evidence is required
  • Increased demands for “return on investment” data.

So How Do We Make It Better?

• We have to think more carefully about evaluation as we expand.
  • This is going to require talking to evaluators earlier in the process.
  • Evaluation rarely meets rigorous standards when planned after the fact.

• We may have to modify some expansion to accommodate evaluation.

• We need evaluation to be part of culture rather than the “E” word that we avoid.

Evaluators Have to Improve, Too

• Results are time sensitive and important for continued and future funding.

• Complex research designs are difficult to enact. Evaluators have to be creative.
  • Often there are tensions between programmatic goals and good evaluation design.
  • Both sides have to compromise.

• We need greater communication on results and research design.

Why do Some Evaluations Show No Impact?

• How can we often find no impacts when the stories and anecdotes emerging have so much salience?

• The key is the counterfactual. To have impact, we need to change what would have happened in the absence of the program.

• Consider my case: what was the counterfactual? Would I not have gone to college without their aid?

• This presents an enormous Catch-22 for advising.
  • You have to be the judge of what situations would succeed without your help.
  • You have to have confidence that some would make it through the potholes.
  • We have to find the ones that cannot make it without help.
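The role of the counterfactual can be shown with a short worked example. The two programs and all of their rates are hypothetical; the function name `program_impact` is illustrative.

```python
def program_impact(observed_rate: float, counterfactual_rate: float) -> float:
    """Impact is the observed outcome minus what would have happened without the program."""
    return observed_rate - counterfactual_rate


# Hypothetical program A serves students who would mostly enroll anyway.
impact_a = program_impact(observed_rate=0.90, counterfactual_rate=0.88)

# Hypothetical program B serves students who would struggle without help.
impact_b = program_impact(observed_rate=0.55, counterfactual_rate=0.40)

print(f"Program A impact: {impact_a:.2f}")
print(f"Program B impact: {impact_b:.2f}")
```

Program A has far more success stories (90% enroll) but a small impact, because nearly all of its students would have enrolled anyway. Program B has fewer success stories and a much larger impact.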

Example of Rigorous Evaluation

• Angrist, Lang & Oreopoulos (2006)
  • Large Canadian university
  • Multiple Services
    • Program providing support services to new college students (e.g. tutoring)
    • Financial incentive for maintaining a certain grade point average in college

• 700 students applied. There was only funding for half of the students.

• Program managers used random lottery to assign students to level of treatment. They considered it a “fair” way to determine who received services.
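A lottery of this kind, followed by a check that the two groups looked alike beforehand, can be sketched as below. This is a simplified two-group version, not the study's actual procedure, which assigned students to levels of treatment. The applicant IDs and the simulated high school grades are hypothetical.

```python
import random
import statistics

rng = random.Random(2006)  # fixed seed so the sketch is reproducible

# Hypothetical applicants: IDs mapped to simulated high school grade averages.
applicants = {i: rng.gauss(80, 6) for i in range(700)}

# Lottery: funding covers half of the applicants, so draw 350 IDs at random.
treated_ids = set(rng.sample(sorted(applicants), k=350))
treated = [grade for i, grade in applicants.items() if i in treated_ids]
control = [grade for i, grade in applicants.items() if i not in treated_ids]

# Balance check: pre-lottery grades should be similar in the two groups.
gap = statistics.mean(treated) - statistics.mean(control)
print(f"Treated: {len(treated)}, control: {len(control)}, pre-lottery gap: {gap:.2f}")
```

If the pre-lottery gap is close to zero, later differences in outcomes are much harder to attribute to pre-existing differences between the groups.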

[Figure: Pre-Lottery Similarities in High School Grades. Horizontal axis: High School Grade Average Used for University Admission (65 to 95). Series: Control, SFP/SFSP.]

[Figure: Post-Lottery Differences in Grade Point Average (Women). Horizontal axis: First Term Grade Average (30 to 95). Series: Control, SFP/SFSP.]

Other Examples with Mixed Evidence?

Hanushek’s (1996) Summary of Evidence on the Effects of Inputs on Student Outcomes

                                              Statistically Significant    Statistically Insignificant
Type of Study            Number of Studies    Positive    Negative         Positive    Negative    Unknown
Teacher-pupil Ratio      277                  15          13               27          25          20
Teacher Education        171                  9           5                33          27          26
Teacher Experience       207                  29          5                30          24          12
Expenditure per Pupil    163                  27          7                34          19          13

(Outcome columns are percentages of studies; each row sums to 100.)
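The table can be checked and summarized with a short script. The figures are transcribed from the slide. Reading the five outcome columns as percentages is an interpretation, supported by the fact that each row sums to 100. The dictionary layout and variable names are illustrative.

```python
# Each row: (number of studies, significant positive, significant negative,
# insignificant positive, insignificant negative, insignificant unknown sign).
table = {
    "Teacher-pupil Ratio": (277, 15, 13, 27, 25, 20),
    "Teacher Education": (171, 9, 5, 33, 27, 26),
    "Teacher Experience": (207, 29, 5, 30, 24, 12),
    "Expenditure per Pupil": (163, 27, 7, 34, 19, 13),
}

row_totals = {}
for name, (n, sig_pos, sig_neg, ins_pos, ins_neg, unknown) in table.items():
    row_totals[name] = sig_pos + sig_neg + ins_pos + ins_neg + unknown
    insignificant = ins_pos + ins_neg + unknown
    print(f"{name}: significant positive {sig_pos}%, insignificant {insignificant}%")
```

For every input, well under a third of the studies find a statistically significant positive effect, which is the sense in which the evidence is mixed.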

Matching Strategies: Example

• Classic debate on class size in secondary school
• Hanushek (1986, 1989, 1996, 1997, 1998)
  • Surveyed research based on matching students to “comparable” students.
  • Finds no consistent effect of class size on student achievement.
• Krueger (2003)
  • Uses randomization in Tennessee
  • Finds large positive effects of smaller class sizes

Why the Difference?

• Krueger:
  • “not all estimates are created equal”
• Krueger quoting Galileo:
  • ‘I say that the testimony of many has little more value than that of few, since the number of people who reason well in complicated matters is much smaller than that of those who reason badly. If reasoning were like hauling I should agree that several reasoners would be worth more than one, just as several horses can haul more sacks of grain than one can. But reasoning is like racing and not like hauling, and a single Barbary steed can outrun a hundred dray horses.’
  • “Tennessee’s Project STAR is the single Barbary steed in the class size literature”

“Bottom Line”

• Research depends on the quality of comparisons
  • Not all comparisons are equal
  • Some comparisons provide information
    • BUT, may hide confounding factors
• If we want better results and more secure funding streams, we need better evidence.
  • We need to use strategies which are not susceptible to confounding factors.

Additional Considerations in Evaluation

• Timing of Evaluation
  • Gap between start and production of evidence
• Cost of Evaluation
  • Cost of program, evaluation, data collection
• Ethical Considerations
  • Provision of the service
  • Right of privacy
• Political Feasibility of Evaluation
  • Lant Pritchett: “No advocate would want to engage in research that potentially undermines support for his/her program. Endless, but less than compelling, controversy is preferred to knowing for sure.”

So Where Do We Start?

1. Plan Ahead
  • Impossible to use randomization after the fact
  • Creating and developing data collection instruments takes time
2. Consult People Who Know Research
  • NCAC’s integration of research into their expansion.
3. Take a Risk
  • Evaluation is risky. It may be that the program does not work, but knowing a policy’s strengths can lead to even better policies.

