Machine Learning and Econometrics Sendhil Mullainathan


Machine Learning and Econometrics

Sendhil Mullainathan


Understand OLS

•  The real problem here is minimizing the “wrong” thing: In-sample fit vs out-of-sample fit

AVERAGES NOTATION


Decision Tree Example


Fitting

•  Suppose we fit the best tree we could do to some dataset

•  What would we get?

•  How do we resolve this problem?
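One way to see what the best-fitting tree gives is to grow one until no further split is possible. The sketch below is a toy under stated assumptions, not the lecture's own example: the data (y = x plus noise) are synthetic, and `fit_tree`, `predict`, `mse` and `draw` are hypothetical helper names. With distinct x values every leaf ends up holding a single observation, so in-sample error is zero while error on fresh data is not.

```python
import random

def fit_tree(points, min_leaf=1):
    """Greedy 1-D regression tree; every leaf keeps at least `min_leaf` points."""
    points = sorted(points)
    ys = [y for _, y in points]
    best = None
    for i in range(min_leaf, len(points) - min_leaf + 1):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold separates identical x values
        left, right = ys[:i], ys[i:]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:  # nothing left to split: predict the leaf mean
        return ("leaf", sum(ys) / len(ys))
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return ("split", threshold,
            fit_tree(points[:i], min_leaf), fit_tree(points[i:], min_leaf))

def predict(tree, x):
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

def mse(tree, points):
    return sum((y - predict(tree, x)) ** 2 for x, y in points) / len(points)

def draw(rng, n):
    # Assumed data-generating process for illustration: y = x + noise
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x + rng.gauss(0, 0.5)))
    return data

rng = random.Random(0)
train, fresh = draw(rng, 50), draw(rng, 50)
tree = fit_tree(train)              # min_leaf=1: grow until every leaf is one point
in_sample = mse(tree, train)        # the tree memorizes the training sample
out_of_sample = mse(tree, fresh)    # noise was fitted, so fresh data is predicted worse
print(in_sample, out_of_sample)
```

Raising `min_leaf` in this sketch is one crude way to restrict the tree, which is the kind of fix the last bullet asks about.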


OLS vs Subset Selection

• If the problem is that we are using too many variables, what if we…
  – Looked at functions that only used s of the k variables?

• Example:
  – The single variable that fits best

• Isn't there overfit here too?


Let's do the same thing here

Unconstrained:

$f_A = \arg\min_{f \in F_A} \hat{E}[L(f(x), y)]$

Constrained: why not do this instead?

$\arg\min_{f \in F} \hat{E}[L(f(x), y)] \quad \text{s.t.} \quad R(f) \le c$

Complexity measure: tendency to overfit


Constrained minimization

• We could do a constrained minimization

• But notice that this is equivalent to:

$f_{A,\lambda} = \arg\min_{f \in F_A} \hat{E}[L(f(x), y)] + \underbrace{\lambda R(f)}_{\text{want: } \approx L(f) - \hat{L}(f)}$

• Want the complexity measure to capture tendency to overfit


Basic insight

• Data has signal and noise

• More complex function classes
  – Allow us to pick up more of the signal
  – But also pick up more of the noise

• So the problem of prediction becomes the problem of choosing complexity


Overall Structure

• Create a regularizer that:
  – Measures complexity

• Penalize algorithm for choosing more expressive functions
  – Tuning parameter lambda is the price

• Let it weigh this penalty against in-sample fit


Decision Tree Regularizer

•  What makes a good regularizer?
   – Depth
   – Number of data points per leaf
   – Number of splits

•  What happens as complexity gets higher?
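To make the question concrete, the sketch below uses depth as the complexity measure and refits a toy tree at increasing depth. It is an illustration under assumptions: the data (y = x plus noise) are synthetic and `fit_tree`, `predict`, `mse` and `draw` are hypothetical names. In-sample loss can only fall as depth grows, because each deeper tree refines the shallower one; the loss on fresh data is printed alongside for comparison.

```python
import random

def fit_tree(points, depth):
    """Greedy 1-D regression tree with at most `depth` levels of splits."""
    points = sorted(points)
    ys = [y for _, y in points]
    leaf = ("leaf", sum(ys) / len(ys))
    if depth == 0:
        return leaf
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold separates identical x values
        left, right = ys[:i], ys[i:]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - mean_l) ** 2 for y in left) + sum((y - mean_r) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, i)
    if best is None:
        return leaf
    i = best[1]
    threshold = (points[i - 1][0] + points[i][0]) / 2
    return ("split", threshold,
            fit_tree(points[:i], depth - 1), fit_tree(points[i:], depth - 1))

def predict(tree, x):
    while tree[0] == "split":
        _, threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree[1]

def mse(tree, points):
    return sum((y - predict(tree, x)) ** 2 for x, y in points) / len(points)

def draw(rng, n):
    # Assumed data-generating process for illustration: y = x + noise
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, x + rng.gauss(0, 0.5)))
    return data

rng = random.Random(1)
train, fresh = draw(rng, 80), draw(rng, 80)
depths = [0, 1, 2, 4, 8]
in_sample = [mse(fit_tree(train, d), train) for d in depths]
out_of_sample = [mse(fit_tree(train, d), fresh) for d in depths]
for d, a, b in zip(depths, in_sample, out_of_sample):
    print(f"depth={d}  in-sample={a:.3f}  out-of-sample={b:.3f}")
```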


Linear Example

• Linear function class

• Regularized linear regression


Regularizers for Linear Functions

• Linear functions more expressive if use more variables

$R(f) = \sum_{j=1}^{k} 1_{\beta_j \neq 0}$

• Can weight coefficients

$R(\beta) = \sum_{j=1}^{k} |\beta_j|^p$
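The two regularizers on this slide, the count of nonzero coefficients and the sum of |β_j|^p, can be computed directly. A minimal sketch follows; the function names and the coefficient vector are hypothetical, chosen only for illustration.

```python
def l0_penalty(beta):
    """Number of nonzero coefficients: the count-of-variables regularizer."""
    return sum(1 for b in beta if b != 0)

def lp_penalty(beta, p):
    """Sum of |beta_j|**p: the coefficient-weighting regularizer."""
    return sum(abs(b) ** p for b in beta)

beta = [0.0, 2.0, -1.0, 0.0]   # illustrative coefficient vector
print(l0_penalty(beta))        # 2
print(lp_penalty(beta, 1))     # 3.0
print(lp_penalty(beta, 2))     # 5.0
```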


Computationally More Tractable

•  Lasso

•  Ridge
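The formulas for these two estimators did not survive on this slide. Under the standard definitions, and assuming squared-error loss (an assumption, since the slide does not name the loss), they are the p = 1 and p = 2 cases of the penalty $R(\beta) = \sum_j |\beta_j|^p$:

```latex
\hat{\beta}^{\text{lasso}}_{\lambda}
  = \arg\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i'\beta)^2
    + \lambda \sum_{j=1}^{k} |\beta_j|

\hat{\beta}^{\text{ridge}}_{\lambda}
  = \arg\min_{\beta} \frac{1}{n} \sum_{i=1}^{n} (y_i - x_i'\beta)^2
    + \lambda \sum_{j=1}^{k} \beta_j^2
```

Both penalties are convex in β, unlike the count of nonzero coefficients, which is what makes them computationally more tractable.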


Half the Sauce

• Regularization is one half of the secret sauce

• Gives a single-dimensional way of capturing expressiveness

• Missing ingredient is lambda: how much complexity do we want?

$f_{A,\lambda} = \arg\min_{f \in F_A} \hat{E}[L(f(x), y)] + \underbrace{\lambda R(f)}_{\text{want: } \approx L(f) - \hat{L}(f)}$


Choosing lambda

•  How much should we penalize expressiveness?

•  How do you make the over-fit approximation tradeoff?

•  The tuning problem.


The tuning problem

• Back to where we started?

• We have parametrized the tradeoff

• But we still have no way of choosing the level of complexity


[Diagram: a sample $S_n$ shown next to new data, with $S_n$ split into Train and Tune.]

New data: $E[L(f(x), y)]$ (want)

Sample $S_n$: $E_{S_n}[L(f(x), y)]$ (can measure)

Train: $E_{S_{Train}}[L(f(x), y)]$ (can measure)

Tune: $E_{S_{Tune}}[L(f(x), y)]$ (can measure)

Want out-of-sample, but only have in-sample

Back to our original problem. In-sample: no regularization is the best regularization

Traditional Model Selection: structural assumptions on the DGP; analytically calculate the difference between in-sample and out-of-sample loss


Empirical Tuning

• But now we can see what level of regularization does best out of sample

• So estimate for many values of lambda

$f_{A,\lambda} = \arg\min_{f \in F_A} \hat{E}[L(f(x), y)] + \underbrace{\lambda R(f)}_{\text{want: } \approx L(f) - \hat{L}(f)}$
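Empirical tuning can be sketched as: fit on Train for each lambda in a grid, then keep the lambda with the lowest loss on Tune. The example below is a sketch under assumptions, not the lecture's implementation: it uses ridge regression with the closed form for the objective ||y − Xβ||² + λ||β||², synthetic data, NumPy, and hypothetical names (`ridge_fit`, `mse`, the grid `lambdas`). It also shows the earlier point that, judged in-sample, no regularization looks best.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]          # assumed: only three variables matter
y = X @ beta_true + rng.normal(size=n)

# Split the sample into a training half and a tuning half.
X_train, y_train = X[:100], y[:100]
X_tune, y_tune = X[100:], y[100:]

def ridge_fit(X, y, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||^2 (closed form)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def mse(X, y, beta):
    return float(np.mean((y - X @ beta) ** 2))

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
fits = [ridge_fit(X_train, y_train, lam) for lam in lambdas]
train_loss = [mse(X_train, y_train, b) for b in fits]   # lowest at lam = 0
tune_loss = [mse(X_tune, y_tune, b) for b in fits]      # this is what we tune on

best_lambda = lambdas[int(np.argmin(tune_loss))]
for lam, a, b in zip(lambdas, train_loss, tune_loss):
    print(f"lambda={lam:<7} train={a:.3f} tune={b:.3f}")
print("chosen lambda:", best_lambda)
```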


Now in this case

•  See performance of this in the new “tune” data

•  A few assumptions and…
   – Simple convex optimization
   – So choosing between infinitely many procedures

$\lambda \sim \arg\min \hat{E}[L(f_\lambda(x), y)]$


Overfit Dominates


Creating Out-of-Sample In Sample

•  Major point:
   – Not many assumptions
   – Don’t need to know true model
   – Don’t need to know much about algorithm

•  Something profound here:
   – We use the data itself to choose complexity

•  Aside: What happens as sample goes up?


Why does this work?

1.  Not just because we can split a sample and call it out of sample

–  It’s because the thing we are optimizing is observable


This is more than a trick

•  It illustrates what separates prediction from estimation:
   –  I can't ‘observe’ my prior
      •  Whether the world is truly drawn from a linear model
   –  But prediction quality is observable

•  Put simply:
   –  Validity of predictions is measurable
   –  Validity of coefficient estimators requires structural knowledge

This is the essential ingredient to prediction: prediction quality is an empirical quantity, not a theoretical guarantee


Why does this work?

1.  It’s because the thing we are optimizing is observable
   •  Notice that this works irrespective of number of variables
      – This was not directly hard-wired into our calculations


Why does this work?

1. It’s because the thing we are optimizing is observable
2. By focusing on prediction quality we have reduced dimensionality


To understand this…

•  Suppose you tried to use this to choose coefficients
   –  Ask which set of coefficients worked well out-of-sample
•  Does this work?
•  Problem 1: Estimation quality is unobservable
   –  Need the same assumptions as the algorithm to know whether you “work” out of sample
      •  If you just go by fit, you are conceding that you want the best-predicting model
   –  Coefficients don't exist in the same way predictions do
•  Problem 2: No dimensionality reduction
   –  You’ve got as many coefficients as before to search over


We can be more efficient than this

•  Will use a tool called cross-validation

•  Basic insight:
   – Why not use the hold-out to estimate another function and see how it does on the train set?


Cross Validation

[Figure: the data divided into Train and Tune blocks.]

Tuning Set = 1/5 of Training Set
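With a tuning set of one fifth, each fifth of the data can take a turn as the tuning set while the other four fifths are used for fitting. A minimal 5-fold sketch follows. It rests on assumptions not in the slides: ridge regression as the learner, synthetic data, NumPy, and hypothetical names (`ridge_fit`, `cv_loss`).

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [1.0, -1.0, 0.5]          # assumed: only three variables matter
y = X @ beta_true + rng.normal(size=n)

def ridge_fit(X, y, lam):
    """Minimizer of ||y - X b||^2 + lam * ||b||^2 (closed form)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def cv_loss(X, y, lam, n_folds=5):
    """Average held-out squared error over the folds for one value of lambda."""
    fold_of = np.arange(len(y)) % n_folds   # fold label for each observation
    losses = []
    for j in range(n_folds):
        held_out = fold_of == j
        beta = ridge_fit(X[~held_out], y[~held_out], lam)   # fit without fold j
        losses.append(np.mean((y[held_out] - X[held_out] @ beta) ** 2))
    return float(np.mean(losses))

lambdas = [0.0, 0.1, 1.0, 10.0, 100.0, 1000.0]
cv = [cv_loss(X, y, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv))]

# Refit on the full sample with the chosen lambda to form the final predictor.
beta_hat = ridge_fit(X, y, best_lambda)
print("chosen lambda:", best_lambda)
```

Every observation is predicted exactly once by a model that never saw it, so the whole sample contributes to the tuning loss rather than one fifth of it.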


Some Notation

•  Cross-validation is used for tuning

•  But after we’ve done that, we cannot use it also to evaluate how well our algorithm is doing

•  Why??


Secret Sauce

•  Key ingredients
   1.  Dimensionality reduction through regularization
   2.  A focus on predictions means quality observable
      •  Which means we can empirically tune


Overview of ML Playbook

[Diagram: Data, Engineering, Fitting (on the fitting sample, producing f), Testing (on the hold-out sample); the loss function L is used to assess the prediction from f.]

Fitting with a single train/tune split of the fitting sample T:

  Train: for every $\lambda$, fit predictors (testable out-of-sample)
  Tune: use out-of-sample performance to choose $\hat{\lambda}$
  Output: use $\hat{\lambda}$ to form $\hat{f}_\lambda$

Fitting with k-fold cross-validation on the fitting sample T:

  Divide data T into k folds $\{F_1, \dots, F_k\}$, with $(y_i, x_i) \in F_{j(i)}$, where $j(i)$ is the fold containing observation i
  Train: for every $\lambda$, estimate $\hat{f}^{-j}_\lambda$ on $T \setminus F_j$, giving predictors $\{\hat{f}^{-j}_\lambda\}$ (testable out-of-sample)
  Tune: which $\lambda$ leads to the best prediction of $y_i$ by $\hat{f}^{-j(i)}_\lambda(x_i)$? Use out-of-sample performance to choose $\hat{\lambda}$
  Output: fit on T using $\hat{\lambda}$ and output $\hat{f}_{A,T}$. Sometimes instead: $\frac{1}{k} \sum_{j=1}^{k} \hat{f}^{-j}$

[Diagram labels: fitting sample T, algorithm $A_\lambda$, output f.]

Overview of Steps

[Diagram: Data, Engineering, Fitting (on the fitting sample, producing f), Testing (on the hold-out sample); the loss function L is used to assess the prediction from f.]


Applications of Machine Learning

•  New Data

•  Prediction in Policy


New Data

•  An Example

Xie et al. (2016)


What does this have to do with ML?

•  Processing of data requires machine learning

Blumenstock 2015


What does this have to do with ML?

•  Processing of data requires machine learning

Crop Yield

What does this have to do with ML?

•  Processing of data requires machine learning

•  Two kinds of processing:
   – Pre-processing
      •  Extracting any sort of features from image (example: find farms)
   – Processing
      •  Conversion of features to economically meaningful units (example: relate visual features to yield)


Considerations

•  Need training data
   – Hand labeling
   – Merging to other data sets

•  Don’t be stingy


New Data

•  An Example

•  Kinds of New Data
   – Satellite Data
   – Language data

TEXT

"This class was a religious experience for me... I had to take it all on faith."

"I am convinced that you can learn by osmosis by just sitting in his class."

"Most of us spent the 1st 3 weeks terrified of the class. Then solidarity kicked in."

"The course was very thorough. What wasn't covered in class was covered on the final exam."


Language Features

•  Bag of words

Bag of Words

TEXT: the four quotes above

Dictionary:

This, Class, Was, A, Religious, Experience, For, Me, I, Had, To, Take, It, All, On, Faith

Am, Convinced, That, You, Can, Learn, By, Osmosis, Just, Sitting, In, His

Most, Of, Us, Spent, The, First, Three, Weeks, ...

Word:        This  Am  Most  of  class  convinced  By  osmosis  Three  Weeks
Sentence 1:  1     0   0     0   1      0          0   0        0      0
Sentence 2:  0     1   0     0   1      1          1   1        0      0

Sentence 1: "This class was a religious experience for me... I had to take it all on faith."

Sentence 2: "I am convinced that you can learn by osmosis by just sitting in his class."
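The indicator vectors on the bag-of-words slide can be reproduced in a few lines. This is a minimal sketch: the tokenizer (lowercase, split on anything that is not a letter, digit or apostrophe) is an assumption, the function names are hypothetical, and the ten-word vocabulary is the one shown on the slide.

```python
import re

def tokenize(text):
    """Lowercase and split into word tokens (an assumed, deliberately simple rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def bag_of_words(text, vocabulary):
    """1 if the vocabulary word occurs in the text, else 0 (word order is ignored)."""
    tokens = set(tokenize(text))
    return [1 if word in tokens else 0 for word in vocabulary]

vocabulary = ["this", "am", "most", "of", "class",
              "convinced", "by", "osmosis", "three", "weeks"]

s1 = "This class was a religious experience for me... I had to take it all on faith."
s2 = "I am convinced that you can learn by osmosis by just sitting in his class."

v1 = bag_of_words(s1, vocabulary)
v2 = bag_of_words(s2, vocabulary)
print(v1)  # [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(v2)  # [0, 1, 0, 0, 1, 1, 1, 1, 0, 0]
```

Each document becomes one row of features, which can then be passed to any of the regularized predictors discussed earlier.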

Can Predict Which Bills Survive Committee

Yano, Smith and Wilkerson


Financial Information

Kogan et al.

10-K Forms


Predicting Volatility


What predicts?


Language Features

•  Bag of words
•  Modifying bag of words: similarity/synonym
•  Syntactic Structure
•  Meaning: sentiment analysis


Can use sentiment as a feature


Language Features

•  Bag of words
•  Modifying bag of words: similarity/synonym
•  Syntactic Structure
•  Meaning: sentiment analysis
•  LIWC


New Data

•  An Example

•  Kinds of New Data
   – Satellite Data
   – Language data
   – Digital Exhaust

Google Searches for “iPhone slow”

Choi and Varian


New Data

•  An Example

•  Kinds of New Data
   – Satellite Data
   – Language data
   – Digital Exhaust
   – Network Data
   – …


Applications of Machine Learning

•  New Data

•  Prediction in Policy


Question

•  Can prediction be directly useful in policy?

•  These decisions seem inherently causal
   – “Should we do policy X”?
   – “What will X do?”
   – “What happens with and without X?”

•  In fact decisions seem inherently causal


Two Toy Policy Decisions

•  Rain Dance

•  Umbrella

•  Common Elements
   – Both are decisions with payoffs
   – Both rely on data of the type:
      •  Y = rain, X = variables correlated with rain
   – Both use data to estimate function y = f(x)

Prediction: $\hat{y}$

Causation: $\hat{\beta}$

Framework

[Diagram: nodes Y, X and X0, with X0 the decision.]

Causation (Rain Dance): decision X0 = rain dance, Y = rain, X = atmospheric conditions

Framework

[Diagram: nodes Y, X and X0, with X0 the decision.]

Prediction (Umbrella): decision X0 = umbrella, Y = rain, X = atmospheric conditions

[Diagram: the two cases side by side, each with nodes Y, X and X0.]

Causation: Rain Dance (Y = rain, X = atmospheric conditions). Tool: Experiments

Prediction: Umbrella (Y = rain, X = atmospheric conditions). Tool: Machine Learning


Are there Umbrella Problems?

•  Decisions where predictions matter…

•  Where we can have big social impact

•  And with enough data

•  Prediction policy problems


Prediction


A Policy Problem in the US

•  Each year police make over 12 million arrests

•  Where do people wait for trial?

•  Release vs. detain is high stakes
   –  Pre-trial detention spells avg. 2-3 months (can be up to 9-12 months)
   –  Nearly 750,000 people in jails in US
   –  Consequential for jobs, families as well as crime

Kleinberg, Lakkaraju, Leskovec, Ludwig and Mullainathan


Judge’s Problem

•  Judge must decide whether to release or not (bail)

•  Defendant when out on bail can behave badly:
   – Fail to appear at case
   – Commit a crime

•  The judge is making a prediction

[Diagram, PREDICTION: Past Record, Crime Risk, Release, Social Costs.]

[Diagram, PREDICTION and CAUSATION: Past Record, Crime Risk, Release, Social Costs, Bracelet.]


How big is this effect?

•  Effect of another police officer
   –  Chalfin and McCrary 2013
   –  To get a 4 percentage point reduction in crime…
      •  Would need ~40,000 more officers nationwide
      •  Costs more than 4.8 billion dollars per year
   –  Or just implement this prediction rule
      •  Some fixed costs and minimal flow cost

•  And we’re not even considering the other benefits


Important Caveat in this Analysis

•  Selective labels
   – The literature ignores this

•  How do we resolve it?


Bail Not Unique


Prediction Policy Problems

•  Decision aids—not substitute for humans

•  Must resolve important policy considerations