IZA Discussion Paper Series, IZA DP No. 10535
Undergraduate Econometrics Instruction: Through Our Classes, Darkly
Joshua D. Angrist and Jörn-Steffen Pischke
January 2017



Any opinions expressed in this paper are those of the author(s) and not those of IZA. Research published in this series may include views on policy, but IZA takes no institutional policy positions. The IZA research network is committed to the IZA Guiding Principles of Research Integrity.

The IZA Institute of Labor Economics is an independent economic research institute that conducts research in labor economics and offers evidence-based policy advice on labor market issues. Supported by the Deutsche Post Foundation, IZA runs the world’s largest network of economists, whose research aims to provide answers to the global labor market challenges of our time. Our key objective is to build bridges between academic research, policymakers and society.

IZA Discussion Papers often represent preliminary work and are circulated to encourage discussion. Citation of such a paper should account for its provisional character. A revised version may be available directly from the author.

Schaumburg-Lippe-Straße 5–9, 53113 Bonn, Germany

Phone: +49-228-3894-0
Email: [email protected]
www.iza.org

IZA – Institute of Labor Economics


Joshua D. Angrist, Massachusetts Institute of Technology and IZA

Jörn-Steffen Pischke, London School of Economics and IZA


Abstract


The past half-century has seen economic research become increasingly empirical, while the nature of empirical economic research has also changed. In the 1960s and 1970s, an empirical economist’s typical mission was to “explain” economic variables like wages or GDP growth. Applied econometrics has since evolved to prioritize the estimation of specific causal effects and empirical policy analysis over general models of outcome determination. Yet econometric instruction remains mostly abstract, focusing on the search for “true models” and technical concerns associated with classical regression assumptions. Questions of research design and causality still take a back seat in the classroom, in spite of having risen to the top of the modern empirical agenda. This essay traces the divergent development of econometric teaching and empirical practice, arguing for a pedagogical paradigm shift.

JEL Classification: A22

Keywords: econometrics, teaching

Corresponding author:
Jörn-Steffen Pischke
Centre for Economic Performance
London School of Economics
Houghton Street
London WC2A 2AE
United Kingdom

E-mail: [email protected]


As the Stones’ Age gave way to the computer age, applied econometrics was mostly concerned with estimating the parameters governing broadly targeted theoretical descriptions of the economy. Canonical examples include multi-equation macro models describing economy-wide variables like unemployment and output, and micro models characterizing the choices of individual agents or market-level equilibria. The empirical framework of the sixties and seventies typically sought to explain economic outcomes with the aid of a long and diverse list of explanatory variables, but no single variable of special interest.

Much of the contemporary empirical agenda looks to answer specific questions, rather than provide a general understanding of, say, GDP growth. This agenda targets the causal effects of a single factor, such as the effects of immigration on wages or the effects of democracy on GDP growth, often focusing on policy questions like the employment effects of subsidies for small business or the effects of monetary policy. Applied researchers today look for credible strategies to answer such questions.

Empirical economics has changed markedly in recent decades, but, as we document below, econometric instruction has changed little. Market-leading econometrics texts still focus on assumptions and concerns motivated by a model-driven approach to regression, aimed at helping students produce a statistically precise account of the processes generating economic outcomes. Much of this material prioritizes technical concerns over conceptual matters. We still see, for example, extended textbook discussions of functional form, whether error terms are independent and identically distributed, and how to correct for serial correlation and heteroskedasticity. Yet this instructional edifice is not of primary importance for the modern empirical agenda. At the same time, newer and widely used tools for causal analysis, like differences-in-differences and regression discontinuity methods, get cursory textbook treatment if they’re mentioned at all.

How should changes in our use of econometrics change the way we teach econometrics?

Our take on this is simple. We start with empirical strategies based on randomized trials and quasi-experimental methods because they provide a template that reveals the challenges of causal inference, and the manner in which econometric tools meet these challenges. We call this framework the design-based approach to econometrics because the skills and strategies required to use it successfully are related to research design. This viewpoint leads to our first concrete prescription for instructional change: a revision in the manner in which we teach regression.

Regression should be taught the way it’s now most often used: as a tool to control for confounding factors. This approach abandons the traditional regression framework in which all regressors are treated equally. The pedagogical emphasis on statistical efficiency and functional form, along with the sophomoric narrative that sets students off in pursuit of “true models” as defined by a seemingly precise statistical fit, is ready for retirement. Instead, the focus should be on the set of control variables needed to ensure that the regression-estimated effect of the variable of interest has a causal interpretation.


In addition to a radical revision of regression pedagogy, the exponential growth in economists’ use of quasi-experimental methods and randomized trials in pursuit of causal effects should move these tools to center stage in the classroom. The design-based approach emphasizes single-equation instrumental variables estimators, regression-discontinuity methods, and variations on differences-in-differences strategies, while focusing on specific threats to a causal interpretation of the estimates generated by these fundamental tools.

Finally, real empirical work plays a central role in our classes. Econometrics is better taught by example than abstraction.

Causal questions and research design are not the only sort of econometric work that remains relevant. But our experience as teachers and researchers leads us to emphasize these skills in the classroom. For one thing, such skills are now much in demand: Google and Netflix post positions flagged by keywords like causal inference, experimental design, and advertising effectiveness; Facebook’s data science team focuses on randomized controlled trials and causal inference; Amazon offers prospective employees a reduced form/causal/program evaluation track.1

Of course, there’s econometrics to be done beyond the applied micro applications of interest to Silicon Valley and the empirical labor economics with which we’re personally most engaged. But the tools we favor are foundational for almost any empirical agenda. Professional discussions of signal economic events like the Great Recession and important telecommunications mergers are almost always arguments over causal effects. Likewise, Janet Yellen and the hundreds of researchers who support her at the Fed crave reliable evidence on whether X causes Y. Purely descriptive research remains important, and there’s a role for data-driven forecasting. Applied econometricians have long been engaged in these areas, but these valuable skills are the bread and butter of disciplines like statistics and, in the case of prediction, computer science. These endeavors are not where our comparative advantage lies. Econometrics at its best is distinguished from other data sciences by clear causal thinking. This sort of thinking is therefore what we emphasize in our classes.

Following a brief description of the shift toward design-based empirical work, we flesh out the argument for change by considering the foundations of econometric instruction, focusing on old and new approaches to regression. We then look at a collection of classic and contemporary textbooks, and a sample of contemporary reading lists and course outlines. Reading lists in our sample are more likely to cover modern empirical methods than are today’s market-leading books. But most courses remain bogged down in boring and obsolete technical material.

Good Times, Bad Times

The exponential growth in economists’ use of quasi-experimental methods and randomized trials is documented in Panhans and Singleton (2016). Angrist and Krueger (1999) described an earlier empirical trend for labor economics, but this trend is now seen in applied microeconomic fields more broadly. In an essay on changing empirical work (Angrist and Pischke, 2010), we complained about the modern macro research agenda, so we’re happy to see recent design-based inroads even in empirical macroeconomics (described in Fuchs-Schuendeln and Hassan, 2016). Bowen, Frésard, and Taillard (forthcoming) report on the accelerating adoption of quasi-experimental methods in empirical corporate finance.

Design-based empirical analysis naturally focuses the analyst’s attention on the econometric tools featured in this work. A less obvious intellectual consequence of the shift towards design-driven research is a change in the way we use our linear regression workhorse.

Yesterday’s Papers (and Today’s)

The changed interpretation of regression estimates is exemplified in the contrast between two studies of education production, Summers and Wolfe (1977) and Dale and Krueger (2002). Both papers are concerned with the role of schools in generating human capital: Summers and Wolfe with the effects of elementary school characteristics on student achievement; Dale and Krueger with the effects of college characteristics on post-graduates’ earnings. These questions are similar in nature, but the analysis in the two papers differs sharply.

Summers and Wolfe (1977) interpret their mission to be one of modeling the complex process that generates student achievement. They begin with a general model of education production that includes unspecified student characteristics, teacher characteristics, school inputs, and peer composition. The model is loosely motivated by an appeal to the theory of human capital, but the authors acknowledge that the specifics of how achievement is produced remain mysterious. What stands out in this framework is lack of specificity: the Summers and Wolfe regression puts the change in test scores from 3rd to 6th grade on the left hand side, with a list of 29 student and school characteristics on the right. This list includes family income, student IQ, sex, and race; the quality of the college attended by the teacher and teacher experience; class size and school enrollment; and measures of peer composition and behavior.

The Summers and Wolfe paper is true to the 1970s empirical mission, the search for a true model with a large number of explanatory variables:

We are confident that the coefficients describe in a reasonable way the relationship between achieving and GSES [genetic endowment and socioeconomic status], TQ [teacher quality], SQ [non-teacher school quality], and PG [peer group characteristics], for this collection of 627 elementary school students.

In the spirit of the wide-ranging regression analyses of their times, Summers and Wolfe offer no pride of place to any particular set of variables. At the same time, their narrative interprets regression estimates as capturing causal effects. They draw policy conclusions from empirical results, suggesting, for example, that schools not use the National Teacher Exam Score to guide hiring decisions.


This interpretation of regression is in the spirit of Stones’ Age econometrics, which typically begins with a linear regression equation meant to describe an economic process, what some would call a “structural relation.” Many authors of this Age go on to say that in order to obtain unbiased or consistent estimates, the analyst must assume that regression errors are mean-independent of regressors. But since all regressions produce a residual with this orthogonality property, for any regressor included on the right hand side, it’s hard to see how this statement promotes clear thinking about causal effects.

The Dale and Krueger (2002) investigation likewise begins with a question about schools, asking whether students who attend a more selective college earn more as a result, and, like Summers and Wolfe (1977), uses OLS regression methods to construct an answer. Yet the analysis here differs in three important ways. The first is a focus on specific causal effects: there’s no effort to “explain wages.” The Dale and Krueger study compares students who attend more and less selective colleges. College quality (measured by schools’ average SAT score) is but one factor that might change wages, surely minor in an $R^2$ sense. This highly focused inquiry is justified by the fact that it aspires to answer a causal question of concern to students, parents, and policy-makers.

The second distinguishing feature is a research strategy meant to eliminate selection bias: graduates of elite schools undoubtedly earn more (on average) than those who went elsewhere. Given that elite schools select their students carefully, however, it’s clear that this difference may reflect selection bias. The Dale and Krueger paper outlines a selection-on-observables research strategy meant to overcome this central problem.

The Dale and Krueger research design compares individuals who sent applications to the same set of colleges and received the same admission decisions. Within groups defined by application and admission decisions, students who attend different sorts of schools are far more similar than they would be in an unrestricted sample. The Dale and Krueger study argues that any remaining within-group variation in the selectivity of the school attended is essentially serendipitous (as good as randomly assigned) and therefore unrelated to ability, motivation, family background, and other factors related to intrinsic earnings potential. This argument constitutes the most important econometric content of the Dale and Krueger paper.

The third difference of the Dale and Krueger study is a clear distinction between causes and controls on the right hand side of the regression. In the modern paradigm, regressors are not all created equal. Rather, only one variable at a time is seen as having causal effects. All others are controls included in service of this focused causal agenda.2

In education production, for example, coefficients on demographic variables and other student characteristics are unlikely to have a clear economic interpretation. What should we make of the coefficient on IQ in the Summers-Wolfe regression? This coefficient reveals only that two measures of intellectual ability, IQ and the dependent variable, are positively correlated after regression-adjusting for other factors. On the other hand, features of the school environment, like class sizes, can sometimes be changed by school administrators. We might indeed want to consider the implications of class size coefficients for education policy.


The modern distinction between causal and control variables on the right hand side of a regression equation requires more nuanced assumptions than the blanket statement of regressor-error orthogonality that’s emblematic of the traditional presentation of regression. This difference in roles between right-hand variables that might be causal and those that are just controls should emerge clearly in the regression stories we tell our students.

Out of Control

The modern econometric paradigm exemplified by Dale and Krueger (2002) treats regression as an empirical control strategy designed to capture causal effects. Specifically, regression is an automated matchmaker that produces within-group comparisons: there’s a single causal variable of interest, while other regressors measure conditions and circumstances that we would like to hold fixed when studying the effects of this cause. By holding the control variables fixed (that is, by including them in a multivariate regression model), we hope to give the regression coefficient on the causal variable a ceteris paribus, apples-to-apples interpretation. We tell this story to undergraduates without elaborate mathematics, but the ideas are subtle and our students find them challenging. Detailed empirical examples showing how regression can be used to generate interesting, useful, and surprising causal conclusions help make these ideas clear.

Our instructional version of the Dale and Krueger (2002) application asks whether it pays to attend a private university, Duke, say, instead of a state school like the University of North Carolina. This converts college selectivity into a simpler, binary treatment, so that we can cast the effects of interest as generated by simple on/off comparisons. Specifically, we ask whether the money spent on private college tuition is justified by future earnings gains. This leads to the question of how to use regression to estimate the causal effect of private college attendance on earnings.

For starters, we use notation that distinguishes between cause and control. In this case, the causal regressor is $P_i$, a dummy variable that indicates attendance at a private college for individual $i$. Control variables are denoted by $X_i$, or given other names when specific controls are noteworthy, but in all cases distinct from the privileged causal variable, $P_i$. The outcome of interest, $Y_i$, is a measure of earnings roughly 20 years post-enrollment.

The causal relationship between private college attendance and earnings is described in terms of potential outcomes: $Y_{1i}$ represents the earnings of individual $i$ were he or she to go private ($P_i = 1$), and $Y_{0i}$ represents $i$’s earnings after a public education ($P_i = 0$). The causal effect of attending a private college for individual $i$ is the difference, $Y_{1i} - Y_{0i}$. This difference can never be seen; rather, we see only $Y_{1i}$ or $Y_{0i}$, depending on the value of $P_i$. The analyst’s goal is therefore to measure an average causal effect, like $E[Y_{1i} - Y_{0i}]$.

At MIT (where we have both taught), we ask our private-college econometrics students to consider their personal counterfactual $Y_{0i}$ had they made a public-school choice instead of coming to MIT. Some of our students are seniors who have lined up jobs with the likes of Google and Goldman. Many of the people they work with at these firms (perhaps the majority) will have gone to state schools. In view of this fact, we ask our students to consider whether MIT-style private colleges really make a difference when it comes to career success.

The first contribution of a causal framework based on potential outcomes is to explain why naive comparisons of public and private college graduates are likely to be misleading. The second is to explain how an appropriately constructed regression strategy leads us to something better.

Naive comparisons between private and public alumni confound the average causal effect of private attendance with selection bias. The selection bias here reflects the fact that students who go to private colleges are, on average, from stronger family backgrounds and probably more motivated and better prepared for college. These characteristics are reflected in their potential earnings, that is, in how much they could earn without the benefit of a private college degree. If those who end up attending private schools had instead attended public schools, they probably would have had higher incomes anyway. This reflects the fact that public and private students have different $Y_{0i}$’s, on average.

To us, the most natural and useful presentation of regression is as a model of potential outcomes. Write potential earnings in the public college scenario as

$$Y_{0i} = \alpha + \eta_i,$$

where $\alpha$ is the mean of $Y_{0i}$ and $\eta_i$ is the difference between this potential outcome and its mean. Suppose further that the difference in potential outcomes is a constant, $\beta$, so we can write $\beta = Y_{1i} - Y_{0i}$. Putting the pieces together gives a causal model for observed earnings

$$Y_i = \alpha + \beta P_i + \eta_i.$$

Selection bias amounts to the statement that $Y_{0i}$ and hence $\eta_i$ depends (in a statistical sense) on $P_i$.
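The way a naive comparison mixes a causal effect with selection bias can be checked by hand on a toy data set. The sketch below is only an illustration under stated assumptions: the six students and their earnings are made up, both potential outcomes are listed even though real data reveal just one per person, and the causal effect is set to a constant 5 for everyone. It shows that the raw private-public gap equals the causal effect plus the gap in average public-school potential earnings between the two groups.

```python
# Hypothetical toy data (not from the Dale and Krueger study).
# Each tuple: (earnings if public, earnings if private, attended private?)
students = [
    (100.0, 105.0, 1),
    (90.0, 95.0, 1),
    (80.0, 85.0, 1),
    (70.0, 75.0, 0),
    (60.0, 65.0, 0),
    (50.0, 55.0, 0),
]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Observed earnings reveal only one potential outcome per student.
observed = [(y1 if p == 1 else y0, p) for y0, y1, p in students]

# Naive comparison: average observed earnings, private minus public.
naive_gap = (mean(y for y, p in observed if p == 1)
             - mean(y for y, p in observed if p == 0))

# Average causal effect among those who attended a private college.
effect_on_private = mean(y1 - y0 for y0, y1, p in students if p == 1)

# Selection bias: gap in average public-school potential earnings.
selection_bias = (mean(y0 for y0, _, p in students if p == 1)
                  - mean(y0 for y0, _, p in students if p == 0))

print(naive_gap, effect_on_private, selection_bias)
```

In this made-up example the naive gap is seven times the causal effect, because the private-college group would have earned more in any case.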

The road to a regression-based solution to the problem of selection bias begins with the claim that the analyst has information that can be used to eliminate selection bias, that is, to purge $\eta_i$ of its correlation with $P_i$. In particular, the regression matchmaker postulates a control variable $X_i$ (or perhaps a set of controls). Conditional on this control variable, the private and public earnings comparison is apples-to-apples, at least on average, so those being compared have the same average $Y_{0i}$’s or $\eta_i$’s. This ceteris paribus-type claim is embodied in the conditional independence assumption that ultimately gives regression estimates a causal interpretation:

$$E[\eta_i \mid P_i, X_i] = E[\eta_i \mid X_i].$$

Notice that this is a weaker and more focused assumption than the traditional presentation, which says that the error term is mean-independent of all regressors, that is, $E[\eta_i \mid P_i, X_i] = 0$. We elaborate on this subtle difference shortly.

In the Dale and Krueger study, the variable $X_i$ identifies the schools to which the college graduates in the sample had applied and were admitted. The conditional independence assumption says that, having applied to Duke and UNC and having been admitted to both, those who chose to attend Duke have the same earnings potential as those who went to the state school. Although such conditioning does not turn college attendance into a randomized trial, it provides a compelling source of control for the major forces confounding causal inference. Students pick schools in view of their ambition and willingness to do the required work; admissions offices look carefully at ability.

We close the loop linking causal inference with linear regression by introducing a functional form hypothesis, specifically that the conditional mean of potential earnings when attending a public school is a linear function of $X_i$. This can be written formally as $E[\eta_i \mid X_i] = \gamma X_i$. Econometrics texts fret at length about linearity and its limitations, but we see such hand-wringing as misplaced. In the Dale and Krueger research design, the controls are a large set of dummies for all possible applicant groups. The key controls in this case come in the form of a saturated model, that is, an exhaustive set of dummies for all possible values of the conditioning variable. Such models are inherently linear. In other cases, we can come as close as we like to the underlying conditional mean function by adding polynomial terms and interactions. When samples are small, we happily use linearity to interpolate, using the data we have more efficiently.

Combining these three ingredients, constant causal effects, conditional independence, and a linear model for potential outcomes conditional on controls, produces the regression model

$$Y_i = \alpha + \beta P_i + \gamma X_i + e_i,$$

which can be used to construct unbiased and consistent estimates of the causal effect of private school attendance, $\beta$. The causal story that brings us to this point reveals what we mean by $\beta$ and why we’re using regression to estimate it.

This final equation looks like many seen in market-leading texts. But this apparent similarity is more a source of confusion than a help. In our experience, to show this on a slide and call out assumptions about the correlation of regressors and $e_i$ yields causal confusion. As far as the control variables go, regressor-residual orthogonality is assured rather than assumed. At the same time, while the controls are surely uncorrelated with the residual term, it’s unlikely that the regression coefficients multiplying the controls have a causal interpretation. We don’t imagine that the controls are as good as randomly assigned and we needn’t care whether they are. The controls have a job to do: they are the foundation for the conditional independence claim. Provided the controls make this claim plausible, the coefficient $\beta$ can be seen as a causal effect.

The modern regression paradigm turns on the notion that the analyst has data on control variables that generate apples-to-apples comparisons for the variable of interest. Dale and Krueger (2002) explain what this means in their study:

If, conditional on gaining admission, students choose to attend schools for reasons that are independent of [unobserved determinants of earnings] then students who were accepted and rejected by the same set of schools would have the same expected value of [these determinants, the error term in their model]. Consequently, our proposed solution to the school selection problem is to include an unrestricted set of dummy variables indicating groups of students who received the same admissions decisions (i.e., the same combination of acceptances and rejections) from the same set of colleges.

Our analysis of the Dale and Krueger data (reported in Chapter 2 of Angrist and Pischke, 2015) generates a large uncontrolled private school effect of 13.5 log points. This shrinks to 8.6 log points after controlling for the student’s own SAT scores, his or her family income, and a few more demographic variables. But controlling for the schools to which a student applied and was admitted (using many dummy variables) yields a small and statistically insignificant private school effect of less than 1 percent.

Comparing regression results with increasing numbers of controls in this way (that is, uncontrolled results, results with crude controls, and results with a control variable that more plausibly addresses the issue of selection bias) offers powerful insights. These insights help students understand why the last model is more likely to have a causal interpretation than the first two.

First, we note in discussing these results that the large uncontrolled private differential in wages is apparently driven entirely by selection bias. We learn this from the fact that the raw effect vanishes after controlling for students’ pre-college attributes, in this case ambition and ability as reflected in the set of schools a student applies to and qualifies for. Of course, there may still be selection bias in the private-public contrast conditional on these controls. But because the controls are coded from application and admissions decisions that predate college enrollment decisions, they cannot themselves be a consequence of private school attendance. They must be associated with differences in $Y_{0i}$ that generate selection bias. Eliminating these differences, that is, comparing students with similar $Y_{0i}$’s, is therefore likely to generate private school effects that are less misleading than simpler models omitting these controls.
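The pattern of a raw gap that vanishes once the right control is included can be reproduced in a small constructed example. The sketch below is an illustration only, not the Dale and Krueger data: the eight observations are hypothetical, a single dummy `group` stands in for the full set of applicant-group dummies, and earnings are built to depend on the group alone, so the true private-school effect is zero. The helper `ols` is our own small wrapper around `numpy.linalg.lstsq`. Only two specifications are compared here, not the three reported in the text.

```python
import numpy as np

# Hypothetical data: `group` marks students who applied to and were
# admitted by the same set of schools. Private attendance is more
# common in the high-earning group.
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
private = np.array([1, 1, 1, 0, 1, 0, 0, 0])

# Earnings depend only on the group, so the true private effect is zero.
log_earnings = 10.0 + 0.4 * group


def ols(y, *regressors):
    """OLS coefficients for y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Coefficient on `private` without and with the applicant-group control.
uncontrolled = ols(log_earnings, private)[1]
controlled = ols(log_earnings, private, group)[1]

print(f"uncontrolled: {uncontrolled:.3f}, controlled: {controlled:.3f}")
```

The uncontrolled coefficient is just the difference in mean earnings between private and public students; adding the group dummy turns the comparison into a within-group one, which here removes the gap entirely.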

We also show our students that after conditioning on application and admissions variables, ability and family background variables in the form of SAT scores and family income are uncorrelated with private school attendance. The finding of a zero private-school return is therefore insensitive to further control beyond a core set. This argument uses the omitted variables bias formula, which we see as a kind of golden rule for the modern regression practitioner (the rest of the framework is commentary). Our regression estimates reveal robustness to further control that we’d expect to see in a well-run randomized trial.

Using a similar omitted-variables-type argument, we note that even if there are other confounders that we haven’t controlled for, those that are positively correlated with private school attendance are likely to be positively correlated with earnings as well. Even if these variables remain omitted, their omission leads the estimates computed with the variables at hand to overestimate the private school premium, small as it already is.
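The omitted variables bias formula is an exact in-sample identity, which makes it easy to verify numerically. The sketch below is an illustration on simulated data; the variable `ability`, the coefficient values, and the sample size are all assumptions chosen for the example. It checks that the coefficient from the short regression (omitting `ability`) equals the coefficient from the long regression plus the product of two terms: the long-regression coefficient on the omitted variable and the slope from regressing the omitted variable on the included one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical confounder: raises both private attendance and earnings.
ability = rng.normal(size=n)
private = (ability + rng.normal(size=n) > 0).astype(float)
earnings = 1.0 + 0.05 * private + 0.5 * ability + rng.normal(size=n)


def ols(y, *regressors):
    """OLS coefficients for y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs


# Short regression: earnings on private only.
short_beta = ols(earnings, private)[1]

# Long regression: earnings on private and the confounder.
_, long_beta, gamma = ols(earnings, private, ability)

# Auxiliary regression: confounder on private.
delta = ols(ability, private)[1]

# Omitted variables bias: (effect of omitted) x (omitted on included).
ovb = gamma * delta

print(f"short = {short_beta:.4f}, long + OVB = {long_beta + ovb:.4f}")
```

Because both terms of the product are positive in this simulation, the short regression overstates the private-school coefficient, which is the direction of bias the argument in the text relies on.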

Empirical applications like this demonstrate the modern approach to regression, highlighting the nuanced assumptions needed for a causal interpretation of regression parameters.3 If the conditional independence assumption is violated, regression methods fail to uncover causal effects and are likely misleading. Otherwise, there’s hope for causal inference. Alas, the regression topics that dominate econometrics teaching, including extensive discussions of classical regression assumptions, functional form, multicollinearity, and matters related to statistical inference and efficiency, pale in importance next to this live-or-die fact about regression-based research designs.

Which is not to say that causal inference using regression methods has now been made easy. The question of what makes a good control variable is one of the most challenging in empirical practice. Candidate control variables should be judged by whether they make the conditional independence assumption more plausible, and it’s often hard to tell. We therefore discuss many regression examples with our students, all interesting, but some more convincing than others. A particular worry is that not all controls are good controls, even if they’re related to both $P_i$ and $Y_i$. Specific examples and discussion questions (“Should you control for occupation in a wage equation meant to measure the economic returns to schooling?”) illuminate the bad control issue and therefore warrant time in the classroom (and in our books, Angrist and Pischke 2009, 2015).

Take It or Leave It: Classical Regression Concerns

It is easiest to derive a controlled regression using the potential outcomes framework and the conditional independence assumption when the causal effect is the same for everyone, as assumed above. While this is an attractive simplification for expository purposes, the key result is remarkably general. As long as the regression function is suitably flexible, the regression parameter capturing the causal effect of interest is a weighted average of underlying covariate-specific causal effects. In fact, with discrete controls, regression can be viewed as a matching estimator that automates the estimation of many possibly heterogeneous covariate-specific treatment effects, producing a single weighted average in one easy step.
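The weighted-average claim can be verified directly when the controls are a full set of cell dummies. The sketch below uses a hypothetical ten-observation data set with two covariate cells and a different treatment effect in each. It compares the OLS coefficient on the treatment dummy with a hand-built matching estimate that weights each within-cell difference in means by $n_x p_x (1 - p_x)$, where $n_x$ is the cell size and $p_x$ the share treated in the cell. The variable names are ours and the numbers are made up for illustration.

```python
import numpy as np

# Hypothetical data: two covariate cells with different effects
# (a gap of 2 in cell 0 and a gap of 6 in cell 1).
cell = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
treated = np.array([1, 0, 0, 0, 1, 1, 1, 0, 0, 0])
outcome = np.array([12.0, 10.0, 10.0, 10.0,
                    26.0, 26.0, 26.0, 20.0, 20.0, 20.0])

# Regression with saturated controls: constant, treatment, cell dummy.
X = np.column_stack([np.ones(len(outcome)), treated, cell])
beta_ols = np.linalg.lstsq(X, outcome, rcond=None)[0][1]

# Matching by hand: within-cell differences in means, weighted by
# cell size times the within-cell variance of treatment.
effects, weights = [], []
for x in np.unique(cell):
    in_cell = cell == x
    p_x = treated[in_cell].mean()
    treated_mean = outcome[in_cell & (treated == 1)].mean()
    control_mean = outcome[in_cell & (treated == 0)].mean()
    effects.append(treated_mean - control_mean)
    weights.append(in_cell.sum() * p_x * (1 - p_x))

beta_matching = np.average(effects, weights=weights)

print(f"OLS: {beta_ols:.4f}, weighted matching: {beta_matching:.4f}")
```

The weights favor cells where treatment status is closest to evenly split, so the regression coefficient generally differs from a simple population-weighted average of the cell-specific effects.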

More generally, linearity of the regression function is best seen as a convenient approximation to more general underlying functional forms. This claim is supported by pioneering theoretical studies such as White (1980b) and Chamberlain (1982). To the best of our knowledge, the first textbook to highlight these properties is Goldberger (1991), a graduate text never in wide use and one rarely seen in undergraduate courses. Angrist (1998), Angrist and Krueger (1999), and Angrist and Pischke (2009) develop the theoretical argument that regression is a matching estimator for average treatment effects (see also Yitzhaki, 1996).

An important consequence of this approximation and matchmaking view of regression is that the assumptions behind the textbook linear regression model are both implausible and irrelevant. Heteroskedasticity arises naturally as a result of variation in the closeness between a regression fit and the underlying conditional mean function it approximates, but the fact that the quality of the fit may vary does not obviate the value of regression as a summarizer of economically meaningful causal relationships.

Classical regression assumptions are helpful for the derivation of regression standard errors. They simplify the math and the resulting formula reveals the features of the data that determine statistical precision. This derivation takes little of our class time, however. We don’t dwell on statistical tests for the validity of assumptions or on generalized least squares fix-ups for their failures. It seems to us that most of what is usually taught on inference in an introductory undergraduate class can be replaced with the phrase “use robust standard errors.” With a caution about blind reliance on asymptotic approximations, we suggest our students follow current research practice. As noted by White (1980a) and others, the robust formula addresses the statistical consequences of heteroskedasticity and non-linearity in cross-sectional data. Autocorrelation in time series data can similarly be handled by Newey and West (1987) standard errors, while cluster methods address correlation across cross-sectional units or in panel data (Moulton, 1986; Arellano, 1987; Bertrand, Duflo, and Mullainathan, 2004).
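The contrast between the classical and robust formulas can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the function name `ols_with_se`, the simulated data, and the n/(n-k) degrees-of-freedom scaling (often labeled HC1) are choices made here. The robust part is the heteroskedasticity-consistent sandwich form associated with White (1980a).

```python
import numpy as np

def ols_with_se(X, y):
    """Return OLS coefficients with classical and robust (HC1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical formula: assumes a single common error variance.
    s2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(s2 * XtX_inv))

    # Robust sandwich formula: each observation's squared residual
    # enters the middle term separately.
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc0 = XtX_inv @ meat @ XtX_inv
    se_robust = np.sqrt(np.diag(cov_hc0 * n / (n - k)))
    return beta, se_classical, se_robust

# Simulated data (hypothetical): the error spread grows with x.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(0, 2, n)
y = 1.0 + 0.5 * x + rng.normal(0, 1, n) * x ** 2
X = np.column_stack([np.ones(n), x])

beta, se_classical, se_robust = ols_with_se(X, y)
print(beta, se_classical, se_robust)
```

In this design the classical formula understates the sampling variability of the slope, which is the kind of case the advice to “use robust standard errors” is meant to cover. Newey-West and clustered standard errors modify the middle term of the same sandwich and are not shown here.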

In Another Land: Econometrics Texts and Teaching

Traditional econometrics textbooks are thin on empirical examples. In Johnston’s (1972) classic text, the first empirical application is a bivariate regression linking road casualties to the number of licensed vehicles. This example focuses on computation, an understandable concern at the time, but Johnston doesn’t explain why the relationship between casualties and licenses is interesting or what the estimates might mean. Gujarati’s (1978) first empirical example is more substantive, a Cobb-Douglas production function estimated with a few annual observations. Production functions, implicitly causal, are a fundamental building block of economic theory. The discussion here helpfully interprets magnitudes and considers whether the estimates might be consistent with constant returns to scale. But this application doesn’t appear until page 107.

Decades later, real empirical work was still sparse in the leading texts, and the presentation of empirical examples often remained focused on mathematical and statistical technicalities. In an essay published fifteen years ago, Becker and Greene (2001) surveyed econometrics texts and teaching at the turn of the millennium:

Econometrics and statistics are often taught as branches of mathematics, even when taught in business schools ... the focus in the textbooks and teaching materials is on presenting and explaining theory and technical details with secondary attention given to applications, which are often manufactured to fit the procedure at hand ... applications are rarely based on events reported in financial newspapers, business magazines or scholarly journals in economics.

Following a broader trend towards empiricism in economic research (documented, for example, in Hamermesh, 2013 and Angrist et al., 2017), today’s texts are more empirical than those they’ve replaced. In particular, modern econometrics texts are more likely than those described by Becker and Greene to integrate empirical examples throughout, and often come with websites where students can find real economic data for problem sets and practice.

But the news on the textbook front is not all good. Many of today’s textbook examples are still contrived or poorly motivated. From our viewpoint, more disappointing than the uneven quality of empirical applications in the contemporary econometrics library is the failure to discuss modern empirical tools. Other than Stock and Watson (2015), which comes closest to embracing the modern agenda, none of the modern undergraduate econometrics texts surveyed below mentions regression-discontinuity methods, for example. Likewise, we see little or no discussion of the threats to validity that might confound differences-in-differences style policy analysis, even though empirical work of this sort is now ubiquitous. Econometrics texts appear to risk becoming irrelevant to empirical practice.

To put these and other claims about textbook content on a firmer empirical foundation, we classified the content of 12 books, six from the 1970s and six currently in wide use. Our list of classics was constructed by identifying 1970s-era editions of the volumes in Becker and Greene’s (2001) Table 1, which lists undergraduate textbooks in wide use when they wrote their essay. We bought copies of these older first or second edition books. Our list of classic texts contains Kmenta (1971), Johnston (1972), Pindyck and Rubinfeld (1976), Gujarati (1978), Intriligator (1978), and Kennedy (1979). The divide between graduate and undergraduate books was murkier in the 1970s: unlike today’s undergraduate books, some of these older texts use linear algebra. Intriligator, Johnston, and Kmenta are noticeably more advanced than the other three. We therefore summarize 1970s book content with and without these three included.

Our contemporary texts are the six most often listed books on reading lists found on the Open Syllabus Project web site (http://opensyllabusproject.org/). Specifically, our modern market leaders are those found at the top of a list generated by filtering the Project’s “syllabus explorer” search engine for “Economics” and then searching for “Econometrics.” The resulting list consists of Kennedy (2008), Gujarati and Porter (2009), Stock and Watson (2015), Wooldridge (2016), Dougherty (2016), and Studenmund (2017).4

Recognizing that such an endeavor will always be imperfect, we classified book content into the categories shown in Table 1. This scheme covers the vast majority of the material in the 12 books described in Table 2, as well as in many others we’ve used or read. Our classification scheme also covers three of the tools for which growth in usage appears most impressive in the bibliometric data tabulated by Panhans and Singleton (2016), specifically, instrumental variables, regression-discontinuity methods, and differences-in-differences estimators.5 Our classification strategy counts pages devoted to each topic, omitting material in appendices and exercises, and omitting remedial material on mathematics and statistics. Independently, we also counted pages devoted to real empirical examples, that is, presentations of econometric results computed using genuine economic data. This scheme for counting examples omits the many textbook illustrations that use made-up numbers.

Not Fade Away

For the most part, legacy texts have a uniform structure: they begin by introducing a linear model for an economic outcome variable, followed closely by stating the assumption that the error term is either mean-independent of or uncorrelated with regressors. The purpose of this model (whether it’s a causal relationship in the sense of describing the consequences of regressor manipulation, a statistical forecasting tool, or a parameterized conditional expectation function) is usually unclear.

The textbook introduction of a linear model with orthogonal or mean-independent errors is typically followed by a list of technical assumptions like homoskedasticity, variable (yet non-stochastic!) regressors, and lack of multicollinearity. These assumptions are used to derive the good statistical properties of the ordinary least squares estimator in the classical linear model: unbiasedness, simple formulas for standard errors, and the Gauss-Markov Theorem (in which OLS is shown to be a best linear unbiased estimator, or BLUE). As we report in Table 2, this initial discussion of Regression Properties consumes an average of 11 to 12 percent of the classic textbooks. Regression Inference, which usually comes next, gets an average of roughly 13 percent of page space in these traditional books.

The most deeply covered topic in our taxonomy, accounting for about 20 percent of material, is Assumption Failures and Fix-ups. This includes diagnostics and first aid for problems like autocorrelation, heteroskedasticity, and multicollinearity. Relief for most of these maladies comes in the form of generalized least squares. Another important topic in legacy texts is Simultaneous Equations Models, consuming 14 percent of page space in the more elementary texts. The percentage given over to orthodox simultaneous equations models rises to 18 when the sample includes more advanced texts. These older books also devote considerable space to Time Series, while Panel Data gets little attention across the board.

A striking feature of Table 2 is how similar the distribution of topic coverage in contemporary market-leading econometrics texts is to the distribution in the classics. As in the Stones’ Age, well over half of the material in contemporary texts is concerned with Regression Properties, Regression Inference, Functional Form, and Assumption Failures and Fix-ups. The clearest change across book generations is the reduced space allocated to Simultaneous Equations Models. This presumably reflects the decline in the use of an orthodox multi-equation framework, especially in macroeconomics. The reduced coverage of Simultaneous Equations has made space for modest attention to Panel Data and Causal Effects, but the biggest single expansion in modern texts has been in the coverage of Functional Form (mostly discrete choice and limited dependent variable models).
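The “well over half” figure can be reproduced directly from the Table 2 entries. The short tabulation below copies the four relevant rows from Table 2; the dictionary layout and variable names are conveniences introduced here, not part of the original analysis.

```python
# Page-share percentages copied from Table 2, in column order:
# (1970s, 1970s excluding graduate-level texts, contemporary).
table2 = {
    "Regression Properties": (10.9, 11.9, 9.9),
    "Regression Inference": (13.2, 13.3, 14.6),
    "Functional Form": (10.2, 9.3, 15.0),
    "Assumption Failures and Fix-ups": (18.4, 22.2, 16.0),
}
labels = ("1970s", "1970s excl. grad", "Contemporary")

# Sum the four traditional regression topics within each column.
core_share = {
    label: round(sum(row[i] for row in table2.values()), 1)
    for i, label in enumerate(labels)
}
print(core_share)
```

The combined share of these four topics is close to 55 percent in all three columns, which is the sense in which the topic distribution has changed little across book generations.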

Some of the volumes on our current book list have been through many editions, with first editions published in the Stones’ Age. It’s perhaps unsurprising that the topic distribution in Gujarati and Porter (2009) looks a lot like that in Gujarati (1978). But more recent entrants to the textbook market also deviate little from the classic template. On the positive side, recent market entrants are more likely to at least mention modern topics.

The bottom row of Table 2 reveals the moderate use of empirical examples in the Stones’ Age: about 15 percent of pages in the classics are devoted to illustrations involving real data. This average conceals a fair bit of variation, ranging from zero (no examples at all) to more than a third of page space devoted to applications. Remarkably, the most empirically oriented textbook in our 12-book sample remains Pindyck and Rubinfeld (1976), one of the classics. Although the field has moved to an average empirical content rate between 25 and 30 percent, no contemporary text quite matches their coverage of examples.6

BLUE Turns to Grey: Econometrics Course Coverage

Many instructors rely heavily on their lecture notes, using textbooks only as a supplement or a source of exercises. We might therefore see more of the modern empirical paradigm in course outlines and reading lists than we see in textbooks. To explore this possibility, we collected syllabuses and lecture schedules for undergraduate econometrics courses from a wide variety of colleges and universities.7

Our sample of syllabuses covers courses at the ten largest campuses in each of eight types of institutions. The eight groups are research universities (very high activity), research universities (high activity), doctoral/research universities, and baccalaureate colleges, with each of these four split into public and private schools. The resulting sample includes diverse institutions like Ohio State University, New York University, Harvard University, East Carolina University, American University, U.S. Military Academy, Texas Christian University, Calvin College, and Hope College. Each of the eight types of schools we targeted is represented in the sample, but larger and more prestigious institutions are over-represented. Most syllabuses are for courses taught since 2014, but the oldest is from 2009. A few schools contribute more than one syllabus, but these are averaged so each school contributes only one observation to our tabulations. The appendix lists the 38 schools included in the syllabus data set.

For each school contributing course information, we recorded whether the topics listed in Table 1 are covered. A subset of schools also provided detailed lecture-by-lecture schedules that show the time devoted to each topic. It’s worth noting that the amount of information that can be gleaned from reading lists and course schedules varies across courses. For example, most syllabuses cover material we’ve classified as Multivariate Regression, but some don’t list Regression Inference separately, presumably covering inference as part of the regression module without spelling this out on the reading list. As a result, broader topics appear to get more coverage.
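The school-level averaging described above can be sketched as follows. The records are hypothetical (school names and coverage indicators are invented for illustration), and the two-step calculation is one plausible reading of “each school contributes only one observation,” not a reproduction of the authors’ code.

```python
from collections import defaultdict

# Hypothetical syllabus records: (school, covers_topic), where 1 means
# the topic appears on that syllabus and 0 means it does not.
syllabi = [
    ("School A", 1),
    ("School A", 1),
    ("School B", 0),
    ("School C", 1),
]

by_school = defaultdict(list)
for school, covers in syllabi:
    by_school[school].append(covers)

# Average within school first, so each school is a single observation,
# then average across schools.
school_means = {s: sum(v) / len(v) for s, v in by_school.items()}
coverage_pct = 100 * sum(school_means.values()) / len(school_means)

# For comparison: pooling syllabi lets School A count twice.
pooled_pct = 100 * sum(covers for _, covers in syllabi) / len(syllabi)

print(round(coverage_pct, 1), pooled_pct)
```

Averaging within school keeps institutions that supplied several syllabuses from dominating the coverage rates.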

With this caveat in mind, the first column of Table 3 suggests a distribution of econometric lecture time that has much in common with the topic distribution in textbooks. In particular, well over half of class time goes to lectures on Regression Properties, Regression Inference, Assumption Failures and Fix-ups, and Functional Form. Consistent with this distribution, column 2 of the table reveals that, except for Regression Properties, these topics are covered by most reading lists. The Regression Properties topic is very likely covered under other regression headings.

Also paralleling the textbook material described in Table 2, our tabulation of lecture time shows that just under 6 percent of course schedules is devoted to coverage of topics related to Causal Effects, Differences-in-differences and Regression Discontinuity Methods. This is only a modest step beyond the modern textbook average of 3.6 percent for this set of topics. Single-equation Instrumental Variables methods get only 3.9 percent of lecture time, less than we see in the average for textbooks, both old and new.
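The two shares quoted for the modern causal topics follow from adding up three rows in each table. The figures below are copied from Table 2 (contemporary texts, column 3) and Table 3 (lecture time, column 1); the data structure and names are introduced here only for the arithmetic.

```python
# Percentages copied from Table 2 (contemporary texts, column 3)
# and Table 3 (lecture time, column 1).
modern_topics = {
    "Causal Effects": {"textbooks": 3.0, "lectures": 2.5},
    "Differences-in-differences": {"textbooks": 0.5, "lectures": 2.0},
    "Regression Discontinuity Methods": {"textbooks": 0.1, "lectures": 1.4},
}

textbook_share = round(sum(t["textbooks"] for t in modern_topics.values()), 1)
lecture_share = round(sum(t["lectures"] for t in modern_topics.values()), 1)

print(textbook_share, lecture_share)
```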

Always looking on the bright side of life, we happily note that Table 3 shows that over a quarter of our sampled instructors allocate at least some lecture time to Causal Effects and Differences-in-differences. A healthy minority (nearly 17 percent) also find time for at least some discussion of Regression Discontinuity Methods. This suggests that econometric instructors are ahead of the econometrics book market. Many younger instructors will have used modern empirical methods in their Ph.D. work, so they probably want to share this material with their students. Textbook authors are probably older, on average, than instructors, and therefore less likely to have personal experience with tools emphasized by the modern causal agenda.

Out of Time

Undergraduate econometrics instruction is overdue for a paradigm shift in three directions. One is a focus on causal questions and empirical examples, rather than models and math. Another is a revision of the anachronistic classical regression framework, away from explaining economic processes and towards controlled statistical comparisons. The third is an emphasis on modern quasi-experimental tools.

We recognize that change is hard. Our own reading lists of a decade or so ago look much like those we’ve summarized here. But our approach to instruction has evolved as we’ve pondered the disturbing gap between what we do and what we teach. The econometrics we use in our research is interesting, relevant, and satisfying.

Why shouldn’t our students get some satisfaction too?

Acknowledgements

Our thanks to Jasper Clarkberg, Gina Li, Beata Shuster, and Carolyn Stein for expert research assistance, to the editors Mark Gertler, Gordon Hanson, Enrico Moretti, and Timothy Taylor, and to Alberto Abadie, Daron Acemoglu, David Autor, Dan Fetter, Jon Gruber, Bruce Hansen, Derek Neal, Parag Pathak, and Jeffrey Wooldridge for comments.

References

Angrist, Joshua D. 1998. “Estimating the Labor Market Impact of Voluntary Military Service Using Social Security Data on Military Applicants.” Econometrica 66(2): 249–88.

Angrist, Joshua D., Pierre Azoulay, Glenn Ellison, Ryan Hill, and Susan Lu. 2017. “Economic Research Evolves: Citations to Fields and Styles.” American Economic Review 107(5): forthcoming.

Angrist, Joshua D. and Alan B. Krueger. 1999. “Empirical Strategies in Labor Economics.” Chapter 23 in Handbook of Labor Economics, edited by Orley Ashenfelter and David Card, volume 3, 1277–1366. Elsevier.

Angrist, Joshua D. and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.

Angrist, Joshua D. and Jörn-Steffen Pischke. 2010. “The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics.” Journal of Economic Perspectives 24(2): 3–30.

Angrist, Joshua D. and Jörn-Steffen Pischke. 2015. Mastering Metrics: The Path from Cause to Effect. Princeton University Press.

Arcidiacono, Peter, Esteban M. Aucejo, and V. Joseph Hotz. 2016. “University Differences in the Graduation of Minorities in STEM Fields: Evidence from California.” American Economic Review 106(3): 525–62.

Arellano, Manuel. 1987. “Computing Robust Standard Errors for Within-groups Estimators.” Oxford Bulletin of Economics and Statistics 49(4): 431–34.

Ayres, Ian. 2007. Super Crunchers. Bantam Books.

Becker, William E., and William H. Greene. 2001. “Teaching Statistics and Econometrics to Undergraduates.” Journal of Economic Perspectives 15(4): 169–82.

Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. 2004. “How Much Should We Trust Differences-In-Differences Estimates?” Quarterly Journal of Economics 119(1): 249–75.

Bowen, Donald E. III, Laurent Frésard, and Jérôme P. Taillard. Forthcoming. “What’s Your Identification Strategy? Innovation in Corporate Finance Research.” Management Science.

Brynjolfsson, Erik, and Andrew McAfee. 2011. “The Big Data Boom Is the Innovation Story of Our Time.” The Atlantic, November 21.

Chamberlain, Gary. 1982. “Multivariate Regression Models for Panel Data.” Journal of Econometrics 18(1): 5–46.

Chamberlain, Gary. 1984. “Panel Data.” Chapter 22 in Handbook of Econometrics, edited by Zvi Griliches and Michael D. Intriligator, volume 2, 1247–1318. Elsevier.

Christian, Brian. 2012. “The A/B Test: Inside the Technology That’s Changing the Rules of Business.” Wired, April 25.

Dale, Stacy Berg, and Alan B. Krueger. 2002. “Estimating the Payoff to Attending a More Selective College: An Application of Selection on Observables and Unobservables.” Quarterly Journal of Economics 117(4): 1491–1527.

Fuchs-Schündeln, Nicola, and Tarek A. Hassan. 2016. “Natural Experiments in Macroeconomics.” Chapter 12 in Handbook of Macroeconomics, edited by John B. Taylor and Harald Uhlig, volume 2, 923–1012. Elsevier.

Goldberger, Arthur S. 1991. A Course in Econometrics. Harvard University Press.

Hamermesh, Daniel S. 2013. “Six Decades of Top Economics Publishing: Who and How?” Journal of Economic Literature 51(1): 162–72.

Hayashi, Fumio. 2000. Econometrics. Princeton University Press.

Kohavi, Ron. 2015. “Online Controlled Experiments: Lessons from Running A/B/n Tests for 12 Years.” Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM.

Maddala, G. S. 1977. Econometrics. McGraw-Hill.

Moulton, Brent R. 1986. “Random Group Effects and the Precision of Regression Estimates.” Journal of Econometrics 32(3): 385–97.

Newey, Whitney K. and Kenneth D. West. 1987. “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica 55(3): 703–08.

Panhans, Matthew T. and John D. Singleton. 2016. “The Empirical Economist’s Toolkit: From Models to Methods.” Working Paper 2015-3, Center for the History of Political Economy.

Summers, Anita A. and Barbara L. Wolfe. 1977. “Do Schools Make a Difference?” American Economic Review 67(4): 639–52.

White, Halbert. 1980a. “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity.” Econometrica 48(4): 817–38.

White, Halbert. 1980b. “Using Least Squares to Approximate Unknown Regression Functions.” International Economic Review 21(1): 149–70.

Yitzhaki, Shlomo. 1996. “On Using Linear Regressions in Welfare Economics.” Journal of Business & Economic Statistics 14(4): 478–86.

1 See also the descriptions of modern private sector econometric work in Ayres (2007), Brynjolfsson and McAfee (2011), Christian (2012), and Kohavi (2015).

2 We say “one variable at a time,” because some of the Dale and Krueger (2002) models replace college selectivity with tuition as the causal variable of interest.

3 In a recent publication, Arcidiacono, Aucejo, and Hotz (2016) use the Dale and Krueger conditioning strategy to estimate causal effects of enrolling in different University of California campuses on graduation and college major.

4 These books are also ranked highly in Amazon’s econometrics category and (at one edition removed) are market leaders in sales data from Nielsen for 2013 and 2014. Dougherty (2016) is number eight on the list yielded by Open Syllabus, but the sixth book, Hayashi (2000), is clearly a graduate text and the seventh, Maddala (1977), is not particularly recent. References for our book list appear in an appendix table.

5 Panhans and Singleton (2016) also document growth in articles using the terms “natural experiment” and “randomized control trial.”

6 Our view of how a contemporary undergraduate econometrics text can be structured around empirical examples is reflected in our book, Angrist and Pischke (2015).

7 Our thanks to Enrico Moretti for suggesting a syllabus inquiry.

Table 1: Topic Descriptions

| Topic | Which includes… |
| --- | --- |
| Bivariate Regression | Basic exposition of the bivariate regression model, interpretation of bivariate model parameters |
| Regression Properties | Derivation of estimators, classical linear regression assumptions, mathematical properties of regression estimators like unbiasedness and regression anatomy, the Gauss-Markov Theorem |
| Regression Inference | Derivation of standard errors for coefficients and predicted values, hypothesis testing and confidence intervals, R², analysis of variance, discussion and illustration of inferential reasoning |
| Multivariate Regression | General discussion of the multivariate regression model, interpretation of multivariate parameters |
| Omitted Variables Bias | Omitted variables bias in regression models |
| Assumption Failures and Fix-ups | Discussion of classical assumption failures including heteroskedasticity, serial correlation, non-normality, and stochastic regressors; multicollinearity, inclusion of irrelevant variables, GLS fix-ups |
| Functional Form | Discussion of functional form and model parametrization issues including the use of dummy variables, logs on the left and right, limited dependent variable models, other non-linear regression models |
| Instrumental Variables | Instrumental variables, two-stage least squares (2SLS), and other single equation IV-estimators like LIML and k-class estimators, the use of IV for omitted variables and errors-in-variables problems |
| Simultaneous Equations Models | Discussion of multi-equation models and estimators, including identification of simultaneous equation systems and system estimators like seemingly unrelated regressions (SUR) and three-stage least squares (3SLS) |
| Panel Data | Panel techniques and topics, including the definition and estimation of models with fixed and random effects, pooling time series and cross section data, and grouped data |
| Time Series | Time series issues, including distributed lag models, stochastic processes, ARIMA modeling, vector autoregressions, and unit root tests. This category omits narrow discussions of serial correlation as a violation of classical assumptions |
| Causal Effects | Discussion of causal effects and the causal interpretation of econometric estimates, the purpose and interpretation of randomized experiments, and threats to a causal interpretation of econometric estimates including sample selection issues |
| Differences-in-differences | Differences-in-differences assumptions and estimators |
| Regression Discontinuity Methods | Sharp and fuzzy regression discontinuity designs and estimators |

Table 2: Topics Coverage in Econometrics Texts, Classic and Contemporary

| Topic | 1970s (1) | 1970s Excl. Grad (2) | Contemporary (3) |
| --- | --- | --- | --- |
| Bivariate Regression | 2.5 | 3.6 | 2.8 |
| Regression Properties | 10.9 | 11.9 | 9.9 |
| Regression Inference | 13.2 | 13.3 | 14.6 |
| Multivariate Regression | 3.7 | 3.7 | 6.4 |
| Omitted Variables Bias | 0.6 | 0.5 | 1.8 |
| Assumption Failures and Fix-ups | 18.4 | 22.2 | 16.0 |
| Functional Form | 10.2 | 9.3 | 15.0 |
| Instrumental Variables | 7.4 | 5.1 | 6.2 |
| Simultaneous Equations Models | 17.5 | 13.9 | 3.6 |
| Panel Data | 2.7 | 0.7 | 4.4 |
| Time Series | 12.3 | 15.2 | 15.6 |
| Causal Effects | 0.7 | 0.7 | 3.0 |
| Differences-in-differences | -- | -- | 0.5 |
| Regression Discontinuity Methods | -- | -- | 0.1 |
| Empirical Examples | 14.0 | 15.0 | 24.4 |

Notes: This table reports percentages of page counts by topic. Column (2) excludes Kmenta, Johnston, and Intriligator. Dashes indicate no coverage.

Table 3: Course Coverage

| Topic | Lecture Time (1) | Course Coverage (2) |
| --- | --- | --- |
| Bivariate Regression | 11.7 | 100.0 |
| Regression Properties | 8.7 | 43.4 |
| Regression Inference | 12.4 | 92.1 |
| Multivariate Regression | 10.5 | 94.7 |
| Omitted Variables Bias | 1.9 | 28.5 |
| Assumption Failures and Fix-ups | 20.2 | 73.7 |
| Functional Form | 15.7 | 92.1 |
| Instrumental Variables | 3.9 | 51.8 |
| Simultaneous Equations Models | 0.4 | 19.3 |
| Panel Data | 3.6 | 36.8 |
| Time Series | 5.0 | 45.6 |
| Causal Effects | 2.5 | 25.4 |
| Differences-in-differences | 2.0 | 27.2 |
| Regression Discontinuity Methods | 1.4 | 16.7 |
| Number of Institutions | 15 | 38 |

Notes: Column 1 reports the percentage of class time devoted to each topic, for the 18 courses for which we obtained a detailed schedule. This column sums to 100 percent. Column 2 reports the percentage of courses covering a particular topic for the 38 schools for which we obtained a reading list.

Appendix: Syllabus Data

We contacted the largest schools in each of the following four groups, defined by aggregating Carnegie Classification codes in IPEDS as follows:

1, 2 Research Universities (very high research activity) (2010 classification code #15)
3, 4 Research Universities (high research activity) (2010 classification code #16)
5, 6 Doctoral/Research Universities (2010 classification code #17)
7, 8 Baccalaureate Colleges–Arts & Sciences (2010 classification code #21)

The two codes for each of these four groups reflect a division into public and private schools, for a total of eight categories. For example, the first category consists of public universities with very high research activity. We omitted private for-profit institutions. The latest Carnegie classification scheme (revised February 2016) can be found at http://carnegieclassifications.iu.edu/methodology/basic.php. Although we used the 2010 scheme, in practice, this matters little.

We attempted to collect undergraduate reading lists and course schedules for up to 10 of the largest schools in each of the 8 categories defined above. This information was obtained by searching for publicly posted material and/or contacting instructors. We looked for the most recent information available.

We obtained 45 reading lists from 38 schools and lecture schedules for 15 courses. The lecture schedules detail the number of lectures or weeks a course spends on particular topics. Table A2 displays the list of universities from which we obtained reading lists, and Table A3 shows some summary statistics.

Appendix Table A1: Econometrics Textbooks

A. Classic

Kmenta, Jan. 1971. Elements of Econometrics. New York: The Macmillan Company.

Johnston, J. 1972. Econometric Methods, 2nd Edition. New York: McGraw-Hill.

Pindyck, Robert S., and Daniel L. Rubinfeld. 1976. Econometric Models and Economic Forecasts. New York: McGraw-Hill.

Gujarati, Damodar. 1978. Basic Econometrics. New York: McGraw-Hill.

Intriligator, Michael D. 1978. Econometric Models, Techniques, & Applications. Englewood Cliffs, NJ: Prentice Hall.

Kennedy, Peter. 1979. A Guide to Econometrics. Cambridge, MA: The MIT Press.

B. Contemporary

Kennedy, Peter. 2008. A Guide to Econometrics. 6th Edition, Malden, MA: Blackwell Publishing.

Gujarati, Damodar N. and Dawn C. Porter. 2009. Basic Econometrics. 5th Edition, Boston: McGraw-Hill.

Stock, James H. and Mark M. Watson. 2015. Introduction to Econometrics. 3rd Edition, Boston: Pearson.

Wooldridge, Jeffrey M. 2016. Introductory Econometrics. A Modern Approach. 6th Edition, Boston: Cengage Learning.

Dougherty, Christopher. 2016. Introduction to Econometrics. 5th Edition, Oxford: Oxford University Press.

Studenmund, A. H. 2017. Using Econometrics. A Practical Guide. 7th Edition, Boston: Pearson.

Appendix Table A2: Syllabuses and Schedules

| Institution Name | Category | Public/Private | Undergraduate Enrollment | Number of Syllabi | Lecture Schedules |
| --- | --- | --- | --- | --- | --- |
| University of Central Florida | 1 | Public | 52,671 | 1 | 0 |
| Texas A&M University | 1 | Public | 47,093 | 1 | 1 |
| Ohio State University | 1 | Public | 44,741 | 2 | 0.5 |
| Pennsylvania State University | 1 | Public | 40,541 | 1 | 0 |
| Arizona State University | 1 | Public | 39,961 | 1 | 0 |
| The University of Texas at Austin | 1 | Public | 39,523 | 3 | 0 |
| Michigan State University | 1 | Public | 38,786 | 1 | 0 |
| Rutgers University | 1 | Public | 34,544 | 3 | 0 |
| University of Minnesota-Twin Cities | 1 | Public | 34,351 | 1 | 0 |
| New York University | 2 | Private | 24,985 | 1 | 1 |
| Cornell University | 2 | Private | 14,282 | 1 | 0 |
| University of Pennsylvania | 2 | Private | 11,548 | 1 | 0 |
| University of Miami | 2 | Private | 11,175 | 1 | 0 |
| George Washington University | 2 | Private | 10,740 | 1 | 1 |
| Harvard University | 2 | Private | 10,338 | 1 | 0 |
| Northwestern University | 2 | Private | 9,048 | 1 | 0 |
| University of Notre Dame | 2 | Private | 8,448 | 1 | 0 |
| Florida International University | 3 | Public | 41,009 | 1 | 0 |
| The University of Texas at Arlington | 3 | Public | 29,883 | 1 | 1 |
| University of North Texas | 3 | Public | 29,758 | 1 | 0 |
| San Diego State University | 3 | Public | 28,394 | 1 | 0 |
| Utah State University | 3 | Public | 24,271 | 1 | 0 |
| Brigham Young University | 4 | Private | 27,163 | 1 | 1 |
| Drexel University | 4 | Private | 16,896 | 1 | 0 |
| Saint Louis University | 4 | Private | 12,374 | 1 | 0 |
| Boston College | 4 | Private | 9,856 | 3 | 0.67 |
| East Carolina University | 5 | Public | 22,252 | 1 | 0 |
| University of North Carolina at Charlotte | 5 | Public | 22,216 | 1 | 1 |
| Illinois State University | 5 | Public | 18,155 | 1 | 0 |
| DePaul University | 6 | Private | 16,153 | 1 | 0 |
| Texas Christian University | 6 | Private | 8,647 | 1 | 1 |
| Marquette University | 6 | Private | 8,410 | 1 | 1 |
| American University | 6 | Private | 7,706 | 1 | 1 |
| United States Naval Academy | 7 | Public | 4,511 | 1 | 1 |
| United States Military Academy | 7 | Public | 4,414 | 1 | 1 |
| Calvin College | 8 | Private | 3,894 | 1 | 0 |
| Hope College | 8 | Private | 3,455 | 1 | 1 |
| University of Richmond | 8 | Private | 3,346 | 1 | 0 |

Appendix Table A3: Syllabi Summary Statistics

| Statistic | Value |
| --- | --- |
| Number of Syllabi | 45 |
| Number of Institutions | 38 |
| Number of Institutions with Lecture Outlines | 15 |
| Percent Public | 50 |
| Percent Private | 50 |
| Average Undergraduate Enrollment | 21,461 |
| Percent of Syllabi from 2016 | 67 |
| Percent of Syllabi from 2015 | 16 |
| Percent of Syllabi from 2009-14 | 18 |
| Percent of Syllabi using Wooldridge | 32 |
| Percent of Syllabi using Stock and Watson | 18 |
| Percent of Syllabi using Gujarati | 16 |
| Percent of Syllabi using Studenmund | 14 |
| Percent of Syllabi using Kennedy | 0 |
| Percent of Syllabi using other Books | 20 |