An Introduction to Quantitative Research in Political Science
6 March 2015
Robert J. Franzese, Jr. Professor of Political Science,
The University of Michigan, Ann Arbor
Modern academic commentators tend to characterize political-science scholarship prior to
World War II as legalistic, favoring categorical enumeration of constitutional and historical details
over positive-theoretical analysis, and offering anecdotal rather than systematic empirical analysis.
This dismissive characterization of the prewar study of politics as almost entirely unscientific is
surely unfair, but not until the behavioral revolution swept political science in the 1950s,
following on the Parsonian approach that had analogously revolutionized sociology in the interwar
period, did a self-consciously positive-scientific approach to the study of politics grow ascendant
(Brady et al. 2011). This new, positive, political science, as opposed to earlier exclusively
normative or non-scientific approaches, offered theories about the systematic sources of
regularities in political behavior, i.e., theories that posited explanations for the systematic aspects
of the relationships that one could expect among socio-politico-economic variables. As theory in
political science became this sort of positive exposition of systematic relationships between
variables that, by the logical argument of the theory, one could expect to manifest regularly
empirically, the primary aim, focus, and purpose of empirical analysis shifted in parallel from
descriptive and factual inquiry – what happened? – to empirical evaluation of theory and empirical
estimation of the theoretically posited relationships – why and how, by what process, did it
happen? Interestingly, this progression in the ends to which quantitative empirical research in political science was put can be followed in the various names King (1991) notes have been given
the subfield before it took its current moniker, political methodology: from Rice’s interwar
political statistics (Rice 1926) to Alker’s post-behavioral-revolution polimetrics (1975).1
Before the beginning of its emergence as a subfield of political science in the 1970s, works in
quantitative empirical-research methodology in the study of politics were scattered sporadically
across the journals of political and related social sciences. Moreover, these rare early instances of
political methodology were usually published only after hard-fought battles to defend each work’s
bona fides as a study of politics and the possibility of doing so with quantitative research-methods.2
Through these battles, the subfield began to coalesce and gain a self-awareness that was beginning
to show around the turn of that decade. Soon afterward, The Society for Political Methodology was
formed for the purposes of, as is the case for most social societies, self-defense, mutual support,
and advancement of the group’s aims and activities, or, as one of the founding figures tells it:
September 3, 1983 will not be the next national holiday, yet for some it is a day worthy of a celebration. Nor did it initiate ten days that shook the world, yet it marked the beginning of a quiet but profound revolution in Political Science. On that date, at the urging of a young Berkeley agitator turned methodologist named Steven Rosenstone, commenting on an APSA panel the day before, a group of young radicals gathered on the steps in the lobby of Chicago's Palmer House Hotel. (Efforts to place a plaque on the spot have been unsuccessful.) Accounts at the time put the number at twelve, but only seven have been positively identified in subsequent documents. Those seven are: Christopher H. Achen, John H. Aldrich, Larry M. Bartels, Henry E. Brady, John E. Jackson, George E. Marcus, Steven J. Rosenstone, and John E. Sullivan. Larry Bartels, the youngest possible member, now denies attending and claims he was not even in Chicago. The remaining individuals have remained unidentified and unindicted co-conspirators though there are rumors about their identity. It is also true that, much like Babe Ruth's famous Wrigley Field home run in 1932, the number of people who claim to have attended now exceeds the capacity of the lobby and some of those claims are from individuals who, at best, were in a pre-arithmetic state of development.
A subsequent manifesto identifies the cell's purposes as three-fold: 1) To organize a formal meeting to create an agenda for action; 2) To take over a set of panels at the 1984 meeting of the American Political Science Association; and 3) To institutionalize themselves as a field. There were surely other even more incendiary topics covered, but because the group was very careful not to take minutes so there could be deniability these
1 Note also Sir William Petty’s political arithmetic (Petty 1672) a quarter millennium before Rice and the politimetrics, politometrics, and govermetrics of Gurr (1972), Hilton (1976), and Papayanopoulos (1973). Presumably, the ‘metrics labels from the 1970s fell from favor as political methodologists sought to avoid giving the impression that their subfield merely borrowed with rote translation from econometrics. 2 Larry Bartels tells a story of a review of one of his first journal-submissions as a young graduate-student about this time informing the journal editor that his manuscript should obviously be rejected because one cannot trust any of its evidence with all its coefficients and standard errors and everything being estimated. [personal communications]
are not documented. Though this manifesto hardly had the fiery rhetoric of the Port Huron Statement its intentions were as profound and its impacts more permanent. It initiated a bloodless revolution that without killing anyone or thing, save a few cross-tabs, some factor analyses, and a path analysis or two permanently altered the way empirical analysis would be done. (Jackson 2012: pp. 2-3)
The group did indeed convene a meeting for that next year. They met in Ann Arbor at the
University of Michigan in the summer of 1984 for the first academic conference of the subfield of
political methodology. Sixteen scholars discussed topics including psychometric measurement-
models for political-science applications, ecological inference, and statistical models for binary
outcomes (like voting, majoritarian elections, and wars). Three decades later, that annual Summer
Conference of the Society for Political Methodology now regularly hosts 150-225 attendees and
some 100-200 paper and poster presentations by faculty and graduate students, on topics ranging
from simulation-estimation methods for complex structural models to nonparametric causal-
inference, from Bayesian models for textual data to computational data-mining for real-time
streaming data, and from methods for individual-level data (and even sub-individual-level
neurological data) to time-series-cross-sectional data of aggregated units from medieval to modern
times. Routinely now, many of the eventually polished products of these conference papers and
posters are published in all the top journals of political science, including the Society’s own top-
rated journal, Political Analysis, in the top methodological journals across the social sciences, and
in the top journals in statistics as well.
This five-volume collection selects 55 “Major Works” from the academic literature of this fast
evolving and now enormous subfield called political methodology, focusing exclusively therein
on quantitative empirical-research methodology.3 These 55 selections are organized, following
the editor’s conceptualization of the central challenges of empirical analysis (quantitative or
3 I.e., the collection does not include so-called formal-theoretical methods, some of which are quantitative also, nor does it seek to encompass qualitative empirical-methods in its purview.
otherwise) in the political and social sciences, in six parts across the five volumes:
a. Section 1 introduces the central challenges of quantitative empirical analysis in political science, and offers some overviews of one promising approach to tackling these challenges.
b. Section 2 starts this thusly-organized tour of quantitative empirical-research methodology from the starting point of all empirical analysis: measurement.
c. Section 3 reviews the main applied tools of multivariate analysis (which relates to the first central challenge: multicausality): least-squares linear-regression, maximum likelihood for limited and qualitative dependent-variables, and Bayesian methods of estimation and inference.
d. Section 4 covers the various ways empirical methodology has addressed heterogeneity (which relates to the second central challenge: context conditionality) of certain sorts – differing levels of outcomes, differing effects of variables – across the observational units.
e. Section 5 addresses empirical methods for dynamic and interdependent processes (which also relates to the second central challenge and to the third challenge: ubiquitous endogeneity), encompassing temporal (cross-time), spatial (cross-unit), and spatiotemporal (cross-unit-time) dependence and dynamics.
f. Section 6 surveys approaches to and methods for estimation and inference given endogeneity, i.e., simultaneity or mutual causality among the variables of the analysis (which is the third central challenge): systems estimation, instrumentation strategies, autoregressive/temporal-ordering based methods, discontinuity and difference-in-difference designs, matching methods, and experimentation.
The collection’s six parts are organized around the editor’s conception of the three central
challenges for empirical research in social science. However, “the [one] fundamental problem of
causal inference” (Holland 1986), empirical researchers and methodologists are now told, is that
we would like to know the difference in outcome, yi, for an observational unit, i, between getting
some treatment, xi=1, and not getting that treatment, xi=0, but unfortunately we can logically only
ever observe xi=1 or xi=0, never can both xi=1 and xi=0 happen to the same, single observed unit
(at the same time). With this difference of (yi|xi=1)-(yi|xi=0) understood to be, simply and
unambiguously, the quantity of interest in empirical analysis, this Neyman-Rubin-Holland causal-
model school of thought teaches that all challenges of empirical research come back to this one,
unavoidable issue: the unobservability of the counterfactual. Alternatively, empirical
methodologists and researchers in microeconometrics today are told that empirical analysis of
micro-level units (as opposed to aggregates) “fundamentally…brings to the forefront
heterogeneity of individuals, firms, and organizations that should be properly controlled (modeled)
if one wants to make valid inferences about underlying relationships” (Cameron and Trivedi 2005:
p. 5). From this view, the one core problem is heterogeneity, and especially unobserved
heterogeneity (Wooldridge 2002). The editor’s view is that empirical methodologists and
researchers in the political and social sciences are hardly so lucky as to have only just one
fundamental or core problem. Rather, the central challenges of empirical analysis, quantitative or
otherwise, in the political and social sciences number (at least4) three. The organization of these
five volumes in six parts follows the methodological attacking of those three central challenges for
quantitative empirical research in political science.
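The Neyman-Rubin-Holland point, that (yi|xi=1)-(yi|xi=0) is never jointly observable for any single unit, can be made concrete with a small simulation. The sketch below is purely illustrative, with all numbers invented and the individual treatment effect fixed at 3 by construction; it shows that although each unit reveals only one potential outcome, randomized assignment lets a simple difference in observed group means recover the average treatment effect.

```python
import random

random.seed(42)

# Illustrative Neyman-Rubin-Holland setup; all numbers are invented, and
# the individual treatment effect is fixed at 3 by construction.
N = 10_000
units = []
for _ in range(N):
    y0 = random.gauss(10, 2)   # potential outcome without treatment (x=0)
    y1 = y0 + 3                # potential outcome with treatment (x=1)
    units.append((y0, y1))

# The simulator knows both potential outcomes, so it can compute the true
# average treatment effect; a real researcher never can.
true_ate = sum(y1 - y0 for y0, y1 in units) / N

# Randomized assignment: each unit reveals exactly one potential outcome;
# the other remains the unobservable counterfactual.
treated, control = [], []
for y0, y1 in units:
    if random.random() < 0.5:
        treated.append(y1)     # y0 is now the unobservable counterfactual
    else:
        control.append(y0)     # y1 is now the unobservable counterfactual

# Under randomization, the difference in observed group means recovers the ATE.
est_ate = sum(treated) / len(treated) - sum(control) / len(control)
print(round(true_ate, 2), round(est_ate, 2))
```

The estimate approximates 3 only because assignment is randomized; in observational data, where units select their own x, the same difference in means would conflate the treatment effect with whatever drives selection.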
In generating positive theoretical explanations of the systematic relations among socio-
politico-economic phenomena, we know or strongly suspect the following three features to
manifest regularly: multicausality—just about everything matters; context conditionality—the
way just about everything matters depends on just about everything else; and ubiquitous
endogeneity—just about everything causes just about everything else. The introductory selections
of Section 1 elaborate these challenges and emphasize one promising approach to successfully
managing them, if not to say surmounting them, which approach is essentially to lean very heavily
on the theory and substance of the scenario of study to specify empirical models for powerful
leverage on this complex reality.
In Section 1: Introduction, the editor’s own contribution, “Multicausality, Context-
Conditionality, and Endogeneity,” elaborates on the meaning and likely universal applicability in
4 A fourth and a fifth core challenge could be mentioned as well – fourth: far too little data, or useful variation to be more precise, to surmount the first three challenges purely empirically, i.e., non-parametrically; and fifth: the truth, i.e., the DGP, is likely moving while we’re trying to estimate it – although the fourth may be only typically and not universally severely manifest, and the fifth may be subsumable under the second as another form of heterogeneity in the process being analyzed.
social science of these three challenges. In essence, the three fundamental challenges imply that,
with each empirical observation on however many variables, we will generally have some multiple
of that many empirical quantities (e.g., relationships or averages, i.e., parameters) to estimate. With each empirical set of facts we learn, we have some multiple of that many more things to estimate, or to learn, unless, that is, we impose some structure on our inquiry, like that the observations are
generated according to a pre-specified model, a structural equation or set of equations, or that all
observations where x=0 come from some distribution with one, constant mean, and those where
x=1 come from some other distribution with one other, constant mean. The tremendously useful, in fact indispensable, point of these structural pre-impositions is that they greatly reduce the parameterization – the number of parameters or things that we will have to estimate with each observation – from the multiple of the number of things observed that a truly nonparametric approach would require. The suggestion of the first article in the introduction section is that,
confronted with a multicausal, context-conditional, dynamic, and ubiquitously interdependent
world, we not only cannot avoid leaning on theoretically and substantively pre-imposed structure
in our empirical models, we must have sufficient such structure to learn anything empirically, and
we should therefore impose structure that reflects as well as possible what theory and substance
tell us about the phenomena. This is the promising way forward intimated above. The next two
articles in the introduction are summary or foundational in two specific ways of acting upon these
principles. Granato and Scioli, in “Puzzles, Proverbs, and Omega Matrices,” explain the EITM –
Empirical Implications of Theoretical Models – initiative, which is a now 12-year-old program,
initiated from the Political Science Program of the Social and Behavioral Sciences Directorate of
the National Science Foundation (under Scioli’s and Granato’s leadership at the time), training
young scholars in methods for integrating theoretical models (primarily formal ones) more tightly
with empirical analyses (primarily quantitative ones). Bas, Signorino, and Walker’s “Statistical
Backwards Induction” is a foundational work in one particular kind of tight integration between
formal-theoretical and quantitative-empirical model, one in which the game-theoretic process of
determining strategic behavior by solving backward, from the actors’ optimal choices at the end
of the game through all earlier decisions back to the initial choice node, becomes a statistical model
with empirical data input for unknown payoffs as parameterized functions to be estimated (by
maximum likelihood). (An example of a different kind of tight integration is Franzese’s “Multiple
Hands on the Wheel” in Section 4, which illustrates a theoretically specified model for a particular
manifestation of complex context-conditionality, estimated by nonlinear least-squares.)
Of course, before we can begin to tackle any of these challenges empirically, we must have
measures of the phenomena to be analyzed. Measurement is therefore fittingly the subject of
Section 2, immediately following on the Introduction, as it is the first step in any empirical science:
To measure is to know. — Lord Kelvin (1883). [or, as elaborated:] In…science, the first essential step in the direction of learning any subject is to find
principles of numerical reckoning and practicable methods for measuring some quality connected with it. I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be. — Lord Kelvin (1883).
If you can think about something, you can conceptualize it; if you can conceptualize
it, you can operationalize it; and if you can operationalize it, you can measure it. — J. David Singer [as quoted in Clark, Golder, and Golder 2013: p. 143.]
In political science, many of the objects of interest for study are abstract concepts like power
in international politics, democracy in comparative politics, ideology or policy preferences in
domestic politics. Some of the most important achievements in political methodology have been
advances in measurement. Indeed, measurement, along with specification or design, are the first-
order essentials of any empirical analysis, as Lord Kelvin was very fond of noting. The most
advanced technical wizardry of advanced estimation methods, or the cleverest and most thoughtful
of experimental or observational research designs, can only yield useful results insofar as the
measures (and models) onto which they are applied are well taken (and specified). The selections
in Section 2a represent in various ways some of the important political-methodological
contributions in measurement. Achen’s (1983) “Toward Theories of Data” surveys the fledgling
subfield of political methodology and places measurement at the heart of its seed. Jackman’s
(2008) “Measurement” contribution to the Oxford Handbook of Political Methodology follows
with a modern overview of the methods and methodology of measurement in political science 25
years later. The two virtues measures should maximize are reliable accuracy5 — reliability being
low variance across different measures of the same item, and accuracy being small average error and
so small bias of the measurement from the truth it was to measure — and coverage, which is
measuring many instances of the item and failing to measure few. The absence or relative lack of
these two virtues are, respectively, measurement error and missing data, two data problems that
arise immediately along with our coverage of measurement. Groves (1991), a sociologist and survey methodologist and former Director of the U.S. Census Bureau, gives an authoritative overview of
“Measurement Error across Disciplines.”6 King and distinct groups of co-authors then give us two
seminal contributions in political methodology, one toward each of those two ills. In “Enhancing
the Validity and Cross-cultural Comparability of Measurement in Survey Research,” King,
Murray, Salomon, and Tandon (2004) show how to use “vignettes” or descriptions of particular
scenarios, the same ones presented to different groups, to anchor respondents in the same places,
5 Since both together are desirable, I merge the two properties of measures most introductions to the topic emphasize, reliability and accuracy, to make space for a third quality, coverage. 6 In the first selection, Franzese had shown that, in general, arguments to reduce the number of observations on grounds of high-quality measurement are unsustainable, so methods of improving measurement accuracy in non-tiny samples must instead be the way forward (see Table 1 and the surrounding text in that selection).
at least objectively, when they give Likert-scale or other scales or index responses to survey
questions that, for instance, ask them to put themselves or other entities on left-right scales. This tightens the link from substantive concepts in theory to empirical measures of those concepts as operationalized in variables, a tightening that in turn strengthens the connection from theoretical argument to empirical evaluation by reducing measurement error. Lastly in Section 2a: King,
Honaker, Joseph, and Scheve (2001), in “Analyzing Incomplete Political Science Data: An
Alternative Algorithm for Multiple Imputation,” offer the most influential address in political
science to the challenge of missing data.
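One concrete cost of measurement error, familiar from the classical errors-in-variables setup, is attenuation: random error in a measured explanator biases its estimated coefficient toward zero. The simulation below is a toy sketch with invented numbers, not drawn from any of the selections; with error variance equal to the true variance of x, the reliability ratio is 1/2 and the slope estimate is cut roughly in half.

```python
import random

random.seed(7)

# Classical errors-in-variables sketch (invented numbers): y depends on the
# true x with slope 2, but we observe only x contaminated by random error.
N = 20_000
beta = 2.0
true_x = [random.gauss(0, 1) for _ in range(N)]
y = [beta * x + random.gauss(0, 1) for x in true_x]
noisy_x = [x + random.gauss(0, 1) for x in true_x]   # reliability ratio = 1/2

def ols_slope(xs, ys):
    """Bivariate least-squares slope: cov(x, y) / var(x)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

print(round(ols_slope(true_x, y), 2))    # near 2.0 with the accurate measure
print(round(ols_slope(noisy_x, y), 2))   # near 1.0: attenuated by half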
Section 2b turns to applications of important statistical models for measurement as developed
and applied to measure political-science abstractions like ideology or preferences or partisanship,
or democracy. Selection 9 provides a methodological introduction to perhaps the most-widely used
innovation in measurement in political science, Poole and Rosenthal’s NOMINATE score, which is a
method of using U.S. Congressional roll-call votes on bills to place Representatives and the bills
on a single left-right continuum or in multiple dimensions, based on how the legislators divide on
each vote and how each vote divides the legislators. Although (1) the method is most useful only
in legislatures where individual representatives have an appreciable tendency to vote independently
(as opposed to purely in party blocks), and (2) the method cannot get to “underlying” preferences
but only those revealed by the presumably strategic votes of legislators, and (3) the method has
been criticized on other grounds – just for one example: a probable leaning toward detecting few
dimensions of difference among legislators and bills, usually just one dimension, which suggests
the explanatory power of the likely unidimensional results should probably not be used to justify
conclusions about the dimensionality of political contestation or its representation in the legislature
– the importance and influence of the NOMINATE methodology for offering concrete, and it
seems accurate and reliable, measures of such a critical abstraction as political preferences or
ideology can hardly be overstated. Democracy is another hugely important abstraction in political
science, and so using Treier and Jackman’s “Democracy as a Latent Variable” to illustrate the
application of (Bayesian) item-response theory (IRT) measurement modeling, another of the most
widely used methodological approaches to measurement, is very appropriate. Lastly in Section 2b,
Stimson, MacKuen, and Erikson’s “Dynamic Representation” gives thorough introduction to their
long-term project measuring public mood, i.e., the general left-right leaning of the broad scope of
public opinion in a jurisdiction, and macro-partisanship, i.e., the general left-right leaning of
policy and governance in a jurisdiction, in the service of gauging the quality of democratic
representation and of evaluating theories involving those abstractions. The three applications
demonstrate the foundational importance of measurement methodology to the advancement of a
positive scientific study of politics.
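To give a concrete, if toy, flavor of the latent-variable logic shared by NOMINATE-style scaling and IRT measurement models: in a one-dimensional spatial-voting setup, legislator i with ideal point x_i votes yea on roll call j with probability logistic(b_j*x_i - a_j), where a_j is the item's difficulty and b_j its discrimination. All values below are hypothetical, and the sketch only evaluates the model's vote probabilities; actual estimation of ideal points from observed votes is the hard inverse problem the selections address.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical one-dimensional ideal points for five legislators, left to right.
ideal = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Each roll call j has a difficulty a_j and a discrimination b_j; the model
# puts Pr(yea) = logistic(b_j * x_i - a_j). The (a_j, b_j) pairs are made up.
bills = [(0.0, 1.5), (0.5, 2.0), (-0.5, 1.0)]

rows = []
for a, b in bills:
    rows.append([round(logistic(b * x - a), 2) for x in ideal])
    print(rows[-1])   # with b_j > 0, Pr(yea) rises with the ideal point
```

Because each vote divides the legislators probabilistically along the same dimension, many such votes jointly pin down both the legislators' positions and the bills' parameters, which is the intuition behind recovering ideology from roll calls.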
Volume II and Section 3 of the collection turn from understanding the central challenges of social-science empirical-analysis and proposing strategies to address them, and from the starting point of political-science empirical-analysis, (operationalization and) measurement of its often
abstract theoretical conceptions, to the practical implementation of empirical-analytical methods
suggested by the former and made possible by the latter. The first central challenge, multicausality,
or that “just about everything matters,” is more or less precisely what the methods of multivariate
analysis covered in Section 3 are designed to address: to control for other potential explanators of
the outcomes of interest besides the one of the researcher’s main focus and interest.7,8 The
7 The editor would argue against conceiving that the entire or only aim of empirical analysis is to test for the existence of one causal mechanism, and so that the only point of multivariate analysis is to control for other possible causes – confounds, to use the term currently en vogue; omitted variables, to use the old and still perfectly serviceable term – when testing or estimating one’s preferred factor’s effect. 8 A modern focus on the possibility of unobservable, or at least unobserved, other causal factors, leads some to stress very strongly the tremendous advantage of experimental control over statistical control in observational data in that randomization of experimental treatment eliminates (in large samples, assuming perfect randomization) the possibility
selections in this part overview the main workhorse techniques in applied empirical analysis in the
political and social sciences: least-squares linear-regression, maximum-likelihood estimation of
limited and qualitative dependent-variable models, and Bayesian methods for estimation and
inference. Achen’s (1982) Sage “Little Green Book”, Interpreting and Using Regression is the
classic, and remains the best, practical introduction to linear regression. The selection from that
wonderful monograph that is Section 3a here introduces regression as a tool for empirical analysis
of social science theory, which exactly fits the organizational theme of this collection. King’s
(1998 [1989]) Unifying Political Methodology, which leads off Section 3b, analogously served as
many subsequent generations of political methodologists’ introduction to maximum-likelihood
estimation and inference, in particular for limited and qualitative dependent-variable models. The
selections in this collection provide the highlights of that classic introduction. The selections from
Gill’s (2001) Generalized Linear Models (GLM) lucidly introduce and explain the GLM
approach to the broad class of limited and qualitative dependent-variable models that can be
encompassed under the generalized-linear framework (e.g., exponential-family and so log-linear
models). King, Tomz, and Wittenberg’s (2000) “Making the Most of Statistical Analyses”, which
concludes Section 3b, effectively pressed upon the by-then almost thoroughly positive-theoretical
and quantitative-empirical political science the crucial importance of working through the
substantive meaning of their estimates, in terms of how outcomes of interest, y, would respond to
hypothetical movements in explanators, x – responses which, as empirical methodologists and researchers will have well learned by then (if they were paying attention), are not simply the estimated coefficient on x, except in the simplest purely linear-additive-and-separable models. This was and
of any alternative cause, including unobserved or even as-yet unconceived ones, but the treatment. Again, the editor would argue against concluding that this great virtue, and the great peril of unobserved confounds, great though they are, are the only weighty considerations in choosing a method of empirical analysis.
is a critically important lesson the selection imparts, along with providing a very useful software
tool for calculating these quantities of interest along with estimates of their standard errors.
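That lesson – that a coefficient is not an effect outside the simplest linear-additive-and-separable models – is easy to see in a toy logit calculation. The coefficients below are made up for illustration; no real data, estimates, or software output is implied.

```python
import math

def logit_prob(b0, b1, x):
    """Pr(y = 1 | x) implied by logit coefficients b0, b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical estimated coefficients, not from any real analysis.
b0, b1 = -1.0, 0.8

# The substantive "effect" of moving x by one unit is a first difference in
# predicted probabilities, and it depends on where on x the move starts.
fd_low = logit_prob(b0, b1, 1) - logit_prob(b0, b1, 0)
fd_high = logit_prob(b0, b1, 4) - logit_prob(b0, b1, 3)
print(round(fd_low, 3), round(fd_high, 3))   # unequal, and neither equals b1
```

The same one-unit move in x shifts the predicted probability by different amounts at different starting points, which is exactly why working through substantive quantities of interest, rather than reading coefficients directly, matters in nonlinear models.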
Section 3c covers the Bayesian framework for estimation and inference, which framework is
highly amenable to the approach to the three central challenges pushed in the introduction, which
might be summarized as seeking empirical models to reflect, as accurately and statistically
powerfully as possible, the theoretical argument or model. Designing the empirical analysis into
the Bayesian posterior (priors times likelihood) in this way is logically straightforward (although
not uniquely so: one could also write the empirical version of the theoretical model into the
likelihood of that framework or as the expected outcome in least-squares framework). The
selection “Specifying Bayesian Model’s” from Gill’s Bayesian Methods: A Social and Behavioral
Science Approach introduces the framework in such terms. To elaborate: in the least-squares
framework, one can build the theoretical model of what we expect the outcome, y, to be as a
function of certain explanators or covariates, x, connected according to a theoretically specified
f(x,β), such that E(y) = f(x,β) and so y = f(x,β) + ε. In the baseline classical linear-regression model, we assume that f(x,β) is linear-additive and separable, but later on we learn that the key for least-squares estimation is not the linearity but the separability (of the stochastic component, ε, from the systematic, f(x,β)). In the likelihood framework, we can easily extend such an
approach further, modeling as some function of explanatory x’s the parameters of any distribution
that could characterize an outcome of interest, whether that distribution affords separability or not.
The Bayesian framework allows us to go a step further. The Bayesian posterior being simply the
likelihood times the prior, we have in the prior an additional place to put substantive and theoretical
information and structure, and that knowledge we bring to the empirical model can be of just about
any sort. The selection from Bartels, “Pooling Disparate Observations”, shows for example how
incorporate substantive and theoretical knowledge we may have about the varying extents to which
different observations fit the ideal of the population to which the theory applies. Jackman’s
“Estimation and Inference Are Missing Data Problems: Unifying Social-Science Statistics via
Bayesian Simulation” then very nicely brings us back to a unified Bayesian perspective on the
measurement, specification, and estimation that form the core of any empirical analysis.
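How the prior offers an additional place to put substantive and theoretical information can be sketched in the simplest conjugate case: estimating a single mean with a normal prior and normally distributed data of known variance. All numbers below are invented for illustration.

```python
# Minimal conjugate normal-normal sketch of "posterior = prior x likelihood"
# for a single mean with known data variance; all numbers are invented.
prior_mean, prior_var = 0.0, 4.0      # theoretically motivated prior belief
data_mean, n, sigma2 = 1.5, 25, 9.0   # sample mean of n draws, known variance

prior_prec = 1.0 / prior_var          # precision = 1 / variance
data_prec = n / sigma2

# The posterior precision adds the two, and the posterior mean is their
# precision-weighted average, so the prior pulls the estimate toward 0.
post_var = 1.0 / (prior_prec + data_prec)
post_mean = post_var * (prior_prec * prior_mean + data_prec * data_mean)
print(round(post_mean, 3), round(post_var, 3))
```

As n grows, the data precision swamps the prior precision and the posterior mean converges to the sample mean; with little data, theoretically grounded prior structure carries proportionally more of the inferential weight.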
Multivariate analysis, whether by least squares regression, maximum likelihood estimation, or
Bayesian methods of estimation and inference, is the first and still greatest advance in empirical
methods for sciences, like the political and social, in which perfect isolation of objects of study in a perfectly insulated experimental laboratory, so that a multicausal world can be reduced to a
monocausal observational environment, is impossible and undesirable.9 Since we cannot study
much empirically by experimental control both well and with much applicability, these methods
of statistical control in observational data must instead be the main tool.10 Statistical control,
whether by multivariate analyses of the traditional methods covered in Section 3 or by matching
methods covered in Section 6e, requires that the multiple explanatory variables to be controlled
statistically – or, to put it in terms better fitting these volumes’ emphasis, requires that the
multiple explanatory variables whose partial associations are to be estimated – be observed.
Especially according to microeconometricians, as the quotes from Cameron and Trivedi and from
Wooldridge above testify, the great concern in any cross-section or panel (i.e., time-series cross-
section) dataset is cross-unit heterogeneity, especially unobserved heterogeneity.
Heterogeneity and heterogeneous effects are the subject of Section 4, starting with unit (and
9 Such isolation is undesirable because, even if it were possible, the resulting ideal causal inferences would be useless for application to the environments in which we would want to use the knowledge gained since we have no reason to believe the causal relations in the non-insulated reality would mimic those from the isolated laboratory. 10 Figure 1, equation 6, and the surrounding text in the first introductory selection from Franzese shows that, even in the simplest possible case – purely linear-additive and separable trivariate regression model properly accounting for all the variations and covariations among the three variables needed to produce the correct partial association of each x with y non-quantitatively would not seem humanly possible.
period) fixed effects in Section 4a of Volume II. Oddly, what econometricians call unit effects are, in their essence, not actually effects at all as most empirical methodologists, ironically including economists, understand that latter term. Effects, say the effect of x on y, are differences or derivatives, like dy/dx, Δy/Δx, or ∂y/∂x. Unit effects, on the other hand, are the differences in expected outcomes from one unit to the next, ceteris paribus, i.e., at the same x values: E(y_i|x) − E(y_j|x). Empirical researchers, especially microeconometricians, worry about the
possibility of these mean-level differences across units, especially if these differences are
unobserved or insufficiently modeled, because of situations like the one depicted in Figure 1. In the figure, the unit effects are the unmodeled differences in mean level across units, i.e., E(y_i|x) − E(y_j|x), which are also given by the y-intercepts in the figure. The observations
in each unit show a strong positive relationship between x and y, as separate regression lines unit
by unit would reveal. Unfortunately, the mean values of x also correlate, strongly negatively in
this case, with the unit effects. If unmodeled, these unit effects would induce an enormous negative
bias on a regression line estimated from the pooled data.
Figure 1: Sign-Flipping Bias from Unmodeled Unit Effects
This is the sort of horror story that leads the microeconometricians, and Green, Kim, and Yoon in the first selection of Section 4a, “Dirty Pool,” to stress the fixed-effects guard against unobserved heterogeneity. Very simply, if one includes indicator variables for each unit, or mean-differences
all the data unit-by-unit, the unit heterogeneity will be absorbed by these unit-dummy fixed effects
(or swept away in the mean differencing), and the nightmare seen in Figure 1 cannot materialize.
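The logic of Figure 1, and of the fixed-effects remedy, can be illustrated with a small simulation (a purely hypothetical sketch; the data, numbers, and variable names below are invented for illustration and are not drawn from the selections): units whose unit effects correlate negatively with their mean levels of x yield a sign-flipped pooled-OLS slope, while the within (mean-differenced) estimator recovers the true positive effect.

```python
import random

random.seed(0)

def ols_slope(x, y):
    # Simple bivariate OLS slope: cov(x, y) / var(x).
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

# Hypothetical panel: 5 units, 50 periods each. Within each unit the true
# effect of x on y is +1, but unit intercepts (the "unit effects") are
# strongly negatively correlated with each unit's mean level of x.
x_all, y_all, x_within, y_within = [], [], [], []
for unit in range(5):
    x_mean = unit * 2.0              # this unit's mean level of x
    intercept = -5.0 * x_mean        # unit effect, correlated with mean x
    xs = [x_mean + random.gauss(0, 1) for _ in range(50)]
    ys = [intercept + 1.0 * x + random.gauss(0, 0.5) for x in xs]
    x_all += xs
    y_all += ys
    # Mean-differencing (the "within" transformation) absorbs the unit effect.
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    x_within += [x - mx for x in xs]
    y_within += [y - my for y in ys]

pooled = ols_slope(x_all, y_all)        # badly biased, sign-flipped
within = ols_slope(x_within, y_within)  # recovers the true +1 effect
print(f"pooled OLS slope: {pooled:+.2f}, within (fixed-effects) slope: {within:+.2f}")
```

The pooled slope is dominated by the (here, negative) between-unit relationship of mean x to the unit effects; the within slope uses only the cross-time variation and so escapes that bias.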
Period effects refer, analogously, to period-by-period mean differences, which raise the specter of bias from unobserved period heterogeneity if those unmodeled or unobserved mean differences across time periods correlate with the (cross-unit) means of the x’s period by period. The problem against which unit fixed effects so effectively guard can be real and can be as severe as Figure 1 illustrates, but these fixed-effect cures also come at severe efficiency costs.
They “sweep away” all of the cross-unit variation, and the remaining useful cross-time variation
may often be small. This means that many, many time periods of data across the units may be necessary to find evidence even of reasonably strong true associations between x and y. In the
most-extreme case, as Beck and Katz’s “Throwing Out the Baby with the Bath Water: A Comment on Green, Kim, and Yoon” emphasizes, units without over-time variation are effectively eliminated
from the sample, and this can make fixed-effect estimation biased in an attenuation direction if the
lack of variation is simply because variation is rare or small and sample sizes are non-infinite, as
may very well be the case in the substantive context of those papers: always-peaceful dyads in
interstate-conflict datasets. Furthermore, as Troeger shows in “Problematic Choices: Testing for
Correlated Unit Specific Effects in Panel Data,” these unit dummies tend to overfit in limited
samples, exaggerating any unit heterogeneity truly present and tending to find unit heterogeneity
even where it is not present, and this overfitting results in poor performance of statistical tests,
such as the Hausman Test of fixed-effect versus non-fixed-effect regression in panel or time-series
cross-section data, of whether the scenario depicted in Figure 1 is manifest. The upshot of the demonstration in Section 4a of a potentially cancerous problem in unaddressed unit heterogeneity, but also of the grave side effects of the fixed-effects chemotherapy for that cancer, is not that empirical methodologists should freeze in indecision or give up in hopelessness, but to recognize that thoughtless insistence on always applying fixed effects, because the Figure 1 scenario is so scary, reflects an insufficiently nuanced understanding of the problem and the methods for its address. One simply must tread carefully, weighing the very real and sizable cons, and not just the remarkable pros, of sweeping or dummying away cross-unit and/or cross-time heterogeneity.
As Troeger also notes elsewhere (n.d.), this standard textbook heterogeneity of fixed mean differences across units (or time periods) is perhaps the least interesting and certainly the simplest sort of heterogeneity that may concern us. The second of the central challenges for empirical analysis in the political and social sciences, context conditionality, speaks to a much richer and more interesting sort of heterogeneity: bona fide heterogeneity in actual effects. That is, the effects of the x’s tend to vary across contexts. The selection from Kam and Franzese, Modeling and
Interpreting Interactive Hypotheses in Regression Analysis shows how to go from theoretic
statements about how the effects of x on y depend on z to the simplest empirical models reflecting
such a consideration, the linear-interactive model. In this case, one very simply creates a third variable, xz, by multiplying x and z together, and includes x, z, and xz in the regression (or nonlinear, limited, or qualitative dependent-variable model). The result is a model in which the effect of x on y, i.e., dy/dx, Δy/Δx, or ∂y/∂x, is no longer a single constant given by the coefficient on x; rather, the effect of x on y is a function of z (and the coefficients on x and xz).
The linear interaction is a simple yet powerful device for effectively modeling a limited amount
of relatively simple interactivity, but when the substantively and theoretically implicated nature of
context conditionality is complex, such as for instance in modeling policy when multiple actors
have some hand in policymaking, the linear-interactive model can grow unwieldy with the
proliferation of numerous collinear (i.e., correlated) regressors. This is the same problem of which highly nonparametric approaches to empirical analysis in the political and social sciences sooner or later run afoul as well: when one eschews imposing structure on empirical-estimation models, the number of things to learn (estimate) empirically grows faster than one-for-one with the number of contexts observed. Given this unavoidable fact that empirical analysts must impose
structure to make any headway, the approach providing the lens through which we view the
contributions to this collection suggests, as does Franzese in “Multiple Hands on the Wheel,” that
we impose theoretically and substantively derived structure, so that the empirical results may be
theoretically and substantively informative. In the case of monetary policymaking in an open and institutionalized context, for example, control of monetary policy is shared between governments and central banks, with the latter weighing more heavily the greater the bank’s independence (Franzese 1999), and that domestic duo controls monetary policy insofar as they have not delegated it to foreign banks and governments via an exchange-rate peg and are not effectively unable to maneuver domestically (even to peg their exchange rate) because theirs is a small and financially open economy. Insofar
as those latter two conditions, explicit delegation via exchange-rate peg or de facto via small-
openness, hold, the country's monetary policy is effectively given by the peg-currency's policy or
some global equilibrium policy. All these considerations add up to the conclusion that the effect of every factor, domestic or foreign, that might influence monetary policy depends on the degree of central-bank independence, the extent of exchange-rate-peg efficacy, and the small-openness of the economy, all at home and abroad, and, through those international connections, on all the domestic factors of all the foreign countries to which the current country is connected. This complex context-conditionality would be a multicollinear nightmare to model by linear interactions, but the selection
shows how to impose the structure implied by this nested set of “shared policy-controls” in an
empirical model that can be estimated by nonlinear least-squares with only very few more
parameters, three or four, to estimate than a linear-additive model. The selection thus both illustrates the effectiveness of the theoretical-structure-imposition approach to the second central challenge, context conditionality, and introduces the extremely useful empirical-estimation tool of nonlinear least-squares.
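As a stylized sketch of nonlinear least-squares in this spirit (the model, weights, and variable names here are invented for illustration and are far simpler than the selection’s actual specification), one can model observed policy as a convex combination of a “domestic” and a “foreign” target, with the weight a logistic function of a single institutional variable, and estimate the governing parameter by directly minimizing the sum of squared residuals:

```python
import math
import random

random.seed(2)

def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical data: observed policy y is a convex combination of a
# "domestic" target d and a "foreign" target f, with the weight on d
# falling nonlinearly as an institutional variable `peg` (think: strength
# of an exchange-rate peg) rises. True gamma = -2.0 governs that weight.
data = []
for _ in range(300):
    d, f, peg = random.gauss(0, 1), random.gauss(0, 1), random.uniform(-2, 2)
    w = logistic(-2.0 * peg)          # true weight on the domestic target
    y = w * d + (1 - w) * f + random.gauss(0, 0.1)
    data.append((d, f, peg, y))

def ssr(gamma):
    # Sum of squared residuals for a candidate gamma: the model is
    # nonlinear in gamma, so no closed-form OLS solution exists.
    total = 0.0
    for d, f, peg, y in data:
        w = logistic(gamma * peg)
        total += (y - (w * d + (1 - w) * f)) ** 2
    return total

# Crude nonlinear least-squares: minimize SSR over a grid of gamma values
# (gradient-based methods such as Gauss-Newton do the same job faster).
grid = [g / 100.0 for g in range(-500, 501)]
gamma_hat = min(grid, key=ssr)
print(f"estimated gamma: {gamma_hat:.2f}")
```

Only one extra parameter, γ, is estimated beyond the linear components, which is the payoff of imposing the substantively derived convex-combination structure rather than a proliferation of interaction terms.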
From this perspective, Section 4c redresses a limitation of the prior two subsections’
approaches to heterogeneity, which is that the cross-unit mean-differences or the heterogeneity in
effect parameters across contexts is modeled as varying deterministically. The oddness of this is
perhaps best seen in the linear-interaction model. In that case, we are modeling the effect of x on
y (and of z on y, symmetrically) to depend without error on z (on x, symmetrically). It is odd that we do not know E(y|x) with certainty – instead, y = E(y|x) + ε – yet we do claim to know E[d(dy/dx)/dz] with certainty: it is just β_xz, without error. One way to view the multilevel or hierarchical model,
the random effects and/or random coefficients model, the mixed effects model, the error
components model, or shrinkage estimators – these are all basically synonyms, alternative labels
for the same class of models that emphasize different aspects or uses for that single kind of model
with error terms or stochastic elements added to the intercept and slope coefficients – is that these
allow us to model the heterogeneity in the intercepts and coefficients across units, or across any
sets of observations really, to have random error in them, i.e., to be random (across repeated
samples). The selections in Section 4c, beginning with Western, “Causal Heterogeneity in
Comparative Research: A Bayesian Hierarchical Modelling Approach,” which is addressed to
comparative political economists in particular, introduce political scientists to this use and
approach for the Bayesian hierarchical model: theory specifies how and why we expect coefficients to vary across contexts, but there is also an error component added to these systematic components
modeled as interaction terms. In “Modeling Multilevel Data Structures,” Steenbergen and Jones give a thorough introduction to the empirical-methodological approach from an angle that
emphasizes the multilevel-data context, in which the researcher observes multiple microlevel
observations within each of multiple macrolevel units and a key part of the aim is, typically, to
model the variation in parameters of the microlevel model (often behavioral) depending on
differing macrolevel contexts or differing values of conditions in the macrolevel contexts (often
institutional). Lastly in section 4c, Park, Gelman, and Bafumi’s “Bayesian Multilevel Estimation
with Poststratification: State-Level Estimates from National Polls” represents a current emphasis
in the use of multilevel models, from leading figures in their development, one that returns in fact
to one of its original motivations in the education literature, which is to “borrow leverage” across
many macrolevel units (originally: across schools, for instance; in Park et al.: across states) to
estimate quantities at the microlevel where observations are few (some or all schools have few surveyed classrooms or students; some states have few respondents in national polls).
Section 5 turns from heterogeneity across observational units, of various forms, modeled in
various ways, to dynamics: temporal dynamics within units and/or spatial-cum-network dynamics across units. Dynamics, it turns out, also provide a transition from the second central challenge of context conditionality, because outcomes and effects of explanators vary, in a sense, over the courses of dynamic processes, to the third central challenge, ubiquitous endogeneity, because dynamic processes tend to leave outcomes in different observations interdependent with each other – directly and without caveat in the spatial-interdependence/spatial-dynamics case, and in a qualified or potential way in temporally dynamic contexts. The crucial features that change in the move to observations from dynamic processes are, first, that observations cannot be taken as independent, i.e., each new observation is not wholly new information, as it depends on, overlaps with, or comes from previous or other units’ observations. The empirical methodology, therefore, must properly account for the inter-observational dependence so as not to lie, usually egregiously, about the amount of information in the data and so about the certainty of estimates. Secondly, and still at this rather
technical level, failure to model the dynamics omits an important part of the causal process from
the model, and so likely induces bias in the parameter estimates of the usual omitted-variable sort.
The omitted factor – the past, or history, in the temporal-dynamics context, for instance – is probably among the most important features, if not the most important feature, of the causal process, so its omission can have enormous bias consequences.11 More importantly from the perspective of the present collection
even than these large technical issues of obtaining quality parameter and certainty estimates in
dynamic contexts, however, is that dynamic processes imply that the responses in outcomes to
causal factors, x, cannot be captured by one, scalar number, like a single coefficient, β, or a
difference in means, E(y_i|x_i=1) − E(y_i|x_i=0). The response of y to some movement in x is dynamic. This means that the response of y to some shift in x in a temporally dynamic process, for example, will be a stream of how y responds over time to this hypothetical movement in x, a whole set of y’s, period by period, after the shock to x. And the hypothetical needs to be specified more
11 In the temporal-dynamics case, this “bias” is partly a small-sample issue (i.e., arising in samples with T, the number of time periods, small) and partly a “bias” of different sign in each sample, such that the errors may cancel across repeated samples. The upshot in that case is that estimates in temporally dynamic data that fail to model that process retain unbiasedness and consistency and suffer merely inefficiency. It is just that the inefficiency in the temporal-dynamics case is generally enormous, such that unbiasedness across many samples or in infinite samples is cold comfort: the one estimate we have, in a sample decidedly less than infinite, is so variable that we have little reason to believe the estimates are anywhere close to the parameters. In the spatial-dynamics case, biasedness (as well as inconsistency and inefficiency) from omitting the interdependence follows more directly and surely, without need of such further explanation, although the magnitude of these flaws tends to be smaller than in the temporal-dynamics case because cross-unit dependence is typically not as strong as over-time dependence within the same unit.
precisely, i.e., dynamically, also: how exactly does the hypothetical stream of x’s values occur
over time, a +1 unit shift in x in some period t0, reversed back to 0 the next period and 0 thereafter,
or a permanent +1 shift starting in t0, or some other stream of movements in x? In space, even more so, we must specify the hypothetical movements in x across all units, as well as the connections between units, before we can begin to calculate the response in y, which likewise is a vector, not a scalar, of responses across all units i. All this adds up to the important conclusion that we must model the dynamic processes and then interpret the estimation results in a manner appropriate to that dynamic context, i.e., dynamically.
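A minimal numerical sketch makes the point (the coefficients are hypothetical, invented for illustration): under the simple temporally dynamic model y_t = ρ·y_{t−1} + β·x_t, a transitory +1 shock to x and a permanent +1 shift produce entirely different response streams, the latter converging to the long-run multiplier β/(1−ρ).

```python
def response_path(rho, beta, x_path):
    # Trace how y responds over time to a hypothetical stream of x values,
    # under the temporally dynamic model y_t = rho*y_{t-1} + beta*x_t.
    y, path = 0.0, []
    for x in x_path:
        y = rho * y + beta * x
        path.append(round(y, 3))
    return path

rho, beta, T = 0.5, 1.0, 6   # hypothetical parameter values

# Transitory shock: x jumps +1 in period 0 and reverts to 0 thereafter.
transitory = response_path(rho, beta, [1.0] + [0.0] * (T - 1))
# Permanent shift: x moves +1 in period 0 and stays there; y converges
# toward the long-run multiplier beta / (1 - rho) = 2.
permanent = response_path(rho, beta, [1.0] * T)

print("transitory:", transitory)  # geometric decay back toward 0
print("permanent: ", permanent)   # geometric approach toward 2.0
```

Neither response stream is summarized adequately by the single coefficient β = 1; the dynamic path itself is the effect.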
Section 5a begins the treatment of dynamics with a focus on temporal dependence, first with
an extraordinarily helpful overview of the main temporally dynamic models in Beck’s “Comparing
Dynamic Specifications: The Case of Presidential Approval.” That article introduced political-
science empirical researchers and methodologists to the DHSY (Davidson, Hendry, Srba, and Yeo
1978) form of the error-correction model, which gives a single-equation expression of the possibly
complex differentiation of equilibrium and transitory dynamics in relationships of x to y. In
“Taking Time Seriously,” De Boef and Keele give an updated overview of alternative time-
dynamic models, showing the close relations between several of them and their possible grouping
under the general rubric of, as various specifications of, the autoregressive distributed lag model.
Both these first two selections also focus appealingly on the interpretation of estimation results
from dynamic models. In “Taking Time Seriously: Time-Series–Cross-Section Analysis with
a Binary Dependent Variable,” Beck, Katz, and Tucker offer a simple way to account for time dependence in models for binary outcomes. Carter and Signorino’s “Back to the Future: Modeling Time Dependence in Binary Data” gives an even simpler expedient that can properly account for time dependence in serially observed binary outcomes at least as effectively. These are crucially
important contributions insofar as binary outcomes are very common in political science and the first-order directive to empirical methodologists and researchers is, Hippocrates-like, “First, do not lie” with statistics about how much information you have. Either of these two methods makes feasible and expedient the accounting of time dependence in binary outcomes, which had previously been rather intractably challenging methodologically. Directly autoregressive
specification of temporal dependence would afford fuller interpretation of time-dynamic processes
in binary outcomes, but these more directly modeled attacks on dynamics in binary outcomes
would have to await methodological and computational advances that appear in this collection
under the topic of spatiotemporal binary-outcome models. In “Time Is of the Essence: Event
History Models in Political Science,” Box-Steffensmeier and Jones overview event-history models
in which the time and timing of events, like the end of parliamentary governments or of wars or
the timing of position-taking by legislators or by candidates, is the outcome variable of interest.
Event-history modeling is one of the most flexible and powerful tools in the empirical
methodologist’s toolbox for making temporal processes directly the object of study, and Box-
Steffensmeier and Jones are the leading political methodologists in the event-history domain.
Section 5b in Volume IV moves from time series, generally of single units, to time-series cross-
section or panel data in which several or many units are observed for multiple time periods.12
Analysts typically distinguish subtypes of time-series cross-section datasets by numerous different
overlapping considerations that are, unfortunately, not applied universally across literatures or
authors in the same way. Some emphasize whether the units are the same from one period to the
next – in which case, by one definition in usage, the data are panel or time-series cross-section
12 Technically, all data are time-series cross-section data, since everything observed is observed in some place at some time. It simply happens that some datasets have only one unit on the cross-sectional dimension and some datasets have only one time-period on the temporal dimension.
data – or each time period is a new sample of cross-sectional units, in which case the data are
“repeated cross-sections”. Others differentiate between panel datasets, in which by this schema the
number of units, N, is large (often a microlevel survey and so around or more than a thousand) and
the number of time periods, T, is small (typically literally just a few), and time-series cross-section
datasets in which the number of time periods is relatively large (at least around 20 or so) and
typically the number of units is smaller (not more than a couple hundred, and usually in the 20-50
range, in most political-science applications). None of these distinctions makes any difference in
terms of what sorts of theoretical and empirical models we might write, but, in practice, the sizes
of the cross-sectional and temporal dimensions make a great deal of difference in what statistical
procedures for estimating these models are advisable or even feasible. Summarizing in brief: longer-T datasets can afford richer temporally dynamic specifications; larger-N datasets are necessary for models that imply, and estimation procedures that rely on, variation across units.
Stimson’s “Regression in Space and Time: A Statistical Essay” introduced empirical
researchers and methodologists in political science to the complications that arise along with the
enormous advantages from having data across multiple units and multiple time-periods.
Ultimately, these complications are the increased strain on the maintained assumptions of constant
coefficient parameters, of observations drawn from distributions all with the same constant
stochastic-term variance, and of independence of those observations (conditional on the
explanators, x). Especially cross-sectionally, we tend to suspect non-constant variance (i.e., heteroskedasticity across units13) and cross-unit dependence; and, with respect to over-time variation, we already know that temporally autoregressive dependence is a certainty. Stimson follows the first mooted strategies for this complex terrain, which were to model the cross-unit
13 Heteroskedasticity over time is also possible, and methods for modeling and/or properly accounting for such cross-temporal nonconstant conditional variance (i.e., heteroskedasticity) also exist elsewhere in the literature.
heteroskedasticity and interdependence, and the temporal autocorrelation as features of the
stochastic error-term, to be estimated by Feasible Generalized Least-Squares (FGLS) in the so-
called Parks (1967) Procedure. In two enormously influential articles, Beck and Katz (1995, 1996) famously explained the shortcomings of Parks-Procedure FGLS in samples with T not notably greater than N (i.e., unless T is two times N or more). In a nutshell, the first stage of FGLS estimates the error-variance-covariance structure, the inverse square root of which (loosely speaking) weights the data to obtain, in a second stage, asymptotically efficient coefficient and proper standard-error estimates. When T is not some multiple two or greater times larger than N, the number of parameters in the error covariance specified by the Parks Procedure is high relative to the effective number of observations available for estimating them, and the asymptotic efficiency and quality standard-error estimates of FGLS do not materialize. In these seminal articles, Beck and Katz argue instead for robust standard-error estimation, specifically what they call panel-corrected standard errors (PCSEs), in samples of these dimensions, since the asymptotic efficiency of Parks’ FGLS will not materialize and its standard errors are even worse. This collection includes a later review article by Beck and Katz,
“Modeling Dynamics in Time-Series–Cross-Section Political Economy Data,” which reviews
these issues and more in summarizing how “best practices” in time-series-cross-section data-
analysis are understood to look today.
When the number of time periods in a time-series cross-section dataset is few, Hurwicz-Nickell (1950, 1981) bias – a small-sample bias of order 1/T, in which temporal dependencies expressed as coefficients on autoregressive time-lagged dependent variables are underestimated in time-series models with intercepts (Hurwicz 1950) and, by exactly the same set of issues, in panel-data models with fixed effects, i.e., unit-specific intercepts (Nickell 1981) – can be appreciable. For instance, in panel datasets, where T is often not more than 4 or 5, the downward bias in estimated autoregressive parameters, ρ, is on the order of −20% to −25%. The bias in estimated long-run multipliers, which scale with 1/(1−ρ), is even more worrisome. The selection “Estimating Dynamic
Panel Data Models in Political Science” from Wawro explains the instrumental-variable strategies
(associated in econometrics with the work of Arellano and Bond 1991 among others) that leverage
the panel structure of the dataset to obtain consistent estimates of these temporal dynamics. In
time-series cross-section data analysis, then, we return to the thorny issue of “To FE (or RE) or Not To FE (or RE)” that was raised in the context of cross-unit heterogeneity in Section 4a. If we include unit fixed effects, they will overfit and thereby jeopardize our estimation of substantive effects, perhaps most especially manifesting, we now learn, in an underestimation of temporal dependence. If we omit unit fixed effects, we remember from the fright-inducing Figure 1 that we may be vulnerable to badly biasing our estimates of substantive effects, including, perhaps especially, we now note, over-estimating temporal dependence. And, if we try random effects instead, we
should have learned, those random-effects estimates are identified off the assumption that the unobserved cross-unit heterogeneity is uncorrelated with the observed heterogeneity (i.e., the included regressors), and so we would simply be assuming away any bias from unmodeled cross-unit heterogeneity. Brandon Bartels, in a selection entitled “Beyond ‘Fixed versus Random Effects’: A
Framework for Improving Substantive and Statistical Analysis of Panel, Time-Series Cross-
Sectional, and Multilevel Data,” tries to offer a strategic framework for navigating between this
Scylla and Charybdis of random versus fixed effects specifically in dynamic models.
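The small-T Hurwicz-Nickell bias discussed above is easy to exhibit by simulation (a hypothetical sketch with invented numbers, not from the selections): generate many short panels from an autoregressive process with unit fixed effects, apply the within (mean-differencing) transformation, and observe how far the estimated ρ falls below the truth.

```python
import random

random.seed(3)

rho_true, N, T = 0.5, 2000, 5   # many units, very few periods (typical panel)

num = den = 0.0
for i in range(N):
    alpha = random.gauss(0, 1)                         # unit fixed effect
    y, series = 0.0, []
    for t in range(T + 20):                            # burn-in so y starts
        y = rho_true * y + alpha + random.gauss(0, 1)  # near its steady state
        if t >= 20:
            series.append(y)
    lag, cur = series[:-1], series[1:]
    # Within (fixed-effects) transformation: demean y_t and y_{t-1} unit by unit,
    # then pool the demeaned data into one bivariate regression.
    ml, mc = sum(lag) / len(lag), sum(cur) / len(cur)
    num += sum((l - ml) * (c - mc) for l, c in zip(lag, cur))
    den += sum((l - ml) ** 2 for l in lag)

rho_hat = num / den
print(f"true rho: {rho_true}, within-estimator rho_hat: {rho_hat:.2f}")
# The Nickell bias, of order -(1+rho)/(T-1), pushes rho_hat well below 0.5
# even with thousands of units, because it shrinks in T, not in N.
```

This is the problem that the instrumental-variable strategies in the Wawro selection (Arellano-Bond and relatives) are designed to solve.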
Section 5c turns our attention toward cross-unit, i.e., spatial interdependence, and spatial and
spatiotemporal dynamics. In “Empirical Models of Spatial Inter-Dependence,” Franzese and Hays
emphasize first the likely ubiquity of cross-unit interdependence, i.e., that the outcomes in some
units depend on outcomes in other units, in social science. They then show that failing to model
adequately this spatial interdependence will bias coefficient estimates of other substantive, non-
spatial, explanatory factors. Worse, failing to recognize the spatial (cross-unit) interdependence of
social-science data neglects the dynamic nature of effects – i.e., of responses in outcomes, y, to
hypothetical changes in X – implied by the spatial interdependence (by completely omitting these
dynamics). Spatial interdependence gives rise to spatial dynamics, which means the effects of any
change in some x in any unit i affects not only the outcome in i but also the outcomes in other units
j to which i is connected, and, from there, to the units with which j is interdependent, including, as
space is omnidirectional, feeding back into i, and from there back out to the original j’s, and so on
and so forth, in feedback ripples expanding outward and bouncing back inward. Franzese and Hays
then provide overview of some of the statistical methods for properly estimating empirical models
with spatial interdependence (and temporal dependence), namely: spatial-autoregressive models,
and then explain appropriate methods for interpreting the spatially and spatiotemporally dynamic
implications of such spatial-autoregressive models.
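The feedback ripples described above can be traced numerically (a hypothetical three-unit example with an invented weights matrix): under the spatial-autoregressive model y = ρWy + xβ + ε, the full response to a shock in one unit is (I − ρW)⁻¹ times the direct impulse, which the sketch below computes by simply iterating the feedback until it converges.

```python
# Hypothetical 3-unit, row-standardized spatial-weights matrix W:
# unit 0 neighbors unit 1; unit 1 neighbors units 0 and 2; unit 2 neighbors 1.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
rho, beta = 0.4, 1.0   # hypothetical interdependence strength and coefficient

def spatial_response(shock_unit, steps=200):
    # Response of all units' y to a +1 shock to x in one unit, under
    # y = rho*W*y + x*beta. Iterating y <- impulse + rho*W*y converges to
    # (I - rho*W)^{-1} * impulse: the shock ripples out to neighbors and
    # feeds back into the origin unit, wave after diminishing wave.
    impulse = [beta if i == shock_unit else 0.0 for i in range(3)]
    y = impulse[:]
    for _ in range(steps):
        Wy = [sum(W[i][j] * y[j] for j in range(3)) for i in range(3)]
        y = [impulse[i] + rho * Wy[i] for i in range(3)]
    return [round(v, 3) for v in y]

resp = spatial_response(0)
print("response to +1 x-shock in unit 0:", resp)
# The own-unit response exceeds beta = 1 because of feedback from neighbors,
# and the response is a vector across all units, not a scalar.
```

The effect of the shock is thus inherently a vector of responses across units, exactly as the text argues, and no single coefficient can summarize it.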
Spatial-statistical analysis and network analysis are closely related, and overlap in some ways,
although they are not identical topics. The cross-unit spatial dependencies and the resulting
spatially dynamic effects that Franzese and Hays emphasize are identical with one sort of network
dependence and network effects to which network analysts refer. Indeed, the spatial-weights matrix
of relative connectivity between units from spatial analysis, W, is the network of connections
(a.k.a., edges, ties) between units (a.k.a., nodes) from network analysis. One sort of network effect
of which network analysts speak is exactly that which spatial-lag autoregressive models specify:
the effect on the outcome or behavior of unit i of the outcomes or behaviors of units j (spatial-lag
y model) and/or other characteristics of units j (spatial-lag X model). Network effects, however,
may also include a second and/or a third sort of effect. Second, effects on units (nodes) of the network structure itself – e.g., networks dense with ties, or with ties in particular configurations like hub-and-spoke, may induce different unit behaviors than networks sparse with ties, or with ties in other configurations like rings or chains. And third, effects on units/nodes of their position within or
relation to the broader network structure – e.g., nodes with many ties may behave differently than
nodes with few, or nodes in lynchpin connecting positions between clusters of connected other
nodes may behave differently than others by virtue of that position (a strategically advantageous
one in many substantive contexts). Furthermore, the spatial-econometric approaches that the selection from Franzese and Hays covers emphasize contagion processes, generally taking the connections between units as given exogenously, and stress Galton’s problem of distinguishing
contagion processes from correlated exposure to exogenous environmental conditions as the
source of, or mechanism underlying, diffusion or cross-unit correlation; network analytic
approaches, contrarily, tend to emphasize the formation of the network, i.e., of these connections
between units as the outcome of interest to be explained, and stress the challenge of distinguishing
this network-tie selection/network formation from contagion as the central empirical challenge.14
The selection from Ward, Stovel, and Sacks, “Network Analysis and Political Science,” gives a very helpful introductory survey of network analysis and network-analytic methods for political-science
empirical researchers and methodologists. Cranmer and Desmarais’ “Inferential Network Analysis
with Exponential Random Graph Models” gives closer methodological coverage of network-
analytic methods, with an approach that emphasizes substantive and theoretical specification and
14 Notice, too, that each approach tends to neglect, relatively at least, the third process from its perspective. Spatial econometrics generally ignores network selection by taking tie formation as exogenously given; network analysis, conversely, in focusing on selection versus contagion – as in the classic case of whether smoking is contagious among friends or whether smokers (and nonsmokers) are instead more likely to become and remain friends – tends to downplay, relatively, the possibility that some exogenous other factors, like certain values of individuals’ sociodemographics, may make them both more (or less) likely to smoke and more (or less) likely to become and stay friends.
interpretation of the network models in a way that very well befits the orientation of this collection.
The selection by Franzese, Hays, and Cook, lastly in this section, tackles squarely the computational and methodological challenges of estimating and interpreting empirical models of binary outcomes that directly specify the simultaneity of spatial-, temporal-, and spatiotemporal-autoregressive processes.
Simultaneous cross-unit dependence connects the topic of Section 5, dynamics, to the topic of
Section 6, endogeneity. In terms of the issues raised for statistical inference, simultaneous cross-
unit dependence – yi causes yj simultaneously with yj causing yi – is isomorphic with the textbook
endogeneity of xi causing yi simultaneously with yi causing xi. Both can be represented as systems
of causal equations. From the time scholars first coined the very old adage that correlation does
not imply causality, empirical researchers have known that endogeneity renders estimates of
empirical relationships potentially misleading as estimates of causal relationships. This sharp
segregation of empirical relationships from causal relationships hints at a different fundamental
problem of empirical causal inference: namely, causality is fundamentally a theoretical property,
not an empirical one. The logical fact that causality is a theoretical property shows also how we
might gain empirical leverage on the theoretical entity of causal relationships. As a simple example
illustrates, when x may cause y and y cause x simultaneously, each paired observation of an x and
a y brings with it two parameters to estimate, the partial relationship of x to y and the partial
relationship of y to x:
β_{x→y} = y/x = 1/β_{y→x}, i.e., any β_{x→y} = (β_{y→x})^(−1) = y/x are solutions (1).
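This underidentification can be sketched numerically. The following toy computation is not from the text; all numbers are invented. With disturbances added to the two simultaneous equations, every candidate parameter pair can be reconciled with a single observed (x, y) pair:

```python
# A hypothetical sketch of underidentification: with one observed pair (x, y)
# and the simultaneous system
#   y = b_xy * x + e1,   x = b_yx * y + e2,
# any candidate (b_xy, b_yx) fits exactly once the unknown disturbances are
# allowed to absorb the discrepancy, so the data alone cannot pin them down.
x_obs, y_obs = 2.0, 6.0

candidates = [(3.0, 1.0 / 3.0), (0.0, 0.0), (-5.0, 42.0)]
for b_xy, b_yx in candidates:
    e1 = y_obs - b_xy * x_obs  # disturbance the y-equation needs
    e2 = x_obs - b_yx * y_obs  # disturbance the x-equation needs
    # With those disturbances, each candidate pair reproduces the observation:
    assert abs(y_obs - (b_xy * x_obs + e1)) < 1e-12
    assert abs(x_obs - (b_yx * y_obs + e2)) < 1e-12
```

Each observation thus brings as many new unknowns as it brings data, which is why extra-empirical information must be added, as the text goes on to argue.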
To get empirical estimates of causal relationships, therefore, we must somehow add some
extra-empirical information; we must bring more information to the analysis beyond the empirical
information we can find from observation, be that observation of things in the “real world” or of
things in laboratories. The empirical strategies for causal inference covered in Section 6 all do
exactly this: they bring some extra-empirical information to bear, imposing some aspects of the
analytic variables’ relationships as known facts rather than treating them as unknowns to be estimated.
Experimental methods, e.g., rely on the knowledge that the researcher controlled whether, and at
what value, treatment x was applied to observation i, so the measured outcome of interest, y, could
not have caused the values of x. Randomization of that controlled treatment then suffices, in
sufficiently large samples and if the randomization was effective, to ensure that the assigned
values of x also could not correlate with other potential causes of y, even if
those other potential causes are unobserved or even as yet unconceived. This is truly an enormous
pair of advantages in ensuring that, if one finds a relationship between x and y, the relationship
must contain a causal process. The careful phrasing here is intentional: we know there’s some
causal process connecting x to y if we find a (well-designed and -conducted) experimental
association; we of course do not know that the causal process is the one the researcher thinks it is.
In social science, the researcher’s experiment and treatment, for instance, are typically very small
parts of the much larger games the subjects are playing. (Students in classrooms are playing “18-
22 year-olds in college,” for example; not the professor’s ultimatum game.) This is sometimes
called the problem of external validity of the treatment: the treatment as felt by the subject may
not correspond to the treatment as conceived by the experimenter. Another well-known converse
external-validity cost to the great internal-validity advantage of experimentation in establishing
causal-effect existence is that the experimental subjects may not be representative of the population
to which we want to apply the experimental result. (Again, students in college classrooms may not
be a representative sample of voters, never mind of policymakers…) This is the external validity
of the experimental sample concern. A less well-appreciated form of external-validity concern with
experimentation might be labeled the external validity of the context concern, and this one gets
back to the simultaneous-equations representation of multidirectional causality in equation (1) and
the dynamics of Section 5. In the presence of feedback – the cross-unit feedback of spatial
interdependence, the forward carry of temporal dependence, or the cross-variable interdependence
of simultaneous causality as in equation (1) – the experimental demonstration of the existence of
some causal process connecting x to y is a very poor answer to the question of what the effect of x
on y is, because the experimental result does not and cannot estimate these dynamics or feedback
in the response of y to x. It cannot because the experimental researcher controls x: the feedback by
which the response in y would, in the true simultaneous system, induce further movement in x is
broken by the experimental control. This is another way of understanding
the external invalidity of context inherent in experimental social science: we don’t want to know
the effect of randomly assigned U.N. mosquito-netting programs; we’d like to know what the
effect of an actual U.N.-administered mosquito-netting program would be.
As was the case in explaining the weighty tradeoffs between controlling for or omitting cross-unit
heterogeneity by fixed or random effects, the point of explaining the very real, and at least
sometimes very severe, external-validity costs of experimentally based causal-inference strategies,
and their costs in terms of well-estimating causal dynamic responses with feedback, is not that we
should not do experimental social science, or that we should give up on causality in social science.
(Indeed, the sometimes-argued notion that internal validity trumps external validity is nonsensical,
or perhaps even exactly wrong: perfect determination of causality with zero applicability beyond
the sample is at least as useless as an estimated association with zero causal content but universal
applicability.) The point of emphasizing these limitations is rather that we understand and
recognize the unavoidable necessity of bringing extra-empirical information to the analysis to
obtain causal leverage, and that the numerous different strategies for doing so, some more
“structural” and some more “non-parametric”, bring different strengths in these tradeoffs between
causal-effect inference and causal-response estimation, and between internal validity and external
validity (both of which are generally desirable, of course, the latter at least equally so as the
former).
Section 6, therefore, reviews a wide gamut of empirical strategies for causal inference or causal
estimation, starting in Section 6a with the classical instrumental-variables (IV) approach (a more-
structural approach). A reader following these volumes and selections sequentially who needs an
introduction to instrumental-variables estimation will want to jump ahead first to the selection
from Jackson that leads Section 6b, which gives excellent coverage of standard IV methodology
without embellishments or critical amendments. Bartels’ “Instrumental and ‘Quasi-Instrumental’
Variables,” which leads 6a, shows first that the standard result – that IV estimation is consistent
and asymptotically efficient regardless of the strength of covariance between the instruments and
the instrumented variables, so long as the covariance between the instruments and the true residual
is exactly zero, i.e., so long as the instruments are perfectly exogenous – requires perfect
exogeneity (and an infinite sample). If the instruments are to any degree, however tiny, imperfect,
i.e., if their residual correlation is not exactly zero, then the asymptotic bias and inefficiency of
the IV estimator grows with the ratio of the instruments’ imperfection (correlation with ε) to their
strength (correlation with the instrumented variables). In other words, in practical, applied terms,
the quality of IV estimates deteriorates proportionately with the ratio of the instruments’
imperfection to their strength. Bartels’ piece then underscores an
advantage of Bayesian estimation in this context: one could apply the “exogeneity of the
instruments” assumption as a matter of continuous degree in the Bayesian framework, whereas,
strictly speaking, exogeneity/endogeneity is indivisibly dichotomous in classical approaches,
where instruments either are endogenous or are exogenous. In “Instrumental Variables Estimation in
Political Science,” Sovey and Green then give “a [skeptical] reader’s guide” survey of IV usage in
the field, and in “Model Specification in Instrumental-Variables Regression,” Dunning highlights
a hidden assumption in IV estimation (indeed, in all empirical causal strategies): that the causal relationship
identified by the instrumentation (by the extra-empirical information imposed on the empirical
analysis) is the same causal relationship that applies beyond that identification context. In IV, as
the article shows, this means that “the instrumented part of endogenous regressor x” must have the
same relationship to y as the “endogenous part of x” “instrumented away”. If we use rainfall to
instrument for economic growth in an empirical model trying to estimate the causal effect of
economic growth on civil war, to use an example from the selection, we need rainfall/drought-
induced economic booms and busts to have the same effect on civil conflict as does economic growth
or recession of other provenance. This is an important point to keep in mind, and it applies in
general to causal estimation, not just to IV. (In a perfect randomized controlled trial, for example,
we must assume that randomly applied treatment has the same effect as would the nonrandomly
applied treatment whose response we actually want to know.)
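Bartels’ asymptotic-bias result can be sketched arithmetically. The following toy computation uses invented numbers and is not from the selection; it shows the bias scaling with the imperfection-to-strength ratio:

```python
# A minimal sketch of the IV asymptotic bias, plim(b_IV) - b = Cov(z, e) / Cov(z, x):
# the same tiny instrument imperfection Cov(z, e) does ten times the damage
# when the instrument is ten times weaker. All covariances here are invented.
def iv_asymptotic_bias(cov_z_e, cov_z_x):
    """Asymptotic bias of the IV estimator for the given instrument covariances."""
    return cov_z_e / cov_z_x

strong = iv_asymptotic_bias(cov_z_e=0.01, cov_z_x=0.50)  # strong instrument
weak = iv_asymptotic_bias(cov_z_e=0.01, cov_z_x=0.05)    # weak instrument
assert abs(weak - 10 * strong) < 1e-12  # bias scales with imperfection/strength
```

Only when Cov(z, e) is exactly zero does the bias vanish regardless of strength, which is the knife-edge Bartels emphasizes.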
A more-efficient means of estimating causal systems of endogenous equations, working in the
same or a similar spirit as IV estimation, is Full-Information Maximum-Likelihood (FIML). As
the name suggests, the FIML strategy in a nutshell is to specify whole systems of endogenous
equations and estimate them in some fashion simultaneously. From a perspective that would
emphasize causal estimation of dynamic responses in outcomes of interest to x’s, and to each other,
inclusive of the cross-unit and cross-variable feedback that perhaps ubiquitously characterizes
socio-politico-economic reality, FIML would be the gold standard and, in this sense, the structural
end of a causal-estimation / causal-inference spectrum, the other end of which would be the non-
parametric experimental analysis. Jackson’s “Endogeneity and Structural Equation Estimation in
Political Science” covers FIML from an instrumental-variables perspective. Hays and Kachi’s
“Interdependent Duration Models in Political Science” shows the statistical isomorphism of cross-
unit and cross-variable simultaneity, and through this isomorphic lens shows how the multivariate
change-of-variables theorem used in spatial analysis to yield the joint likelihood for the N (number
of units) equations of a spatial cross-section can be applied as well to estimate the N×M (number
of endogenous variables) simultaneous equations of a spatially (or spatiotemporally) dynamic
system of endogenous equations. In single-equation instrumental-variables estimation, we impose,
ideally from theoretical or substantive knowledge but extra-empirically in any case, some features
from other equations to “tie down” some aspects of the equation with endogenous regressors to be
estimated. In FIML, we specify the entire system (ideally, in as theoretically and substantively
motivated fashion as possible) for joint-likelihood maximization. Reed’s “A Unified Statistical
Model of Conflict Onset and Escalation” provides another substantive example of application of
this “whole-systems” approach, one that emphasizes temporal sequencing as well and so nicely
transitions to a third strategy for causal estimation.
Another strategy that yields some potential for empirically estimated relationships to be given
some causal interpretation is to rely on the time sequencing of cause and effect, i.e., cause before
effect, being, one hopes, reflected in the time sequencing of measured movements in variables.
Miller’s “Temporal Order and Causal Inference” is the classical statement for political science
empirical researchers and methodologists of what we need, at a general and abstract level, for this
statistical post hoc ergo propter hoc strategy to shed causal light. Vector Autoregression (VAR)
takes a full-throated temporal-sequencing approach to estimating simultaneous relationships. In
brief, simple terms, VAR regresses each endogenous variable on its own lags and on lags of all the
other variables to conduct Granger-causality tests (Granger 1969), to estimate the temporally
dynamic responses of the endogenous variables to each other, called impulse-response paths, and
to ascribe shares of variation in each variable’s dynamic responses to shocks to each variable,
called variance decompositions. Freeman, Williams, and Lin give authoritative coverage of vector
autoregression (VAR and relatives like VEC, vector error-correction) for political science in
“Vector Autoregression and the Study of Politics”. Sattler, Brandt, and Freeman’s “Democratic
Accountability in Open Economies” applies a modern variant, Bayesian Structural VAR, to the
extremely interesting question of how globalization may be affecting democratic-government
accountability for economic performance. The structural modifier indicates the imposition, in
addition to the temporal ordering imposed by VAR, of instrumental-variables-type exclusion
restrictions on which endogenous and exogenous variables enter which equation.
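The mechanics of VAR estimation can be sketched in a small simulation; the coefficient matrix and sample size below are invented, and the sketch uses plain per-equation OLS rather than any particular selection’s estimator:

```python
# A sketch of the VAR(1) idea: regress each variable on lags of all variables,
# recovering the lag-coefficient matrix A; powers of the estimated A then
# trace out impulse-response paths. All parameters are invented.
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2],
              [0.1, 0.4]])  # true lag matrix: y_t = A y_{t-1} + u_t (stationary)
T = 5000
Y = np.zeros((T, 2))
for t in range(1, T):
    Y[t] = A @ Y[t - 1] + rng.standard_normal(2)

X, Z = Y[:-1], Y[1:]                              # lagged and current values
A_hat = np.linalg.lstsq(X, Z, rcond=None)[0].T    # per-equation OLS
assert np.allclose(A_hat, A, atol=0.08)           # recovers A up to sampling noise

irf_h2 = np.linalg.matrix_power(A_hat, 2)  # estimated impulse responses at horizon 2
```

In applied work the same per-equation regressions underlie Granger-causality tests (are the cross-variable lag coefficients jointly zero?) and, with an identification scheme, the impulse responses and variance decompositions the text describes.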
These first three approaches to the challenge of ubiquitous endogeneity emphasize causal
estimation and perhaps prioritize external validity. Being more structural, and in particular yielding
results that are estimates of (causal) models, they afford estimates of dynamic responses, inclusive
of feedback in complex contexts. Starting from the other end, among the approaches that emphasize
causal inference, i.e., internal validity in identifying the existence of causal effects, the gold standard is
the randomized controlled experiment. In Section 6d, McDermott surveys “Experimental Methods
in Political Science” and Druckman, Green, Kuklinski, and Lupia celebrate “The Growth and
Development of Experimental Research in Political Science.” These reviews include laboratory,
field, and survey experimentation in their purview. Laboratory experimentation in political science
presumably comes closest to approximating the ideal of the randomized controlled trial, the
shortfalls being perhaps minimal and related solely to the human element that experimental
subjects add relative to the natural-science laboratory. Survey experimentation limits
the sorts of treatments that can be applied (question wording or ordering and the like) and adds
some distance between the treatment as applied and the treatment as conceived. (Wording can typically only “prime”
certain considerations, like partisanship of a bill supporter for instance, to some respondents and
not others, when what we may want to know is the response to an actual change in sponsorship, not
to more or different mention of it.) The outcomes measured in survey experiments are also often
more distant in this way from the outcomes of interest: a stated policy preference in a survey
response, not any actual political action taken, like a vote cast, for example. Field
experimentation, finally, has much to recommend it, in that the experiment occurs closer to or exactly
in the substantive context of interest. A certain amount of control and of effective purity of
randomization is typically sacrificed in moving to the field from the laboratory, but perhaps not
always over-much. (One cannot control what get-out-the-vote stimuli voters may have received
other than the experimenters’ randomly sent postcards or phone calls, for example, nor whether
recipients read or answer them. Nor is what else subjects may be exposed to randomized; rather, it
is a matter of subject choice just as much as in observational data, perhaps even choice made in
response to receipt or not of the treatment.) These limitations, though real and non-negligible, do
come with steps toward the enormous converse advantages that perfect experimentation provides
fully: blocking reverse causality and confounding (i.e., omitted variables), even unobserved or
as-yet-unconceived confounding.
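The core advantage of randomization in blocking confounding can be sketched in a small simulation; all quantities below are invented, and only the Python standard library is used:

```python
# A sketch of why randomization blocks confounding: an unobserved confounder u
# drives y, but random assignment makes u (approximately) balanced across the
# treated and control groups, so the difference in means recovers the effect.
import random

random.seed(0)
n = 20000
true_effect = 1.5
u = [random.gauss(0, 1) for _ in range(n)]         # unobserved confounder
treat = [random.random() < 0.5 for _ in range(n)]  # randomized treatment
y = [true_effect * t + 2.0 * c + random.gauss(0, 1)
     for t, c in zip(treat, u)]

n_t = sum(treat)
mean_t = sum(yi for yi, t in zip(y, treat) if t) / n_t
mean_c = sum(yi for yi, t in zip(y, treat) if not t) / (n - n_t)
assert abs((mean_t - mean_c) - true_effect) < 0.15  # diff-in-means ≈ effect
```

Had `treat` instead been chosen by the units as a function of `u`, the same difference in means would absorb the confounder’s influence and mislead, which is the observational-data problem the rest of Section 6 confronts.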
Sections 6e through 6g review methods for empirical causal-inference in “observational”, i.e.,
non-experimental, studies. In “Matching as Nonparametric Preprocessing for Reducing Model
Dependence in Parametric Causal Inference,” Ho, Imai, King, and Stuart emphasize a particular
use to which to put matching methods, which are strategies based on control not (directly) by the
partialing of multiple regression, but rather by a pre-estimation procedure of finding comparison
pairs or sets of observations that “match” values or distributions of values on (observable) potential
confounds. In multiple regression, one controls for linear relations of (observable) potential
confounds, called control variables in that context; in matching, one controls for any sort of relation
to (observable) potential confounds, since we compare observations that receive our treatment to ones
that do not, with both treated and control observations having effectively (as determined by the
matching procedure) the same values on the confounds. The selection from Ho et al. then notes that one
could therefore view matching as a way to control non-parametrically in observational studies.
(Notice how, from a causal-inference view like this, “model dependence” is pejorative; from a
causal-estimation view, estimation of models is essential, inherent to the aim of estimating
dynamic causal responses inclusive of feedback.) In “Opiates for the Matches: Matching Methods
for Causal Inference,” Sekhon critically surveys matching methods for political science empirical
researchers and methodologists from a causal-inference perspective. The fundamental weakness
of matching methods as a tool for causal inference from this perspective is that, exactly as in
regression analysis, matching can only control for observables. Unobserved heterogeneity or
causal confounding, the popular core challenge from the microeconometric perspective
emphasized in Section 4, remains unaddressed.
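A toy exact-matching sketch (all data invented) illustrates the control-by-comparison logic, and why it extends only to observables:

```python
# Pair each treated unit with a control unit sharing the same value of an
# observed confounder z, so z is held constant within pairs and differenced
# away from the comparison. Data constructed so outcome = 2*z + 1*treated.
data = [  # (treated, z, outcome)
    (1, 1, 3.0), (1, 2, 5.0),
    (0, 0, 0.0), (0, 1, 2.0), (0, 2, 4.0),
]
treated = {z: y for t, z, y in data if t == 1}
control = {z: y for t, z, y in data if t == 0}

# Naive unmatched comparison: confounded, because treated units have higher z.
naive = sum(y for t, _, y in data if t) / 2 - sum(y for t, _, y in data if not t) / 3
# Matched comparison: difference outcomes within pairs sharing the same z.
pair_diffs = [treated[z] - control[z] for z in treated if z in control]
att = sum(pair_diffs) / len(pair_diffs)

assert naive == 2.0  # overstates the effect, absorbing the confounding in z
assert att == 1.0    # matching recovers the built-in treatment effect
```

Had the confounding run through a variable not recorded in `data`, no matching on z could remove it, which is exactly Sekhon’s point.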
The last two methods of the collection, the Regression Discontinuity Design (RDD) and
Difference-in-Differences (D-in-D) methods, address most centrally this challenge of unobserved
heterogeneity. Regression discontinuity uses the as-if randomized feature of natural experiments
(or, to the critic, the hoped-for as-if randomization15). If we have some observable random variable,
called the index variable, such that the treatment of interest is applied above some threshold value
of the index and not applied below it, then exactly at the threshold value, only the randomness of
the index variable determined whether a unit got the treatment or did not. So, right at that
threshold on the index, it is as if the treatment were randomly assigned; near the threshold, mostly
random noise determined whether the treatment was assigned. Therefore,
one could perhaps use observations near the threshold to achieve nearly the equivalent of the
experimental advantages of randomization, including the control for “unobserved confounds”. The
classic origin of the design is an exam, the score on which determines whether a scholarship is
awarded (above some threshold) or not (below it). Comparing students who got the scholarship with
those who did not, just to either side of the threshold, should effectively randomize assignment of
the scholarship and so lend strong internal validity to a causal interpretation of the difference in
means across the threshold. In “A Regression Discontinuity Design Analysis of the Incumbency Advantage and
Tenure in the U.S. House,” Butler applies this approach to estimate incumbency advantage, which
has been its most-common application in political science. Caughey and Sekhon give another
application to that subject, sounding a cautionary note in “Elections and the Regression
Discontinuity Design: Lessons from Close U.S. House Races, 1942–2008,” about the most likely
violation of the conditions under which RDD can support causal interpretation, namely the “no
sorting” condition: essentially, observations near the threshold cannot be special, and in particular
they cannot self-select or be selected to be near the threshold. They find some warning
signs in the (cynically unsurprising) correlation of the partisanship of local election officials and
that of the winners of close elections.
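The RDD logic can be sketched in a small simulation; the scholarship-style setup and all parameters below are invented, not drawn from the selections:

```python
# Treatment is assigned by an index (exam score) crossing a threshold; comparing
# outcomes in a narrow window around the cutoff approximates randomization,
# while the naive full-sample comparison is badly confounded by ability.
import random

random.seed(1)
true_effect = 1.0
scores, treats, ys = [], [], []
for _ in range(40000):
    ability = random.gauss(0, 1)           # unobserved confounder
    score = ability + random.gauss(0, 1)   # index variable (exam score)
    treat = score >= 0.0                   # scholarship iff score >= cutoff
    y = 2.0 * ability + true_effect * treat + random.gauss(0, 1)
    scores.append(score); treats.append(treat); ys.append(y)

def diff_means(keep):
    t = [y for y, tr, k in zip(ys, treats, keep) if tr and k]
    c = [y for y, tr, k in zip(ys, treats, keep) if not tr and k]
    return sum(t) / len(t) - sum(c) / len(c)

naive = diff_means([True] * len(ys))
rdd = diff_means([abs(s) < 0.15 for s in scores])  # narrow window at the cutoff
assert naive > 2.0                     # badly biased upward by unobserved ability
assert abs(rdd - true_effect) < 0.4    # near the true effect of 1.0
```

If units could sort themselves across the cutoff, the near-threshold sample would no longer be as-if randomized, which is the failure mode Caughey and Sekhon warn about.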
15 The critic’s favorite joke about natural experiments may be revealing: like the Moral Majority, natural experiments are neither. –anonymous.
Finally, difference-in-differences designs use the fact that time-differencing observations on a
unit will difference away any fixed unit characteristics. (This is the same fact deployed in the fixed-
effect estimation of Section 4a.) If we observe two groups at times t0 and t1, one of which received
the treatment of interest in time t1 and one of which did not, then the difference between the groups
of their differences from t0 to t1 is the causal effect of the treatment over that period for the
treated group, under the assumption of parallel trends, which boils down to the two groups’
outcomes trending the same way absent the treatment, which in turn means that only the treatment
differentiates the two groups’ different experiences of t0 versus t1.16 Donald and Lang’s “Inference
with Difference-in-Differences and Other Panel Data” covers D-in-D for this collection.
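The D-in-D arithmetic in the two-group, two-period case can be sketched as follows, with invented numbers:

```python
# Under parallel trends, differencing each group over time removes fixed group
# characteristics, and differencing the differences removes the common trend,
# isolating the treatment effect. All group means here are invented.
treated_t0, treated_t1 = 10.0, 15.0  # treated group, before and after treatment
control_t0, control_t1 = 4.0, 6.0    # control group, same +2.0 trend untreated

did = (treated_t1 - treated_t0) - (control_t1 - control_t0)
assert did == 3.0  # raw change of 5.0 minus the common trend of 2.0
```

Note that the groups’ different starting levels (10.0 versus 4.0) drop out entirely; only a violation of the parallel-trends assumption, not a fixed level difference, would bias the estimate.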
In spanning the range of political science empirical methodology, from its origins historically
and substantively in, first, conceptualizing the domain of political science and the central
challenges to empirical research in that domain, to modern emphases on sophisticated models for
causal estimation and careful strategies for causal inference, two very different and both very
important enterprises, this collection aims to provide a library for empirical political-science
methodology. In doing so, it also seems to have highlighted the enormous progress of the subfield,
from extremely productive and forward-moving efforts at its origins to continuing diversification
of that fruitfulness and its ever-further advancement. If the next thirty years of political
methodology can sustain the vibrancy of the first thirty even while advancing and diversifying, the
future for empirical research in political science will be a very bright one.
16 Or, more precisely, anything else that differentiates the two groups’ differences from t0 to t1 must be either irrelevant to the outcome or uncorrelated with treatment (i.e., the usual conditions for an omitted variable not to bias included variables’ estimates).
REFERENCES
Arellano, Manuel, and Stephen Bond. 1991. “Some Tests of Specification for Panel Data: Monte Carlo Evidence and an Application to Employment Equations.” The Review of Economic Studies 58(2): 277-297.
Beck, Nathaniel, and Jonathan N. Katz. 1995. “What to Do (and Not to Do) with Time-Series Cross-Section Data.” American Political Science Review 89(3): 634-647.
Beck, Nathaniel, and Jonathan N. Katz. 1996. “Nuisance vs. Substance: Specifying and Estimating Time-Series-Cross-Section Models.” Political Analysis 6(1): 1-36.
Brady, Henry E., David Collier, and Janet M. Box-Steffensmeier. 2011. “Overview of Political Methodology: Post-Behavioral Movements and Trends.” In Oxford Handbook of Political Science, Robert E. Goodin, ed.
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.
Clark, William Roberts, Matt Golder, and Sona Nadenichek Golder. 2013. Principles of Comparative Politics, 2nd ed. Los Angeles: Sage Publications / CQ Press.
Davidson, J. E., D. F. Hendry, F. Srba, and S. Yeo. 1978. “Econometric Modelling of the Aggregate Time-Series Relationship between Consumers’ Expenditure and Income in the United Kingdom.” The Economic Journal 88: 661-692.
Franzese, Robert J., Jr. 1999. “Partially Independent Central Banks, Politically Responsive Governments, and Inflation.” American Journal of Political Science 43(3): 681-706.
Granger, Clive W. J. 1969. “Investigating Causal Relations by Econometric Models and Cross-Spectral Methods.” Econometrica 37(3): 424-438.
Holland, Paul W. 1986. “Statistics and Causal Inference.” Journal of the American Statistical Association 81(396): 945-960.
Hurwicz, Leonid. 1950. “Least-Squares Bias in Time Series.” In Statistical Inference in Dynamic Economic Models, Tjalling C. Koopmans, ed. New York: Wiley.
Jackson, John E. 2012. “On the Origins of the Society [We’re Not Lost, But How Did We Get Here].” The Political Methodologist 19(2): 2-7.
Kelvin, Lord (Sir William Thomson, Baron Kelvin of Largs). 1883. “Electrical Units of Measurement” [1883-05-03]. In Popular Lectures and Addresses.
King, Gary. 1991. “On Political Methodology.” Political Analysis 2: 1-30.
Nickell, Stephen. 1981. “Biases in Dynamic Models with Fixed Effects.” Econometrica 49(6): 1417-1426.
Parks, Richard W. 1967. “Efficient Estimation of a System of Regression Equations When Disturbances Are Both Serially and Contemporaneously Correlated.” Journal of the American Statistical Association 62(318): 500-509.
Troeger, Vera. n.d. Lecture Notes: “Analysis of Panel and Time-Series Cross-Section Data.” Essex University; shared by personal correspondence.
Wooldridge, Jeffrey. 2002. Econometric Analysis of Cross-Section and Panel Data. Cambridge, MA: MIT Press.