Unreported Trade Flows and Gravity Equation
Estimation
Thomas Baranga∗
May 15, 2009
Abstract
Some widely used trade databases do not distinguish between zero and
unreported trade flows. The number of unreported trade flows is high but
they account for a small volume of world trade, so the distinction may
be unimportant for traditional gravity equation estimation. However,
techniques that separately estimate the intensive and extensive margins
of trade may be more sensitive to the distinction. This paper develops a
methodology to consistently estimate the Helpman, Melitz and Rubinstein
model when some trade is unreported. This also breaks the relationship
between the sample selection and heterogeneity correction terms, reducing
collinearity of the regressors. A natural exclusion restriction identifies the
model, removing the need to distinguish fixed from variable costs of trade.
1 Introduction
The new literature on firm-level heterogeneity has revived interest in distin-
guishing the intensive and extensive margins of trade. Helpman, Melitz and
Rubinstein (2008) have demonstrated how, in the Melitz model in which firms
vary in their productivity, traditional estimates of the gravity equation confound
the effects of the two margins. HMR develop a methodology with which to
consistently estimate both margins, separating out the effects of trade barriers on
firms’ decisions to enter export markets from their influence on the quantities
that firms will export.
∗ My thanks for many invaluable conversations and support to Elhanan Helpman, Emilie
Feldman, Ian Martin and Yona Rubinstein. Any mistakes remain my own.
HMR’s methodology exploits the presence of country-pairs which do not
trade at all to estimate the role of fixed costs that prevent entry into export
markets, using probit to estimate the determinants of whether there is any trade
in the aggregate. With consistent estimates of the role of fixed costs of trade,
one can also estimate trade barriers’ influence on the intensive margin, taking
into account that exporting firms may have very different productivity levels.
The first stage of HMR’s procedure relies on the presence of zero trade flows
at the aggregate level to estimate the role of fixed costs in firms’ entry decisions.
However, the reliability of this data is questionable. The quality of the reporting
of trade data is very variable over time, and across different countries. If all trade
partners accurately reported their trade flows, we would have two corroborating
reports for each trade flow, from both the exporter and importer. However, it
is well known that there is wide variation in the level of trade reported by each
partner.¹
Less well recognised is that countries frequently fail to report their trade
at all. In 1986, the baseline year of HMR’s study, only 112 countries reported
any of their trade to the UN’s Comtrade database, which forms the basis of
Feenstra et al’s dataset, World Trade Flows, used by HMR. Out of HMR’s
original sample of 158 countries, only 103 reported their trade. Furthermore,
reporting is not necessarily complete even among those countries which report
some of their trade. Of the 8927 positive trade flows between partners that both
made a report of some of their trade to the UN, 2155 (24%) were reported by
only one partner.
¹ For example, Feenstra and co-authors assume that reporting of imports is more accurate
than reporting of exports, and reconcile the difference between the numbers by adopting the
importer’s report when it exists, and the exporter’s if there is no report by the importer.
Feenstra’s dataset does not distinguish between flows that are zero and flows
that are unreported, and for traditional estimates of the gravity equation, this
distinction was probably quite unimportant. However, if one does not try to
take this into account when estimating HMR’s model, one would automatically
classify a large number of flows as zero, with implications for the estimates of
the fixed costs of trade. Since 55 countries in HMR’s sample did not report their
trade at all, 2970 observations for which there was definitely no report may be
misclassified. Given that even countries which report some of their trade do not
usually report all of it, the status of trade flows between country-pairs in which
only one side reports will also be somewhat unreliable.
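The counts in this paragraph follow from simple arithmetic on directed country pairs. The sketch below reproduces them, assuming that each ordered (importer, exporter) pair of distinct countries is one potential observation; the variable names are illustrative and not taken from the paper.

```python
# Counting directed trade flows among HMR's 158 sample countries in 1986.
# Assumption: each ordered (importer, exporter) pair of distinct countries
# counts as one potential observation.

n_sample = 158                      # countries in HMR's original sample
n_reporters = 103                   # sample countries that reported their trade
n_silent = n_sample - n_reporters   # sample countries that reported nothing

# Flows between two non-reporting countries: certainly unreported.
neither_reports = n_silent * (n_silent - 1)

# Flows for which exactly one partner reports (counting both directions).
one_reports = 2 * n_reporters * n_silent

print(n_silent, neither_reports, one_reports)
```

Under this counting convention the second figure is the 2970 observations for which there was definitely no report; the third is the number of flows whose status rests on a single reporter.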
The reliability of reporting depends in part on characteristics of the country,
and also on the size of the trade flow: small trade flows may be more likely to
go unreported than large ones. Since the reporting decision is correlated with
the underlying trading relationship, failing to account for the sample selection
driven by non-reporting may bias estimates.
A second reason to take into account non-reporting is that it weakens the
collinearity of regressors in HMR’s model. In HMR’s original framework, the
correction for sample selection is estimated from the same probit as that for
omitted productivity heterogeneity. While this simplifies the estimation, it gen-
erates collinearity in the model, as discussed further below. Controlling for the
additional sample selection due to non-reporting breaks this connection.
HMR’s original framework used factors that affect fixed but not variable
costs of trade to separately identify the effects of productivity-heterogeneity and
selection. However, finding such variables can be challenging. The introduction
of an additional source of sample selection allows identification of the intensive
and extensive margins without finding a factor that only affects fixed but not
variable trade costs. Some countries do not report any of their trade in a
particular year, and for a pair of these countries, we know for certain that a
trade-flow will not be observed. However, a country’s decision not to participate
in the Comtrade database is uncorrelated with their bilateral trade, and so is
excludable from the other two equations.
The following sections of the paper document the quality of reporting and
develop a methodology to consistently control for the sample selection induced
by some countries’ failure to report their trade. The final section compares
estimates derived from HMR’s original technique to the modified approach.
2 Reporting of Trade Flows
There are three major databases for global trade flows. The UN and the IMF
both collect trade data from their members, in the Comtrade and Direction of
Trade Statistics databases respectively. In addition, Feenstra and co-authors
have assembled and maintained a large database, World Trade Flows, which
is derived from both the UN and IMF databases, supplemented by data from
some national trade records.
The Feenstra database makes a number of corrections to the original UN and
IMF data, reconciling importers’ and exporters’ differing reports into a single
number, and correcting entrepot trade flows. It also establishes concordances
between SITC1, SITC2 and SIC codes, allowing matching of disaggregated trade
flows over time and between trade and industrial production. As part of their
procedure for making adjustments to commodity level trade flows, Feenstra et
al benchmark the aggregate trade flow to the level reported in the IMF’s DOTS
data.² The number of countries reporting their trade to the IMF is consistently
lower than that to the UN, and this procedure appears to lead Feenstra et al to
omit a large number of small trade flows.
² “The decision was to benchmark each country’s total exports to the world to the world
total of imports from that country reported in the International Monetary Fund volumes on
The Direction of Trade ... Data by partner country and by commodity were then adjusted in
various ways so as to be compatible with these control totals,” Feenstra, Lipsey and Bowen
[6], pp.3-4; Feenstra [7], p.3.
Table 1. DATA COVERAGE, 2001–2005

                        DATA REPORTED FOR:
                 Complete Year             Part of the Year          DATA NOT REPORTED
                 Number of    Percent of   Number of   Percent of    Number of   Percent of
          Year   Countries¹   World Trade  Countries   World Trade   Countries   World Trade
Exports   2005    96 (72)     92           0           0.00          86          8
          2004   105 (81)     93           1           0.01          76          7
          2003   116 (92)     95           0           0.00          66          5
          2002   116 (92)     96           0           0.00          66          4
          2001   120 (97)     95           0           0.00          62          5
Imports   2005    99 (75)     95           0           0.00          83          5
          2004   107 (83)     95           1           0.01          74          5
          2003   117 (93)     96           1           0.04          64          4
          2002   119 (95)     97           0           0.00          63          3
          2001   123 (100)    97           0           0.00          59          3

¹ The figures in parentheses indicate the number of developing countries that reported complete data for the respective year.
Figure 1: Taken from the DOTS database’s documentation
Complete documentation of reporting to the UN is available online from the
UN’s Comtrade database. Unfortunately less documentation is available for the
IMF’s DOTS, but Figure 1, taken from the DOTS’ supporting documentation,
summarises the extent of reporting for 2001-2005. The striking feature of Figure
1 is that almost as many countries did not report their trade to the IMF as did.
Although I have not been able to find data on the extent of reporting to
the IMF for HMR’s sample period, it is clear that significantly more countries
reported their trade to the UN than to the IMF. Reporting to the UN from the
Comtrade database is presented in Figure 2 and Table 1. Reporting to both
institutions follows the same trend from 2001-2005 (the years for which the IMF
figure is available), but the number reporting to the IMF is significantly lower.
[Figure: line plot of the number of countries reporting trade-flows to the UN and IMF, 1960-2010 (series: UN, IMF)]
Figure 2: Extent of Reporting to the UN and IMF

Year   UN    IMF
2001   166   123
2002   164   119
2003   162   117
2004   159   107
2005   153    99
Table 1: Comparison of Reporting of Trade-Flows to the UN and IMF

Figure 3 illustrates the extent of the missing data problem in the sample,
comparing the trade-flows in Feenstra, for the 158 countries in HMR’s sample,
with the data direct from the UN’s Comtrade database.³ The blue line shows
how many of the bilateral trade-flows in the UN data are definitely missing
because neither partner reported its trade to the UN in that year.⁴ This moves
inversely with the number of reporters shown in Figure 2.⁵ The missing data
problem peaks in the sub-set of Feenstra’s data used by HMR. The green line
illustrates the number of zero trade-flows in Feenstra’s data that the UN records
as positive. The reason these trade-flows are missing from the Feenstra data is
almost certainly because they were not reported to the IMF and so have been
omitted per Feenstra’s benchmarking procedure.⁶

[Figure: line plot, 1970-2000, of the number of trade flows in each category (series: unreported to UN; unreported to UN, pos in Feenstra; pos in UN, 0 in Feenstra; reported 0 to UN, pos in Feenstra; total missing in Feenstra; total missing/0 in UN, pos in Feenstra)]
Figure 3: Positive and Missing Trade Flows in Feenstra’s Data
³ To produce a single number from both an importer and exporter’s report, I followed Feenstra’s convention of adopting the importer’s report.
⁴ This is a conservative estimate. When neither partner reports any of their trade we can be certain that the flow is not observed. Since reporting is incomplete even for countries that report some of their trade, it is likely that additional flows are also unreported, particularly those for which only one partner makes any reports.
⁵ The correlation is not perfect because Figure 2 shows the total number of reporters in the world rather than the sample. For example, in 1996 the number of reporters to the UN increased slightly while the amount of missing trade also increased because the number of reporters in the sample fell slightly.
⁶ To be fair to Feenstra, he does not record non-positive trade-flows as zero - this is an interpretation imposed by HMR - and in the documentation for the latest revision of his trade data, he explicitly recognises this problem. In table 1 of Feenstra et al (2005) [8] he lists the countries (only 65 of which are in HMR’s sample) for which he has reported data for 1984-2000, and notes “When the two countries are both not included in Table 1, however, the trade flows for 1984-2000 are entirely missing from the dataset,” pp.2-3
The black line is the sum of the blue and the green, and shows the number
of observations treated as zero by HMR that are actually either positive, or
should be treated as missing. With 158 countries in the sample, there are 24806
possible trade-flows per year. A conservative estimate of the average number
misclassified over HMR’s sample period of 1980-89 is about 5000, or roughly
20% of the sample, a significant number.
The other three lines in Figure 3 show how many trade-flows recorded as
positive by Feenstra are missing or zero in the UN data. There are a handful of
such observations, many of which are associated with Taiwan, whose trade was
not officially recorded by the UN but was for a time by the IMF. It seems that
any country that reported its trade to the IMF also reported it to the UN, but
not necessarily vice versa.

[Figure: distribution of trade-flows in the data sets, 1970-1997; number of flows by size bin 10^x <= trade < 10^(x+1), for x = 0 to 12 (series: UN data, Feenstra data)]
Figure 4: Distribution of Positive Trade Flows

Figure 4 shows the distribution of the positive trade flows in the two data
sets, and that the positive flows unrecorded in Feenstra (the green line in Figure
3) are mostly small. Feenstra’s data contains no flows less than $1000, and most
of the missing flows are less than $1 million.
[Figure: line plot, 1970-2000, of the volume of trade missing from Feenstra and recorded as positive by the UN, as % of total world trade (y-axis: % of World Trade, 0 to 2)]
Figure 5: % of World Trade Missing from Feenstra by Volume
The small magnitude of the missing flows is reflected in the small percentage
of world trade that is missing from the Feenstra data but included in the UN
data, as represented in Figure 5, averaging less than 1% of world trade over the
whole of Feenstra’s sample, and less than 1% for the sub-sample used by HMR.
For many applications, ignoring these trade flows might be unproblematic,
as they are relatively close to zero and treating them as such should not be a
source of bias. However, HMR’s procedure distinguishes importantly between
zero and positive trade flows. Figure 6 illustrates how many zero trade flows
there are in the two data sets - the difference is quite steady at around 5000,
or 20%, in each year. This suggests that although in terms of trade volume the
missing trade is not significant, in terms of the number of observations, about
20% are misclassified in HMR’s original dataset.
[Figure: line plot, 1970-2000, of the number of zero trade flows in the different data sets (series: UN data, Feenstra data)]
Figure 6: Zero Trade Flows by Year
[Figure: line plot, 1970-2000, of the correlation of Feenstra and UN trade data (series: conditional on both flows positive; conditional on both flows observed)]
Figure 7: Correlation of Trade in UN and Feenstra datasets
Feenstra’s dataset is not appropriate for HMR’s analysis, as one cannot
distinguish between flows that are missing or are really zero. Fortunately the
UN’s data has thorough documentation that shows which countries reported
and which did not. A drawback of using the UN data is that Feenstra has made
adjustments to the UN data which are lost by reverting to the UN data. Many
of the adjustments relate to the disaggregated trade data with which we need
not be concerned. However, some relate to the aggregate data and adjustments
for entrepot trade. I have not attempted to make corrections for entrepot trade.
However, these affect only a small minority of trade flows, and so on balance
ignoring this seems to be a cost worth bearing. Figure 7 shows the correlation
between Feenstra’s and the UN’s data. The blue line indicates the correlation
conditional on both data sets recording a positive flow; the red line indicates
the correlation conditional on the data that is observed in the UN dataset. The
correlation is reassuringly high (above 0.99 for HMR’s sub-sample), and suggests
that Feenstra’s adjustments to the aggregate trade flows have been relatively
small.
2.1 Estimating the Propensity to Report Trade
We can distinguish three groups of observations, based on whether the countries
involved report their trade: (A) neither partner reports; (B) both partners re-
port; (C) only one partner reports. For group (A), trade is certainly unreported.
For group (B), if both countries fully reported their trade we should have two
numbers for each trade flow, but as alluded to above, 24% of these flows that are
positive only have one partner reporting. This variation in reporting standards
allows one to estimate the propensity of countries to report their trade.
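The three-way grouping can be written as a small classification rule. The sketch below is illustrative only: the function name, the toy country labels and the set of reporters are hypothetical, and each ordered (importer, exporter) pair is treated as one observation.

```python
from collections import Counter
from itertools import permutations

def reporting_group(importer_reports: bool, exporter_reports: bool) -> str:
    """Return 'A' (neither reports), 'B' (both report) or 'C' (only one reports)."""
    if importer_reports and exporter_reports:
        return "B"
    if not importer_reports and not exporter_reports:
        return "A"
    return "C"

# Hypothetical toy example: four countries, two of which filed some report.
countries = ["W", "X", "Y", "Z"]
reporters = {"W", "X"}

# Classify every ordered (importer, exporter) pair of distinct countries.
groups = Counter(
    reporting_group(importer in reporters, exporter in reporters)
    for importer, exporter in permutations(countries, 2)
)
print(dict(groups))
```

Only group (B) observations can carry two independent reports of the same flow, which is why the propensity to report is estimated on that group.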
For the subset of countries that file some report of their trade, we can observe
reports of a particular trade flow from both the importer and exporter. In
particular, conditional on a trade flow being reported by at least one partner,
we can observe whether the flow is reported by the other side. The propensity
of a partner to report, conditional on the flow being reported by their partner,
can be estimated by probit.

Probit of reporting trade > 0, conditional on partner reporting trade > 0

              [1]        [2]        [3]        [4]
              Importer   Exporter   Importer   Exporter
log(trade)    0.229**    0.235**    0.3**      0.323**
              (0.009)    (0.006)    (0.012)    (0.009)
R-sq          0.255      0.27       0.377      0.419
n             7301       8398       7301       8398

** p<0.01, * p<0.05
Data for 1986. Columns [3] and [4]: reporter fixed effects
Table 2: Influence of the level of trade on the propensity to report
Table 2 shows that the size of the underlying trade flow has a strong influence
on a country’s propensity to report it. Columns [3] and [4] include reporter
fixed effects to control for idiosyncratic differences in reporting quality across
countries. These account for a significant amount of the variation, pushing up
the R-squared of the regressions.
Since by definition trade is unobserved if it is unreported, we cannot extend
this approach to estimate the unconditional probability of a flow being reported.
But the strong correlation between trade and reporting is prima facie evidence
that the attrition in the sample of trade flows that are missing due to a failure
to report is not random, leading to a classic sample selection problem.
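To make the conditional reporting probit concrete, the following sketch fits a one-regressor probit to simulated data. It is not the estimation behind Table 2: the data, the "true" coefficients (-1.5 and 0.25) and the function names are assumptions chosen for illustration, and the fit uses a plain Fisher-scoring loop rather than a statistics package.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_probit(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1 | x) = Phi(a + b * x) by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(max_iter):
        g_a = g_b = 0.0              # score vector
        i_aa = i_ab = i_bb = 0.0     # expected information matrix
        for xi, yi in zip(x, y):
            index = a + b * xi
            p = min(max(norm_cdf(index), 1e-12), 1.0 - 1e-12)
            d = norm_pdf(index)
            score = d * (yi - p) / (p * (1.0 - p))
            info = d * d / (p * (1.0 - p))
            g_a += score
            g_b += score * xi
            i_aa += info
            i_ab += info * xi
            i_bb += info * xi * xi
        # Solve the 2x2 system (information matrix) * step = score.
        det = i_aa * i_bb - i_ab * i_ab
        step_a = (i_bb * g_a - i_ab * g_b) / det
        step_b = (i_aa * g_b - i_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b

# Simulated data (hypothetical): a flow already reported by one partner is
# also reported by the other when a latent index crosses zero.
rng = random.Random(0)
TRUE_A, TRUE_B = -1.5, 0.25
log_trade = [rng.uniform(0.0, 12.0) for _ in range(5000)]
reported = [1 if TRUE_A + TRUE_B * x + rng.gauss(0.0, 1.0) > 0 else 0
            for x in log_trade]

a_hat, b_hat = fit_probit(log_trade, reported)
print(round(a_hat, 3), round(b_hat, 3))
```

A positive slope estimate corresponds to the pattern in Table 2: larger flows are more likely to be reported by both sides. The simulation observes every flow, which the real data cannot, so it says nothing about the unconditional reporting probability.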
3 Controlling for Unreported Trade Flows
HMR’s technique addresses the problem of sample selection due to the omission
of observations without trade with Heckman’s (1976, 1979) sample selection
correction. This two-step procedure controls for the possibility that sample
selection may bias estimates of the main equation of interest by first estimating
a selection equation, and then controlling for possible correlation between the
selection and main equations.
Taking into consideration non-reporting of trade as an additional level of
sample selection, we face a simultaneous sample selection problem, in which
an observation could be missing from the log-linearised gravity equation either
because it is zero, or because it is unreported.
3.1 Sample Selection in the Original HMR Model
The first step in controlling for sample selection is to specify a structural model
relating the selection and main equations. In the original HMR framework, this
was
$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \ln\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + e_{ij} \tag{1}$$
$$T_{ij} = 1[z^*_{ij} > 0] \tag{2}$$
$$z^*_{ij} = \gamma^*_0 + \xi^*_j + \zeta^*_i - \gamma^* d_{ij} - \kappa^* \phi_{ij} + \eta^*_{Z_{ij}} \tag{3}$$
Equation (1) is HMR’s intensive-margin gravity equation for imports $m_{ij}$ of
$i$ from $j$. $\lambda_j$ and $\chi_i$ are exporter and importer dummies, controlling for the
multilateral resistance terms analysed by Anderson and van Wincoop (2003).
$d_{ij}$ is a vector of trade barriers that affect variable costs of trade. The error
terms $e_{ij}$ and $\eta^*_{Z_{ij}}$ are jointly normally distributed
$$\begin{pmatrix} e_{ij} \\ \eta^*_{Z_{ij}} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2_e & \sigma_{e\eta_Z} \\ \sigma_{e\eta_Z} & 1 \end{pmatrix}$$
$z^*_{ij}$ is the fitted value of the latent variable of the probit specified in equations
(2) and (3). $T_{ij}$ is an indicator of whether there are any imports from $j$ to $i$, as
determined by the latent variable $z^*_{ij}$. This may be influenced by the same trade
barriers that affect the intensive-margin, $d_{ij}$, and possibly additional factors,
denoted $\phi_{ij}$.
A sample selection bias can arise if equation (1) is estimated by OLS, because
what we can estimate is $E[m_{ij} \mid T_{ij} = 1]$.⁷ If $E[e_{ij} \mid T_{ij} = 1] = E[e_{ij}] = 0$
then there is no sample selection problem, and OLS is unbiased. However, if
$E[e_{ij} \mid T_{ij} = 1] \neq 0$ there is a problem. Fortunately, under the assumptions of
equations (1)-(3) we have a consistent estimator of $E[e_{ij} \mid T_{ij} = 1]$
$$E[e_{ij} \mid T_{ij} = 1] = \frac{\sigma_{e\eta_Z}}{\sigma^2_e} E[\eta^*_{Z_{ij}} \mid T_{ij} = 1] = \frac{\sigma_{e\eta_Z}}{\sigma^2_e} \hat{\eta}^*_{ij}$$
where $\hat{\eta}^*_{ij} = \phi(z^*_{ij}) / \Phi(z^*_{ij})$ is the inverse Mills ratio estimated from the first-stage probit,
and which gives a consistent estimate of $E[\eta^*_{Z_{ij}} \mid T_{ij} = 1]$.
⁷ All the conditional expectations are also conditional on the full set of regressors. This is suppressed for notational convenience only.
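The role of the inverse Mills ratio can be checked numerically. The sketch below, a minimal illustration and not part of HMR's procedure, compares $\phi(z)/\Phi(z)$ with a simulated mean of a standard normal error over the draws for which the latent variable is positive. The index value 0.5, the seed and the function names are assumptions made for the example.

```python
import math
import random

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def inverse_mills(z):
    """phi(z) / Phi(z): the mean of a standard normal eta given z + eta > 0."""
    return norm_pdf(z) / norm_cdf(z)

# Monte Carlo check with a hypothetical first-stage index z = 0.5.
rng = random.Random(1)
z = 0.5
draws = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
selected = [eta for eta in draws if z + eta > 0]   # draws with T = 1
simulated_mean = sum(selected) / len(selected)

print(round(inverse_mills(z), 4), round(simulated_mean, 4))
```

The two printed numbers should agree up to simulation noise, which is the sense in which the ratio estimates the selected-sample mean of the probit error.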
$\ln(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1)$ is the term introduced by HMR to control for the potential
correlation between the trade barriers, $d_{ij}$, and the average productivity of firms
in $j$ that have chosen to enter the export market to $i$. The productivity of these
firms will affect the volume of their sales; and this productivity will be correlated
with trade barriers, because higher trade barriers will induce less productive
firms to exit the market.
HMR show that the latent variable $z^*_{ij}$ of whether or not there is trade is
related to the productivity level of the marginal exporter, and so can be used
to estimate the unobserved heterogeneous productivity term in the aggregate
gravity equation. Under the assumption that the distribution of firm
productivity is Pareto, this has the form $\max\{(Z^*_{ij})^{\delta} - 1, 0\}$. To include this term in an
estimation we need to simplify the ‘max’ term. Fortunately, only observations
for which trade is positive are observed, so in equation (1) all observations will
have $(Z^*_{ij})^{\delta} - 1 > 0$. We do not observe $Z^*_{ij}$, but can estimate $E[z^*_{ij}]$ with the fitted value $z^*_{ij}$.⁸
To simplify the ‘max’ we need $E[z^*_{ij} \mid T_{ij} = 1]$, but this is $E[z^*_{ij}] + E[\eta^*_{Z_{ij}} \mid T_{ij} = 1]$,
which as discussed above in the context of sample selection can be estimated
with the inverse Mills ratio, giving $\hat{z}^*_{ij} = z^*_{ij} + \hat{\eta}^*_{ij}$.
⁸ $z^*_{ij} = \log(Z^*_{ij})$
Thus HMR elegantly show how to address both the sample selection and
productivity heterogeneity biases in a simple two-stage procedure, using the
same first-stage probit. However, this elegance comes at the cost of the model
being potentially underidentified. The estimated latent variable $z^*_{ij}$ is a linear
combination of the regressors included in equation (3). If the regressors in
equations (1) and (3) are the same, then $z^*_{ij}$ is perfectly collinear with the trade
barriers in equation (1). Since $\hat{\eta}^*_{ij}$ is also included as a regressor in the gravity
equation to control for the zero-trade sample selection, this collinearity extends
to $\hat{z}^*_{ij} = z^*_{ij} + \hat{\eta}^*_{ij}$.
This collinearity is a particular problem for a non-parametric estimate of
the heterogeneity productivity bias, which would proceed by including a
high-degree polynomial of $\hat{z}^*_{ij}$ instead of the term derived from the assumption of
a Pareto distribution for firms’ productivity. Potentially the non-linearity of
$\ln(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1)$ means that the perfect collinearity between $z^*_{ij} + \hat{\eta}^*_{ij}$ and the
other trade barriers does not prevent identification of the parametric model.
However, for ‘large’ $\delta$, $\ln(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1) \approx \delta(z^*_{ij} + \hat{\eta}^*_{ij})$ and the regressors are
collinear again.
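The large-$\delta$ approximation rests on the identity $\ln(e^{x} - 1) = x + \ln(1 - e^{-x})$ for $x > 0$, whose second term vanishes as $x$ grows. A short numerical sketch, with a hypothetical value of 2.0 for $z^*_{ij} + \hat{\eta}^*_{ij}$ and illustrative names throughout:

```python
import math

def heterogeneity_term(delta, z_bar):
    """ln(exp(delta * z_bar) - 1), defined for delta * z_bar > 0."""
    x = delta * z_bar
    # ln(e^x - 1) = x + ln(1 - e^{-x}); this form avoids overflow for large x.
    return x + math.log1p(-math.exp(-x))

z_bar = 2.0   # hypothetical value of z*_ij + eta-hat*_ij
for delta in (0.5, 1.0, 2.0, 5.0, 10.0):
    # The gap to the linear term delta * z_bar equals ln(1 - e^{-delta * z_bar}).
    gap = heterogeneity_term(delta, z_bar) - delta * z_bar
    print(delta, gap)
```

The gap is always negative and shrinks towards zero as $\delta$ rises, so for large $\delta$ the term carries almost no curvature with which to separate it from the linear regressors.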
For HMR’s sample the non-linearity of the heterogeneity-bias term is
insufficient to identify the model, and this motivates their search for an additional
exclusion restriction. This is a variable $\phi_{ij}$, which enters into equation (3) but
not equation (1). Such a variable breaks the collinearity problem, by
introducing an extra source of variation into $z^*_{ij}$ that is not collinear with the regressors
of equation (1). In economic terms, this would be a factor that affects the fixed,
but not the variable, costs of trade.
HMR proposed two potential exclusion restrictions: measures of the costs of
starting a firm, as compiled by Djankov et al (2002); and an index of religious
similarity. There are drawbacks to using either of these exclusion restrictions.
Regulatory costs seem like they would be correlated with fixed costs of entry
into business, and possibly by extension with the fixed costs of entering export
markets, although this is less clear. However, they may also be correlated with
factors affecting variable costs of trade,⁹ violating the exclusion restriction. A
second weakness of the regulatory data is that it is only available for a sub-set
of countries (116 out of HMR’s full sample of 158), and so cannot be used in a
broad panel setting.¹⁰
⁹ For example, a country with higher regulatory barriers may also be more likely to be a higher tax environment, which would be expected to reduce the profitability of exporting at the intensive margin too. Countries with more regulation might also be more likely to use quantitative trade restrictions such as import or export licenses, or other non-tariff barriers, which would also affect the intensive margin, but are typically not controlled for.
The conceptual case for the validity of religion as a factor affecting fixed
but not variable costs of trade is very unclear. In their original paper HMR
justify the exclusion on the grounds that their religion variable is not statis-
tically significant in a benchmark OLS gravity equation, suggesting that it is
broadly uncorrelated with trade. Unfortunately, there is a problem with their
original data.¹¹ Replacing their data with a similar index compiled from
Barrett et al (2001) indicates that religion is a highly significant variable in the
benchmark gravity equation, undermining the prima facie case for the validity
of the exclusion restriction.
The difficulty of finding valid or practical exclusion restrictions is a potential
pitfall of HMR’s methodology. One motivation for controlling for the sample
selection induced by the non-reporting of trade is that it weakens the collinearity
between the productivity-heterogeneity and sample-selection correction terms.
This allows more general identification of the model.
¹⁰ Reduction of the sample size is a potentially serious problem for HMR’s methodology, which relies on the presence of zero trade flows in the aggregate. If country $j$ exports to all other countries in the sample, then its exporter-specific dummy perfectly predicts trade in the first-stage. This is problematic because it implies a fitted value of infinity for $z^*_{ij}$, which means that all observations of exports from $j$ must be dropped from the second-stage, as the heterogeneity correction term cannot be estimated. For 1986, the baseline year for HMR’s study, the reduction in sample size only led to the dropping of 9 importers or exporters (11 when one uses the more complete UN data which includes some positive trade flows omitted from the Feenstra dataset). However, given the growth in trade over time, in later years this reduction in the sample could be a critical problem for applying the technique, as more countries will trade with all members of this sub-sample.
¹¹ The measure is an index of religious similarity of a country-pair, and as such should be the same for the observation of country $i$’s exports to $j$ as for the observation of $i$’s imports from $j$. Unfortunately this is not the case, which indicates that there has been a corruption of their data.
3.2 Controlling for Non-Reporting of Trade
Heckman’s sample-selection correction can be extended to the case of multiple
selection decisions, by jointly estimating the underlying selection relationships.
HMR’s system of equations (1)-(3) is extended to include an equation specifying
the reporting decision
$$R_{ij} = 1[r^*_{ij} > 0] \tag{4}$$
$$r^*_{ij} = \tau^*_0 + \varphi^*_j + \omega^*_i - \nu^* d_{ij} + \eta^*_{R_{ij}} \tag{5}$$
where the latent variable driving the reporting decision, $r^*_{ij}$, depends on the
trade barriers $d_{ij}$, importer/exporter dummies, and a normally distributed error
$\eta^*_{R_{ij}}$. The errors of equations (1), (3) and (5) are assumed jointly Normally
distributed¹²
$$\begin{pmatrix} e_{ij} \\ \eta^*_{Z_{ij}} \\ \eta^*_{R_{ij}} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2_e & \sigma_{e\eta_Z} & \sigma_{e\eta_R} \\ \sigma_{e\eta_Z} & 1 & \sigma_{\eta_Z\eta_R} \\ \sigma_{e\eta_R} & \sigma_{\eta_Z\eta_R} & 1 \end{pmatrix}$$
The selection relationships follow Poirier’s (1980) model of a bivariate
probit.¹³ The bivariate probit with partial observability treats the dependent
variable as 1 if we observe a positive trade flow, and 0 otherwise. The dependent
variable is assumed to be 1 if both the underlying probits are 1, and 0 if either
is 0. This yields the log-likelihood function
$$\ln(L) = \sum_{n} y_{ij} \ln\left(F(z^*_{ij}, r^*_{ij}, \rho)\right) + (1 - y_{ij}) \ln\left(1 - F(z^*_{ij}, r^*_{ij}, \rho)\right)$$
where for notational convenience $\rho$ denotes the covariance between $\eta^*_{Z_{ij}}$ and
$\eta^*_{R_{ij}}$, previously denoted as $\sigma_{\eta_Z\eta_R}$, and $F(z^*_{ij}, r^*_{ij}, \rho)$ is the CDF of the bivariate
normal distribution with unit variances and covariance $\rho$. $y_{ij}$ is the dependent
variable, which is 1 if a positive trade flow is observed.
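The log-likelihood above can be evaluated directly once the bivariate normal CDF is available. The sketch below is a minimal illustration, not the estimation code used in the paper. It assumes the index values for the two selection equations are already given as numbers, computes $F$ by one-dimensional Simpson integration of $\phi(x)\Phi((b - \rho x)/\sqrt{1-\rho^2})$, and uses hypothetical function names throughout.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """P(X <= a, Y <= b) for standard normals with correlation rho, |rho| < 1.

    Composite Simpson rule on [lower, a]; the mass below `lower` is ignored.
    """
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n                      # n must be even

    def f(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lower + k * h)
    return total * h / 3.0

def partial_observability_loglik(y, z, r, rho):
    """Log-likelihood when y = 1 only if both latent variables are positive."""
    ll = 0.0
    for yi, zi, ri in zip(y, z, r):
        F = min(max(bvn_cdf(zi, ri, rho), 1e-12), 1.0 - 1e-12)
        ll += yi * math.log(F) + (1 - yi) * math.log(1.0 - F)
    return ll

# Hypothetical indices for three country pairs and their observed outcomes.
y = [1, 0, 1]
z = [0.8, -0.2, 1.5]
r = [0.3, 0.1, 2.0]
print(partial_observability_loglik(y, z, r, rho=0.4))
```

In an actual estimation the indices would be linear functions of the regressors, and the coefficients and $\rho$ would be chosen to maximise this function.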
The parameters of both underlying selection equations can be jointly
estimated from this log-likelihood function. Poirier (1980) discusses the
identifiability of the model. The reduced form parameters are locally identified except
in pathological cases.¹⁴ However, there can be a labelling problem: if the
regressors are identical in both equations, the symmetric nature of the problem
makes it impossible to identify which coefficients correspond to which selection
relationship. As long as there is at least one variable excludable
from one of the selection equations, this labelling problem is resolved.
¹² The assumption of unit variances for $\eta^*_{Z_{ij}}$ and $\eta^*_{R_{ij}}$ is without loss of generality. The coefficients of a probit can only be estimated up to scale, so the coefficients in equations (3) and (5) are normalised by the variance of their respective errors. The other key parameter in what follows is the covariance $\sigma_{\eta_Z\eta_R}$, but this enters the following equations as the correlation, which is the covariance automatically scaled by the variances.
¹³ The model is also treated very approachably in Maddala (1983), pp. 278-283. See Grilli (2005) for an application.
¹⁴ Such as equality of the coefficients across the two equations.
Controlling for the simultaneous selection in the intensive-margin log-linearised
gravity equation is straightforward. When estimating equation (1) by OLS, we
estimate $E[m_{ij} \mid T_{ij} = 1, R_{ij} = 1]$. As in the single selection case, we need to
take into account the possibility that $E[e_{ij} \mid T_{ij} = 1, R_{ij} = 1] \neq E[e_{ij}] = 0$.
$$E[e_{ij} \mid T_{ij} = 1, R_{ij} = 1] = \beta_Z H^*_{Z_{ij}} + \beta_R H^*_{R_{ij}}$$
where
$$\beta_Z \equiv \frac{\sigma_{e\eta_Z}}{\sigma^2_e}, \qquad \beta_R \equiv \frac{\sigma_{e\eta_R}}{\sigma^2_e}$$
$$H^*_{Z_{ij}} \equiv \frac{\phi(z^*_{ij}) \, \Phi\left(\frac{r^*_{ij} - \rho z^*_{ij}}{\sqrt{1 - \rho^2}}\right)}{F(z^*_{ij}, r^*_{ij}, \rho)}$$
$$H^*_{R_{ij}} \equiv \frac{\phi(r^*_{ij}) \, \Phi\left(\frac{z^*_{ij} - \rho r^*_{ij}}{\sqrt{1 - \rho^2}}\right)}{F(z^*_{ij}, r^*_{ij}, \rho)}$$
These expressions are analogous to the inverse Mills ratio, extended to the
two-stage selection case. If there is no correlation between the two selection
stages ($\rho = 0$), then the expressions simplify down to the single-variable selection
correction. $F(z^*_{ij}, r^*_{ij}, 0) = \Phi(z^*_{ij}) \Phi(r^*_{ij})$, so
$$H^*_{Z_{ij}} = \frac{\phi(z^*_{ij})}{\Phi(z^*_{ij})}, \qquad H^*_{R_{ij}} = \frac{\phi(r^*_{ij})}{\Phi(r^*_{ij})}$$
and the dual sample selection is controlled for simply by including a standard
inverse Mills ratio for each stage. The objects $z^*_{ij}$, $r^*_{ij}$, and $\rho$ are all estimable
quantities from the first stage, and so the sample selection corrections can be
made in a two-step procedure analogous to HMR’s original method.
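The two correction terms are straightforward to compute from first-stage quantities. The sketch below is an illustration under stated assumptions rather than the paper's code. It takes the indices and $\rho$ as given numbers, evaluates the bivariate normal CDF by Simpson integration, and checks numerically that with $\rho = 0$ each term collapses to an ordinary inverse Mills ratio. All names and input values are hypothetical.

```python
import math

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, n=2000, lower=-8.0):
    """Bivariate standard normal CDF by composite Simpson rule, |rho| < 1."""
    if a <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lower) / n                      # n must be even

    def f(x):
        return norm_pdf(x) * norm_cdf((b - rho * x) / s)

    total = f(lower) + f(a)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * f(lower + k * h)
    return total * h / 3.0

def selection_terms(z, r, rho):
    """Return (H_Z, H_R), the two modified inverse Mills ratios."""
    s = math.sqrt(1.0 - rho * rho)
    F = bvn_cdf(z, r, rho)
    h_z = norm_pdf(z) * norm_cdf((r - rho * z) / s) / F
    h_r = norm_pdf(r) * norm_cdf((z - rho * r) / s) / F
    return h_z, h_r

# Hypothetical first-stage indices for one country pair.
z, r = 0.7, -0.3
print(selection_terms(z, r, rho=0.0))   # matches phi/Phi for each index
print(selection_terms(z, r, rho=0.5))   # both terms shift once rho != 0
```

Both returned values would enter the second-stage regression as additional regressors, with coefficients estimated freely.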
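Defining
$$\hat{\eta}^*_{ij} \equiv E[\eta^*_{Z_{ij}} \mid T_{ij} = 1]$$
$$\bar{e}_{ij} \equiv e_{ij} - \beta_Z H^*_{Z_{ij}} - \beta_R H^*_{R_{ij}}$$
$$\tilde{e}_{ij} \equiv e_{ij} - \beta_{HMR} \hat{\eta}^*_{ij}$$
we consistently estimate the intensive margin gravity equation (1) by
$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_Z H^*_{Z_{ij}} + \beta_R H^*_{R_{ij}} + \ln\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + \bar{e}_{ij} \tag{6}$$
Comparing this to HMR’s original corrected equation
$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_{HMR} \hat{\eta}^*_{ij} + \ln\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + \tilde{e}_{ij} \tag{7}$$
the key difference is the change in the sample selection corrections, $H^*_{Z_{ij}}$ and
$H^*_{R_{ij}}$ instead of $\hat{\eta}^*_{ij}$. This difference weakens the collinearity between
$\hat{z}^*_{ij} = z^*_{ij} + \hat{\eta}^*_{ij}$ and the other regressors. In HMR’s original equation, the ‘coincidence’
that the control for the productivity term being positive and trade being positive
was the same meant that both were controlled for by the same inverse Mills ratio,
$\hat{\eta}^*_{ij}$.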
Using the modified inverse Mills ratios breaks this coincidence. The cor-
rection for the productivity term is the ‘original’ inverse Mills ratio15, which
controls for the fact that productivity is only in the regression when above its
cutoff. This inverse Mills ratio is conditional on Tij > 0, but not conditional on
Rij > 0 too, as the reporting decision is irrelevant to the underlying productiv-
ity cut-off, once the parameters of equation 3 have been consistently estimated.
15 ‘Original’ in the sense that it has the same functional form. Its value will be different, as the estimates of the parameters of equation (3) will have changed, as they will reflect the estimates from the joint estimation which controls for some observations being unreported.
However, the corrections for sample selection in the main gravity equation re-
quire both of the new modified inverse Mills ratios. Since ˆη∗ij is a non-linear
function of z∗ij , and is no longer itself included as a regressor in the main equa-
tion, ˆz∗ij is now a non-linear function of the other regressors, and no longer
collinear with them.
Controlling for both dimensions of sample selection not only consistently
estimates the parameters of the underlying structural models (assuming that
they are correctly specified), but also separates the correction for sample selec-
tion from the correction for heterogeneity, allowing both effects to be identified
without distinguishing fixed versus variable trade costs.
4 Empirical Results
Table 3 reports ‘traditional’ OLS gravity equation estimates on two samples.
Column [1] reports for the full sample of 175 countries. Out of a possible 30450
trade flows, 14503 were observed positive.
Column [2] repeats this for the sub-set of 116 countries for which there is
data on regulatory costs of entry. Out of a possible 13340 flows, 8583 were
observed positive. Some countries in the Reg sample export to, or import from, all
other partners in the sample¹⁶. This makes their country exporter or importer
dummy a perfect predictor of the outcome in the first-stage probit, which implies
an infinite coefficient on the dummy and for the latent variable in the probit.
The second stage estimation cannot proceed with an infinite value for ˆz∗ij , so
these observations must be dropped, reducing the number of useable positive
observations in the second stage to 7327¹⁷. To maintain consistency between the
second stage sample and the ‘benchmark’ gravity equation these observations
are also dropped here.
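The sample counts quoted above follow from simple arithmetic on ordered country pairs, which the short Python sketch below reproduces. The helper name `directed_pairs` is hypothetical; the inputs are the figures stated in the text.

```python
def directed_pairs(n_countries):
    """Number of ordered (exporter, importer) pairs with exporter != importer."""
    return n_countries * (n_countries - 1)

# Possible trade flows in the full sample and in the regulation sub-sample.
full_possible = directed_pairs(175)
reg_possible = directed_pairs(116)

# Shares of possible flows that are observed positive, from the counts in the text.
full_share_positive = 14503 / full_possible
reg_share_positive = 8583 / reg_possible
```

On these figures a little under half of the possible flows are observed positive in the full sample, against roughly two thirds in the regulation sub-sample.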
Column [3] repeats the regression of Column [2], but includes the measures
16 The exporters are Japan, Hong Kong, Denmark, France, Germany, Italy, the Netherlands, Sweden, the UK, and Norway. The importer is Japan.
17 The same issue arises in HMR’s original paper. See the discussion on pp.461-462.
                  [1]          [2]          [3]
                  All          Reg Excl     Reg Incl
log(Distance)     -1.305***    -1.278***    -1.295***
                  (0.0269)     (0.0413)     (0.0415)
Border            0.0605       0.216        0.212
                  (0.123)      (0.154)      (0.154)
Island            0.719***     0.750***     0.732***
                  (0.0858)     (0.176)      (0.176)
Landlock          0.265        0.0838       0.0889
                  (0.186)      (0.198)      (0.197)
Colonial          0.925***     0.558***     0.553***
                  (0.110)      (0.158)      (0.157)
Language          0.371***     0.356***     0.351***
                  (0.0570)     (0.0809)     (0.0808)
Legal             0.321***     0.372***     0.384***
                  (0.0438)     (0.0603)     (0.0603)
Religion          0.443***     0.682***     0.690***
                  (0.0897)     (0.127)      (0.127)
CU                1.884***     1.466***     1.531***
                  (0.222)      (0.370)      (0.370)
FTA               0.446***     -0.273       -0.214
                  (0.116)      (0.181)      (0.181)
Reg: cost                                   -0.331***
                                            (0.0973)
Reg: days                                   -0.234**
                                            (0.111)
Observations      14503        7327         7327
R2                0.706        0.683        0.684
*** p<0.01, ** p<0.05, * p<0.1
Standard errors in parentheses
Column [1]: all 175 countries
Columns [2] and [3]: 116 countries with regulation data
Importer and Exporter dummies
Table 3: Benchmark ‘Traditional’ OLS Gravity Equations
of regulatory costs of starting a business, which HMR use as their exclusion
restriction to identify the intensive margin. The regulation variables are highly
significant in the traditional OLS regression. Although this does not necessar-
ily invalidate the exclusion restriction (their statistical significance could reflect
omitted variable bias, through their correlation with the omitted heterogeneity-
productivity term), it is prima facie evidence that they are strongly correlated
with the volume of trade, which is suggestive that they might affect both inten-
sive and extensive margins.
Table 4 reports estimates for the latent variables for the first-stage probits
on the regulation sub-sample. Column [1] gives the estimates for a univariate
probit on the observed positive trade flows, following HMR’s methodology.
Columns [2] and [3] of Table 4 report the joint maximum likelihood estimates
of the positive trade and reporting probits using the partially observed bivari-
ate probit model. Comparing the coefficients of the underlying positive-trade
and reporting probits to the univariate probit, those of the univariate probit
generally lie in between those of the two bivariate probits, suggesting that the
outcome of the univariate probit reflects a mixture of the two selection processes.
The trade barriers are highly correlated with the reporting decision, which sug-
gests that the selection induced by non-reporting should not be ignored. The
correlation between the errors in equations 3 and 5 is estimated to be 1.
One concern with partial observability and the Poirier model is that there is
a loss of efficiency relative to the full information estimation18. Unfortunately
we cannot compare the partial information to the full information estimates, but
for most variables the standard errors are very similar to those for the univariate
probit, suggesting that augmenting the first stage to control for non-reporting
does not lead to a large efficiency loss.
An exception to this is for the Island, Colonial and Currency Union variables
in the Positive Trade probit, for which standard errors cannot be computed. In
18 See Meng and Schmidt (1985).
                  [1]           [2]              [3]
                  Univariate    Bivariate Probit
                  Probit        Positive Trade   Reporting
log(Distance)     -0.582***     -1.296***        -0.430***
                  (0.0356)      (0.0723)         (0.0540)
Border            -0.378***     0.168            0.0893
                  (0.133)       (0.376)          (0.226)
Island            0.314**       40.38            -0.0725
                  (0.150)       (∞)              (0.178)
Landlock          0.105         -0.0577          0.789***
                  (0.132)       (0.180)          (0.233)
Language          0.416***      1.208***         -0.601***
                  (0.0632)      (0.103)          (0.103)
Colonial          -0.0856       50.12            5.582
                  (0.292)       (∞)              (530.3)
Legal             0.149***      0.0757           0.583***
                  (0.0440)      (0.0678)         (0.0745)
Religion          0.390***      0.202            0.578***
                  (0.102)       (0.147)          (0.163)
CU                0.844***      12.70            0.726
                  (0.230)       (∞)              (0.445)
FTA               1.819***      -0.668           8.447
                  (0.533)       (0.841)          (75246)
Reg: Cost         -0.403***     -0.127           -0.337***
                  (0.0857)      (0.161)          (0.130)
Reg: Days         -0.0939       0.106            -0.546***
                  (0.0762)      (0.109)          (0.132)
Neither Reports                                  -∞
ρ (columns [2]-[3])             1 (0)
Observations      13340         13340            13340
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Importer and Exporter dummies
Table 4: Zero-trade and Reporting Probits: Regulation Sample
a univariate probit, a dummy variable that perfectly predicts the dependent
variable is estimated to have an infinite coefficient, and the probit is estimated
after dropping those observations. In the bivariate case, it is possible for a
variable to be a perfect predictor of one of the underlying probits, but not the
other. In this case, the coefficient cannot be precisely estimated for the probit
for which it is a perfect predictor, but the variable will not perfectly predict
the imperfectly observed dependent variable of the bivariate probit, because
the dependent variable may not be observed due to the second equation19. It
would be inappropriate to drop these observations from the bivariate probit:
first, we do not know in advance which variables will be perfect predictors in one of
the underlying equations; second, the variables should still be included in the second
equation.
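The divergence of a coefficient on a perfectly predicting dummy in a univariate probit can be seen in a small numerical sketch. The toy data and the function name `probit_loglik` below are hypothetical, and the intercept is held fixed at zero purely to keep the illustration short; the point is only that the log-likelihood keeps rising as the dummy's coefficient grows, so no finite maximiser exists.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_loglik(intercept, coef, data):
    """Probit log-likelihood for a list of (dummy, y) pairs, y in {0, 1}."""
    ll = 0.0
    for d, y in data:
        p = Phi(intercept + coef * d)
        ll += math.log(p) if y == 1 else math.log(1.0 - p)
    return ll

# Hypothetical toy data: whenever dummy == 1 the outcome is always 1,
# so the dummy is a perfect predictor of a positive outcome.
data = [(1, 1)] * 5 + [(0, 1)] * 3 + [(0, 0)] * 4

# Evaluate the log-likelihood on an increasing grid of dummy coefficients.
logliks = [probit_loglik(0.0, b, data) for b in (0.0, 1.0, 2.0, 4.0, 6.0)]
```

The values in `logliks` increase monotonically along the grid, which is the univariate counterpart of the infinite coefficients discussed above.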
Table 5 reports the estimates for the parametric specification of the gravity
equation given in equation (1), based on a Pareto distribution of productivity.
Column [1] is based on the estimate of ˆz∗ij derived from column [1] of Table 4,
following the standard HMR procedure, and excluding the regulation variables
from the second stage in order to identify the model. Comparison of column
[1] of Table 5 and column [2] of Table 3 broadly replicates HMR’s finding that
the absolute magnitude of most trade barriers is smaller in the bias-corrected
estimates than the traditional OLS gravity equation20. This motivates their
conclusion that heterogeneity-bias inflates standard OLS estimates.
The standard errors for all the second stage regressions given are somewhat
impressionistic, as they have not been corrected to take into account the gen-
erated regressors from the first-stage. The coefficient given on ˆz∗ij is actually
for ∆ = log(δ)21. This implies an estimate of δ of 0.2526. This is somewhat
19 For example, all colonial powers might trade with their former colonies, making the colonial dummy a perfect predictor in the positive trade probit. However, if they do not also all report their trade, some colonial country-pairs will not have positive trade flows recorded, and the colonial dummy will not perfectly predict the overall dependent variable.
20 This is true for distance, island, landlock, legal, language, currency union, and religion, but not for border, colonial or FTA.
21 δ must be positive, and to impose this constraint it is convenient to replace it with the
                  [1]          [2]             [3]
                  HMR          TB - Reg excl   TB - Reg incl
ln(Distance)      -0.994***    -0.712***       -0.700***
                  (0.1029)     (0.0692)        (0.0713)
Border            0.430***     0.174           0.162
                  (0.1626)     (0.1440)        (0.1442)
Island            0.540***     -13.618***      -14.363***
                  (0.1707)     (2.0205)        (2.1286)
Landlock          0.030        0.099           0.106
                  (0.1994)     (0.1956)        (0.1952)
Legal             0.301***     0.377***        0.385***
                  (0.0646)     (0.0609)        (0.0610)
Language          0.144        -0.203**        -0.224**
                  (0.1065)     (0.0954)        (0.0974)
Colonial          0.610***     -16.867         -17.796
                  (0.1193)     (-16.8675)      (-17.7961)
Currency union    1.051**      -3.608***       -3.780***
                  (0.4842)     (0.7280)        (0.7516)
FTA               -0.747*      0.357**         0.381***
                  (0.3934)     (0.1461)        (0.1467)
Religion          0.489***     0.475***        0.480***
                  (0.1445)     (0.1295)        (0.1296)
Reg Cost                                       -0.101
                                               (0.0974)
Reg Days                                       -0.284**
                                               (0.1129)
η∗                -0.083
                  (0.1920)
H∗Zij                          -3.106***       -0.063
                               (0.5354)        (0.0978)
H∗Rij                          0.268           -0.005
                               (0.4254)        (0.0041)
z∗                -1.376       -1.060***       -1.008***
                  (1.0404)     (0.1470)        (0.1467)
Observations      7327         7327            7327
*** p<0.01, ** p<0.05, * p<0.1
Standard errors in parentheses
Importer and Exporter dummies
Table 5: Intensive Trade Margin: Regulation Sample
lower than HMR’s original estimate of 0.84. I conjecture that one reason for
this difference is that I do not follow their practice of censoring ˆz∗ij above 5.199,
which has the effect of increasing their estimate of δ22.
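The mapping from the reported coefficient ∆ = log(δ) back to δ, together with a first-order delta-method standard error, can be sketched as follows. The function name `delta_from_log` is hypothetical; the inputs are the column [1] coefficient and standard error on z∗ in Table 5. Since those standard errors are not corrected for the generated regressors, the resulting standard error for δ should be read as indicative at best.

```python
import math

def delta_from_log(Delta, se_Delta):
    """Recover delta = exp(Delta) and a first-order delta-method standard error.

    The delta method gives se(delta) ~= |d exp(Delta) / d Delta| * se(Delta)
    = exp(Delta) * se(Delta).
    """
    delta = math.exp(Delta)
    se_delta = delta * se_Delta
    return delta, se_delta

# Column [1] of Table 5: coefficient and standard error reported for z*.
delta, se_delta = delta_from_log(-1.376, 1.0404)
```

Here `delta` comes out at about 0.2526, matching the estimate quoted in the text, with an approximate standard error of about 0.26.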
Column [2] of Table 5 reestimates equation (7), maintaining the regulation
variables as excluded from the second stage, but using the estimate of $\hat{z}^*_{ij}$
derived from column [2] of Table 4 and the dual sample selection correction terms
$H^*_{Zij}$ and $H^*_{Rij}$. The variables Island, Colonial and Currency Union, whose
coefficients in the bivariate positive trade probit were estimated very imprecisely,
suffer a large loss of efficiency in the modified procedure, presumably reflect-
ing the imprecision of the first stage. The opposite seems to be true for the
other variables, whose standard errors diminish somewhat. The results support
HMR’s finding of a significant productivity-heterogeneity bias, as the coeffi-
cients in Table 5 generally have a smaller absolute magnitude than the OLS
benchmarks.
There is an interesting difference in the coefficient on membership of an FTA,
which is negative in HMR’s specification, but positive and both economically and
statistically significant using the modified procedure. A positive coefficient seems more
economically intuitive. Column [3] of Table 4 shows that countries sharing an
FTA are much more likely to report their trade, which is also quite intuitive,
since most FTAs have strict rules of origin clauses which necessitate careful doc-
umentation of intra-FTA trade. Distinguishing this effect of higher reporting
quality from the influence on the extensive trade margin seems to also make a
significant difference to the intensive margin estimates.
Column [3] repeats the estimation of column [2] but includes measures of
regulation in the second stage. As discussed above, the second-stage is still
identified even without an exclusion restriction, and there is no loss of precision
unconstrained parameter $\Delta = \log(\delta)$ and estimate $\ln(e^{e^{\Delta}(z^*_{ij}+\hat{\eta}^*_{ij})} - 1)$. To recover $\delta$, the
coefficient in $\ln(e^{\delta(z^*_{ij}+\hat{\eta}^*_{ij})} - 1)$, $\Delta$ should be exponentiated. The delta method could be used
to derive a standard error for $\delta$ from that of $\Delta$.
22 This would affect 396 observations in this sample.
in the standard errors from relaxing the exclusion restriction. The point esti-
mates are also very similar to those of column [2], which is encouraging. Column
[3] gives mixed support for the validity of HMR’s exclusion restriction, as one
of the regulation variables is found to be statistically significant in the intensive
margin, although the other is not. This suggests that Reg Cost can be validly
excluded, but Reg Days should not be used for identification.
Table 6 repeats estimation of the first-stage probits on the full sample. The
results are broadly similar to those on the Regulation sample, but the greater
variation in the dataset means that none of the trade barriers appear to be per-
fect predictors in either of the underlying probits, so that all are estimated with
relatively tight standard errors. There appears to be very little efficiency loss be-
tween the univariate and bivariate specifications, and the univariate coefficients
mostly lie between those of the two bivariate equations.
One noticeable difference between the two samples is that on the full sample
ρ is estimated to be negative, whereas on the regulation sample it was estimated
to be 1. It is hard to have a strong prior as to what the correct sign for the
correlation based on unobserved variables should be. The estimate of 1 lies on
the boundary of the coefficient space, and an interior solution may be more
appealing. Although it is possible that the value could change a lot with the
underlying sample, the difference in these results suggests that the correlation
coefficient may not be very precisely estimated by this procedure.
Table 7 reports estimates from a non-parametric approximation of the het-
erogeneity bias correction term, using a seventh-order polynomial of ˆz∗ij . Columns
[1]-[3] replicate the non-linear estimates of columns [1]-[3] of Table 5. Column [4]
reports the polynomial approximation using the full sample and the first-stage
estimates in columns [2] and [3] of Table 6.
Columns [1] and [2] of Table 7 use the regulation data as an exclusion re-
striction to identify the second stage, while columns [3] and [4] are estimated
without an additional exclusion restriction on fixed versus variable trade costs.
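To make the two specifications concrete, the sketch below builds the seventh-order polynomial regressors in ẑ∗ alongside the parametric term ln(exp(δẑ∗) − 1) that they are intended to approximate. The function names are hypothetical and the sketch only constructs regressors; it does not estimate anything.

```python
import math

def polynomial_terms(z_hat, order=7):
    """Return [z_hat, z_hat**2, ..., z_hat**order] for use as regressors."""
    return [z_hat ** k for k in range(1, order + 1)]

def parametric_term(z_hat, delta):
    """Parametric heterogeneity term ln(exp(delta * z_hat) - 1).

    Only defined when delta * z_hat > 0; expm1 is used for accuracy
    when delta * z_hat is small.
    """
    return math.log(math.expm1(delta * z_hat))

# Example evaluation at an arbitrary illustrative value of z_hat.
terms = polynomial_terms(2.0)
value = parametric_term(1.0, 1.0)
```

The polynomial version is linear in its coefficients and so can be estimated by OLS, whereas the parametric term requires non-linear estimation of δ.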
                  [1]           [2]              [3]
                  Univariate    Bivariate Probit
                  Probit        Positive Trade   Reporting
log(Distance)     -0.709***     -1.027***        0.0753
                  (0.0198)      (0.0301)         (0.0581)
Border            -0.505***     0.754***         0.139
                  (0.0986)      (0.293)          (0.211)
Island            0.274***      0.209***         0.416**
                  (0.0544)      (0.0701)         (0.188)
Landlock          0.180         0.0900           0.582
                  (0.111)       (0.159)          (0.383)
Language          0.344***      0.747***         -0.876***
                  (0.0371)      (0.0520)         (0.131)
Colonial          -0.366**      0.0760           0.186
                  (0.157)       (0.343)          (0.266)
Legal             0.107***      0.0417           0.631***
                  (0.0274)      (0.0366)         (0.114)
Religion          0.202***      0.116            0.293
                  (0.0580)      (0.0749)         (0.183)
CU                0.524***      0.422*           3.380***
                  (0.153)       (0.231)          (0.657)
FTA               1.458***      1.307***         1.120***
                  (0.168)       (0.257)          (0.408)
Neither Reports                                  -∞
ρ (columns [2]-[3])             -0.548 (0.103)
Observations      30450         30450            30450
*** p<0.01, ** p<0.05, * p<0.1
Standard errors in parentheses
Importer and Exporter dummies
Table 6: Zero-trade and Reporting Probits, Full Sample
                [1]            [2]             [3]             [4]
                HMR            TB - Reg excl   TB - Reg incl   TB - Full
log(Distance)   -0.919***      0.0230          0.119           -3.252***
                (0.137)        (0.197)         (0.202)         (0.486)
Border          0.572***       0.0408          0.0107          1.578***
                (0.175)        (0.156)         (0.156)         (0.378)
Island          0.279          -35.94***       -39.21***       1.055***
                (0.190)        (6.064)         (6.193)         (0.128)
Landlock        0.0157         0.186           0.200           0.543***
                (0.196)        (0.196)         (0.196)         (0.187)
Colonial        0.591***       -44.65***       -48.67***       1.073***
                (0.156)        (7.530)         (7.688)         (0.114)
Language        0.120          -0.857***       -0.954***       1.771***
                (0.126)        (0.194)         (0.198)         (0.361)
Legal           0.262***       0.330***        0.335***        0.407***
                (0.0677)       (0.0611)        (0.0611)        (0.0467)
Religion        0.420***       0.416***        0.406***        0.644***
                (0.153)        (0.129)         (0.129)         (0.104)
CU              0.989**        -10.75***       -11.73***       2.343***
                (0.405)        (1.928)         (1.969)         (0.290)
FTA             0.689          0.753***        0.821***        3.884***
                (0.505)        (0.206)         (0.207)         (0.630)
Reg: cost                                      -0.0362
                                               (0.0995)
Reg: days                                      -0.377***
                                               (0.112)
η̂∗ij            0.130
                (0.668)
H∗Rij                          0.000186        0.000827        -0.605***
                               (0.00743)       (0.00743)       (0.169)
H∗Zij                          -0.450***       -0.483***       3.741***
                               (0.0963)        (0.0977)        (0.426)
ẑ∗ij            -2.683         1.519***        1.610***        1.918**
                (4.398)        (0.186)         (0.191)         (0.877)
(ẑ∗ij)^2        3.385          -0.0572***      -0.0583***      -1.110***
                (3.204)        (0.00718)       (0.00730)       (0.280)
(ẑ∗ij)^3        -1.345         0.00225***      0.00230***      0.150**
                (1.238)        (0.000348)      (0.000352)      (0.0630)
(ẑ∗ij)^4        0.269          -4.45e-05***    -4.54e-05***    -0.0105
                (0.270)        (8.07e-06)      (8.14e-06)      (0.00751)
(ẑ∗ij)^5        -0.0293        4.64e-07***     4.74e-07***     0.000357
                (0.0331)       (9.59e-08)      (9.66e-08)      (0.000484)
(ẑ∗ij)^6        0.00164        -2.44e-09***    -2.49e-09***    -4.22e-06
                (0.00212)      (5.63e-10)      (5.66e-10)      (1.59e-05)
(ẑ∗ij)^7        -3.69e-05      0***            0***            -1.60e-08
                (5.49e-05)     (0)             (0)             (2.07e-07)
Observations    7327           7327            7327            14503
R2              0.695          0.690           0.691           0.719
Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Importer and Exporter dummies
Table 7: Polynomial Approximations of Intensive Margin
There is very little loss of efficiency from including the regulation data in col-
umn [3] relative to column [2], or from estimating the polynomial equation on
the full sample without an additional exclusion restriction.
For the regulation sample, the variables Island, Colonial and Currency Union
continue to be poorly estimated, as in the non-linear estimation. The higher
order polynomial terms in this sample are very statistically significant, which
suggests that the approximation may be somewhat inaccurate and even more
high-order terms should be included. This may also explain why the coefficients
of some of the trade barriers here (notably distance) are somewhat different from
those in the non-linear specification. The implications for the validity of HMR’s
exclusion restriction are similar to those from the non-linear specification, with
Reg Cost appearing excludable but not Reg Days.
Column [4] presents results for the full sample. The polynomial approxima-
tion appears to be less dependent on higher order terms here, and the coefficients
on all of the trade barriers are relatively tightly estimated. The sample correc-
tion and heterogeneity-bias terms are much more statistically significant using
the modified procedures than under HMR’s original, which suggests that the
modifications are helping to identify these effects more accurately.
5 Conclusions
Distinguishing effects of trade barriers on the intensive and extensive margins of
trade is a growing area of research, and HMR have provided an elegant frame-
work with which to disentangle these effects. However, when bringing their
approach to the data, it is important to recognise the limitations of existing
databases. In particular, insufficient attention has been given so far to distin-
guishing trade flows that are actually zero from those that are unreported. This
issue is likely to be even more urgent for scholars using disaggregated trade data,
for which the likelihood of underreporting is presumably somewhat higher.
While this distinction is unimportant for traditional gravity equation estima-
tion, it is potentially very important when using the presence of positive trade
in the aggregate to identify fixed costs that deter entry into export markets, as
suggested by HMR. Classifying unreported trade as zero will tend to misclassify
some positive trade flows as zero, reducing the accuracy of first-stage probits.
The use of generated regressors derived from these estimates may then transfer
this noise into second-stage estimates.
Since the decision to report a flow seems likely to be correlated with its
size, the omission of some observations due to non-reporting is a classic sample
selection problem. Heckman’s correction procedure is extended to the case of
multiple selection criteria, using Poirier’s partially observed bivariate probit
model. A natural exclusion restriction exploiting information on countries that
did not participate in the Comtrade database at all allows us to distinguish the
effects of non-reporting from non-trading.
In addition to correcting for possible bias due to sample selection, this dual
selection model provides a means of identifying HMR’s intensive margin of trade,
without needing an exclusion restriction that distinguishes fixed from variable
trade costs. This is appealing, as the two exclusion restrictions they propose
in their original paper both have drawbacks. This may facilitate application of
HMR’s technique in a broader set of applications, where a suitable alternative
exclusion restriction is unavailable.
The results from the modified procedure are supportive of HMR’s findings
that unobserved and heterogeneous firm-level productivity is a determinant of
the volume of trade flows, and that failure to control for this may bias estimates
of the intensive margin of trade. Distinguishing a failure to report from a failure
to trade helps to estimate both the intensive and extensive margins of trade with
higher accuracy.
A Countries in the Sample
Albania           Ecuador        Madagascar          S.African CU
Algeria           Egypt          Malawi              S.Korea
Angola            El Salvador    Malaysia            Saudi Arabia
Argentina         Ethiopia       Maldives            Senegal
Australia         Fiji           Mali                Sierra Leone
Austria           Finland        Mauritania          Singapore
Bangladesh        France         Mexico              Solomon Islands
Belgium           Ghana          Mongolia            Spain
Benin             Greece         Morocco             Sri Lanka
Bhutan            Guatemala      Mozambique          Sweden
Bolivia           Guinea         Nepal               Switzerland
Brazil            Haiti          Netherlands         Syria
Bulgaria          Honduras       New Zealand         Tanzania
Burkina Faso      Hong Kong      Nicaragua           Thailand
Burundi           Hungary        Niger               Togo
Cambodia          India          Nigeria             Tunisia
Cameroon          Indonesia      Norway              Turkey
Canada            Iran           Oman                UAE
CAR               Ireland        Pakistan            Uganda
Chad              Israel         Panama              UK
Chile             Italy          Papua New Guinea    Uruguay
China             Jamaica        Paraguay            USA
Colombia          Japan          Peru                USSR
Costa Rica        Jordan         Philippines         Venezuela
Cote d’Ivoire     Kenya          Poland              Vietnam
Czechoslovakia    Kiribati       Portugal            West Germany
Denmark           Kuwait         Rep Congo           Yugoslavia
Dominican Rep     Laos           Romania             Zambia
DR Congo          Lebanon        Rwanda              Zimbabwe
Table 8: 116 Countries with Regulation Data
Albania              Fiji             Morocco              Uganda
Algeria              Finland          Mozambique           United Kingdom
Angola               France           Nepal                United States
Antigua & Barbuda    Gabon            Netherlands          Uruguay
Argentina            Gambia           New Zealand          Venezuela
Australia            Germany, West    Nicaragua            Vietnam
Austria              Ghana            Niger                Zambia
Bahamas              Greece           Nigeria              Zimbabwe
Bahrain              Grenada          Norway               Maldives
Bangladesh           Guatemala        Oman                 Somalia
Barbados             Guinea           Pakistan             New Caledonia
Belgium              Guinea-Bissau    Panama               French Polynesia
Belize               Guyana           Papua New Guinea     Macao
Benin                Haiti            Peru                 Marshall Islands
Bermuda              Honduras         Philippines          Micronesia
Bhutan               Hong Kong        Poland               Vanuatu
Bolivia              Hungary          Portugal             Reunion
Brazil               Iceland          Qatar                St.Pierre & Miquelon
Brunei               India            Rwanda               Guadeloupe
Bulgaria             Indonesia        Samoa                Martinique
Burkina Faso         Iran             Saudi Arabia         Neth.Antilles
Burundi              Iraq             Senegal              French Guiana
Cameroon             Ireland          Seychelles           East Germany
Canada               Israel           Sierra Leone         Faeroe Islands
Cape Verde           Italy            Singapore            Cayman Islands
CAR                  Jamaica          Solomon Islands      Cuba
Chad                 Japan            S. Africa CU         N.Korea
Chile                Jordan           Spain                Myanmar
China                Kenya            Sri Lanka            Turks & Caicos
Colombia             Kiribati         St.Kitts & Nevis     Western Sahara
Comoros              S.Korea          St.Lucia             North Yemen
DR Congo             Kuwait           St.Vincent           Afghanistan
Rep Congo            Lao              Sudan                Cambodia
Costa Rica           Liberia          Suriname             Czechoslovakia
Cote d’Ivoire        Libya            Sweden               Djibouti
Cyprus               Madagascar       Switzerland          Greenland
Denmark              Malawi           Syria                Lebanon
Dominica             Malaysia         Thailand             Paraguay
Dominican Rep        Mali             Togo                 Romania
Ecuador              Malta            Tonga                USSR
Egypt                Mauritania       Trinidad & Tobago    Tanzania
El Salvador          Mauritius        Tunisia              Yugoslavia
Equatorial Guinea    Mexico           Turkey               South Yemen
Ethiopia             Mongolia         UAE
Table 9: 175 Countries in Full Sample
Algeria           Egypt            Libya                   Saudi Arabia
Argentina         El Salvador      Macao                   Senegal
Australia         Ethiopia         Madagascar              Seychelles
Austria           Faeroe Is        Malawi                  Singapore
Bahamas           Fiji             Malaysia                Solomon Is
Bahrain           Finland          Malta                   South Korea
Bangladesh        France           Martinique              Spain
Barbados          French Guiana    Mauritius               Sri Lanka
Belgium           Greece           Mexico                  St. Pierre & Miquelon
Belize            Greenland        Morocco                 St.Kitts & Nevis
Bolivia           Grenada          Nepal                   St.Lucia
Brazil            Guadeloupe       Netherlands             Sweden
Brunei            Guatemala        Netherlands Antilles    Switzerland
Cameroon          Honduras         New Zealand             Syria
Canada            Hong Kong        Nicaragua               Thailand
Chile             Hungary          Nigeria                 Togo
China             Iceland          Norway                  Tonga
Colombia          India            Oman                    Trinidad & Tobago
Costa Rica        Indonesia        Pakistan                Tunisia
Cyprus            Ireland          Panama                  Turkey
Czechoslovakia    Israel           Papua New Guinea        UAE
Denmark           Italy            Paraguay                UK
Djibouti          Jamaica          Peru                    Uruguay
Dominica          Japan            Philippines             USA
Dominican Rep     Jordan           Poland                  Venezuela
DR Congo          Kenya            Portugal                West Germany
East Germany      Kiribati         Rep Congo               Yugoslavia
Ecuador           Kuwait           Reunion                 Zimbabwe
Table 10: 112 Countries That Report Some Trade in 1986
B The Devil is in the Detail
While the most important changes to the data originally used by HMR are to the
trade data, switching from the Feenstra to the UN data, for the reasons outlined
above, there are some ‘quibbles’ with the covariates of the gravity equation in
the original dataset. Since HMR’s data is derived from the widely-used Glick
and Rose dataset, it seems worthwhile to describe the extensive changes made
to the data used here.
B.1 Distance
There are many ways one could try to measure the distance between countries23.
HMR describe their distance variable as being the log distance in km between
country capitals24. Unfortunately it is clear that the data they use does not
correspond to this. Their minimum value, associated with the log(distance)
between the capitals of Qatar and Bahrain, Doha and Manama, is -0.1505518,
which implies a distance of 0.86 km. The true great circle distance between these
cities is 142 km25. The maximum value in the HMR data is 5.660652, between
the capitals of New Zealand and Canada, Wellington and Ottawa, which implies
a distance of 287 km. The true great circle distance between these cities is 14452
km.
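The distances implied by HMR's logged values, and a rough cross-check of the Doha-Manama figure, can be reproduced with the short sketch below. Two caveats apply: the sketch uses the spherical haversine formula rather than Vincenty's ellipsoidal algorithm used in this paper, and the city coordinates are approximate values entered by hand for illustration, not taken from the dataset. The function and variable names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Distances implied by exponentiating the minimum and maximum logged values.
implied_min_km = math.exp(-0.1505518)   # Doha-Manama in the HMR data
implied_max_km = math.exp(5.660652)     # Wellington-Ottawa in the HMR data

# Approximate (assumed) coordinates: latitude, longitude in decimal degrees.
doha = (25.29, 51.53)
manama = (26.23, 50.59)
doha_manama_km = haversine_km(*doha, *manama)
```

Under these assumptions the implied distances are roughly 0.86 km and 287 km, while the haversine distance between Doha and Manama comes out at around 140 km, close to the 142 km obtained from the ellipsoidal calculation.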
The shortest distance in the sample is between the capitals of the Republic
of Congo and the Democratic Republic of Congo, Brazzaville and Kinshasa,
which are divided by the Congo river, and are 8 km apart 26. The next closest
pair of capitals are Damascus, Syria, and Beirut, Lebanon 27. The most distant
23 The measure depends both upon the choice of point from which to measure a country’s location, such as capital city or largest city, and upon the choice of measure, such as great circle distance, or distance in degrees. On an ellipsoid the great circle distance gives the most accurate measure of physical separation.
24 HMR, [13], p.479
25 I use Vincenty’s algorithm [18] to calculate the great-circle distances. A useful website for calculating distances between world cities is http://www.infoplease.com/atlas/calculate-distance.html
26 2.14 in logs
27 84 km, 4.43 in logs. HMR measure 12 capital-pairs as being closer to each other than Damascus and Beirut.
capital-pair is Taipei, Taiwan, and Asuncion, Paraguay28. The next furthest
pair is Madrid, Spain, and Wellington, New Zealand29.
There is no obvious transformation of HMR’s variable that can align it with
the true great-circle distances, as both the scales and ordering are distorted. The
positive correlation between their measure and the great-circle distances shows
that they are related somehow, but nevertheless their measure is surprisingly
inaccurate as a measure of distance between national capitals.
B.2 Common Border
Sharing a common border is a well-defined concept: sharing a land border.
Table 11 lists the instances in which HMR’s dataset either codes two countries
as sharing a border when they do not, or vice versa.
B.3 Island
There is a little room for interpretation in what constitutes an island, and
Table 12 sets out how I have interpreted this differently from HMR. The Turks
& Caicos are quite straightforwardly islands. Indonesia is a vast archipelago
of over 17500 islands. Papua New Guinea, Ireland, Haiti and the Dominican
Republic all share small landmasses with another neighbour. In my view this
still qualifies them to be islands, but one could certainly make the case that an
island should not share any land boundaries. In this case there would have to
be some reclassifications in the other direction to ensure consistency, such as the
UK, which shares a land boundary with the Republic of Ireland. Hong Kong
is divided between its historic island and the New Territories on the mainland
which border China. Given the central role that the island plays in Hong Kong,
I classified it as an island. Some might argue that as a continent Australia
should not be considered an island; however, in my view from an economic
point of view it seems to qualify (in that it doesn’t share a land border; and its
28 19894 km, 9.9 in logs
29 19829 km, 9.89 in logs. HMR measure 449 capital-pairs as being further apart than Madrid and Wellington
HMR codes border; TB does not        HMR does not code border; TB does
Egypt             Jordan             Tanzania          Rwanda
Egypt             Saudi Arabia       Tanzania          Uganda
Bahrain           Qatar              Tanzania          DRC
Bahrain           Saudi Arabia       Tanzania          Kenya
Lebanon           Turkey             Tanzania          Malawi
UAE               Qatar              Tanzania          Mozambique
Trinidad          Venezuela          Tanzania          Burundi
El Salvador       Nicaragua          El Salvador       Honduras
Malaysia          Singapore          Malaysia          Brunei
Sweden            Denmark            Malaysia          Indonesia
Czechoslovakia    Norway             Czechoslovakia    Austria
                                     Hungary           Austria
                                     Hungary           Yugoslavia
                                     USSR              Norway
                                     USSR              Iran
                                     Angola            Congo
                                     Rwanda            Burundi
                                     Tanzania          Zambia
                                     Djibouti          Somalia
                                     Colombia          Peru
                                     Belize            Guatemala
                                     Cambodia          Laos
                                     Cambodia          Thailand
                                     Cambodia          Vietnam
                                     China             Hong Kong
                                     China             India
                                     North Korea       South Korea
Table 11: Differences in coding of the common border dummy
HMR codes island; TB does not        HMR does not code island; TB does
                                     Turks & Caicos
                                     Papua New Guinea
                                     Ireland
                                     Indonesia
                                     Hong Kong
                                     Haiti
                                     Dominican Republic
                                     Australia
Table 12: Differences in coding of the island dummy
geographic size is not reflected in a proportionally large population, so that one
would not expect it to be unusually autarchic).
In their paper HMR describe their island variable as being one if both coun-
tries are an island, and zero otherwise30. However, the variable that they ac-
tually use in their empirical work is one if at least one country is an island,
and zero if neither are. While this may be of limited importance in interpreting
the coefficient, it does mean that a change in classification of a country from
not-island to island affects quite a lot of observations (the number of countries
less the number already coded as islands).
B.4 Landlock
Table 13 reports differences in the landlock dummy. Although modern Ethiopia
is a land-locked country, this is only since the secession of the province of Er-
itrea in 1993. Since all the data precedes this period and applies to the united
Ethiopia, it should be classified as not landlocked.
Similarly to the island variable, in their paper HMR report that landlock
is 1 if both countries are landlocked and zero otherwise. However in their em-
pirical work it is 1 if either country is landlocked, and zero if neither are. For
consistency I follow this formulation also.
30 HMR [13], p.480
HMR codes landlocked; TB does not    HMR does not code landlocked; TB does
Syria                                Rwanda
Ethiopia
Table 13: Differences in coding of the landlock dummy
B.5 Common Legal System
HMR’s data is based on the dataset of La Porta et al [14], which in turn is
derived from the work of Flores and Reynolds [9], and there do not appear to
be any inconsistencies in this variable.
B.6 Common Language
There is a lot of scope for interpretation as to whether two countries share a
common language, especially in the absence of reliable international data on
what percentage of the population speak particular languages.
HMR is misleading on the construction of this variable, which they suggest
is one if both countries share the same primary language31, as indicated by the
CIA World Factbook. However, in constructing the variable they designated a
common language in many cases when only a small minority of people in either
country could possibly share a common language. In part this is because the
CIA World Factbook is quite inconsistent in its description of languages across
countries32, and does not generally distinguish between primary and secondary
languages. On this basis, English would form a global lingua franca: the
dummy would be one for all country-pairs and would not correspond to a measure
of the ability of citizens of different countries to communicate.
The dummy is reconstructed based on countries’ official languages only. This
leads to so many changes that it is not possible to present them concisely in a
table. 1694 country-pairs are reclassified as not sharing a language33. There are
31 HMR [13] p.478
32 Compare Libya: “Arabic, Italian and English, all are widely understood in the major cities”; Argentina: “Spanish (official), Italian, English, German, French”; USA: “English 82.1%, Spanish 10.7%, other Indo-European 3.8%, Asian and Pacific island 2.7%, other 0.7% (2000 census)”; Greece: “Greek 99% (official), other 1% (includes English and French)”
33 Representative examples are Greece-Chad, Laos-Sierra Leone, and Argentina-Syria
also 584 country-pairs reclassified as sharing an official language34.
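A minimal sketch of an official-language rule of this kind follows. The language sets cover only the handful of countries named in the examples, are entered by hand for illustration, and are not the dataset used in the paper; the function name is likewise an assumption.

```python
# Official languages for a few countries (illustrative subset entered by hand,
# not the paper's dataset).
OFFICIAL_LANGUAGES = {
    "Austria": {"German"},
    "Germany": {"German"},
    "UK": {"English"},
    "India": {"Hindi", "English"},
    "Greece": {"Greek"},
    "Chad": {"French", "Arabic"},
}

def common_official_language(i, j, official=OFFICIAL_LANGUAGES):
    """Return 1 if the two countries share at least one official language, else 0."""
    return int(bool(official[i] & official[j]))

austria_germany = common_official_language("Austria", "Germany")
uk_india = common_official_language("UK", "India")
greece_chad = common_official_language("Greece", "Chad")
```

Because the rule is a set intersection, the dummy is symmetric in the two countries by construction.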
B.7 Colonial Heritage
Defining a colonial relationship is not straightforward, and coding a dummy
should also reflect how long the relationship lasted, and how long ago it ended.
In previous work35 I built up a dataset of annual colonial relationships since
1500, where I interpreted a colonial relationship as administration of one (or a
major part of one) country by another, often accompanied by settlement or
military occupation. Discounting and summing up colony-years gives a measure of
the extent of the colonial legacy today. I applied a cut-off to the data to exclude
colonial relationships that were very brief or ended a long time ago. Italy-Ethiopia
and Iraq-UK were not classed as colonial because they both fell below the cut-off of
10 discounted colony-years36. Table 15 gives the other country-pairs classed as
colonies by HMR that on balance I did not count as colonies.
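The discounting-and-cut-off rule can be sketched as below. The text gives the cut-off (10 discounted colony-years) and the dates of the two example relationships, but not the discounting scheme, so the geometric form, the 1% annual rate and the 1980 reference year are all assumptions made purely for illustration.

```python
def discounted_colony_years(start, end, reference_year, rate):
    """Sum of discounted colony-years for a relationship covering start..end inclusive.

    Each year of colonial rule is weighted by (1 - rate) ** (reference_year - year),
    so that years further in the past count for less. The geometric form is an
    assumption of this sketch, not the paper's documented method.
    """
    return sum(
        (1.0 - rate) ** (reference_year - year) for year in range(start, end + 1)
    )

CUTOFF = 10.0          # cut-off stated in the text
REFERENCE_YEAR = 1980  # assumption: first year of the sample
RATE = 0.01            # assumption: hypothetical annual discount rate

def is_colonial(start, end):
    """Apply the cut-off to a single colonial relationship."""
    return discounted_colony_years(start, end, REFERENCE_YEAR, RATE) >= CUTOFF

# Dates taken from the footnote: Iraq as a British mandate, Italy's occupation of Ethiopia.
iraq_uk = is_colonial(1918, 1932)
ethiopia_italy = is_colonial(1936, 1940)
# Hypothetical century-long relationship, for contrast.
long_rule = is_colonial(1850, 1949)
```

Under these assumed parameters both example pairs fall below the cut-off while a long relationship clears it; other discount rates would move the boundary.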
Table 14 lists the colonial relationships that I felt HMR had omitted. Even
though the Ottoman empire ended in 1918, the length of its rule over its
dominions has left a strong colonial heritage into the modern era. The USSR’s
relationships with its satellites in Eastern Europe have many of the hallmarks of
traditional colonialism and are included. The colonies of some of the smaller
European colonial powers, such as Belgium’s African colonies, were also left out by HMR.
B.8 Currency Unions
Countries can share a currency because they jointly choose to adopt the same
currency, because one country unilaterally adopts the currency of another, like
Liberia and the US dollar, or because two countries have both adopted the
currency of a third, like Liberia and Bermuda, which both use the US dollar.
34 For example Congo-Central African Republic (French), UK-India (English), Austria-Germany (German)
35 Baranga, “Identifying Relationships Between Income and Faith” [3]
36 Ethiopia was invaded by Italy in 1936 and occupied by them until January 1941; Iraq was a British mandate between 1918 and 1932. After discounting this came in just under the threshold.
HMR does not code colonial; TB does
Equatorial Guinea, Portugal | Algeria, Turkey
Ghana, Portugal | Libya, Turkey
Kenya, Portugal | Tunisia, Turkey
Tanzania, Portugal | Egypt, Turkey
Oman, Portugal | Sudan, Turkey
Sri Lanka, Portugal | Israel, Turkey
Indonesia, Portugal | Cyprus, Turkey
Malaysia, Portugal | Iraq, Turkey
Ghana, Denmark | Jordan, Turkey
Norway, Denmark | Kuwait, Turkey
South Africa, Netherlands | Lebanon, Turkey
Ghana, Netherlands | Qatar, Turkey
Mauritius, Netherlands | Saudi Arabia, Turkey
Guyana, Netherlands | Syria, Turkey
Sri Lanka, Netherlands | Yemen, Turkey
Malaysia, Netherlands | Hungary, Turkey
Maldives, Netherlands | Yugoslavia, Turkey
Liberia, USA | Western Sahara, Spain
Dominican Rep, USA | USA, Spain
Haiti, USA | Uruguay, Spain
Philippines, USA | Nicaragua, Spain
Vietnam, USA | Haiti, Spain
USA, France | Jamaica, Spain
Mauritius, France | Philippines, Spain
Seychelles, France | Belgium, Spain
Canada, France | Netherlands Antilles, Spain
St.Kitts, France | Trinidad, Spain
Germany, France | Italy, Spain
South Korea, Japan | Cameroon, Germany
North Korea, Japan | Rwanda, Germany
Taiwan, Japan | Togo, Germany
Bahrain, Iran | Tanzania, Germany
Hong Kong, China | PNG, Germany
Belgium, Austria | Cameroon, UK
Italy, Austria | Tanzania, UK
Czechoslovakia, Austria | UAE, UK
Hungary, Austria | Bangladesh, UK
Germany, USSR | Burundi, Belgium
Finland, USSR | DRC, Belgium
Bulgaria, USSR | Rwanda, Belgium
Czechoslovakia, USSR | Finland, Sweden
Hungary, USSR | Norway, Sweden
Poland, USSR | Romania, USSR
Table 14: Differences in coding of the colonial dummy
HMR codes colonial; TB does not
Ethiopia, Italy
Iraq, UK
Germany, UK
Nicaragua, Colombia
Bangladesh, Pakistan
Bhutan, India
Table 15: Differences in coding of the colonial dummy
These should be equivalent for the effects of sharing a currency on bilateral
trade flows, as they have the same effect on bilateral exchange rate volatility.
Including only center-periphery members as sharing a currency and excluding
the periphery-periphery pairs is likely to bias up estimates of the effects of
sharing a currency, as trade is likely to be higher (for other reasons) between
country pairs that have selected into a direct currency sharing arrangement.
Table 16 lists the country pairs sharing a single currency omitted by HMR.
Western Sahara has used the Moroccan dirham since 1976 and the Turks and
Caicos have used the US dollar since 196937.
Liberia, Bermuda, the Bahamas, the Turks and Caicos and Panama were all
using the US dollar during the 1980s, and shared a currency with each other as
well as with the United States. Guadeloupe, Reunion and French Guiana had
a similar relationship because of their mutual use of the French Franc.
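The inclusion of periphery-periphery pairs amounts to pairing every two countries that use the same currency, whether or not either is the issuer. A minimal sketch follows, using the 1980s currency users named in the text; listing France as a French Franc user and the function name are additions made for this illustration.

```python
from itertools import combinations

# Currency in use during the 1980s, as described in the text. France is added
# here as the issuer of the French Franc (an addition for this sketch).
CURRENCY = {
    "USA": "US dollar",
    "Liberia": "US dollar",
    "Bermuda": "US dollar",
    "Bahamas": "US dollar",
    "Turks & Caicos": "US dollar",
    "Panama": "US dollar",
    "France": "French Franc",
    "Guadeloupe": "French Franc",
    "Reunion": "French Franc",
    "French Guiana": "French Franc",
}

def currency_union_pairs(currency):
    """Return all unordered country pairs that use the same currency."""
    members = {}
    for country, name in currency.items():
        members.setdefault(name, []).append(country)
    pairs = set()
    for group in members.values():
        # Every pair within a group shares a currency, including pairs
        # in which neither country is the issuer.
        pairs.update(combinations(sorted(group), 2))
    return pairs

pairs = currency_union_pairs(CURRENCY)
```

Coding only centre-periphery pairs would keep pairs such as ("Liberia", "USA") but drop pairs such as ("Bermuda", "Liberia"), which is the omission discussed above.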
Table 17 lists the country-pairs coded as sharing a currency by HMR which I
have excluded. The country-pairs in Table 17 have mostly been members of the
CFA Franc at some point. Under the umbrella of the CFA Franc are actually
several separate currencies in a fixed exchange rate arrangement, of which the
two main groups are the West African CFA Franc38 and the Central African
CFA Franc39. Both these CFA Francs have occasionally devalued against the
French Franc, but they have done so simultaneously, and they have maintained a
fixed rate against each other since inception in 1946, so it is reasonable to treat the two groups as
37 See https://www.globalfinancialdata.com/gh/GHC Histories.xls
38 Members in 1980 were Benin, Burkina Faso, Cote d’Ivoire, Niger, Senegal and Togo
39 Members in 1980 were Cameroon, Central African Republic, Chad, Congo and Gabon.
HMR does not code CU; TB does | Years
Western Sahara, Morocco | 1980-89
USA, Turks & Caicos | 1980-89
Liberia, Bermuda | 1980-89
Liberia, Bahamas | 1980-89
Liberia, Turks & Caicos | 1980-89
Liberia, Panama | 1980-89
Bahamas, Bermuda | 1980-89
Turks & Caicos, Bermuda | 1980-89
Panama, Bermuda | 1980-89
Bahamas, Turks & Caicos | 1980-89
Bahamas, Panama | 1980-89
Turks & Caicos, Panama | 1980-89
Guadeloupe, Reunion | 1980-89
Reunion, French Guiana | 1980-89
Guadeloupe, French Guiana | 1980-89
Table 16: Differences in coding of the currency union dummy
forming one currency area.
The Comorian Franc has also been fixed against the French Franc since its
independence from Madagascar, but it has not always coordinated its devalu-
ations with other CFA members40. This suggests that the Comorian Franc is
more appropriately thought of as having a fixed exchange rate against the CFA
Franc rather than participating in a currency union with the other members.
Madagascar withdrew from the CFA Franc in 1972; Mali withdrew in 1962
and rejoined in 1984; Equatorial Guinea first joined in 1984; Guinea-Bissau did
not join until 199741. This accounts for the bulk of the disagreements in Table
17.
Both Reinhart and Rogoff and the Global Financial Data database suggest
that during the 1980s the Dominican Republic and Guatemala had indepen-
dent currencies loosely pegged to the US $ but not in a currency union. The
Netherlands Antilles and Suriname also appear to have had independent cur-
rencies. Qatar and the UAE separated their currencies in 1973. South Africa
40 For example in 1994 the Comorian Franc devalued by 1/3 against the French Franc while other CFA members devalued by 1/2.
41 See https://www.globalfinancialdata.com/gh/GHC Histories.xls and the appendix of Reinhart and Rogoff [16]
and Switzerland have never had a link between their currencies42.
B.9 Free Trade Agreements
Coding FTAs is also far from straightforward, as frequently agreements are
signed but not transparently implemented, and the degree of implementation
might be close to zero. This has been the case for many FTAs between de-
veloping countries, such as LAFTA43, the Andean Pact and Central American
Common Market44, and the many African FTAs45.
HMR ignore nominal developing world FTAs with the exception of CACM.
Following Pomfret, I exclude Honduras from the CACM FTA. Although the
Bahamas joined the Caribbean Community (CARICOM) in 1983, they opted
out of the customs union so I dropped their FTA dummy. The Australia New
Zealand Closer Economic Relations Trade Agreement came into force January
1st 1983, so I dropped the FTA dummy for 1981 and 1982.
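Year-specific corrections of this kind reduce to switching the dummy on only for the years in which an agreement was in force. The sketch below uses the dates given in the text for the Australia-New Zealand agreement and for Greenland's EC membership; the function name and the convention that the final year is inclusive are assumptions of the illustration.

```python
def fta_dummy(year, start_year, end_year=None):
    """Return 1 if an agreement in force from start_year (to end_year, if any) covers `year`.

    end_year is treated as inclusive; None means the agreement is still in force.
    """
    if year < start_year:
        return 0
    if end_year is not None and year > end_year:
        return 0
    return 1

SAMPLE_YEARS = range(1980, 1990)  # the 1980-89 sample

# Australia-New Zealand CER came into force on 1 January 1983 (from the text).
anz = {year: fta_dummy(year, start_year=1983) for year in SAMPLE_YEARS}

# Greenland joined the EC in 1973 and withdrew in 1985 (from the text).
greenland_ec = {
    year: fta_dummy(year, start_year=1973, end_year=1985) for year in SAMPLE_YEARS
}
```

With these dates the Australia-New Zealand dummy is zero before 1983, and the Greenland dummy is switched off from 1986 onwards.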
The largest group of country-pairs with FTAs added to the sample arises from
recognising EFTA as a fully functional FTA, together with its bilateral FTA with
the EC following the accession of several of its members to the EC in 197346. These
country-pairs are listed in Table 19.
In addition there are several country pairs belonging to FTAs that HMR
recognise, but which have been omitted in error, sometimes for a few years
42 This may be the result of a confusion between Switzerland and Swaziland, which is in a currency union with South Africa, but is not in the sample.
43 ‘LAFTA was a far looser arrangement than the EEC. By 1967 disenchantment with its progress led to fragmentation as five Andean nations began negotiations, which culminated in the 1969 Cartagena Agreement creating the Andean Pact In 1980 LAFTA was replaced by the more “flexible” (i.e. with little binding content) Latin American Integration Association,’ Pomfret [15], pp.99-100.
44 ‘The more homogeneous Andean Pact was initially more dynamic [than LAFTA], but in late 1975 a crisis arose over the external tariff, sector plan, and election of new Pact authorities, after which the integration process slowed down. The Central American Common Market functioned reasonably well for most of its first decade, but in 1968 Nicaragua introduced discriminatory measures against the other CACM members and in 1970 Honduras seceded,’ Pomfret [15], p.100.
45 ‘Despite their profusion, the African integration schemes have had little impact on world trade... the schemes themselves have been ravaged by distributional conflicts as new nations have been jealous of their policy-making autonomy,’ Pomfret [15], p.102.
46 ‘In July 1972 the EC signed bilateral free trade agreements in non-agricultural goods with the remaining EFTA countries and Iceland,’ Pomfret [15] p.92; ‘For trade policy purposes, the economic integration of Western Europe was essentially complete,’ Pomfret [15] p.93.
HMR codes CU; TB does not | Years | HMR codes CU; TB does not | Years
Comoros, Benin | 1980-89 | Mali, Benin | 1980-84
Comoros, Burkina Faso | 1980-89 | Mali, Burkina Faso | 1980-84
Comoros, Cameroon | 1980-89 | Mali, Cameroon | 1980-84
Comoros, CAR | 1980-89 | Mali, CAR | 1980-84
Comoros, Chad | 1980-89 | Mali, Chad | 1980-84
Comoros, Congo | 1980-89 | Mali, Congo | 1980-84
Comoros, Cote d’Ivoire | 1980-89 | Mali, Cote d’Ivoire | 1980-84
Comoros, Eq. Guinea | 1980-89 | Mali, Eq. Guinea | 1980-84
Comoros, Gabon | 1980-89 | Mali, Gabon | 1980-84
Comoros, Mali | 1980-89 | Mali, Guinea-Bissau | 1980-84
Comoros, Madagascar | 1980-81 | Mali, Madagascar | 1980-81
Comoros, Niger | 1980-89 | Mali, Niger | 1980-84
Comoros, Senegal | 1980-89 | Mali, Senegal | 1980-84
Comoros, Togo | 1980-89 | Mali, Togo | 1980-84
Eq. Guinea, Benin | 1980-84 | Guinea-Bissau, Benin | 1980-89
Eq. Guinea, Burkina Faso | 1980-84 | Guinea-Bissau, Burkina Faso | 1980-89
Eq. Guinea, Cameroon | 1980-84 | Guinea-Bissau, Cameroon | 1980-89
Eq. Guinea, CAR | 1980-84 | Guinea-Bissau, CAR | 1980-89
Eq. Guinea, Chad | 1980-84 | Guinea-Bissau, Chad | 1980-89
Eq. Guinea, Congo | 1980-84 | Guinea-Bissau, Congo | 1980-89
Eq. Guinea, Cote d’Ivoire | 1980-84 | Guinea-Bissau, Cote d’Ivoire | 1980-89
Eq. Guinea, Gabon | 1980-84 | Guinea-Bissau, Eq. Guinea | 1980-89
Eq. Guinea, Mali | 1980-84 | Guinea-Bissau, Gabon | 1980-89
Eq. Guinea, Niger | 1980-84 | Guinea-Bissau, Niger | 1980-89
Eq. Guinea, Senegal | 1980-84 | Guinea-Bissau, Senegal | 1980-89
Eq. Guinea, Togo | 1980-84 | Guinea-Bissau, Togo | 1980-89
Madagascar, Benin | 1980-81 | USA, Dominican Rep | 1980-84
Madagascar, Burkina Faso | 1980-81 | USA, Guatemala | 1980-85
Madagascar, Cameroon | 1980-81 | Qatar, UAE | 1980-89
Madagascar, CAR | 1980-81 | Neth. Antilles, Suriname | 1980-89
Madagascar, Chad | 1980-81 | South Africa, Switzerland | 1980-89
Madagascar, Congo | 1980-81 | |
Madagascar, Cote d’Ivoire | 1980-81 | |
Madagascar, Gabon | 1980-81 | |
Madagascar, Niger | 1980-81 | |
Madagascar, Senegal | 1980-81 | |
Madagascar, Togo | 1980-81 | |
Table 17: Differences in coding of the currency union dummy
and sometimes for the full sample. Greenland joined the EC with Denmark in
1973 and withdrew in 198547. Costa Rica and El Salvador remained members
of CACM through the 1980s. Jamaica, Guyana and St.Kitts were members of
CARICOM. The other entries are country-pairs that were members of the EC
but were not coded by HMR as being in an FTA.
B.10 Religion
HMR report that their measure of religious proximity is constructed as (%
Protestants in country i · % Protestants in country j) + (% Catholics in country
i · % Catholics in country j) + (% Muslims in country i · % Muslims in country
j), with the data taken from the CIA World Factbook48. This measure has the
limitation that countries for which these religions do not constitute a substan-
tial share will automatically be classified as religiously unrelated to all other
countries. This is misleading as there are significant concentrations of other
religious groups across country borders49.
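The stated measure, and the limitation just described, can be sketched as follows. The population shares are invented for illustration only and do not describe any real country; the function name and the optional list of groups are assumptions of the sketch.

```python
def religious_proximity(shares_i, shares_j, groups=("Protestant", "Catholic", "Muslim")):
    """Sum over religious groups of the product of the two countries' population shares."""
    return sum(shares_i.get(g, 0.0) * shares_j.get(g, 0.0) for g in groups)

# Hypothetical population shares (fractions), for illustration only.
country_a = {"Protestant": 0.5, "Catholic": 0.25}
country_b = {"Catholic": 0.5, "Muslim": 0.25}
country_c = {"Orthodox": 0.75, "Buddhist": 0.25}  # none of the three listed groups

ab = religious_proximity(country_a, country_b)
cc_three_groups = religious_proximity(country_c, country_c)
cc_wider = religious_proximity(
    country_c, country_c, groups=("Protestant", "Catholic", "Muslim", "Orthodox", "Buddhist")
)
```

With only three groups, the hypothetical country C has zero measured proximity even to itself; widening the list of groups removes that artefact.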
More seriously, HMR’s actual data is not consistent with their stated mea-
sure. Clearly the value for a particular country pair should be the same whichever
country is the importer or exporter. Unfortunately this is not the case in their
data, suggesting that something has gone wrong at their data compilation stage.
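A symmetry check of this kind is straightforward to run on any pair-level variable. The sketch below assumes the data are held as a mapping from (importer, exporter) to a value; the values shown are hypothetical and are not HMR's data.

```python
def asymmetric_pairs(values, tol=1e-9):
    """Return unordered pairs whose value differs by direction, or is missing one way."""
    flagged = set()
    for (i, j), value in values.items():
        reverse = values.get((j, i))
        if reverse is None or abs(value - reverse) > tol:
            flagged.add(tuple(sorted((i, j))))
    return sorted(flagged)

# Hypothetical (importer, exporter) -> religious proximity values.
religion = {
    ("A", "B"): 0.30,
    ("B", "A"): 0.30,
    ("A", "C"): 0.10,
    ("C", "A"): 0.45,  # differs from ("A", "C"), so the pair is flagged
}

flagged = asymmetric_pairs(religion)
```

A measure built from products of country-level shares should return an empty list here; any flagged pair indicates a problem in how the data were assembled.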
HMR use the low degree of correlation between religion and trade flows in
their regressions as justification for excluding the religion variable from their
gravity equation50. The scrambling of the data provides an alternative expla-
nation for the low correlation, and casts doubt on their claim that religious
proximity satisfies the exclusion restriction and can be omitted from the sec-
ond stage of their estimation, and hence on the estimates derived under that
assumption.
I reconstruct the data using Barrett et al. [4] which contains comprehensive
47 http://europa.eu/abc/history/1980-1989/1985/index en.htm
48 HMR [13] pp.478, 480
49 e.g. Orthodox Christians in Eastern Europe, or Buddhists in Asia
50 HMR [13] p.466
HMR codes FTA; TB does not | Years
Costa Rica, Honduras | 1980-89
El Salvador, Honduras | 1980-89
Guatemala, Honduras | 1980-89
Nicaragua, Honduras | 1980-89
Bahamas, Barbados | 1983-89
Bahamas, Jamaica | 1983-89
Bahamas, St.Kitts | 1984-89
New Zealand, Australia | 1981-82

HMR does not code FTA; TB does | Years
Belgium, Greenland | 1980-85
Denmark, Greenland | 1980-85
France, Greenland | 1980-85
Germany, Greenland | 1980-85
Greece, Greenland | 1981-85
Ireland, Greenland | 1980-85
Italy, Greenland | 1980-85
Netherlands, Greenland | 1980-85
UK, Greenland | 1980-85
Denmark, Belgium | 1980-89
France, Belgium | 1980-89
Germany, Belgium | 1980-89
Greece, Belgium | 1981-89
Ireland, Belgium | 1980-89
Italy, Belgium | 1980-89
Netherlands, Belgium | 1980-89
Portugal, Belgium | 1986-89
Spain, Belgium | 1986-89
UK, Belgium | 1980-89
France, Denmark | 1980-89
Portugal, Denmark | 1986-89
Spain, Denmark | 1986-89
France, Ireland | 1980-89
France, UK | 1980-81
France, UK | 1983-89
Germany, Greece | 1981-89
Germany, UK | 1980-89
Greece, Italy | 1981-84
Greece, Italy | 1986-89
Greece, Netherlands | 1981-89
Italy, Ireland | 1980-89
Italy, Spain | 1986-88
UK, Spain | 1986-89
Costa Rica, El Salvador | 1980-89
Jamaica, St. Kitts | 1980-88
Guyana, St. Kitts | 1980-89
Table 18: Differences in coding of the FTA dummy: non-EFTA
HMR does not code FTA; TB does | Years | HMR does not code FTA; TB does | Years
Italy, Portugal | 1980-85 | Portugal, Greenland | 1980-85
Italy, Austria | 1980-89 | Austria, Greenland | 1980-85
Italy, Finland | 1986-89 | Iceland, Greenland | 1980-85
Italy, Iceland | 1980-89 | Norway, Greenland | 1980-85
Italy, Norway | 1980-89 | Switzerland, Greenland | 1980-85
Italy, Sweden | 1980-89 | Portugal, Belgium | 1980-85
Italy, Switzerland | 1980-89 | Austria, Belgium | 1980-89
Portugal, Netherlands | 1980-85 | Finland, Belgium | 1986-89
Austria, Netherlands | 1980-89 | Iceland, Belgium | 1980-89
Finland, Netherlands | 1986-89 | Norway, Belgium | 1980-89
Iceland, Netherlands | 1980-89 | Sweden, Belgium | 1980-89
Norway, Netherlands | 1980-89 | Switzerland, Belgium | 1980-89
Sweden, Netherlands | 1980-89 | Portugal, Denmark | 1980-85
Switzerland, Netherlands | 1980-89 | Austria, Denmark | 1980-89
UK, Portugal | 1980-85 | Finland, Denmark | 1986-89
Austria, Portugal | 1980-89 | Iceland, Denmark | 1980-89
Finland, Portugal | 1986-89 | Norway, Denmark | 1980-89
Iceland, Portugal | 1980-89 | Sweden, Denmark | 1980-89
Norway, Portugal | 1980-89 | Switzerland, Denmark | 1980-89
Sweden, Portugal | 1980-89 | France, Portugal | 1980-85
Switzerland, Portugal | 1980-89 | France, Austria | 1980-89
Austria, Spain | 1986-89 | France, Finland | 1986-89
Finland, Spain | 1986-89 | France, Iceland | 1980-89
Iceland, Spain | 1986-89 | France, Norway | 1980-89
Norway, Spain | 1986-89 | France, Sweden | 1980-89
Sweden, Spain | 1986-89 | France, Switzerland | 1980-89
Switzerland, Spain | 1986-89 | Germany, Portugal | 1980-85
UK, Austria | 1980-89 | Germany, Austria | 1980-89
UK, Finland | 1986-89 | Germany, Finland | 1986-89
UK, Iceland | 1980-89 | Germany, Iceland | 1980-89
UK, Norway | 1980-89 | Germany, Norway | 1980-89
UK, Sweden | 1980-89 | Germany, Sweden | 1980-89
UK, Switzerland | 1980-89 | Germany, Switzerland | 1980-89
Austria, Finland | 1986-89 | Greece, Portugal | 1981-85
Austria, Iceland | 1980-89 | Greece, Austria | 1981-89
Austria, Norway | 1980-89 | Greece, Finland | 1986-89
Austria, Sweden | 1980-89 | Greece, Iceland | 1981-89
Austria, Switzerland | 1980-89 | Greece, Norway | 1981-89
Iceland, Finland | 1986-89 | Greece, Sweden | 1981-89
Norway, Finland | 1986-89 | Greece, Switzerland | 1981-89
Sweden, Finland | 1986-89 | Portugal, Ireland | 1980-85
Switzerland, Finland | 1986-89 | Austria, Ireland | 1980-89
Iceland, Norway | 1980-89 | Finland, Ireland | 1986-89
Iceland, Sweden | 1980-89 | Iceland, Ireland | 1980-89
Iceland, Switzerland | 1980-89 | Norway, Ireland | 1980-89
Sweden, Norway | 1980-89 | Sweden, Ireland | 1980-89
Switzerland, Norway | 1980-89 | Switzerland, Ireland | 1980-89
Switzerland, Sweden | 1980-89 | |
Table 19: Differences in coding of the FTA dummy: EFTA
data on world religious adherence, using the data for 199051 and including a
broad range of 20 religious groups instead of just 3.
B.11 Regulation
HMR use the dataset of Djankov et al [5], and there are no anomalies to report
in the dataset. However, in their footnote 29 HMR misreport which countries have
regulation data available52. Kiribati, the Maldives and Zaire are included in
the list although they are in the Djankov et al dataset, while the Netherlands
Antilles are not included in the list although the regulation data for them is
not available. In HMR’s regressions including regulation data they omit the
observations of country-pairs including the Maldives, Netherlands Antilles and
Zaire, but include those of Kiribati.
References
[1] J.Anderson and E.van Wincoop, “Gravity with Gravitas: A Solution to
the Border Puzzle,” American Economic Review, Mar 2003, v.93 no.1,
pp.170-192
[2] J.Baffoe-Bonnie, “Black-White Wage Differentials in a Multiple Sample
Selection Bias Model,” Atlantic Economic Journal, March 2009, v.37 no.1,
pp.1-16
[3] T.Baranga, “Identifying Relationships Between Income and Faith”, mimeo,
2007
[4] D. Barrett, G. Kurian and T. Johnson, “World Christian Encyclopedia,
2nd edition”, 2001, Oxford University Press
[5] S.Djankov, R.La Porta, F.Lopez-de-Silanes and A.Shleifer, “The Regula-
tion of Entry,” Quarterly Journal of Economics, 2002, v.117, pp.1-37
51 Supplementing this with data from the 1st edition for Hong Kong
52 HMR [13], p.461
[6] R.Feenstra, R.Lipsey and H.Bowen, “World Trade Flows 1970-1992, with
Production and Tariff Data,” NBER Working Paper 5910, Jan 1997
[7] R.Feenstra, “World Trade Flows, 1980-1997,” Center for International
Data, March 2000
[8] R.Feenstra, R.Lipsey, H.Deng, A.Ma and H.Mo, “World Trade Flows, 1962-
2000,” NBER Working Paper 11040, Jan 2005
[9] A.Flores and T.Reynolds, Foreign Law Guide
[10] R.Glick and A.Rose, “Does a Currency Union Affect Trade? The Time
Series Evidence,” European Economic Review, v.46, pp.1125-1151
[11] J.Heckman, “The Common Structure of Statistical Models of Truncation,
Sample Selection, and Limited Dependent Variables and a Simple Esti-
mator for Such Models,” Annals of Economic and Social Measurement 5,
475-492
[12] J.Heckman, “Sample Selection Bias as a Specification Error,”
Econometrica 47, 153-161
[13] E.Helpman, M.Melitz and Y.Rubinstein, “Estimating Trade Flows: Trad-
ing Partners and Trading Volumes,” Quarterly Journal of Economics, May
2008, v.123 no.2, 441-487
[14] R.La Porta, F.Lopez-de-Silanes, A.Shleifer and R.Vishny, “The Quality of
Government,” Journal of Law Economics and Organisation, 1999
[15] R.Pomfret, The Economics of Regional Trading Arrangements, 1997
[16] C.Reinhart and K.Rogoff, “The Modern History of Exchange Rate Ar-
rangements: A Reinterpretation”, NBER Working Paper 8963
[17] I.Tunali, “A General Structure for Models of Double-Selection and an Ap-
plication to a Joint Migration/Earnings Process with Remigration,” Re-
search in Labor Economics, v.8, 1986, part B, pp.235-283
[18] T.Vincenty, “Direct and Inverse Solutions of Geodesics on the Ellipsoid with
Application of Nested Equations”, Survey Review, vol. 23, no. 176, April
1975, pp 88-93