
Unreported Trade Flows and Gravity Equation Estimation

Thomas Baranga∗

May 15, 2009

Abstract

Some widely used trade databases do not distinguish between zero and

unreported trade flows. The number of unreported trade flows is high but

they account for a small volume of world trade, so the distinction may

be unimportant for traditional gravity equation estimation. However,

techniques that separately estimate the intensive and extensive margins

of trade may be more sensitive to the distinction. This paper develops a

methodology to consistently estimate the Helpman, Melitz and Rubinstein

model when some trade is unreported. This also breaks the relationship

between the sample selection and heterogeneity correction terms, reducing

collinearity of the regressors. A natural exclusion restriction identifies the

model, removing the need to distinguish fixed from variable costs of trade.

1 Introduction

The new literature on firm-level heterogeneity has revived interest in distinguishing the intensive and extensive margins of trade. Helpman, Melitz and Rubinstein (2008), henceforth HMR, have demonstrated how, in the Melitz model in which firms vary in their productivity, traditional estimates of the gravity equation confound the effects of the two margins. HMR develop a methodology with which to consistently estimate both margins, separating out the effects of trade barriers on firms’ decisions to enter export markets from their influence on the quantities that firms will export.

∗ My thanks for many invaluable conversations and support to Elhanan Helpman, Emilie Feldman, Ian Martin and Yona Rubinstein. Any mistakes remain my own.

HMR’s methodology exploits the presence of country-pairs which do not

trade at all to estimate the role of fixed costs that prevent entry into export

markets, using probit to estimate the determinants of whether there is any trade

in the aggregate. With consistent estimates of the role of fixed costs of trade,

one can also estimate trade barriers’ influence on the intensive margin, taking

into account that exporting firms may have very different productivity levels.

The first stage of HMR’s procedure relies on the presence of zero trade flows

at the aggregate level to estimate the role of fixed costs in firms’ entry decisions.

However, the reliability of this data is questionable. The quality of the reporting

of trade data is very variable over time, and across different countries. If all trade

partners accurately reported their trade flows, we would have two corroborating

reports for each trade flow, from both the exporter and importer. However, it

is well known that there is wide variation in the level of trade reported by each

partner¹.

Less well recognised is that countries frequently fail to report their trade

at all. In 1986, the baseline year of HMR’s study, only 112 countries reported

any of their trade to the UN’s Comtrade database, which forms the basis of

Feenstra et al’s dataset, World Trade Flows, used by HMR. Out of HMR’s

original sample of 158 countries, only 103 reported their trade. Furthermore,

reporting is not necessarily complete even among those countries which report

some of their trade. Of the 8927 positive trade flows between partners that both made a report of some of their trade to the UN, 2155 (24%) were reported by only one partner.

¹ For example, Feenstra and co-authors assume that reporting of imports is more accurate than reporting of exports, and reconcile the difference between the numbers by adopting the importer’s report when it exists, and the exporter’s if there is no report by the importer.

Feenstra’s dataset does not distinguish between flows that are zero and flows

that are unreported, and for traditional estimates of the gravity equation, this

distinction was probably quite unimportant. However, if one does not try to

take this into account when estimating HMR’s model, one would automatically

classify a large number of flows as zero, with implications for the estimates of

the fixed costs of trade. Since 55 countries in HMR’s sample did not report their

trade at all, 2970 observations for which there was definitely no report may be

misclassified. Given that even countries which report some of their trade do not

usually report all of it, the status of trade flows between country-pairs in which

only one side reports will also be somewhat unreliable.
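The counts quoted above follow from simple arithmetic on directed country-pairs. The short sketch below reproduces them using only figures stated in the text; the variable names are illustrative.

```python
# Counting directed country-pairs (all figures are taken from the text).
n_countries = 158   # countries in HMR's sample
n_reporters = 103   # countries in the sample that reported trade in 1986

n_non_reporters = n_countries - n_reporters

# Every ordered pair (importer i, exporter j) with i != j is one potential flow.
total_flows = n_countries * (n_countries - 1)

# Flows between two non-reporters cannot appear in anyone's report.
unreported_flows = n_non_reporters * (n_non_reporters - 1)

share = unreported_flows / total_flows
print(n_non_reporters, total_flows, unreported_flows, round(100 * share, 1))
```

On these numbers, flows between two non-reporting countries make up roughly 12% of all potential observations, before counting any flows that go unreported by countries that do file partial reports.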

The reliability of reporting depends in part on characteristics of the country,

and also on the size of the trade flow: small trade flows may be more likely to

go unreported than large ones. Since the reporting decision is correlated with

the underlying trading relationship, failing to account for the sample selection

driven by non-reporting may bias estimates.

A second reason to take into account non-reporting is that it weakens the

collinearity of regressors in HMR’s model. In HMR’s original framework, the

correction for sample selection is estimated from the same probit as that for

omitted productivity heterogeneity. While this simplifies the estimation, it gen-

erates collinearity in the model, as discussed further below. Controlling for the

additional sample selection due to non-reporting breaks this connection.

HMR’s original framework used factors that affect fixed but not variable

costs of trade to separately identify the effects of productivity-heterogeneity and

selection. However, finding such variables can be challenging. The introduction

of an additional source of sample selection allows identification of the intensive

and extensive margins without finding a factor that only affects fixed but not

variable trade costs. Some countries do not report any of their trade in a

particular year, and for a pair of these countries, we know for certain that a

trade-flow will not be observed. However, a country’s decision not to participate

in the Comtrade database is uncorrelated with their bilateral trade, and so is

excludable from the other two equations.

The following sections of the paper document the quality of reporting and

develop a methodology to consistently control for the sample selection induced

by some countries’ failure to report their trade. The final section compares

estimates derived from HMR’s original technique to the modified approach.

2 Reporting of Trade Flows

There are three major databases for global trade flows. The UN and the IMF

both collect trade data from their members, in the Comtrade and Direction of

Trade Statistics databases respectively. In addition, Feenstra and co-authors

have assembled and maintained a large database, World Trade Flows, which

is derived from both the UN and IMF databases, supplemented by data from

some national trade records.

The Feenstra database makes a number of corrections to the original UN and

IMF data, reconciling importers’ and exporters’ differing reports into a single

number, and correcting entrepot trade flows. It also establishes concordances

between SITC1, SITC2 and SIC codes, allowing matching of disaggregated trade

flows over time and between trade and industrial production. As part of their

procedure for making adjustments to commodity level trade flows, Feenstra et

al benchmark the aggregate trade flow to the level reported in the IMF’s DOTS data². The number of countries reporting their trade to the IMF is consistently lower than that to the UN, and this procedure appears to lead Feenstra et al to omit a large number of small trade flows.

² “The decision was to benchmark each country’s total exports to the world to the world total of imports from that country reported in the International Monetary Fund volumes on The Direction of Trade ... Data by partner country and by commodity were then adjusted in various ways so as to be compatible with these control totals,” Feenstra, Lipsey and Bowen [6], pp.3-4; Feenstra [7], p.3.

Table 1. DATA COVERAGE, 2001–2005

          DATA REPORTED FOR:                                      DATA NOT REPORTED
          Complete Year              Part of the Year
          Number of     Percent of   Number of    Percent of     Number of    Percent of
Year      Countries¹    World Trade  Countries    World Trade    Countries    World Trade

Exports
2005       96  (72)     92           0            0.00           86           8
2004      105  (81)     93           1            0.01           76           7
2003      116  (92)     95           0            0.00           66           5
2002      116  (92)     96           0            0.00           66           4
2001      120  (97)     95           0            0.00           62           5

Imports
2005       99  (75)     95           0            0.00           83           5
2004      107  (83)     95           1            0.01           74           5
2003      117  (93)     96           1            0.04           64           4
2002      119  (95)     97           0            0.00           63           3
2001      123 (100)     97           0            0.00           59           3

¹ The figures in parentheses indicate the number of developing countries that reported complete data for the respective year.

Figure 1: Taken from the DOTS database’s documentation

Complete documentation of reporting to the UN is available online from the

UN’s Comtrade database. Unfortunately less documentation is available for the

IMF’s DOTS, but Figure 1, taken from the DOTS’ supporting documentation,

summarises the extent of reporting for 2001-2005. The striking feature of Figure

1 is that almost as many countries did not report their trade to the IMF as did.

Although I have not been able to find data on the extent of reporting to

the IMF for HMR’s sample period, it is clear that significantly more countries

reported their trade to the UN than to the IMF. Reporting to the UN from the

Comtrade database is presented in Figure 2 and Table 1. Reporting to both

institutions follows the same trend from 2001-2005 (the years for which the IMF

figure is available), but the number reporting to the IMF is significantly lower.

[Figure 2 (line chart, 1960-2010): number of countries reporting trade-flows to the UN and IMF; series: UN, IMF]

Figure 2: Extent of Reporting to the UN and IMF

Year    UN     IMF
2001    166    123
2002    164    119
2003    162    117
2004    159    107
2005    153     99

Table 1: Comparison of Reporting of Trade-Flows to the UN and IMF

Figure 3 illustrates the extent of the missing data problem in the sample, comparing the trade-flows in Feenstra, for the 158 countries in HMR’s sample, with the data direct from the UN’s Comtrade database³. The blue line shows how many of the bilateral trade-flows in the UN data are definitely missing because neither partner reported its trade to the UN in that year⁴. This moves inversely with the number of reporters shown in Figure 2⁵. The missing data problem peaks in the sub-set of Feenstra’s data used by HMR. The green line illustrates the number of zero trade-flows in Feenstra’s data that the UN records as positive. The reason these trade-flows are missing from the Feenstra data is almost certainly because they were not reported to the IMF and so have been omitted per Feenstra’s benchmarking procedure⁶.

[Figure 3 (line chart, 1970-2000, number of trade-flows); series: unreported to UN; unreported to UN, pos in Feenstra; pos in UN, 0 in Feenstra; reported 0 to UN, pos in Feenstra; total missing in Feenstra; total missing/0 in UN, pos in Feenstra]

Figure 3: Positive and Missing Trade Flows in Feenstra’s Data

The black line is the sum of the blue and the green, and shows the number of observations treated as zero by HMR that are actually either positive, or should be treated as missing. With 158 countries in the sample, there are 24806 possible trade-flows per year. A conservative estimate of the average number misclassified over HMR’s sample period of 1980-89 is about 5000, or roughly 20% of the sample, a significant number.

The other three lines in Figure 3 show how many trade-flows recorded as positive by Feenstra are missing or zero in the UN data. There are a handful of such observations, many of which are associated with Taiwan, whose trade was not officially recorded by the UN but was for a time by the IMF. It seems that any country that reported its trade to the IMF also reported it to the UN, but not necessarily vice versa.

[Figure 4 (histogram): distribution of trade-flows in the data sets, 1970-1997; horizontal axis: bins x = 0, ..., 12 with 10^x <= trade < 10^(x+1); vertical axis: number of flows; series: UN data, Feenstra data]

Figure 4: Distribution of Positive Trade Flows

Figure 4 shows the distribution of the positive trade flows in the two data sets, and indicates that the positive flows unrecorded in Feenstra (the green line in Figure 3) are mostly small. Feenstra’s data contains no flows less than $1000, and most of the missing flows are less than $1 million.

³ To produce a single number from both an importer and exporter’s report, I followed Feenstra’s convention of adopting the importer’s report.

⁴ This is a conservative estimate. When neither partner reports any of their trade we can be certain that the flow is not observed. Since reporting is incomplete even for countries that report some of their trade, it is likely that additional flows are also unreported, particularly those for which only one partner makes any reports.

⁵ The correlation is not perfect because Figure 2 shows the total number of reporters in the world rather than the sample. For example, in 1996 the number of reporters to the UN increased slightly while the amount of missing trade also increased because the number of reporters in the sample fell slightly.

⁶ To be fair to Feenstra, he does not record non-positive trade-flows as zero - this is an interpretation imposed by HMR - and in the documentation for the latest revision of his trade data, he explicitly recognises this problem. In table 1 of Feenstra et al (2005) [8] he lists the countries (only 65 of which are in HMR’s sample) for which he has reported data for 1984-2000, and notes “When the two countries are both not included in Table 1, however, the trade flows for 1984-2000 are entirely missing from the dataset,” pp.2-3

[Figure 5 (line chart, 1970-2000): volume of trade missing from Feenstra and recorded as positive by the UN, as % of total world trade; vertical axis: % of world trade]

Figure 5: % of World Trade Missing from Feenstra by Volume

The small magnitude of the missing flows is reflected in the small percentage

of world trade that is missing from the Feenstra data but included in the UN

data, as represented in Figure 5, averaging less than 1% of world trade over the

whole of Feenstra’s sample, and less than 1% for the sub-sample used by HMR.

For many applications, ignoring these trade flows might be unproblematic,

as they are relatively close to zero and treating them as such should not be a

source of bias. However, HMR’s procedure distinguishes importantly between

zero and positive trade flows. Figure 6 illustrates how many zero trade flows

there are in the two data sets - the difference is quite steady at around 5000,

or 20%, in each year. This suggests that although in terms of trade volume the

missing trade is not significant, in terms of the number of observations, about

20% are misclassified in HMR’s original dataset.

[Figure 6 (line chart, 1970-2000): zero trade flows in the different data sets; series: UN data, Feenstra data]

Figure 6: Zero Trade Flows by Year

[Figure 7 (line chart, 1970-2000): correlation of Feenstra and UN trade data; series: conditional on both flows positive, conditional on both flows observed]

Figure 7: Correlation of Trade in UN and Feenstra datasets

Feenstra’s dataset is not appropriate for HMR’s analysis, as one cannot distinguish between flows that are missing and flows that are really zero. Fortunately the UN’s data has thorough documentation that shows which countries reported and which did not. A drawback of reverting to the UN data is that Feenstra’s adjustments to it are lost. Many of the adjustments relate to the disaggregated trade data, with which we need not be concerned. However, some relate to the aggregate data and to entrepot trade. I have not attempted to make corrections for entrepot trade, but these affect only a small minority of trade flows, and so on balance ignoring them seems to be a cost worth bearing. Figure 7 shows the correlation between Feenstra’s and the UN’s data. The blue line indicates the correlation conditional on both data sets recording a positive flow; the red line indicates the correlation conditional on the data that is observed in the UN dataset. The correlation is reassuringly high (above 0.99 for HMR’s sub-sample), and suggests that Feenstra’s adjustments to the aggregate trade flows have been relatively small.

2.1 Estimating the Propensity to Report Trade

We can distinguish three groups of observations, based on whether the countries

involved report their trade: (A) neither partner reports; (B) both partners report; (C) only one partner reports. For group (A), trade is certainly unreported.

For group (B), if both countries fully reported their trade we should have two

numbers for each trade flow, but as alluded to above, 24% of these flows that are

positive only have one partner reporting. This variation in reporting standards

allows one to estimate the propensity of countries to report their trade.

For the subset of countries that file some report of their trade, we can observe

reports of a particular trade flow from both the importer and exporter. In

particular, conditional on a trade flow being reported by at least one partner,

we can observe whether the flow is reported by the other side. The propensity of a partner to report, conditional on the flow being reported by their partner, can be estimated by probit.

              [1]         [2]         [3]         [4]
              Importer    Exporter    Importer    Exporter
log(trade)    0.229**     0.235**     0.3**       0.323**
              (0.009)     (0.006)     (0.012)     (0.009)
R-sq          0.255       0.27        0.377       0.419
n             7301        8398        7301        8398

** p<0.01, * p<0.05
Data for 1986. Columns [3] and [4]: reporter fixed effects.
Probit of reporting trade > 0, conditional on partner reporting trade > 0.

Table 2: Influence of the level of trade on the propensity to report

Table 2 shows that the size of the underlying trade flow has a strong influence

on a country’s propensity to report it. Columns [3] and [4] include reporter

fixed effects to control for idiosyncratic differences in reporting quality across

countries. These account for a significant amount of the variation, pushing up

the R-squared of the regressions.
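To illustrate the kind of probit behind Table 2, the following minimal sketch fits a one-regressor probit of a reporting indicator on log trade by Newton-Raphson. The data are simulated, the "true" coefficients (-1.5 and 0.3) are hypothetical values chosen only for the illustration, and the sketch omits the reporter fixed effects of columns [3] and [4]; it does not reproduce the paper's estimates.

```python
import math
import random

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def fit_probit(xs, ys, iters=25):
    """Maximum-likelihood probit of y on a constant and x (Newton-Raphson)."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            idx = a + b * x
            p = min(max(norm_cdf(idx), 1e-12), 1.0 - 1e-12)
            d = norm_pdf(idx)
            lam = d / p if y else -d / (1.0 - p)   # score contribution
            w = lam * (lam + idx)                  # minus the Hessian weight
            g0 += lam
            g1 += lam * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det           # Newton step for the constant
        db = (h00 * g1 - h01 * g0) / det           # Newton step for the slope
        a += da
        b += db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b

# Simulated data: larger flows are more likely to be reported (assumed values).
random.seed(1)
true_a, true_b = -1.5, 0.3
log_trade = [random.uniform(0.0, 10.0) for _ in range(4000)]
reported = [1 if true_a + true_b * x + random.gauss(0.0, 1.0) > 0 else 0
            for x in log_trade]

a_hat, b_hat = fit_probit(log_trade, reported)
print(round(a_hat, 3), round(b_hat, 3))
```

A positive slope estimate corresponds to the pattern in Table 2: the larger the underlying flow, the more likely a partner is to report it.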

Since by definition trade is unobserved if it is unreported, we cannot extend

this approach to estimate the unconditional probability of a flow being reported.

But the strong correlation between trade and reporting is prima facie evidence

that the attrition in the sample of trade flows that are missing due to a failure

to report is not random, leading to a classic sample selection problem.

3 Controlling for Unreported Trade Flows

HMR’s technique addresses the problem of sample selection due to the omission

of observations without trade with Heckman’s (1976, 1979) sample selection

correction. This two-step procedure controls for the possibility that sample

selection may bias estimates of the main equation of interest by first estimating

a selection equation, and then controlling for possible correlation between the

selection and main equations.

Taking into consideration non-reporting of trade as an additional level of

sample selection, we face a simultaneous sample selection problem, in which

an observation could be missing from the log-linearised gravity equation either

because it is zero, or because it is unreported.

3.1 Sample Selection in the Original HMR Model

The first step in controlling for sample selection is to specify a structural model

relating the selection and main equations. In the original HMR framework, this

was

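\[ m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \ln\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + e_{ij} \tag{1} \]

\[ T_{ij} = 1[z^*_{ij} > 0] \tag{2} \]

\[ z^*_{ij} = \gamma^*_0 + \xi^*_j + \zeta^*_i - \gamma^* d_{ij} - \kappa^* \phi_{ij} + \eta^*_{Z_{ij}} \tag{3} \]

Equation (1) is HMR’s intensive-margin gravity equation for imports mij of i from j. λj and χi are exporter and importer dummies, controlling for the multilateral resistance terms analysed by Anderson and van Wincoop (2003). dij is a vector of trade barriers that affect variable costs of trade. The error terms eij and η∗Zij are jointly normally distributed

\[ \begin{pmatrix} e_{ij} \\ \eta^*_{Z_{ij}} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2_e & \sigma_{e\eta_Z} \\ \sigma_{e\eta_Z} & 1 \end{pmatrix} \]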

z∗ij is the fitted value of the latent variable of the probit specified in equations

(2) and (3). Tij is an indicator of whether there are any imports from j to i, as

determined by the latent variable z∗ij . This may be influenced by the same trade

barriers that affect the intensive-margin, dij , and possibly additional factors,

denoted φij .

A sample selection bias can arise if equation (1) is estimated by OLS, because what we can estimate is E[mij | Tij = 1]⁷. If E[eij | Tij = 1] = E[eij] = 0 then there is no sample selection problem, and OLS is unbiased. However, if E[eij | Tij = 1] ≠ 0 there is a problem. Fortunately, under the assumptions of equations (1)-(3) we have a consistent estimator of E[eij | Tij = 1]

\[ E[e_{ij} \mid T_{ij} = 1] = \frac{\sigma_{e\eta_Z}}{\sigma^2_e} E[\eta^*_{Z_{ij}} \mid T_{ij} = 1] = \frac{\sigma_{e\eta_Z}}{\sigma^2_e} \hat{\eta}^*_{ij} \]

where ˆη∗ij = φ(z∗ij)/Φ(z∗ij) is the inverse Mills ratio estimated from the first-stage probit, and which gives a consistent estimate of E[η∗Zij | Tij = 1].

⁷ All the conditional expectations are also conditional on the full set of regressors. This is suppressed for notational convenience only.
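As a check on the role of the inverse Mills ratio, the sketch below compares φ(z)/Φ(z) with the simulated mean of a standard normal error among observations selected by z + η > 0. The index value 0.4 is a hypothetical fitted value used only for illustration.

```python
import math
import random

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def inverse_mills(z):
    """E[eta | z + eta > 0] for eta ~ N(0, 1)."""
    return norm_pdf(z) / norm_cdf(z)

random.seed(0)
z_index = 0.4                                   # hypothetical probit index
draws = [random.gauss(0.0, 1.0) for _ in range(200_000)]
selected = [eta for eta in draws if z_index + eta > 0]   # observations with T = 1

simulated = sum(selected) / len(selected)       # mean error in the selected sample
analytic = inverse_mills(z_index)
print(round(simulated, 3), round(analytic, 3))
```

The selected-sample mean of the error is strictly positive even though its unconditional mean is zero, which is the source of the bias that the correction term absorbs.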

ln(e^{δ(z∗ij+ˆη∗ij)} − 1) is the term introduced by HMR to control for the potential

correlation between the trade barriers, dij , and the average productivity of firms

in j that have chosen to enter the export market to i. The productivity of these

firms will affect the volume of their sales; and this productivity will be correlated

with trade barriers, because higher trade barriers will induce less productive

firms to exit the market.

HMR show that the latent variable z∗ij of whether or not there is trade is related to the productivity level of the marginal exporter, and so can be used to estimate the unobserved heterogeneous productivity term in the aggregate gravity equation. Under the assumption that the distribution of firm productivity is Pareto, this has the form max{(Z∗ij)^δ − 1, 0}. To include this term in an estimation we need to simplify the ‘max’ term. Fortunately, only observations for which trade is positive are observed, so in equation (1) all observations will have (Z∗ij)^δ − 1 > 0. We do not observe Z∗ij, but can estimate E[z∗ij] with z∗ij ⁸. To simplify the ‘max’ we need E[z∗ij | Tij = 1], but this is E[z∗ij] + E[η∗ij | Tij = 1], which as discussed above in the context of sample selection can be estimated with the inverse Mills ratio, giving ˆz∗ij = z∗ij + ˆη∗ij.

Thus HMR elegantly show how to address both the sample selection and productivity heterogeneity biases in a simple two-stage procedure, using the same first-stage probit. However, this elegance comes at the cost of the model being potentially underidentified. The estimated latent variable z∗ij is a linear combination of the regressors included in equation (3). If the regressors in equations (1) and (3) are the same, then z∗ij is perfectly collinear with the trade barriers in equation (1). Since ˆη∗ij is also included as a regressor in the gravity equation to control for the zero-trade sample selection, this collinearity extends to ˆz∗ij = z∗ij + ˆη∗ij.

⁸ z∗ij = log(Z∗ij)

This collinearity is a particular problem for a non-parametric estimate of the heterogeneity productivity bias, which would proceed by including a high-degree polynomial of ˆz∗ij instead of the term derived from the assumption of a Pareto distribution for firms’ productivity. Potentially the non-linearity of ln(e^{δ(z∗ij+ˆη∗ij)} − 1) means that the perfect collinearity between z∗ij + ˆη∗ij and the other trade barriers does not prevent identification of the parametric model. However, for ‘large’ δ, ln(e^{δ(z∗ij+ˆη∗ij)} − 1) ≈ δ(z∗ij + ˆη∗ij) and the regressors are collinear again.
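The approximation can be seen numerically. Since ln(e^{δx} − 1) − δx = ln(1 − e^{−δx}), the gap between the heterogeneity term and its linear counterpart vanishes as δx grows. The sketch below evaluates the gap at x = 1 for a few illustrative values of δ; the function names and the chosen values are hypothetical.

```python
import math

def heterogeneity_term(delta, z_hat):
    """ln(exp(delta * z_hat) - 1), defined for delta * z_hat > 0."""
    return math.log(math.expm1(delta * z_hat))

def gap_from_linear(delta, z_hat):
    """Difference between the non-linear term and delta * z_hat.

    Algebraically equal to ln(1 - exp(-delta * z_hat)), which tends to zero
    as delta * z_hat becomes large.
    """
    return heterogeneity_term(delta, z_hat) - delta * z_hat

for delta in (0.5, 1.0, 2.0, 5.0, 10.0):
    print(delta, round(gap_from_linear(delta, 1.0), 5))
```

When the gap is close to zero across the sample, the heterogeneity term is nearly a linear function of ˆz∗ij, so functional form alone provides little identifying variation.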

For HMR’s sample the non-linearity of the heterogeneity-bias term is insufficient to identify the model, and this motivates their search for an additional exclusion restriction. This is a variable φij, which enters into equation (3) but not equation (1). Such a variable breaks the collinearity problem, by introducing an extra source of variation into z∗ij that is not collinear with the regressors of equation (1). In economic terms, this would be a factor that affects the fixed, but not the variable, costs of trade.

HMR proposed two potential exclusion restrictions: measures of the costs of starting a firm, as compiled by Djankov et al (2002); and an index of religious similarity. There are drawbacks to using either of these exclusion restrictions. Regulatory costs seem likely to be correlated with fixed costs of entry into business, and possibly by extension with the fixed costs of entering export markets, although this is less clear. However, they may also be correlated with factors affecting variable costs of trade⁹, violating the exclusion restriction. A second weakness of the regulatory data is that it is only available for a sub-set of countries (116 out of HMR’s full sample of 158), and so cannot be used in a broad panel setting¹⁰.

The conceptual case for the validity of religion as a factor affecting fixed but not variable costs of trade is very unclear. In their original paper HMR justify the exclusion on the grounds that their religion variable is not statistically significant in a benchmark OLS gravity equation, suggesting that it is broadly uncorrelated with trade. Unfortunately, there is a problem with their original data¹¹. Replacing their data with a similar index compiled from Barrett et al (2001) indicates that religion is a highly significant variable in the benchmark gravity equation, undermining the prima facie case for the validity of the exclusion restriction.

The difficulty of finding valid or practical exclusion restrictions is a potential pitfall of HMR’s methodology. One motivation for controlling for the sample selection induced by the non-reporting of trade is that it weakens the collinearity between the productivity-heterogeneity and sample-selection correction terms. This allows more general identification of the model.

⁹ For example, a country with higher regulatory barriers may also be more likely to be a higher tax environment, which would be expected to reduce the profitability of exporting at the intensive margin too. Countries with more regulation might also be more likely to use quantitative trade restrictions such as import or export licenses, or other non-tariff barriers, which would also affect the intensive margin, but are typically not controlled for.

¹⁰ Reduction of the sample size is a potentially serious problem for HMR’s methodology, which relies on the presence of zero trade flows in the aggregate. If country j exports to all other countries in the sample, then its exporter-specific dummy perfectly predicts trade in the first-stage. This is problematic because it implies a fitted value of infinity for z∗ij, which means that all observations of exports from j must be dropped from the second-stage, as the heterogeneity correction term cannot be estimated. For 1986, the baseline year for HMR’s study, the reduction in sample size only led to the dropping of 9 importers or exporters (11 when one uses the more complete UN data which includes some positive trade flows omitted from the Feenstra dataset). However, given the growth in trade over time, in later years this reduction in the sample could be a critical problem for applying the technique, as more countries will trade with all members of this sub-sample.

¹¹ The measure is an index of religious similarity of a country-pair, and as such should be the same for the observation of country i’s exports to j as for the observation of i’s imports from j. Unfortunately this is not the case, which indicates that there has been a corruption of their data.

3.2 Controlling for Non-Reporting of Trade

Heckman’s sample-selection correction can be extended to the case of multiple selection decisions, by jointly estimating the underlying selection relationships. HMR’s system of equations (1)-(3) is extended to include an equation specifying the reporting decision

\[ R_{ij} = 1[r^*_{ij} > 0] \tag{4} \]

\[ r^*_{ij} = \tau^*_0 + \varphi^*_j + \omega^*_i - \nu^* d_{ij} + \eta^*_{R_{ij}} \tag{5} \]

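where the latent variable driving the reporting decision, r∗ij, depends on the trade barriers dij, importer/exporter dummies, and a normally distributed error η∗Rij. The errors of equations (1), (3) and (5) are assumed jointly Normally distributed¹²

\[ \begin{pmatrix} e_{ij} \\ \eta^*_{Z_{ij}} \\ \eta^*_{R_{ij}} \end{pmatrix} \sim N(0, \Sigma), \qquad \Sigma = \begin{pmatrix} \sigma^2_e & \sigma_{e\eta_Z} & \sigma_{e\eta_R} \\ \sigma_{e\eta_Z} & 1 & \sigma_{\eta_Z\eta_R} \\ \sigma_{e\eta_R} & \sigma_{\eta_Z\eta_R} & 1 \end{pmatrix} \]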

The selection relationships follow Poirier’s (1980) model of a bivariate probit¹³. The bivariate probit with partial observability treats the dependent variable as 1 if we observe a positive trade flow, and 0 otherwise. The dependent variable is assumed to be 1 if both the underlying probits are 1, and 0 if either is 0. This yields the log-likelihood function

\[ \ln(L) = \sum_{n} y_{ij} \ln\left(F(z^*_{ij}, r^*_{ij}, \rho)\right) + (1 - y_{ij}) \ln\left(1 - F(z^*_{ij}, r^*_{ij}, \rho)\right) \]

where for notational convenience ρ denotes the covariance between η∗Zij and η∗Rij, previously denoted as σηZηR, and F(z∗ij, r∗ij, ρ) is the CDF of the bivariate normal distribution with unit variances and covariance ρ. yij is the dependent variable, which is 1 if a positive trade flow is observed.
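The log-likelihood can be evaluated directly once the bivariate normal CDF is available. The following is a minimal sketch, assuming the two probit indices have already been computed for each observation; the CDF is obtained by Simpson's rule on a one-dimensional integral, and the function names, the integration settings and the toy inputs are all illustrative choices rather than part of the paper's procedure.

```python
import math

def norm_pdf(v):
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)

def norm_cdf(v):
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))

def bvn_cdf(a, b, rho, steps=400):
    """P(X <= a, Y <= b) for standard normals with correlation rho.

    Uses F(a, b, rho) = integral_{-inf}^{a} phi(u) Phi((b - rho*u)/sqrt(1-rho^2)) du,
    approximated by Simpson's rule on [-8, a] (steps must be even).
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps

    def f(u):
        return norm_pdf(u) * norm_cdf((b - rho * u) / s)

    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * f(lo + k * h)
    return total * h / 3.0

def partial_obs_loglik(y, z_index, r_index, rho):
    """Log-likelihood of the bivariate probit with partial observability.

    y is 1 when a positive flow is observed (traded AND reported), else 0.
    """
    ll = 0.0
    for y_ij, z, r in zip(y, z_index, r_index):
        p = min(max(bvn_cdf(z, r, rho), 1e-12), 1.0 - 1e-12)
        ll += math.log(p) if y_ij else math.log(1.0 - p)
    return ll

# Toy inputs (hypothetical index values, for illustration only).
y = [1, 0, 1]
z_index = [1.0, -0.5, 0.2]
r_index = [0.8, 0.3, 1.5]
ll = partial_obs_loglik(y, z_index, r_index, rho=0.3)
print(round(ll, 4))
```

In an actual estimation the indices would be linear functions of the regressors, and the coefficients and ρ would be chosen to maximise this function with a numerical optimiser.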

The parameters of both underlying selection equations can be jointly estimated from this log-likelihood function. Poirier (1980) discusses the identifiability of the model. The reduced form parameters are locally identified except in pathological cases¹⁴. However, there can be a labelling problem: if the regressors are identical in both equations it is not possible to identify which coefficients correspond to which selection relationship, due to the symmetric nature of the problem. As long as there is at least one variable excludable from one of the selection equations, this labelling problem is resolved.

¹² The assumption of unit variances for η∗Zij and η∗Rij is without loss of generality. The coefficients of a probit can only be estimated up to scale, so the coefficients in equations (3) and (5) are normalised by the standard deviation of their respective errors. The other key parameter in what follows is the covariance σηZηR, but this enters the following equations as the correlation, which is the covariance automatically scaled by the standard deviations.

¹³ The model is also treated very approachably in Maddala (1983), pp. 278-283. See Grilli (2005) for an application.

Controlling for the simultaneous selection in the intensive-margin log-linearised

gravity equation is straightforward. When estimating equation (1) by OLS, we

estimate E[mij |Tij = 1, Rij = 1]. As in the single selection case, we need to

take into account the possibility that E[eij |Tij = 1, Rij = 1] 6= E[eij ] = 0.

E[eij |Tij = 1, Rij = 1] = βZH∗Zij

+ βRH∗Rij

where

βZ ≡ σeηZ

σ2e

βR ≡ σeηR

σ2e

H∗Zij

≡φ(z∗ij)Φ

(r∗ij−ρz

∗ij√

1−ρ2

)F (z∗ij , r

∗ij , ρ)

H∗Rij

≡φ(r∗ij)Φ

(z∗ij−ρr

∗ij√

1−ρ2

)F (z∗ij , r

∗ij , ρ)

These expressions are analogous to the inverse Mills ratio, extended to the

two-stage selection case. If there is no correlation between the two selection

stages (ρ = 0), then the expressions simplify down to the single-variable selection

correction. F (z∗ij , r∗ij , 0) = Φ(z∗ij)Φ(r∗ij), so

H∗Zij

=φ(z∗ij)Φ(z∗ij)

H∗Rij

=φ(r∗ij)Φ(r∗ij)

14Such as equality of the coefficients across the two equations.

18

and the dual sample selection is controlled for simply by including a standard inverse Mills ratio for each stage. The objects $z^*_{ij}$, $r^*_{ij}$, and $\rho$ are all estimable quantities from the first stage, and so the sample selection corrections can be made in a two-step procedure analogous to HMR’s original method.
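The correction terms can be computed directly once first-stage values are in hand. The following is a minimal sketch, not the paper's implementation: it assumes $F$ is the standard bivariate normal CDF, approximates it by Simpson's rule using only the standard library, and requires $|\rho| < 1$. The function names (`bvn_cdf`, `modified_mills`) are illustrative.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(z, r, rho, n=2000, lower=-10.0):
    """F(z, r, rho) for a standard bivariate normal with correlation rho.

    Uses F = integral over x <= z of phi(x) * Phi((r - rho*x)/sqrt(1-rho^2)),
    approximated by Simpson's rule on [lower, z] (n must be even).
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    if z <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (z - lower) / n
    total = 0.0
    for k in range(n + 1):
        x = lower + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * phi(x) * Phi((r - rho * x) / s)
    return total * h / 3.0

def modified_mills(z, r, rho):
    """Return (H_Z, H_R), the two selection-correction terms."""
    s = math.sqrt(1.0 - rho * rho)
    F = bvn_cdf(z, r, rho)
    h_z = phi(z) * Phi((r - rho * z) / s) / F
    h_r = phi(r) * Phi((z - rho * r) / s) / F
    return h_z, h_r

# With rho = 0 both terms collapse to ordinary inverse Mills ratios.
h_z, h_r = modified_mills(0.5, -0.3, 0.0)
print(h_z, phi(0.5) / Phi(0.5))
print(h_r, phi(-0.3) / Phi(-0.3))
```

In each printed pair the two numbers should agree up to the integration error, which is the $\rho = 0$ simplification described in the text.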

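Defining

$$\hat{\eta}^*_{ij} \equiv E[\eta^*_{Zij} \mid T_{ij} = 1]$$

$$\bar{e}_{ij} \equiv e_{ij} - \beta_Z H^*_{Zij} - \beta_R H^*_{Rij}$$

$$\tilde{e}_{ij} \equiv e_{ij} - \beta_{HMR}\,\hat{\eta}^*_{ij}$$

we consistently estimate the intensive margin gravity equation (1) by

$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_Z H^*_{Zij} + \beta_R H^*_{Rij} + \ln\!\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + \bar{e}_{ij} \qquad (6)$$

Comparing this to HMR’s original corrected equation

$$m_{ij} = \beta_0 + \lambda_j + \chi_i - \gamma d_{ij} + \beta_{HMR}\,\hat{\eta}^*_{ij} + \ln\!\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right) + \tilde{e}_{ij} \qquad (7)$$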

the key difference is the change in the sample selection corrections, $H^*_{Zij}$ and $H^*_{Rij}$ instead of $\hat{\eta}^*_{ij}$. This difference weakens the collinearity between $\hat{z}^*_{ij} = z^*_{ij} + \hat{\eta}^*_{ij}$ and the other regressors. In HMR’s original equation, the ‘coincidence’ that the control for the productivity term being positive and trade being positive was the same meant that both were controlled for by the same inverse Mills ratio, $\hat{\eta}^*_{ij}$.

Using the modified inverse Mills ratios breaks this coincidence. The correction for the productivity term is the ‘original’ inverse Mills ratio¹⁵, which controls for the fact that productivity is only in the regression when above its cutoff. This inverse Mills ratio is conditional on $T_{ij} > 0$, but not conditional on $R_{ij} > 0$ too, as the reporting decision is irrelevant to the underlying productivity cut-off, once the parameters of equation (3) have been consistently estimated.

15 ‘Original’ in the sense that it has the same functional form. Its value will be different, as the estimates of the parameters of equation (3) will have changed, as they will reflect the estimates from the joint estimation which controls for some observations being unreported.


However, the corrections for sample selection in the main gravity equation require both of the new modified inverse Mills ratios. Since $\hat{\eta}^*_{ij}$ is a non-linear function of $z^*_{ij}$, and is no longer itself included as a regressor in the main equation, $\hat{z}^*_{ij}$ is now a non-linear function of the other regressors, and no longer collinear with them.

Controlling for both dimensions of sample selection not only consistently

estimates the parameters of the underlying structural models (assuming that

they are correctly specified), but also separates the correction for sample selec-

tion from the correction for heterogeneity, allowing both effects to be identified

without distinguishing fixed versus variable trade costs.

4 Empirical Results

Table 3 reports ‘traditional’ OLS gravity equation estimates on two samples.

Column [1] reports for the full sample of 175 countries. Out of a possible 30450

trade flows, 14503 were observed positive.

Column [2] repeats this for the sub-set of 116 countries for which there is data on regulatory costs of entry. Out of a possible 13340 flows, 8583 were observed positive. Some countries in the Reg sample export to, or import from, all other partners in the sample¹⁶. This makes their exporter or importer dummy a perfect predictor of the outcome in the first-stage probit, which implies an infinite coefficient on the dummy and an infinite value for the latent variable in the probit. The second stage estimation cannot proceed with an infinite value for $\hat{z}^*_{ij}$, so these observations must be dropped, reducing the number of useable positive observations in the second stage to 7327¹⁷. To maintain consistency between the second stage sample and the ‘benchmark’ gravity equation these observations are also dropped here.
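The sample counts quoted here follow from simple arithmetic on directed country pairs. The short sketch below reproduces them; the function name `possible_flows` is illustrative and not taken from the original analysis.

```python
def possible_flows(n_countries):
    """Directed country pairs (exporter i, importer j) with i != j."""
    return n_countries * (n_countries - 1)

full_sample = possible_flows(175)  # full sample of 175 countries
reg_sample = possible_flows(116)   # countries with regulation data

# Shares of possible flows observed positive, using the counts in the text.
share_full = 14503 / full_sample
share_reg = 8583 / reg_sample

print(full_sample, reg_sample)
print(round(share_full, 3), round(share_reg, 3))
```

On these figures just under half of the possible flows are observed positive in the full sample, against roughly two thirds in the regulation sub-sample.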

Column [3] repeats the regression of Column [2], but includes the measures

16 The exporters are Japan, Hong Kong, Denmark, France, Germany, Italy, the Netherlands, Sweden, the UK, and Norway. The importer is Japan.

17 The same issue arises in HMR’s original paper. See the discussion on pp. 461-462.


                  [1]           [2]           [3]
                  All           Reg Excl      Reg Incl

log(Distance)     -1.305***     -1.278***     -1.295***
                  (0.0269)      (0.0413)      (0.0415)
Border            0.0605        0.216         0.212
                  (0.123)       (0.154)       (0.154)
Island            0.719***      0.750***      0.732***
                  (0.0858)      (0.176)       (0.176)
Landlock          0.265         0.0838        0.0889
                  (0.186)       (0.198)       (0.197)
Colonial          0.925***      0.558***      0.553***
                  (0.110)       (0.158)       (0.157)
Language          0.371***      0.356***      0.351***
                  (0.0570)      (0.0809)      (0.0808)
Legal             0.321***      0.372***      0.384***
                  (0.0438)      (0.0603)      (0.0603)
Religion          0.443***      0.682***      0.690***
                  (0.0897)      (0.127)       (0.127)
CU                1.884***      1.466***      1.531***
                  (0.222)       (0.370)       (0.370)
FTA               0.446***      -0.273        -0.214
                  (0.116)       (0.181)       (0.181)
Reg: cost                                     -0.331***
                                              (0.0973)
Reg: days                                     -0.234**
                                              (0.111)

Observations      14503         7327          7327
R2                0.706         0.683         0.684

*** p<0.01, ** p<0.05, * p<0.1
Standard errors in parentheses
Column [1]: all 175 countries
Columns [2] and [3]: 116 countries with regulation data
Importer and Exporter dummies

Table 3: Benchmark ‘Traditional’ OLS Gravity Equations


of regulatory costs of starting a business, which HMR use as their exclusion

restriction to identify the intensive margin. The regulation variables are highly

significant in the traditional OLS regression. Although this does not necessar-

ily invalidate the exclusion restriction (their statistical significance could reflect

omitted variable bias, through their correlation with the omitted heterogeneity-

productivity term), it is prima facie evidence that they are strongly correlated

with the volume of trade, which is suggestive that they might affect both inten-

sive and extensive margins.

Table 4 reports estimates for the latent variables for the first-stage probits

on the regulation sub-sample. Column [1] gives the estimates for a univariate

probit on the observed positive trade flows, following HMR’s methodology.

Columns [2] and [3] of Table 4 report the joint maximum likelihood estimates of the positive trade and reporting probits using the partially observed bivariate probit model. Comparing the coefficients of the underlying positive-trade and reporting probits to the univariate probit, those of the univariate probit generally lie in between those of the two bivariate probits, suggesting that the outcome of the univariate probit reflects a mixture of the two selection processes. The trade barriers are highly correlated with the reporting decision, which suggests that the selection induced by non-reporting should not be ignored. The correlation between the errors in equations (3) and (5) is estimated to be 1.
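To make the structure of the joint estimation concrete, here is a minimal sketch of a partially observed bivariate probit log-likelihood. It is illustrative only, not the estimation code behind Table 4. It assumes that only the product $T_{ij} R_{ij}$ is observed, that the probability of an observed positive flow is the standard bivariate normal CDF evaluated at the two latent indices, and that $|\rho| < 1$. The helper names and the toy observations are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bvn_cdf(z, r, rho, n=2000, lower=-10.0):
    """Standard bivariate normal CDF by Simpson's rule (requires |rho| < 1)."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly inside (-1, 1)")
    if z <= lower:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (z - lower) / n
    total = 0.0
    for k in range(n + 1):
        x = lower + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * phi(x) * Phi((r - rho * x) / s)
    return total * h / 3.0

def partial_obs_loglik(observations, rho):
    """Log-likelihood when only y = T * R is observed.

    observations: (z_index, r_index, y) triples, where y = 1 only if
    trade is both positive and reported.
    """
    loglik = 0.0
    for z, r, y in observations:
        p_seen = bvn_cdf(z, r, rho)  # Pr(T = 1, R = 1)
        loglik += math.log(p_seen) if y == 1 else math.log(1.0 - p_seen)
    return loglik

# Hypothetical toy data: latent indices and the observed outcome.
toy = [(1.2, 0.8, 1), (0.3, 1.5, 1), (-0.5, 0.4, 0), (0.9, -1.1, 0)]
for rho in (-0.5, 0.0, 0.5):
    print(rho, round(partial_obs_loglik(toy, rho), 4))
```

In a full estimation the indices would be linear functions of the trade barriers, and the coefficients and $\rho$ would be chosen jointly to maximise this function.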

One concern with partial observability and the Poirier model is that there is a loss of efficiency relative to the full information estimation¹⁸. Unfortunately we cannot compare the partial information to the full information estimates, but for most variables the standard errors are very similar to those for the univariate probit, suggesting that augmenting the first stage to control for non-reporting does not lead to a large efficiency loss.

An exception to this is for the Island, Colonial and Currency Union variables

in the Positive Trade probit, for which standard errors cannot be computed. In

18 See Meng and Schmidt (1985).


                  [1]             [2]               [3]
                  Univariate      Bivariate Probit
                  Probit          Positive Trade    Reporting

log(Distance)     -0.582***       -1.296***         -0.430***
                  (0.0356)        (0.0723)          (0.0540)
Border            -0.378***       0.168             0.0893
                  (0.133)         (0.376)           (0.226)
Island            0.314**         40.38             -0.0725
                  (0.150)         (∞)               (0.178)
Landlock          0.105           -0.0577           0.789***
                  (0.132)         (0.180)           (0.233)
Language          0.416***        1.208***          -0.601***
                  (0.0632)        (0.103)           (0.103)
Colonial          -0.0856         50.12             5.582
                  (0.292)         (∞)               (530.3)
Legal             0.149***        0.0757            0.583***
                  (0.0440)        (0.0678)          (0.0745)
Religion          0.390***        0.202             0.578***
                  (0.102)         (0.147)           (0.163)
CU                0.844***        12.70             0.726
                  (0.230)         (∞)               (0.445)
FTA               1.819***        -0.668            8.447
                  (0.533)         (0.841)           (75246)
Reg: Cost         -0.403***       -0.127            -0.337***
                  (0.0857)        (0.161)           (0.130)
Reg: Days         -0.0939         0.106             -0.546***
                  (0.0762)        (0.109)           (0.132)
Neither Reports                                     -∞
ρ                                 1
                                  (0)

Observations      13340           13340             13340

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1

Table 4: Zero-trade and Reporting Probits: Regulation Sample


a univariate probit, a dummy variable that perfectly predicts the dependent variable is estimated to have an infinite coefficient, and the probit is estimated after dropping those observations. In the bivariate case, it is possible for a variable to be a perfect predictor of one of the underlying probits, but not the other. In this case, the coefficient cannot be precisely estimated for the probit for which it is a perfect predictor, but the variable will not perfectly predict the imperfectly observed dependent variable of the bivariate probit, because the dependent variable may not be observed due to the second equation¹⁹. It would be inappropriate to drop these observations from the bivariate probit, firstly because we do not know in advance which variables will be perfect predictors in one of the underlying equations, and secondly because the variables should still be included in the other equation.
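The divergence of a coefficient under perfect prediction can be seen in a toy univariate probit. The sketch below uses made-up data in which every observation with the dummy equal to one has a positive outcome; the log-likelihood then keeps rising as the dummy's coefficient grows, so no finite maximum exists. The data and the function name `probit_loglik` are hypothetical.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probit_loglik(alpha, beta, data):
    """Probit log-likelihood with index alpha + beta * d for (d, y) pairs."""
    loglik = 0.0
    for d, y in data:
        p = Phi(alpha + beta * d)
        loglik += math.log(p) if y == 1 else math.log(1.0 - p)
    return loglik

# Toy data: the dummy d = 1 perfectly predicts y = 1; d = 0 is mixed.
data = [(1, 1)] * 5 + [(0, 1)] * 3 + [(0, 0)] * 3

betas = [0.0, 1.0, 2.0, 4.0, 8.0]
logliks = [probit_loglik(0.0, b, data) for b in betas]
for b, ll in zip(betas, logliks):
    print(b, round(ll, 6))
```

The printed values increase with the coefficient and approach the log-likelihood contributed by the `d = 0` observations alone, which is why software reports an infinite or undefined estimate in this situation.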

Table 5 reports the estimates for the parametric specification of the gravity equation given in equation (1), based on a Pareto distribution of productivity. Column [1] is based on the estimate of $\hat{z}^*_{ij}$ derived from column [1] of Table 4, following the standard HMR procedure, and excluding the regulation variables from the second stage in order to identify the model. Comparison of column [1] of Table 5 and column [2] of Table 3 broadly replicates HMR’s finding that the absolute magnitude of the coefficients on most trade barriers is smaller in the bias-corrected estimates than in the traditional OLS gravity equation²⁰. This motivates their conclusion that heterogeneity-bias inflates standard OLS estimates.

The standard errors for all the second stage regressions given are somewhat impressionistic, as they have not been corrected to take into account the generated regressors from the first stage. The coefficient given on $\hat{z}^*_{ij}$ is actually for $\Delta = \log(\delta)$²¹. For column [1] this implies an estimate of $\delta$ of $\exp(-1.376) = 0.2526$. This is somewhat

19 For example, all colonial powers might trade with their former colonies, making the colonial dummy a perfect predictor in the positive trade probit. However, if they do not also all report their trade, some colonial country-pairs will not have positive trade flows recorded, and the colonial dummy will not perfectly predict the overall dependent variable.

20 This is true for distance, island, landlock, legal, language, currency union, and religion, but not for border, colonial or FTA.

21 δ must be positive, and to impose this constraint it is convenient to replace it with the


                  [1]             [2]             [3]
                  HMR             TB - Reg excl   TB - Reg incl

ln(Distance)      -0.994***       -0.712***       -0.700***
                  (0.1029)        (0.0692)        (0.0713)
Border            0.430***        0.174           0.162
                  (0.1626)        (0.1440)        (0.1442)
Island            0.540***        -13.618***      -14.363***
                  (0.1707)        (2.0205)        (2.1286)
Landlock          0.030           0.099           0.106
                  (0.1994)        (0.1956)        (0.1952)
Legal             0.301***        0.377***        0.385***
                  (0.0646)        (0.0609)        (0.0610)
Language          0.144           -0.203**        -0.224**
                  (0.1065)        (0.0954)        (0.0974)
Colonial          0.610***        -16.867         -17.796
                  (0.1193)        (-16.8675)      (-17.7961)
Currency union    1.051**         -3.608***       -3.780***
                  (0.4842)        (0.7280)        (0.7516)
FTA               -0.747*         0.357**         0.381***
                  (0.3934)        (0.1461)        (0.1467)
Religion          0.489***        0.475***        0.480***
                  (0.1445)        (0.1295)        (0.1296)
Reg Cost                                          -0.101
                                                  (0.0974)
Reg Days                                          -0.284**
                                                  (0.1129)
η*                -0.083
                  (0.1920)
H*_Zij                            -3.106***       -0.063
                                  (0.5354)        (0.0978)
H*_Rij                            0.268           -0.005
                                  (0.4254)        (0.0041)
z*                -1.376          -1.060***       -1.008***
                  (1.0404)        (0.1470)        (0.1467)

Observations      7327            7327            7327

*** p<0.01, ** p<0.05, * p<0.1

Table 5: Intensive Trade Margin: Regulation Sample


lower than HMR’s original estimate of 0.84. I conjecture that one reason for this difference is that I do not follow their practice of censoring $\hat{z}^*_{ij}$ above 5.199, which has the effect of increasing their estimate of $\delta$²².
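The step from the reported coefficient to $\delta$ can be checked in a few lines. The sketch below exponentiates the column [1] coefficient from Table 5 and applies the delta method mentioned in the footnote, for which the derivative of $\exp(\Delta)$ is $\exp(\Delta)$. The resulting standard error is illustrative only, since the reported second-stage standard errors are not corrected for generated regressors.

```python
import math

# Reported coefficient and standard error for Delta = log(delta),
# taken from column [1] of Table 5.
Delta_hat = -1.376
se_Delta = 1.0404

# Point estimate: delta = exp(Delta).
delta_hat = math.exp(Delta_hat)

# Delta method: se(delta) is approximately |d exp(Delta)/d Delta| * se(Delta).
se_delta = delta_hat * se_Delta

print(round(delta_hat, 4))
print(round(se_delta, 4))
```

The first printed value matches the 0.2526 quoted in the text.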

Column [2] of Table 5 reestimates the intensive margin using equation (6), maintaining the regulation variables as excluded from the second stage, but using the estimate of $\hat{z}^*_{ij}$ derived from column [2] of Table 4 and the dual sample selection correction terms $H^*_{Zij}$ and $H^*_{Rij}$. The variables Island, Colonial and Currency Union, whose coefficients in the bivariate positive trade probit were estimated very imprecisely, suffer a large loss of efficiency in the modified procedure, presumably reflecting the imprecision of the first stage. The opposite seems to be true for the other variables, whose standard errors diminish somewhat. The results support HMR’s finding of a significant productivity-heterogeneity bias, as the coefficients in Table 5 generally have a smaller absolute magnitude than the OLS benchmarks.

There is an interesting difference in the coefficient on membership of an FTA, which is negative in HMR’s specification, but positive and both economically and statistically significant using the modified procedure. A positive coefficient seems more economically intuitive. Column [3] of Table 4 shows that countries sharing an FTA are much more likely to report their trade, which is also quite intuitive, since most FTAs have strict rules of origin clauses which necessitate careful documentation of intra-FTA trade. Distinguishing this effect of higher reporting quality from the influence on the extensive trade margin seems to also make a significant difference to the intensive margin estimates.

Column [3] repeats the estimation of column [2] but includes measures of

regulation in the second stage. As discussed above, the second-stage is still

identified even without an exclusion restriction, and there is no loss of precision

unconstrained parameter $\Delta = \log(\delta)$ and estimate $\ln\!\left(e^{e^{\Delta}(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right)$. To recover $\delta$, the coefficient in $\ln\!\left(e^{\delta(z^*_{ij} + \hat{\eta}^*_{ij})} - 1\right)$, $\Delta$ should be exponentiated. The delta method could be used to derive a standard error for $\delta$ from that of $\Delta$.

22 This would affect 396 observations in this sample.


in the standard errors from relaxing the exclusion restriction. The point esti-

mates are also very similar to those of column [2], which is encouraging. Column

[3] gives mixed support for the validity of HMR’s exclusion restriction, as one

of the regulation variables is found to be statistically significant in the intensive

margin, although the other is not. This suggests that Reg Cost can be validly

excluded, but Reg Days should not be used for identification.

Table 6 repeats estimation of the first-stage probits on the full sample. The

results are broadly similar to those on the Regulation sample, but the greater

variation in the dataset means that none of the trade barriers appear to be per-

fect predictors in either of the underlying probits, so that all are estimated with

relatively tight standard errors. There appears to be very little efficiency loss be-

tween the univariate and bivariate specifications, and the univariate coefficients

mostly lie between those of the two bivariate equations.

One noticeable difference between the two samples is that on the full sample $\rho$ is estimated to be negative, whereas on the regulation sample it was estimated to be 1. It is hard to have a strong prior as to what the correct sign for the correlation based on unobserved variables should be. The estimate of 1 lies on the boundary of the parameter space, and an interior solution may be more appealing. Although it is possible that the value could change a lot with the underlying sample, the difference in these results suggests that the correlation coefficient may not be very precisely estimated by this procedure.
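To see why an estimate of $\rho = 1$ is a boundary case, it may help to work through what the selection terms look like in that limit. The following is a sketch, assuming that $F(z, r, \rho)$ is the standard bivariate normal CDF; the limiting expressions are not stated in the original text and are offered only as an illustration.

```latex
% At rho = 1 the two selection errors coincide, so both conditions bind on
% the same draw and the joint probability depends only on the smaller index:
F(z^*_{ij}, r^*_{ij}, 1) = \Phi\big(\min(z^*_{ij}, r^*_{ij})\big)

% As rho -> 1 the argument of Phi in H^*_{Zij} diverges:
\Phi\!\left(\frac{r^*_{ij} - \rho z^*_{ij}}{\sqrt{1-\rho^2}}\right) \to
\begin{cases} 1 & \text{if } r^*_{ij} > z^*_{ij} \\ 0 & \text{if } r^*_{ij} < z^*_{ij} \end{cases}

% so the correction term switches between an ordinary inverse Mills ratio and zero:
H^*_{Zij} \to
\begin{cases} \phi(z^*_{ij}) / \Phi(z^*_{ij}) & \text{if } z^*_{ij} < r^*_{ij} \\ 0 & \text{if } z^*_{ij} > r^*_{ij} \end{cases}
```

The same argument applies to $H^*_{Rij}$ with the roles of the two indices reversed, so at the boundary only the selection stage with the lower index contributes a correction for a given country pair.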

Table 7 reports estimates from a non-parametric approximation of the heterogeneity bias correction term, using a seventh-order polynomial of $\hat{z}^*_{ij}$. Columns [1]-[3] replicate the non-linear estimates of columns [1]-[3] of Table 5. Column [4] reports the polynomial approximation using the full sample and the first-stage estimates in columns [2] and [3] of Table 6.
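The polynomial approximation replaces the non-linear term with powers of the first-stage index. A minimal sketch of how those regressors could be built is shown below; the function name `polynomial_terms` and the example value are illustrative, not taken from the original code.

```python
def polynomial_terms(z_hat, order=7):
    """Return [z_hat, z_hat**2, ..., z_hat**order] for one observation."""
    return [z_hat ** k for k in range(1, order + 1)]

# Example: the seven regressors generated by a first-stage index of 2.0.
terms = polynomial_terms(2.0)
print(terms)

# Higher powers grow quickly with the size of the index, so their
# coefficients are correspondingly small in magnitude.
print(polynomial_terms(10.0)[-1])
```

Each observation's seven terms would then enter the second-stage regression alongside the trade barriers and the selection corrections.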

Columns [1] and [2] of Table 7 use the regulation data as an exclusion re-

striction to identify the second stage, while columns [3] and [4] are estimated

without an additional exclusion restriction on fixed versus variable trade costs.


                  [1]             [2]               [3]
                  Univariate      Bivariate Probit
                  Probit          Positive Trade    Reporting

log(Distance)     -0.709***       -1.027***         0.0753
                  (0.0198)        (0.0301)          (0.0581)
Border            -0.505***       0.754***          0.139
                  (0.0986)        (0.293)           (0.211)
Island            0.274***        0.209***          0.416**
                  (0.0544)        (0.0701)          (0.188)
Landlock          0.180           0.0900            0.582
                  (0.111)         (0.159)           (0.383)
Language          0.344***        0.747***          -0.876***
                  (0.0371)        (0.0520)          (0.131)
Colonial          -0.366**        0.0760            0.186
                  (0.157)         (0.343)           (0.266)
Legal             0.107***        0.0417            0.631***
                  (0.0274)        (0.0366)          (0.114)
Religion          0.202***        0.116             0.293
                  (0.0580)        (0.0749)          (0.183)
CU                0.524***        0.422*            3.380***
                  (0.153)         (0.231)           (0.657)
FTA               1.458***        1.307***          1.120***
                  (0.168)         (0.257)           (0.408)
Neither Reports                                     -∞
ρ                                 -0.548
                                  (0.103)

Observations      30450           30450             30450

*** p<0.01, ** p<0.05, * p<0.1

Table 6: Zero-trade and Reporting Probits, Full Sample


                [1]             [2]             [3]             [4]
                HMR             TB - Reg excl   TB - Reg incl   TB - Full

log(Distance)   -0.919***       0.0230          0.119           -3.252***
                (0.137)         (0.197)         (0.202)         (0.486)
Border          0.572***        0.0408          0.0107          1.578***
                (0.175)         (0.156)         (0.156)         (0.378)
Island          0.279           -35.94***       -39.21***       1.055***
                (0.190)         (6.064)         (6.193)         (0.128)
Landlock        0.0157          0.186           0.200           0.543***
                (0.196)         (0.196)         (0.196)         (0.187)
Colonial        0.591***        -44.65***       -48.67***       1.073***
                (0.156)         (7.530)         (7.688)         (0.114)
Language        0.120           -0.857***       -0.954***       1.771***
                (0.126)         (0.194)         (0.198)         (0.361)
Legal           0.262***        0.330***        0.335***        0.407***
                (0.0677)        (0.0611)        (0.0611)        (0.0467)
Religion        0.420***        0.416***        0.406***        0.644***
                (0.153)         (0.129)         (0.129)         (0.104)
CU              0.989**         -10.75***       -11.73***       2.343***
                (0.405)         (1.928)         (1.969)         (0.290)
FTA             0.689           0.753***        0.821***        3.884***
                (0.505)         (0.206)         (0.207)         (0.630)
Reg: cost                                       -0.0362
                                                (0.0995)
Reg: days                                       -0.377***
                                                (0.112)
η̂*_ij           0.130
                (0.668)
H*_Rij                          0.000186        0.000827        -0.605***
                                (0.00743)       (0.00743)       (0.169)
H*_Zij                          -0.450***       -0.483***       3.741***
                                (0.0963)        (0.0977)        (0.426)
ẑ*_ij           -2.683          1.519***        1.610***        1.918**
                (4.398)         (0.186)         (0.191)         (0.877)
(ẑ*_ij)^2       3.385           -0.0572***      -0.0583***      -1.110***
                (3.204)         (0.00718)       (0.00730)       (0.280)
(ẑ*_ij)^3       -1.345          0.00225***      0.00230***      0.150**
                (1.238)         (0.000348)      (0.000352)      (0.0630)
(ẑ*_ij)^4       0.269           -4.45e-05***    -4.54e-05***    -0.0105
                (0.270)         (8.07e-06)      (8.14e-06)      (0.00751)
(ẑ*_ij)^5       -0.0293         4.64e-07***     4.74e-07***     0.000357
                (0.0331)        (9.59e-08)      (9.66e-08)      (0.000484)
(ẑ*_ij)^6       0.00164         -2.44e-09***    -2.49e-09***    -4.22e-06
                (0.00212)       (5.63e-10)      (5.66e-10)      (1.59e-05)
(ẑ*_ij)^7       -3.69e-05       0***            0***            -1.60e-08
                (5.49e-05)      (0)             (0)             (2.07e-07)

Observations    7327            7327            7327            14503
R2              0.695           0.690           0.691           0.719

Standard errors in parentheses
*** p<0.01, ** p<0.05, * p<0.1
Importer and Exporter dummies

Table 7: Polynomial Approximations of Intensive Margin

There is very little loss of efficiency from including the regulation data in col-

umn [3] relative to column [2], or from estimating the polynomial equation on

the full sample without an additional exclusion restriction.

For the regulation sample, the variables Island, Colonial and Currency Union continue to be poorly estimated, as in the non-linear estimation. The higher order polynomial terms in this sample are highly statistically significant, which suggests that the approximation may be somewhat inaccurate and that still higher-order terms should be included. This may also explain why the coefficients of some of the trade barriers here (notably distance) are somewhat different from those in the non-linear specification. The implications for the validity of HMR’s exclusion restriction are similar to those from the non-linear specification, with Reg Cost appearing excludable but not Reg Days.

Column [4] presents results for the full sample. The polynomial approximation appears to be less dependent on higher order terms here, and the coefficients on all of the trade barriers are relatively tightly estimated. The sample selection correction and heterogeneity-bias terms are much more statistically significant using the modified procedure than under HMR’s original, which suggests that the modifications are helping to identify these effects more accurately.

5 Conclusions

Distinguishing effects of trade barriers on the intensive and extensive margins of

trade is a growing area of research, and HMR have provided an elegant frame-

work with which to disentangle these effects. However, when bringing their

approach to the data, it is important to recognise the limitations of existing

databases. In particular, insufficient attention has been given so far to distin-

guishing trade flows that are actually zero from those that are unreported. This

issue is likely to be even more urgent for scholars using disaggregated trade data,

for which the likelihood of underreporting is presumably somewhat higher.


While this distinction is unimportant for traditional gravity equation estima-

tion, it is potentially very important when using the presence of positive trade

in the aggregate to identify fixed costs that deter entry into export markets, as

suggested by HMR. Classifying unreported trade as zero will tend to misclassify

some positive trade flows as zero, reducing the accuracy of first-stage probits.

The use of generated regressors derived from these estimates may then transfer

this noise into second-stage estimates.

Since the decision to report a flow seems likely to be correlated with its

size, the omission of some observations due to non-reporting is a classic sample

selection problem. Heckman’s correction procedure is extended to the case of

multiple selection criteria, using Poirier’s partially observed bivariate probit

model. A natural exclusion restriction exploiting information on countries that

did not participate in the Comtrade database at all allows us to distinguish the

effects of non-reporting from non-trading.

In addition to correcting for possible bias due to sample selection, this dual selection model provides a means of identifying HMR’s intensive margin of trade, without needing an exclusion restriction that distinguishes fixed from variable trade costs. This is appealing, as the two exclusion restrictions they propose in their original paper both have drawbacks. It may also facilitate the use of HMR’s technique in a broader set of applications, where a suitable alternative exclusion restriction is unavailable.

The results from the modified procedure are supportive of HMR’s findings that unobserved and heterogeneous firm-level productivity is a determinant of the volume of trade flows, and that failure to control for this may bias estimates of the intensive margin of trade. Distinguishing a failure to report from a failure to trade helps to estimate both the intensive and extensive margins of trade with higher accuracy.


A Countries in the Sample

Albania           Ecuador           Madagascar        S.African CU
Algeria           Egypt             Malawi            S.Korea
Angola            El Salvador       Malaysia          Saudi Arabia
Argentina         Ethiopia          Maldives          Senegal
Australia         Fiji              Mali              Sierra Leone
Austria           Finland           Mauritania        Singapore
Bangladesh        France            Mexico            Solomon Islands
Belgium           Ghana             Mongolia          Spain
Benin             Greece            Morocco           Sri Lanka
Bhutan            Guatemala         Mozambique        Sweden
Bolivia           Guinea            Nepal             Switzerland
Brazil            Haiti             Netherlands       Syria
Bulgaria          Honduras          New Zealand       Tanzania
Burkina Faso      Hong Kong         Nicaragua         Thailand
Burundi           Hungary           Niger             Togo
Cambodia          India             Nigeria           Tunisia
Cameroon          Indonesia         Norway            Turkey
Canada            Iran              Oman              UAE
CAR               Ireland           Pakistan          Uganda
Chad              Israel            Panama            UK
Chile             Italy             Papua New Guinea  Uruguay
China             Jamaica           Paraguay          USA
Colombia          Japan             Peru              USSR
Costa Rica        Jordan            Philippines       Venezuela
Cote d’Ivoire     Kenya             Poland            Vietnam
Czechoslovakia    Kiribati          Portugal          West Germany
Denmark           Kuwait            Rep Congo         Yugoslavia
Dominican Rep     Laos              Romania           Zambia
DR Congo          Lebanon           Rwanda            Zimbabwe

Table 8: 116 Countries with Regulation Data


Albania             Fiji                Morocco             Uganda
Algeria             Finland             Mozambique          United Kingdom
Angola              France              Nepal               United States
Antigua & Barbuda   Gabon               Netherlands         Uruguay
Argentina           Gambia              New Zealand         Venezuela
Australia           Germany, West       Nicaragua           Vietnam
Austria             Ghana               Niger               Zambia
Bahamas             Greece              Nigeria             Zimbabwe
Bahrain             Grenada             Norway              Maldives
Bangladesh          Guatemala           Oman                Somalia
Barbados            Guinea              Pakistan            New Caledonia
Belgium             Guinea-Bissau       Panama              French Polynesia
Belize              Guyana              Papua New Guinea    Macao
Benin               Haiti               Peru                Marshall Islands
Bermuda             Honduras            Philippines         Micronesia
Bhutan              Hong Kong           Poland              Vanuatu
Bolivia             Hungary             Portugal            Reunion
Brazil              Iceland             Qatar               St.Pierre & Miquelon
Brunei              India               Rwanda              Guadeloupe
Bulgaria            Indonesia           Samoa               Martinique
Burkina Faso        Iran                Saudi Arabia        Neth.Antilles
Burundi             Iraq                Senegal             French Guiana
Cameroon            Ireland             Seychelles          East Germany
Canada              Israel              Sierra Leone        Faeroe Islands
Cape Verde          Italy               Singapore           Cayman Islands
CAR                 Jamaica             Solomon Islands     Cuba
Chad                Japan               S. Africa CU        N.Korea
Chile               Jordan              Spain               Myanmar
China               Kenya               Sri Lanka           Turks & Caicos
Colombia            Kiribati            St.Kitts & Nevis    Western Sahara
Comoros             S.Korea             St.Lucia            North Yemen
DR Congo            Kuwait              St.Vincent          Afghanistan
Rep Congo           Lao                 Sudan               Cambodia
Costa Rica          Liberia             Suriname            Czechoslovakia
Cote d’Ivoire       Libya               Sweden              Djibouti
Cyprus              Madagascar          Switzerland         Greenland
Denmark             Malawi              Syria               Lebanon
Dominica            Malaysia            Thailand            Paraguay
Dominican Rep       Mali                Togo                Romania
Ecuador             Malta               Tonga               USSR
Egypt               Mauritania          Trinidad & Tobago   Tanzania
El Salvador         Mauritius           Tunisia             Yugoslavia
Equatorial Guinea   Mexico              Turkey              South Yemen
Ethiopia            Mongolia            UAE

Table 9: 175 Countries in Full Sample


Algeria               Egypt                 Libya                 Saudi Arabia
Argentina             El Salvador           Macao                 Senegal
Australia             Ethiopia              Madagascar            Seychelles
Austria               Faeroe Is             Malawi                Singapore
Bahamas               Fiji                  Malaysia              Solomon Is
Bahrain               Finland               Malta                 South Korea
Bangladesh            France                Martinique            Spain
Barbados              French Guiana         Mauritius             Sri Lanka
Belgium               Greece                Mexico                St. Pierre & Miquelon
Belize                Greenland             Morocco               St.Kitts & Nevis
Bolivia               Grenada               Nepal                 St.Lucia
Brazil                Guadeloupe            Netherlands           Sweden
Brunei                Guatemala             Netherlands Antilles  Switzerland
Cameroon              Honduras              New Zealand           Syria
Canada                Hong Kong             Nicaragua             Thailand
Chile                 Hungary               Nigeria               Togo
China                 Iceland               Norway                Tonga
Colombia              India                 Oman                  Trinidad & Tobago
Costa Rica            Indonesia             Pakistan              Tunisia
Cyprus                Ireland               Panama                Turkey
Czechoslovakia        Israel                Papua New Guinea      UAE
Denmark               Italy                 Paraguay              UK
Djibouti              Jamaica               Peru                  Uruguay
Dominica              Japan                 Philippines           USA
Dominican Rep         Jordan                Poland                Venezuela
DR Congo              Kenya                 Portugal              West Germany
East Germany          Kiribati              Rep Congo             Yugoslavia
Ecuador               Kuwait                Reunion               Zimbabwe

Table 10: 112 Countries That Report Some Trade in 1986


B The Devil is in the Detail

The most important change to the data originally used by HMR is to the trade data, switching from the Feenstra to the UN data for the reasons outlined above. However, there are also some ‘quibbles’ with the covariates of the gravity equation in the original dataset. Since HMR’s data is derived from the widely-used Glick and Rose dataset, it seems worthwhile to describe the extensive changes made to the data used here.

B.1 Distance

There are many ways one could try to measure the distance between countries²³. HMR describe their distance variable as being the log distance in km between country capitals²⁴. Unfortunately it is clear that the data they use do not correspond to this. Their minimum value, associated with the log(distance) between the capitals of Qatar and Bahrain, Doha and Manama, is -0.1505518, which implies a distance of 0.86 km. The true great circle distance between these cities is 142 km.²⁵ The maximum value in the HMR data is 5.660652, between the capitals of New Zealand and Canada, Wellington and Ottawa, which implies a distance of 287 km. The true great circle distance between these cities is 14452 km.
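The implied distances can be verified by exponentiating the logged values, and the order of magnitude of the true distance can be cross-checked with a simple spherical formula. The sketch below uses the haversine formula on a sphere of radius 6371 km rather than Vincenty's ellipsoidal algorithm used in the paper, so it will differ slightly from the figures quoted. The city coordinates are approximate values assumed for illustration.

```python
import math

def implied_km(log_distance):
    """Distance in km implied by a natural-log distance value."""
    return math.exp(log_distance)

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere (a simplification of Vincenty)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# Logged values quoted in the text for the HMR data.
print(round(implied_km(-0.1505518), 2))  # Doha-Manama, minimum value
print(round(implied_km(5.660652), 1))    # Wellington-Ottawa, maximum value

# Approximate coordinates (assumed): Doha and Manama.
doha = (25.29, 51.53)
manama = (26.23, 50.58)
print(round(haversine_km(*doha, *manama)))
```

The spherical distance for Doha and Manama comes out close to the 142 km reported in the text, and far from the sub-kilometre figure implied by the HMR variable.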

The shortest distance in the sample is between the capitals of the Republic of Congo and the Democratic Republic of Congo, Brazzaville and Kinshasa, which are divided by the Congo river, and are 8 km apart²⁶. The next closest pair of capitals are Damascus, Syria, and Beirut, Lebanon²⁷. The most distant

23 The measure depends both upon the choice of point from which to measure a country’s location, such as capital city or largest city, and upon the choice of measure, such as great circle distance, or distance in degrees. On an ellipsoid the great circle distance gives the most accurate measure of physical separation.

24 HMR, [13], p.479

25 I use Vincenty’s algorithm [18] to calculate the great-circle distances. A useful website for calculating distances between world cities is http://www.infoplease.com/atlas/calculate-distance.html

26 2.14 in logs

27 84 km, 4.43 in logs. HMR measure 12 capital-pairs as being closer to each other than Damascus and Beirut.


capital-pair is Taipei, Taiwan, and Asuncion, Paraguay²⁸. The next furthest pair is Madrid, Spain, and Wellington, New Zealand²⁹.

There is no obvious transformation of HMR’s variable that can align it with

the true great-circle distances, as both the scales and ordering are distorted. The

positive correlation between their measure and the great-circle distances shows

that they are related somehow, but nevertheless their measure is surprisingly

inaccurate as a measure of distance between national capitals.

B.2 Common Border

Sharing a common border is a well-defined concept: sharing a land border.

Table 11 lists the instances in which HMR’s dataset either codes two countries

as sharing a border when they do not, or vice versa.

B.3 Island

There is a little room for interpretation in what constitutes an island, and

Table 12 sets out how I have interpreted this differently from HMR. The Turks

& Caicos are quite straightforwardly islands. Indonesia is a vast archipelago

of over 17500 islands. Papua New Guinea, Ireland, Haiti and the Dominican

Republic all share small landmasses with another neighbour. In my view this

still qualifies them to be islands, but one could certainly make the case that an

island should not share any land boundaries. In this case there would have to

be some reclassifications in the other direction to ensure consistency, such as the

UK, which shares a land boundary with the Republic of Ireland. Hong Kong

is divided between its historic island and the New Territories on the mainland

which border China. Given the central role that the island plays in Hong Kong,

I classified it as an island. Some might argue that as a continent Australia should not be considered an island; however, in my view from an economic point of view it seems to qualify (in that it doesn’t share a land border; and its

28 19894 km, 9.9 in logs

29 19829 km, 9.89 in logs. HMR measure 449 capital-pairs as being further apart than Madrid and Wellington


HMR codes border; TB does not      HMR does not code border; TB does

Egypt - Jordan                     Tanzania - Rwanda
Egypt - Saudi Arabia               Tanzania - Uganda
Bahrain - Qatar                    Tanzania - DRC
Bahrain - Saudi Arabia             Tanzania - Kenya
Lebanon - Turkey                   Tanzania - Malawi
UAE - Qatar                        Tanzania - Mozambique
Trinidad - Venezuela               Tanzania - Burundi
El Salvador - Nicaragua            El Salvador - Honduras
Malaysia - Singapore               Malaysia - Brunei
Sweden - Denmark                   Malaysia - Indonesia
Czechoslovakia - Norway            Czechoslovakia - Austria
                                   Hungary - Austria
                                   Hungary - Yugoslavia
                                   USSR - Norway
                                   USSR - Iran
                                   Angola - Congo
                                   Rwanda - Burundi
                                   Tanzania - Zambia
                                   Djibouti - Somalia
                                   Colombia - Peru
                                   Belize - Guatemala
                                   Cambodia - Laos
                                   Cambodia - Thailand
                                   Cambodia - Vietnam
                                   China - Hong Kong
                                   China - India
                                   North Korea - South Korea

Table 11: Differences in coding of the common border dummy


HMR codes island; TB does not      HMR does not code island; TB does

                                   Turks & Caicos
                                   Papua New Guinea
                                   Ireland
                                   Indonesia
                                   Hong Kong
                                   Haiti
                                   Dominican Republic
                                   Australia

Table 12: Differences in coding of the island dummy

geographic size is not reflected in a proportionally large population, so that one would not expect it to be unusually autarkic).

In their paper HMR describe their island variable as being one if both countries are islands, and zero otherwise³⁰. However, the variable that they actually use in their empirical work is one if at least one country is an island, and zero if neither is. While this may be of limited importance in interpreting the coefficient, it does mean that a change in classification of a country from not-island to island affects quite a lot of observations (the number of countries less the number already coded as islands).
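The difference between the two coding rules, and the number of observations affected by reclassifying one country, can be illustrated on a toy example. The five country labels and the function name `island_dummy` below are hypothetical; the sketch simply counts the directed pairs whose dummy changes when country B is recoded as an island.

```python
from itertools import permutations

def island_dummy(i, j, islands, rule="either"):
    """Pair-level island dummy under the 'either' or the 'both' rule."""
    if rule == "either":
        return int(i in islands or j in islands)
    if rule == "both":
        return int(i in islands and j in islands)
    raise ValueError("rule must be 'either' or 'both'")

countries = ["A", "B", "C", "D", "E"]
before = {"A"}       # islands under the original coding
after = {"A", "B"}   # B reclassified as an island

for rule in ("either", "both"):
    changed = [
        (i, j)
        for i, j in permutations(countries, 2)
        if island_dummy(i, j, before, rule) != island_dummy(i, j, after, rule)
    ]
    print(rule, len(changed), changed)
```

Under the "either" rule the reclassification changes every pair linking B to a partner not already coded as an island, in both directions, whereas under the "both" rule only the pairs linking B to an existing island change.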

B.4 Landlock

Table 13 reports differences in the landlock dummy. Although modern Ethiopia
is a land-locked country, this has been the case only since the secession of the
province of Eritrea in 1993. Since all the data precede this date and apply to
the united Ethiopia, it should be classified as not landlocked.

As with the island variable, HMR report in their paper that landlock
is 1 if both countries are landlocked and zero otherwise. However, in their
empirical work it is 1 if either country is landlocked, and zero if neither is. For
consistency I follow this formulation as well.

³⁰ HMR [13], p.480

HMR codes landlocked; TB does not       HMR does not code landlocked; TB does
Syria                                   Rwanda
Ethiopia

Table 13: Differences in coding of the landlock dummy

B.5 Common Legal System

HMR’s data is based on the dataset of La Porta et al [14], which in turn is

derived from the work of Flores and Reynolds [9], and there do not appear to

be any inconsistencies in this variable.

B.6 Common Language

There is a lot of scope for interpretation as to whether two countries share a

common language, especially in the absence of reliable international data on

what percentage of the population speak particular languages.

HMR are misleading about the construction of this variable, which they suggest
is one if both countries share the same primary language³¹, as indicated by the
CIA World Factbook. However, in constructing the variable they designated a
common language in many cases where only a small minority of people in either
country could possibly share one. In part this is because the
CIA World Factbook is quite inconsistent in its description of languages across
countries³², and does not generally distinguish between primary and secondary
languages. Taken to its logical conclusion, this approach would treat English as
a global lingua franca: the dummy would be one for all country-pairs, and would
no longer measure the ability of citizens of different countries to communicate.

I reconstruct the dummy based on countries’ official languages only. This
leads to so many changes that it is not possible to present them concisely in a
table: 1694 country-pairs are reclassified as not sharing a language³³, and a
further 584 country-pairs are reclassified as sharing an official language³⁴.

³¹ HMR [13] p.478

³² Compare Libya: “Arabic, Italian and English, all are widely understood in the major
cities”; Argentina: “Spanish (official), Italian, English, German, French”; USA: “English
82.1%, Spanish 10.7%, other Indo-European 3.8%, Asian and Pacific island 2.7%, other 0.7%
(2000 census)”; Greece: “Greek 99% (official), other 1% (includes English and French)”

³³ Representative examples are Greece-Chad, Laos-Sierra Leone, and Argentina-Syria
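The official-language rule amounts to checking whether two countries' sets of official languages intersect. The following Python sketch shows the idea under that assumption; the function name `common_official_language` and the small `official` dictionary are illustrative only and are not the dataset used to construct the dummy.

```python
def common_official_language(langs_i, langs_j):
    """Return 1 if the two countries share at least one official language."""
    return int(bool(set(langs_i) & set(langs_j)))

# Illustrative entries only; not the full list used for the reconstruction.
official = {
    "Austria": {"German"},
    "Germany": {"German"},
    "Greece": {"Greek"},
    "Chad": {"French", "Arabic"},
}

austria_germany = common_official_language(official["Austria"], official["Germany"])
greece_chad = common_official_language(official["Greece"], official["Chad"])
print(austria_germany, greece_chad)
```

A rule of this kind is mechanical, which is its main attraction here: it does not depend on how any one source happens to describe minority or secondary languages.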

B.7 Colonial Heritage

Defining a colonial relationship is not straightforward, and coding a dummy
should also reflect how long the relationship lasted, and how long ago it ended.
In previous work³⁵ I built up a dataset of annual colonial relationships since
1500, where I interpreted a colonial relationship as administration of one
country (or a major part of one) by another, often accompanied by settlement or
military occupation. Discounting and summing up colony-years gives a measure of
the extent of the colonial legacy today. I applied a cut-off to the data to exclude
colonial relationships that were very brief or ended a long time ago. Italy-Ethiopia
and Iraq-UK were not classed as colonial because they both fell below the cut-off of
10 discounted colony-years³⁶. Table 15 gives the other country-pairs classed as
colonies by HMR that on balance I did not count as colonies.
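The discount-and-sum rule can be sketched in a few lines of Python. The geometric form of the discounting, the annual rate of 1% and the reference year of 1985 are assumptions made purely for illustration, since the text specifies only the cut-off of 10 discounted colony-years; the two spans of rule below are likewise hypothetical rather than actual country-pairs.

```python
def discounted_colony_years(start, end, reference_year, rate):
    """Sum the discounted weight of each year of rule in [start, end)."""
    return sum(
        (1.0 - rate) ** (reference_year - year) for year in range(start, end)
    )

def is_colonial(start, end, reference_year=1985, rate=0.01, cutoff=10.0):
    """Apply the cut-off: 1985 and 1% are assumed values, 10 is from the text."""
    return discounted_colony_years(start, end, reference_year, rate) >= cutoff

# Hypothetical spans: a long relationship and a brief one.
long_rule = is_colonial(1800, 1960)
brief_rule = is_colonial(1900, 1905)
print(long_rule, brief_rule)
```

Any span shorter than the cut-off fails automatically, because each year contributes a weight of less than one; longer spans pass or fail depending on how long ago they ended.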

Table 14 lists the colonial relationships that I felt HMR had omitted. Even
though the Ottoman empire ended in 1918, the length of its rule over its
dominions has left a strong colonial heritage into the modern era. The USSR’s
relationships with its satellites in Eastern Europe have many of the hallmarks of
traditional colonialism and are included. The colonies of some of the smaller
European colonial powers were also left out by HMR, such as Belgium’s African colonies.

B.8 Currency Unions

Countries can share a currency because they jointly choose to adopt the same

currency, because one country unilaterally adopts the currency of another, like

Liberia and the US dollar, or because two countries have both adopted the

currency of a third, like Liberia and Bermuda, which both use the US dollar.

³⁴ For example Congo-Central African Republic (French), UK-India (English), Austria-Germany (German)

³⁵ Baranga, “Identifying Relationships Between Income and Faith” [3]

³⁶ Ethiopia was invaded by Italy in 1936 and occupied by them until January 1941; Iraq
was a British mandate between 1918 and 1932. After discounting this came in just under the
threshold.

HMR does not code colonial; TB does
Equatorial Guinea  Portugal      Algeria               Turkey
Ghana              Portugal      Libya                 Turkey
Kenya              Portugal      Tunisia               Turkey
Tanzania           Portugal      Egypt                 Turkey
Oman               Portugal      Sudan                 Turkey
Sri Lanka          Portugal      Israel                Turkey
Indonesia          Portugal      Cyprus                Turkey
Malaysia           Portugal      Iraq                  Turkey
Ghana              Denmark       Jordan                Turkey
Norway             Denmark       Kuwait                Turkey
South Africa       Netherlands   Lebanon               Turkey
Ghana              Netherlands   Qatar                 Turkey
Mauritius          Netherlands   Saudi Arabia          Turkey
Guyana             Netherlands   Syria                 Turkey
Sri Lanka          Netherlands   Yemen                 Turkey
Malaysia           Netherlands   Hungary               Turkey
Maldives           Netherlands   Yugoslavia            Turkey
Liberia            USA           Western Sahara        Spain
Dominican Rep      USA           USA                   Spain
Haiti              USA           Uruguay               Spain
Philippines        USA           Nicaragua             Spain
Vietnam            USA           Haiti                 Spain
USA                France        Jamaica               Spain
Mauritius          France        Philippines           Spain
Seychelles         France        Belgium               Spain
Canada             France        Netherlands Antilles  Spain
St.Kitts           France        Trinidad              Spain
Germany            France        Italy                 Spain
South Korea        Japan         Cameroon              Germany
North Korea        Japan         Rwanda                Germany
Taiwan             Japan         Togo                  Germany
Bahrain            Iran          Tanzania              Germany
Hong Kong          China         PNG                   Germany
Belgium            Austria       Cameroon              UK
Italy              Austria       Tanzania              UK
Czechoslovakia     Austria       UAE                   UK
Hungary            Austria       Bangladesh            UK
Germany            USSR          Burundi               Belgium
Finland            USSR          DRC                   Belgium
Bulgaria           USSR          Rwanda                Belgium
Czechoslovakia     USSR          Finland               Sweden
Hungary            USSR          Norway                Sweden
Poland             USSR          Romania               USSR

Table 14: Differences in coding of the colonial dummy


HMR codes colonial; TB does not
Ethiopia    Italy
Iraq        UK
Germany     UK
Nicaragua   Colombia
Bangladesh  Pakistan
Bhutan      India

Table 15: Differences in coding of the colonial dummy

These three arrangements should be equivalent in their effects on bilateral
trade flows, as they have the same effect on bilateral exchange rate volatility.
Including only center-periphery pairs as sharing a currency and excluding
the periphery-periphery pairs is likely to bias upwards estimates of the effects of
sharing a currency, as trade is likely to be higher (for other reasons) between
country pairs that have selected into a direct currency sharing arrangement.

Table 16 lists the country pairs sharing a single currency omitted by HMR.
Western Sahara has used the Moroccan dirham since 1976 and the Turks and
Caicos have used the US dollar since 1969³⁷.

Liberia, Bermuda, the Bahamas, the Turks and Caicos and Panama were all

using the US dollar during the 1980s, and shared a currency with each other as

well as with the United States. Guadeloupe, Reunion and French Guiana had

a similar relationship because of their mutual use of the French Franc.
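The way periphery-periphery pairs arise from the common use of a third country's currency can be sketched as follows. This is an illustrative Python sketch, not the code used to build the dataset: the labels "USD" and "FRF", the `issuers` set and the helper `currency_pairs` are assumptions introduced for the example, while the country groupings follow the text above.

```python
from itertools import combinations

# Currency in use during the 1980s, following the groupings in the text.
currency_used = {
    "USA": "USD", "Liberia": "USD", "Bermuda": "USD",
    "Bahamas": "USD", "Turks & Caicos": "USD", "Panama": "USD",
    "Guadeloupe": "FRF", "Reunion": "FRF", "French Guiana": "FRF",
}
issuers = {"USA"}  # centre countries included in this toy example

def currency_pairs(currency_used):
    """Return every unordered pair of countries using the same currency."""
    by_currency = {}
    for country, currency in currency_used.items():
        by_currency.setdefault(currency, []).append(country)
    pairs = set()
    for members in by_currency.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

all_pairs = currency_pairs(currency_used)
# Pairs in which neither country issues the shared currency.
periphery_pairs = {pair for pair in all_pairs if not set(pair) & issuers}
print(len(all_pairs), len(periphery_pairs))
```

Coding only the centre-periphery pairs would drop everything in `periphery_pairs`, which is the omission discussed above.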

Table 17 lists the country-pairs coded as sharing a currency by HMR which I
have excluded. Most of them involve countries that have been members of the
CFA Franc zone at some point. Under the umbrella of the CFA Franc are actually
several separate currencies in a fixed exchange rate arrangement, of which the
two main groups are the West African CFA Franc³⁸ and the Central African
CFA Franc³⁹. Both these CFA Francs have occasionally devalued against the
French Franc, but they have done so simultaneously, and they have maintained their
fixed rate against each other since inception in 1946, so it is reasonable to treat the two groups as

³⁷ See https://www.globalfinancialdata.com/gh/GHC Histories.xls

³⁸ Members in 1980 were Benin, Burkina Faso, Cote d’Ivoire, Niger, Senegal and Togo.

³⁹ Members in 1980 were Cameroon, Central African Republic, Chad, Congo and Gabon.

HMR does not code CU; TB does     Years
Western Sahara  Morocco           1980-89
USA             Turks & Caicos    1980-89
Liberia         Bermuda           1980-89
Liberia         Bahamas           1980-89
Liberia         Turks & Caicos    1980-89
Liberia         Panama            1980-89
Bahamas         Bermuda           1980-89
Turks & Caicos  Bermuda           1980-89
Panama          Bermuda           1980-89
Bahamas         Turks & Caicos    1980-89
Bahamas         Panama            1980-89
Turks & Caicos  Panama            1980-89
Guadeloupe      Reunion           1980-89
Reunion         French Guiana     1980-89
Guadeloupe      French Guiana     1980-89

Table 16: Differences in coding of the currency union dummy

forming one currency area.

The Comorian Franc has also been fixed against the French Franc since its
independence from Madagascar, but it has not always coordinated its
devaluations with other CFA members⁴⁰. This suggests that the Comorian Franc is
more appropriately thought of as having a fixed exchange rate against the CFA
Franc rather than participating in a currency union with the other members.

Madagascar withdrew from the CFA Franc in 1972; Mali withdrew in 1962
and rejoined in 1984; Equatorial Guinea first joined in 1984; Guinea-Bissau did
not join until 1997⁴¹. This accounts for the bulk of the disagreements in Table
17.

Both Reinhart and Rogoff [16] and the Global Financial Data database suggest
that during the 1980s the Dominican Republic and Guatemala had independent
currencies that were loosely pegged to the US dollar but were not in a currency
union with it. The Netherlands Antilles and Suriname also appear to have had
independent currencies. Qatar and the UAE separated their currencies in 1973. South Africa

and Switzerland have never had a link between their currencies⁴².

⁴⁰ For example in 1994 the Comorian Franc devalued by 1/3 against the French Franc while
other CFA members devalued by 1/2.

⁴¹ See https://www.globalfinancialdata.com/gh/GHC Histories.xls and the appendix of
Reinhart and Rogoff [16]

B.9 Free Trade Agreements

Coding FTAs is also far from straightforward, as frequently agreements are
signed but not transparently implemented, and the degree of implementation
might be close to zero. This has been the case for many FTAs between
developing countries, such as LAFTA⁴³, the Andean Pact and Central American
Common Market⁴⁴, and the many African FTAs⁴⁵.

HMR ignore nominal developing world FTAs with the exception of CACM.

Following Pomfret, I exclude Honduras from the CACM FTA. Although the

Bahamas joined the Caribbean Community (CARICOM) in 1983, they opted

out of the customs union so I dropped their FTA dummy. The Australia New

Zealand Closer Economic Relations Trade Agreement came into force January

1st 1983, so I dropped the FTA dummy for 1981 and 1982.
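Coding the dummy by year then reduces to checking each sample year against the dates on which an agreement was in force. The Python sketch below illustrates this for the Australia-New Zealand agreement; the helper `fta_dummy` and its argument names are hypothetical, and the 1980-89 range is the sample period shown in the tables.

```python
def fta_dummy(year, in_force_from, in_force_to=None):
    """Return 1 if an agreement is in force in the given year, else 0."""
    if year < in_force_from:
        return 0
    if in_force_to is not None and year > in_force_to:
        return 0
    return 1

# Australia-New Zealand CER agreement: in force from 1 January 1983.
anz = {year: fta_dummy(year, 1983) for year in range(1980, 1990)}
print(anz)
```

The optional `in_force_to` argument covers cases where a country leaves an arrangement part-way through the sample.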

The largest group of country-pairs with FTAs added to the sample arises from
recognising EFTA as a fully functional FTA, together with its bilateral FTA with the EC
following the accession of several of its members to the EC in 1973⁴⁶. These
country-pairs are listed in Table 19.

In addition there are several country pairs belonging to FTAs that HMR

recognise, but which have been omitted in error, sometimes for a few years

⁴² This may be the result of a confusion between Switzerland and Swaziland, which is in a
currency union with South Africa, but is not in the sample.

⁴³ ‘LAFTA was a far looser arrangement than the EEC. By 1967 disenchantment with its
progress led to fragmentation as five Andean nations began negotiations, which culminated in
the 1969 Cartagena Agreement creating the Andean Pact In 1980 LAFTA was replaced by
the more “flexible” (i.e. with little binding content) Latin American Integration Association,’
Pomfret [15], pp.99-100.

⁴⁴ ‘The more homogeneous Andean Pact was initially more dynamic [than LAFTA], but in
late 1975 a crisis arose over the external tariff, sector plan, and election of new Pact authorities,
after which the integration process slowed down. The Central American Common Market
functioned reasonably well for most of its first decade, but in 1968 Nicaragua introduced
discriminatory measures against the other CACM members and in 1970 Honduras seceded,’
Pomfret [15], p.100.

⁴⁵ ‘Despite their profusion, the African integration schemes have had little impact on world
trade... the schemes themselves have been ravaged by distributional conflicts as new nations
have been jealous of their policy-making autonomy,’ Pomfret [15], p.102.

⁴⁶ ‘In July 1972 the EC signed bilateral free trade agreements in non-agricultural goods with
the remaining EFTA countries and Iceland,’ Pomfret [15] p.92; ‘For trade policy purposes,
the economic integration of Western Europe was essentially complete,’ Pomfret [15] p.93.

HMR codes CU; TB does not  Years     HMR codes CU; TB does not      Years
Comoros     Cote d’Ivoire  1980-89   Mali            Cote d’Ivoire  1980-84
Comoros     Benin          1980-89   Mali            Benin          1980-84
Comoros     Burkina Faso   1980-89   Mali            Burkina Faso   1980-84
Comoros     Cameroon       1980-89   Mali            Cameroon       1980-84
Comoros     CAR            1980-89   Mali            CAR            1980-84
Comoros     Chad           1980-89   Mali            Chad           1980-84
Comoros     Congo          1980-89   Mali            Congo          1980-84
Comoros     Eq. Guinea     1980-89   Mali            Eq. Guinea     1980-84
Comoros     Gabon          1980-89   Mali            Gabon          1980-84
Comoros     Mali           1980-89   Mali            Guinea-Bissau  1980-84
Comoros     Madagascar     1980-81   Mali            Madagascar     1980-81
Comoros     Niger          1980-89   Mali            Niger          1980-84
Comoros     Senegal        1980-89   Mali            Senegal        1980-84
Comoros     Togo           1980-89   Mali            Togo           1980-84
Eq. Guinea  Benin          1980-84   Guinea-Bissau   Benin          1980-89
Eq. Guinea  Burkina Faso   1980-84   Guinea-Bissau   Burkina Faso   1980-89
Eq. Guinea  Cameroon       1980-84   Guinea-Bissau   Cameroon       1980-89
Eq. Guinea  CAR            1980-84   Guinea-Bissau   CAR            1980-89
Eq. Guinea  Chad           1980-84   Guinea-Bissau   Chad           1980-89
Eq. Guinea  Congo          1980-84   Guinea-Bissau   Congo          1980-89
Eq. Guinea  Cote d’Ivoire  1980-84   Guinea-Bissau   Cote d’Ivoire  1980-89
Eq. Guinea  Gabon          1980-84   Guinea-Bissau   Eq. Guinea     1980-89
Eq. Guinea  Mali           1980-84   Guinea-Bissau   Gabon          1980-89
Eq. Guinea  Niger          1980-84   Guinea-Bissau   Niger          1980-89
Eq. Guinea  Senegal        1980-84   Guinea-Bissau   Senegal        1980-89
Eq. Guinea  Togo           1980-84   Guinea-Bissau   Togo           1980-89
Madagascar  Benin          1980-81   USA             Dominican Rep  1980-84
Madagascar  Burkina Faso   1980-81   USA             Guatemala      1980-85
Madagascar  Cameroon       1980-81   Qatar           UAE            1980-89
Madagascar  CAR            1980-81   Neth. Antilles  Suriname       1980-89
Madagascar  Chad           1980-81   South Africa    Switzerland    1980-89
Madagascar  Congo          1980-81
Madagascar  Cote d’Ivoire  1980-81
Madagascar  Gabon          1980-81
Madagascar  Niger          1980-81
Madagascar  Senegal        1980-81
Madagascar  Togo           1980-81

Table 17: Differences in coding of the currency union dummy


and sometimes for the full sample. Greenland joined the EC with Denmark in
1973 and withdrew in 1985⁴⁷. Costa Rica and El Salvador remained members
of CACM through the 1980s. Jamaica, Guyana and St.Kitts were members of
CARICOM. The other entries are country-pairs that were members of the EC
but were not coded by HMR as being in an FTA.

B.10 Religion

HMR report that their measure of religious proximity is constructed as
(% Protestants in country i · % Protestants in country j) + (% Catholics in country
i · % Catholics in country j) + (% Muslims in country i · % Muslims in country
j), with the data taken from the CIA World Factbook⁴⁸. This measure has the
limitation that countries for which these religions do not constitute a
substantial share will automatically be classified as religiously unrelated to all other
countries. This is misleading as there are significant concentrations of other
religious groups across country borders⁴⁹.

More seriously, HMR’s actual data is not consistent with their stated mea-

sure. Clearly the value for a particular country pair should be the same whichever

country is the importer or exporter. Unfortunately this is not the case in their

data, suggesting that something has gone wrong at their data compilation stage.
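Because the stated measure is a sum of products of the two countries' shares, it is symmetric by construction, which gives a simple consistency check on any dataset claiming to implement it. The Python sketch below illustrates the check; the shares for countries "X" and "Y", the deliberately inconsistent `reported` values and the helper names are all hypothetical and are not drawn from HMR's data.

```python
GROUPS = ("Protestant", "Catholic", "Muslim")

def religious_proximity(shares_i, shares_j, groups=GROUPS):
    """Sum over groups of (share in country i) * (share in country j)."""
    return sum(shares_i.get(g, 0.0) * shares_j.get(g, 0.0) for g in groups)

def asymmetric_pairs(index, tol=1e-12):
    """List pairs (i, j), i < j, whose two directed values disagree."""
    return [
        (i, j)
        for (i, j), value in sorted(index.items())
        if i < j and (j, i) in index and abs(value - index[(j, i)]) > tol
    ]

# Hypothetical population shares for two countries.
shares = {
    "X": {"Protestant": 0.5, "Catholic": 0.25, "Muslim": 0.0},
    "Y": {"Protestant": 0.25, "Catholic": 0.5, "Muslim": 0.125},
}
xy = religious_proximity(shares["X"], shares["Y"])
yx = religious_proximity(shares["Y"], shares["X"])

# A hypothetical reported dataset in which the two directions disagree.
reported = {("X", "Y"): 0.25, ("Y", "X"): 0.10}
flagged = asymmetric_pairs(reported)
print(xy, yx, flagged)
```

Any pair returned by `asymmetric_pairs` cannot have been generated by the stated formula, whatever the underlying shares.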

HMR use the low degree of correlation between religion and trade flows in

their regressions as justification for excluding the religion variable from their

gravity equation⁵⁰. The scrambling of the data provides an alternative explanation

for the low correlation, and casts doubt on their claim that religious

proximity satisfies the exclusion restriction and can be omitted from the sec-

ond stage of their estimation, and hence on the estimates derived under that

assumption.

I reconstruct the data using Barrett et al. [4] which contains comprehensive

⁴⁷ http://europa.eu/abc/history/1980-1989/1985/index en.htm

⁴⁸ HMR [13] pp.478, 480

⁴⁹ e.g. Orthodox Christians in Eastern Europe, or Buddhists in Asia

⁵⁰ HMR [13] p.466

HMR codes FTA; TB does not  Years     HMR does not code FTA; TB does  Years
Costa Rica   Honduras       1980-89   Belgium      Greenland          1980-85
El Salvador  Honduras       1980-89   Denmark      Greenland          1980-85
Guatemala    Honduras       1980-89   France       Greenland          1980-85
Nicaragua    Honduras       1980-89   Germany      Greenland          1980-85
Bahamas      Barbados       1983-89   Greece       Greenland          1981-85
Bahamas      Jamaica        1983-89   Ireland      Greenland          1980-85
Bahamas      St.Kitts       1984-89   Italy        Greenland          1980-85
New Zealand  Australia      1981-82   Netherlands  Greenland          1980-85
                                      UK           Greenland          1980-85
                                      Denmark      Belgium            1980-89
                                      France       Belgium            1980-89
                                      Germany      Belgium            1980-89
                                      Greece       Belgium            1981-89
                                      Ireland      Belgium            1980-89
                                      Italy        Belgium            1980-89
                                      Netherlands  Belgium            1980-89
                                      Portugal     Belgium            1986-89
                                      Spain        Belgium            1986-89
                                      UK           Belgium            1980-89
                                      France       Denmark            1980-89
                                      Portugal     Denmark            1986-89
                                      Spain        Denmark            1986-89
                                      France       Ireland            1980-89
                                      France       UK                 1980-81
                                      France       UK                 1983-89
                                      Germany      Greece             1981-89
                                      Germany      UK                 1980-89
                                      Greece       Italy              1981-84
                                      Greece       Italy              1986-89
                                      Greece       Netherlands        1981-89
                                      Italy        Ireland            1980-89
                                      Italy        Spain              1986-88
                                      UK           Spain              1986-89
                                      Costa Rica   El Salvador        1980-89
                                      Jamaica      St. Kitts          1980-88
                                      Guyana       St. Kitts          1980-89

Table 18: Differences in coding of the FTA dummy: non-EFTA


HMR does not code FTA; TB does  Years     HMR does not code FTA; TB does  Years
Italy        Portugal           1980-85   Portugal     Greenland          1980-85
Italy        Austria            1980-89   Austria      Greenland          1980-85
Italy        Finland            1986-89   Iceland      Greenland          1980-85
Italy        Iceland            1980-89   Norway       Greenland          1980-85
Italy        Norway             1980-89   Switzerland  Greenland          1980-85
Italy        Sweden             1980-89   Portugal     Belgium            1980-85
Italy        Switzerland        1980-89   Austria      Belgium            1980-89
Portugal     Netherlands        1980-85   Finland      Belgium            1986-89
Austria      Netherlands        1980-89   Iceland      Belgium            1980-89
Finland      Netherlands        1986-89   Norway       Belgium            1980-89
Iceland      Netherlands        1980-89   Sweden       Belgium            1980-89
Norway       Netherlands        1980-89   Switzerland  Belgium            1980-89
Sweden       Netherlands        1980-89   Portugal     Denmark            1980-85
Switzerland  Netherlands        1980-89   Austria      Denmark            1980-89
UK           Portugal           1980-85   Finland      Denmark            1986-89
Austria      Portugal           1980-89   Iceland      Denmark            1980-89
Finland      Portugal           1986-89   Norway       Denmark            1980-89
Iceland      Portugal           1980-89   Sweden       Denmark            1980-89
Norway       Portugal           1980-89   Switzerland  Denmark            1980-89
Sweden       Portugal           1980-89   France       Portugal           1980-85
Switzerland  Portugal           1980-89   France       Austria            1980-89
Austria      Spain              1986-89   France       Finland            1986-89
Finland      Spain              1986-89   France       Iceland            1980-89
Iceland      Spain              1986-89   France       Norway             1980-89
Norway       Spain              1986-89   France       Sweden             1980-89
Sweden       Spain              1986-89   France       Switzerland        1980-89
Switzerland  Spain              1986-89   Germany      Portugal           1980-85
UK           Austria            1980-89   Germany      Austria            1980-89
UK           Finland            1986-89   Germany      Finland            1986-89
UK           Iceland            1980-89   Germany      Iceland            1980-89
UK           Norway             1980-89   Germany      Norway             1980-89
UK           Sweden             1980-89   Germany      Sweden             1980-89
UK           Switzerland        1980-89   Germany      Switzerland        1980-89
Austria      Finland            1986-89   Greece       Portugal           1981-85
Austria      Iceland            1980-89   Greece       Austria            1981-89
Austria      Norway             1980-89   Greece       Finland            1986-89
Austria      Sweden             1980-89   Greece       Iceland            1981-89
Austria      Switzerland        1980-89   Greece       Norway             1981-89
Iceland      Finland            1986-89   Greece       Sweden             1981-89
Norway       Finland            1986-89   Greece       Switzerland        1981-89
Sweden       Finland            1986-89   Portugal     Ireland            1980-85
Switzerland  Finland            1986-89   Austria      Ireland            1980-89
Iceland      Norway             1980-89   Finland      Ireland            1986-89
Iceland      Sweden             1980-89   Iceland      Ireland            1980-89
Iceland      Switzerland        1980-89   Norway       Ireland            1980-89
Sweden       Norway             1980-89   Sweden       Ireland            1980-89
Switzerland  Norway             1980-89   Switzerland  Ireland            1980-89
Switzerland  Sweden             1980-89

Table 19: Differences in coding of the FTA dummy: EFTA


data on world religious adherence, using the data for 1990⁵¹ and including a

broad range of 20 religious groups instead of just 3.

B.11 Regulation

HMR use the dataset of Djankov et al. [5], and there are no anomalies to report
in the dataset. However, in footnote 29⁵² HMR misreport which countries lack
regulation data. Kiribati, the Maldives and Zaire are included in
the list although they are in the Djankov et al. dataset, while the Netherlands
Antilles are not included in the list although regulation data are not
available for them. In HMR’s regressions including regulation data they omit the
observations of country-pairs including the Maldives, Netherlands Antilles and
Zaire, but include those of Kiribati.

References

[1] J.Anderson and E.van Wincoop, “Gravity with Gravitas: A Solution to

the Border Puzzle,” American Economic Review, Mar 2003, v.93 no.1,

pp.170-192

[2] J.Baffoe-Bonnie, “Black-White Wage Differentials in a Multiple Sample

Selection Bias Model,” Atlantic Economic Journal, March 2009, v.37 no.1,

pp.1-16

[3] T.Baranga, “Identifying Relationships Between Income and Faith”, mimeo,

2007

[4] D. Barrett, G. Kurian and T. Johnson, “World Christian Encyclopedia,

2nd edition”, 2001, Oxford University Press

[5] S.Djankov, R.La Porta, F.Lopez-de-Silanes and A.Shleifer, “The Regula-

tion of Entry,” Quarterly Journal of Economics, 2002, v.117, pp.1-37

⁵¹ Supplementing this with data from the 1st edition for Hong Kong

⁵² HMR [13], p.461

[6] R.Feenstra, R.Lipsey and H.Bowen, “World Trade Flows 1970-1992, with

Production and Tariff Data,” NBER Working Paper 5910, Jan 1997

[7] R.Feenstra, “World Trade Flows, 1980-1997,” Center for International

Data, March 2000

[8] R.Feenstra, R.Lipsey, H.Deng, A.Ma and H.Mo, “World Trade Flows, 1962-

2000,” NBER Working Paper 11040, Jan 2005

[9] A.Flores and T.Reynolds, Foreign Law Guide

[10] R.Glick and A.Rose, “Does a Currency Union Affect Trade? The Time

Series Evidence,” European Economic Review, v.46, pp.1125-1151

[11] J.Heckman, “The Common Structure of Statistical Models of Truncation,

Sample Selection, and Limited Dependent Variables and a Simple Esti-

mator for Such Models,” Annals of Economic and Social Measurement 5,

475-492

[12] J.Heckman, “Sample Selection Bias as a Specification Error,” Economet-

rica 47, 153-161

[13] E.Helpman, M.Melitz and Y.Rubinstein, “Estimating Trade Flows: Trad-

ing Partners and Trading Volumes,” Quarterly Journal of Economics, May

2008, v.123 no.2, 441-487

[14] R.La Porta, F.Lopez-de-Silanes, A.Shleifer and R.Vishny, “The Quality of

Government,” Journal of Law Economics and Organisation, 1999

[15] R.Pomfret, The Economics of Regional Trading Arrangements, 1997

[16] C.Reinhart and K.Rogoff, “The Modern History of Exchange Rate Ar-

rangements: A Reinterpretation”, NBER Working Paper 8963


[17] I.Tunali, “A General Structure for Models of Double-Selection and an Ap-

plication to a Joint Migration/Earnings Process with Remigration,” Re-

search in Labor Economics, v.8, 1986, part B, pp.235-283

[18] T.Vincenty, “Direct and Inverse Solutions of Geodesics on the Ellipsoid with

Application of Nested Equations”, Survey Review, vol. 23, no. 176, April

1975, pp 88-93
