Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Shocking Firm-Level Interactions:
From Micro Intervention to Aggregate Implications*
Wyatt BrooksArizona State University
Kevin DonovanYale University
Terence R. JohnsonUniversity of Virginia
October 2021
Abstract
Facilitating interactions that transmit skills or information can spur large in-creases in firm-level profitability, as evidenced by a growing set of interventionsgenerating inter-firm interaction in developing countries. How informative arethese micro-level returns about the aggregate gains that could be achieved byscaling the same intervention? We study this question in a model that linksfirm-level interactions to aggregate outcomes via diffusion. Two key diffusionchannels differently affect micro and macro elasticities, implying the observedtreatment effect is generally unrelated to aggregate gains at scale. We providea new procedure by which the micro data identifies the parameters governingthese elasticities in aggregate diffusion models and disciplines their relative im-portance. The aggregate gains are primarily related to treatment effect hetero-geneity, as it closely corresponds to the main channels in the aggregate model.The results provide guidance on moving from micro to aggregate implications:quantitatively, the bias from not utilizing this moment can be substantial.
JEL Classification Codes: O11, O33, O40
Keywords: diffusion, general equilibrium, ability
*This paper has benefited from comments and discussions by participants at the NBER Growth Meeting, the Societyfor Economic Dynamics, Y-RISE, and various seminars, especially those from Daron Acemoglu, Lorenzo Caliendo, TeresaFort, Mushfiq Mobarak, Melanie Morten, Ezra Oberfield, Jesse Perla, Tommaso Porzio, Peter Schott, Kevin Song, Mered-ith Startz, Adam Szeidl, Duncan Thomas, Mike Waugh, and Daniel Xu. Brian Mukhaya provided outstanding researchassistance. Contact Information: Brooks, [email protected]; Donovan, [email protected]; Johnson, [email protected]
1 Introduction
A substantial amount of economic know-how is embodied in other agents. The prof-
itability of a firm, for example, depends in part on skills, management practices, and
other innovations gleaned from, or originally discovered by, other firms. However, fric-
tions limiting the inter-firm interactions necessary to extract this information may limit
firm growth in developing countries. A growing body of evidence supports this view.
Recent interventions facilitating these interactions find large changes in firm-level profit,
technology adoption, and management practices.1
These promising micro-level effects put this class of policies at the center of a funda-
mental question of economic growth: whether we can develop at-scale policy that combats
the low managerial skill (Bloom and Van Reenen, 2007; McKenzie and Woodruff, 2017)
and technology adoption (Comin and Hobijn, 2010) in developing countries, both of
which contribute to the substantial productivity and income gaps between rich and poor
countries (Klenow and Rodrıguez-Clare, 1997; Prescott, 1998). Indeed, more than US$1
billion per year is spent attempting to correct skill deficiencies and the corresponding low
profit that plague entrepreneurs in developing countries (Blattman and Ralston, 2017).
Unfortunately, we have little information about what the empirical evidence derived
from these interventions teaches us about the gains that could be achieved at scale. Does
a large treatment effect from an intervention that encourages inter-firm interaction imply
large gains from scaling it? And if not, how do we use the evidence to inform our ultimate
goal of inferring at-scale implications? We study these questions here, joining a small
body of recent work studying the potential differences between micro and macro effects
of various policies (e.g., Berguist et al., 2019; Greenwood et al., 2019; Buera et al., 2021b;
Fried and Lagakos, 2021; Fujimoto et al., 2021). We ask how the micro-level elasticities of
better matching relate to the aggregate-level elasticities that are relevant when scaling the
same intervention. We use the results to investigate the proper interpretation of reduced-
form results from this class of interventions when attempting to inform at-scale policy, and
whether extrapolating directly from treatment effects generates a quantitatively relevant
bias at scale.
We answer these questions in three steps. First, within a broad class of aggregate
models that take seriously firm-level interactions, we clarify how the average treatment
effect is informed by various economic forces that play different roles at the micro and
1For example, Cai and Szeidl (2018) create randomly-formed business groups in China and Brooks et al. (2018) createrandom one-to-one matches among Kenyan microenterprise owners. Relatedly, Atkin et al. (2017) introduce randomvariation in buyer-seller links in Egypt, Beaman et al. (2020) randomly seed information on new technology in Malawi, andFafchamps and Quinn (2018) introduce smaller-scale firm owners to managers of high-profit firms in Ethiopia, Tanzania,and Zambia during a business plan competition. Broadly, all of these are designed to test the importance of giving firmsaccess to new information embedded in other economic agents.
1
macro levels. Directly extrapolating the smaller-scale intervention results to at-scale
gains is therefore infeasible, except in knife-edge cases. Second, we provide a method by
which the micro data can be used to separately identify the parameters governing these
channels in this same class of models, thus solving the first problem. Finally, we show
quantitatively that the bias induced by not doing so – for example, by directly extrapo-
lating the average treatment effect – can be large. Throughout, our results highlight the
importance of measuring treatment effect heterogeneity when extrapolating to at-scale
implications, as it closely corresponds to important theoretical channels that generate
the at-scale gains.
We study these issues in a general equilibrium framework that takes seriously the
role of individual-level interactions, building off a body of work recently reviewed in
Buera and Lucas (2018). Agents interact and exchange “ideas” (or some related profit-
enhancing notion), with higher-quality interactions generating more profit. This micro-
foundation shares a tight intuition with the interactions induced by the aforementioned
micro literature, and we will think of these interventions as a shock to this process. But
aggregating the impact of these interactions must take seriously that they do not exist in
a vacuum. Today’s interactions affect the distribution of ideas in the economy, which has
implications for future interactions and thus future income. This equilibrium diffusion
process has long been emphasized in the development process and we allow for it here.2
Formalizing it in the model links individual-level interactions with aggregate outcomes.3
We use this framework to study the relationship between the average treatment effect
(ATE) from a particular type of intervention and the gains that could be achieved by
scaling it. Our intervention of interest encompasses those discussed above, in which eco-
nomic agents get access to other firms or firm-embodied ideas that they would otherwise
be unlikely to encounter. The ATE derived from this intervention is governed by two
forces that we focus on throughout this paper. First, it depends on the return to a match:
how much the exogenously-provided match changes a treated firm’s profit. But this is
insufficient, as untreated firms still interact with others via the usual, non-intervention
matching process. Therefore, the ATE also depends on the extensive margin: how likely
that same match would have been without the intervention.4 These two forces affect the
ATE in opposite ways. The first affects the ATE positively, as the benefit of an exogenous
2See Munshi (2007) for a review of diffusion in agricultural technology, health, and education, along with Alatas et al.(2016) and others on recent implications for policy. Romer (1990) highlights the importance of knowledge spillovers forlong-run growth.
3Early work on the topic includes Jovanovic and Rob (1989) and is extended to endogenous growth in Lucas (2009),Lucas and Moll (2014), and Perla and Tonetti (2014). These models have been further extended to include internationaltrade (Sampson, 2016; Buera and Oberfield, 2020; Perla et al., 2021; Van Patten, 2020), innovation policy (Benhabibet al., 2020; Lashkari, 2020), frictional labor markets (Engbom, 2020), and various types of learning among co-workers(Herkenhoff et al., 2018; Jarosch et al., 2020; Wallskog, 2021).
4Said slightly more technically, these represent the two “differences” in the difference-in-difference regression requiredto estimate the treatment effect.
2
high-quality match naturally increases when the return to a match is higher. The second
affects the ATE negatively. If finding good matches is already easy, exogenously offering
one provides little value to a treated firm.
The divergent implications of these two channels pose a hurdle in moving from mi-
cro to at-scale gains, in the form of an under-identification problem. Specifically, any
observed ATE can be rationalized by a continuum of economies defined by different diffu-
sion parameters. These economies differ only in the relative importance of the two forces
highlighted above. To the extent these same forces play different roles at scale, any bias
in the parameter estimates (by, for example, attributing the ATE to the wrong margin)
will bias the implied aggregate gains. This gives us our first, and somewhat negative,
result. There will be a (potentially large) set of aggregate gains for any treatment effect.
The average treatment effect therefore gives us little direct information about the gains
from scaling the same intervention in this class of models. Informing the at-scale gains
requires disciplining the relative importance of these two channels.
We therefore provide a method to separately identify the parameters governing these
two channels. While the difficulty in estimating diffusion parameters in aggregate models
is well known, the intervention’s empirical moments offer a powerful and straightforward
tool to do so. It relies on two simple moments: the average treatment effect and a
measure of heterogeneity in the treatment effect. The first step measures the return to
a match. At its heart is the fact that exogenously engineering random matches creates
observable variation in treatment intensity. Some treated firms meet relatively high
quality matches, others lower quality. We estimate the return to a match by measuring
how a marginally higher quality match affects an otherwise similar firm. Implementing
it requires a regression similar to reduced-form tests of treatment effect heterogeneity in
recent empirical work on this topic (Brooks et al., 2018; Cai and Szeidl, 2018). Once
estimated, the average treatment effect measures the extensive margin. Intuitively, if we
observe a large average effect (conditional on the measured return to a match), we infer
that the intervention matches must be substantially better than those usually afforded
to firms. This two-step procedure disciplines the relative importance of our two main
channels. A useful feature is that it leverages the exclusion restrictions built into the
empirics, and thus holds in a broad class of models without ancillary assumptions on the
shape of distributions, details of entry and exit, and other features usually employed to
estimate or calibrate these parameters.5
5As it contrasts with existing work estimating these parameters in aggregate models, our results can be thought of asshifting the burden of identification toward the data generating process and away from the structural economic environment.One extreme is calibration-style exercises (examples provided in Buera and Lucas, 2018, such as Caicedo et al. 2019). Theyuse relatively aggregated or cross-sectional data, relying on model structure to infer diffusion parameters (though of coursetoward different goals than we have here). We lie on the other extreme. We use the structure provided by the RCT-styleinterventions to limit required assumptions on the economic environment. Between the two is work utilizing more detailed,
3
Finally we study quantitative implications, asking whether the potential theoretical
biases highlighted earlier have any practical relevance. Doing so requires us to close to
the model and study a particular intervention, as quantifying these relationships requires
us to be explicit about the remaining economic environment. We therefore focus on
the context of a randomized controlled trial in Kenya, in which we introduce one-to-one
matches between low- and high-profit firm owners (Brooks et al., 2018). We construct the
remaining details of our model to remain faithful to the economic environment studied,
facilitated by the flexibility of the diffusion parameter identification. We model diffusion
as the knowledge of lower-cost suppliers, as our empirics point to this as a key channel by
which profit increases. We also allow for occupational choice, as our study is primarily
run among small firms for whom exit to wage work is a potential outside option.
Empirically, we find that average profit increases by 19 percent in response to the
intervention. We implement the at-scale counterpart to the RCT by permanently making
it easier for all agents to find high-quality matches. The model predicts that average
income rises by 11 percent. Our observed ATE, however, is actually consistent with a
wide range of aggregate gains. Indeed, when we do not use heterogeneity as an estimating
moment, we can construct a set of economies that all deliver our observed treatment
effect, but whose at-scale gains vary between 0.6 and 38 percent. These economies differ
only in the relative importance of the two channels highlighted above. The wide range of
at-scale gains for a given treatment effect has quantitatively important, policy-relevant
implications. It implies than an 11 percent increase in aggregate income is feasible in
economies delivering nearly any treatment effect. Therefore, the average treatment effect
or its implied cost-effectiveness need not predict the most cost-effective scale-up and,
quantitatively, can be substantially misleading. However, measuring treatment effect
heterogeneity provides a solution by disciplining the relative importance of the two main
channels that deliver the at-scale gains.
The variation in feasible at-scale gains suggests the micro and and macro elasticities
are being driven by different parameters. We show that this is indeed the case, and
provide some rationale for why measuring heterogeneity proves useful here. The result
follows from the fact that our empirical notion of heterogeneity is closely connected to the
main quantitative forces in aggregate diffusion models. In particular, aggregate diffusion
models are primarily governed by the right tail of the equilibrium ability distribution.
Tail weight has a natural relationship to our empirical measure of heterogeneity. Given a
set of matches created by the intervention, this moment measures the marginal increase in
own-ability from a slightly better match. Or, put differently, it tells us how much farther
but non-random, data like matched employer-employee datasets (Herkenhoff et al., 2018; Jarosch et al., 2020). See alsoNakamura and Steinsson (2018) for a broader discussion on this tradeoff in aggregate models.
4
into the tail we can push an agent by slightly increasing her match quality. It is therefore
closely linked with the key channels governing aggregate implications. Measuring it
is critical when attempting to extrapolate to at-scale implications from reduced-form
interventions of the class we study here.
On the other hand, the extensive margin plays a smaller role. If matches involve some
randomness, as has been pointed to empirically (e.g. Banerjee et al., 2020), most firms
will eventually interact with at least one high quality match. This tilts the quantitative
importance of this margin away from long-run aggregates. This result shares an intu-
itive connection to the high value of random seeding in an important class of network
diffusion models (Akbarpour et al., 2020), which we exploit to highlight the importance
of treatment effect heterogeneity in governing aggregate gains.
Motivated by the limited direct role of the ATE we extend our results to another
commonly used reduced-form moment, treatment effect persistence. It highlights one
of more interesting features of our empirical results: three quarters after the treatment,
there is no discernible effect remaining. One view of this result is that the intervention is
ineffective (e.g. McKenzie et al., 2020). Does it follow that the lack of long-term benefit
to RCT participants implies small gains at-scale? We use the model to study whether
this type of model-free inference is warranted and find it misleading in a particularly
severe way: the same force that increases the at-scale gains simultaneously hastens the
fade-out of the ATE. Indeed, our model replicates the short half-life of the ATE in
part because it predicts that diffusion is important at-scale. This reinforces our earlier
results. Drawing aggregate policy conclusions from the lack of persistence in this class
of experiments without considering why we observe this outcome potentially leads to
erroneous inferences.
To further this discussion, we replicate our exercise in the setting of Cai and Szeidl
(2018) in the Appendix. They run an innovative RCT creating randomly-formed busi-
ness groups in China, finding persistent changes in sales, employment, and management
practices. Our model predicts the persistence of their effects. The ability to match
these divergent patterns of persistence follows from the fact that our estimation proce-
dure infers different diffusion parameters from different patterns of heterogeneity within
the treatment group. Again, the result reinforces the importance of understanding the
underlying forces that govern observed empirical patterns matters when attempting to
derive scalable policy.
The remainder of the paper proceeds as follows. Section 2 lays out an economic
environment (or, a class of models) in which we will interpret a data generating process
with particular characteristics, the latter of which formalizes the data generated by a
5
class of interventions. We highlight the main issue that arises and our solution to it. We
then turn to a particular application to study quantitative implications. Section 3 details
the Kenyan RCT we use, then applies our diffusion estimation procedure to it. Section
4 builds those empirical results into an equilibrium diffusion model whose quantitative
implications are detailed in Section 5.
2 Facilitating Inter-Firm Interactions: Measurement and Iden-
tification
We begin by formalizing the type of intervention and a class of models within which we
interpret it. We first lay out the assumptions on the economic environment, then describe
the intervention as a data-generating process derived from this environment. We discuss
extensions of various assumptions made here at the end of this section.
2.1 Assumptions on the Economic Environment
Consider a dynamic economy populated by agents (“firms”) with heterogeneous ability
z. Ability here is some notion of managerial capital in the sense of Bruhn et al. (2010).
It makes the firm more profitable but need not be directly equal to a firm’s productivity
(TFPQ in Hsieh and Klenow, 2009).
Time is discrete and, in each period, every agent receives two shocks. First, they
receive an idiosyncratic imitation shock z. If her own ability z is greater than z, then the
imitation opportunity is useless and it has no effect on her future ability. If z > z, then
the imitation opportunity contains some useful information that the agent incorporates
it into her future ability. Second, each agent receive a random shock ε that enters next
period’s ability multiplicatively. The functional form of the subsequent ability z′ is given
by Assumption 1.6
Assumption 1. Given ability z this period, an imitation opportunity z, and a random
shock ε, ability next period z′ is given by
z′(z, ε, z) = ec+εzρ max
{1,z
z
}β, (2.1)
where the parameter c is a constant growth term, β is diffusion intensity, and ρ is per-
sistence.
This assumption parameterizes how a specified match z translates into an agent’s
future ability. The final term is the additional benefit from the imitation opportunity.6Throughout this paper, we will use ′ to denote values one period in the future.
6
If β = 0, this law of motion collapses to an exogenous AR(1) process, log(z′) = c +
ρ log(z) + ε. On the other hand, β > 0 allows ability to increase when matched with
z > z.7 At its extreme, ρ = β = 1 simplifies the law of motion to z′ = ec+ε max{z, z}.We refer to β as the “intensity” of diffusion and ρ as “persistence” throughout. Our goal
is, in part, to let the data tell us where these two parameters fall.
Given the notion of ability we consider here, we cannot observe it directly. Thus, we
require a link between ability and observable variables. The requirement is summarized
in Assumption 2.
Assumption 2. In any period, there exists some observable characteristic, denoted π,
that is proportional to ability. That is, for any two agents i and j with πi and πj,
πi/πj = zi/zj.
We denote this variable π because profit is an intuitive and commonly used measure
in the literature and will be the moment we focus on in our main empirical application.89
Finally, we specify the assumptions on the source distribution from which z is drawn.
We denote the cumulative density function of z as M(z; z, θ). The source distribution
allows agents with different abilities to draw from different distributions. These distri-
butions depend on the parameter θ, which is assumed to order a class of distributions in
the sense of first order stochastic dominance. This is summarized in Assumption 3.
Assumption 3. The imitation opportunity z is drawn by an agent with ability z from
a distribution characterized by the cumulative density function M(z; z, θ), a known func-
tion. For every z and z, M is continuous in θ and θ1 < θ2 =⇒ M(z; z, θ2) first order
stochastically dominates M(z; z, θ1).
This assumption admits a variety of search and assignment processes that we discuss
further below. Critical for our purposes is that θ shifts the distribution in a way that
increases the average match quality. In light of this, a useful way to think of θ is as
measure of the productivity or quality of the underlying matching technology, and we
will do so to fix intuition in the following results.
7Furthermore, notice that the max operator in the diffusion process rules out any productivity benefit accruing to thehigher ability agent in the match. We show later that this is not critical, but does have some empirical support (Brookset al., 2018; Jarosch et al., 2020). Since our empirical results will eventually require this functional form, we maintain itthroughout.
8Setting profit π = z satisfies this assumption (Lucas, 2009; Perla and Tonetti, 2014). Alternatively, a Cobb-Douglasproduction function y = z1−αnα would satisfy the same assumption in a competitive labor market where z is firm-levelproductivity.
9One in which this assumption can fail is if firms are subject to unobserved idiosyncratic variation, whether they bedistortions or measurement error. This can be included in a straightforward way. We discuss this and other generalizationsof these assumptions in Section 2.5 and detail them in the Appendix.
7
2.2 Intervention as a Data-Generating Process
Section 2.1 laid out a set of assumptions on the economic environment. Assumption 4
summarizes the type of intervention we seek to investigate.
Assumption 4. A set of agents with ability distributed H(z) are observed in two con-
secutive periods. The set of agents is partitioned into two subsets characterized by dis-
tributions HC(z) and HT (z) (i.e., “control” and “treatment”). The following conditions
hold:
1. Agents in HT and HC draw their ε shocks from the same distributions
2. The matches for agents in HC are not observable, and distributed M(z; z, θ)
3. The matches for agents in HT are observable, and distributed HT (z) 6= M(z; z, θ)
4. For any arbitrary partition of the treatment group, characterized by H1T (z) and
H2T (z), agents in both groups draw their ε shocks from the same distribution
The first assumption is the usual exclusion restriction. As required in any attempt to
measure causal effects, it puts restrictions on how the unobserved characteristics vary
between control and treatment. The second formalizes the intuitive notion that we
cannot observe individual-level control group matches, but also that matches continue
to happen in the background of the economy. That is, the control group does not stop
meeting other agents because of the intervention.
Finally, the third and fourth lay out what we require from the treatment. The third
states that we can observe all treatment matches, and those matches are drawn from
some other distribution than the control group. Finally, the last assumption states a
second exclusion restriction within the treatment group, guaranteeing that a comparison
between treated firms is unbiased.
2.2.1 Example
It is useful to highlight a concrete example of these assumptions to both fix ideas and
show how they relate to the literature. We take a deliberately simple version using the
discrete time analog of Lucas (2009), with notation adjusted for continuity. He develops
a model in which income π is equal to ability z. Each agent receives α ∈ N uniformly
random draws from the equilibrium ability distribution Mt(z), and keeps the max of
those draws. It is fully internalized if useful, but otherwise can be disregarded. Defining
z := max{z1, . . . , zα}, ability therefore evolves according to zt+1 = max{zt, zt}.
8
Linking to Assumptions 1 – 3 This model falls under our assumptions in a straightfor-
ward way. Assumption 2 (π ∝ z) follows directly from the assumption π = z. We next
define the source distribution M(z; z, θ) in Assumption 3 as
M(a; z, θ) = Prob(max{z1, . . . , zα} ≤ a)
=α∏i=1
Mt(a)
= Mt(a)1
1−θ
where θ = 1 − 1/α indexes the number of draws. A higher θ implies the firm receives
more matches and therefore increases the likelihood that z will be high. This is a specific
interpretation of the FOSD assumption in Assumption 3.10 Finally, the law of motion is
identical to that in Assumption 1 once we assume c = 0, the exogenous shock distribution
F is degenerate at ε = 0, and ρ = β = 1. This gives z′ = max{z, z}.
Intervention in this Model In a developing country, we may think it difficult for firms to
create these matches thereby keeping income low. Said differently, we may hypothesize
that α (or equivalently θ = 1 − 1/α) is low. A researcher could test this hypothesis by
creating an intervention in which she randomly creates groups of N firms, measuring
how profit changes relative to a control group that continues to match according to M .11
Treated firms therefore draw from an exogenously created source distribution HT (z).
If the true value of θ is indeed low, exogenously offering firms this opportunity will
generate a large treatment effect. The inverse is intuitively true as well. Exogenously
offering the opportunity to meet has little value if it is already easy to do so. This
intuition underlies a natural link between the average treatment effect and θ. Moreover,
Lucas (2009) shows that the arrival rate of ideas is critical for determining the long-run
growth path of the economy. Together, they comprise a clear link from the average
treatment effect to aggregate implications through the diffusion parameter θ.
Note, however, that this link rests critically on having already made an assumption
that determines how a given match z translates into future ability z′. Here, we assumed
ex ante that imitation draws translate one-for-one into own ability (ρ = β = 1). To
the extent that these parameters are unknown, perhaps because we want to allow for the
possibility of diminishing returns in imitation gains (e.g. Van Patten, 2020), the preceding10We provide other source distributions that fall under our assumptions in the Appendix. These include letting θ
index the probability of receiving a draw, the noise in the matching process from either difficulty or uncertainty inlearning (Beaman et al., 2020) or because of complementarity between the productivity of the match and some exogenousinnovation (Buera and Oberfield, 2020), the cost of effort choice or bargaining weight between matched agents, a measureof complementarity in deterministic assortative matching, and the cost for firms to seek out a match (Lucas and Moll,2014). All of these differ in their economic interpretations of θ, but can be characterized by the same broad functionalform of the source distribution.
11This is a similar idea to Cai and Szeidl (2018), but is only one example of a potential intervention.
9
intuition is incomplete.
The multiple channels at play highlight the difficulty in directly link the ATE and at-
scale gains. Only by ex ante assuming away the problem is there a one-to-one relationship
between the ATE and remaining diffusion parameters. This concern is not specific to the
exact example used here. We next generalize this to any model defined by Assumptions
1 – 3 and intervention defined by Assumption 4.
2.3 Linking the Treatment Effect to Model Parameters
An intervention of the type described in Assumption 4 allows us to measure the causal
average treatment effect (ATE). In percentage terms, this empirical moment is
ATEdata =E[π′i|i ∈ T]
E[π′i|i ∈ C]
where π′ is the ex post profit of a firm, T is the set of treated firms that draw from HT ,
and C are control firms drawing from M . The model provides its own counterpart to
this ATE. After a bit of algebra, it can be written as
ATEmodel =
∫ ∫πρ max {1, π/π}β dHT (π)dHT (π)∫ ∫
πρ max {1, π/π}β dM(π; π, θ)dHC(π). (2.2)
Conditional on the intervention, there are two ways the model can deliver a given
ATEdata. The first is how we motivated the example above: appealing to the source
distribution. But an alternative is an appeal to β and ρ. If good matches always create
large changes in profit, and the intervention creates more good matches, the treatment
effect will be large. This can be summarized as
∂ATEmodel
∂θ< 0 ,
∂ATEmodel
∂β> 0 ,
∂ATEmodel
∂ρ> 0.
The first implication here is that biasing the parameter choices (e.g., assuming ex ante
that β = ρ = 1 as in the example above) will have little impact on the model’s ability to
match ATEdata. For example, if the realized ATEdata is small, but we counterfactually
assume that there are large gains from a good imitation draw, the model responds by
assuming it must be extremely easy for agents to otherwise access those good draws.
That is, it infers a high θ. More generally, under loose restrictions we formalize in the
next section, we can always find an extensive margin parameter θ to rationalize a given
treatment effect for any values of (β, ρ).
The second implication forecasts our main concern in this paper. If these various
parameters play different aggregate quantitative roles, there need not be any relationship
10
between the ATE and the gains from scaling the intervention. There will be a range of
possible at-scale outcomes for a given ATE, and we show later quantitatively that this
range can be quite broad.
Therefore, our first goal is to separately identify these parameters. We must first
identify (β, ρ). Conditional on those parameters, we can use the intuition developed
above to formalize the link between the ATE and θ. It turns out that this first step can
be accomplished with one additional reduced-form moment that is both straightforward
to interpret and commonly used in applied work.
2.4 Identification
Our procedure works as follows. Using only the treatment data, we show how to identify
β and ρ uniquely. This step identifies the parameters that govern the intensive margin
– the gains derived conditional on a match. The second step lets us then measure the
underlying quality of the matching technology, conditional on the gains from a given
match. That is, we show that θ can be identified from the average treatment effect as a
function of (β, ρ). Proofs of the following results are available in Appendix G.
Using Treatment Data for (β, ρ) Identifying the intensity and persistence parameters use
only variation from treatment matches. This has the benefit that identification follows
almost directly from the law of motion for diffusion, and is summarized in Proposition 1.
Proposition 1. The parameters (β, ρ) are identified by coefficients from the regression
run only on treatment firms
log(π′i) = c+ ρ log(πi) + β log
(max
{1,πiπi
})+ εi, (2.3)
where c is the constant equal to the technological parameter c if π = z.
There are two ways to read the regression (2.3). The first is that β measures the
impact of receiving a better match, after controlling for initial income. An alternative
is that (2.3) estimates a standard AR(1) process, but controlling for the potential bias
induced by variation in match quality π/π. The within-treatment orthogonality condition
allows us to measure both forces in one regression.
Quality of the Matching Technology, θ The next step is to identify θ. We show that the
average treatment effect identifies this parameter as a function of (β, ρ). We begin with
the formal result in Proposition 2, with discussion following.
11
Proposition 2. Given the values (β, ρ), the value of θ is uniquely identified by ratio of
ex post average profit of treatment relative to control (i.e., the average treatment effect)
if that ratio falls within the bounds [Γmin,Γmax].
Γmin = infθ
∫ ∫πρ max {1, π/π}β dHT (π)dHT (π)∫ ∫
πρ max {1, π/π}β dM(π; π, θ)dHC(π)(2.4)
Γmax = supθ
∫ ∫πρ max {1, π/π}β dHT (π)dHT (π)∫ ∫
πρ max {1, π/π}β dM(π; π, θ)dHC(π). (2.5)
The intuition for Proposition 2 follows from what this data-generating process allows
us to infer about the control group, and Figure 1 provides a graphical representation.
The intervention guarantees all treated agents a high-quality match from some fixed
distribution. The average treatment effect is then governed by the difference between
that distribution and the baseline distribution from which the control group still draws
matches. A small treatment effect implies the control group must already be able to
access high-quality matches. Said differently, they are drawing from M indexed by a
high θ.12
Figure 1: Identification of θ
Productivity
pdf treatmentdraws
pdf control draws(θ = 0.9)
pdf control draws(θ = 0)
ztreat
Figure notes: Figure shows the distribution of draws for treatment and control firms under different θ.The distributions are drawn Pareto, but this is for the example’s sake only.
To relate back to the example in Section 2.2.1 and broader under-identification dis-
12A final point to mention here are the bounds Γmin and Γmax in equations (2.4) and (2.5). These bounds exist onlyto guarantee that the treatment effect stays within the set of values that a model can possibly rationalize. For example, ifβ = 0, the model naturally cannot rationalize any non-zero treatment effect. This implies Γmin = Γmax = 0. Thus, thesebounds provide the theoretical limits that govern the model.
12
cussion in Section 2.3, note that our identification of θ is a conditional one. It depends
critically on already knowing how a given match translates into future productivity. This
is the power of the within-treatment orthogonality condition. If those intensive margin
parameters are biased for some reason, Proposition 2 shows that the model will still ratio-
nalize the average treatment, but with a θ whose bias tracks that of the intensive margin
parameters. The extent to which this affects the relationship between the reduced-form
ATE and aggregate gains is a quantitative question, which we turn to in the coming
sections.
2.5 Discussion on Assumptions and Identification
Before turning to the application, we briefly digress to discuss the results in this Section.
Our results lay out a relatively parsimonious procedure to identify a limited set of
parameters. In the Appendix, we extend the results in a number of directions. We
first allow for a more general law of motion for ability which can be estimated semi-
parametrically with the same data generating process. Relatedly, the results can easily
be extended to the inclusion of heterogeneous returns by characteristics. As a simple
example, this allows for the possibility that gains from a match depend on the age
difference between two firm owners. Letting a1, . . . , aN be the age-difference bins, we
could instead estimate (ρ, β1, . . . , βN) in the regression
log(π′i) = c+ ρ log(πi) +N∑j=1
βaj1Gapi=aj log
(max
{1,πiπi
})+ εi.
We also extend the results to allow a more general relationship between observables
and ability, instead of our maintained assumption π ∝ z. This allows, for example,
the integration of production function estimation to derive the required link between
observables and ability. Finally, we show the procedure can be adjusted to include
measurement error or idiosyncratic distortions that affect the relationship between profit
and ability, building off an active literature on non-linear error-in-variables models (see
Schennach, 2020, for a thorough review).
On a practical level, all of these extensions come with some additional cost or compli-
cation, whether it be empirical (e.g., an increased sample size required for non-parametric
identification) or theoretical (the details of the required deconvolution methods to deal
with measurement error). But the underlying intuition behind the estimation procedure
itself, along with the economic outcomes we focus on, are robust to their inclusion. We
therefore save them for the Appendix.
13
3 Application to Kenyan Firms
Our results so far highlight a way to separate parameters governing two dimensions of
the diffusion process: the extensive margin (the probability a given imitation draw will
be realized) and the intensive margin (how a given match affects ability). Both of these
margins affect the micro-level treatment effect and the at-scale implications. The next
question we seek to answer is whether or not this process matters quantitatively. If the
same parameters governing the treatment effect similarly govern at-scale implications,
for example, the quantitative bias by not separately identifying them may be small.
Studying this quantitatively requires us to both close the model and bring to bear
evidence that satisfies Assumption 4. We therefore utilize randomized controlled trial
data from Brooks et al. (2018). Per the discussion in Section 2, we can focus on this
estimation independent of the remaining model structure, so we leave the model details
for Section 4.
3.1 Details of RCT and Data Collection
The experiment took place in Dandora, Kenya, a dense informal settlement on the out-
skirts of Nairobi. Self-employment is ubiquitous in Dandora with a huge number of
street-level businesses operating in a variety of industries, such as retail, simple manufac-
turing, repair and other services. We began by conducting a large scale cross-sectional
survey. We sampled a random cross-section of 3290 businesses. This sample is represen-
tative of the population of enterprises, and it includes businesses of a variety of ages and
industries. We use it to derive moments from the population of operating firms.
We randomly match more profitability entrepreneurs with less profitable ones. The
business owners are followed over 17 months to measure changes in business practices
and profitability over time. Outcomes are compared to a control group of similar firms.
Selection and Randomization We start from a sample of female business owners who
have been in operation for less than 5 years.13 We then randomly select a subset of
these business owners to randomly match with a more profitable owner. A control group
receives no such offer. Firms were then surveyed over 6 quarters to track the time series
of treatment.
The set of more profitable owners were selected from those businesses with owners
over 40 years old and at least 5 years of experience. This was to minimize the impor-
tance of “luck” in baseline profit realizations to allow us to focus on truly productive
13The sex selection criteria is to limit heterogeneity outside the model. Note, however, that females make up 65 percentof business owners in Dandora and 71 owners with businesses open less than 5 years.
14
business owners. We then recruited business owners with the highest profit until we had
a sufficient number for matches. Of those contacted, 95 percent accepted. Matches with
the treatment firms were randomly created conditional on industry.
To summarize, Figure 2 plots the cumulative distribution function of baseline profit
for the entire sample, the population we study, and the selected matches. One can see
that our study population is somewhat poorer than the entire population, while the
matches are drawn from the far right tail of the baseline profit distribution.
Figure 2: Baseline Profit Distributions
0.2
.4.6
.81
5 7 9 11Log(Monthly Profit)
Entire Pop. Study Pop. Defined matches
Note that this procedure satisfies all the requirements in Assumption 4. The random-
ization naturally satisfies the exclusion restriction, we do not assume observability of the
control matches, treatment draws are drawn from the right tail of the baseline firm profit
distribution, and the individual one-to-one matches are randomly assigned.
Details of a “Match” What does it mean to enter into one of our matches? We designed
the program to remain as truthful to the theoretical counterpart of the model as possible.
First, matches were designed to only last for one month, though of course there was no
restriction on meeting after the formal end of the program.
The program was pitched to both sides of the match as a mentee-mentor relationship,
and thus was explicitly focused on business success. The more successful business owners
were the “mentors,” while the less successful were the “mentees.” The mentors were told
they could potentially help other business owners learn the requisite skills required to
operate in Nairobi. However, we provided no topics to discuss, instead preferring that
the content of any discussions was self-directed. After signing up mentors we simply
provided the mentees with the mentor’s phone number and told them that a prominent
15
business owner in Dandora was willing to discuss business questions with them. Whether
they contacted the mentor, or ever met, was their decision. However, all matches met
at least once in the official month-long treatment period.14 For simplicity and ease of
reference to the more detailed discussion in Brooks et al. (2018), we refer to these two
groups as mentees and mentors. We emphasize, however, that they should more generally
be thought of as the more and less profitable members of a match.
3.2 Balance
Since our theoretical results rely on two layers of randomization, we need to verify balance
both on between control and treatment and within treatment. Our two balance tests are
yi0 = α0 + α1Ti + εi,
yi0 = α0 +∑
j∈{L,M,H}
αjMij + εi
where Ti = 1 if i is in the treatment group and Mij is an indicator denoting that firm i is
a treatment firm matched with a bottom 25th percentile (denoted MiL), 25-75 percentile
(MiM), or top 25 percentile firm (MiH) in terms of baseline profitability. The results are
in Table 3 and Table 4 in Appendix A.15 As expected given the randomization, there is
limited variation across 15 different characteristics. The only significant difference is in
age and the magnitude is small.
3.3 Treatment Effects and Underlying Mechanisms
Figure 3 begins by plotting the average treatment effect (as a percentage) over time,
along with the 95 percent confidence interval. There is a large treatment effect in the
immediate aftermath of the treatment period that fades quickly. We test whether our
model rationalizes this pattern after laying out the model below, and find that it does. We
also directly confront what inferences can and cannot be made about aggregate diffusion
from this pattern in Section 5.3.
Mechanisms The observed changes in profitability primarily come from lowering input
costs, with unit inventory costs falling by 49 percent relative to the control group.16
14One might be concerned that we indirectly primed mentees to believe these matches would be beneficial. We can dolittle to rule this out completely. We note, however, that evidence of the mentor’s business success are easily visible tothe mentee. Mentors had substantially more physical capital and workers, and had a fixed, relatively large buildings fromwhich they conducted business. Moreover, the first meeting took place at the mentor’s business. Thus, that the mentorwas “good” at running a business would likely have been understood with or without us.
15Some of the simple empirical results, like the standard balance test, are also reported in Brooks et al. (2018). We erron the side of including relevant results here to keep this paper self-contained. We are clear to note in each table whichresults are reported in the previous work.
16This is not to say that other components do not matter, and there is indeed substantially heterogeneity in topicsdiscussed within matches. The main topics covered in these meetings are attracting customers (63 percent), keeping
16
Figure 3: Time Series of the Average Treatment Effect (from Brooks et al., 2018)
0 1 2 3 4 5-0.4
-0.2
0
0.2
0.4
0.6
0.8
Figure notes: Figure plots average treatment effect as percentage above control mean (0.4 = 40%), alongwith the 95 percent confidence interval. Treatment takes place between quarters 0 and 1.
Consistent with this, we find that treated firms are 19 percent more likely to switch
suppliers in the aftermath of the treatment. When we close to model to study the
quantitative implications, we will take seriously this channel.
How, then, does this result square with the quick fade-out observed in Figure 3?
We find that the economic environment is characterized by massive amounts of turnover
– 62 percent of control firms switch suppliers in the 3 quarters immediately following
treatment. Thus, any value procured by firms on this dimension by meeting with others
is likely to have a short half-life. As we will show shortly, estimating the law of motion
for ability similarly points to limited persistence.
Impact on Higher-Profit Business Owner Finally, we note that the diffusion process in
Section 2 assumes that there should be no gains to the more productive members of
the match via the use of the max function in law of motion for ability (Assumption 1).
These individuals were not randomly selected relative to their peers, and thus cannot be
directly compared to a control group. However, our design allows us to use the selection
procedure to identify the causal impact of being chosen using a regression discontinuity
after resurveying both those chosen for the program and those just below the cutoff for
selection. We find no change in profitability, scale, or any practices one may associate
with ability (e.g., better book keeping, more marketing). These results are available in
records (57 percent), and lowering input costs (39 percent), where the number in parenthesis is the fraction of matchesthat discuss each topic.
17
Brooks et al. (2018), but we reproduce them in in Appendix A for simplicity.17
3.4 Estimating Diffusion Parameters from the RCT Results
We now turn to estimating diffusion parameters using these results. This estimation
requires only data on profit and treatment matches, and sets aside for the moment the
mechanisms highlighted in Section 3.3. Those details will return in Section 4 when we
close the model.
To identify the diffusion parameters, we restrict attention to the baseline and survey
wave 1 quarter post-treatment for estimation. Later we test whether the impulse response
of the model-computed RCT results matches the remaining empirical time series from
Figure 3, and find that it does.18
Step 1: Estimating (β, ρ) from Treatment Data Denoting the set of individuals in the
treatment as M and control as C, we first estimate equation (2.3) on treatment firm
data. We can write this more simply because πi ≥ πi for all i in the treatment, giving us
log(π′i) = c+ ρ log(πi) + β log
(πiπi
)+ εi for i ∈M.
As discussed in Section 2, this identifies β and ρ. Those results are in Column (1) of
Table 1, and we find that β = 0.538 and ρ = 0.595. These estimates can be ported into
any model in which ability evolves according to this law of motion, as they are identical
to their structural counterparts.
Step 2: Estimating θ from Relative Profit We now estimate θ. This requires a solution
to the problem
minθ
abs
(E[π′T ]
E[π′C ]−
∫ ∫πρ max [1, π/π]β dHT (π)dHT (π)∫ ∫
πρ max [1, π/π]β dM(π; π, θ)dHC(π)
)(3.1)
which is the difference between the empirical ex post profit ratio and its structural coun-
terpart under our assumptions.19 At this point, we are required to take a stand on the
type of diffusion process that underlies the economic environment. This is a tradeoff gov-
erned by the level of detail in the data collected by the researcher. With detailed network
data, one could use a more empirically-motivated approach (e.g. Beaman et al., 2020).
17Interestingly, Jarosch et al. (2020) find similar evidence for the max assumption among German co-workers. Asdiscussed in Section 2.5, this functional form assumption is not crucial. Given these results here, however, we would haveeventually required it and therefore maintained it throughout the paper.
18Alternatively, we could use the entire time series for identification. But given that the model matches it well estimatedoff only the first 2 quarters, the results are nearly identical.
19This derivation is detailed in the proof of Proposition 2 in Appendix G.
18
In the absence of such (costly) data, one can do so by assumption and test robustness of
results with other diffusion processes, as the same identification procedure holds in any
environment satisfying our assumptions.
We take this latter strategy in the absence of such data. We assume matches follow our
random directed search, in that imitation draws follow M(z; θ) = M f (z)1/(1−θ), where M f
is the distribution of operating firms in the economy. We note that such an assumption
has some empirical support in developing countries, emphasizing both random meetings
and the global nature of these spillovers across agents (e.g. Banerjee et al., 2020).
Since we have data from our baseline survey on M f , the only remaining unknown in
(3.1) is θ, and can thus be solved directly. The relevant empirical moment is in Column
(2) of Table 1. When we solve (3.1), this implies θ = −0.417.
Table 1: Identification Moments
(1) (2)
β 0.538(0.273)**
ρ 0.595(0.273)**
Treatment 891.990(280.720)***
R2 0.053 0.047Control Avg – 1897.851
Table notes: Standard errors are in parentheses. The top and bottom one percent of dependent variables are trimmed. Statisticalsignificance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.
With the diffusion parameters in hand, we can now reintroduce the additional em-
pirical details to construct the remaining economic environment in which we will study
the quantitative implications of scaling this intervention.
4 Building the Results into a Model of Diffusion
The flexibility provided by the identification requires that we must only satisfy Assump-
tions 1 – 3 when closing the model. We first include occupational choice between firm
operation and wage work. Since the economy is primarily made up of relatively small
firms, this is the relevant income-generating trade-off for the firm owners in our study.
Second, as discussed in Section 3.3, lowering costs is a key driver of increased profitabil-
ity. We model this channel explicitly by formalizing the bargaining process between firms
and suppliers. Finally, we model a small open economy. That is, suppliers have access
to infinite supply of inputs, consistent with their connection to the broader Nairobi and
Kenyan economy. However, the labor market and diffusion of ability are local.
19
4.1 Model Details
Time is discrete and infinite. There are a unit mass of agents. The state of an agent
is her ability z, which evolves over time. Each agent dies with exogenous probability
δ and a mass δ of new agents replace them each period. New agents draw their initial
productivity from a fixed distribution with c.d.f. G(z).
Every agent has flow utility
u(c, s) = ω log(c) + (1− ω) log(1− s)
where c is consumption and s is effort (discussed below). ω is the relative weight of
consumption in utility. There are no borrowing and savings markets, so consumption is
equal to income.
Each period, an agent can choose between running a firm and working at one. Wage
work earns the market-clearing wage w, as labor is traded on a competitive market.
Firms earn profit π = xαnη−pxx−wn, where x and n are intermediate inputs and labor
services, and px and w are their respective prices.
Supplier Matching, Input Choice, and the Importance of Ability Unlike the competitive
labor market, intermediate purchases require a firm to seek out a supplier. There are
a continuum of suppliers indexed by their marginal cost m who earn profit πs(m,x) =
(px−m)x. Suppliers can source from some outside entity, consistent with their connection
to the broader Nairobi and Kenyan economy, and thus can provide whatever amount of
input x is requested by firms at marginal cost m.
Firms can seek out suppliers with lower marginal cost by exerting search effort s.
Given effort s and ability z, the firm matches with a supplier m = e−szα+η−1α . Thus,
ability makes it easier to seek out lower cost suppliers.20
Once they meet, the two parties Nash bargain over the price px that the firm will
pay, taking into account the optimal input choice by the firm at that price. This implies
an equilibrium price
p∗x(m) = argmaxpx (π)ν(πs)1−ν (4.1)
for bargaining weight ν and the participation constraint that both the supplier and firm
earn weakly positive profit.
20The constant returns assumption for suppliers eliminates any strategic interactions between supplier-firm bargaininggames.
20
Ability Transmission This ability to find lower-cost suppliers is transmitted between
agents. It evolves according to our maintained Assumption 1,
z′ = ec+εzρ max
{1,z
z
}β.
We assume learning occurs via operating firms. Denoting M f as the equilibrium dis-
tribution of operating firms (defined formally below), we write M(z;M) = M f (z;M)1/(1−θ)
as discussed in Section 3.
Recursive Formulation The timing and decisions described above implies that we can
write the problem recursively. The aggregate state of the economy is the distribution of
ability, M(z). At the start of each period, each agent chooses to either operate a firm or
engage in wage work given her ability z and aggregate state M ,
v(z,M) = max{vf (z,M), vw(z,M)}
We let the decision rule φ(z,M) = 1 denote firm operation and φ(z,M) = 0 denote wage
work. Thus, we write the distribution of draws as
M(z;M) =
( ∫ z0φ(z,M)dM(z)∫∞
0φ(z,M)dM(z)
) 11−θ
.
As discussed above, an agent who decides to operate a firm must exert effort s to find
a supplier, then bargain over the input price px. Her problem can be written recursively
as
vf (z,M) = maxs,x,n≥0
ω log(π) + (1− ω) log(1− s) + (4.2)
(1− δ)∫ε
∫z
v(z′(z, ε; z),M ′)M(dz,M)dF (ε)
s.t. m = f(s, z)
px = argmaxpx [π]ν [πs(m)]1−ν
π = xαnη − pxx− wn
z′(z, ε; z) = ec+εzρ max
{1,z
z
}βThe first constraint is the type of supplier met with search effort s. The second guarantees
that the realized cost is the outcome of Nash bargaining between the firm and its supplier.
The third and fourth are the definitions of profit and the law of motion for ability.
On the other hand, an agent who chooses to work for a wage receives flow consumption
21
w from wage work, but does not need to seek out a supplier, implying
vw(z,M) = ω log(w) + (1− δ)∫ε
∫z
v(z′(z, ε; z),M ′)M(dz,M)dF (ε) (4.3)
s.t. z′(z, ε; z) = ec+εzρ max
{1,z
z
}β.
Equilibrium We study the stationary equilibrium of this model, which includes value
functions v, vf , and vw, decision rules for occupation φ(z,M), effort s(z,M), labor
n(z,M), and intermediates x(z,M), bargaining outcomes px(z,M), and a distribution
of ability M(z), such that the value functions solve the agent’s problem above with the
associated decision rules, and the aggregate state evolves according to
M ′(z′) := Λ(M(z′)) (4.4)
= δG(z′)
+ (1− δ)∫ ∞
0
∫ ∞0
F (log(z′)− ρ log(z)− β log(max{1, z/z})− c)M(dz;M)M(dz),
Λ(M) is the law of motion for the aggregate state. It consists of the ability of new
entrants, δG(z), and the evolution of ability for surviving agents. In the stationary
equilibrium, M∗(z) = Λ(M∗(z)).
Guaranteeing Assumptions are Satisfied This model satisfies all of the necessary as-
sumptions required for our procedure to hold. We directly assume Assumption 1 as the
law of motion for ability, and Assumption 3 is satisfied by our assumption M(z;M) =
M f (z;M)1
1−θ . All that remains is Assumption 2, that π ∝ z. We provide details of these
derivations in Appendix G.4, but we summarize a few useful features of the equilibrium
in the following proposition.
Proposition 3. The following results hold in the equilibrium of the model:
1. The solution to the Nash bargaining game is a constant markup over marginal cost,
px(m) ∝ m. Thus further implies that the realized equilibrium cost paid by firms is
given by px(z) ∝ zα+η−1α .
2. The equilibrium profit function can be written as π(z, w) = A(w)z, where A(·) de-
pends only on the equilibrium wage and parameters of the model.
3. Occupational choice follows a cut-off rule by which all agents with z ≥ z operate
firms. That cut-off is increasing and convex in the equilibrium wage, and given by
the function z(w) = w1−α
1−η−αC for a constant C.
22
Proof. See Appendix G. �
We highlight these specific features because they turn out to be useful for calibration
and interpretation of the quantitative results. For now, however, the second result of
Proposition 3 shows that our model satisfies the necessary assumptions for our procedure
to remain valid.21
One additional point made clear by Proposition 3 is that the assumptions laid out
in Section 2 are restrictions only on the observed outcomes from the model. There are
many assumptions on the primitives that can generate these outcomes, which allows the
flexibility in modeling decisions used to fit the relevant environment.
4.2 Calibration of Remaining Parameters
Given the previously-estimated diffusion parameters, we require the remaining parame-
ters before turning to quantitative results. This follows a relatively standard calibration
procedure and we choose parameters to match moments of the same set of firms in
which the experiment was conducted. We make use of the baseline field data that is a
representative sample of firms in Dandora, Kenya.
The model parametrization can be broken into three parts. First, as we showed
previously, the diffusion parameters are independent of the remaining model parameters.
Thus, we can simply impose our estimated parameters β = 0.538, ρ = 0.595, and θ =
−0.417. The model period is assumed to be one quarter.
That leaves 9 remaining parameters. On the utility side, they include the relative
weight of consumption ω and the agent death rate δ. The remaining parameters dealing
with technology and ability evolution are the parameters α and η, the ability growth
term c, and the terms governing the exogenous shocks governed by distributions G and
F . We assume that G ∼ logN(µ0, σ0) and that F ∼ logN(µ, σ), and normalize µ0 = 0.
We also note that c and µ are not separately identified, so we choose µ = −σ2/2 so that
E[eε] = 1. Finally, we have the bargaining weight ν. After the normalization and choice
of µ, we have 8 remaining parameters to calibrate.
The remaining 8 parameters can be broken into two groups. The first group is matched
one-to-one with a given moment or value (δ, σ0, ν). The second group is jointly targeted
to jointly target a set of moments (σ, c, ω, α, η). The death rate δ, which we match
to the average age of population in the study, which is 34. The constant death rate δ
implies a geometrically distributed age distribution with mean 1/δ and, assuming a new
21Note that the result π(z, w) = A(w)z in Proposition 3, which gives us the requirement that π ∝ z, follows becausematching between suppliers and firms is deterministic. Uncertainty these matches is straightforward to include. It requiresan adjustment to the diffusion parameter estimation analogous to a measurement error adjustment. We discuss thismeasurement error extension in the Appendix.
23
agent is 18 years old, implies an average age of 64 quarters in the model.22 The standard
deviation of new entrant ability matches the variance of log profit for firms that have
been open for less than 3 months, which implies σ0 = 1.00. Finally, we note that given
the model set-up the bargaining power of firms in its supplier negotiation (ν) has no
effect the results. Thus, we set it as ν = 0.5 for simplicity.
This leaves 5 parameters – σ, c, ω, α, and η – which we target to jointly hit 5
moments. While jointly calibrated, each has a clear intuitive counterpart. We choose σ
to match the standard deviation of log profit in the economy, equal to 0.99. The drift
parameter c in ability is targeted to match the average profit of all firms relative to those
who entered less than one year ago (1.51). The utility parameter ω is set to match the
share of employment in wage work. To measure this, we use the most recent Kenyan
census (via IPUMS, 2020) to measure the share the workforce who report wage work as
their main economic activity in Embakasi Contituency – the local area in Kenya that
includes our study site. Forty-eight percent of employment is in wage work.
The final two parameters – α and η – are set to match two moments. The first is the
standard deviation of the log unit intermediate cost across firms. After removing industry
fixed effects that may bias those estimates, the standard deviation in the data is 1.61.
The second moment is the cost ratio, defined as the wage bill relative to intermediate
spending. The average firm at baseline has a ratio of 0.13. Our Cobb-Douglas assumption
implies that wn/cx = η/α, so we set η = 0.13α. The standard deviation of intermediate
cost is then exactly pinned down by the cost ratio and the variability in log profit. This
follows from the fact that Proposition 3 implies the relationship log(px) = Constant +(α+η−1α
)log(π), and therefore σlog(px) =
(1−α−ηα
)σlog(π). Plugging η = 0.13α yields
α =σlog(π)
σlog(px) + 1.13σlog(π)
.
The full set of moments and parameter values are reported in Table 6 in Appendix
A.3. The model matches these moments well.
5 Quantitative Results
We begin with the aggregate implications of scaling the RCT in Section 5.1. We then
use the model to study whether measuring the average treatment effect can differentiate
the aggregate gains in Section 5.2, highlighting the importance of separately identifying
the intensive and extensive margins. Finally, in Section 5.3, we discuss the dynamic path
of the treatment effect drawing on results developed in Sections 5.1 and 5.2.22Because agents can move between firm operation and wage work over the course of their lives, we match actual age
rather than age of the firm.
24
5.1 Impact of Scaling the Intervention
We start by measuring the aggregate gains from policy. We think of this as a govern-
ment or other policy-making body implementing a program that makes it easier to find
high-quality matches, in line with the extensive margin focus of the RCT results. We
implement this by shocking the matching technology parameter θ so that it moves 25
percent closer to its limit θ = 1. We study the new stationary equilibrium and com-
pare it to the baseline equilibrium.23 This is a simple way to replicate a policy that
makes matching easier, for example through extension programs, better ICT, or explicit
mentoring programs, among others.24 Mechanically, it involves shocking the stationary
economy with θ = −0.417 to θnew = −0.063.
Aggregate moments are reported in Table 2. We present two steady states, both of
which operate under the new matching function. The first (in column 2) fixes the wage
at its baseline level. The second (in column 3) allows the wage to adjust as well.
Table 2: Equilibrium Moments
(1) (2) (3)
Baseline Fixed Wage At-Scale
Income 1.00 1.07 1.11
Ability 1.00 1.08 1.12
Aggregate Labor Supply 1.00 0.92 0.98
Wage 1.00 1.00 1.13
Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3)report the new stationary equilibrium after shocking the matching technology,where (2) holds the wage fixed at its baseline level and (3) allows it to adjust.
Overall, income rises by 11 percent. This is made up of two general equilibrium effects.
The first is that the new matching technology directly affects the ability distribution by
making it easier to learn from high ability agents. The second is an amplification effect
through prices.
We decompose the relative importance of these two channels in columns 2 and 3 of
Table 2. Column 2 holds the wage fixed at its baseline level to but allows the ability
23Recall that given our matching technology, θ = 1 would imply all agents draw from the limit of the right tail of theability distribution. Thus, we take the economy 25 percent closer to that limit. Specifically, we set θnew = 0.25 + 0.75θ,which guarantees 1 − θnew = 0.75(1 − θ). A seemingly simpler alternative is to increase θ by 25 percent, but sinceθ ∈ (−∞, 1), such an exercise has misleading implications around θ ≈ 0.
24The choice of 25 percent is somewhat arbitrary, but also irrelevant net of some differences in magnitudes. Analternative, for example, would be to increase θ until we exactly replicate the treatment effect from the Kenyan RCT when
we offer treated firms the source distribution M(z; θnew). We think of the policy-maker more broadly than this exactreplication: she creates an aggregate program that adjusts the same margin as the RCT, but does not necessarily needexactly replicate the RCT gains (which may be impractical for a number of reasons outside the scope of this paper). Thus,these results are one example of an extensive-margin adjustment. Regardless, our main results hold for any change in thisdimension. In Appendix A.4 we provide the results when the aggregate policy exactly replicates the estimated RCT gains.The same results hold with larger magnitudes.
25
distribution to adjust. This isolates the direct effect on ability. Average ability rises by 7
percent and the labor supply declines by 8 percent as the distribution shifts mass across
the (fixed) cut-off ability level that defines occupational choice. Average income rises by
7 percent.
Column 3 allows the equilibrium wage to adjust, which increases by 13 percent as
higher TFPR firms demand more workers. Correspondingly, labor supply increases (rel-
ative to Column 2) as marginal entrepreneurs are drawn into wage work.25 This addi-
tionally amplifies ability. Since these marginal entrepreneurs most congest the learning
process, removing them makes it easier for others to learn. Thus, ability rises even fur-
ther. Of the total increase in ability, 67 is from the direct effect and 37 is from the
amplification through prices. A similar magnitude holds for income: 64 percent is from
the direct effect, while 36 is through the price adjustment.
By design, neither of these equilibrium effects are present in the causal reduced-form
results from randomly pairing entrepreneurs, though this random matching similarly
micro-founds the model from which the aggregate results are derived. We next turn to
the relationship between empirical moments from this RCT and the at-scale gains derived
from the model.
5.2 Relationship Between Treatment Effect and Aggregate Gains
We use the model to study the mapping between reduced-form RCT results and aggregate
gains. This requires replicating the RCT in the model.
Running the RCT in the Model We draw a set of treatment and control firms with profit
identical to our empirical distributions detailed in Section 3 from the baseline steady state
distribution M∗(z). Then we draw another set of agents – the matches – again conforming
to our empirical distribution. We randomly match our treatment agents with a match,
and track the evolution of both treatment and control for 6 quarters. Throughout, we
hold M∗ fixed so that the treatment does not result in control group contamination as
required for causal interpretation.
Quantitative Exercise and Results We then ask the relationship between the RCT results
and at-scale gains, and the importance of separately identifying the diffusion parameters
(β, ρ, θ). We do so in the following way. We fix the average treatment effect (ATE)
then vary the intensive margin parameters (β, ρ) ∈ [0.20, 0.95] × [0.20, 0.95].26 At each
25Recall from Proposition 3 that occupational choice is defined by a cut-off rule. All agents with ability z < z ∝ wbecome workers. Increasing the wage therefore shifts those entrepreneurs near the cut-off into wage work.
26Below β = 0.16, the estimation of θ may not be feasible at certain levels of the assumed treatment effect. This occursbecause the average treatment effect falls outside the bounds formalized in Proposition 2. We therefore constrain (β, ρ)
26
(β, ρ) we re-estimate θ to match the assumed ATE. We then compute the aggregate gains
from shocking the matching technology as described above. This exercise therefore gives
the range of possible aggregate outcomes that could be achieved by assuming intensive
margin parameters (β, ρ) instead of utilizing heterogeneity as an estimating moment.
Figure 4 presents the results at our baseline ATE. As (β, ρ) counterfactually decrease,
the model is forced to rationalize the ATE via the extensive margin. Since firms now
(counterfactually) gain little from a good match, the model instead requires control firms
to receive worse baseline matches, thus increasing the implied value of the treatment.
Thus, bias in (β, ρ) induces bias in θ to match the ATE. This is the quantitative coun-
terpart to the theoretic discussion in Section 2 and detailed in Figure 4a. Figure 4b
shows that the choice of (β, ρ) has critical implications for measuring aggregate gains,
even conditional on matching the ATE. The aggregate gains vary from 0.6 to 38 percent,
a 63-fold range.
Figure 4: Aggregate Implications of Varying (β, ρ) at Baseline ATE
(a) Implied θ (b) Aggregate Gains
Figure notes: For each value of (β, ρ), (a) plots the implied θ that guarantees the model hits our baselineATE. (b) then plots the implied aggregate gains from scaling the intervention as a percentage change inaverage income from the baseline equilibrium (i.e., 0.4 = 40%).
We next replicate the same exercise, but do so for every ATE between 0.01 and
70 percent. These results are in Figure 5, which plots the band of possible aggregate
outcomes for each ATE. Our 11 percent aggregate increase in income can be derived in
economies with nearly any ATE, depending on how one sets (β, ρ). Moreover, the wide
range of possible outcomes shows that it would be relatively simple to come up with a set
of economies in one could observe a negative relationship between the RCT treatment
above 0.20 to guarantee we are always comparing within the same set of parameters, instead of allowing the feasible set ofparameters to vary with the assumed treatment effect.
27
effect and the gains from scale.
Figure 5: Relationship between average treatment effect and gains at scale
0 0.1 0.2 0.3 0.4 0.5 0.6 0.70
0.1
0.2
0.3
0.4
0.5
0.6
0.7
Figure notes: The shaded region shows all possible realizations of aggregate gains that can be achieved while holding theaverage treatment effect fixed by re-estimating the extensive margin parameter θ for each (β, ρ) combination. Multiply by100 for percentage gains.
Why do we see this type of variation in the model and how can it help us understand
how to use intervention-level evidence to measure aggregate potential? Figure 6 starts
with some mechanics. Holding all other parameters fixed, we vary θ (Figure 6a) and β
(Figure 6b).27 We then study how both the aggregate gains at scale and the RCT-level
treatment effect vary.
The patterns in Figure 6 show why our headline results take the shape they do.
While the RCT-level treatment effects are primarily driven by θ, the aggregate gains
are primarily driven by β (and, to a lesser extent, are reinforced by ρ). Thus, when we
vary β away from its estimated value, the impact on the ATE can be easily made-up
for by varying θ. Yet this adjustment of θ has little impact on the aggregate outcomes,
allowing the direct effect of β to dominate. This gives us the result that at-scale gains
can vary widely for a given ATE: the two moments are driven quantitatively by different
parameters in the model.
The Link Between Reduced-Form Heterogeneity and Aggregate Gains Why is β so crit-
ical here and how does it relate economically to measuring variation in the treatment
effect? We detail that relationship here.
The driving force behind aggregate diffusion models is the mass of agents in the right
tail of the ability distribution. At an intuitive level, the rationale is simple: if agents learn
27We omit ρ in the interest of clarity, as β plays a larger role. A similar pattern emerges for ρ, though with smallermagnitudes.
28
Figure 6: Variation at RCT and Aggregate Levels by Parameters
(a) θ
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1-2
-1
0
1
2
3
4
5AggregateRCT
(b) β
0 0.2 0.4 0.6 0.8 1-2
-1
0
1
2
3
4
5AggregateRCT
Figure notes: Each figure varies the listed parameter while holding the remaining parameterizationfixed at its baseline value. “RCT” is the implied average treatment effect, while “Aggregate” is thegains from scaling the intervention as defined in the text. Results are presented as growth rates g :=(E[ypost]/E[ypre])−1, then normalized relative to our baseline parameterization as g/gbaseline so that alllines cross through 1.
from those with higher ability, a heavier-tailed distribution will facilitate more learning.
But this distribution is an equilibrium object and therefore requires some mechanism by
which agents move into the tail of the distribution. Put differently, it requires that if an
average agent meets a high-ability agent, she must be able to internalize a large portion
of that ability.
This is exactly what the parameter β governs in the model, and its ability to shift the
ability distribution is crucial for its role in the at-scale gains observed here. To see this,
Figure 7 plots how the ability distribution responds to the improved matching technology
under changes for β and, for comparison, θ. Figures (a) and (b) plot the survival function
1−M(z), or the fraction of the population whose ability is larger than z. Both axes are
measured in logs.28 We do so under our baseline parameterization, then also vary β and
θ to make them 50 percent closer to their theoretical maximum of one (i.e., (1-0.769)/(1-
0.538) = 0.5 in Figure 7a). Figure 7c shows how the difference varies for each of these
two parameters.
While at-scale results rely on what happens if an agent meets another high-ability
agent, the RCT dispenses with the uncertainty and measures what happens when an
agent meets with another high-ability agent. That is, the RCT allows us to measure the
relative impact of a good match within a set of exogenously provided matches. Within
this fixed set of matches we provide, we estimate β by measuring how many agents get
28If the distribution of ability were Pareto, for example, the line would have a constant slope in this graph.
29
Figure 7: Equilibrium Survival Function as Parameters Change
(a) Increasing β
-2 0 2 4 6-10
-8
-6
-4
-2
0
(b) Increasing θ
-2 0 2 4 6-10
-8
-6
-4
-2
0
(c) ∆ Change in Ability Distribution as ParametersChange
-2 0 2 4 6-0.2
-0.1
0
0.1
0.2
0.3
Figure notes: Figures (a) and (b) plot the ability distribution at the baseline and improved matchingtechnology, measured as the log survival function. Figure (c) plots the difference in the mass across theability distribution. This is measured as the difference-in-difference of the curves plotted in (a) and (b).That is, denoting S = log(1−M(z)) as the log survival function, it plots ∆S = (Snewx′ −Sbasex′ )− (Snewx −Sbasex ) where x ∈ {θ, β} and x′ is the counterfactually higher level of that parameter.
pushed into the tail of the distribution. Figure 8 plots the survival function for treated
firms, based on two levels of β.29
29As a more explicit example, assume treated firms are distributed z ∼ logN(µz , σz), their matches z ∼ logN(µz , σz),and the exogenous noise ε ∼ N(µε, σε). Because our treatment matches are drawn from a small part of the right tailin the economy-wide ability distribution, we have µz > µz and σz < σz . Ex post ability for treated firms is givenby z′ = exp(c + ε)zρ(z/z)β after imposing our assumptions. This implies that z′ will be approximately distributed asz′ ∼ logN(µz′ , σz′ ), where µz′ := c + (ρ − β)µz + βµz + µε and σz′ := (ρ − β)σz + βσz + σε. This is not exact becausethe lognormal assumption does not generically allow for zi > zi among all treated firms, but the overlap will be small ifthe distributions are substantially different. Therefore, by shifting weight toward the match mean and variance, higherβ increases the mean and decreases the variance of the ex post treatment firm profit. Said differently, the treatmentcompresses ability in the right tail, which is the theoretical counterpart to Figure 8.
30
Figure 8: RCT Treatment Firms Survival Function
0 2 4 6-8
-6
-4
-2
0
Figure notes: The horizontal axis is ability. The vertical axis is the survival function of treated firms, or 1 minus thecumulative distribution function of ability. Both axes are measured in logs. The distribution of ability is the ex post abilitydistribution of only treated firms.
5.3 Dynamic Treatment Patterns
Our results so far highlight the critical importance of the intensive margin – the param-
eters governing the gains from a given match – for aggregate outcomes. How important
are these parameters for governing other reduced form moments? We study that question
here in the context of a unique aspect of our empirical results – the quick fade out of the
ATE discussed in Section 3. One seemingly ex ante reasonable takeaway from this result
is that there are limited gains from scaling the intervention. We study whether such an
interpretation is warranted here and find that it is not.
We start with an out-of-sample test of the model, asking whether it can match the
empirical time series of the average treatment effect over 5 quarters. These results are
in Figure 9. While the first two quarters (t = 0 and t = 1) are matched by construction,
both the model and data predict no treatment effect by t = 3. The model under-predicts
the effect in t = 2, so if anything, the model understates the partial equilibrium RCT
dynamics. An immediate consequence is that the short half-life of the treatment effect
need not necessarily be an indication of the diffusion-induced gains at scale.
31
Figure 9: Dynamics of Average Treatment Effect on Profit
0 1 2 3 4 5
Quarters Since Treatment
-0.4
-0.2
0
0.2
0.4
0.6
0.8Data
Model
Yet the relationship between ATE persistence and at-scale gains turns out to be even
more strained. Figure 10 varies β, a key parameter governing at scale-gains, and studies
how the implied treatment effect varies over time. We do so at our baseline persistence
ρ = 0.595 (Figure 10a) and a high persistence ρ = 0.99 (Figure 10b).30
Figure 10: Relationship between Treatment Persistence and Diffusion Intensity β
(a) At Estimated Persistence ρ = 0.595
0 1 2 3 4 5
Quarters Since Treatment
0
0.1
0.2
0.3
0.4
0.5 β = 0.16
β = 0.538
(b) At High Persistence ρ = 0.99
0 1 2 3 4 5
Quarters Since Treatment
0
0.1
0.2
0.3
0.4
0.5 β = 0.16
β = 0.538
Figure 10 shows that higher β (which implies larger at-scale gains) generates quicker
fade-out of the treatment effect. This has important implications for its use: the ATE
persistence may be negatively correlated the magnitude of the gains at-scale, much in the30We follow a similar procedure to our earlier exercise. We set (β, ρ), then re-estimate θ to match the same 1-quarter
ATE. This focuses the discussion on the fade-out from the same 1-quarter treatment effect.
32
same way the level of the ATE can be. The result reinforces the importance of measuring
this parameter directly for aggregate implications.
The intuition for this result stems from the fact that control firms continue to engage
in matches throughout the study period. A higher β allows them to internalize a larger
portion of a good match’s ability. As more and more control firms eventually meet a
high-quality match over time, a higher β allows them to more quickly catch treated
firms.31
We continue this discussion in the Appendix by conducting the same exercise with
a different experiment (Cai and Szeidl, 2018) in which gains persist longer, finding that
we can similarly replicate the persistence of those gains in that context because our
estimation procedure implies different parameters.
5.4 Discussion
Before concluding, we discuss a number of extensions we take up in the Appendix. De-
tailed results, and others discussed throughout the text, are available there.
First, we extend the identification results in a number of directions. These are dis-
cussed in Section 2.5, and include measurement error, generalizations of the ability law
of motion, and broader relationships between observables and ability.32 The robustness
of this procedure highlights the useful relationship between the experimentally-induced
data and estimation of hard-to-observe parameters in equilibrium models.
We then provide a number of additional quantitative results. First, and related to the
identification discussion, we quantify the bias induced by mismeasured profit. Our results
are a lower bound on the at-scale gains. Idiosyncratic distortions naturally generate a
similar, but not identical, result.
Second, we solve the social planner’s problem to study the gains that could be achieved
by transitioning the economy to its efficient allocation. This requires solving for efficient
consumption, search behavior, and occupational choice of all agents, taking into account
the diffusion externality generated by this allocation.33 Such gains are likely impossible
to achieve practically, but do provide a hypothetical upper bound from the optimal
management of diffusion. We show theoretically how the planner takes advantage of
complementarity between ability and search effort to tilt search toward high-ability agents
and simultaneously limits the number of firms to increase learning quality. She uses these
31Note also that we hold the aggregate state of the economy fixed when running the RCT in the model. Therefore theresult is not due to control group contamination.
32It turns out that under some additional conditions, the results can be pushed even further. Evdokimov (2010) providesnonparametric identification separating individual fixed effects from transitory errors in nonlinear models. His results couldbe used to extend the model and identification to arbitrary heterogeneity in some fixed, non-transmittable aspect of ability,though practical questions of power start to become more relevant in this case.
33As is common in aggregate diffusion models, our baseline economy is inefficient. This occurs in our model becauseindividuals do not take into account the effect of their occupational choices on the learning opportunities of other agents.
33
gains to subsidize consumption for those newly-excluded entrepreneurs (i.e., those who
run businesses in the laissez faire equilibrium). We also find that welfare gains are largest
at intermediate levels of β. This is a function of a two competing forces governed by this
parameter. First, all else equal, the planner has more scope for gains at higher β. Second,
however, general equilibrium effects at higher β incentivize agents to make decisions more
in line with the planner’s goals, thus lowering the gains from further intervention. We
detail this tradeoff and how it varies across the parameter space. These results deliver
a similar message as the main text: measuring the various forces that inform aggregate
gains is critical.
Third and finally, we use the work of Cai and Szeidl (2018) to apply our results to
a different experiment, in which gains from matches persist longer than in our baseline
RCT. Despite using a similar model to the one in the main text, we replicate the per-
sistence of their treatment effect. The reason is because our estimation procedure infers
different parameters from different patterns of treatment effect heterogeneity. The results
highlight the flexibility of the procedure, in that each model need not require a new esti-
mation strategy for key elasticities. Moreover, it shows that understanding how various
empirical moments inform these parameters plays an important role in understanding
what can and cannot be inferred directly from the empirical results.
6 Conclusion
Modern economic development research is grounded in well-identified studies that allow
a detailed snapshot of specific interventions and economic environments. Yet, sustained
economic growth requires understanding how such results inform aggregate policy. In a
recent review, Buera et al. (2021a) highlight the importance of this link.
This paper focuses on the relationship between firm-level interactions at the micro
and aggregate level, showing how the former can be misleading about the latter once
equilibrium forces are taken into account. We provide a solution, in that measuring a
type of treatment effect heterogeneity can provide a differentiating moment that allows
one to draw a ranking between intervention-level results and potential aggregate gains.
In our Kenyan application, we show the quantitative importance of doing so. Mis-
measuring the relative importance of these forces can cause the aggregate gains can vary
substantially despite identical average treatment effects. We further show how attempting
to extrapolate from other RCT moments – such as the persistence of the treatment effect
– suffer similar issues driven by same forces.
We note that the results leave open questions for future work. First, we leave out
34
additional interesting features that may arise at scale, such as the potential for better
interaction to be dampened by increased competition. Even without such model features,
our results show that moving from micro to aggregates is already a nuanced task. Second,
understanding the exact details of how firms interact and the frictions they face will allow
a more detailed and targeted policy approach. Different field experiments, designed with
an eye toward aggregate theory, could further refine our understanding of key aggregates
governed by a number of difficult-to-measure elasticities that affect these at-scale gains.
References
Akbarpour, Mohammad, Suraj Malladi, and Amin Saberi, “Just a Few Seeds
More: Value of Network Information for Diffusion,” 2020. Working Paper.
Alatas, Vivi, Abhijit Banerjee, Arun G. Chandrasekhar, Rema Hanna, and
Benjamin A. Olken, “Network Structure and the Aggregation of Information: The-
ory and Evidence from Indonesia,” American Economic Review, 2016, 106 (7), 1663–
1704.
Atkin, David, Amit K. Khandelwal, and Adam Osman, “Exporting and Firm
Performance: Evidence from a Randomized Experiment,” Quarterly Journal of Eco-
nomics, 2017, 132 (2), 551–615.
Banerjee, Abhijit, Emily Breza, Arun G. Chandrasekhar, Esther Duflo,
Matthew O. Jackson, and Cynthia Kinnan, “Changes in Social Network Struc-
ture in Response to Exposure to Formal Credit Markets,” 2020. Working Paper.
Beaman, Lori, Ariel BenYishay, Jeremy Magruder, and A. Mushfiq Mobarak,
“Can Network Theory-based Targeting Increase Technology Adoption?,” American
Economic Review, 2020. forthcoming.
Benhabib, Jess, Jesse Perla, and Christopher Tonetti, “Reconciling Models of
Diffusion and Innovation: A Theory of the Productivity Distribution and Technology
Frontier,” Econometrica, 2020. forthcoming.
Berguist, Lauren, Benjamin Faber, Matthias Hoelzlein, Edward Miguel, and
Andres Rodriguez-Claire, “Scaling Agricultural Policy Interventions: Theory and
Evidence from Uganda,” April 2019. Working Paper.
Blattman, Christopher and Laura Ralston, “Generating employment in poor and
fragile states: Evidence from labor market and entrepreneurship programs,” June 2017.
Working Paper.
35
Bloom, Nicholas and John Van Reenen, “Measuring and Explaining Management
Practices Across Firms and Countries,” Quarterly Journal of Economics, 2007, 72 (4),
1351–1408.
Brooks, Wyatt, Kevin Donovan, and Terence R. Johnson, “Mentors or Teachers?
Microenterprise Training in Kenya,” American Economic Journal: Applied Economics,
2018, 10 (4), 196–221.
Bruhn, Miriam, Dean Karlan, and Antoinette Schoar, “What Capital Is Missing
in Developing Countries?,” American Economic Review Papers and Proceedings, 2010,
100 (2), 629–633.
Buera, Francisco J. and Ezra Oberfield, “The Global Diffusion of Ideas,” Econo-
metrica, 2020, 88 (1), 83–114.
and Robert E. Lucas, “Idea Flows and Economic Growth,” Annual Review of
Economics, 2018, 10, 315–345.
, Joseph P. Kaboski, and Robert M. Townsend, “From Micro to Macro Devel-
opment,” 2021. Working Paper.
, , and Yongseok Shin, “The Macroeconomics of Microfinance,” Review of Eco-
nomic Studies, 2021, 88 (1), 126–161.
Cai, Jing and Adam Szeidl, “Interfirm Relationships and Business Performance,”
Quarterly Journal of Economics, 2018, 133 (3), 1229–1282.
Caicedo, Santiago, Robert E. Lucas Jr., and Esteban Rossi-Hansberg, “Learn-
ing, Career Paths, and the Distribution of Wages,” American Economic Journal:
Macroeconomics, 2019, 11 (1), 49–88.
Comin, Diego and Bart Hobijn, “An Exploration of Technology Diffusion,” American
Economic Review, 2010, 100 (3), 2031–2059.
Engbom, Niklas, “Misallocative Growth,” 2020. Working Paper.
Evdokimov, Kirill, “Identification and Estimation of a Nonparametric Panel Data
Model with Unobserved Heterogeneity,” 2010. Working Paper.
Fafchamps, Marcel and Simon Quinn, “Networks and Manufacturing Firms in
Africa: Results from a Randomized Field Experiment,” World Bank Economic Re-
view, 2018, 32 (3), 656–675.
Fried, Stephie and David Lagakos, “Electricity and Firm Productivity: A General
Equilibrium Approach,” 2021. Working Paper.
36
Fujimoto, Junichi, David Lagakos, and Mitch Vanvuren, “The Aggregate Effects
of “Free” Secondary Schooling in the Developing World,” 2021. Working Paper.
Greenwood, Jeremy, Philipp Kircher, Cezar Santos, and Michele Tertilt, “An
Equilibrium Model of the African HIV/AID Epidemic,” Econometrica, 2019, 87 (4),
1081–1113.
Hardle, Wolfgang, Hua Liang, and Jiti Gao, Partially Linear Models, Springer-
Verlag Berlin Heidelberg, 2000.
Herkenhoff, Kyle, Jeremy Lise, Guido Menzio, and Gordon Phillips, “Knowl-
edge Diffusion in the Workplace,” July 2018. Working Paper.
Hopenhayn, Hugo A., “Entry, Exit, and Firm Dynamics in Long Run Equilibrium,”
Journal of Political Economy, 1992, 60 (5), 1127–1150.
Hsiao, Cheng, “Consistent estimation for some nonlinear errors-in-variables models,”
Journal of Econometrics, 1989, 41, 159–185.
Hsieh, Chang-Tai and Peter J. Klenow, “Misallocation and Manufacturing TFP in
China and India,” Quarterly Journal of Economics, 2009, 124 (4), 1403–1448.
Imbens, Guido and Karthik Kalyanaraman, “Optimal Bandwidth choice for the
Regression Discontinuity Estimator,” Review of Economic Studies, 2012, 79 (3), 933–
959.
IPUMS, “Minnesota Population Center. Integrated Public Use Micro-
data Series, International: Version 7.3 [dataset],” Minneapolsi, MN.
https://doi.org/10.18128/D020.V7.2 2020.
Jarosch, Gregor, Ezra Oberfield, and Esteban Rossi-Hansberg, “Learning from
Coworkers,” Econometrica, 2020. forthcoming.
Jovanovic, Boyan and Rafael Rob, “The Growth and Diffusion of Knowledge,” Re-
view of Economic Studies, 1989, 56 (4), 569–582.
Klenow, Peter and Andres Rodrıguez-Clare, “The Neoclassical Revival in Growth
Economics: Has It Gone Too Far?,” in Ben S. Bernanke and Juliio Rotemberg, eds.,
NBER Macroeconomics Annual 1997, 1997, pp. 73–114.
Lashkari, Danial, “Innovation Policy in a Theory of Knowledge Diffusion and Selec-
tion,” 2020. Working Paper.
Li, Tong, “Robust and consistent estimation of nonlinear errors-in-variables models,”
Journal of Econometrics, 2002, 110, 1–26.
37
Lu, Will Jianyu, “Transport, Infrastructure and Growth: Evidence from Chinese
Firms,” 2021. Working Paper.
Lucas, Robert E., “Ideas and Growth,” Economica, 2009, 76 (301), 1–19.
and Benjamin Moll, “Knowledge Growth and the Allocation of Time,” Journal of
Political Economy, 2014, 122 (1), 1–51.
McKenzie, David and Christopher Woodruff, “Business Practices in Small Firms
in Developing Countries,” Management Science, 2017, 63 (9), 2967–2981.
, , Kjetil Bjorvatn, Miriam Bruhn, Jing Cai, Juanita Gonzalez-Uribe,
Simon Quinn, Tetsushi Sonobe, and martin Valdivia, “Training Entrepreneurs,”
VoxDevLit, 2020, 1 (1).
Melitz, Marc J., “The Impact of Trade on Intra-Industry Reallocations and Aggregate
Industry Productivity,” Econometrica, 2003, 71 (6), 1695–1725.
Munshi, Kaivan, “Information Networks in Dynamic Agrarian Economies,” in T. Paul
Schultz and John Strauss, eds., Handbook of Development Economics, 2007, pp. 3085–
3113.
Nakamura, Emi and Jon Steinsson, “Identification in Macroeconomics,” Journal of
Economic Perspectives, 2018, 32 (3), 59–86.
Perla, Jesse and Christopher Tonetti, “Equilibrium Imitation and Growth,” Journal
of Political Economy, 2014, 122 (1), 52–76.
, , and Michael E. Waugh, “Equilibrium Technology Diffusion, Trade, and
Growth,” 2021, 111 (1), 73–128.
Prescott, Edward C., “Needed: A Theory of Total Factor Productivity,” International
Economic Review, 1998, 39 (3), 525–551.
Romer, Paul M., “Endogenous Techniological Change,” Journal of Political Economy,
1990, 98 (5), S71–S102.
Sampson, Thomas, “Dynamic Selection: An Idea Flows Theory of Entry, Trade, and
Growth,” Quarterly Journal of Economics, 2016, 131 (1), 315–380.
Schennach, Susanne M., “Estimation of Nonlinear Models with Measurement Error,”
Econometrica, 2004, 72 (1), 33–75.
, “Mismeasured and unobserved variables,” in Steven N. Durlauf, Lars Peter Hansen,
James J. Heckman, and Rosa L. Matzkin, eds., Handbook of Econometrics, 2020,
pp. 487–565.
38
Van Patten, Diana, “International Diffusion of Technology: Accounting for Heteroge-
neous Learning Ability,” 2020. Working Paper.
Wallskog, Melanie, “The Mechanisms and Implications of Entrepreneurial Spillovers
Across Coworkers,” 2021. Working Paper.
Yatchew, A., “An elementary estimator of the partial linear model,” Economic Letters,
1997, 57 (2), 135–143.
39
Table of Contents for Online Appendix
A Additional Details from the Main Text 42
A.1 Balance Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
A.2 Empirical Impact on More Productive Member of the Match . . . . . . . . 44
A.3 Parameter Targets and Values from Calibration . . . . . . . . . . . . . . . 46
A.4 Increasing θ to match RCT treatment effect . . . . . . . . . . . . . . . . . 47
B Examples of Different Matching Processes 48
B.1 Noise in the Imitation Process . . . . . . . . . . . . . . . . . . . . . . . . . 48
B.2 Effort Choice and Bargaining . . . . . . . . . . . . . . . . . . . . . . . . . 48
B.3 Deterministic Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
B.4 Cost to Receive a Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
C Additional Results: Extensions of Identification Procedure 52
C.1 Semi-parametric identification . . . . . . . . . . . . . . . . . . . . . . . . . 52
C.2 More general relationship between observables and z . . . . . . . . . . . . 52
C.3 Mis-measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
C.4 Additional characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
D Additional Results: Social Planner’s Problem 56
D.1 Quantitative Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
E Additional Results: Quantitative Implications of Mismeasurement 63
E.1 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
E.2 Updated Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
E.3 Quantitative Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
F Additional Results: Using the RCT of Cai and Szeidl (2018) 66
F.1 RCT Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
F.2 Overview of Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 66
F.3 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
F.4 Estimating Diffusion Parameters . . . . . . . . . . . . . . . . . . . . . . . 69
F.5 Remaining Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
F.6 Treatment Effect Persistence . . . . . . . . . . . . . . . . . . . . . . . . . 73
F.7 Quantitative Gains at Scale . . . . . . . . . . . . . . . . . . . . . . . . . . 73
G Proofs 76
G.1 Proof of Proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
G.2 Proof of Proposition 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
40
G.3 Proof of Proposition 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
G.4 Proof of Proposition 7 (from Social Planner’s Problem) . . . . . . . . . . . 79
41
A Additional Details from the Main Text
A.1 Balance Tests
Our basic balance check is
yi0 = α0 + α1Ti + εi,
where yi0 is the baseline outcome for individual i and Ti = 1 if i is eventually treated.
Those results are in Table 3.
Table 3: Balance Test (from Brooks et al., 2018)
Control Mean Mentor - Control(1) (2)
Firm Scale:Profit (last month) 10,054 -975.25
(1186.76)
Firm Age 2.39 -0.05(0.23)
Has Employees? 0.21 -0.02(0.05)
Number of Emp. 0.21 0.02(0.06)
Business Practices:Offer credit 0.74 -0.02
(0.06)
Have bank account 0.30 -0.03(0.06)
Taken loan 0.14 -0.05(0.04)
Practice accounting 0.11 0.00(0.04)
Advertise 0.07 0.04(0.03)
Sector:Manufacturing 0.04 -0.03
(0.02)
Retail 0.69 0.03(0.06)
Restaurant 0.14 -0.02(0.05)
Other services 0.17 0.06(0.05)
Owner Characteristics:Age 29.1 -0.25
(0.64)
Secondary Education 0.51 -0.00(0.06)
Table Notes: Columns 1-2 are the coefficient estimates from the regression yi = α + βTi + εi, where Ti is an indicator for
treatment. Column 1 is α. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.
42
We conduct the second balance test
yi0 = α0 +∑
j=L,M,H
αjTij + εi
where the indicator now depends on whether firm i is a treatment firm matched with a
bottom 25th percentile (denoted MiL), 25-75 percentile (TiM), or top 25 percentile firm
(TiH) in terms of baseline profitability.34 Table 4 reports the results. The only significant
difference is in age, and the magnitude is small.
Table 4: Balancing Test at Baseline
Control Mean TL - Control TM - Control TH - Control(1) (2) (3) (4)
Firm Scale:Profit (last month) 10,054 -732.65 -1337.06 -760.08
(1314.56) (1393.38) (2128.41)
Firm Age 2.39 0.04 -0.19 0.08(0.28) (0.30) (0.46)
Has Employees? 0.25 -0.10 -0.07 0.10(0.07) (0.07) (0.11)
Number of Emp. 0.23 -0.05 0.00 0.18(0.08) (0.08) (0.13)
Business Practices:Offer credit 0.74 -0.07 0.04 -0.03
(0.07) (0.08) (0.12)
Have bank account 0.30 -0.04 -0.05 0.06(0.07) (0.08) (0.12)
Taken loan 0.14 -0.07 -0.06 0.03(0.05) (0.05) (0.08)
Practice accounting 0.01 -0.01 0.01 -0.01(0.01) (0.02) (0.02)
Advertise 0.07 0.04 0.01 0.11(0.05) (0.05) (0.07)
Sector:Manufacturing 0.04 -0.02 -0.04 -0.04
(0.02) (0.03) (0.04)
Retail 0.69 -0.03 0.00 -0.10(0.08) (0.08) (0.12)
Restaurant 0.14 -0.06 0.00 0.03(0.05) (0.06) (0.09)
Other services 0.17 0.09 0.02 0.07(0.06) (0.07) (0.10)
Owner Characteristics:Age 29.1 0.92 -1.88 0.50
(0.79) (0.84)∗∗ (1.28)
Secondary Education 0.51 0.02 -0.08 0.13(0.08) (0.09 (0.13)
Table notes: Columns 1-4 are the coefficient estimates from the regression above, with column one being the estimate of theconstant α0. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***. All constants are significant at onepercent.
34We have experimented with a number of different ways to compute the balance table, and all show the same results.
43
A.2 Empirical Impact on More Productive Member of the Match
Since the more productive members of treatment matches were not randomly selected,
we require a different approach to identify any effect on these business owners. Brooks et
al. (2018) details these results, but Figure 11 reproduces some key results for simplicity’s
sake. Figure 11 suggests no statistically discernible discontinuity around the cutoff.
Figure 11: Profit for mentors and non-mentors (from Brooks et al., 2018)
(a) Full sample
−5000
05000
10000
15000
Pro
fit (K
sh)
−2 −1 0 1 2 3Normalized running variable
n=172, bins=15
(b) IK Optimal Bandwidth
−5000
05000
10000
15000
Pro
fit (K
sh)
−1 −.5 0 .5 1Normalized running variable
n=103, bins = 15
Figure notes: Figure 11 plots profit along with a fitted quadratic and its 95 percent confidence interval. Figure 11a usesthe entire sample, while Figure 11b uses the Imbens and Kalyanaraman (2012) procedure to choose the optimal bandwidth.Both use 15 bins on either side of the cutoff.
We next test this more formally. In particular, letting ε be the cut-off value for
mentors, we run the regression
πi = α + τDi + f(Ni) + νi (A.1)
where πi is profit, Di = 1 if individual i was chosen as a mentor (εi ≥ ε, f(Ni) is a
flexible function of the normalized running variable Ni = (εi − ε)/σε, and νi is the error
term. The parameter τ captures the causal impact of being chosen as a mentor. We use
local linear regressions to estimate the treatment effects on profit and inventory, along
with business practices of record keeping and marketing. The results are in Table 5, and
we find that being a mentor has no statistically significant effect on profits. Moreover,
there is no change in marketing or record-keeping practices, which one might associate
with ability. There is some evidence that inventory spending decreases, but it cannot
be statistically distinguished from zero. Overall, we find little evidence that entering
into a match changes either business scale or business practices for the more productive
member of the match. This is consistent with the max function in the forward equation
44
for ability (equation 2.1), which is assumed here and in much of the existing literature.
Table 5: Regression discontinuity results for matched firm treat-ment effect (Brooks et al., 2018)
Percent of IK Scale Practices
optimal bandwidth Profit Inventory Marketing Record
keeping
100 -503.18 -3105.87 0.01 0.02
(1321.82) (2698.11) (0.11) (0.18)
150 300.19 -2585.22 0.01 0.07
(1407.26) (2291.34) (0.09) (0.14)
200 322.09 -123.59 0.01 0.10
(1324.17) (1964.08) (0.08) (0.13)
Treatment Average 4387.34 8435.79 0.08 0.85
Control Average 1794.09 4039.20 0.13 0.63
Table notes: Statistical significance at 0.10, 0.05, and 0.01 is denotedby *, **, and, ***. Profit and inventory are both trimmed at 1 percent.
45
A.3 Parameter Targets and Values from Calibration
Table 6: Targets and Parameter Choices
Model Parameter Description Parameter Target Moment Source Target Model
Value Value Value
Group 1 From RCT
β Intensity of diffusion 0.538 Estimated parameter from regression (2.3) RCT results 0.538 0.538
ρ Persistence of ability 0.595 Estimated parameter from regression (2.3) RCT results 0.595 0.595
θ Match technology “quality” -0.417 Treatment effect in quarter 2 (as % above control) RCT results 0.403 0.403
Group 2 Matched one-to-one with parameter
δ Death rate of firms 0.016 Average age of baseline business owners Baseline survey 0.09 0.09
σ0 St. dev. of new entrant ability distribution 0.961 Variance of log profit among new entrants Baseline survey 1.00 1.00
ν Firm bargaining weight 0.50 Set exogenously – – –
Group 3 Jointly targeted
σ St. dev. of exogenous ability shock distribution 0.75 Standard deviation of log profit in all firms Baseline survey 0.99 0.99
c Growth factor in ability evolution -1.92 Ratio of average profit of all firms to new entrants Baseline survey 1.51 1.51
ω Consumption utility weight 0.53 Fraction of employment in wage work IPUMS 0.48 0.48
α Ability elasticity in supplier search 0.36 Standard deviation of log inventory cost Baseline survey 1.61 1.61
η Ability elasticity in supplier search 0.05 Average cost ratio Baseline survey 0.13 0.13
Table notes: Group 1 is jointly chosen from the experimental data. Group 2 are also set to match baseline data moments, but match 1-1 with target moments. Parametersin Group 3 are calibrated to jointly match moments.
46
A.4 Increasing θ to match RCT treatment effect
In the main text, we increase θ from θ = −0.417 to θ = −0.063. We replicate the results
here under a different quantitative experiment: we increase θ such that the policy change
induces the same partial equilibrium effect in the model as in our empirical results.
Specifically, we create a treatment and control group identical to our empirical RCT. We
then shock the treatment group by allowing them to draw from a new source distribution
M(z; θT ) instead of M(z; θ). We set θT such that running the RCT in the model delivers
an identical treatment effect to our empirical results.35 This implies that θ increases to
θ = 0.512.
Table 9 replicates Table 2 from the main text under this different quantitative exper-
iment.
Table 7: Equilibrium Moments
(1) (2) (3)
Baseline Fixed Wage New Equilibrium
Income 1.00 1.35 1.59
Ability 1.00 1.38 1.67
Aggregate Labor Supply 1.00 0.63 0.90
Wage 1.00 1.00 1.76
Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3) report thenew stationary equilibrium after shocking the matching technology, where (2) holds the wagefixed at its baseline level and (3) allows it to adjust.
Overall, average income rises by 59 percent. When we decompose the aggregate
change, 57 percent of the total ability change comes from the direct effect on ability
(Column 2) and the remaining comes from the amplification through the wage. Similarly
for income, the direct effect accounts for 59 percent. The mechanisms for these changes
are discussed in the main text. Note, however, that these results point to a larger role
for price amplification (i.e., the price amplification accounts for 41 percent of the average
income gain here and 37 percent in the main text). This occurs because the induced
wage change is convex in the imposed θ change. Thus, the results point to a larger role
for price amplification as the aggregate policy change becomes larger.
35In the empirics, we let treated firms draw from the source distribution HT . The idea is the same, but does not require
the treatment source distribution to be in the same functional class as M .
47
B Examples of Different Matching Processes
In the main body of the paper, we provided two examples of matching processes that fall
under our assumptions, and we detail additional versions here.
B.1 Noise in the Imitation Process
An agent with ability z receives new arrivals of ideas that have two components: zm that
comes from a random match from another agent, and γ a random innovation on that idea.
Then z = γ1/θzm. Here, zm is a uniform draw from the distribution of productivities.
Then if γ has a cumulative density function given by Γ, then:
M(c) = Prob(z ≤ c) = Prob(zm ≤ cγ−1/θ) =
∫M(cγ−1/θ)dΓ(γ) (B.1)
B.2 Effort Choice and Bargaining
Each period, every agent characterized by ability z is matched to an agent that owns a
potential imitation opportunity zm as a uniform draw from the distribution of operating
firms M . The agent has an effort endowment of 1 that must be divided between imitation
and providing a utility benefit to the owner of the imitation opportunity zm. If z ≥ zm,
then no effort is put into imitation and z = z. If zm > z, then the agent and the owner
of the imitation opportunity must first agree on the distribution of effort, then the choice
of effort x and the values of z and zm together generate the value of z for the agent in
that period according to:
z =(zmz
)xz (B.2)
That is, by putting in more effort x ∈ [0, 1] the agent is able to close the gap between
their z and zm. The benefit to the owner of zm is given by the function b(x), which is
decreasing in x.
Agents and owners of imitation opportunities have one-off interactions and each re-
ceive 0 benefit if no agreement is made. They bargain over the assignment of the agent’s
effort between imitation and utility benefits for the owner of the imitation opportunity
according to a Nash bargaining problem where the bargaining weight of the agent is θ.
The bargaining problem is:
maxx∈[0,1]
([zmz
]xz)θb(x)1−θ (B.3)
48
Suppose that b(x) is given by b(x) = 1− x. Then it is easy to show that:
x = max
[0, 1− 1− θ
θ log(zm/z)
](B.4)
z = max[z, zme
1−1/θ]
(B.5)
As expected, the more bargaining power that the learning agents have, the greater is x,
resulting in greater z.
Note that, in the model, draws of imitation opportunities z < z are not useful. Hence,
the distribution M can be written, for any value c, as:
M(c) = Prob(z ≤ c) = Prob(zme1−1/θ ≤ c) = Prob(zm ≤ ce1/θ−1) = M(ce1/θ−1) (B.6)
or following the notation more standard in the paper:
∀z, M(z; z, θ) = M(ze1/θ−1) (B.7)
B.3 Deterministic Assignment
Here we consider a case where M arises when all agents can interact with one another
and sort into relationships endogenously. Suppose that every agent with ability z has the
option to influence any other agent that has ability z. Every agent can only be influenced
by one other agent each period, and they always prefer to be influenced by the highest
ability possible.
The utility of an agent with ability z influencing an agent with ability z is given by:
z
z− 1− 1
2θ
(z
z− 1
)2
(B.8)
That is, the agent with z gains benefit in proportion to how large the benefit is for
the other agent, but their cost is quadratic in the distance between their productivities.
For example, the influencer is happy when the other agent is helped by their influence,
but it takes more effort to influence when the distance between them is great. Therefore,
if there is a continuous distribution of z < z, the ideal agent that the influencer would
like to interact with has ability:
z∗(z) = z/(1 + θ) (B.9)
That is, the lower is the cost of influencing low ability firms, the deeper into the left tail
of the distribution is the agent willing to go.
However, since every agent can only be influenced by one agent each period and they
49
strictly prefer to be influenced by agents of higher ability, it is possible that (even if
the distribution is continuous) that the ideal agent for z is already matched to another
influencer. Therefore, intuitively, the probability distribution over assignment between
z and z is constructed by starting at the upper support of the distribution M , allowing
the highest ability firms to choose their most preferred matches, then descending down
through the distribution letting each firm choose to influence its preferred firm among
those remaining. Note that not all firms need have another firm to influence if their
utility from doing so be negative.
Formally, the probability distribution over imitation opportunities can be constructed
in the discretized case as follows, when the ability grid takes values z ∈ {z1, ...zN}, which
are ordered (i < j =⇒ zi < zj).
Define µ(z, z) as the measure of z influencing z (a N ×N matrix). We can construct
µ in the following steps given the measure µ of agents of each z type:
1. Let U(z, z) be the N × N matrix of utilities of z influencing z, and µ be a N × Nmatrix of zeros. Let µ be the N × 1 vector of unassigned influencers and µu be the
N × 1 vector of unassigned imitators. Set µ = µu = µ, n = N , and m = 1.
2. Let l be the m-argmax of U(·, zn). If U(zl, zn) ≤ 0, set µ(z1, zn) = µu(zn) and skip
to step 5.
3. If µ(zn) ≤ µu(zl), then µ(zn) = 0, µu(zl) = µu(zl) − µ(zn), and µ(zl, zn) = µ(zn).
Skip to step 5. Otherwise, go to 4.
4. If µ(zn) > µu(zl), then set µ(zl, zn) = µu(zl), µu(zn) = 0 and µ(zn) = µ(zn)−µu(zl).Set m = m+ 1 and return to step 2.
5. Set n = n− 1 and m = 1. If n = 0, go to step 6. Otherwise, go to step 2.
6. Set µ(·, z1) = µ(·, z1) + µu, and stop.
Given this matrix µ(z, z), the measure of assignments M is given by:
M(zi, zj) =
∑ik=1 µ(zj, zk)
µ(zj)
B.4 Cost to Receive a Match
Firms pay a cost to receive a uniform random draw from the productivity distribution.
We model this as a function f(s; θ), in that a firm that pays cost f(s; θ) receives a uniform
random draw with probability s. We assume that fθ > 0, so that the cost to achieve any
level of s is increasing in θ. In a stationary equilibrium, under standard conditions this
50
implies a stationary decision rule s(z,M∗; θ) with sθ < 0. We can write the distribution
of draws as
M(c; z, θ) = (1− s(z,M∗; θ)) + s(z,mM∗; θ)M∗(c)
≡ Q(c; z,M∗, θ)
Thus, in the stationary equilibrium of this economy, our same procedure goes through.
This example provides important context for our set of assumptions – they need not
be assumptions only on the primitives of the model. Additional assumptions, such as
stationarity, may guarantee the model is covered under our assumptions. The use of
stationarity here is similar to its role in the identification procedure of Jarosch et al.
(2020), who study learning among German co-workers.
51
C Additional Results: Extensions of Identification Procedure
In this Appendix, we focus on theoretical extensions of the main estimation procedure
in the paper to show that the procedure itself is robust to any number of extensions.
In Appendix E, we provide a quantitative evaluation of a particular extension related to
mis-measurement.
C.1 Semi-parametric identification
Assumption 1 laid out a function form for the law of motion of ability: z′(z, ε, z) =
ec+εzρ max{
1, zz
}β. We broaden this in Assumption 5 by replacing the max function,
Assumption 5. Given ability z this period, an imitation opportunity z, and a random
shock ε, ability next period z′ is given by
z′(z, ε, z) = ec+εzρf
(z
z
)(C.1)
Assuming f(z/z) = max{1, (z/z)}β gives us the original Assumption 1. Proposition 4
summarizes that we can instead estimate the function f using the same data-generating
process as in the main text.
Proposition 4. The data-generating process of Assumption 4 identifies (ρ, f) in equa-
tion (C.1) (while maintaining Assumptions 2 and 3 in the main text) by estimating the
regression
log(π′i) = c+ ρ log(πi) + f
(πiπi
)+ ε
This result follows directly from an established literature on partially linear regres-
sions. See, for instance, Yatchew (1997) on differencing estimators in this class of mod-
els.36 Hardle et al. (2000) provides a detailed review.
C.2 More general relationship between observables and z
In Assumption 2 we assume that some observable (e.g., profit) π has the characteristic
that π ∝ z. We extend that here.
Assumption 6. There exists a known function g : X → R++ that maps observable
characteristics x to ability z up to a potentially unknown constant of proportionality.
That is, g(x) = Cz for some potentially unknown constant C.
36The basic idea is that by ordering the data such that π1/π1 < π2/π2 . . . < πN/πN , once can difference out thenonlinear f in the limit under conditions that guarantee that the gap between i and i + 1 goes to zero. This allowsstraightforward OLS to estimate ρ. Then f can be estimated non-parametrically by any number of methods.
52
By assuming π ∈ x and g(x) = π, we recover the original assumption π ∝ z. But
Assumption 6 allows for more complicated possibilities. For example, one could estimate
a production function using the control group panel data. Such a procedure would imply
a mapping g(y, n, k) = Cz. That is, it takes output data y and input data for labor and
capital (n, k) (or any other input bundle) and infers the value z.37
Proposition 5. The parameters (β, ρ, θ) are identified when we replace Assumption 2
with Assumption 6.
This follows almost directly, as the regression
log(g(x′)) = c+ ρ log(g(x)) + β log
(max
{1,g(x)
g(x)
})+ ε,
is straightforwardly estimated with known g. By Assumption 6 this collapses to the
required equation that gives (β, ρ). The second step goes through with the same adjust-
ment. The key to our procedure is not the proportionality of any one variable with z,
but a proportional mapping between any set of observables and z.
C.3 Mis-measurement
We now assume that profit is mis-measured. This affects our estimation procedure,
introducing bias into the parameters. The regression error is not additively separable
from the true value in our non-linear model, which implies that standard instrumental
variable methods to correct linear measurement error no longer hold. Yet, there is a
substantial and active literature on mis-measurement in non-linear models that provides
a number of ways to overcome this issue.
We discuss this issue in the context of a more general ability law of motion, to
emphasize that it does not depend on the specific choices we have made on functional
forms. We provide a quantitative evaluation of these issues in Appendix E.
Assumption 7. The law of motion for diffusion can be written as
log(π′) =M∑j=1
βjgj(~π) + ε
where ~π = (π, π) for a known function (gj)Mj=1.
Note that we have already imposed the maintained ability to move between ability z
and profit π. The key additional assumption is an adjustment to Assumption 4, which
governs the type of data to which we have access.
37g is not the production function here, but is inferred from it.
53
Assumption 8. Profit is measured with error, and we observe two outcomes that are
mis-measured versions of the true value, ~π∗ = (π∗, π∗). We denote ~πk as the two entries
in the vector. We denote these observable values as (πk1 , πk2), k = 1, 2. The measurement
error is classical, so that for each individual i we observe
~πk1i = ~π∗ki + νk1i, k = 1, 2
~πk2i = ~π∗ki + νk2i, k = 1, 2
where ν1 and ν2 are unobserved disturbances. We assume the following relationships
between the measurement error and true values:
E[νk1 |π∗k, νk2 ] = 0, k = 1, 2
νk2 is independent from ~π∗, ν−k2 , where − k 6= k
The assumption of a repeated measurement opens up a suite of tools related to
repeated measurement adjustments in non-linear estimation. While other methods exist
to solve this problem, the existence of this approach has the benefit of almost always
being available in a firm-level survey.38
The basic idea behind this identification approach comes from Kotlarki’s lemma,
which in R1 is
φπ∗(t) = exp
(∫ t
0
E[iπ1eitπ2 ]
E[eitπ2 ]
)and φπ∗(t) is the characteristic function φπ∗(t) =
∫Reitπ
∗f ∗π(π∗)dx. An inverse Fourier
transform gives us the distribution of true values fπ∗ , which can be used to construct the
relevant estimator moments.
Our model requires ~π∗ = (π∗, π∗) ∈ R2, which introduces some complications. The
second part of Assumption 8 provides necessary conditions for identification in a multi-
dimensional nonlinear model.
Proposition 6. If E[|~πk|] and E[|ηk1 |] are finite, then there exists a closed form for any
function E[u(~π∗, β)] whenever it exists.
Proof. Provided in Schennach (2004). �
Schennach (2004) provides the details of the approach and how to develop an estima-
tor from Proposition 6. The key feature, however, is that this result allows a broad class
of extremum estimators to be deployed to identify β.
38For example, π1 could be profit asked directly while π2 could be measured as revenue minus costs. Another would beto use the same variable measured at two points in time, as highlighted by Schennach (2020). Finally, many models allowrelationships between input expenditures, revenue, and profit that could be utilized, albeit with more structure than wehave here. An example is a Cobb-Douglas production function with competitive factor markets.
54
C.4 Additional characteristics
One might also suspect that alternative characteristics influence learning. For example,
firm owners may retain more ability when meeting with another owner of a similar age.
This amounts to allowing β to depend on a set of characteristics of the firm owner x and
her match x. In this case, we can write
log(π′i) = c+ ρ log(πi) + β(x, x) log
(max
{1,πiπi
})or, binning characteristics X× X in some way,
log(π′ib) = c+ ρ log(πib) +B∑b=1
βb log
(max
{1,πibπib
})This regression identifies (ρ, β1, . . . , βB) under the same assumptions as in the main text.
The second step follows with a slight adjustment to Assumption 3. We assume there is
a discrete distribution over types Γb and, with a slight abuse of notation, re-write the
draws over types and profit as
M(z, b; z, θ) = Mb(z; z, θ)Γb
As long as Mb has the same properties as Assumption 3, the second step of the procedure
similarly identifies θ.
55
D Additional Results: Social Planner’s Problem
Here, we lay out the solution to the social planner’s problem, and show that it similarly
relies on properly measuring the intensive and extensive margins.
Before doing so, one conceptual issue to deal with is the role of suppliers. We exclude
them from our measure of welfare. In our context, these suppliers take their profits out of
Dandora, so that buying intermediates does indeed involve resources exiting the economy.
To operationalize this idea, we assume that the price paid follows the same solution as
the Nash bargaining problem solved in Proposition 3, in that the price px is given
px =
(1− η − ν(1− η − α)
α
)e−sz
α+η−1α .
We will write px(z, s) to denote this price.39
With those details, we are ready to define the planner’s problem. The social planner
allocates occupations o, firm inputs (x, n) and supplier search intensity s for each agent
to maximize ex ante utility, subject to the relevant aggregate resource constraints. Since
we will focus on the stationary equilibrium, we drop the dependence of the decision rules
on the aggregate state M for some notational simplicity. Defining the value to an agent
with ability z as
v(z) = ω log(c(z)) + (1− ω) log(1− s(z)) + (1− δ)∫ε
∫z
v
(ec+εzρ max
{1,z
z
}β)dM(z)dF (ε),
we can write the planner’s problem recursively as
maxo(·),c(·),x(·),n(·),s(·)
∫ ∞0
v(z)dM(z) (D.1)
s.t.
∫o(z)=1
(x(z)αn(z)η − px(z, s(z))x(z)
)dM(z) =
∫ ∞0
c(z)dM(z) (D.2)∫o(z)=1
n(z)dM(z) =
∫o(z)=0
dM(z) (D.3)
M(z′) = δG(z′) + (D.4)
(1− δ)∫ ∞
0
∫ ∞0
F (log(z′)− ρ log(z)− β log(max{1, z/z})− c)dM(z)dM(z)
M(z;M) =
( ∫ z0φ(z,M)dM(z)∫∞
0φ(z,M)dM(z)
) 11−θ
. (D.5)
39There is nothing conceptually difficult about including these suppliers in the measure of welfare. Our goal is only toremain faithful to the economic environment from which the empirics are derived. A further benefit of this assumption isthat it focuses attention on the role of the diffusion externality, as opposed to other types of inefficiencies that arise fromthe bargaining protocol.
56
While complicated looking, this problem has a straightforward interpretation. The
planner’s objective is to maximize the expected value of v. The first two constraints
are the aggregate resource constraints – (D.2) determines the resources that can be al-
located to consumption, while (D.3) equalizes labor supply and demand. The latter
two constraints show that the planner internalizes how her decisions affect the evolution
of the aggregate state (via D.4) and the imitation opportunities that arise from it (via
D.5). These constraints highlight the planner’s ability to overcome the diffusion exter-
nality, in that she takes into account the implications of occupational choice on learning
opportunities in a way that individual agents do not.
We measure the difference in welfare between the allocation chosen by the planner and
the baseline laissez faire equilibrium in consumption-equivalent terms. That is, we ask
by what percentage we would we have to increase each agent’s consumption in every state
and time period to equalize average utility between the two economies. This difference
is our measure of the aggregate importance of diffusion.
While the planner’s problem does not admit a closed-form solution, we can derive
some implications for comparison to the baseline laissez faire equilibrium.
Proposition 7. The planner chooses a cutoff rule for occupations, z, such that all z ≥ z
operate firms. Consumption cp is constant across agents and given by the constant returns
to scale aggregate production function cp = ANη
1−αs Z
1−α−η1−α , where A is a constant and the
two aggregate inputs are
Ns =
∫ z
0
dM(z) , Z =
∫ ∞z
z exp(s(z))α
1−α−η dM(z).
Moreover, if s(z) > 0 (which we verify at our estimated parameters), s(·) is a strictly
increasing and concave function of the form
s(z) =
(1− α− η
α
)W0
[(α
η + α− 1
)exp
(α
η + α− 1
)q(z/z, s(z))
]+ 1 ∀z ≥ z,
where W0(·) is the principal branch of the Lambert W function and q(·, ·) is a function
of relative ability z/z and supplier search at the cut-off value, s(z).
Proof. The proof is at the end of the Appendix, in Appendix Section G. �
One can see the goals of the planner in Proposition 7. Like in the baseline, the planner
uses a cut-off rule to define occupations. As we show below, she chooses fewer firms than
the baseline equilibrium, a function of the diffusion externality in the baseline economy.
Furthermore, she shifts the search for suppliers away from relatively low ability agents.
Instead, she takes advantage of complementarity between ability and supplier search
57
effort. This allows higher ability agents to procure resources that can be redistributed
to all agents. These incentives are naturally absent in the baseline equilibrium, where
individuals consume their income.
In terms of our inability to push further theoretically, the properties of W0 preclude
a closed form solution.40 We derive these results in Appendix G.4, and proceed with
quantitative results in the next section.
D.1 Quantitative Implications
Our interest here is understanding the importance of separately identifying the intensive
and extensive margin parameters to understand the welfare gains in the social planner’s
problem.
We follow a similar (but slightly simpler) approach to the main text. We exogenously
vary β, then re-estimate θ to continually match the same average treatment effect. Thus,
the average treatment effect is still used as an estimating moment, but our first stage
regression (run within the treatment group) is not.
Throughout, we hold the persistence ρ fixed for simplicity. Figure 12 plots the implied
value of θ and the consumption-equivalent welfare gains, the latter of which is normalized
to one at our estimated value.
The welfare implications are in Figure 12b. We focus first on the solid line. This is our
baseline counterfactual in which θ is re-estimated to achieve the same average treatment
effect. The differences here can be substantial. Increasing β from 0.25 to to our estimated
value 0.538, for example, increases the consumption equivalent welfare gain by 59 percent.
Perhaps more surprisingly, overestimating the intensive margin forces can similarly bias
downward welfare gains. Assuming β = 0.90 instead of 0.538 lowers the implied welfare
gains by 40 percent. We detail to the economic forces governing these trade-offs in the
next section. For now, however, the results show that understanding effecient welfare
gains depends critically on separately identifying the parameters governing the extensive
(here, θ) and intensive (here, β) margins, as our procedure does.
D.1.1 Understanding Forces Behind This Pattern
To better understand the forces behind the pattern above, we also re-estimate the welfare
gains while holding θ fixed at its baseline value. This is the dashed line in Figure 12b. The
limited difference between the two lines shows that the welfare gains are primarily driven
40The Lambert W function is the inverse of F : x 7→ x exp(x), and we can restrict attention to the princi-pal branch. The main issue for our purposes is that the only analytical characterization of W0 is the power seriesW0(a) =
∑∞n=1((−n)n−1/n!)an, which is of little help in attempting to attain a closed form for the planner’s problem
here. Put slightly more technically, the supplier search function s is the solution to a delay differential equation in z, butthis solution prevents an analytic characterization of the initial condition s(z).
58
Figure 12: Importance of Separately Identifying Two Diffusion Effects
(a) Implied θ
0 0.2 0.4 0.6 0.8 1-10
-8
-6
-4
-2
0
(b) Consumption-Equivalent Welfare
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
1.2
Figure notes: The left panel shows the implied θ required to estimate the same treatment effect observed in the data atthe exogenously given β. The dashed vertical line the minimum value of β for which there exists a θ that can rationalizethe observed average treatment effect. This is related to the discussion of Γmin defined in Proposition 2. θ → −∞ as βapproaches this value from above. The right panel shows the implied consumption-equivalent welfare, normalized to oneat our estimated parameters. Our estimated values are denoted by the circle in each graph.
by the direct effect of β, and not the implied differences in θ.41 Thus, understanding the
main results above primarily requires understanding the impact of changing β. We study
that here.
A useful starting point is to re-write total resources in the planner’s allocation in terms
of semi-elasticities. Recalling the aggregate production function defined in Proposition
7, we can then define the semi-elasticity of consumption with respect to the occupational
cut-off, εcp := (∂cp/∂z)/cp, as the sum of the aggregate input elasticities,
εcp =
(η
1− α
)εNs +
(1− α− η
1− α
)εZ . (D.6)
Equation (D.6) tells us the consumption change induced when the planner slightly shifts
the occupational cut-off. There are two components to the consumption gains. The
first is static – holding the ability distribution fixed, increasing the cut-off mechanically
increases labor supply. This positively affects εNs and negatively affects εZ . But key in
this model is that shifting z also affects the ability distribution M through diffusion. This
is the dynamic effect on production, as the mass of the population in each occupation
changes as M changes.42 Figure 13 plots the semi-elasticity εcp , along with those of the
41The difference between the two is driven by the fact that the welfare gains are declining in θ.42It is straightforward to show that each of these semi-elasticities can be decomposed into a static and dynamic term,
as εj = εSj + εDj for j = Ns, Z. The static reallocation across occupations plays almost no quantitative role here. Thus,when interpreting the results here, one should think of them as driven by the dynamic diffusion effects.
59
two aggregate inputs defined in (D.6). They are evaluated at the baseline equilibrium
cut-off.
Figure 13: The Semi-Elasticity of Consumption for the Social Planner
(a) Semi-Elasticity of Consumption, cp
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
1.2
1.4(b) Components
0 0.2 0.4 0.6 0.8 10
0.2
0.4
0.6
0.8
1
1.2
1.4
Figure notes: These semi-elasticities are evaluated at the baseline equilibrium cut-off z∗ and equilibrium distributionM∗(z; z∗).
The first thing to note about Figure 13a is that the elasticity is positive for any
value of β. Thus, no matter the parameters, the planner can increase consumption by
transition some baseline entrepreneurs to wage work. This is entirely a function of the
diffusion externality. Moreover, as expected, it follows a similar shape to the overall
welfare gains. Finally, Figure 13b shows that this is in large part driven by εZ . That is,
the consumption response is primarily driven by the changes to ability in the economy.
Why, then, does εZ take this shape? This is at the heart of understanding how
the overall welfare gains vary with the critical parameter β. Figure 14a provides a me-
chanical rationale: the numerator of εZ is approximately linear while the denominator
is substantially more convex. But both of these curves have natural economic interpre-
tations. The numerator, ∂Z/∂z, measures the change in economy-wide ability from a
marginal increase in the cut-off. It is monotonically increasing – the higher β, the larger
the potential impact on ability. This means that all else equal, a higher β induces a
larger increase in consumption for the planner.
But the welfare gains depend on how that composite ability compares to the baseline
equilibrium. This baseline ability is given by Z in Figure 14a, and also depends on β.
In particular, higher β increases the gains from a good match. This increases ability and
thus drives up the return to labor, which increases the wage. The higher wage induces
60
more wage workers, as Proposition 3 shows that the baseline cut-off has the feature
z ∝ w1−α
1−η−α (Figure 14b shows it graphically). These same steps are then compounded
in equilibrium. With fewer, more able firms, diffusion accelerates even faster as agents
further increase ability. The stationary equilibrium wage then takes the convex share of
Figure 14a.
Together, these results highlight the two competing forces in the model, both of which
can be seen in Figure 14. At low levels of β, the matching technology limits the aggregate
welfare gains. At high levels of β, we see similar welfare gains, but for a different reason.
Here, standard equilibrium forces already accomplish most of what the planner would
want to accomplish. While the baseline economy is clearly richer, the returns to additional
intervention by the planner at high β are low. We summarize these forces in Figure 14c,
which plots average consumption across the baseline and efficient allocations. Consistent
with these results, average baseline consumption grows more slowly than the efficient
consumption at low β, then more quickly at high β.
Thus, these two competing economic forces – the race between technology and equi-
librium prices in generating diffusion – are critical in generating the non-monotonicity
observed in the headline results. Moreover, they are governed are the diffusion parameters
we estimate. Thus, estimating these parameters plays an important role in understanding
the aggregate implications in the economy.
Much like our main results, the planner’s problem shows that separating the extensive
and intensive margin is critical to understanding the gains from policy, whether they be
a change to the matching technology (as in the main text) or the fully efficient planner’s
allocation (here).
61
Figure 14: Equilibrium Forces and the Pattern of εZ
(a) Decomposing εZ
0 0.2 0.4 0.6 0.8 1
β
0
2
4
6
8
10
12
∂Z/∂z (numerator of εZ)Z (denominator of εZ)
(b) Wage w and Cut-off Ability z in the Baseline Equilibrium
0 0.2 0.4 0.6 0.8 1
β
0
0.5
1
1.5
2
2.5
3
3.5
4
w
z
(c) Average Consumption
0 0.2 0.4 0.6 0.8 1
β
0
0.5
1
1.5
2
2.5
3
3.5
4
Baseline
Efficient
Figure notes: In Figures 14b and 14c, our estimated values are denoted by the circle in each graph and normalized to one.
62
E Additional Results: Quantitative Implications of Mismea-
surement
We explore the quantitative consequences of mis-measuring profit here. We make the
following assumption: for all individuals, we observe π = τπ∗, where τ ∼ N(0, στ ) is
classical measurement error (in logs), π is observed profit, and π∗ is true profit. We
assume that τ ∼ N(0, στ ), where στ is known but the individual realizations are not. As
discussed in Appendix C this can be extended to estimate the distribution of τ . We leave
this aside for simplicity here. Define the function
g(π∗, π∗; Γ) = c+ ρ log(π∗) + β log
(max
{1,π∗
π∗
})where Γ := (c, ρ, β) is the parameter set. Our first stage regression is log(π′∗) =
g(π∗, π∗; Γ), but is complicated by the mis-measured profit on the right hand side of
this equation. Here, we re-estimate this regression with mis-measured profit and show
how the results change.
E.1 Estimation Procedure
Some notation is required. For any variable x, define x = log(x), fx(x) as the probability
density function, and φx(t) =∫Reitxfx(x)dx as its characteristic function.
We first estimate the characteristic functions of the observed π and π,
φπ(t) =
(1
n
n∑j=1
eit log(πj)
)φk,π(hπt)
φ˜π(t) =
(1
n
n∑j=1
eit log(πj)
)φk,π(hπt)
The first term in parenthesis is the empirical characteristic function using mis-mesaured
variables (πj, πj). The latter term, φk(ht), is the Fourier transform of a kernel density
estimator with bandwidth h.43
Since φπ∗(t) = φπ(t)/φτ (t) and similarly for π (due to the independence of π and π),
we recover the densities of π and π from an inverse Fourier transform,
fπ∗(π∗) =
1
2π
∫φπ∗(t)e
−itπ∗dt
fπ∗(π∗) =
1
2π
∫φπ∗(t)e
−itπ∗dt
43A well-known issue with this type of estimation is the inaccuracy of the empirical characteristic function in the tails ofthe distribution. A kernel density estimate is one version of what is generally referred to as a dampening factor to improveaccuracy. The fact that φk enters multiplicatively follows because a kernel estimator is also a type of convolution.
63
where π in the ratio 1/(2π) should be understood to be π ≈ 3.14 instead of profit (as to
not introduce any additional notation).
With the true distribution functions, we can estimate our regression in any number
of ways. We use the minimum distance estimator proposed by Hsiao (1989). That is, we
choose parameters Γ := (c, ρ, β) to solve
minΓ
n∑i=1
(π′i −G(πi, πi; Γ))2
(E.1)
where
G(π, π; Γ) =
∫ ∫g(π∗, π∗)fπ∗|π(π∗|π,Γ)fπ∗|π(π∗|π,Γ)dπ∗dπ∗
The minimum distance estimator in (E.1) is also extended to unknown error distri-
butions by Li (2002) using the repeated-measurement framework discussed in Appendix
C.
E.2 Updated Calibration
After estimating the diffusion parameters, we update the calibration to take these values
into account. The updated parameters are listed in Table 8, along with the baseline for
comparison. We do so in two scenarios: στ = 0.30 and στ = 1.
Table 8: Updated Parameter Values
Model Description Parameter Parameter Parameter
Parameter (Baseline) (στ = 0.3) (στ = 1)
Exogenously varied:
στ St. dev. of distortions 0 0.3 1.0
Group 1
β Intensity of diffusion 0.538 0.629 0.883
ρ Persistence of ability 0.595 0.371 0.494
θ Match technology “quality” -0.417 -0.326 -0.179
Group 2
δ Death rate of firms 0.016 0.016 0.016
σ0 St. dev. of new entrant ability distribution 0.961 0.961 0.961
ν Firm bargaining weight 0.5 0.5 0.5
Group 3
σ St. dev. of exogenous ability shock distribution 0.75 0.74 0.66
c Growth factor in ability evolution -1.92 -2.23 -2.41
ω Consumption utility weight 0.53 0.54 0.62
α Ability elasticity in supplier search 0.36 0.36 0.36
η Ability elasticity in supplier search 0.05 0.05 0.05
Table notes: Group 1 is jointly chosen from the experimental data. Parameters in Group 2 are calibrated tojointly match moments. Group 3 are also set to match baseline data moments, but match 1-1 with targetmoments. Both are set to match the same set of moments discussed in the main text (see Table 6 for details).
64
E.3 Quantitative Exercise
We now study the gains from the same exercise as in the text. The main results are in
Table 9. The first column fixes the wage at its baseline level, isolating the impact of the
changing ability distribution. The second column allows the wage to adjust, adding in
the additional general equilibrium effect on prices.
Table 9: Equilibrium Moments
στ = 0.3 στ = 1
(1) (2) (3) (4)
Fixed Wage At-Scale Fixed Wage At-Scale
Income 1.08 1.12 1.20 1.38
Ability 1.08 1.14 1.20 1.42
Aggregate Labor Supply 0.92 0.99 0.89 1.00
Wage 1.00 1.14 1.00 1.39
Table notes: All are measured relative to the baseline equilibrium at the give value of στ . Each columnreports the new stationary equilibrium after shocking the matching technology, where the first (columns1 and 3) holds the wage fixed at its baseline level and the second allows it to adjust.
Overall, by biasing our parameter β toward zero, our results are a lower bound on the
gains at-scale. Figure 15 shows that the same results on the many-to-one relationship
between the ATE and at-scale gains holds.
Figure 15: Range of Aggregate Gains for each ATE
(a) στ = 0.3
0 0.1 0.2 0.3 0.4 0.5 0.6 0.70
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
(b) στ = 1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.70
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Figure notes: Shaded area is all possible aggregate gains for (β, ρ) ∈ [0.20, 0.95]× [0.20, 0.95], where θ is chosen to matchthe ATE on the horizontal axis. The baseline estimate is given by the dashed line.
65
F Additional Results: Using the RCT of Cai and Szeidl (2018)
F.1 RCT Details
In an innovative recent paper, Cai and Szeidl (2018) (CS hereafter) conduct an RCT
among 2,820 Chinese firms. Treated firms (1,500 of 2,820) are randomly placed into
a group of approximately 10 other firms, then compared to a control group with no
prearranged meetings.
While similar in style to the RCT used in the main body of the paper, there are a
number of differences. First, these are group meetings instead of individual meetings.
Second, the meetings are substantially more intense: firms are expected to meet monthly
for a year.44 Third, these firms are larger. At baseline the average firm has 35 workers.
Finally, they cross-randomize information about new financial products to track the
diffusion of information directly.
Firms are surveyed 3 times: before the treatment, 1 year post-treatment (i.e., at the
end of the treatment period), and 2 years post-treatment. We refer to these as t = 0, 1, 2.
We estimate our procedure off the t = 1 data and later ask whether we can match the
effect at t = 2.
F.2 Overview of Empirical Results
While we will not do justice to the full set of results provided in CS, we provide a brief
overview here to relate them back to our baseline RCT in the main text and provide
context for modeling decisions. Like our baseline results, CS find important effects on
firm performance. In a regression of the form
yit = λ0 +
≡ time effects︷ ︸︸ ︷λ1 · 1t=1 + λ2 · 1t=2 +
≡ per-period ATE︷ ︸︸ ︷λ3 · (1t=1 × Tit) + λ4 · (1t=2 × Tit) +FirmFEi + εit
they find positive and statistically significant effects at t = 1 (i.e., λ3 > 0) for sales, profit,
employment, and productivity. These meetings do indeed seem to leave firms better off.
Unlike our baseline RCT, however, these effects persist at t = 2, a full year after the
treatment concludes (i.e. λ4 > 0).45
A number of additional results provide context for these firm performance effects.
First, they find persistent changes in management practices.46 Thus, direct components
of firm-level productivity increase. Second, they cross-randomize information on a new44Take-up is still high, with average attendance of 87 percent.45Productivity here is measured as firm-level TFP from estimating a revenue production function among control firms.
This is the only outcome of those listed in which the point estimate at t = 2 is statistically insignificant, but it is stillpositive. We read less into this result, as it is likely the most noisily estimated.
46These include an overall management index, along with components on the evaluation and communication of employeeperformance, the setting of targets, process documentation, and delegation of power.
66
financial product to test whether information is indeed flowing between firms within the
group, and find that it does. We (and CS) take this as a direct measure that information
is flowing between firms in the group.
Finally, CS provide one measure of group-level heterogeneity, asking whether treated
firms who have larger average group members enjoy a larger treatment effect. Denoting
nmi as the average firm size of firm i’s matches, they run
yit = λ0 + λ1 log(nmi ) + εit for i in the treatment group. (F.1)
They find statistically significant increases in sales and profit. CS use (F.1) as an “internal
consistency check,” in the sense that most reasonable theories would predict λ1 > 0. As
is hopefully clear at this point, this type of regression has an additional role: it is a
critical test for extrapolating these results to at-scale implications. We will exploit this
regression below in our estimation.
F.3 Model
The empirical setting and results motivate our model structure. Because these are larger
firms, we we study this RCT in the context of a more classic Hopenhayn (1992) style
model. Given the results on productivity and management practices, we allow diffusion
of firm productivity directly, instead of the ability to seek out suppliers. This assumption
is more in line with the existing growth literature (Lucas, 2009; Perla and Tonetti, 2014,
and many others). Finally, we construct our learning process to conform to the available
empirical results, which we discuss more below.
Basics: Production and Households The model period is one year (the length of the
exogenous matches in the RCT). There is a measure one of firms, each of which produces
according to the production function yt = z1−α−ηt nαt k
ηt , where zt is firm-level productivity,
nt is labor, and kt is capital. Capital and labor are traded on a competitive spot markets
with prices rt and wt. Each period δ firms exit and are replaced by δ new firms, who
draw initial productivity from z ∼ G(z).47 As in the main text, firms draw idiosyncratic
shocks ε ∼ F and imitation shocks z ∼ M .
A representative household with flow utility u(Ct) provides labor and capital, and
47In the more classic Hopenhayn (1992) or Melitz (2003) sense, the model can be easily extended to include endogenousfirm entry/exit by introducing fixed costs, as opposed to our assumed exogenous entry/exit margin. Adding this additionalmargin does not change the results we focus on here and we therefore exclude it for simplicity. This assumption also doesnot affect identification. Our procedure relies only on firms in operation at a given point in time, so that the details ofpast entry are immaterial.
67
owns all firms. Its utility is given by:
max{Ct,Kt+1}≥0
∞∑t=0
(1− δ)tu(Ct)
s.t. Ct +Kt+1 − (1− λ)Kt = wt + rtKt + Πt
K0 given
where Πt is the aggregate profits from all operating firms and λ is depreciation rate of
capital. We assume the household discounts at the firm exit rate for simplicity.
The aggregate state of the economy is the distribution of firm-level productivity,
M(z), and the aggregate capital stock, K.
Learning and Diffusion Given the high cost-effectiveness of the program, CS posit a
number of possibilities for why firms did not self-organize these meetings. These include
search costs, trust barriers and lack of familiarity with other managers, and a free-rider
problem in which managers expect others to pay the cost of organization. Motivated
by these “missing” meetings, we set up a source distribution in which a firm receives no
meetings with probability 1− θ. With probability θ, the firm joins a group of exogenous
size N . These N firms are uniformly random draws from the equilibrium productivity
distribution M . We denote z1, . . . , zN to be the productivities of the N matched firms.
We next define a match z in this context, which in general can take any function over
the characteristics of these N firms. Here we are constrained by the results reported in
CS, who report how the treatment effect varies by only the average size of the N firms.
As such, we assume that z =∑N
i=1 zi/N so that we can use their provided moment.
If a firm does not receive a match, it gets z = 0.48 We can therefore write the source
distribution as
M(z) = 1− θ + θQ(z),
where Q(·) is the N -draw sampling distribution of the mean derived from the equilibrium
productivity distribution M .
Finally, we note that within treated firms, there is no differential effect between firms
with z > z or z < z. Therefore, we remove the max operator and assume that the law of
motion takes the form
zt+1 = ec+εtzρt
(1 +
ztzt
)β. (F.2)
Equation (F.2) shows that if the firm does not interact that period (Pr = 1 − θ), it
48This is of course not a critique of the extremely useful publicly-provided dataset of CS. We note this only to highlightthat there is no theoretical benefit to making this assumption, and we do so because it is the only estimating moment ofgroup-level heterogeneity available in their public data.
68
gains nothing from its zt = 0 draw. Conditional on meeting (Pr = θ), however, there
will be gains from doing so. Those gains are increasing in the average productivity of
matched firms.
Taken together, the firm’s problem can be written as (with the aggregate state sup-
pressed, as we will focus on a stationary equilibrium):
v(z) = maxn,k≥0
z1−α−ηnαkη − wn− rk + (1− δ)∫ε
∫z
v(z′(z, ε; z))M(dz,M)dF (ε)
s.t. z′(z, ε; z) = ec+εzρ(
1 +ztzt
)βEquilibrium The stationary equilibrium of this economy is an invariant distribution
M∗(z), household decision rules C, K ′, firm decision rules n, k, and value function v
such that the household’s and firms’ value function solves their respective problems with
the associated decision rules, markets clear
1. labor market:∫zn(z)dM∗(z) = 1
2. capital market:∫zk(z)dM∗(z) = K
3. consumption market:∫zz1−α−ηnαkηdM∗(z) = C + λK
and the relevant aggregates are consistent
1. profit:∫zπ(z)dM∗(z) = Π
2. law of motion for ability:
M ′ := Λ(M(z′))
= δG(z′) + (1− δ)∫ ∞
0
∫ ∞0
F (log(z′)− ρ log(z) + β log(1 + z/z)− c)dM(z)dM(z)
with M∗(z) = Λ(M∗(z)).
F.4 Estimating Diffusion Parameters
We begin by using our two-step procedure to estimate the key diffusion parameters
(β, ρ, θ).
The production function here generates the usual Cobb-Douglas result that inputs
labor and capital, sales, and profit are all proportional to z. Therefore, we are free to
use any of these moments for identification (see also Appendix Section C.2 for a further
generalization).
69
Averages are also proportional to z. If a firm matches with N firms of size n1, . . . , nN ,
then we can infer the average productivity as
n :=
∑Nj=1 nj
N=(αw
) 1−η1−α−η
(ηr
) η1−α−η
∑Nj=1 zj
N∝ z. (F.3)
Given these results, we focus on firm size here for our estimation. As discussed above,
it is the only moment available in the public CS data for the first step of our estimation
procedure. Rather than introduce a second dependent variable, we use firm size in the
second step as well.
Following our procedure in the text, we will estimate the diffusion parameters using
only t = 0, 1 data then later check whether the time series matches t = 2 outcomes.
Thus, the first step of our procedure is to estimate
log(n′i) = c+ ρ log(ni) + β log
(1 +
nini
)+ ε for all i in treatment (F.4)
where ni is the average size of matched firms as defined in (F.3). Given (ρ, β) we then
estimate θ with the same procedure as the main text.
minθ
abs
(E[n′T ]
E[n′C ]−∫ ∫
πρ (1 + n/n)β dHT (n)dHT (n)∫ ∫nρ (1 + n/n)β dM(n; θ)dHC(n)
)(F.5)
where HC , HT are the empirical baseline distributions of control and treatment firms, M
is the source distribution for control firms, and HT is the source distribution for treated
firms (given exogenously by the empirical implementation). We measure the empirical
ratio in (F.5) as the average treatment effect
log(n′i) = λ0 + λ1Ti + Controlsi + νi (F.6)
where Ti = 1 if i is in the treatment group. Table 10 provides the regression estimates
for (F.4) and (F.6).
Column (1) show that the average firm size is indeed a good predictor of treatment
effect magnitude. The ATE in column (2) then implies θ = 0.199. That is, the model
infers that it is quite unlikely that such a CS-style group would be created without the
intervention.
F.5 Remaining Calibration
We calibrate the remaining model to target parameters similar to the main text, using
CS data and other relevant moments. We choose 5 parameters that match directly to
70
Table 10: Identification Moments
(1) (2)
β 0.276(0.053)***
ρ 0.955(0.028)***
Treatment 0.076(0.044)*
R2 0.764 0.395Baseline Control Avg – 2.694
Table notes: Standard errors are in parentheses. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.
their moments. These include a 9 percent firm death rate (δ = 0.09) to match exit rates
in China (e.g. Lu, 2021). The standard deviation of new entrant ability matches the
standard deviation of log profit for firms that have been open for less than 1 year, which
implies σ0 = 1.23. The depreciation rate is set to a standard value of λ = 0.06. Finally,
we have the Cobb-Douglas exponents on capital η and labor α. We set η = 0.20 to match
the median firm’s baseline capital-output ratio (rk)/y = 0.20 then set α to match the
median firm’s profit-sales ratio, π/y = 0.12. This implies α = 0.68. Finally, we set the
standard deviation of the exogenous ability shock to σ = 0.45 and the productivity drift
c = −1.11. These two parameters jointly match the standard deviation of log profit in
the economy and the ratio of average profit of all firms relative to those with less than 1
year of operation. Parameters and moments are in Table 11.
71
Table 11: Targets and Parameter Choices
Model Parameter Description Parameter Target Moment Source Target Model
Value Value Value
Group 1 From RCT
β Intensity of diffusion 0.276 Estimated parameter from regression RCT results 0.276 0.276
ρ Persistence of ability 0.955 Estimated parameter from regression RCT results 0.955 0.955
θ Match technology “quality” 0.199 Treatment effect at t = 1 RCT results 0.076 0.076
Group 2 Matched one-to-one with parameter
δ Death rate of firms 0.09 Average exit rate in China Literature (Lu, 2021) 34 34
σ0 St. dev. of new entrant ability distribution 1.23 Variance of log profit among new entrants CS Baseline 1.23 1.23
α Cobb-Douglas share, n 0.68 Median firm π/y CS Baseline 0.12 0.12
η Cobb-Douglas share, k 0.20 Median firm (rk)/y CS Baseline 0.20 0.20
λ Depreciation rate 0.06 Literature – – –
Group 3 Jointly targeted
σ St. dev. of exogenous ability shock distribution 0.45 Standard deviation of log profit in all firms CS Baseline 1.34 1.34
c Growth factor in ability evolution -1.11 Ratio of average profit of all firms to new entrants CS Baseline 1.12 1.12
Table notes: Group 1 is jointly chosen from the experimental data. Group 2 are also set to match baseline data moments, but match 1-1 with target moments. Parameters inGroup 3 are calibrated to jointly match moments.
72
F.6 Treatment Effect Persistence
In Section 5.3, we showed that the treatment effect persistence is increasing in ρ and de-
creasing in β. In our baseline Kenyan RCT we estimate (β, ρ) = (0.538, 0.595). Here, we
estimate (0.276, 0.955). Our estimates of both β and ρ predict more persistent treatment
effects than our baseline model.
We compare this to the empirics in Figure 16. We plot 3 time paths: the empirical
treatment effect (which is available for 2 years post-treatment), the estimated model
effect, and the estimated model effect when we assume our Kenyan RCT values for β
and ρ. We extend the latter two series for 5 years to trace the dynamics over a longer
horizon.
Figure 16: Dynamics of Average Treatment Effect (Firm Size)
0 1 2 3 4 5-0.1
0
0.1
0.2
0.3DataModelKenyan RCT params
Figure notes: Dashed line is the data with 95 percent confidence interval, for which data is available at t = 0, 1, 2. Solidline is the estimated model (β, ρ, θ) = (0.276, 0.955, 0.199). For comparison, we include the results at our Kenyan RCTvalues of β and ρ, re-estimating θ to hit the t = 1 ATE, which implies (β, ρ, θ) = (0.538, 0.595, 0.496). We extend themodel-derived RCT for 5 post-treatment years to study fade-out.
The model is consistent with the persistent gains.49 Even 5 years post-treatment, the
model predicts that 89 percent of the initial benefits remain. In comparison, if we replace
β and ρ by our estimated values in Kenya, the fade-out is more substantial and nearly
complete by t = 4. The results highlight the importance of estimating these parameters
in governing the time series of the treatment effect.
F.7 Quantitative Gains at Scale
We conduct the same exercises as the main text. To measure the aggregate implications,
we permanently shock the matching technology to increase average match quality, in
line with the extensive margin focus of the RCT results. We do so by shocking the
49The slight increase observed in the treatment effect from t = 1 to t = 2 is statistically insignificant by any reasonablecutoff.
73
parameter θ so that it is 25 percent closer to its limit of θ = 1. We study the new
stationary equilibrium and compare it to the baseline equilibrium. Aggregate moments
are reported in Table 12. We present two steady states, both of which operate under the
new matching function. The first (in column 2) fixes the wage at its baseline level. The
second (in column 3) allows the wage to adjust as well.
Table 12: Equilibrium Moments
(1) (2) (3)
Baseline Fixed Wage At-Scale
Income 1.00 1.13 1.05
Ability 1.00 1.39 1.39
Aggregate Labor Supply 1.00 1.00 1.00
Wage 1.00 1.00 1.05
Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3)report the new stationary equilibrium after shocking the matching technology,where (2) holds the wage fixed at its baseline level and (3) allows it to adjust.
The direct effect of making it easier to learn from high ability agents is that average
ability rises by 39 percent and income by 13 percent. Some of that is eliminated by
the higher general equilibrium wage, with the net effect of a 5 percent increase in total
household income.
We next ask the importance of measuring treatment effect heterogeneity, as in the
main text. We vary (β, ρ) ∈ [0.20, 0.95] × [0.20, 0.95]. For each, we continually update
θ to match a given average treatment effect. We vary the ATE from 0 to 18 percent,
which traces out the range of possible aggregate outcomes by ATE.50 Those results are
in Figure 17. Similar results emerge as in the main text – the set of possible aggregate
outcomes for a given treatment effect can be large.
50The set of feasible ATEs for the range of (β, ρ) we consider is smaller here than in the main text, as our diffusionprocess requires θ ∈ [0, 1]. See the discussion surrounding Proposition 2 for more details on this constraint.
74
Figure 17: Range of Aggregate Gains for each ATE (Firm Size)
0 0.03 0.06 0.09 0.12 0.15 0.180
0.03
0.06
0.09
0.12
0.15
0.18
Figure notes: Shaded area is all possible aggregate gains for (β, ρ) ∈ [0.20, 0.95]× [0.20, 0.95], where θ is chosen to matchthe ATE on the horizontal axis. The baseline estimate at (β, ρ) = (0.276, 0.955) is given by the dashed line.
75
G Proofs
G.1 Proof of Proposition 1
Proof. Start from the law of motion defined in Assumption 1. In logs, and applying
Assumption 2 (π ∝ z) annd Assumption 4 (z > z), this is
log(π′i) = c+ ρ log(πi) + β log
(max
{1,πiπi
})+ εi, (G.1)
where πi and πi are baseline profit for individual i in the treatment group and her match,
while c is a constant equal to the structural parameter c if π = z. Since matches are
observable within the treatment, (G.1) is a simple panel regression with coefficients β and
ρ. Since matches are randomized, the parameters are identifiable. Thus, that βOLS and
ρOLS are equal to their structural counterparts follows directly from the within-treatment
exclusion restriction of Assumption 4. �
G.2 Proof of Proposition 2
Proof. Define zT and zC as average ex post ability the treatment and control groups.
The law of motion for ability (Assumption 1), combined with the implied variation in
matches (Assumption 4), implies
zTzC
=
∫ ∫ ∫ec+ε max
[z, zβz1−β]ρ dF (ε)dHT (z, z)dHT (z)∫ ∫ ∫
ec+ε max [z, zβz1−β]ρ dF (ε)dM(z; z, θ)dHC(z)
Using the orthogonality of the exogenous shocks ε (Assumption 4), we can re-write the
equation as
zTzC
=
∫ ∫zρ max [1, z/z]β dHT (z)dHT (z)∫ ∫
zρ max [1, z/z]β dM(z; z, θ)dHC(z).
or, writing it in terms of profit via Assumption 2,
Γ ≡ E[π′T ]
E[π′C ]=
∫ ∫πρ max [1, π/π]β dHT (π)dHT (π)∫ ∫
πρ max [1, π/π]β dM(π; π, θ)dHC(π). (G.2)
First, note that the lefthand side of (G.2) is simply the change in ex post average
profit between treatment and control. This is observable in the data. Second, the only
only unknown on the right hand side of (G.2) is θ. This follows from Assumption 2 and
Proposition 1. All that is left is to show that there exists a unique θ that solves (G.2).
The right hand side is continuous in θ by the continuity of M in Assumption 3. The
intermediate value theorem then guarantees existence when Γ ∈ [Γmin,Γmax]. Finally,
76
uniqueness follows from the strict monotonicity of the right hand side in θ, which is
guaranteed by the assumed first order stochastic dominance in Assumption 3. �
G.3 Proof of Proposition 3
We begin by detailing the underlying arithmetic of the model, then use those results to
prove Proposition 3 at the end of this section.
Solving the Bargaining Problem as a Function of Supplier Marginal Cost m Solving the
optimal input choices for the firm implies profit is
πf (c, w) =
(α
px
) α1−η−α ( η
w
) η1−η−α
(1− α− η). (G.3)
A supplier has profit function πs = (px − m)x where m is its given marginal cost.
Given that they take as given the firm’s decision on inputs, this implies we can write this
profit as
πs(px,m) =
(α
px
) 1−η1−η−α ( η
w
) η1−η−α
(px −m).
Re-writing the bargaining problem π(px)νπs(px,m)1−ν taking these derivations into ac-
count yields
maxpx
( ηw
) η1−η−α
(1− α− η)να1−η−ν(1−η−α)
1−η−α cν(1−η−α)+η−1
1−η−α (px −m)1−ν
Log differentiating gives the solution
px =
(1− η − ν(1− η − α)
α
)m (G.4)
Replacing px in the firm’s profit function (G.3) with the value from (G.4) yields
π =
(α
1− η − ν(1− η − α)
) α1−η−α
(1− η − α)αα
1−η−α
( ηw
) η1−η−α
m−α
1−η−α (G.5)
Optimal Search Intensity Now that we have profit as a function of the supplier’s marginal
cost m in (G.5), we need to solve for search intensity s. Recall that m = exp(−s)z α+η−1α .
Plugging this into (G.5),
πf (m) =
(α
1− η − ν(1− η − α)
) α1−η−α
(1− η − α)αα
1−η−α
( ηw
) η1−η−α
exp(−s)−α
1−η−α z
The relevant part of the firm’s utility function for this problem is the static utility flow
77
ω log(π) + (1− ω) log(1− s). Plugging in π and solving for s yields
s = 1−(
1− ωω
)(1− η − α
α
)(G.6)
Occupational Choice Given the decision rules derived above, the model therefore has a
cutoff rule for occupational choice. To see this, note that because the continuation values
between workers and entrepreneurs are identical, we can focus on the flow utility payoff.
After a bit of algebra, these are
uf (w) = ω log(C1) + (1− ω) log(1− s)− ηω
1− η − αlog(w) + ω log(z)
with constants
C1 =
(α
1− η − ν(1− η − α)
) α1−η−α
(1− η − α)αα
1−η−αηη
1−η−α exp
((1− ωω
)(1− η − α
α
)− 1
) −α1−η−α
s = 1−(
1− ωω
)(1− η − α
α
)More simply for workers, flow utility is uw(w) = ω log(w). Firm operation is preferred
when uf (w) ≥ uw(w), which implies a cut-off z
z = w1−α
1−η−α exp
(− log(C1)− 1− ω
ωlog
[(1− ωω
)(1− η − α
α
)])
Thus, the cutoff has the feature that log(z) ∝ log(w), where w is the equilibrium wage.
Proof of Proposition 3 With these results, Proposition 3 follows quickly.
Proof. Plugging (G.6) into the profit function yields
π =
(α
1− η − ν(1− η − α)
) α1−η−α
(1− η − α)αα
1−η−α
( ηw
) η1−η−α
× exp
((1− ωω
)(1− η − α
α
)− 1
) −α1−η−α
z.
Thus, we have π = A(w)z in equilibrium, where A depends on parameters and the equi-
librium wage w. Replacing m = exp(−s)z α+η−1α in the equilibrium input price function
(G.4) gives
px =
(1− η − ν(1− η − α)
α
)exp
((1− ωω
)(1− η − α
α
)− 1
)zα+η−1α
as required. �
78
G.4 Proof of Proposition 7 (from Social Planner’s Problem)
Proof. Since these are static decisions, they solve the simplified static problem
maxc,s,x,n
ω
∫ ∞0
log(c(z))dM(z) + (1− ω)
∫ ∞z
log(1− s(z))dM(z)
s.t.
∫ ∞z
x(z)αn(z)ηdM(z)−(
1− η − ν(1− η − α)
α
)∫ ∞z
exp(−s(z))zα+η−1α x(z)dM(z) =
∫ ∞0
c(z)dM(z)∫ ∞z
n(z)dM(z) =
∫ z
0
dM(z) ≡ Ns
M(z) given
Note that for simplicity here, we have already imposed a cut-off value for z, z. Much
like the laissez faire equilibrium, it is straightforward to show that the planner also
chooses to set occupations this way.
The first piece to note is that, given the separability of utility in c and s, the planner
allocates a constant level of consumption c(z) = c. Thus, we just need to determine total
resources in the economy to determine consumption. Solving the optimal input choices
x(z) and n(z) collapses the simplified static planner’s problem to
maxs(·)≥0
ω log(c) + (1− ω)
∫ ∞z
log(1− s(z))dM(z)
s.t.
(α2
1− η − ν(1− η − α)
) α1−α
(1− α)Nη
1−αs
(∫ ∞z
z exp(s(z))α
1−α−η dM(z)
) 1−α−η1−α
= c
M(z) given
The first order condition for this problem is
(1− ω)m(z)
1− s(z)= λC
(∫ ∞z
z exp(s(z))α
1−α−ηm(z)dz
) −η1−α((
α
1− η − α
)z exp(s(z))
α1−α−ηm(z)
)−λ2
where λ is the Lagrange multiplier on the resource constraint, λ2 is the multiplier on
the non-negativity constraint s ≥ 0, and C is the constant in front of the integral of the
resource constraint. For any z1 and z2 with interior solutions (λ2 = 0), we have
1− s(z2)
1− s(z1)=
(z1
z2
)(exp(s(z1))
exp(s(z2))
) α1−α−η
(G.7)
Define for ease of notation q(z2/z1, s(z1)) =(z1z2
)exp(s(z1))
α1−α−η (1 − s(z1)) and the
transformation −t =(
αα+η−1
)s(z2)−
(α
α+η−1
). After some algebra, we can rewrite (G.7)
79
as (qα
η + α− 1
)exp
(α
η + α− 1
)= t exp(t)
The solution to this problem is given by the principal branch of the Lambert W function,
t = W0
[(α
η + α− 1
)exp
(α
η + α− 1
)q
].
Undoing the transformation and setting z1 = z, if the economy is at an interior solution
to s for all z ≥ z (which we verify at our estimated parameters), we can write this as
s(z2) =
(1− α− η
α
)W0
[(α
η + α− 1
)exp
(α
η + α− 1
)q(z2/z, s(z))
]+ 1,
Since q is decreasing in z2 and η + α < 1, that s(·) is increasing follows from the fact
that that W0 is an increasing function. Concavity similarly follows from properties of
W0. Since W0 is increasing and x exp(x) is convex, its inverse W0 is concave.
�
If there is a corner solution, this analysis remains nearly identical, except one would
instead be required to solve for z = argminz λ2(z) = 0 instead of s(z). That is, the
z at which supplier search intensity becomes positive. While this is not at issue at our
estimated parameters, we of course take this into consideration when counterfactually
varying parameters to study how the quantitative implications change.
80