From Micro to Macro in an Equilibrium Di usion Model

Shocking Firm-Level Interactions:

From Micro Intervention to Aggregate Implications*

Wyatt BrooksArizona State University

Kevin DonovanYale University

Terence R. JohnsonUniversity of Virginia

October 2021

Abstract

Facilitating interactions that transmit skills or information can spur large in-creases in firm-level profitability, as evidenced by a growing set of interventionsgenerating inter-firm interaction in developing countries. How informative arethese micro-level returns about the aggregate gains that could be achieved byscaling the same intervention? We study this question in a model that linksfirm-level interactions to aggregate outcomes via diffusion. Two key diffusionchannels differently affect micro and macro elasticities, implying the observedtreatment effect is generally unrelated to aggregate gains at scale. We providea new procedure by which the micro data identifies the parameters governingthese elasticities in aggregate diffusion models and disciplines their relative im-portance. The aggregate gains are primarily related to treatment effect hetero-geneity, as it closely corresponds to the main channels in the aggregate model.The results provide guidance on moving from micro to aggregate implications:quantitatively, the bias from not utilizing this moment can be substantial.

JEL Classification Codes: O11, O33, O40

Keywords: diffusion, general equilibrium, ability

*This paper has benefited from comments and discussions by participants at the NBER Growth Meeting, the Societyfor Economic Dynamics, Y-RISE, and various seminars, especially those from Daron Acemoglu, Lorenzo Caliendo, TeresaFort, Mushfiq Mobarak, Melanie Morten, Ezra Oberfield, Jesse Perla, Tommaso Porzio, Peter Schott, Kevin Song, Mered-ith Startz, Adam Szeidl, Duncan Thomas, Mike Waugh, and Daniel Xu. Brian Mukhaya provided outstanding researchassistance. Contact Information: Brooks, [email protected]; Donovan, [email protected]; Johnson, [email protected]

mailto:[email protected]




1 Introduction

A substantial amount of economic know-how is embodied in other agents. The prof-

itability of a firm, for example, depends in part on skills, management practices, and

other innovations gleaned from, or originally discovered by, other firms. However, fric-

tions limiting the inter-firm interactions necessary to extract this information may limit

firm growth in developing countries. A growing body of evidence supports this view.

Recent interventions facilitating these interactions find large changes in firm-level profit,

technology adoption, and management practices.1

These promising micro-level effects put this class of policies at the center of a funda-

mental question of economic growth: whether we can develop at-scale policy that combats

the low managerial skill (Bloom and Van Reenen, 2007; McKenzie and Woodruff, 2017)

and technology adoption (Comin and Hobijn, 2010) in developing countries, both of

which contribute to the substantial productivity and income gaps between rich and poor

countries (Klenow and Rodrıguez-Clare, 1997; Prescott, 1998). Indeed, more than US$1

billion per year is spent attempting to correct skill deficiencies and the corresponding low

profit that plague entrepreneurs in developing countries (Blattman and Ralston, 2017).

Unfortunately, we have little information about what the empirical evidence derived

from these interventions teaches us about the gains that could be achieved at scale. Does

a large treatment effect from an intervention that encourages inter-firm interaction imply

large gains from scaling it? And if not, how do we use the evidence to inform our ultimate

goal of inferring at-scale implications? We study these questions here, joining a small

body of recent work studying the potential differences between micro and macro effects

of various policies (e.g., Berguist et al., 2019; Greenwood et al., 2019; Buera et al., 2021b;

Fried and Lagakos, 2021; Fujimoto et al., 2021). We ask how the micro-level elasticities of

better matching relate to the aggregate-level elasticities that are relevant when scaling the

same intervention. We use the results to investigate the proper interpretation of reduced-

form results from this class of interventions when attempting to inform at-scale policy, and

whether extrapolating directly from treatment effects generates a quantitatively relevant

bias at scale.

We answer these questions in three steps. First, within a broad class of aggregate

models that take seriously firm-level interactions, we clarify how the average treatment

effect is informed by various economic forces that play different roles at the micro and

1For example, Cai and Szeidl (2018) create randomly-formed business groups in China and Brooks et al. (2018) createrandom one-to-one matches among Kenyan microenterprise owners. Relatedly, Atkin et al. (2017) introduce randomvariation in buyer-seller links in Egypt, Beaman et al. (2020) randomly seed information on new technology in Malawi, andFafchamps and Quinn (2018) introduce smaller-scale firm owners to managers of high-profit firms in Ethiopia, Tanzania,and Zambia during a business plan competition. Broadly, all of these are designed to test the importance of giving firmsaccess to new information embedded in other economic agents.

1

macro levels. Directly extrapolating the smaller-scale intervention results to at-scale

gains is therefore infeasible, except in knife-edge cases. Second, we provide a method by

which the micro data can be used to separately identify the parameters governing these

channels in this same class of models, thus solving the first problem. Finally, we show

quantitatively that the bias induced by not doing so – for example, by directly extrapo-

lating the average treatment effect – can be large. Throughout, our results highlight the

importance of measuring treatment effect heterogeneity when extrapolating to at-scale

implications, as it closely corresponds to important theoretical channels that generate

the at-scale gains.

We study these issues in a general equilibrium framework that takes seriously the

role of individual-level interactions, building off a body of work recently reviewed in

Buera and Lucas (2018). Agents interact and exchange “ideas” (or some related profit-

enhancing notion), with higher-quality interactions generating more profit. This micro-

foundation shares a tight intuition with the interactions induced by the aforementioned

micro literature, and we will think of these interventions as a shock to this process. But

aggregating the impact of these interactions must take seriously that they do not exist in

a vacuum. Today’s interactions affect the distribution of ideas in the economy, which has

implications for future interactions and thus future income. This equilibrium diffusion

process has long been emphasized in the development process and we allow for it here.2

Formalizing it in the model links individual-level interactions with aggregate outcomes.3

We use this framework to study the relationship between the average treatment effect

(ATE) from a particular type of intervention and the gains that could be achieved by

scaling it. Our intervention of interest encompasses those discussed above, in which eco-

nomic agents get access to other firms or firm-embodied ideas that they would otherwise

be unlikely to encounter. The ATE derived from this intervention is governed by two

forces that we focus on throughout this paper. First, it depends on the return to a match:

how much the exogenously-provided match changes a treated firm’s profit. But this is

insufficient, as untreated firms still interact with others via the usual, non-intervention

matching process. Therefore, the ATE also depends on the extensive margin: how likely

that same match would have been without the intervention.4 These two forces affect the

ATE in opposite ways. The first affects the ATE positively, as the benefit of an exogenous

2See Munshi (2007) for a review of diffusion in agricultural technology, health, and education, along with Alatas et al.(2016) and others on recent implications for policy. Romer (1990) highlights the importance of knowledge spillovers forlong-run growth.

3Early work on the topic includes Jovanovic and Rob (1989) and is extended to endogenous growth in Lucas (2009),Lucas and Moll (2014), and Perla and Tonetti (2014). These models have been further extended to include internationaltrade (Sampson, 2016; Buera and Oberfield, 2020; Perla et al., 2021; Van Patten, 2020), innovation policy (Benhabibet al., 2020; Lashkari, 2020), frictional labor markets (Engbom, 2020), and various types of learning among co-workers(Herkenhoff et al., 2018; Jarosch et al., 2020; Wallskog, 2021).

4Said slightly more technically, these represent the two “differences” in the difference-in-difference regression requiredto estimate the treatment effect.

2

high-quality match naturally increases when the return to a match is higher. The second

affects the ATE negatively. If finding good matches is already easy, exogenously offering

one provides little value to a treated firm.

The divergent implications of these two channels pose a hurdle in moving from mi-

cro to at-scale gains, in the form of an under-identification problem. Specifically, any

observed ATE can be rationalized by a continuum of economies defined by different diffu-

sion parameters. These economies differ only in the relative importance of the two forces

highlighted above. To the extent these same forces play different roles at scale, any bias

in the parameter estimates (by, for example, attributing the ATE to the wrong margin)

will bias the implied aggregate gains. This gives us our first, and somewhat negative,

result. There will be a (potentially large) set of aggregate gains for any treatment effect.

The average treatment effect therefore gives us little direct information about the gains

from scaling the same intervention in this class of models. Informing the at-scale gains

requires disciplining the relative importance of these two channels.

We therefore provide a method to separately identify the parameters governing these

two channels. While the difficulty in estimating diffusion parameters in aggregate models

is well known, the intervention’s empirical moments offer a powerful and straightforward

tool to do so. It relies on two simple moments: the average treatment effect and a

measure of heterogeneity in the treatment effect. The first step measures the return to

a match. At its heart is the fact that exogenously engineering random matches creates

observable variation in treatment intensity. Some treated firms meet relatively high

quality matches, others lower quality. We estimate the return to a match by measuring

how a marginally higher quality match affects an otherwise similar firm. Implementing

it requires a regression similar to reduced-form tests of treatment effect heterogeneity in

recent empirical work on this topic (Brooks et al., 2018; Cai and Szeidl, 2018). Once

estimated, the average treatment effect measures the extensive margin. Intuitively, if we

observe a large average effect (conditional on the measured return to a match), we infer

that the intervention matches must be substantially better than those usually afforded

to firms. This two-step procedure disciplines the relative importance of our two main

channels. A useful feature is that it leverages the exclusion restrictions built into the

empirics, and thus holds in a broad class of models without ancillary assumptions on the

shape of distributions, details of entry and exit, and other features usually employed to

estimate or calibrate these parameters.5

5As it contrasts with existing work estimating these parameters in aggregate models, our results can be thought of asshifting the burden of identification toward the data generating process and away from the structural economic environment.One extreme is calibration-style exercises (examples provided in Buera and Lucas, 2018, such as Caicedo et al. 2019). Theyuse relatively aggregated or cross-sectional data, relying on model structure to infer diffusion parameters (though of coursetoward different goals than we have here). We lie on the other extreme. We use the structure provided by the RCT-styleinterventions to limit required assumptions on the economic environment. Between the two is work utilizing more detailed,

3

Finally we study quantitative implications, asking whether the potential theoretical

biases highlighted earlier have any practical relevance. Doing so requires us to close to

the model and study a particular intervention, as quantifying these relationships requires

us to be explicit about the remaining economic environment. We therefore focus on

the context of a randomized controlled trial in Kenya, in which we introduce one-to-one

matches between low- and high-profit firm owners (Brooks et al., 2018). We construct the

remaining details of our model to remain faithful to the economic environment studied,

facilitated by the flexibility of the diffusion parameter identification. We model diffusion

as the knowledge of lower-cost suppliers, as our empirics point to this as a key channel by

which profit increases. We also allow for occupational choice, as our study is primarily

run among small firms for whom exit to wage work is a potential outside option.

Empirically, we find that average profit increases by 19 percent in response to the

intervention. We implement the at-scale counterpart to the RCT by permanently making

it easier for all agents to find high-quality matches. The model predicts that average

income rises by 11 percent. Our observed ATE, however, is actually consistent with a

wide range of aggregate gains. Indeed, when we do not use heterogeneity as an estimating

moment, we can construct a set of economies that all deliver our observed treatment

effect, but whose at-scale gains vary between 0.6 and 38 percent. These economies differ

only in the relative importance of the two channels highlighted above. The wide range of

at-scale gains for a given treatment effect has quantitatively important, policy-relevant

implications. It implies than an 11 percent increase in aggregate income is feasible in

economies delivering nearly any treatment effect. Therefore, the average treatment effect

or its implied cost-effectiveness need not predict the most cost-effective scale-up and,

quantitatively, can be substantially misleading. However, measuring treatment effect

heterogeneity provides a solution by disciplining the relative importance of the two main

channels that deliver the at-scale gains.

The variation in feasible at-scale gains suggests the micro and and macro elasticities

are being driven by different parameters. We show that this is indeed the case, and

provide some rationale for why measuring heterogeneity proves useful here. The result

follows from the fact that our empirical notion of heterogeneity is closely connected to the

main quantitative forces in aggregate diffusion models. In particular, aggregate diffusion

models are primarily governed by the right tail of the equilibrium ability distribution.

Tail weight has a natural relationship to our empirical measure of heterogeneity. Given a

set of matches created by the intervention, this moment measures the marginal increase in

own-ability from a slightly better match. Or, put differently, it tells us how much farther

but non-random, data like matched employer-employee datasets (Herkenhoff et al., 2018; Jarosch et al., 2020). See alsoNakamura and Steinsson (2018) for a broader discussion on this tradeoff in aggregate models.

4

into the tail we can push an agent by slightly increasing her match quality. It is therefore

closely linked with the key channels governing aggregate implications. Measuring it

is critical when attempting to extrapolate to at-scale implications from reduced-form

interventions of the class we study here.

On the other hand, the extensive margin plays a smaller role. If matches involve some

randomness, as has been pointed to empirically (e.g. Banerjee et al., 2020), most firms

will eventually interact with at least one high quality match. This tilts the quantitative

importance of this margin away from long-run aggregates. This result shares an intu-

itive connection to the high value of random seeding in an important class of network

diffusion models (Akbarpour et al., 2020), which we exploit to highlight the importance

of treatment effect heterogeneity in governing aggregate gains.

Motivated by the limited direct role of the ATE we extend our results to another

commonly used reduced-form moment, treatment effect persistence. It highlights one

of more interesting features of our empirical results: three quarters after the treatment,

there is no discernible effect remaining. One view of this result is that the intervention is

ineffective (e.g. McKenzie et al., 2020). Does it follow that the lack of long-term benefit

to RCT participants implies small gains at-scale? We use the model to study whether

this type of model-free inference is warranted and find it misleading in a particularly

severe way: the same force that increases the at-scale gains simultaneously hastens the

fade-out of the ATE. Indeed, our model replicates the short half-life of the ATE in

part because it predicts that diffusion is important at-scale. This reinforces our earlier

results. Drawing aggregate policy conclusions from the lack of persistence in this class

of experiments without considering why we observe this outcome potentially leads to

erroneous inferences.

To further this discussion, we replicate our exercise in the setting of Cai and Szeidl

(2018) in the Appendix. They run an innovative RCT creating randomly-formed busi-

ness groups in China, finding persistent changes in sales, employment, and management

practices. Our model predicts the persistence of their effects. The ability to match

these divergent patterns of persistence follows from the fact that our estimation proce-

dure infers different diffusion parameters from different patterns of heterogeneity within

the treatment group. Again, the result reinforces the importance of understanding the

underlying forces that govern observed empirical patterns matters when attempting to

derive scalable policy.

The remainder of the paper proceeds as follows. Section 2 lays out an economic

environment (or, a class of models) in which we will interpret a data generating process

with particular characteristics, the latter of which formalizes the data generated by a

5

class of interventions. We highlight the main issue that arises and our solution to it. We

then turn to a particular application to study quantitative implications. Section 3 details

the Kenyan RCT we use, then applies our diffusion estimation procedure to it. Section

4 builds those empirical results into an equilibrium diffusion model whose quantitative

implications are detailed in Section 5.

2 Facilitating Inter-Firm Interactions: Measurement and Iden-

tification

We begin by formalizing the type of intervention and a class of models within which we

interpret it. We first lay out the assumptions on the economic environment, then describe

the intervention as a data-generating process derived from this environment. We discuss

extensions of various assumptions made here at the end of this section.

2.1 Assumptions on the Economic Environment

Consider a dynamic economy populated by agents (“firms”) with heterogeneous ability

z. Ability here is some notion of managerial capital in the sense of Bruhn et al. (2010).

It makes the firm more profitable but need not be directly equal to a firm’s productivity

(TFPQ in Hsieh and Klenow, 2009).

Time is discrete and, in each period, every agent receives two shocks. First, they

receive an idiosyncratic imitation shock z. If her own ability z is greater than z, then the

imitation opportunity is useless and it has no effect on her future ability. If z > z, then

the imitation opportunity contains some useful information that the agent incorporates

it into her future ability. Second, each agent receive a random shock ε that enters next

period’s ability multiplicatively. The functional form of the subsequent ability z′ is given

by Assumption 1.6

Assumption 1. Given ability z this period, an imitation opportunity z, and a random

shock ε, ability next period z′ is given by

z′(z, ε, z) = ec+εzρ max

{1,z

z

}β, (2.1)

where the parameter c is a constant growth term, β is diffusion intensity, and ρ is per-

sistence.

This assumption parameterizes how a specified match z translates into an agent’s

future ability. The final term is the additional benefit from the imitation opportunity.6Throughout this paper, we will use ′ to denote values one period in the future.

6

If β = 0, this law of motion collapses to an exogenous AR(1) process, log(z′) = c +

ρ log(z) + ε. On the other hand, β > 0 allows ability to increase when matched with

z > z.7 At its extreme, ρ = β = 1 simplifies the law of motion to z′ = ec+ε max{z, z}.We refer to β as the “intensity” of diffusion and ρ as “persistence” throughout. Our goal

is, in part, to let the data tell us where these two parameters fall.

Given the notion of ability we consider here, we cannot observe it directly. Thus, we

require a link between ability and observable variables. The requirement is summarized

in Assumption 2.

Assumption 2. In any period, there exists some observable characteristic, denoted π,

that is proportional to ability. That is, for any two agents i and j with πi and πj,

πi/πj = zi/zj.

We denote this variable π because profit is an intuitive and commonly used measure

in the literature and will be the moment we focus on in our main empirical application.89

Finally, we specify the assumptions on the source distribution from which z is drawn.

We denote the cumulative density function of z as M(z; z, θ). The source distribution

allows agents with different abilities to draw from different distributions. These distri-

butions depend on the parameter θ, which is assumed to order a class of distributions in

the sense of first order stochastic dominance. This is summarized in Assumption 3.

Assumption 3. The imitation opportunity z is drawn by an agent with ability z from

a distribution characterized by the cumulative density function M(z; z, θ), a known func-

tion. For every z and z, M is continuous in θ and θ1 < θ2 =⇒ M(z; z, θ2) first order

stochastically dominates M(z; z, θ1).

This assumption admits a variety of search and assignment processes that we discuss

further below. Critical for our purposes is that θ shifts the distribution in a way that

increases the average match quality. In light of this, a useful way to think of θ is as

measure of the productivity or quality of the underlying matching technology, and we

will do so to fix intuition in the following results.

7Furthermore, notice that the max operator in the diffusion process rules out any productivity benefit accruing to thehigher ability agent in the match. We show later that this is not critical, but does have some empirical support (Brookset al., 2018; Jarosch et al., 2020). Since our empirical results will eventually require this functional form, we maintain itthroughout.

8Setting profit π = z satisfies this assumption (Lucas, 2009; Perla and Tonetti, 2014). Alternatively, a Cobb-Douglasproduction function y = z1−αnα would satisfy the same assumption in a competitive labor market where z is firm-levelproductivity.

9One in which this assumption can fail is if firms are subject to unobserved idiosyncratic variation, whether they bedistortions or measurement error. This can be included in a straightforward way. We discuss this and other generalizationsof these assumptions in Section 2.5 and detail them in the Appendix.

7

2.2 Intervention as a Data-Generating Process

Section 2.1 laid out a set of assumptions on the economic environment. Assumption 4

summarizes the type of intervention we seek to investigate.

Assumption 4. A set of agents with ability distributed H(z) are observed in two con-

secutive periods. The set of agents is partitioned into two subsets characterized by dis-

tributions HC(z) and HT (z) (i.e., “control” and “treatment”). The following conditions

hold:

1. Agents in HT and HC draw their ε shocks from the same distributions

2. The matches for agents in HC are not observable, and distributed M(z; z, θ)

3. The matches for agents in HT are observable, and distributed HT (z) 6= M(z; z, θ)

4. For any arbitrary partition of the treatment group, characterized by H1T (z) and

H2T (z), agents in both groups draw their ε shocks from the same distribution

The first assumption is the usual exclusion restriction. As required in any attempt to

measure causal effects, it puts restrictions on how the unobserved characteristics vary

between control and treatment. The second formalizes the intuitive notion that we

cannot observe individual-level control group matches, but also that matches continue

to happen in the background of the economy. That is, the control group does not stop

meeting other agents because of the intervention.

Finally, the third and fourth lay out what we require from the treatment. The third

states that we can observe all treatment matches, and those matches are drawn from

some other distribution than the control group. Finally, the last assumption states a

second exclusion restriction within the treatment group, guaranteeing that a comparison

between treated firms is unbiased.

2.2.1 Example

It is useful to highlight a concrete example of these assumptions to both fix ideas and

show how they relate to the literature. We take a deliberately simple version using the

discrete time analog of Lucas (2009), with notation adjusted for continuity. He develops

a model in which income π is equal to ability z. Each agent receives α ∈ N uniformly

random draws from the equilibrium ability distribution Mt(z), and keeps the max of

those draws. It is fully internalized if useful, but otherwise can be disregarded. Defining

z := max{z1, . . . , zα}, ability therefore evolves according to zt+1 = max{zt, zt}.

8

Linking to Assumptions 1 – 3 This model falls under our assumptions in a straightfor-

ward way. Assumption 2 (π ∝ z) follows directly from the assumption π = z. We next

define the source distribution M(z; z, θ) in Assumption 3 as

M(a; z, θ) = Prob(max{z1, . . . , zα} ≤ a)

=α∏i=1

Mt(a)

= Mt(a)1

1−θ

where θ = 1 − 1/α indexes the number of draws. A higher θ implies the firm receives

more matches and therefore increases the likelihood that z will be high. This is a specific

interpretation of the FOSD assumption in Assumption 3.10 Finally, the law of motion is

identical to that in Assumption 1 once we assume c = 0, the exogenous shock distribution

F is degenerate at ε = 0, and ρ = β = 1. This gives z′ = max{z, z}.

Intervention in this Model In a developing country, we may think it difficult for firms to

create these matches thereby keeping income low. Said differently, we may hypothesize

that α (or equivalently θ = 1 − 1/α) is low. A researcher could test this hypothesis by

creating an intervention in which she randomly creates groups of N firms, measuring

how profit changes relative to a control group that continues to match according to M .11

Treated firms therefore draw from an exogenously created source distribution HT (z).

If the true value of θ is indeed low, exogenously offering firms this opportunity will

generate a large treatment effect. The inverse is intuitively true as well. Exogenously

offering the opportunity to meet has little value if it is already easy to do so. This

intuition underlies a natural link between the average treatment effect and θ. Moreover,

Lucas (2009) shows that the arrival rate of ideas is critical for determining the long-run

growth path of the economy. Together, they comprise a clear link from the average

treatment effect to aggregate implications through the diffusion parameter θ.

Note, however, that this link rests critically on having already made an assumption

that determines how a given match z translates into future ability z′. Here, we assumed

ex ante that imitation draws translate one-for-one into own ability (ρ = β = 1). To

the extent that these parameters are unknown, perhaps because we want to allow for the

possibility of diminishing returns in imitation gains (e.g. Van Patten, 2020), the preceding10We provide other source distributions that fall under our assumptions in the Appendix. These include letting θ

index the probability of receiving a draw, the noise in the matching process from either difficulty or uncertainty inlearning (Beaman et al., 2020) or because of complementarity between the productivity of the match and some exogenousinnovation (Buera and Oberfield, 2020), the cost of effort choice or bargaining weight between matched agents, a measureof complementarity in deterministic assortative matching, and the cost for firms to seek out a match (Lucas and Moll,2014). All of these differ in their economic interpretations of θ, but can be characterized by the same broad functionalform of the source distribution.

11This is a similar idea to Cai and Szeidl (2018), but is only one example of a potential intervention.

9

intuition is incomplete.

The multiple channels at play highlight the difficulty in directly link the ATE and at-

scale gains. Only by ex ante assuming away the problem is there a one-to-one relationship

between the ATE and remaining diffusion parameters. This concern is not specific to the

exact example used here. We next generalize this to any model defined by Assumptions

1 – 3 and intervention defined by Assumption 4.

2.3 Linking the Treatment Effect to Model Parameters

An intervention of the type described in Assumption 4 allows us to measure the causal

average treatment effect (ATE). In percentage terms, this empirical moment is

ATEdata =E[π′i|i ∈ T]

E[π′i|i ∈ C]

where π′ is the ex post profit of a firm, T is the set of treated firms that draw from HT ,

and C are control firms drawing from M . The model provides its own counterpart to

this ATE. After a bit of algebra, it can be written as

ATEmodel =

∫ ∫πρ max {1, π/π}β dHT (π)dHT (π)∫ ∫

πρ max {1, π/π}β dM(π; π, θ)dHC(π). (2.2)

Conditional on the intervention, there are two ways the model can deliver a given

ATEdata. The first is how we motivated the example above: appealing to the source

distribution. But an alternative is an appeal to β and ρ. If good matches always create

large changes in profit, and the intervention creates more good matches, the treatment

effect will be large. This can be summarized as

∂ATEmodel

∂θ< 0 ,

∂ATEmodel

∂β> 0 ,

∂ATEmodel

∂ρ> 0.

The first implication here is that biasing the parameter choices (e.g., assuming ex ante

that β = ρ = 1 as in the example above) will have little impact on the model’s ability to

match ATEdata. For example, if the realized ATEdata is small, but we counterfactually

assume that there are large gains from a good imitation draw, the model responds by

assuming it must be extremely easy for agents to otherwise access those good draws.

That is, it infers a high θ. More generally, under loose restrictions we formalize in the

next section, we can always find an extensive margin parameter θ to rationalize a given

treatment effect for any values of (β, ρ).

The second implication forecasts our main concern in this paper. If these various

parameters play different aggregate quantitative roles, there need not be any relationship

10

between the ATE and the gains from scaling the intervention. There will be a range of

possible at-scale outcomes for a given ATE, and we show later quantitatively that this

range can be quite broad.

Therefore, our first goal is to separately identify these parameters. We must first

identify (β, ρ). Conditional on those parameters, we can use the intuition developed

above to formalize the link between the ATE and θ. It turns out that this first step can

be accomplished with one additional reduced-form moment that is both straightforward

to interpret and commonly used in applied work.

2.4 Identification

Our procedure works as follows. Using only the treatment data, we show how to identify

β and ρ uniquely. This step identifies the parameters that govern the intensive margin

– the gains derived conditional on a match. The second step lets us then measure the

underlying quality of the matching technology, conditional on the gains from a given

match. That is, we show that θ can be identified from the average treatment effect as a

function of (β, ρ). Proofs of the following results are available in Appendix G.

Using Treatment Data for (β, ρ) Identifying the intensity and persistence parameters use

only variation from treatment matches. This has the benefit that identification follows

almost directly from the law of motion for diffusion, and is summarized in Proposition 1.

Proposition 1. The parameters (β, ρ) are identified by coefficients from the regression

run only on treatment firms

log(π′i) = c+ ρ log(πi) + β log

(max

{1,πiπi

})+ εi, (2.3)

where c is the constant equal to the technological parameter c if π = z.

There are two ways to read the regression (2.3). The first is that β measures the

impact of receiving a better match, after controlling for initial income. An alternative

is that (2.3) estimates a standard AR(1) process, but controlling for the potential bias

induced by variation in match quality π/π. The within-treatment orthogonality condition

allows us to measure both forces in one regression.

Quality of the Matching Technology, θ The next step is to identify θ. We show that the

average treatment effect identifies this parameter as a function of (β, ρ). We begin with

the formal result in Proposition 2, with discussion following.

11

Proposition 2. Given the values (β, ρ), the value of θ is uniquely identified by ratio of

ex post average profit of treatment relative to control (i.e., the average treatment effect)

if that ratio falls within the bounds [Γmin,Γmax].

Γmin = infθ


πρ max {1, π/π}β dM(π; π, θ)dHC(π)(2.4)

Γmax = supθ


πρ max {1, π/π}β dM(π; π, θ)dHC(π). (2.5)

The intuition for Proposition 2 follows from what this data-generating process allows

us to infer about the control group, and Figure 1 provides a graphical representation.

The intervention guarantees all treated agents a high-quality match from some fixed

distribution. The average treatment effect is then governed by the difference between

that distribution and the baseline distribution from which the control group still draws

matches. A small treatment effect implies the control group must already be able to

access high-quality matches. Said differently, they are drawing from M indexed by a

high θ.12

Figure 1: Identification of θ

Productivity

pdf treatmentdraws

pdf control draws(θ = 0.9)

pdf control draws(θ = 0)

ztreat

Figure notes: Figure shows the distribution of draws for treatment and control firms under different θ.The distributions are drawn Pareto, but this is for the example’s sake only.

To relate back to the example in Section 2.2.1 and broader under-identification dis-

12A final point to mention here are the bounds Γmin and Γmax in equations (2.4) and (2.5). These bounds exist onlyto guarantee that the treatment effect stays within the set of values that a model can possibly rationalize. For example, ifβ = 0, the model naturally cannot rationalize any non-zero treatment effect. This implies Γmin = Γmax = 0. Thus, thesebounds provide the theoretical limits that govern the model.

12

cussion in Section 2.3, note that our identification of θ is a conditional one. It depends

critically on already knowing how a given match translates into future productivity. This

is the power of the within-treatment orthogonality condition. If those intensive margin

parameters are biased for some reason, Proposition 2 shows that the model will still ratio-

nalize the average treatment, but with a θ whose bias tracks that of the intensive margin

parameters. The extent to which this affects the relationship between the reduced-form

ATE and aggregate gains is a quantitative question, which we turn to in the coming

sections.

2.5 Discussion on Assumptions and Identification

Before turning to the application, we briefly digress to discuss the results in this Section.

Our results lay out a relatively parsimonious procedure to identify a limited set of

parameters. In the Appendix, we extend the results in a number of directions. We

first allow for a more general law of motion for ability which can be estimated semi-

parametrically with the same data generating process. Relatedly, the results can easily

be extended to the inclusion of heterogeneous returns by characteristics. As a simple

example, this allows for the possibility that gains from a match depend on the age

difference between two firm owners. Letting a1, . . . , aN be the age-difference bins, we

could instead estimate (ρ, β1, . . . , βN) in the regression

log(π′i) = c+ ρ log(πi) +N∑j=1

βaj1Gapi=aj log

(max

{1,πiπi

})+ εi.

We also extend the results to allow a more general relationship between observables

and ability, instead of our maintained assumption π ∝ z. This allows, for example,

the integration of production function estimation to derive the required link between

observables and ability. Finally, we show the procedure can be adjusted to include

measurement error or idiosyncratic distortions that affect the relationship between profit

and ability, building off an active literature on non-linear error-in-variables models (see

Schennach, 2020, for a thorough review).

On a practical level, all of these extensions come with some additional cost or compli-

cation, whether it be empirical (e.g., an increased sample size required for non-parametric

identification) or theoretical (the details of the required deconvolution methods to deal

with measurement error). But the underlying intuition behind the estimation procedure

itself, along with the economic outcomes we focus on, are robust to their inclusion. We

therefore save them for the Appendix.

13

3 Application to Kenyan Firms

Our results so far highlight a way to separate parameters governing two dimensions of

the diffusion process: the extensive margin (the probability a given imitation draw will

be realized) and the intensive margin (how a given match affects ability). Both of these

margins affect the micro-level treatment effect and the at-scale implications. The next

question we seek to answer is whether or not this process matters quantitatively. If the

same parameters governing the treatment effect similarly govern at-scale implications,

for example, the quantitative bias by not separately identifying them may be small.

Studying this quantitatively requires us to both close the model and bring to bear

evidence that satisfies Assumption 4. We therefore utilize randomized controlled trial

data from Brooks et al. (2018). Per the discussion in Section 2, we can focus on this

estimation independent of the remaining model structure, so we leave the model details

for Section 4.

3.1 Details of RCT and Data Collection

The experiment took place in Dandora, Kenya, a dense informal settlement on the out-

skirts of Nairobi. Self-employment is ubiquitous in Dandora with a huge number of

street-level businesses operating in a variety of industries, such as retail, simple manufac-

turing, repair and other services. We began by conducting a large scale cross-sectional

survey. We sampled a random cross-section of 3290 businesses. This sample is represen-

tative of the population of enterprises, and it includes businesses of a variety of ages and

industries. We use it to derive moments from the population of operating firms.

We randomly match more profitability entrepreneurs with less profitable ones. The

business owners are followed over 17 months to measure changes in business practices

and profitability over time. Outcomes are compared to a control group of similar firms.

Selection and Randomization We start from a sample of female business owners who

have been in operation for less than 5 years.13 We then randomly select a subset of

these business owners to randomly match with a more profitable owner. A control group

receives no such offer. Firms were then surveyed over 6 quarters to track the time series

of treatment.

The set of more profitable owners were selected from those businesses with owners

over 40 years old and at least 5 years of experience. This was to minimize the impor-

tance of “luck” in baseline profit realizations to allow us to focus on truly productive

13The sex selection criteria is to limit heterogeneity outside the model. Note, however, that females make up 65 percentof business owners in Dandora and 71 owners with businesses open less than 5 years.

14

business owners. We then recruited business owners with the highest profit until we had

a sufficient number for matches. Of those contacted, 95 percent accepted. Matches with

the treatment firms were randomly created conditional on industry.

To summarize, Figure 2 plots the cumulative distribution function of baseline profit

for the entire sample, the population we study, and the selected matches. One can see

that our study population is somewhat poorer than the entire population, while the

matches are drawn from the far right tail of the baseline profit distribution.

Figure 2: Baseline Profit Distributions

0.2

.4.6

.81

5 7 9 11Log(Monthly Profit)

Entire Pop. Study Pop. Defined matches

Note that this procedure satisfies all the requirements in Assumption 4. The random-

ization naturally satisfies the exclusion restriction, we do not assume observability of the

control matches, treatment draws are drawn from the right tail of the baseline firm profit

distribution, and the individual one-to-one matches are randomly assigned.

Details of a “Match” What does it mean to enter into one of our matches? We designed

the program to remain as truthful to the theoretical counterpart of the model as possible.

First, matches were designed to only last for one month, though of course there was no

restriction on meeting after the formal end of the program.

The program was pitched to both sides of the match as a mentee-mentor relationship,

and thus was explicitly focused on business success. The more successful business owners

were the “mentors,” while the less successful were the “mentees.” The mentors were told

they could potentially help other business owners learn the requisite skills required to

operate in Nairobi. However, we provided no topics to discuss, instead preferring that

the content of any discussions was self-directed. After signing up mentors we simply

provided the mentees with the mentor’s phone number and told them that a prominent

15

business owner in Dandora was willing to discuss business questions with them. Whether

they contacted the mentor, or ever met, was their decision. However, all matches met

at least once in the official month-long treatment period.14 For simplicity and ease of

reference to the more detailed discussion in Brooks et al. (2018), we refer to these two

groups as mentees and mentors. We emphasize, however, that they should more generally

be thought of as the more and less profitable members of a match.

3.2 Balance

Since our theoretical results rely on two layers of randomization, we need to verify balance

both on between control and treatment and within treatment. Our two balance tests are

yi0 = α0 + α1Ti + εi,

yi0 = α0 +∑

j∈{L,M,H}

αjMij + εi

where Ti = 1 if i is in the treatment group and Mij is an indicator denoting that firm i is

a treatment firm matched with a bottom 25th percentile (denoted MiL), 25-75 percentile

(MiM), or top 25 percentile firm (MiH) in terms of baseline profitability. The results are

in Table 3 and Table 4 in Appendix A.15 As expected given the randomization, there is

limited variation across 15 different characteristics. The only significant difference is in

age and the magnitude is small.

3.3 Treatment Effects and Underlying Mechanisms

Figure 3 begins by plotting the average treatment effect (as a percentage) over time,

along with the 95 percent confidence interval. There is a large treatment effect in the

immediate aftermath of the treatment period that fades quickly. We test whether our

model rationalizes this pattern after laying out the model below, and find that it does. We

also directly confront what inferences can and cannot be made about aggregate diffusion

from this pattern in Section 5.3.

Mechanisms The observed changes in profitability primarily come from lowering input

costs, with unit inventory costs falling by 49 percent relative to the control group.16

14One might be concerned that we indirectly primed mentees to believe these matches would be beneficial. We can dolittle to rule this out completely. We note, however, that evidence of the mentor’s business success are easily visible tothe mentee. Mentors had substantially more physical capital and workers, and had a fixed, relatively large buildings fromwhich they conducted business. Moreover, the first meeting took place at the mentor’s business. Thus, that the mentorwas “good” at running a business would likely have been understood with or without us.

15Some of the simple empirical results, like the standard balance test, are also reported in Brooks et al. (2018). We erron the side of including relevant results here to keep this paper self-contained. We are clear to note in each table whichresults are reported in the previous work.

16This is not to say that other components do not matter, and there is indeed substantially heterogeneity in topicsdiscussed within matches. The main topics covered in these meetings are attracting customers (63 percent), keeping

16

Figure 3: Time Series of the Average Treatment Effect (from Brooks et al., 2018)

0 1 2 3 4 5-0.4

-0.2

0

0.2

0.4

0.6

0.8

Figure notes: Figure plots average treatment effect as percentage above control mean (0.4 = 40%), alongwith the 95 percent confidence interval. Treatment takes place between quarters 0 and 1.

Consistent with this, we find that treated firms are 19 percent more likely to switch

suppliers in the aftermath of the treatment. When we close to model to study the

quantitative implications, we will take seriously this channel.

How, then, does this result square with the quick fade-out observed in Figure 3?

We find that the economic environment is characterized by massive amounts of turnover

– 62 percent of control firms switch suppliers in the 3 quarters immediately following

treatment. Thus, any value procured by firms on this dimension by meeting with others

is likely to have a short half-life. As we will show shortly, estimating the law of motion

for ability similarly points to limited persistence.

Impact on Higher-Profit Business Owner Finally, we note that the diffusion process in

Section 2 assumes that there should be no gains to the more productive members of

the match via the use of the max function in law of motion for ability (Assumption 1).

These individuals were not randomly selected relative to their peers, and thus cannot be

directly compared to a control group. However, our design allows us to use the selection

procedure to identify the causal impact of being chosen using a regression discontinuity

after resurveying both those chosen for the program and those just below the cutoff for

selection. We find no change in profitability, scale, or any practices one may associate

with ability (e.g., better book keeping, more marketing). These results are available in

records (57 percent), and lowering input costs (39 percent), where the number in parenthesis is the fraction of matchesthat discuss each topic.

17

Brooks et al. (2018), but we reproduce them in in Appendix A for simplicity.17

3.4 Estimating Diffusion Parameters from the RCT Results

We now turn to estimating diffusion parameters using these results. This estimation

requires only data on profit and treatment matches, and sets aside for the moment the

mechanisms highlighted in Section 3.3. Those details will return in Section 4 when we

close the model.

To identify the diffusion parameters, we restrict attention to the baseline and survey

wave 1 quarter post-treatment for estimation. Later we test whether the impulse response

of the model-computed RCT results matches the remaining empirical time series from

Figure 3, and find that it does.18

Step 1: Estimating (β, ρ) from Treatment Data Denoting the set of individuals in the

treatment as M and control as C, we first estimate equation (2.3) on treatment firm

data. We can write this more simply because πi ≥ πi for all i in the treatment, giving us


(πiπi

)+ εi for i ∈M.

As discussed in Section 2, this identifies β and ρ. Those results are in Column (1) of

Table 1, and we find that β = 0.538 and ρ = 0.595. These estimates can be ported into

any model in which ability evolves according to this law of motion, as they are identical

to their structural counterparts.

Step 2: Estimating θ from Relative Profit We now estimate θ. This requires a solution

to the problem

minθ

abs

(E[π′T ]

E[π′C ]−

∫ ∫πρ max [1, π/π]β dHT (π)dHT (π)∫ ∫

πρ max [1, π/π]β dM(π; π, θ)dHC(π)

)(3.1)

which is the difference between the empirical ex post profit ratio and its structural coun-

terpart under our assumptions.19 At this point, we are required to take a stand on the

type of diffusion process that underlies the economic environment. This is a tradeoff gov-

erned by the level of detail in the data collected by the researcher. With detailed network

data, one could use a more empirically-motivated approach (e.g. Beaman et al., 2020).

17Interestingly, Jarosch et al. (2020) find similar evidence for the max assumption among German co-workers. Asdiscussed in Section 2.5, this functional form assumption is not crucial. Given these results here, however, we would haveeventually required it and therefore maintained it throughout the paper.

18Alternatively, we could use the entire time series for identification. But given that the model matches it well estimatedoff only the first 2 quarters, the results are nearly identical.

19This derivation is detailed in the proof of Proposition 2 in Appendix G.

18

In the absence of such (costly) data, one can do so by assumption and test robustness of

results with other diffusion processes, as the same identification procedure holds in any

environment satisfying our assumptions.

We take this latter strategy in the absence of such data. We assume matches follow our

random directed search, in that imitation draws follow M(z; θ) = M f (z)1/(1−θ), where M f

is the distribution of operating firms in the economy. We note that such an assumption

has some empirical support in developing countries, emphasizing both random meetings

and the global nature of these spillovers across agents (e.g. Banerjee et al., 2020).

Since we have data from our baseline survey on M f , the only remaining unknown in

(3.1) is θ, and can thus be solved directly. The relevant empirical moment is in Column

(2) of Table 1. When we solve (3.1), this implies θ = −0.417.

Table 1: Identification Moments

(1) (2)

β 0.538(0.273)**

ρ 0.595(0.273)**

Treatment 891.990(280.720)***

R2 0.053 0.047Control Avg – 1897.851

Table notes: Standard errors are in parentheses. The top and bottom one percent of dependent variables are trimmed. Statisticalsignificance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.

With the diffusion parameters in hand, we can now reintroduce the additional em-

pirical details to construct the remaining economic environment in which we will study

the quantitative implications of scaling this intervention.

4 Building the Results into a Model of Diffusion

The flexibility provided by the identification requires that we must only satisfy Assump-

tions 1 – 3 when closing the model. We first include occupational choice between firm

operation and wage work. Since the economy is primarily made up of relatively small

firms, this is the relevant income-generating trade-off for the firm owners in our study.

Second, as discussed in Section 3.3, lowering costs is a key driver of increased profitabil-

ity. We model this channel explicitly by formalizing the bargaining process between firms

and suppliers. Finally, we model a small open economy. That is, suppliers have access

to infinite supply of inputs, consistent with their connection to the broader Nairobi and

Kenyan economy. However, the labor market and diffusion of ability are local.

19

4.1 Model Details

Time is discrete and infinite. There are a unit mass of agents. The state of an agent

is her ability z, which evolves over time. Each agent dies with exogenous probability

δ and a mass δ of new agents replace them each period. New agents draw their initial

productivity from a fixed distribution with c.d.f. G(z).

Every agent has flow utility

u(c, s) = ω log(c) + (1− ω) log(1− s)

where c is consumption and s is effort (discussed below). ω is the relative weight of

consumption in utility. There are no borrowing and savings markets, so consumption is

equal to income.

Each period, an agent can choose between running a firm and working at one. Wage

work earns the market-clearing wage w, as labor is traded on a competitive market.

Firms earn profit π = xαnη−pxx−wn, where x and n are intermediate inputs and labor

services, and px and w are their respective prices.

Supplier Matching, Input Choice, and the Importance of Ability Unlike the competitive

labor market, intermediate purchases require a firm to seek out a supplier. There are

a continuum of suppliers indexed by their marginal cost m who earn profit πs(m,x) =

(px−m)x. Suppliers can source from some outside entity, consistent with their connection

to the broader Nairobi and Kenyan economy, and thus can provide whatever amount of

input x is requested by firms at marginal cost m.

Firms can seek out suppliers with lower marginal cost by exerting search effort s.

Given effort s and ability z, the firm matches with a supplier m = e−szα+η−1α . Thus,

ability makes it easier to seek out lower cost suppliers.20

Once they meet, the two parties Nash bargain over the price px that the firm will

pay, taking into account the optimal input choice by the firm at that price. This implies

an equilibrium price

p∗x(m) = argmaxpx (π)ν(πs)1−ν (4.1)

for bargaining weight ν and the participation constraint that both the supplier and firm

earn weakly positive profit.

20The constant returns assumption for suppliers eliminates any strategic interactions between supplier-firm bargaininggames.

20

Ability Transmission This ability to find lower-cost suppliers is transmitted between

agents. It evolves according to our maintained Assumption 1,

z′ = ec+εzρ max

{1,z

z

}β.

We assume learning occurs via operating firms. Denoting M f as the equilibrium dis-

tribution of operating firms (defined formally below), we write M(z;M) = M f (z;M)1/(1−θ)

as discussed in Section 3.

Recursive Formulation The timing and decisions described above implies that we can

write the problem recursively. The aggregate state of the economy is the distribution of

ability, M(z). At the start of each period, each agent chooses to either operate a firm or

engage in wage work given her ability z and aggregate state M ,

v(z,M) = max{vf (z,M), vw(z,M)}

We let the decision rule φ(z,M) = 1 denote firm operation and φ(z,M) = 0 denote wage

work. Thus, we write the distribution of draws as

M(z;M) =

( ∫ z0φ(z,M)dM(z)∫∞

0φ(z,M)dM(z)

) 11−θ

.

As discussed above, an agent who decides to operate a firm must exert effort s to find

a supplier, then bargain over the input price px. Her problem can be written recursively

as

vf (z,M) = maxs,x,n≥0

ω log(π) + (1− ω) log(1− s) + (4.2)

(1− δ)∫ε

∫z

v(z′(z, ε; z),M ′)M(dz,M)dF (ε)

s.t. m = f(s, z)

px = argmaxpx [π]ν [πs(m)]1−ν

π = xαnη − pxx− wn

z′(z, ε; z) = ec+εzρ max

{1,z

z

}βThe first constraint is the type of supplier met with search effort s. The second guarantees

that the realized cost is the outcome of Nash bargaining between the firm and its supplier.

The third and fourth are the definitions of profit and the law of motion for ability.

On the other hand, an agent who chooses to work for a wage receives flow consumption

21

w from wage work, but does not need to seek out a supplier, implying

vw(z,M) = ω log(w) + (1− δ)∫ε

∫z

v(z′(z, ε; z),M ′)M(dz,M)dF (ε) (4.3)

s.t. z′(z, ε; z) = ec+εzρ max

{1,z

z

}β.

Equilibrium We study the stationary equilibrium of this model, which includes value

functions v, vf , and vw, decision rules for occupation φ(z,M), effort s(z,M), labor

n(z,M), and intermediates x(z,M), bargaining outcomes px(z,M), and a distribution

of ability M(z), such that the value functions solve the agent’s problem above with the

associated decision rules, and the aggregate state evolves according to

M ′(z′) := Λ(M(z′)) (4.4)

= δG(z′)

+ (1− δ)∫ ∞

0

∫ ∞0

F (log(z′)− ρ log(z)− β log(max{1, z/z})− c)M(dz;M)M(dz),

Λ(M) is the law of motion for the aggregate state. It consists of the ability of new

entrants, δG(z), and the evolution of ability for surviving agents. In the stationary

equilibrium, M∗(z) = Λ(M∗(z)).

Guaranteeing Assumptions are Satisfied This model satisfies all of the necessary as-

sumptions required for our procedure to hold. We directly assume Assumption 1 as the

law of motion for ability, and Assumption 3 is satisfied by our assumption M(z;M) =

M f (z;M)1

1−θ . All that remains is Assumption 2, that π ∝ z. We provide details of these

derivations in Appendix G.4, but we summarize a few useful features of the equilibrium

in the following proposition.

Proposition 3. The following results hold in the equilibrium of the model:

1. The solution to the Nash bargaining game is a constant markup over marginal cost,

px(m) ∝ m. Thus further implies that the realized equilibrium cost paid by firms is

given by px(z) ∝ zα+η−1α .

2. The equilibrium profit function can be written as π(z, w) = A(w)z, where A(·) de-

pends only on the equilibrium wage and parameters of the model.

3. Occupational choice follows a cut-off rule by which all agents with z ≥ z operate

firms. That cut-off is increasing and convex in the equilibrium wage, and given by

the function z(w) = w1−α

1−η−αC for a constant C.

22

Proof. See Appendix G. �

We highlight these specific features because they turn out to be useful for calibration

and interpretation of the quantitative results. For now, however, the second result of

Proposition 3 shows that our model satisfies the necessary assumptions for our procedure

to remain valid.21

One additional point made clear by Proposition 3 is that the assumptions laid out

in Section 2 are restrictions only on the observed outcomes from the model. There are

many assumptions on the primitives that can generate these outcomes, which allows the

flexibility in modeling decisions used to fit the relevant environment.

4.2 Calibration of Remaining Parameters

Given the previously-estimated diffusion parameters, we require the remaining parame-

ters before turning to quantitative results. This follows a relatively standard calibration

procedure and we choose parameters to match moments of the same set of firms in

which the experiment was conducted. We make use of the baseline field data that is a

representative sample of firms in Dandora, Kenya.

The model parametrization can be broken into three parts. First, as we showed

previously, the diffusion parameters are independent of the remaining model parameters.

Thus, we can simply impose our estimated parameters β = 0.538, ρ = 0.595, and θ =

−0.417. The model period is assumed to be one quarter.

That leaves 9 remaining parameters. On the utility side, they include the relative

weight of consumption ω and the agent death rate δ. The remaining parameters dealing

with technology and ability evolution are the parameters α and η, the ability growth

term c, and the terms governing the exogenous shocks governed by distributions G and

F . We assume that G ∼ logN(µ0, σ0) and that F ∼ logN(µ, σ), and normalize µ0 = 0.

We also note that c and µ are not separately identified, so we choose µ = −σ2/2 so that

E[eε] = 1. Finally, we have the bargaining weight ν. After the normalization and choice

of µ, we have 8 remaining parameters to calibrate.

The remaining 8 parameters can be broken into two groups. The first group is matched

one-to-one with a given moment or value (δ, σ0, ν). The second group is jointly targeted

to jointly target a set of moments (σ, c, ω, α, η). The death rate δ, which we match

to the average age of population in the study, which is 34. The constant death rate δ

implies a geometrically distributed age distribution with mean 1/δ and, assuming a new

21Note that the result π(z, w) = A(w)z in Proposition 3, which gives us the requirement that π ∝ z, follows becausematching between suppliers and firms is deterministic. Uncertainty these matches is straightforward to include. It requiresan adjustment to the diffusion parameter estimation analogous to a measurement error adjustment. We discuss thismeasurement error extension in the Appendix.

23

agent is 18 years old, implies an average age of 64 quarters in the model.22 The standard

deviation of new entrant ability matches the variance of log profit for firms that have

been open for less than 3 months, which implies σ0 = 1.00. Finally, we note that given

the model set-up the bargaining power of firms in its supplier negotiation (ν) has no

effect the results. Thus, we set it as ν = 0.5 for simplicity.

This leaves 5 parameters – σ, c, ω, α, and η – which we target to jointly hit 5

moments. While jointly calibrated, each has a clear intuitive counterpart. We choose σ

to match the standard deviation of log profit in the economy, equal to 0.99. The drift

parameter c in ability is targeted to match the average profit of all firms relative to those

who entered less than one year ago (1.51). The utility parameter ω is set to match the

share of employment in wage work. To measure this, we use the most recent Kenyan

census (via IPUMS, 2020) to measure the share the workforce who report wage work as

their main economic activity in Embakasi Contituency – the local area in Kenya that

includes our study site. Forty-eight percent of employment is in wage work.

The final two parameters – α and η – are set to match two moments. The first is the

standard deviation of the log unit intermediate cost across firms. After removing industry

fixed effects that may bias those estimates, the standard deviation in the data is 1.61.

The second moment is the cost ratio, defined as the wage bill relative to intermediate

spending. The average firm at baseline has a ratio of 0.13. Our Cobb-Douglas assumption

implies that wn/cx = η/α, so we set η = 0.13α. The standard deviation of intermediate

cost is then exactly pinned down by the cost ratio and the variability in log profit. This

follows from the fact that Proposition 3 implies the relationship log(px) = Constant +(α+η−1α

)log(π), and therefore σlog(px) =

(1−α−ηα

)σlog(π). Plugging η = 0.13α yields

α =σlog(π)

σlog(px) + 1.13σlog(π)

.

The full set of moments and parameter values are reported in Table 6 in Appendix

A.3. The model matches these moments well.

5 Quantitative Results

We begin with the aggregate implications of scaling the RCT in Section 5.1. We then

use the model to study whether measuring the average treatment effect can differentiate

the aggregate gains in Section 5.2, highlighting the importance of separately identifying

the intensive and extensive margins. Finally, in Section 5.3, we discuss the dynamic path

of the treatment effect drawing on results developed in Sections 5.1 and 5.2.22Because agents can move between firm operation and wage work over the course of their lives, we match actual age

rather than age of the firm.

24

5.1 Impact of Scaling the Intervention

We start by measuring the aggregate gains from policy. We think of this as a govern-

ment or other policy-making body implementing a program that makes it easier to find

high-quality matches, in line with the extensive margin focus of the RCT results. We

implement this by shocking the matching technology parameter θ so that it moves 25

percent closer to its limit θ = 1. We study the new stationary equilibrium and com-

pare it to the baseline equilibrium.23 This is a simple way to replicate a policy that

makes matching easier, for example through extension programs, better ICT, or explicit

mentoring programs, among others.24 Mechanically, it involves shocking the stationary

economy with θ = −0.417 to θnew = −0.063.

Aggregate moments are reported in Table 2. We present two steady states, both of

which operate under the new matching function. The first (in column 2) fixes the wage

at its baseline level. The second (in column 3) allows the wage to adjust as well.

Table 2: Equilibrium Moments

(1) (2) (3)

Baseline Fixed Wage At-Scale

Income 1.00 1.07 1.11

Ability 1.00 1.08 1.12

Aggregate Labor Supply 1.00 0.92 0.98

Wage 1.00 1.00 1.13

Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3)report the new stationary equilibrium after shocking the matching technology,where (2) holds the wage fixed at its baseline level and (3) allows it to adjust.

Overall, income rises by 11 percent. This is made up of two general equilibrium effects.

The first is that the new matching technology directly affects the ability distribution by

making it easier to learn from high ability agents. The second is an amplification effect

through prices.

We decompose the relative importance of these two channels in columns 2 and 3 of

Table 2. Column 2 holds the wage fixed at its baseline level to but allows the ability

23Recall that given our matching technology, θ = 1 would imply all agents draw from the limit of the right tail of theability distribution. Thus, we take the economy 25 percent closer to that limit. Specifically, we set θnew = 0.25 + 0.75θ,which guarantees 1 − θnew = 0.75(1 − θ). A seemingly simpler alternative is to increase θ by 25 percent, but sinceθ ∈ (−∞, 1), such an exercise has misleading implications around θ ≈ 0.

24The choice of 25 percent is somewhat arbitrary, but also irrelevant net of some differences in magnitudes. Analternative, for example, would be to increase θ until we exactly replicate the treatment effect from the Kenyan RCT when

we offer treated firms the source distribution M(z; θnew). We think of the policy-maker more broadly than this exactreplication: she creates an aggregate program that adjusts the same margin as the RCT, but does not necessarily needexactly replicate the RCT gains (which may be impractical for a number of reasons outside the scope of this paper). Thus,these results are one example of an extensive-margin adjustment. Regardless, our main results hold for any change in thisdimension. In Appendix A.4 we provide the results when the aggregate policy exactly replicates the estimated RCT gains.The same results hold with larger magnitudes.

25

distribution to adjust. This isolates the direct effect on ability. Average ability rises by 7

percent and the labor supply declines by 8 percent as the distribution shifts mass across

the (fixed) cut-off ability level that defines occupational choice. Average income rises by

7 percent.

Column 3 allows the equilibrium wage to adjust, which increases by 13 percent as

higher TFPR firms demand more workers. Correspondingly, labor supply increases (rel-

ative to Column 2) as marginal entrepreneurs are drawn into wage work.25 This addi-

tionally amplifies ability. Since these marginal entrepreneurs most congest the learning

process, removing them makes it easier for others to learn. Thus, ability rises even fur-

ther. Of the total increase in ability, 67 is from the direct effect and 37 is from the

amplification through prices. A similar magnitude holds for income: 64 percent is from

the direct effect, while 36 is through the price adjustment.

By design, neither of these equilibrium effects are present in the causal reduced-form

results from randomly pairing entrepreneurs, though this random matching similarly

micro-founds the model from which the aggregate results are derived. We next turn to

the relationship between empirical moments from this RCT and the at-scale gains derived

from the model.

5.2 Relationship Between Treatment Effect and Aggregate Gains

We use the model to study the mapping between reduced-form RCT results and aggregate

gains. This requires replicating the RCT in the model.

Running the RCT in the Model We draw a set of treatment and control firms with profit

identical to our empirical distributions detailed in Section 3 from the baseline steady state

distribution M∗(z). Then we draw another set of agents – the matches – again conforming

to our empirical distribution. We randomly match our treatment agents with a match,

and track the evolution of both treatment and control for 6 quarters. Throughout, we

hold M∗ fixed so that the treatment does not result in control group contamination as

required for causal interpretation.

Quantitative Exercise and Results We then ask the relationship between the RCT results

and at-scale gains, and the importance of separately identifying the diffusion parameters

(β, ρ, θ). We do so in the following way. We fix the average treatment effect (ATE)

then vary the intensive margin parameters (β, ρ) ∈ [0.20, 0.95] × [0.20, 0.95].26 At each

25Recall from Proposition 3 that occupational choice is defined by a cut-off rule. All agents with ability z < z ∝ wbecome workers. Increasing the wage therefore shifts those entrepreneurs near the cut-off into wage work.

26Below β = 0.16, the estimation of θ may not be feasible at certain levels of the assumed treatment effect. This occursbecause the average treatment effect falls outside the bounds formalized in Proposition 2. We therefore constrain (β, ρ)

26

(β, ρ) we re-estimate θ to match the assumed ATE. We then compute the aggregate gains

from shocking the matching technology as described above. This exercise therefore gives

the range of possible aggregate outcomes that could be achieved by assuming intensive

margin parameters (β, ρ) instead of utilizing heterogeneity as an estimating moment.

Figure 4 presents the results at our baseline ATE. As (β, ρ) counterfactually decrease,

the model is forced to rationalize the ATE via the extensive margin. Since firms now

(counterfactually) gain little from a good match, the model instead requires control firms

to receive worse baseline matches, thus increasing the implied value of the treatment.

Thus, bias in (β, ρ) induces bias in θ to match the ATE. This is the quantitative coun-

terpart to the theoretic discussion in Section 2 and detailed in Figure 4a. Figure 4b

shows that the choice of (β, ρ) has critical implications for measuring aggregate gains,

even conditional on matching the ATE. The aggregate gains vary from 0.6 to 38 percent,

a 63-fold range.

Figure 4: Aggregate Implications of Varying (β, ρ) at Baseline ATE

(a) Implied θ (b) Aggregate Gains

Figure notes: For each value of (β, ρ), (a) plots the implied θ that guarantees the model hits our baselineATE. (b) then plots the implied aggregate gains from scaling the intervention as a percentage change inaverage income from the baseline equilibrium (i.e., 0.4 = 40%).

We next replicate the same exercise, but do so for every ATE between 0.01 and

70 percent. These results are in Figure 5, which plots the band of possible aggregate

outcomes for each ATE. Our 11 percent aggregate increase in income can be derived in

economies with nearly any ATE, depending on how one sets (β, ρ). Moreover, the wide

range of possible outcomes shows that it would be relatively simple to come up with a set

of economies in one could observe a negative relationship between the RCT treatment

above 0.20 to guarantee we are always comparing within the same set of parameters, instead of allowing the feasible set ofparameters to vary with the assumed treatment effect.

27

effect and the gains from scale.

Figure 5: Relationship between average treatment effect and gains at scale

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure notes: The shaded region shows all possible realizations of aggregate gains that can be achieved while holding theaverage treatment effect fixed by re-estimating the extensive margin parameter θ for each (β, ρ) combination. Multiply by100 for percentage gains.

Why do we see this type of variation in the model and how can it help us understand

how to use intervention-level evidence to measure aggregate potential? Figure 6 starts

with some mechanics. Holding all other parameters fixed, we vary θ (Figure 6a) and β

(Figure 6b).27 We then study how both the aggregate gains at scale and the RCT-level

treatment effect vary.

The patterns in Figure 6 show why our headline results take the shape they do.

While the RCT-level treatment effects are primarily driven by θ, the aggregate gains

are primarily driven by β (and, to a lesser extent, are reinforced by ρ). Thus, when we

vary β away from its estimated value, the impact on the ATE can be easily made-up

for by varying θ. Yet this adjustment of θ has little impact on the aggregate outcomes,

allowing the direct effect of β to dominate. This gives us the result that at-scale gains

can vary widely for a given ATE: the two moments are driven quantitatively by different

parameters in the model.

The Link Between Reduced-Form Heterogeneity and Aggregate Gains Why is β so crit-

ical here and how does it relate economically to measuring variation in the treatment

effect? We detail that relationship here.

The driving force behind aggregate diffusion models is the mass of agents in the right

tail of the ability distribution. At an intuitive level, the rationale is simple: if agents learn

27We omit ρ in the interest of clarity, as β plays a larger role. A similar pattern emerges for ρ, though with smallermagnitudes.

28

Figure 6: Variation at RCT and Aggregate Levels by Parameters

(a) θ

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1-2

-1

0

1

2

3

4

5AggregateRCT

(b) β

0 0.2 0.4 0.6 0.8 1-2

-1

0

1

2

3

4

5AggregateRCT

Figure notes: Each figure varies the listed parameter while holding the remaining parameterizationfixed at its baseline value. “RCT” is the implied average treatment effect, while “Aggregate” is thegains from scaling the intervention as defined in the text. Results are presented as growth rates g :=(E[ypost]/E[ypre])−1, then normalized relative to our baseline parameterization as g/gbaseline so that alllines cross through 1.

from those with higher ability, a heavier-tailed distribution will facilitate more learning.

But this distribution is an equilibrium object and therefore requires some mechanism by

which agents move into the tail of the distribution. Put differently, it requires that if an

average agent meets a high-ability agent, she must be able to internalize a large portion

of that ability.

This is exactly what the parameter β governs in the model, and its ability to shift the

ability distribution is crucial for its role in the at-scale gains observed here. To see this,

Figure 7 plots how the ability distribution responds to the improved matching technology

under changes for β and, for comparison, θ. Figures (a) and (b) plot the survival function

1−M(z), or the fraction of the population whose ability is larger than z. Both axes are

measured in logs.28 We do so under our baseline parameterization, then also vary β and

θ to make them 50 percent closer to their theoretical maximum of one (i.e., (1-0.769)/(1-

0.538) = 0.5 in Figure 7a). Figure 7c shows how the difference varies for each of these

two parameters.

While at-scale results rely on what happens if an agent meets another high-ability

agent, the RCT dispenses with the uncertainty and measures what happens when an

agent meets with another high-ability agent. That is, the RCT allows us to measure the

relative impact of a good match within a set of exogenously provided matches. Within

this fixed set of matches we provide, we estimate β by measuring how many agents get

28If the distribution of ability were Pareto, for example, the line would have a constant slope in this graph.

29

Figure 7: Equilibrium Survival Function as Parameters Change

(a) Increasing β

-2 0 2 4 6-10

-8

-6

-4

-2

0

(b) Increasing θ

-2 0 2 4 6-10

-8

-6

-4

-2

0

(c) ∆ Change in Ability Distribution as ParametersChange

-2 0 2 4 6-0.2

-0.1

0

0.1

0.2

0.3

Figure notes: Figures (a) and (b) plot the ability distribution at the baseline and improved matchingtechnology, measured as the log survival function. Figure (c) plots the difference in the mass across theability distribution. This is measured as the difference-in-difference of the curves plotted in (a) and (b).That is, denoting S = log(1−M(z)) as the log survival function, it plots ∆S = (Snewx′ −Sbasex′ )− (Snewx −Sbasex ) where x ∈ {θ, β} and x′ is the counterfactually higher level of that parameter.

pushed into the tail of the distribution. Figure 8 plots the survival function for treated

firms, based on two levels of β.29

29As a more explicit example, assume treated firms are distributed z ∼ logN(µz , σz), their matches z ∼ logN(µz , σz),and the exogenous noise ε ∼ N(µε, σε). Because our treatment matches are drawn from a small part of the right tailin the economy-wide ability distribution, we have µz > µz and σz < σz . Ex post ability for treated firms is givenby z′ = exp(c + ε)zρ(z/z)β after imposing our assumptions. This implies that z′ will be approximately distributed asz′ ∼ logN(µz′ , σz′ ), where µz′ := c + (ρ − β)µz + βµz + µε and σz′ := (ρ − β)σz + βσz + σε. This is not exact becausethe lognormal assumption does not generically allow for zi > zi among all treated firms, but the overlap will be small ifthe distributions are substantially different. Therefore, by shifting weight toward the match mean and variance, higherβ increases the mean and decreases the variance of the ex post treatment firm profit. Said differently, the treatmentcompresses ability in the right tail, which is the theoretical counterpart to Figure 8.

30

Figure 8: RCT Treatment Firms Survival Function

0 2 4 6-8

-6

-4

-2

0

Figure notes: The horizontal axis is ability. The vertical axis is the survival function of treated firms, or 1 minus thecumulative distribution function of ability. Both axes are measured in logs. The distribution of ability is the ex post abilitydistribution of only treated firms.

5.3 Dynamic Treatment Patterns

Our results so far highlight the critical importance of the intensive margin – the param-

eters governing the gains from a given match – for aggregate outcomes. How important

are these parameters for governing other reduced form moments? We study that question

here in the context of a unique aspect of our empirical results – the quick fade out of the

ATE discussed in Section 3. One seemingly ex ante reasonable takeaway from this result

is that there are limited gains from scaling the intervention. We study whether such an

interpretation is warranted here and find that it is not.

We start with an out-of-sample test of the model, asking whether it can match the

empirical time series of the average treatment effect over 5 quarters. These results are

in Figure 9. While the first two quarters (t = 0 and t = 1) are matched by construction,

both the model and data predict no treatment effect by t = 3. The model under-predicts

the effect in t = 2, so if anything, the model understates the partial equilibrium RCT

dynamics. An immediate consequence is that the short half-life of the treatment effect

need not necessarily be an indication of the diffusion-induced gains at scale.

31

Figure 9: Dynamics of Average Treatment Effect on Profit

0 1 2 3 4 5

Quarters Since Treatment

-0.4

-0.2

0

0.2

0.4

0.6

0.8Data

Model

Yet the relationship between ATE persistence and at-scale gains turns out to be even

more strained. Figure 10 varies β, a key parameter governing at scale-gains, and studies

how the implied treatment effect varies over time. We do so at our baseline persistence

ρ = 0.595 (Figure 10a) and a high persistence ρ = 0.99 (Figure 10b).30

Figure 10: Relationship between Treatment Persistence and Diffusion Intensity β

(a) At Estimated Persistence ρ = 0.595

0 1 2 3 4 5


0

0.1

0.2

0.3

0.4

0.5 β = 0.16

β = 0.538

(b) At High Persistence ρ = 0.99

0 1 2 3 4 5


0

0.1

0.2

0.3

0.4

0.5 β = 0.16

β = 0.538

Figure 10 shows that higher β (which implies larger at-scale gains) generates quicker

fade-out of the treatment effect. This has important implications for its use: the ATE

persistence may be negatively correlated the magnitude of the gains at-scale, much in the30We follow a similar procedure to our earlier exercise. We set (β, ρ), then re-estimate θ to match the same 1-quarter

ATE. This focuses the discussion on the fade-out from the same 1-quarter treatment effect.

32

same way the level of the ATE can be. The result reinforces the importance of measuring

this parameter directly for aggregate implications.

The intuition for this result stems from the fact that control firms continue to engage

in matches throughout the study period. A higher β allows them to internalize a larger

portion of a good match’s ability. As more and more control firms eventually meet a

high-quality match over time, a higher β allows them to more quickly catch treated

firms.31

We continue this discussion in the Appendix by conducting the same exercise with

a different experiment (Cai and Szeidl, 2018) in which gains persist longer, finding that

we can similarly replicate the persistence of those gains in that context because our

estimation procedure implies different parameters.

5.4 Discussion

Before concluding, we discuss a number of extensions we take up in the Appendix. De-

tailed results, and others discussed throughout the text, are available there.

First, we extend the identification results in a number of directions. These are dis-

cussed in Section 2.5, and include measurement error, generalizations of the ability law

of motion, and broader relationships between observables and ability.32 The robustness

of this procedure highlights the useful relationship between the experimentally-induced

data and estimation of hard-to-observe parameters in equilibrium models.

We then provide a number of additional quantitative results. First, and related to the

identification discussion, we quantify the bias induced by mismeasured profit. Our results

are a lower bound on the at-scale gains. Idiosyncratic distortions naturally generate a

similar, but not identical, result.

Second, we solve the social planner’s problem to study the gains that could be achieved

by transitioning the economy to its efficient allocation. This requires solving for efficient

consumption, search behavior, and occupational choice of all agents, taking into account

the diffusion externality generated by this allocation.33 Such gains are likely impossible

to achieve practically, but do provide a hypothetical upper bound from the optimal

management of diffusion. We show theoretically how the planner takes advantage of

complementarity between ability and search effort to tilt search toward high-ability agents

and simultaneously limits the number of firms to increase learning quality. She uses these

31Note also that we hold the aggregate state of the economy fixed when running the RCT in the model. Therefore theresult is not due to control group contamination.

32It turns out that under some additional conditions, the results can be pushed even further. Evdokimov (2010) providesnonparametric identification separating individual fixed effects from transitory errors in nonlinear models. His results couldbe used to extend the model and identification to arbitrary heterogeneity in some fixed, non-transmittable aspect of ability,though practical questions of power start to become more relevant in this case.

33As is common in aggregate diffusion models, our baseline economy is inefficient. This occurs in our model becauseindividuals do not take into account the effect of their occupational choices on the learning opportunities of other agents.

33

gains to subsidize consumption for those newly-excluded entrepreneurs (i.e., those who

run businesses in the laissez faire equilibrium). We also find that welfare gains are largest

at intermediate levels of β. This is a function of a two competing forces governed by this

parameter. First, all else equal, the planner has more scope for gains at higher β. Second,

however, general equilibrium effects at higher β incentivize agents to make decisions more

in line with the planner’s goals, thus lowering the gains from further intervention. We

detail this tradeoff and how it varies across the parameter space. These results deliver

a similar message as the main text: measuring the various forces that inform aggregate

gains is critical.

Third and finally, we use the work of Cai and Szeidl (2018) to apply our results to

a different experiment, in which gains from matches persist longer than in our baseline

RCT. Despite using a similar model to the one in the main text, we replicate the per-

sistence of their treatment effect. The reason is because our estimation procedure infers

different parameters from different patterns of treatment effect heterogeneity. The results

highlight the flexibility of the procedure, in that each model need not require a new esti-

mation strategy for key elasticities. Moreover, it shows that understanding how various

empirical moments inform these parameters plays an important role in understanding

what can and cannot be inferred directly from the empirical results.

6 Conclusion

Modern economic development research is grounded in well-identified studies that allow

a detailed snapshot of specific interventions and economic environments. Yet, sustained

economic growth requires understanding how such results inform aggregate policy. In a

recent review, Buera et al. (2021a) highlight the importance of this link.

This paper focuses on the relationship between firm-level interactions at the micro

and aggregate level, showing how the former can be misleading about the latter once

equilibrium forces are taken into account. We provide a solution, in that measuring a

type of treatment effect heterogeneity can provide a differentiating moment that allows

one to draw a ranking between intervention-level results and potential aggregate gains.

In our Kenyan application, we show the quantitative importance of doing so. Mis-

measuring the relative importance of these forces can cause the aggregate gains can vary

substantially despite identical average treatment effects. We further show how attempting

to extrapolate from other RCT moments – such as the persistence of the treatment effect

– suffer similar issues driven by same forces.

We note that the results leave open questions for future work. First, we leave out

34

additional interesting features that may arise at scale, such as the potential for better

interaction to be dampened by increased competition. Even without such model features,

our results show that moving from micro to aggregates is already a nuanced task. Second,

understanding the exact details of how firms interact and the frictions they face will allow

a more detailed and targeted policy approach. Different field experiments, designed with

an eye toward aggregate theory, could further refine our understanding of key aggregates

governed by a number of difficult-to-measure elasticities that affect these at-scale gains.

References

Akbarpour, Mohammad, Suraj Malladi, and Amin Saberi, “Just a Few Seeds

More: Value of Network Information for Diffusion,” 2020. Working Paper.

Alatas, Vivi, Abhijit Banerjee, Arun G. Chandrasekhar, Rema Hanna, and

Benjamin A. Olken, “Network Structure and the Aggregation of Information: The-

ory and Evidence from Indonesia,” American Economic Review, 2016, 106 (7), 1663–

1704.

Atkin, David, Amit K. Khandelwal, and Adam Osman, “Exporting and Firm

Performance: Evidence from a Randomized Experiment,” Quarterly Journal of Eco-

nomics, 2017, 132 (2), 551–615.

Banerjee, Abhijit, Emily Breza, Arun G. Chandrasekhar, Esther Duflo,

Matthew O. Jackson, and Cynthia Kinnan, “Changes in Social Network Struc-

ture in Response to Exposure to Formal Credit Markets,” 2020. Working Paper.

Beaman, Lori, Ariel BenYishay, Jeremy Magruder, and A. Mushfiq Mobarak,

“Can Network Theory-based Targeting Increase Technology Adoption?,” American

Economic Review, 2020. forthcoming.

Benhabib, Jess, Jesse Perla, and Christopher Tonetti, “Reconciling Models of

Diffusion and Innovation: A Theory of the Productivity Distribution and Technology

Frontier,” Econometrica, 2020. forthcoming.

Berguist, Lauren, Benjamin Faber, Matthias Hoelzlein, Edward Miguel, and

Andres Rodriguez-Claire, “Scaling Agricultural Policy Interventions: Theory and

Evidence from Uganda,” April 2019. Working Paper.

Blattman, Christopher and Laura Ralston, “Generating employment in poor and

fragile states: Evidence from labor market and entrepreneurship programs,” June 2017.

Working Paper.

35

Bloom, Nicholas and John Van Reenen, “Measuring and Explaining Management

Practices Across Firms and Countries,” Quarterly Journal of Economics, 2007, 72 (4),

1351–1408.

Brooks, Wyatt, Kevin Donovan, and Terence R. Johnson, “Mentors or Teachers?

Microenterprise Training in Kenya,” American Economic Journal: Applied Economics,

2018, 10 (4), 196–221.

Bruhn, Miriam, Dean Karlan, and Antoinette Schoar, “What Capital Is Missing

in Developing Countries?,” American Economic Review Papers and Proceedings, 2010,

100 (2), 629–633.

Buera, Francisco J. and Ezra Oberfield, “The Global Diffusion of Ideas,” Econo-

metrica, 2020, 88 (1), 83–114.

and Robert E. Lucas, “Idea Flows and Economic Growth,” Annual Review of

Economics, 2018, 10, 315–345.

, Joseph P. Kaboski, and Robert M. Townsend, “From Micro to Macro Devel-

opment,” 2021. Working Paper.

, , and Yongseok Shin, “The Macroeconomics of Microfinance,” Review of Eco-

nomic Studies, 2021, 88 (1), 126–161.

Cai, Jing and Adam Szeidl, “Interfirm Relationships and Business Performance,”

Quarterly Journal of Economics, 2018, 133 (3), 1229–1282.

Caicedo, Santiago, Robert E. Lucas Jr., and Esteban Rossi-Hansberg, “Learn-

ing, Career Paths, and the Distribution of Wages,” American Economic Journal:

Macroeconomics, 2019, 11 (1), 49–88.

Comin, Diego and Bart Hobijn, “An Exploration of Technology Diffusion,” American

Economic Review, 2010, 100 (3), 2031–2059.

Engbom, Niklas, “Misallocative Growth,” 2020. Working Paper.

Evdokimov, Kirill, “Identification and Estimation of a Nonparametric Panel Data

Model with Unobserved Heterogeneity,” 2010. Working Paper.

Fafchamps, Marcel and Simon Quinn, “Networks and Manufacturing Firms in

Africa: Results from a Randomized Field Experiment,” World Bank Economic Re-

view, 2018, 32 (3), 656–675.

Fried, Stephie and David Lagakos, “Electricity and Firm Productivity: A General

Equilibrium Approach,” 2021. Working Paper.

36

Fujimoto, Junichi, David Lagakos, and Mitch Vanvuren, “The Aggregate Effects

of “Free” Secondary Schooling in the Developing World,” 2021. Working Paper.

Greenwood, Jeremy, Philipp Kircher, Cezar Santos, and Michele Tertilt, “An

Equilibrium Model of the African HIV/AID Epidemic,” Econometrica, 2019, 87 (4),

1081–1113.

Hardle, Wolfgang, Hua Liang, and Jiti Gao, Partially Linear Models, Springer-

Verlag Berlin Heidelberg, 2000.

Herkenhoff, Kyle, Jeremy Lise, Guido Menzio, and Gordon Phillips, “Knowl-

edge Diffusion in the Workplace,” July 2018. Working Paper.

Hopenhayn, Hugo A., “Entry, Exit, and Firm Dynamics in Long Run Equilibrium,”

Journal of Political Economy, 1992, 60 (5), 1127–1150.

Hsiao, Cheng, “Consistent estimation for some nonlinear errors-in-variables models,”

Journal of Econometrics, 1989, 41, 159–185.

Hsieh, Chang-Tai and Peter J. Klenow, “Misallocation and Manufacturing TFP in

China and India,” Quarterly Journal of Economics, 2009, 124 (4), 1403–1448.

Imbens, Guido and Karthik Kalyanaraman, “Optimal Bandwidth choice for the

Regression Discontinuity Estimator,” Review of Economic Studies, 2012, 79 (3), 933–

959.

IPUMS, “Minnesota Population Center. Integrated Public Use Micro-

data Series, International: Version 7.3 [dataset],” Minneapolsi, MN.

https://doi.org/10.18128/D020.V7.2 2020.

Jarosch, Gregor, Ezra Oberfield, and Esteban Rossi-Hansberg, “Learning from

Coworkers,” Econometrica, 2020. forthcoming.

Jovanovic, Boyan and Rafael Rob, “The Growth and Diffusion of Knowledge,” Re-

view of Economic Studies, 1989, 56 (4), 569–582.

Klenow, Peter and Andres Rodrıguez-Clare, “The Neoclassical Revival in Growth

Economics: Has It Gone Too Far?,” in Ben S. Bernanke and Juliio Rotemberg, eds.,

NBER Macroeconomics Annual 1997, 1997, pp. 73–114.

Lashkari, Danial, “Innovation Policy in a Theory of Knowledge Diffusion and Selec-

tion,” 2020. Working Paper.

Li, Tong, “Robust and consistent estimation of nonlinear errors-in-variables models,”

Journal of Econometrics, 2002, 110, 1–26.

37

Lu, Will Jianyu, “Transport, Infrastructure and Growth: Evidence from Chinese

Firms,” 2021. Working Paper.

Lucas, Robert E., “Ideas and Growth,” Economica, 2009, 76 (301), 1–19.

and Benjamin Moll, “Knowledge Growth and the Allocation of Time,” Journal of

Political Economy, 2014, 122 (1), 1–51.

McKenzie, David and Christopher Woodruff, “Business Practices in Small Firms

in Developing Countries,” Management Science, 2017, 63 (9), 2967–2981.

, , Kjetil Bjorvatn, Miriam Bruhn, Jing Cai, Juanita Gonzalez-Uribe,

Simon Quinn, Tetsushi Sonobe, and martin Valdivia, “Training Entrepreneurs,”

VoxDevLit, 2020, 1 (1).

Melitz, Marc J., “The Impact of Trade on Intra-Industry Reallocations and Aggregate

Industry Productivity,” Econometrica, 2003, 71 (6), 1695–1725.

Munshi, Kaivan, “Information Networks in Dynamic Agrarian Economies,” in T. Paul

Schultz and John Strauss, eds., Handbook of Development Economics, 2007, pp. 3085–

3113.

Nakamura, Emi and Jon Steinsson, “Identification in Macroeconomics,” Journal of

Economic Perspectives, 2018, 32 (3), 59–86.

Perla, Jesse and Christopher Tonetti, “Equilibrium Imitation and Growth,” Journal

of Political Economy, 2014, 122 (1), 52–76.

, , and Michael E. Waugh, “Equilibrium Technology Diffusion, Trade, and

Growth,” 2021, 111 (1), 73–128.

Prescott, Edward C., “Needed: A Theory of Total Factor Productivity,” International

Economic Review, 1998, 39 (3), 525–551.

Romer, Paul M., “Endogenous Techniological Change,” Journal of Political Economy,

1990, 98 (5), S71–S102.

Sampson, Thomas, “Dynamic Selection: An Idea Flows Theory of Entry, Trade, and

Growth,” Quarterly Journal of Economics, 2016, 131 (1), 315–380.

Schennach, Susanne M., “Estimation of Nonlinear Models with Measurement Error,”

Econometrica, 2004, 72 (1), 33–75.

, “Mismeasured and unobserved variables,” in Steven N. Durlauf, Lars Peter Hansen,

James J. Heckman, and Rosa L. Matzkin, eds., Handbook of Econometrics, 2020,

pp. 487–565.

38

Van Patten, Diana, “International Diffusion of Technology: Accounting for Heteroge-

neous Learning Ability,” 2020. Working Paper.

Wallskog, Melanie, “The Mechanisms and Implications of Entrepreneurial Spillovers

Across Coworkers,” 2021. Working Paper.

Yatchew, A., “An elementary estimator of the partial linear model,” Economic Letters,

1997, 57 (2), 135–143.

39

Table of Contents for Online Appendix

A Additional Details from the Main Text 42

A.1 Balance Tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

A.2 Empirical Impact on More Productive Member of the Match . . . . . . . . 44

A.3 Parameter Targets and Values from Calibration . . . . . . . . . . . . . . . 46

A.4 Increasing θ to match RCT treatment effect . . . . . . . . . . . . . . . . . 47

B Examples of Different Matching Processes 48

B.1 Noise in the Imitation Process . . . . . . . . . . . . . . . . . . . . . . . . . 48

B.2 Effort Choice and Bargaining . . . . . . . . . . . . . . . . . . . . . . . . . 48

B.3 Deterministic Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

B.4 Cost to Receive a Match . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50

C Additional Results: Extensions of Identification Procedure 52

C.1 Semi-parametric identification . . . . . . . . . . . . . . . . . . . . . . . . . 52

C.2 More general relationship between observables and z . . . . . . . . . . . . 52

C.3 Mis-measurement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

C.4 Additional characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . 55

D Additional Results: Social Planner’s Problem 56

D.1 Quantitative Implications . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

E Additional Results: Quantitative Implications of Mismeasurement 63

E.1 Estimation Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

E.2 Updated Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

E.3 Quantitative Exercise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

F Additional Results: Using the RCT of Cai and Szeidl (2018) 66

F.1 RCT Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

F.2 Overview of Empirical Results . . . . . . . . . . . . . . . . . . . . . . . . . 66

F.3 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

F.4 Estimating Diffusion Parameters . . . . . . . . . . . . . . . . . . . . . . . 69

F.5 Remaining Calibration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

F.6 Treatment Effect Persistence . . . . . . . . . . . . . . . . . . . . . . . . . 73

F.7 Quantitative Gains at Scale . . . . . . . . . . . . . . . . . . . . . . . . . . 73

G Proofs 76

G.1 Proof of Proposition 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76


40


G.4 Proof of Proposition 7 (from Social Planner’s Problem) . . . . . . . . . . . 79

41

A Additional Details from the Main Text

A.1 Balance Tests

Our basic balance check is

yi0 = α0 + α1Ti + εi,

where yi0 is the baseline outcome for individual i and Ti = 1 if i is eventually treated.

Those results are in Table 3.

Table 3: Balance Test (from Brooks et al., 2018)

Control Mean Mentor - Control(1) (2)

Firm Scale:Profit (last month) 10,054 -975.25

(1186.76)

Firm Age 2.39 -0.05(0.23)

Has Employees? 0.21 -0.02(0.05)

Number of Emp. 0.21 0.02(0.06)

Business Practices:Offer credit 0.74 -0.02

(0.06)

Have bank account 0.30 -0.03(0.06)

Taken loan 0.14 -0.05(0.04)

Practice accounting 0.11 0.00(0.04)

Advertise 0.07 0.04(0.03)

Sector:Manufacturing 0.04 -0.03

(0.02)

Retail 0.69 0.03(0.06)

Restaurant 0.14 -0.02(0.05)

Other services 0.17 0.06(0.05)

Owner Characteristics:Age 29.1 -0.25

(0.64)

Secondary Education 0.51 -0.00(0.06)

Table Notes: Columns 1-2 are the coefficient estimates from the regression yi = α + βTi + εi, where Ti is an indicator for

treatment. Column 1 is α. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.

42

We conduct the second balance test

yi0 = α0 +∑

j=L,M,H

αjTij + εi

where the indicator now depends on whether firm i is a treatment firm matched with a

bottom 25th percentile (denoted MiL), 25-75 percentile (TiM), or top 25 percentile firm

(TiH) in terms of baseline profitability.34 Table 4 reports the results. The only significant

difference is in age, and the magnitude is small.

Table 4: Balancing Test at Baseline

Control Mean TL - Control TM - Control TH - Control(1) (2) (3) (4)

Firm Scale:Profit (last month) 10,054 -732.65 -1337.06 -760.08

(1314.56) (1393.38) (2128.41)

Firm Age 2.39 0.04 -0.19 0.08(0.28) (0.30) (0.46)

Has Employees? 0.25 -0.10 -0.07 0.10(0.07) (0.07) (0.11)

Number of Emp. 0.23 -0.05 0.00 0.18(0.08) (0.08) (0.13)

Business Practices:Offer credit 0.74 -0.07 0.04 -0.03

(0.07) (0.08) (0.12)

Have bank account 0.30 -0.04 -0.05 0.06(0.07) (0.08) (0.12)

Taken loan 0.14 -0.07 -0.06 0.03(0.05) (0.05) (0.08)

Practice accounting 0.01 -0.01 0.01 -0.01(0.01) (0.02) (0.02)

Advertise 0.07 0.04 0.01 0.11(0.05) (0.05) (0.07)

Sector:Manufacturing 0.04 -0.02 -0.04 -0.04

(0.02) (0.03) (0.04)

Retail 0.69 -0.03 0.00 -0.10(0.08) (0.08) (0.12)

Restaurant 0.14 -0.06 0.00 0.03(0.05) (0.06) (0.09)

Other services 0.17 0.09 0.02 0.07(0.06) (0.07) (0.10)

Owner Characteristics:Age 29.1 0.92 -1.88 0.50

(0.79) (0.84)∗∗ (1.28)

Secondary Education 0.51 0.02 -0.08 0.13(0.08) (0.09 (0.13)

Table notes: Columns 1-4 are the coefficient estimates from the regression above, with column one being the estimate of theconstant α0. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***. All constants are significant at onepercent.

34We have experimented with a number of different ways to compute the balance table, and all show the same results.

43

A.2 Empirical Impact on More Productive Member of the Match

Since the more productive members of treatment matches were not randomly selected,

we require a different approach to identify any effect on these business owners. Brooks et

al. (2018) details these results, but Figure 11 reproduces some key results for simplicity’s

sake. Figure 11 suggests no statistically discernible discontinuity around the cutoff.

Figure 11: Profit for mentors and non-mentors (from Brooks et al., 2018)

(a) Full sample

−5000

05000

10000

15000

Pro

fit (K

sh)

−2 −1 0 1 2 3Normalized running variable

n=172, bins=15

(b) IK Optimal Bandwidth

−5000

05000

10000

15000

Pro

fit (K

sh)

−1 −.5 0 .5 1Normalized running variable

n=103, bins = 15

Figure notes: Figure 11 plots profit along with a fitted quadratic and its 95 percent confidence interval. Figure 11a usesthe entire sample, while Figure 11b uses the Imbens and Kalyanaraman (2012) procedure to choose the optimal bandwidth.Both use 15 bins on either side of the cutoff.

We next test this more formally. In particular, letting ε be the cut-off value for

mentors, we run the regression

πi = α + τDi + f(Ni) + νi (A.1)

where πi is profit, Di = 1 if individual i was chosen as a mentor (εi ≥ ε, f(Ni) is a

flexible function of the normalized running variable Ni = (εi − ε)/σε, and νi is the error

term. The parameter τ captures the causal impact of being chosen as a mentor. We use

local linear regressions to estimate the treatment effects on profit and inventory, along

with business practices of record keeping and marketing. The results are in Table 5, and

we find that being a mentor has no statistically significant effect on profits. Moreover,

there is no change in marketing or record-keeping practices, which one might associate

with ability. There is some evidence that inventory spending decreases, but it cannot

be statistically distinguished from zero. Overall, we find little evidence that entering

into a match changes either business scale or business practices for the more productive

member of the match. This is consistent with the max function in the forward equation

44

for ability (equation 2.1), which is assumed here and in much of the existing literature.

Table 5: Regression discontinuity results for matched firm treat-ment effect (Brooks et al., 2018)

Percent of IK Scale Practices

optimal bandwidth Profit Inventory Marketing Record

keeping

100 -503.18 -3105.87 0.01 0.02

(1321.82) (2698.11) (0.11) (0.18)

150 300.19 -2585.22 0.01 0.07

(1407.26) (2291.34) (0.09) (0.14)

200 322.09 -123.59 0.01 0.10

(1324.17) (1964.08) (0.08) (0.13)

Treatment Average 4387.34 8435.79 0.08 0.85

Control Average 1794.09 4039.20 0.13 0.63

Table notes: Statistical significance at 0.10, 0.05, and 0.01 is denotedby *, **, and, ***. Profit and inventory are both trimmed at 1 percent.

45

A.3 Parameter Targets and Values from Calibration

Table 6: Targets and Parameter Choices

Model Parameter Description Parameter Target Moment Source Target Model

Value Value Value

Group 1 From RCT

β Intensity of diffusion 0.538 Estimated parameter from regression (2.3) RCT results 0.538 0.538

ρ Persistence of ability 0.595 Estimated parameter from regression (2.3) RCT results 0.595 0.595

θ Match technology “quality” -0.417 Treatment effect in quarter 2 (as % above control) RCT results 0.403 0.403

Group 2 Matched one-to-one with parameter

δ Death rate of firms 0.016 Average age of baseline business owners Baseline survey 0.09 0.09

σ0 St. dev. of new entrant ability distribution 0.961 Variance of log profit among new entrants Baseline survey 1.00 1.00

ν Firm bargaining weight 0.50 Set exogenously – – –

Group 3 Jointly targeted

σ St. dev. of exogenous ability shock distribution 0.75 Standard deviation of log profit in all firms Baseline survey 0.99 0.99

c Growth factor in ability evolution -1.92 Ratio of average profit of all firms to new entrants Baseline survey 1.51 1.51

ω Consumption utility weight 0.53 Fraction of employment in wage work IPUMS 0.48 0.48

α Ability elasticity in supplier search 0.36 Standard deviation of log inventory cost Baseline survey 1.61 1.61

η Ability elasticity in supplier search 0.05 Average cost ratio Baseline survey 0.13 0.13

Table notes: Group 1 is jointly chosen from the experimental data. Group 2 are also set to match baseline data moments, but match 1-1 with target moments. Parametersin Group 3 are calibrated to jointly match moments.

46

A.4 Increasing θ to match RCT treatment effect

In the main text, we increase θ from θ = −0.417 to θ = −0.063. We replicate the results

here under a different quantitative experiment: we increase θ such that the policy change

induces the same partial equilibrium effect in the model as in our empirical results.

Specifically, we create a treatment and control group identical to our empirical RCT. We

then shock the treatment group by allowing them to draw from a new source distribution

M(z; θT ) instead of M(z; θ). We set θT such that running the RCT in the model delivers

an identical treatment effect to our empirical results.35 This implies that θ increases to

θ = 0.512.

Table 9 replicates Table 2 from the main text under this different quantitative exper-

iment.


(1) (2) (3)

Baseline Fixed Wage New Equilibrium

Income 1.00 1.35 1.59

Ability 1.00 1.38 1.67


Wage 1.00 1.00 1.76

Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3) report thenew stationary equilibrium after shocking the matching technology, where (2) holds the wagefixed at its baseline level and (3) allows it to adjust.

Overall, average income rises by 59 percent. When we decompose the aggregate

change, 57 percent of the total ability change comes from the direct effect on ability

(Column 2) and the remaining comes from the amplification through the wage. Similarly

for income, the direct effect accounts for 59 percent. The mechanisms for these changes

are discussed in the main text. Note, however, that these results point to a larger role

for price amplification (i.e., the price amplification accounts for 41 percent of the average

income gain here and 37 percent in the main text). This occurs because the induced

wage change is convex in the imposed θ change. Thus, the results point to a larger role

for price amplification as the aggregate policy change becomes larger.

35In the empirics, we let treated firms draw from the source distribution HT . The idea is the same, but does not require

the treatment source distribution to be in the same functional class as M .

47

B Examples of Different Matching Processes

In the main body of the paper, we provided two examples of matching processes that fall

under our assumptions, and we detail additional versions here.

B.1 Noise in the Imitation Process

An agent with ability z receives new arrivals of ideas that have two components: zm that

comes from a random match from another agent, and γ a random innovation on that idea.

Then z = γ1/θzm. Here, zm is a uniform draw from the distribution of productivities.

Then if γ has a cumulative density function given by Γ, then:

M(c) = Prob(z ≤ c) = Prob(zm ≤ cγ−1/θ) =

∫M(cγ−1/θ)dΓ(γ) (B.1)

B.2 Effort Choice and Bargaining

Each period, every agent characterized by ability z is matched to an agent that owns a

potential imitation opportunity zm as a uniform draw from the distribution of operating

firms M . The agent has an effort endowment of 1 that must be divided between imitation

and providing a utility benefit to the owner of the imitation opportunity zm. If z ≥ zm,

then no effort is put into imitation and z = z. If zm > z, then the agent and the owner

of the imitation opportunity must first agree on the distribution of effort, then the choice

of effort x and the values of z and zm together generate the value of z for the agent in

that period according to:

z =(zmz

)xz (B.2)

That is, by putting in more effort x ∈ [0, 1] the agent is able to close the gap between

their z and zm. The benefit to the owner of zm is given by the function b(x), which is

decreasing in x.

Agents and owners of imitation opportunities have one-off interactions and each re-

ceive 0 benefit if no agreement is made. They bargain over the assignment of the agent’s

effort between imitation and utility benefits for the owner of the imitation opportunity

according to a Nash bargaining problem where the bargaining weight of the agent is θ.

The bargaining problem is:

maxx∈[0,1]

([zmz

]xz)θb(x)1−θ (B.3)

48

Suppose that b(x) is given by b(x) = 1− x. Then it is easy to show that:

x = max

[0, 1− 1− θ

θ log(zm/z)

](B.4)

z = max[z, zme

1−1/θ]

(B.5)

As expected, the more bargaining power that the learning agents have, the greater is x,

resulting in greater z.

Note that, in the model, draws of imitation opportunities z < z are not useful. Hence,

the distribution M can be written, for any value c, as:

M(c) = Prob(z ≤ c) = Prob(zme1−1/θ ≤ c) = Prob(zm ≤ ce1/θ−1) = M(ce1/θ−1) (B.6)

or following the notation more standard in the paper:

∀z, M(z; z, θ) = M(ze1/θ−1) (B.7)

B.3 Deterministic Assignment

Here we consider a case where M arises when all agents can interact with one another

and sort into relationships endogenously. Suppose that every agent with ability z has the

option to influence any other agent that has ability z. Every agent can only be influenced

by one other agent each period, and they always prefer to be influenced by the highest

ability possible.

The utility of an agent with ability z influencing an agent with ability z is given by:

z

z− 1− 1

2θ

(z

z− 1

)2

(B.8)

That is, the agent with z gains benefit in proportion to how large the benefit is for

the other agent, but their cost is quadratic in the distance between their productivities.

For example, the influencer is happy when the other agent is helped by their influence,

but it takes more effort to influence when the distance between them is great. Therefore,

if there is a continuous distribution of z < z, the ideal agent that the influencer would

like to interact with has ability:

z∗(z) = z/(1 + θ) (B.9)

That is, the lower is the cost of influencing low ability firms, the deeper into the left tail

of the distribution is the agent willing to go.

However, since every agent can only be influenced by one agent each period and they

49

strictly prefer to be influenced by agents of higher ability, it is possible that (even if

the distribution is continuous) that the ideal agent for z is already matched to another

influencer. Therefore, intuitively, the probability distribution over assignment between

z and z is constructed by starting at the upper support of the distribution M , allowing

the highest ability firms to choose their most preferred matches, then descending down

through the distribution letting each firm choose to influence its preferred firm among

those remaining. Note that not all firms need have another firm to influence if their

utility from doing so be negative.

Formally, the probability distribution over imitation opportunities can be constructed

in the discretized case as follows, when the ability grid takes values z ∈ {z1, ...zN}, which

are ordered (i < j =⇒ zi < zj).

Define µ(z, z) as the measure of z influencing z (a N ×N matrix). We can construct

µ in the following steps given the measure µ of agents of each z type:

1. Let U(z, z) be the N × N matrix of utilities of z influencing z, and µ be a N × Nmatrix of zeros. Let µ be the N × 1 vector of unassigned influencers and µu be the

N × 1 vector of unassigned imitators. Set µ = µu = µ, n = N , and m = 1.

2. Let l be the m-argmax of U(·, zn). If U(zl, zn) ≤ 0, set µ(z1, zn) = µu(zn) and skip

to step 5.

3. If µ(zn) ≤ µu(zl), then µ(zn) = 0, µu(zl) = µu(zl) − µ(zn), and µ(zl, zn) = µ(zn).

Skip to step 5. Otherwise, go to 4.

4. If µ(zn) > µu(zl), then set µ(zl, zn) = µu(zl), µu(zn) = 0 and µ(zn) = µ(zn)−µu(zl).Set m = m+ 1 and return to step 2.

5. Set n = n− 1 and m = 1. If n = 0, go to step 6. Otherwise, go to step 2.

6. Set µ(·, z1) = µ(·, z1) + µu, and stop.

Given this matrix µ(z, z), the measure of assignments M is given by:

M(zi, zj) =

∑ik=1 µ(zj, zk)

µ(zj)

B.4 Cost to Receive a Match

Firms pay a cost to receive a uniform random draw from the productivity distribution.

We model this as a function f(s; θ), in that a firm that pays cost f(s; θ) receives a uniform

random draw with probability s. We assume that fθ > 0, so that the cost to achieve any

level of s is increasing in θ. In a stationary equilibrium, under standard conditions this

50

implies a stationary decision rule s(z,M∗; θ) with sθ < 0. We can write the distribution

of draws as

M(c; z, θ) = (1− s(z,M∗; θ)) + s(z,mM∗; θ)M∗(c)

≡ Q(c; z,M∗, θ)

Thus, in the stationary equilibrium of this economy, our same procedure goes through.

This example provides important context for our set of assumptions – they need not

be assumptions only on the primitives of the model. Additional assumptions, such as

stationarity, may guarantee the model is covered under our assumptions. The use of

stationarity here is similar to its role in the identification procedure of Jarosch et al.

(2020), who study learning among German co-workers.

51

C Additional Results: Extensions of Identification Procedure

In this Appendix, we focus on theoretical extensions of the main estimation procedure

in the paper to show that the procedure itself is robust to any number of extensions.

In Appendix E, we provide a quantitative evaluation of a particular extension related to

mis-measurement.

C.1 Semi-parametric identification

Assumption 1 laid out a function form for the law of motion of ability: z′(z, ε, z) =

ec+εzρ max{

1, zz

}β. We broaden this in Assumption 5 by replacing the max function,

Assumption 5. Given ability z this period, an imitation opportunity z, and a random

shock ε, ability next period z′ is given by

z′(z, ε, z) = ec+εzρf

(z

z

)(C.1)

Assuming f(z/z) = max{1, (z/z)}β gives us the original Assumption 1. Proposition 4

summarizes that we can instead estimate the function f using the same data-generating

process as in the main text.

Proposition 4. The data-generating process of Assumption 4 identifies (ρ, f) in equa-

tion (C.1) (while maintaining Assumptions 2 and 3 in the main text) by estimating the

regression

log(π′i) = c+ ρ log(πi) + f

(πiπi

)+ ε

This result follows directly from an established literature on partially linear regres-

sions. See, for instance, Yatchew (1997) on differencing estimators in this class of mod-

els.36 Hardle et al. (2000) provides a detailed review.

C.2 More general relationship between observables and z

In Assumption 2 we assume that some observable (e.g., profit) π has the characteristic

that π ∝ z. We extend that here.

Assumption 6. There exists a known function g : X → R++ that maps observable

characteristics x to ability z up to a potentially unknown constant of proportionality.

That is, g(x) = Cz for some potentially unknown constant C.

36The basic idea is that by ordering the data such that π1/π1 < π2/π2 . . . < πN/πN , once can difference out thenonlinear f in the limit under conditions that guarantee that the gap between i and i + 1 goes to zero. This allowsstraightforward OLS to estimate ρ. Then f can be estimated non-parametrically by any number of methods.

52

By assuming π ∈ x and g(x) = π, we recover the original assumption π ∝ z. But

Assumption 6 allows for more complicated possibilities. For example, one could estimate

a production function using the control group panel data. Such a procedure would imply

a mapping g(y, n, k) = Cz. That is, it takes output data y and input data for labor and

capital (n, k) (or any other input bundle) and infers the value z.37

Proposition 5. The parameters (β, ρ, θ) are identified when we replace Assumption 2

with Assumption 6.

This follows almost directly, as the regression

log(g(x′)) = c+ ρ log(g(x)) + β log

(max

{1,g(x)

g(x)

})+ ε,

is straightforwardly estimated with known g. By Assumption 6 this collapses to the

required equation that gives (β, ρ). The second step goes through with the same adjust-

ment. The key to our procedure is not the proportionality of any one variable with z,

but a proportional mapping between any set of observables and z.

C.3 Mis-measurement

We now assume that profit is mis-measured. This affects our estimation procedure,

introducing bias into the parameters. The regression error is not additively separable

from the true value in our non-linear model, which implies that standard instrumental

variable methods to correct linear measurement error no longer hold. Yet, there is a

substantial and active literature on mis-measurement in non-linear models that provides

a number of ways to overcome this issue.

We discuss this issue in the context of a more general ability law of motion, to

emphasize that it does not depend on the specific choices we have made on functional

forms. We provide a quantitative evaluation of these issues in Appendix E.

Assumption 7. The law of motion for diffusion can be written as

log(π′) =M∑j=1

βjgj(~π) + ε

where ~π = (π, π) for a known function (gj)Mj=1.

Note that we have already imposed the maintained ability to move between ability z

and profit π. The key additional assumption is an adjustment to Assumption 4, which

governs the type of data to which we have access.

37g is not the production function here, but is inferred from it.

53

Assumption 8. Profit is measured with error, and we observe two outcomes that are

mis-measured versions of the true value, ~π∗ = (π∗, π∗). We denote ~πk as the two entries

in the vector. We denote these observable values as (πk1 , πk2), k = 1, 2. The measurement

error is classical, so that for each individual i we observe

~πk1i = ~π∗ki + νk1i, k = 1, 2

~πk2i = ~π∗ki + νk2i, k = 1, 2

where ν1 and ν2 are unobserved disturbances. We assume the following relationships

between the measurement error and true values:

E[νk1 |π∗k, νk2 ] = 0, k = 1, 2

νk2 is independent from ~π∗, ν−k2 , where − k 6= k

The assumption of a repeated measurement opens up a suite of tools related to

repeated measurement adjustments in non-linear estimation. While other methods exist

to solve this problem, the existence of this approach has the benefit of almost always

being available in a firm-level survey.38

The basic idea behind this identification approach comes from Kotlarki’s lemma,

which in R1 is

φπ∗(t) = exp

(∫ t

0

E[iπ1eitπ2 ]

E[eitπ2 ]

)and φπ∗(t) is the characteristic function φπ∗(t) =

∫Reitπ

∗f ∗π(π∗)dx. An inverse Fourier

transform gives us the distribution of true values fπ∗ , which can be used to construct the

relevant estimator moments.

Our model requires ~π∗ = (π∗, π∗) ∈ R2, which introduces some complications. The

second part of Assumption 8 provides necessary conditions for identification in a multi-

dimensional nonlinear model.

Proposition 6. If E[|~πk|] and E[|ηk1 |] are finite, then there exists a closed form for any

function E[u(~π∗, β)] whenever it exists.

Proof. Provided in Schennach (2004). �

Schennach (2004) provides the details of the approach and how to develop an estima-

tor from Proposition 6. The key feature, however, is that this result allows a broad class

of extremum estimators to be deployed to identify β.

38For example, π1 could be profit asked directly while π2 could be measured as revenue minus costs. Another would beto use the same variable measured at two points in time, as highlighted by Schennach (2020). Finally, many models allowrelationships between input expenditures, revenue, and profit that could be utilized, albeit with more structure than wehave here. An example is a Cobb-Douglas production function with competitive factor markets.

54

C.4 Additional characteristics

One might also suspect that alternative characteristics influence learning. For example,

firm owners may retain more ability when meeting with another owner of a similar age.

This amounts to allowing β to depend on a set of characteristics of the firm owner x and

her match x. In this case, we can write

log(π′i) = c+ ρ log(πi) + β(x, x) log

(max

{1,πiπi

})or, binning characteristics X× X in some way,

log(π′ib) = c+ ρ log(πib) +B∑b=1

βb log

(max

{1,πibπib

})This regression identifies (ρ, β1, . . . , βB) under the same assumptions as in the main text.

The second step follows with a slight adjustment to Assumption 3. We assume there is

a discrete distribution over types Γb and, with a slight abuse of notation, re-write the

draws over types and profit as

M(z, b; z, θ) = Mb(z; z, θ)Γb

As long as Mb has the same properties as Assumption 3, the second step of the procedure

similarly identifies θ.

55

D Additional Results: Social Planner’s Problem

Here, we lay out the solution to the social planner’s problem, and show that it similarly

relies on properly measuring the intensive and extensive margins.

Before doing so, one conceptual issue to deal with is the role of suppliers. We exclude

them from our measure of welfare. In our context, these suppliers take their profits out of

Dandora, so that buying intermediates does indeed involve resources exiting the economy.

To operationalize this idea, we assume that the price paid follows the same solution as

the Nash bargaining problem solved in Proposition 3, in that the price px is given

px =

(1− η − ν(1− η − α)

α

)e−sz

α+η−1α .

We will write px(z, s) to denote this price.39

With those details, we are ready to define the planner’s problem. The social planner

allocates occupations o, firm inputs (x, n) and supplier search intensity s for each agent

to maximize ex ante utility, subject to the relevant aggregate resource constraints. Since

we will focus on the stationary equilibrium, we drop the dependence of the decision rules

on the aggregate state M for some notational simplicity. Defining the value to an agent

with ability z as

v(z) = ω log(c(z)) + (1− ω) log(1− s(z)) + (1− δ)∫ε

∫z

v

(ec+εzρ max

{1,z

z

}β)dM(z)dF (ε),

we can write the planner’s problem recursively as

maxo(·),c(·),x(·),n(·),s(·)

∫ ∞0

v(z)dM(z) (D.1)

s.t.

∫o(z)=1

(x(z)αn(z)η − px(z, s(z))x(z)

)dM(z) =

∫ ∞0

c(z)dM(z) (D.2)∫o(z)=1

n(z)dM(z) =

∫o(z)=0

dM(z) (D.3)

M(z′) = δG(z′) + (D.4)

(1− δ)∫ ∞

0

∫ ∞0

F (log(z′)− ρ log(z)− β log(max{1, z/z})− c)dM(z)dM(z)

M(z;M) =

( ∫ z0φ(z,M)dM(z)∫∞

0φ(z,M)dM(z)

) 11−θ

. (D.5)

39There is nothing conceptually difficult about including these suppliers in the measure of welfare. Our goal is only toremain faithful to the economic environment from which the empirics are derived. A further benefit of this assumption isthat it focuses attention on the role of the diffusion externality, as opposed to other types of inefficiencies that arise fromthe bargaining protocol.

56

While complicated looking, this problem has a straightforward interpretation. The

planner’s objective is to maximize the expected value of v. The first two constraints

are the aggregate resource constraints – (D.2) determines the resources that can be al-

located to consumption, while (D.3) equalizes labor supply and demand. The latter

two constraints show that the planner internalizes how her decisions affect the evolution

of the aggregate state (via D.4) and the imitation opportunities that arise from it (via

D.5). These constraints highlight the planner’s ability to overcome the diffusion exter-

nality, in that she takes into account the implications of occupational choice on learning

opportunities in a way that individual agents do not.

We measure the difference in welfare between the allocation chosen by the planner and

the baseline laissez faire equilibrium in consumption-equivalent terms. That is, we ask

by what percentage we would we have to increase each agent’s consumption in every state

and time period to equalize average utility between the two economies. This difference

is our measure of the aggregate importance of diffusion.

While the planner’s problem does not admit a closed-form solution, we can derive

some implications for comparison to the baseline laissez faire equilibrium.

Proposition 7. The planner chooses a cutoff rule for occupations, z, such that all z ≥ z

operate firms. Consumption cp is constant across agents and given by the constant returns

to scale aggregate production function cp = ANη

1−αs Z

1−α−η1−α , where A is a constant and the

two aggregate inputs are

Ns =

∫ z

0

dM(z) , Z =

∫ ∞z

z exp(s(z))α

1−α−η dM(z).

Moreover, if s(z) > 0 (which we verify at our estimated parameters), s(·) is a strictly

increasing and concave function of the form

s(z) =

(1− α− η

α

)W0

[(α

η + α− 1

)exp

(α

η + α− 1

)q(z/z, s(z))

]+ 1 ∀z ≥ z,

where W0(·) is the principal branch of the Lambert W function and q(·, ·) is a function

of relative ability z/z and supplier search at the cut-off value, s(z).

Proof. The proof is at the end of the Appendix, in Appendix Section G. �

One can see the goals of the planner in Proposition 7. Like in the baseline, the planner

uses a cut-off rule to define occupations. As we show below, she chooses fewer firms than

the baseline equilibrium, a function of the diffusion externality in the baseline economy.

Furthermore, she shifts the search for suppliers away from relatively low ability agents.

Instead, she takes advantage of complementarity between ability and supplier search

57

effort. This allows higher ability agents to procure resources that can be redistributed

to all agents. These incentives are naturally absent in the baseline equilibrium, where

individuals consume their income.

In terms of our inability to push further theoretically, the properties of W0 preclude

a closed form solution.40 We derive these results in Appendix G.4, and proceed with

quantitative results in the next section.

D.1 Quantitative Implications

Our interest here is understanding the importance of separately identifying the intensive

and extensive margin parameters to understand the welfare gains in the social planner’s

problem.

We follow a similar (but slightly simpler) approach to the main text. We exogenously

vary β, then re-estimate θ to continually match the same average treatment effect. Thus,

the average treatment effect is still used as an estimating moment, but our first stage

regression (run within the treatment group) is not.

Throughout, we hold the persistence ρ fixed for simplicity. Figure 12 plots the implied

value of θ and the consumption-equivalent welfare gains, the latter of which is normalized

to one at our estimated value.

The welfare implications are in Figure 12b. We focus first on the solid line. This is our

baseline counterfactual in which θ is re-estimated to achieve the same average treatment

effect. The differences here can be substantial. Increasing β from 0.25 to to our estimated

value 0.538, for example, increases the consumption equivalent welfare gain by 59 percent.

Perhaps more surprisingly, overestimating the intensive margin forces can similarly bias

downward welfare gains. Assuming β = 0.90 instead of 0.538 lowers the implied welfare

gains by 40 percent. We detail to the economic forces governing these trade-offs in the

next section. For now, however, the results show that understanding effecient welfare

gains depends critically on separately identifying the parameters governing the extensive

(here, θ) and intensive (here, β) margins, as our procedure does.

D.1.1 Understanding Forces Behind This Pattern

To better understand the forces behind the pattern above, we also re-estimate the welfare

gains while holding θ fixed at its baseline value. This is the dashed line in Figure 12b. The

limited difference between the two lines shows that the welfare gains are primarily driven

40The Lambert W function is the inverse of F : x 7→ x exp(x), and we can restrict attention to the princi-pal branch. The main issue for our purposes is that the only analytical characterization of W0 is the power seriesW0(a) =

∑∞n=1((−n)n−1/n!)an, which is of little help in attempting to attain a closed form for the planner’s problem

here. Put slightly more technically, the supplier search function s is the solution to a delay differential equation in z, butthis solution prevents an analytic characterization of the initial condition s(z).

58

Figure 12: Importance of Separately Identifying Two Diffusion Effects

(a) Implied θ

0 0.2 0.4 0.6 0.8 1-10

-8

-6

-4

-2

0

(b) Consumption-Equivalent Welfare

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

Figure notes: The left panel shows the implied θ required to estimate the same treatment effect observed in the data atthe exogenously given β. The dashed vertical line the minimum value of β for which there exists a θ that can rationalizethe observed average treatment effect. This is related to the discussion of Γmin defined in Proposition 2. θ → −∞ as βapproaches this value from above. The right panel shows the implied consumption-equivalent welfare, normalized to oneat our estimated parameters. Our estimated values are denoted by the circle in each graph.

by the direct effect of β, and not the implied differences in θ.41 Thus, understanding the

main results above primarily requires understanding the impact of changing β. We study

that here.

A useful starting point is to re-write total resources in the planner’s allocation in terms

of semi-elasticities. Recalling the aggregate production function defined in Proposition

7, we can then define the semi-elasticity of consumption with respect to the occupational

cut-off, εcp := (∂cp/∂z)/cp, as the sum of the aggregate input elasticities,

εcp =

(η

1− α

)εNs +

(1− α− η

1− α

)εZ . (D.6)

Equation (D.6) tells us the consumption change induced when the planner slightly shifts

the occupational cut-off. There are two components to the consumption gains. The

first is static – holding the ability distribution fixed, increasing the cut-off mechanically

increases labor supply. This positively affects εNs and negatively affects εZ . But key in

this model is that shifting z also affects the ability distribution M through diffusion. This

is the dynamic effect on production, as the mass of the population in each occupation

changes as M changes.42 Figure 13 plots the semi-elasticity εcp , along with those of the

41The difference between the two is driven by the fact that the welfare gains are declining in θ.42It is straightforward to show that each of these semi-elasticities can be decomposed into a static and dynamic term,

as εj = εSj + εDj for j = Ns, Z. The static reallocation across occupations plays almost no quantitative role here. Thus,when interpreting the results here, one should think of them as driven by the dynamic diffusion effects.

59

two aggregate inputs defined in (D.6). They are evaluated at the baseline equilibrium

cut-off.

Figure 13: The Semi-Elasticity of Consumption for the Social Planner

(a) Semi-Elasticity of Consumption, cp

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4(b) Components

0 0.2 0.4 0.6 0.8 10

0.2

0.4

0.6

0.8

1

1.2

1.4

Figure notes: These semi-elasticities are evaluated at the baseline equilibrium cut-off z∗ and equilibrium distributionM∗(z; z∗).

The first thing to note about Figure 13a is that the elasticity is positive for any

value of β. Thus, no matter the parameters, the planner can increase consumption by

transition some baseline entrepreneurs to wage work. This is entirely a function of the

diffusion externality. Moreover, as expected, it follows a similar shape to the overall

welfare gains. Finally, Figure 13b shows that this is in large part driven by εZ . That is,

the consumption response is primarily driven by the changes to ability in the economy.

Why, then, does εZ take this shape? This is at the heart of understanding how

the overall welfare gains vary with the critical parameter β. Figure 14a provides a me-

chanical rationale: the numerator of εZ is approximately linear while the denominator

is substantially more convex. But both of these curves have natural economic interpre-

tations. The numerator, ∂Z/∂z, measures the change in economy-wide ability from a

marginal increase in the cut-off. It is monotonically increasing – the higher β, the larger

the potential impact on ability. This means that all else equal, a higher β induces a

larger increase in consumption for the planner.

But the welfare gains depend on how that composite ability compares to the baseline

equilibrium. This baseline ability is given by Z in Figure 14a, and also depends on β.

In particular, higher β increases the gains from a good match. This increases ability and

thus drives up the return to labor, which increases the wage. The higher wage induces

60

more wage workers, as Proposition 3 shows that the baseline cut-off has the feature

z ∝ w1−α

1−η−α (Figure 14b shows it graphically). These same steps are then compounded

in equilibrium. With fewer, more able firms, diffusion accelerates even faster as agents

further increase ability. The stationary equilibrium wage then takes the convex share of

Figure 14a.

Together, these results highlight the two competing forces in the model, both of which

can be seen in Figure 14. At low levels of β, the matching technology limits the aggregate

welfare gains. At high levels of β, we see similar welfare gains, but for a different reason.

Here, standard equilibrium forces already accomplish most of what the planner would

want to accomplish. While the baseline economy is clearly richer, the returns to additional

intervention by the planner at high β are low. We summarize these forces in Figure 14c,

which plots average consumption across the baseline and efficient allocations. Consistent

with these results, average baseline consumption grows more slowly than the efficient

consumption at low β, then more quickly at high β.

Thus, these two competing economic forces – the race between technology and equi-

librium prices in generating diffusion – are critical in generating the non-monotonicity

observed in the headline results. Moreover, they are governed are the diffusion parameters

we estimate. Thus, estimating these parameters plays an important role in understanding

the aggregate implications in the economy.

Much like our main results, the planner’s problem shows that separating the extensive

and intensive margin is critical to understanding the gains from policy, whether they be

a change to the matching technology (as in the main text) or the fully efficient planner’s

allocation (here).

61

Figure 14: Equilibrium Forces and the Pattern of εZ

(a) Decomposing εZ

0 0.2 0.4 0.6 0.8 1

β

0

2

4

6

8

10

12

∂Z/∂z (numerator of εZ)Z (denominator of εZ)

(b) Wage w and Cut-off Ability z in the Baseline Equilibrium

0 0.2 0.4 0.6 0.8 1

β

0

0.5

1

1.5

2

2.5

3

3.5

4

w

z

(c) Average Consumption

0 0.2 0.4 0.6 0.8 1

β

0

0.5

1

1.5

2

2.5

3

3.5

4

Baseline

Efficient

Figure notes: In Figures 14b and 14c, our estimated values are denoted by the circle in each graph and normalized to one.

62

E Additional Results: Quantitative Implications of Mismea-

surement

We explore the quantitative consequences of mis-measuring profit here. We make the

following assumption: for all individuals, we observe π = τπ∗, where τ ∼ N(0, στ ) is

classical measurement error (in logs), π is observed profit, and π∗ is true profit. We

assume that τ ∼ N(0, στ ), where στ is known but the individual realizations are not. As

discussed in Appendix C this can be extended to estimate the distribution of τ . We leave

this aside for simplicity here. Define the function

g(π∗, π∗; Γ) = c+ ρ log(π∗) + β log

(max

{1,π∗

π∗

})where Γ := (c, ρ, β) is the parameter set. Our first stage regression is log(π′∗) =

g(π∗, π∗; Γ), but is complicated by the mis-measured profit on the right hand side of

this equation. Here, we re-estimate this regression with mis-measured profit and show

how the results change.

E.1 Estimation Procedure

Some notation is required. For any variable x, define x = log(x), fx(x) as the probability

density function, and φx(t) =∫Reitxfx(x)dx as its characteristic function.

We first estimate the characteristic functions of the observed π and π,

φπ(t) =

(1

n

n∑j=1

eit log(πj)

)φk,π(hπt)

φ˜π(t) =

(1

n

n∑j=1

eit log(πj)

)φk,π(hπt)

The first term in parenthesis is the empirical characteristic function using mis-mesaured

variables (πj, πj). The latter term, φk(ht), is the Fourier transform of a kernel density

estimator with bandwidth h.43

Since φπ∗(t) = φπ(t)/φτ (t) and similarly for π (due to the independence of π and π),

we recover the densities of π and π from an inverse Fourier transform,

fπ∗(π∗) =

1

2π

∫φπ∗(t)e

−itπ∗dt

fπ∗(π∗) =

1

2π

∫φπ∗(t)e

−itπ∗dt

43A well-known issue with this type of estimation is the inaccuracy of the empirical characteristic function in the tails ofthe distribution. A kernel density estimate is one version of what is generally referred to as a dampening factor to improveaccuracy. The fact that φk enters multiplicatively follows because a kernel estimator is also a type of convolution.

63

where π in the ratio 1/(2π) should be understood to be π ≈ 3.14 instead of profit (as to

not introduce any additional notation).

With the true distribution functions, we can estimate our regression in any number

of ways. We use the minimum distance estimator proposed by Hsiao (1989). That is, we

choose parameters Γ := (c, ρ, β) to solve

minΓ

n∑i=1

(π′i −G(πi, πi; Γ))2

(E.1)

where

G(π, π; Γ) =

∫ ∫g(π∗, π∗)fπ∗|π(π∗|π,Γ)fπ∗|π(π∗|π,Γ)dπ∗dπ∗

The minimum distance estimator in (E.1) is also extended to unknown error distri-

butions by Li (2002) using the repeated-measurement framework discussed in Appendix

C.

E.2 Updated Calibration

After estimating the diffusion parameters, we update the calibration to take these values

into account. The updated parameters are listed in Table 8, along with the baseline for

comparison. We do so in two scenarios: στ = 0.30 and στ = 1.

Table 8: Updated Parameter Values

Model Description Parameter Parameter Parameter

Parameter (Baseline) (στ = 0.3) (στ = 1)

Exogenously varied:

στ St. dev. of distortions 0 0.3 1.0

Group 1

β Intensity of diffusion 0.538 0.629 0.883

ρ Persistence of ability 0.595 0.371 0.494

θ Match technology “quality” -0.417 -0.326 -0.179

Group 2

δ Death rate of firms 0.016 0.016 0.016

σ0 St. dev. of new entrant ability distribution 0.961 0.961 0.961

ν Firm bargaining weight 0.5 0.5 0.5

Group 3

σ St. dev. of exogenous ability shock distribution 0.75 0.74 0.66

c Growth factor in ability evolution -1.92 -2.23 -2.41

ω Consumption utility weight 0.53 0.54 0.62

α Ability elasticity in supplier search 0.36 0.36 0.36

η Ability elasticity in supplier search 0.05 0.05 0.05

Table notes: Group 1 is jointly chosen from the experimental data. Parameters in Group 2 are calibrated tojointly match moments. Group 3 are also set to match baseline data moments, but match 1-1 with targetmoments. Both are set to match the same set of moments discussed in the main text (see Table 6 for details).

64

E.3 Quantitative Exercise

We now study the gains from the same exercise as in the text. The main results are in

Table 9. The first column fixes the wage at its baseline level, isolating the impact of the

changing ability distribution. The second column allows the wage to adjust, adding in

the additional general equilibrium effect on prices.


στ = 0.3 στ = 1

(1) (2) (3) (4)

Fixed Wage At-Scale Fixed Wage At-Scale

Income 1.08 1.12 1.20 1.38

Ability 1.08 1.14 1.20 1.42

Aggregate Labor Supply 0.92 0.99 0.89 1.00

Wage 1.00 1.14 1.00 1.39

Table notes: All are measured relative to the baseline equilibrium at the give value of στ . Each columnreports the new stationary equilibrium after shocking the matching technology, where the first (columns1 and 3) holds the wage fixed at its baseline level and the second allows it to adjust.

Overall, by biasing our parameter β toward zero, our results are a lower bound on the

gains at-scale. Figure 15 shows that the same results on the many-to-one relationship

between the ATE and at-scale gains holds.

Figure 15: Range of Aggregate Gains for each ATE

(a) στ = 0.3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

(b) στ = 1

0 0.1 0.2 0.3 0.4 0.5 0.6 0.70

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Figure notes: Shaded area is all possible aggregate gains for (β, ρ) ∈ [0.20, 0.95]× [0.20, 0.95], where θ is chosen to matchthe ATE on the horizontal axis. The baseline estimate is given by the dashed line.

65

F Additional Results: Using the RCT of Cai and Szeidl (2018)

F.1 RCT Details

In an innovative recent paper, Cai and Szeidl (2018) (CS hereafter) conduct an RCT

among 2,820 Chinese firms. Treated firms (1,500 of 2,820) are randomly placed into

a group of approximately 10 other firms, then compared to a control group with no

prearranged meetings.

While similar in style to the RCT used in the main body of the paper, there are a

number of differences. First, these are group meetings instead of individual meetings.

Second, the meetings are substantially more intense: firms are expected to meet monthly

for a year.44 Third, these firms are larger. At baseline the average firm has 35 workers.

Finally, they cross-randomize information about new financial products to track the

diffusion of information directly.

Firms are surveyed 3 times: before the treatment, 1 year post-treatment (i.e., at the

end of the treatment period), and 2 years post-treatment. We refer to these as t = 0, 1, 2.

We estimate our procedure off the t = 1 data and later ask whether we can match the

effect at t = 2.

F.2 Overview of Empirical Results

While we will not do justice to the full set of results provided in CS, we provide a brief

overview here to relate them back to our baseline RCT in the main text and provide

context for modeling decisions. Like our baseline results, CS find important effects on

firm performance. In a regression of the form

yit = λ0 +

≡ time effects︷︸︸︷λ1 · 1t=1 + λ2 · 1t=2 +

≡ per-period ATE︷︸︸︷λ3 · (1t=1 × Tit) + λ4 · (1t=2 × Tit) +FirmFEi + εit

they find positive and statistically significant effects at t = 1 (i.e., λ3 > 0) for sales, profit,

employment, and productivity. These meetings do indeed seem to leave firms better off.

Unlike our baseline RCT, however, these effects persist at t = 2, a full year after the

treatment concludes (i.e. λ4 > 0).45

A number of additional results provide context for these firm performance effects.

First, they find persistent changes in management practices.46 Thus, direct components

of firm-level productivity increase. Second, they cross-randomize information on a new44Take-up is still high, with average attendance of 87 percent.45Productivity here is measured as firm-level TFP from estimating a revenue production function among control firms.

This is the only outcome of those listed in which the point estimate at t = 2 is statistically insignificant, but it is stillpositive. We read less into this result, as it is likely the most noisily estimated.

46These include an overall management index, along with components on the evaluation and communication of employeeperformance, the setting of targets, process documentation, and delegation of power.

66

financial product to test whether information is indeed flowing between firms within the

group, and find that it does. We (and CS) take this as a direct measure that information

is flowing between firms in the group.

Finally, CS provide one measure of group-level heterogeneity, asking whether treated

firms who have larger average group members enjoy a larger treatment effect. Denoting

nmi as the average firm size of firm i’s matches, they run

yit = λ0 + λ1 log(nmi ) + εit for i in the treatment group. (F.1)

They find statistically significant increases in sales and profit. CS use (F.1) as an “internal

consistency check,” in the sense that most reasonable theories would predict λ1 > 0. As

is hopefully clear at this point, this type of regression has an additional role: it is a

critical test for extrapolating these results to at-scale implications. We will exploit this

regression below in our estimation.

F.3 Model

The empirical setting and results motivate our model structure. Because these are larger

firms, we we study this RCT in the context of a more classic Hopenhayn (1992) style

model. Given the results on productivity and management practices, we allow diffusion

of firm productivity directly, instead of the ability to seek out suppliers. This assumption

is more in line with the existing growth literature (Lucas, 2009; Perla and Tonetti, 2014,

and many others). Finally, we construct our learning process to conform to the available

empirical results, which we discuss more below.

Basics: Production and Households The model period is one year (the length of the

exogenous matches in the RCT). There is a measure one of firms, each of which produces

according to the production function yt = z1−α−ηt nαt k

ηt , where zt is firm-level productivity,

nt is labor, and kt is capital. Capital and labor are traded on a competitive spot markets

with prices rt and wt. Each period δ firms exit and are replaced by δ new firms, who

draw initial productivity from z ∼ G(z).47 As in the main text, firms draw idiosyncratic

shocks ε ∼ F and imitation shocks z ∼ M .

A representative household with flow utility u(Ct) provides labor and capital, and

47In the more classic Hopenhayn (1992) or Melitz (2003) sense, the model can be easily extended to include endogenousfirm entry/exit by introducing fixed costs, as opposed to our assumed exogenous entry/exit margin. Adding this additionalmargin does not change the results we focus on here and we therefore exclude it for simplicity. This assumption also doesnot affect identification. Our procedure relies only on firms in operation at a given point in time, so that the details ofpast entry are immaterial.

67

owns all firms. Its utility is given by:

max{Ct,Kt+1}≥0

∞∑t=0

(1− δ)tu(Ct)

s.t. Ct +Kt+1 − (1− λ)Kt = wt + rtKt + Πt

K0 given

where Πt is the aggregate profits from all operating firms and λ is depreciation rate of

capital. We assume the household discounts at the firm exit rate for simplicity.

The aggregate state of the economy is the distribution of firm-level productivity,

M(z), and the aggregate capital stock, K.

Learning and Diffusion Given the high cost-effectiveness of the program, CS posit a

number of possibilities for why firms did not self-organize these meetings. These include

search costs, trust barriers and lack of familiarity with other managers, and a free-rider

problem in which managers expect others to pay the cost of organization. Motivated

by these “missing” meetings, we set up a source distribution in which a firm receives no

meetings with probability 1− θ. With probability θ, the firm joins a group of exogenous

size N . These N firms are uniformly random draws from the equilibrium productivity

distribution M . We denote z1, . . . , zN to be the productivities of the N matched firms.

We next define a match z in this context, which in general can take any function over

the characteristics of these N firms. Here we are constrained by the results reported in

CS, who report how the treatment effect varies by only the average size of the N firms.

As such, we assume that z =∑N

i=1 zi/N so that we can use their provided moment.

If a firm does not receive a match, it gets z = 0.48 We can therefore write the source

distribution as

M(z) = 1− θ + θQ(z),

where Q(·) is the N -draw sampling distribution of the mean derived from the equilibrium

productivity distribution M .

Finally, we note that within treated firms, there is no differential effect between firms

with z > z or z < z. Therefore, we remove the max operator and assume that the law of

motion takes the form

zt+1 = ec+εtzρt

(1 +

ztzt

)β. (F.2)

Equation (F.2) shows that if the firm does not interact that period (Pr = 1 − θ), it

48This is of course not a critique of the extremely useful publicly-provided dataset of CS. We note this only to highlightthat there is no theoretical benefit to making this assumption, and we do so because it is the only estimating moment ofgroup-level heterogeneity available in their public data.

68

gains nothing from its zt = 0 draw. Conditional on meeting (Pr = θ), however, there

will be gains from doing so. Those gains are increasing in the average productivity of

matched firms.

Taken together, the firm’s problem can be written as (with the aggregate state sup-

pressed, as we will focus on a stationary equilibrium):

v(z) = maxn,k≥0

z1−α−ηnαkη − wn− rk + (1− δ)∫ε

∫z

v(z′(z, ε; z))M(dz,M)dF (ε)

s.t. z′(z, ε; z) = ec+εzρ(

1 +ztzt

)βEquilibrium The stationary equilibrium of this economy is an invariant distribution

M∗(z), household decision rules C, K ′, firm decision rules n, k, and value function v

such that the household’s and firms’ value function solves their respective problems with

the associated decision rules, markets clear

1. labor market:∫zn(z)dM∗(z) = 1

2. capital market:∫zk(z)dM∗(z) = K

3. consumption market:∫zz1−α−ηnαkηdM∗(z) = C + λK

and the relevant aggregates are consistent

1. profit:∫zπ(z)dM∗(z) = Π

2. law of motion for ability:

M ′ := Λ(M(z′))

= δG(z′) + (1− δ)∫ ∞

0

∫ ∞0

F (log(z′)− ρ log(z) + β log(1 + z/z)− c)dM(z)dM(z)

with M∗(z) = Λ(M∗(z)).

F.4 Estimating Diffusion Parameters

We begin by using our two-step procedure to estimate the key diffusion parameters

(β, ρ, θ).

The production function here generates the usual Cobb-Douglas result that inputs

labor and capital, sales, and profit are all proportional to z. Therefore, we are free to

use any of these moments for identification (see also Appendix Section C.2 for a further

generalization).

69

Averages are also proportional to z. If a firm matches with N firms of size n1, . . . , nN ,

then we can infer the average productivity as

n :=

∑Nj=1 nj

N=(αw

) 1−η1−α−η

(ηr

) η1−α−η

∑Nj=1 zj

N∝ z. (F.3)

Given these results, we focus on firm size here for our estimation. As discussed above,

it is the only moment available in the public CS data for the first step of our estimation

procedure. Rather than introduce a second dependent variable, we use firm size in the

second step as well.

Following our procedure in the text, we will estimate the diffusion parameters using

only t = 0, 1 data then later check whether the time series matches t = 2 outcomes.

Thus, the first step of our procedure is to estimate

log(n′i) = c+ ρ log(ni) + β log

(1 +

nini

)+ ε for all i in treatment (F.4)

where ni is the average size of matched firms as defined in (F.3). Given (ρ, β) we then

estimate θ with the same procedure as the main text.

minθ

abs

(E[n′T ]

E[n′C ]−∫ ∫

πρ (1 + n/n)β dHT (n)dHT (n)∫ ∫nρ (1 + n/n)β dM(n; θ)dHC(n)

)(F.5)

where HC , HT are the empirical baseline distributions of control and treatment firms, M

is the source distribution for control firms, and HT is the source distribution for treated

firms (given exogenously by the empirical implementation). We measure the empirical

ratio in (F.5) as the average treatment effect

log(n′i) = λ0 + λ1Ti + Controlsi + νi (F.6)

where Ti = 1 if i is in the treatment group. Table 10 provides the regression estimates

for (F.4) and (F.6).

Column (1) show that the average firm size is indeed a good predictor of treatment

effect magnitude. The ATE in column (2) then implies θ = 0.199. That is, the model

infers that it is quite unlikely that such a CS-style group would be created without the

intervention.

F.5 Remaining Calibration

We calibrate the remaining model to target parameters similar to the main text, using

CS data and other relevant moments. We choose 5 parameters that match directly to

70

Table 10: Identification Moments

(1) (2)

β 0.276(0.053)***

ρ 0.955(0.028)***

Treatment 0.076(0.044)*

R2 0.764 0.395Baseline Control Avg – 2.694

Table notes: Standard errors are in parentheses. Statistical significance at 0.10, 0.05, and 0.01 is denoted by *, **, and, ***.

their moments. These include a 9 percent firm death rate (δ = 0.09) to match exit rates

in China (e.g. Lu, 2021). The standard deviation of new entrant ability matches the

standard deviation of log profit for firms that have been open for less than 1 year, which

implies σ0 = 1.23. The depreciation rate is set to a standard value of λ = 0.06. Finally,

we have the Cobb-Douglas exponents on capital η and labor α. We set η = 0.20 to match

the median firm’s baseline capital-output ratio (rk)/y = 0.20 then set α to match the

median firm’s profit-sales ratio, π/y = 0.12. This implies α = 0.68. Finally, we set the

standard deviation of the exogenous ability shock to σ = 0.45 and the productivity drift

c = −1.11. These two parameters jointly match the standard deviation of log profit in

the economy and the ratio of average profit of all firms relative to those with less than 1

year of operation. Parameters and moments are in Table 11.

71

Table 11: Targets and Parameter Choices

Model Parameter Description Parameter Target Moment Source Target Model

Value Value Value

Group 1 From RCT

β Intensity of diffusion 0.276 Estimated parameter from regression RCT results 0.276 0.276

ρ Persistence of ability 0.955 Estimated parameter from regression RCT results 0.955 0.955

θ Match technology “quality” 0.199 Treatment effect at t = 1 RCT results 0.076 0.076

Group 2 Matched one-to-one with parameter

δ Death rate of firms 0.09 Average exit rate in China Literature (Lu, 2021) 34 34

σ0 St. dev. of new entrant ability distribution 1.23 Variance of log profit among new entrants CS Baseline 1.23 1.23

α Cobb-Douglas share, n 0.68 Median firm π/y CS Baseline 0.12 0.12

η Cobb-Douglas share, k 0.20 Median firm (rk)/y CS Baseline 0.20 0.20

λ Depreciation rate 0.06 Literature – – –

Group 3 Jointly targeted

σ St. dev. of exogenous ability shock distribution 0.45 Standard deviation of log profit in all firms CS Baseline 1.34 1.34

c Growth factor in ability evolution -1.11 Ratio of average profit of all firms to new entrants CS Baseline 1.12 1.12

Table notes: Group 1 is jointly chosen from the experimental data. Group 2 are also set to match baseline data moments, but match 1-1 with target moments. Parameters inGroup 3 are calibrated to jointly match moments.

72

F.6 Treatment Effect Persistence

In Section 5.3, we showed that the treatment effect persistence is increasing in ρ and de-

creasing in β. In our baseline Kenyan RCT we estimate (β, ρ) = (0.538, 0.595). Here, we

estimate (0.276, 0.955). Our estimates of both β and ρ predict more persistent treatment

effects than our baseline model.

We compare this to the empirics in Figure 16. We plot 3 time paths: the empirical

treatment effect (which is available for 2 years post-treatment), the estimated model

effect, and the estimated model effect when we assume our Kenyan RCT values for β

and ρ. We extend the latter two series for 5 years to trace the dynamics over a longer

horizon.

Figure 16: Dynamics of Average Treatment Effect (Firm Size)

0 1 2 3 4 5-0.1

0

0.1

0.2

0.3DataModelKenyan RCT params

Figure notes: Dashed line is the data with 95 percent confidence interval, for which data is available at t = 0, 1, 2. Solidline is the estimated model (β, ρ, θ) = (0.276, 0.955, 0.199). For comparison, we include the results at our Kenyan RCTvalues of β and ρ, re-estimating θ to hit the t = 1 ATE, which implies (β, ρ, θ) = (0.538, 0.595, 0.496). We extend themodel-derived RCT for 5 post-treatment years to study fade-out.

The model is consistent with the persistent gains.49 Even 5 years post-treatment, the

model predicts that 89 percent of the initial benefits remain. In comparison, if we replace

β and ρ by our estimated values in Kenya, the fade-out is more substantial and nearly

complete by t = 4. The results highlight the importance of estimating these parameters

in governing the time series of the treatment effect.

F.7 Quantitative Gains at Scale

We conduct the same exercises as the main text. To measure the aggregate implications,

we permanently shock the matching technology to increase average match quality, in

line with the extensive margin focus of the RCT results. We do so by shocking the

49The slight increase observed in the treatment effect from t = 1 to t = 2 is statistically insignificant by any reasonablecutoff.

73

parameter θ so that it is 25 percent closer to its limit of θ = 1. We study the new

stationary equilibrium and compare it to the baseline equilibrium. Aggregate moments

are reported in Table 12. We present two steady states, both of which operate under the

new matching function. The first (in column 2) fixes the wage at its baseline level. The

second (in column 3) allows the wage to adjust as well.


(1) (2) (3)

Baseline Fixed Wage At-Scale

Income 1.00 1.13 1.05

Ability 1.00 1.39 1.39


Wage 1.00 1.00 1.05

Table notes: Column (1) is the initial equilibrium, normalized to one. (2) and (3)report the new stationary equilibrium after shocking the matching technology,where (2) holds the wage fixed at its baseline level and (3) allows it to adjust.

The direct effect of making it easier to learn from high ability agents is that average

ability rises by 39 percent and income by 13 percent. Some of that is eliminated by

the higher general equilibrium wage, with the net effect of a 5 percent increase in total

household income.

We next ask the importance of measuring treatment effect heterogeneity, as in the

main text. We vary (β, ρ) ∈ [0.20, 0.95] × [0.20, 0.95]. For each, we continually update

θ to match a given average treatment effect. We vary the ATE from 0 to 18 percent,

which traces out the range of possible aggregate outcomes by ATE.50 Those results are

in Figure 17. Similar results emerge as in the main text – the set of possible aggregate

outcomes for a given treatment effect can be large.

50The set of feasible ATEs for the range of (β, ρ) we consider is smaller here than in the main text, as our diffusionprocess requires θ ∈ [0, 1]. See the discussion surrounding Proposition 2 for more details on this constraint.

74

Figure 17: Range of Aggregate Gains for each ATE (Firm Size)

0 0.03 0.06 0.09 0.12 0.15 0.180

0.03

0.06

0.09

0.12

0.15

0.18

Figure notes: Shaded area is all possible aggregate gains for (β, ρ) ∈ [0.20, 0.95]× [0.20, 0.95], where θ is chosen to matchthe ATE on the horizontal axis. The baseline estimate at (β, ρ) = (0.276, 0.955) is given by the dashed line.

75

G Proofs

G.1 Proof of Proposition 1

Proof. Start from the law of motion defined in Assumption 1. In logs, and applying

Assumption 2 (π ∝ z) annd Assumption 4 (z > z), this is


(max

{1,πiπi

})+ εi, (G.1)

where πi and πi are baseline profit for individual i in the treatment group and her match,

while c is a constant equal to the structural parameter c if π = z. Since matches are

observable within the treatment, (G.1) is a simple panel regression with coefficients β and

ρ. Since matches are randomized, the parameters are identifiable. Thus, that βOLS and

ρOLS are equal to their structural counterparts follows directly from the within-treatment

exclusion restriction of Assumption 4. �


Proof. Define zT and zC as average ex post ability the treatment and control groups.

The law of motion for ability (Assumption 1), combined with the implied variation in

matches (Assumption 4), implies

zTzC

=

∫ ∫ ∫ec+ε max

[z, zβz1−β]ρ dF (ε)dHT (z, z)dHT (z)∫ ∫ ∫

ec+ε max [z, zβz1−β]ρ dF (ε)dM(z; z, θ)dHC(z)

Using the orthogonality of the exogenous shocks ε (Assumption 4), we can re-write the

equation as

zTzC

=

∫ ∫zρ max [1, z/z]β dHT (z)dHT (z)∫ ∫

zρ max [1, z/z]β dM(z; z, θ)dHC(z).

or, writing it in terms of profit via Assumption 2,

Γ ≡ E[π′T ]

E[π′C ]=

∫ ∫πρ max [1, π/π]β dHT (π)dHT (π)∫ ∫

πρ max [1, π/π]β dM(π; π, θ)dHC(π). (G.2)

First, note that the lefthand side of (G.2) is simply the change in ex post average

profit between treatment and control. This is observable in the data. Second, the only

only unknown on the right hand side of (G.2) is θ. This follows from Assumption 2 and

Proposition 1. All that is left is to show that there exists a unique θ that solves (G.2).

The right hand side is continuous in θ by the continuity of M in Assumption 3. The

intermediate value theorem then guarantees existence when Γ ∈ [Γmin,Γmax]. Finally,

76

uniqueness follows from the strict monotonicity of the right hand side in θ, which is

guaranteed by the assumed first order stochastic dominance in Assumption 3. �


We begin by detailing the underlying arithmetic of the model, then use those results to

prove Proposition 3 at the end of this section.

Solving the Bargaining Problem as a Function of Supplier Marginal Cost m Solving the

optimal input choices for the firm implies profit is

πf (c, w) =

(α

px

) α1−η−α ( η

w

) η1−η−α

(1− α− η). (G.3)

A supplier has profit function πs = (px − m)x where m is its given marginal cost.

Given that they take as given the firm’s decision on inputs, this implies we can write this

profit as

πs(px,m) =

(α

px

) 1−η1−η−α ( η

w

) η1−η−α

(px −m).

Re-writing the bargaining problem π(px)νπs(px,m)1−ν taking these derivations into ac-

count yields

maxpx

( ηw

) η1−η−α

(1− α− η)να1−η−ν(1−η−α)

1−η−α cν(1−η−α)+η−1

1−η−α (px −m)1−ν

Log differentiating gives the solution

px =

(1− η − ν(1− η − α)

α

)m (G.4)

Replacing px in the firm’s profit function (G.3) with the value from (G.4) yields

π =

(α

1− η − ν(1− η − α)

) α1−η−α

(1− η − α)αα

1−η−α

( ηw

) η1−η−α

m−α

1−η−α (G.5)

Optimal Search Intensity Now that we have profit as a function of the supplier’s marginal

cost m in (G.5), we need to solve for search intensity s. Recall that m = exp(−s)z α+η−1α .

Plugging this into (G.5),

πf (m) =

(α

1− η − ν(1− η − α)

) α1−η−α

(1− η − α)αα

1−η−α

( ηw

) η1−η−α

exp(−s)−α

1−η−α z

The relevant part of the firm’s utility function for this problem is the static utility flow

77

ω log(π) + (1− ω) log(1− s). Plugging in π and solving for s yields

s = 1−(

1− ωω

)(1− η − α

α

)(G.6)

Occupational Choice Given the decision rules derived above, the model therefore has a

cutoff rule for occupational choice. To see this, note that because the continuation values

between workers and entrepreneurs are identical, we can focus on the flow utility payoff.

After a bit of algebra, these are

uf (w) = ω log(C1) + (1− ω) log(1− s)− ηω

1− η − αlog(w) + ω log(z)

with constants

C1 =

(α

1− η − ν(1− η − α)

) α1−η−α

(1− η − α)αα

1−η−αηη

1−η−α exp

((1− ωω

)(1− η − α

α

)− 1

) −α1−η−α

s = 1−(

1− ωω

)(1− η − α

α

)More simply for workers, flow utility is uw(w) = ω log(w). Firm operation is preferred

when uf (w) ≥ uw(w), which implies a cut-off z

z = w1−α

1−η−α exp

(− log(C1)− 1− ω

ωlog

[(1− ωω

)(1− η − α

α

)])

Thus, the cutoff has the feature that log(z) ∝ log(w), where w is the equilibrium wage.

Proof of Proposition 3 With these results, Proposition 3 follows quickly.

Proof. Plugging (G.6) into the profit function yields

π =

(α

1− η − ν(1− η − α)

) α1−η−α

(1− η − α)αα

1−η−α

( ηw

) η1−η−α

× exp

((1− ωω

)(1− η − α

α

)− 1

) −α1−η−α

z.

Thus, we have π = A(w)z in equilibrium, where A depends on parameters and the equi-

librium wage w. Replacing m = exp(−s)z α+η−1α in the equilibrium input price function

(G.4) gives

px =

(1− η − ν(1− η − α)

α

)exp

((1− ωω

)(1− η − α

α

)− 1

)zα+η−1α

as required. �

78

G.4 Proof of Proposition 7 (from Social Planner’s Problem)

Proof. Since these are static decisions, they solve the simplified static problem

maxc,s,x,n

ω

∫ ∞0

log(c(z))dM(z) + (1− ω)

∫ ∞z

log(1− s(z))dM(z)

s.t.

∫ ∞z

x(z)αn(z)ηdM(z)−(

1− η − ν(1− η − α)

α

)∫ ∞z

exp(−s(z))zα+η−1α x(z)dM(z) =

∫ ∞0

c(z)dM(z)∫ ∞z

n(z)dM(z) =

∫ z

0

dM(z) ≡ Ns

M(z) given

Note that for simplicity here, we have already imposed a cut-off value for z, z. Much

like the laissez faire equilibrium, it is straightforward to show that the planner also

chooses to set occupations this way.

The first piece to note is that, given the separability of utility in c and s, the planner

allocates a constant level of consumption c(z) = c. Thus, we just need to determine total

resources in the economy to determine consumption. Solving the optimal input choices

x(z) and n(z) collapses the simplified static planner’s problem to

maxs(·)≥0

ω log(c) + (1− ω)

∫ ∞z

log(1− s(z))dM(z)

s.t.

(α2

1− η − ν(1− η − α)

) α1−α

(1− α)Nη

1−αs

(∫ ∞z

z exp(s(z))α

1−α−η dM(z)

) 1−α−η1−α

= c

M(z) given

The first order condition for this problem is

(1− ω)m(z)

1− s(z)= λC

(∫ ∞z

z exp(s(z))α

1−α−ηm(z)dz

) −η1−α((

α

1− η − α

)z exp(s(z))

α1−α−ηm(z)

)−λ2

where λ is the Lagrange multiplier on the resource constraint, λ2 is the multiplier on

the non-negativity constraint s ≥ 0, and C is the constant in front of the integral of the

resource constraint. For any z1 and z2 with interior solutions (λ2 = 0), we have

1− s(z2)

1− s(z1)=

(z1

z2

)(exp(s(z1))

exp(s(z2))

) α1−α−η

(G.7)

Define for ease of notation q(z2/z1, s(z1)) =(z1z2

)exp(s(z1))

α1−α−η (1 − s(z1)) and the

transformation −t =(

αα+η−1

)s(z2)−

(α

α+η−1

). After some algebra, we can rewrite (G.7)

79

as (qα

η + α− 1

)exp

(α

η + α− 1

)= t exp(t)

The solution to this problem is given by the principal branch of the Lambert W function,

t = W0

[(α

η + α− 1

)exp

(α

η + α− 1

)q

].

Undoing the transformation and setting z1 = z, if the economy is at an interior solution

to s for all z ≥ z (which we verify at our estimated parameters), we can write this as

s(z2) =

(1− α− η

α

)W0

[(α

η + α− 1

)exp

(α

η + α− 1

)q(z2/z, s(z))

]+ 1,

Since q is decreasing in z2 and η + α < 1, that s(·) is increasing follows from the fact

that that W0 is an increasing function. Concavity similarly follows from properties of

W0. Since W0 is increasing and x exp(x) is convex, its inverse W0 is concave.

�

If there is a corner solution, this analysis remains nearly identical, except one would

instead be required to solve for z = argminz λ2(z) = 0 instead of s(z). That is, the

z at which supplier search intensity becomes positive. While this is not at issue at our

estimated parameters, we of course take this into consideration when counterfactually

varying parameters to study how the quantitative implications change.

80

Documents

From Micro to Macro in an Equilibrium Di usion Model