Causal Ambiguity as a Source of Firm Performance
Heterogeneity∗
Michael D. Ryall
University of Rochester
March 24, 2004
Abstract
A formal theory is presented by which to analyze decision making under causal
ambiguity. This theory is applied to firms competing in an industry in which technolo-
gies are homogeneous but driven by an unknown causal structure. It is demonstrated
that, even when firms have homogeneous technologies, performance heterogeneity may
exist in equilibrium as a result of causal ambiguity. Perhaps more importantly, it is
shown that the true causal structure limits the degree to which causal ambiguity may
be manifest. This paper contributes to a small but growing literature on the formal
theoretical foundations of strategy.
∗LA05A
1 Introduction
In constructing theories about sustained, cross-sectional performance differences within groups
of competing firms, strategy researchers often invoke “causal ambiguity” as a source of the
kind of confusion that might lead otherwise rational managers to make persistent suboptimal
decisions for their respective firms. While most strategy researchers are familiar with the
notion of causal ambiguity and its presumed influence on firm performance, the extant lit-
erature contains no formal discussion of what it is or how its effects on relative performance
are induced.1 Thus, the present literature does not provide precise answers to questions
such as: What is causal ambiguity? Does causal ambiguity lead to sustained performance
heterogeneity as is so often supposed? If so, by what mechanism does this phenomenon
operate? Are some situations more prone to causal confusion than others?
This paper presents a formal theory of causal ambiguity in the context of corporate
strategy. Using this theory, specific answers are provided to the preceding questions and
more. The theory assumes that the firm operates in an environment governed by an objective,
underlying network of causal laws called a causal system. This causal reality is, however,
unknown to the firm’s managers. The managers, faced with this uncertainty, develop causal
theories and assess the likely validity of each. This assessment forms the basis for the firm’s
subsequent actions.
My analysis begins with the single-agent decision problem under causal ambiguity. In this
context, I demonstrate that: 1) causal ambiguity may lead to inefficient equilibrium behavior;
and, 2) the degree of such ambiguity is limited by the true causal system (i.e., some causal
systems are inherently less transparent than others). Following this, the analysis is extended
to the case of multiple firms competing under causal ambiguity. It is shown that performance
heterogeneity may arise under causal ambiguity even when firms have identical technologies.
1For example, Lippman and Rumelt (1982) provide an early suggestion that causal ambiguity may underlie stable performance differences across firms, but do not model such ambiguity per se. Indeed, the only paper of which I am aware that explicitly addresses causal ambiguity and its relationship to business strategy is Reed and DeFillippi (1990). This paper, which is discussed in more detail later, uses informal arguments to support its conclusions.
However, causal ambiguity alone is not sufficient to imply performance heterogeneity. Rather, the existence of equilibria exhibiting performance heterogeneity depends jointly upon the heterogeneity of firm beliefs, the relative transparency of the true system, and the relationship between environmental variables and firm costs.
This paper makes several contributions to the small but growing literature on the formal
theoretical foundations of strategy (Lippman and Rumelt, 1982; Brandenburger and Stuart,
1996; Makadok and Barney, 2001; Adner and Zemsky, 2003; Ryall, 2003; MacDonald and
Ryall, 2003 and 2004).2 First, a formal approach is presented for quantifying causal ambigu-
ity. The basic formalism, which draws upon the literature on probabilistic networks, is not
new. However, this paper marks the first attempt to exploit results on probabilistic networks
to answer foundational questions in strategy.3 Second, this marks the first formal demon-
stration that firm performance heterogeneity may, indeed, obtain in the presence of causal
ambiguity — even under homogeneous production technologies. Third, results are presented
that characterize the limits of causal ambiguity given the structure of the true causal system.
Finally, it is worth pointing out that the analysis relies upon a notion of self-confirming equi-
librium invented by Kalai and Lehrer (1993) and first applied to the foundations of strategy
by Ryall (2003).4 All the propositions, with the exception of Proposition 1, are new to this
paper.
2The unifying theme of this literature is its use of formal methods to develop general principles of value distribution within a competitive setting. In this sense it differs, e.g., from industrial organization economics, the primary focus of which tends to be on issues of economic efficiency.
3Probabilistic networks constitute a very active area of research within the artificial intelligence community and, as a result, one that is growing quite rapidly. Since most strategy researchers are not familiar with it, I have included a condensed, self-contained discussion of the relevant results and references in Appendix A.
4The idea behind the self-confirming equilibrium literature is to loosen the strong rationality assumptions inherent in the more traditional equilibrium concepts of economics. Foundational papers in this line include Abreu et al. (1990), Battigalli and Guatoli (1988), Fudenberg and Levine (1993) and Kalai and Lehrer (1993). Sorenson (2003) conducts an empirical investigation of self-confirming equilibria in the movie industry. The material contained here is also related to the notion of causal Nash equilibrium developed by Penalva and Ryall (2003).
The following section motivates the later formalism with an actual case taken from the
popular business press. A simple numerical model is constructed to illustrate many of the
key ideas that appear in the following sections. §3 initiates the formal part of the paper
by defining a causal system. §4 analyzes the single-agent decision problem under causal
ambiguity. The notion of an “intervention” is formalized, causal beliefs are described and
causal equilibrium is defined. The main results (Proposition 2 and its corollary) demonstrate
how the relative transparency of the true causal system limits the extent to which causal
ambiguity may obtain in equilibrium. §5 extends the results of the single-agent problem
to an industry in which firms with identical production technologies face the possibility of
causal ambiguity regarding the workings of that technology. The main findings of the paper
are presented in this section. A few concluding thoughts are tendered in §6.
2 Motivating example
The following excerpts are from Moody (1995), in which he chronicles a year spent at Mi-
crosoft shadowing a design and development team working on a multimedia project called
Sendak. Moody is an experienced journalist covering the technology beat. As he explains (p.
xviii), “With the somewhat bemused blessing of Microsoft, I lived with this team, attended
all of its meetings, shared an office with its lead developer, read all of its electronic mail, and
exhaustively discussed with team members their experiences in the context of the broader
Microsoft culture.”
The original purpose of this exercise was to write a book on how Microsoft managed
to maintain its leadership position as a producer of PC software — a sort of best-practices
manual for this industry. However, as Moody describes (p. 137), “I had come to Microsoft to describe, up close, unprecedented success in action ... The more I watched the process of creating Sendak try to unfold, the more confused ... I grew. These people were hardly the crisp, precise, unerring, and ruthless Microsoftoids of popular legend.”
Of particular relevance to this paper is that one of the significant sources of confusion
was which group, exactly, played the leadership role in the development of the software: the
product designers or the software developers. Says Moody (p. xix), “The Microsoft approach
to corporate organization is to form small teams around specific products and leave them
alone to organize and work as they wish.” However, different elements of the Sendak team
had different agendas (p. 27), “Sendak’s designers and editors would want to pack the encyclopedia with features seen nowhere else ... Sendak’s developers would want a far less ambitious set of new features and ample time in which to write code for them.” Obviously, which element held sway would largely determine the functionality of the software, the timing of its release and, ultimately, its success in the marketplace [emph. added].
Moody highlights another important feature of the Sendak story of pertinence to the
theory developed below. It is this: leaving product teams alone was a managerial choice —
one that could be and, at times, was disregarded in favor of direct intervention. For example,
in describing the response of a senior manager to the machinations of the team, Moody says
(p. 217), “His direct interventions in team disputes invariably were in support of Bjerke —
an endorsement, it seemed to me, of the decisions she was making ...” [Bjerke was a lead
designer on the team.]
Moody’s narrative illuminates all of the elements that characterize the phenomena of
interest here. First, the mechanism causing team outcomes was unclear, even to an insider
privy to all of the information and activities of the team. Second, the various team members
had different skills and preferences depending upon their functional area membership. These
attributes, along with individual personalities, interacted in complex ways to drive results.
Third, clear managerial directives were not sufficient to guarantee efficient outcomes, even
when accepted and pursued by individual team members. Finally, senior managers occasion-
ally chose to intervene directly in team activities (even though their understanding of the
potential consequences of such interventions was limited).
Let us now consider a highly stylized model of the situation described by Moody in order
to introduce the analytical issues dealt with more generally in the upcoming formalism. Sup-
pose Microsoft is composed of two departments: Marketing, denoted M, and Engineering,
denoted E. Assume that the software market is divided into two main segments: educa-
tion and business. Success in the more profitable business segment implies success in the
education segment, but not conversely.
Software development requires the coordination of the title’s content (responsibility of
E) with a “look-and-feel” (responsibility of M) suitable to one segment or the other. Thus,
E produces content that, ultimately, appeals either only to the education segment or to
businesses (and, by implication, education as well). M ’s efforts on creating an appropriate
look-and-feel have similar implications. Let (b, b) denote a successful business/education
title, with the first component indicating business content from E and the second a business-
appropriate look and feel from M . The other potential outcomes are (b, c) , (c, b) and (c, c)
with the obvious interpretations. An outcome of (b, b) results in a $1 profit, (c, c) a breakeven
payoff and either (b, c) or (c, b) a loss of $2.
Suppose the firm has long operated under a decentralized organizational regime; that is,
an organization in which objectives are communicated to the department heads who are then
left to achieve them as they see fit. The historical record of outcomes, for a large number of
titles, is
Historical Data
Dept. Outcomes   Observed Frequency
(c, c)           0.2
(c, b)           0.0
(b, c)           0.0
(b, b)           0.8
Historically, M and E have done a good job of producing profitable titles: left to their own devices, titles earn $1 eighty percent of the time and never lose money.
Suppose that in firms of this kind one department emerges as the de facto organiza-
tional leader in the development of software. The process by which one department or the
other attains this leadership role is complex and depends upon a multitude of firm-specific
attributes: organizational culture, explicit incentive systems, the personalities, preferences,
skills and past performance of the functional area managers, and so on. Moreover, while
formal reporting relationships and managerial directives may shape the emergence of the
leader, they may not be sufficient to do so. That is, the true process driving outcomes
may be the result of informal dynamics that do not conform to managerial edict. Hence, at
any point, upper management may face uncertainty with respect to which department truly
assumes this role.
Notice that, from the historical record, the hypothesis that the two departments act
independently can be ruled out immediately. Rather, the preceding data are consistent with
both of the following hypotheses: H1 : E leads in developing content, which is business-
suitable 80% of the time, and M provides a look-and-feel to match; and, H2 : M pushes
a particular look-and-feel, which is business-suitable 80% of the time, and E coordinates
the content. Thus, while it is clear from the data that one department has attained the
leadership role, it remains ambiguous which department that is.
Now, suppose management has two broad policy options: 1) continue with the decen-
tralized strategy and earn the expected profit of $0.80; or, 2) implement an intervention
strategy to improve success by becoming personally involved in functional area activities.
To make the example interesting, assume management can only intervene in the activities of one department (due, e.g., to constrained managerial resources).
Can performance be improved with an intervention? To answer, assume that manage-
ment’s prior beliefs over hypotheses are uninformative; that is, management assesses the
relative likelihood of the two hypotheses to be 50/50. Under these priors, the expected
profits associated with the available interventions are:
Intervention   Profit (H1)   Profit (H2)   Subjective Expected Profit
None           0.8           0.8           0.8
Set E to b     1.0           0.4           0.7
Set M to b     0.4           1.0           0.7
The first two columns detail the expected profits under each of the respective hypotheses.
The last is the subjective expected profit given management’s priors on each of the two
hypotheses.
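The entries of this table can be reproduced with a short calculation. The following sketch uses the example's payoffs and 80/20 leader behavior; the function and variable names are my own, not the paper's:

```python
# Payoffs from the example: (b,b) earns $1, (c,c) breaks even, mismatches lose $2.
payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}

def dist(leader, intervention=None):
    """Outcome distribution over (x_E, x_M) when `leader` ('E' or 'M') picks b
    with probability 0.8 and the other department coordinates with it.
    `intervention` is None or a pair (dept, value) pinning that department."""
    follower = "M" if leader == "E" else "E"
    out = {}
    for val, p in (("b", 0.8), ("c", 0.2)):
        vals = {leader: val}
        if intervention and intervention[0] == leader:
            vals[leader] = intervention[1]       # leader is pinned
        vals[follower] = vals[leader]            # follower coordinates with leader
        if intervention and intervention[0] == follower:
            vals[follower] = intervention[1]     # follower is pinned, breaking coordination
        key = (vals["E"], vals["M"])
        out[key] = out.get(key, 0.0) + p
    return out

def expected_profit(d):
    return sum(p * payoff[o] for o, p in d.items())

for label, iv in [("None", None), ("Set E to b", ("E", "b")), ("Set M to b", ("M", "b"))]:
    h1 = expected_profit(dist("E", iv))   # H1: E leads, M follows
    h2 = expected_profit(dist("M", iv))   # H2: M leads, E follows
    print(f"{label:12s}  H1={h1:.1f}  H2={h2:.1f}  subjective={0.5*h1 + 0.5*h2:.2f}")
```

Under 50/50 priors the do-nothing row dominates both interventions, matching the table.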
In this situation, decentralization is the subjectively optimal decision. Suppose, however,
that H1 describes the true state of affairs. Then, forcing E to produce business titles 100%
of the time increases expected profits from .8 to 1. If H2 is true, management should focus
on M . The subjectively rational decision is not objectively optimal under either possible
state of the world.
There are a few things to note about this example. First, the decision to retain the
decentralized organization ensures that no data will ever be generated that identifies which
hypothesis is actually correct. Thus, the no-intervention policy is stable. Second, this
decision is based upon accurate data and correct Bayesian updating of priors over causal
hypotheses. Management does not suffer from any of the irrational decision biases often
written about in the behavioral decision literature (e.g., Tversky and Kahneman, 1981).
The decision not to intervene, while not objectively optimal, is subjectively optimal given
managerial priors. Third, the conclusion drawn above is not stable under all priors. For example, if management places strong weight on H1 and, as a result, intervenes in the affairs of E, then it will eventually discover the truth (be it H1 or H2) and, from that point on, implement the objectively optimal strategy.
Finally, intuition suggests that, in a market with several software manufacturers, confu-
sion of the type described here could account for stable, cross-sectional differences in firm
performance. For example, under heterogeneous priors, some firms might pursue a stable no-intervention policy and others the optimal intervention.
3 Causal systems
In order to analyze situations in which the consequences of agents’ actions depend upon an
underlying, unknown system of causal laws, a formal representation of a “causal system” is
required. The conventions used throughout the paper are: variables and elements of sets are
represented with small letters, sets with capitals and sets of sets with calligraphics. Directed
graphs, denoted with bold capitals, play a significant role in what follows; standard graph-
theoretic terminology (parents, descendants, paths, etc.) is used freely without explicit
definition (see, e.g., Bollobas, 1998).
3.1 Causal primitives
Begin with a relevant, indexed set of environmental variables, V ≡ {x1, ..., xk}. Each variable xi takes values from a finite set Xi. In the previous example, V = {xE, xM} with XE = XM = {c, b}. The set of outcomes is defined as X ≡ X1 × ··· × Xk with typical element x̂ ≡ (x̂1, ..., x̂k), where x̂i indicates a specific value taken by xi. The agents’ payoffs depend upon outcomes. Let π(x̂) denote the payoff to the agent when the outcome is x̂.
In the previous example, the set of outcomes is X = {(c, c), (c, b), (b, c), (b, b)}. A specific outcome is x̂ = (b, b), which has the associated payoff π(b, b) = 1.
Assume that outcomes are determined by an underlying system of causal relationships.
How can these relationships be represented formally? In the example, the intuitive nota-
tion xE → xM can be used to indicate the causal relationship implied under H1; that is,
the engineers operate independently and the marketing group attempts to coordinate by
following their lead. A causal system, then, is a pair (G, θ) in which G is a finite, directed,
acyclic graph that represents the system’s causal structure and θ ≡ (θ1, ..., θn) is a profile of
parameters. Each node in G corresponds to an element in V (there is a bijection between
nodes in G and V ). G is said to be a causal structure on V.
Pi ⊂ V denotes the set of xi’s parents (its direct causes) in G, and Di ⊂ V its set of descendants (i.e., including but not limited to its children). Let pi ≡ (xj)xj∈Pi be the variable whose components are the parents of xi. When Pi = ∅, pi can be any constant. Given an outcome x̂, p̂i denotes the projection of x̂ onto the dimensions corresponding to the parents of xi.
The components of θ are conditional probabilities. Specifically, θi(x̂i|p̂i) indicates the probability of x̂i conditional upon p̂i. It is noted without proof that θ induces a probability distribution on X, denoted µθ, referred to as the empirical distribution on outcomes induced by θ. Moreover, if (G, θ) is a causal system, then µθ admits the following factorization for all x̂ such that µθ(x̂) > 0:

µθ(x̂) = ∏_{i=1}^{k} µθ(x̂i|p̂i). (1)
Referring back to the example in §2, under H1, xE → xM. The parameters are θE(b) = 0.8, θM(c|c) = 1 and θM(c|b) = 0. The distribution on X induced by H1 is the one shown in the historical data table of §2.
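The factorization in (1) can be checked numerically for the H1 system. A minimal sketch, assuming a dictionary encoding of G and θ that is mine rather than the paper's:

```python
from itertools import product

# H1 causal system: x_E -> x_M, with theta as conditional probability tables.
parents = {"E": (), "M": ("E",)}
theta = {
    "E": {(): {"b": 0.8, "c": 0.2}},               # theta_E(b) = 0.8
    "M": {("b",): {"b": 1.0, "c": 0.0},            # theta_M(c|b) = 0
          ("c",): {"b": 0.0, "c": 1.0}},           # theta_M(c|c) = 1
}
order = ("E", "M")  # a topological ordering of G

def mu(outcome):
    """mu_theta(x) computed by the factorization (1): product of each
    variable's conditional probability given its parents' values."""
    prob = 1.0
    for i, v in enumerate(order):
        pa = tuple(outcome[order.index(q)] for q in parents[v])
        prob *= theta[v][pa][outcome[i]]
    return prob

dist = {o: mu(o) for o in product("bc", repeat=2)}
# Matches the historical record: only (b,b) and (c,c) occur, at 0.8 and 0.2.
```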
Causal systems are fairly general mathematical objects open to a wide range of inter-
pretation and, therefore, applicable to a wide range of decision problems. Examples outside
strategy include medical diagnostics, weather prediction, space shuttle propulsion systems,
and oil price forecasting (Korb and Nicholson, 2004). In the context of decision problems
facing a firm, causal systems might represent the workings of a particular factory production
process, the relationship between incentive systems and managerial behavior, the interaction
between firm resources and key performance variables, or the structure and conduct of an
industry. The application presented later in this paper follows the example of §2: firm costs
are determined by the joint activities of departments, which are determined according to a
stable (but unknown) set of influence relationships.
3.2 The connection between G and µθ
A causal system (G, θ) implies two different types of “dependency model” on the elements of V, one probabilistic (µθ) and the other graphical (G).5 Under µθ, given disjoint subsets C, D, E ⊂ V, one can say whether “C is independent of D given E” using the standard definition of conditional probability. Under the graphical notion of “d-separation” (Pearl, 1986), one can make similar statements with respect to G.6 The interesting fact is that, given a causal system (G, θ), the conditional independencies implied by G (by d-separation) also hold in µθ (by probability theory). For example, in (1), for all xi ∈ V, xi is µθ-conditionally independent of V \ (Pi ∪ Di) given Pi. At the same time, Pi d-separates xi from V \ (Pi ∪ Di) in G.
Thus, the conditional independence relationships implied by G remain invariant to
changes in the numerical parameters comprising θ. As Pearl (2000, p. 25) says, “We expect
such difference in stability because causal relationships are ontological, describing objective
physical constraints in our world, whereas probabilistic relationships are epistemic, reflect-
5See the detailed discussion in Appendix A.
6A complete understanding of the technical details of d-separation is not required for what follows.
ing what we know or believe about the world. Therefore, causal relationships should remain
unaltered as long as no change has taken place in the environment, even when our knowledge
about the environment undergoes changes.”
As it turns out, the relationship between G and µθ is generally stronger than described
above. (G, θ) is said to be faithful when every conditional independence statement implied
by G under d-separation corresponds to a conditional independence statement implied by
µθ under the probabilistic notion of conditional independence, and vice versa. As we will
see, faithful systems have certain properties that are useful for our purposes. Fortunately,
almost all causal systems are faithful. Specifically, representing θ by a real vector, it can be
shown that the set of such vectors that fail the faithfulness condition for arbitrary G is of
Lebesgue measure zero (Meek, 1995).
4 Individual decisions under causal ambiguity
As we saw in the motivating example, knowledge of θ alone is not sufficient to predict the
consequences of an intervention; one must also know G. Thus, in addition to assessing the
random elements of his or her environment, an agent must also form theories about the deeper
but unknown causal structure that drives it. The agent’s decision problem is to choose an
intervention to maximize his expected payoff given his beliefs with respect to the true causal
system governing his environment. To proceed, we must formalize what is meant by an
“intervention,” and quantify “causal ambiguity.” Once done, it is then possible to calculate
the agent’s subjective expectations with respect to each of the available interventions.
In addition, in order to say something about the kinds of behavior expected in situations
like this, some notion of equilibrium is required. The one introduced here, causal equilibrium
(hereafter, CE), is a special case of Kalai and Lehrer’s (1993) subjective Nash equilibrium
(see also Ryall, 2003). CE requires that the agent’s intervention be optimal given his beliefs
and that the resulting empirical distribution on outcomes be consistent with those beliefs.
The idea is that a decision is stable only when the empirical distribution does not contradict
the beliefs that led to the decision in the first place. One interpretation of CE is that it is
what one might observe after an initial (unmodelled) period of experimentation and learning.
4.1 Interventions
In order to keep things simple, assume that, presented with a causal system (G, θ) and payoff function π, an intervention consists of fixing one environmental variable.7 Specifically, the agent has the power to choose one xi ∈ V and set it to xi = x̂i for any x̂i ∈ Xi. Let (xi ⇒ x̂i) indicate the intervention that sets xi to x̂i; the “do-nothing” intervention is denoted ∅. The set of all possible interventions is labelled I, with x̆ indicating a generic intervention.
Making an intervention changes the environmental parameters. That is, given θ, the intervention (xi ⇒ x̂i) has the effect of adjusting θi such that, for all p̂i, θi(x̂′i|p̂i) = 1 when x̂′i = x̂i and θi(x̂′i|p̂i) = 0 otherwise. In other words, (xi ⇒ x̂i) implies xi becomes a constant, x̂i. An arbitrary intervention x̆ generates a new empirical distribution, denoted µθ|x̆.
We wish to construct a graph that exhibits the same relationship to µθ|x̆ as G exhibits to µθ. Making an intervention renders the manipulated variable independent of all other variables. Thus, the graph, denoted Gx̆, is constructed by taking G and simply removing all edges involving xi.8 Let π̄θ|x̆ denote the expected payoff implied by x̆ under (G, θ).
In the example of §2, if the underlying system is the one described by H1, then management would like to implement the intervention (xE ⇒ b), resulting in π̄θ|(xE⇒b) = 1. If H2 describes the true system, the optimal intervention is (xM ⇒ b). Under H1, G is the graph xE → xM. In this case, the graphs G(xE⇒b) and G(xM⇒b) are identical: two disconnected nodes, xE and xM. The same is true under H2.
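Computationally, an intervention amounts to a "truncated" factorization: pin the intervened variable to a point mass and cut the edges into it, leaving the rest of θ alone. A sketch under the example's H1 parameters (the encoding and names are mine):

```python
from itertools import product
import copy

payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}
order = ("E", "M")

# H1 system: x_E -> x_M
parents_H1 = {"E": (), "M": ("E",)}
theta_H1 = {
    "E": {(): {"b": 0.8, "c": 0.2}},
    "M": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}},
}

def do(parents, theta, var, value):
    """Intervention (var => value): remove the edges into var and make it constant."""
    parents2 = dict(parents)
    parents2[var] = ()
    theta2 = copy.deepcopy(theta)
    theta2[var] = {(): {v: 1.0 if v == value else 0.0 for v in "bc"}}
    return parents2, theta2

def expected_payoff(parents, theta):
    """Sum of payoff times probability under the (possibly truncated) factorization."""
    total = 0.0
    for o in product("bc", repeat=2):
        prob = 1.0
        for i, v in enumerate(order):
            pa = tuple(o[order.index(q)] for q in parents[v])
            prob *= theta[v][pa][o[i]]
        total += prob * payoff[o]
    return total

print(expected_payoff(*do(parents_H1, theta_H1, "E", "b")))  # 1.0: pinning the true leader
print(expected_payoff(*do(parents_H1, theta_H1, "M", "b")))  # 0.4: pinning the follower
```

Note that the post-intervention payoff depends on which variable is the cause, not just on θ, which is the sense in which knowledge of θ alone cannot predict an intervention's consequences.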
7The generalization that allows an agent to set an arbitrary number of variables is straightforward.
8If interventions are noisy (that is, instead of making xi a constant, an intervention results in a new distribution θ′i), then Gx̆ is constructed by removing only the edges between xi and its parents. All of the following results hold in this case.
4.2 Causal beliefs
Assume the agent does not know the true causal system. When V is understood from the context, let C be the set of all causal systems such that (G, θ) ∈ C only if G is a causal structure on V. Note that while the number of causal structures on V is finite (because V is finite), C is uncountably infinite. To represent the agent’s assessment regarding which causal structure is the true one, let β = (β1, ..., βl) be a multinomial probability distribution on a finite subset Cβ ⊂ C, where βi indicates the agent’s belief that (G, θ)i ∈ Cβ is the true system.9 In the example, Cβ = {(xE → xM, θ1), (xM → xE, θ2)} where θ1 = (θE(b) = .8, θM(c|c) = 1, θM(c|b) = 0) and θ2 = (θM(b) = .8, θE(b|b) = 1, θE(b|c) = 0).
4.3 Subjective expected payoff
Given the set of environmental variables V and beliefs β, the expected payoff from allowing the system to operate without intervention is

π̄β|∅ ≡ ∑_{(G,θ)i ∈ Cβ} βi ∑_{x̂ ∈ X} µθ(x̂) π(x̂).

Similarly, intervention x̆ under beliefs β implies the expected payoff

π̄β|x̆ ≡ ∑_{(G,θ)i ∈ Cβ} βi ∑_{x̂ ∈ X} µθ|x̆(x̂) π(x̂).

Thus, the agent’s subjective best reply set to beliefs β is defined as

BRβ ≡ {x̆ ∈ I | ∀x̆′ ∈ I, π̄β|x̆ ≥ π̄β|x̆′}.
In the example, (xE ⇒ b) implies π̄β|(xE⇒b) = .5(1) + .5(.4) = .7. Given these beliefs, BRβ = {∅}; that is, the unique best response is to do nothing.
9Note that, while beliefs could be defined as a probability measure on some measure space (C, C) with
a suitably defined σ-algebra C, the set of causal systems upon which the agent places positive weight in
equilibrium is finite. Hence, for the purposes of this paper, there is no gain in encumbering the analysis with
more complex belief structures.
4.4 Causal equilibrium
The formal definition of CE imposes a consistency condition on the consequences of an
agent’s actions with respect to his initial beliefs. The idea is that a subjectively optimal
intervention is stable when the frequency of observed outcomes is consistent with the agent’s
prior expectations.
Definition 1 Given a causal system (G, θ), a causal equilibrium is a pair (β, x̆) such that
1. (Subjective optimization) x̆ ∈ BRβ;
2. (Empirical consistency) ∀(G′, θ′) ∈ Cβ, µθ′|x̆ = µθ|x̆.
The subjective optimization condition is self-explanatory. The empirical consistency
requirement is that, in equilibrium, outcomes occur with the expected probability. Equiv-
alently, the agent’s assessment of the probabilities with which outcomes are generated is
correct given his intervention decision. Notice that this degree of consistency does not imply
correct causal beliefs; that is, there may be x̆′ ∈ I for which π̄θ|x̆′ > π̄θ|x̆. Hence, this notion
of equilibrium is weaker than Nash (which would require an accurate assessment of the true
causal system). If there is no causal ambiguity, then CE is equivalent to Nash.
Although the actual results generated in a CE are consistent with the results expected
by the agent, the potential for trouble arises from the fact that the agent’s counterfactual
predictions — that is, the predictions of what would have happened had the agent taken some
other course of action — are not observed. Therefore, an agent in a CE may experience
persistently suboptimal performance because he never observes the superior consequences
that would have obtained had the truly optimal decision been implemented.
In the example, the decision to make no intervention combined with 50/50 beliefs on H1 and H2 is a CE. It has already been shown that ∅ ∈ BRβ, thereby meeting requirement 1. Given this (non-)intervention, management expects the distribution shown in the historical data table of §2 and, indeed, this is the distribution generated by the no-intervention decision. Thus, condition 2 is also met.
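Both conditions of Definition 1 can be verified mechanically for the example. A sketch, assuming my own encodings of H1, H2, and the intervention set:

```python
from itertools import product

payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}

# Each hypothesis is a (parents, theta) pair; only the direction of causation differs.
H1 = ({"E": (), "M": ("E",)},
      {"E": {(): {"b": 0.8, "c": 0.2}},
       "M": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}}})
H2 = ({"M": (), "E": ("M",)},
      {"M": {(): {"b": 0.8, "c": 0.2}},
       "E": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}}})

def mu(system, outcome, intervention=None):
    """mu_{theta|intervention}(outcome); outcome is a dict {var: value}."""
    parents, theta = system
    prob = 1.0
    for v in theta:
        if intervention and intervention[0] == v:
            prob *= 1.0 if outcome[v] == intervention[1] else 0.0
        else:
            pa = tuple(outcome[q] for q in parents[v])
            prob *= theta[v][pa][outcome[v]]
    return prob

outcomes = [{"E": e, "M": m} for e, m in product("bc", repeat=2)]

def subjective_payoff(intervention):
    """Expected payoff under 50/50 beliefs on H1 and H2."""
    return sum(0.5 * mu(h, o, intervention) * payoff[(o["E"], o["M"])]
               for h in (H1, H2) for o in outcomes)

interventions = [None, ("E", "b"), ("E", "c"), ("M", "b"), ("M", "c")]
# Condition 1: doing nothing is the subjective best reply under 50/50 beliefs.
assert all(subjective_payoff(None) >= subjective_payoff(i) for i in interventions)
# Condition 2: every system in C_beta generates the same empirical distribution
# under the chosen (non-)intervention, so beliefs are never contradicted.
assert all(abs(mu(H1, o) - mu(H2, o)) < 1e-12 for o in outcomes)
```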
CE presents a nicely balanced form of “bounded rationality.” Agents optimize given their
beliefs (and are rational in the sense of being subjective profit maximizers) and are allowed
to believe anything they want provided their beliefs are not contradicted by observable data
(and, so, are also rational in the sense of not clinging to beliefs that are demonstrably wrong).
However, decisions may well be suboptimal (due to erroneous counterfactual beliefs). At the
same time, bad behavior is not dependent upon ad hoc assumptions invoking psychological
bias, hubris, non-profit objectives and the like.
4.5 A measure of causal ambiguity
As we have seen, a causal structure imposes certain constraints on the empirical distributions
it generates. Hence, given an empirical distribution, certain structures can generally be ruled
out. The important message of this section is that to say agents face causal ambiguity is
not to say “anything goes.” Rather, the degree of causal ambiguity permitted in equilibrium
fundamentally depends upon the true causal structure underlying the system.
In the example of §2, the two causal systems under consideration are empirically indistinguishable; that is, no matter how long management observes the outcomes of unmanaged departmental interactions, it is always impossible to distinguish H1 from H2. This is a general consequence of the causal structure of these two hypotheses, as opposed to a special outcome effected by careful choice of θ. That is, for any set of parameters under H1, there exists a set of parameters under H2 that results in the same empirical distribution, and vice versa.
To see how this works, suppose µ is an arbitrary probability distribution on the outcomes
in the example. Then, simply by the definition of conditional probability, µ can be factored
as
∀x̂M ∈ XM, ∀x̂E ∈ XE: µ(x̂M, x̂E) = µ(x̂M|x̂E) µ(x̂E) (2)
where µ (x̂M |x̂E) and µ (x̂E) are the µ-conditional probability of x̂M given x̂E and the µ-
marginal probability of x̂E, respectively. Of course, these numbers happen to correspond
directly to θ parameters that generate µ under H1. Note, however, that µ can also be
factored as (again, simply by the definition of conditional probability)
µ (x̂M , x̂E) = µ (x̂E|x̂M)µ (x̂M) . (3)
This is significant because the numbers on the right hand side of this equation correspond
to θ parameters consistent with H2. Thus, for any set of parameters θ that deliver µθ under
H1, there is a corresponding set of parameters θ′ that deliver the same empirical distribution
on outcomes under H2. An outsider observing only the undisturbed empirical distribution is
never able to infer which hypothesis describes the true causal system.
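The two factorizations can be verified numerically. In the sketch below, the joint distribution µ is an illustrative assumption (any strictly positive µ works); the point is that the H1 parameters from equation (2) and the H2 parameters from equation (3) regenerate the same empirical distribution:

```python
# Any joint distribution mu over (xM, xE) factors as mu(xM|xE)mu(xE),
# matching H1 parameters, and as mu(xE|xM)mu(xM), matching H2 parameters.
# The numbers below are illustrative assumptions, not values from the paper.
mu = {("b", "b"): 0.5, ("b", "c"): 0.1,   # keys are (xM, xE)
      ("c", "b"): 0.3, ("c", "c"): 0.1}

# Marginals of xE and xM.
p_E = {e: sum(p for (m2, e2), p in mu.items() if e2 == e) for e in "bc"}
p_M = {m: sum(p for (m2, e2), p in mu.items() if m2 == m) for m in "bc"}

# H1 parameters: mu(xE) and mu(xM|xE); H2 parameters: mu(xM) and mu(xE|xM).
for (m, e), p in mu.items():
    h1 = (mu[(m, e)] / p_E[e]) * p_E[e]   # equation (2)
    h2 = (mu[(m, e)] / p_M[m]) * p_M[m]   # equation (3)
    assert abs(h1 - p) < 1e-12 and abs(h2 - p) < 1e-12

print(p_E, p_M)
```

Whichever hypothesis is true, an observer of the undisturbed system sees draws from the same µ.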
The hypothesis, say H3, that the two departments operate independently is empirically
distinguishable from H1 and H2 since, if H3 were true, then for all θ,

µθ (x̂M , x̂E) = µθ (x̂E)µθ (x̂M) ,

which does not generally hold in (2) or (3). Formally, two causal structures G and G′ on a set
of environmental variables V are said to be empirically indistinguishable, denoted G ≈ G′,
if, for all causal systems (G, θ) there exists a causal system (G′, θ′) such that µθ = µθ′ . Let
GG denote the equivalence class of causal structures that are empirically indistinguishable
from G. It is important to note that the composition of GG depends only upon G itself and
not, for example, upon any particular θ.
Definition 2 The degree of causal ambiguity of a causal structure G is |GG|, the set
cardinality of GG. G is said to be causally ambiguous if |GG| > 1.
Under this definition, a causal structure is ambiguous when it is never completely identi-
fied by the data it generates. When two or more causal structures imply identical conditional
independencies in every empirical distribution, no parameter values for one ever allow an
outsider to distinguish it from the other (simply by observing the behavior of the system).
Naturally, given a particular causal structure, we would like to know which additional
structures, if any, are contained in GG. For the following proposition, given a causal structure
G, let E be the set of edges without reference to direction; i.e., {xi, xj} ∈ E if and only
if (xi → xj) or (xj → xi) in G. Let S be the set of all ordered triples (xi, xj, xk) such that
(xi, xj, xk) ∈ S if and only if (xi → xj) , (xk → xj) and {xi, xk} /∈ E. That is, S is the
collection of triples that form v-structures like
xi xk
↘ ↙
xj
where it is important to note that xi and xk are not adjacent in the graph.
Proposition 1 Two causal structures G and G′ on a set of environmental variables V are
empirically indistinguishable if and only if E = E′ and S = S′.
Proof. See Appendix A.
This is a rather remarkable proposition providing, as it does, a test for causal ambiguity
based upon simple visual inspection. Returning to the example, the two hypotheses H1
and H2 have causal structures G1 : xE → xM and G2 : xM → xE, respectively. First, we
have E1 = {{xE, xM}} and E2 = {{xE, xM}}, so E1 = E2. Neither G1 nor G2 has any
v-structures, so S1 = S2 = ∅. Thus, G1 ∈ GG2, implying that both G1 and G2 are causally
ambiguous. The hypothesis that the departments do not interact, H3, is represented by a
disconnected graph, G3, with nodes xE and xM. Since E3 = ∅, any empirically indistinguishable
structure must also be edgeless, so G3 is not ambiguous.
Consider a slightly more complicated example. Suppose a firm comprises five departments:
Engineering, Design, Manufacturing, Marketing and Sales. Assume the structure
of departmental interaction is as follows: Engineering and Design independently feed ideas
to Manufacturing, which implements new products on the basis of their input. The new product
specs are then delivered by Manufacturing to Sales and Marketing, which choose sales and
marketing programs, respectively, designed to maximize profit. The causal structure is
xE xD
↘ ↙
xMan
↙ ↘
xS xMar
(4)
In this case, the causal structure is not ambiguous. To see this, note first that any viable
candidate for empirical indistinguishability must have the same edges. The only dimension of
flexibility is the direction of the arrows. However, switching the direction of any arrow either
breaks a v-structure (e.g., reversing xE → xMan) or creates a new one (e.g., reversing xMan →
xS). Therefore, there are no other causal structures in the empirical indistinguishability
equivalence class.
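Proposition 1's test is mechanical enough to automate. The following sketch (an illustration of the proposition, not code from the paper) compares skeletons and v-structures for the structures discussed above:

```python
# Proposition 1 as code: two DAGs are empirically indistinguishable iff they
# have the same undirected edges E and the same v-structures S.
def edges(dag):
    """The skeleton: edge set with direction ignored."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Triples (xi, xj, xk) with xi -> xj <- xk and xi, xk non-adjacent."""
    skel = edges(dag)
    parents = {}
    for tail, head in dag:
        parents.setdefault(head, set()).add(tail)
    return {(xi, xj, xk)
            for xj, pa in parents.items()
            for xi in pa for xk in pa
            if xi < xk and frozenset((xi, xk)) not in skel}

def indistinguishable(g1, g2):
    return edges(g1) == edges(g2) and v_structures(g1) == v_structures(g2)

G1 = {("xE", "xM")}   # H1: xE -> xM
G2 = {("xM", "xE")}   # H2: xM -> xE
G3 = set()            # H3: the departments do not interact
print(indistinguishable(G1, G2))   # True
print(indistinguishable(G1, G3))   # False

# The five-department structure in (4): reversing xE -> xMan breaks its
# v-structure, so the result is distinguishable from the original.
G4 = {("xE", "xMan"), ("xD", "xMan"), ("xMan", "xS"), ("xMan", "xMar")}
G4r = (G4 - {("xE", "xMan")}) | {("xMan", "xE")}
print(indistinguishable(G4, G4r))  # False
```

Reversing xMan → xS instead would create the new v-structures xE → xMan ← xS and xD → xMan ← xS, again breaking indistinguishability.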
Proposition 2 Suppose (β, x⃗) is a CE under the true causal system (G, θ). Then, G′ ∈ Cβ
only if G′x⃗ ∈ GGx⃗ (a.s.).

Proof. Given (G, θ) and x⃗, let θ|x⃗ indicate the list of parameters θ adjusted for x⃗. Then,
as discussed above, (Gx⃗, θ|x⃗) is almost surely faithful. By Spirtes et al. (2000, Theorem 4.3),
a graph G′x⃗ is faithful to µθ|x⃗ if and only if it is empirically indistinguishable from Gx⃗. The
result then follows from condition (2) of CE.
Corollary 1 Suppose (β, x⃗) is a CE under the true causal system (G, θ). Then, the maximal
set of causal systems that receive positive weight under β is finite and can be generated by
exhaustive application of Proposition 1 on Gx⃗.
These propositions demonstrate that agents may indeed “get it wrong” in equilibrium
and choose suboptimal interventions as a result of causal ambiguity alone. On the other hand,
regardless of the situation, the extent of this type of ambiguity has a limit that depends upon
the true underlying structure. Also, note well that the set of systems receiving positive
weight under x⃗ are those empirically indistinguishable from Gx⃗, not G; G′x⃗ ∈ GGx⃗ does not,
in general, imply G′ ∈ GG.
For example, suppose the true causal structure is the one shown in (4). Then, the systems
that are empirically indistinguishable given an intervention on xD are as follows (with xD
isolated in each):

(1) xE → xMan, xMan → xS, xMan → xMar;
(2) xMan → xE, xMan → xS, xMan → xMar;
(3) xS → xMan, xMan → xE, xMan → xMar;
(4) xMar → xMan, xMan → xE, xMan → xS.

There are twelve systems that become empirically indistinguishable under an intervention
on xD. They are the four detailed above plus the additional eight generated by adding a
directed link between xMan and xD in one direction or the other.
Up to now, no mention has been made of temporal knowledge (i.e., regarding the timing
of one variable vis-à-vis another). The previous results obtain when an agent has cross-
sectional data on the behavior of the system. If the agent has additional knowledge about
the timing of events, these results can be strengthened considerably. For example, Pearl
(1988) shows that if the variables in V are completely ordered by the timing of their relative
occurrences, then the underlying causal structure is uniquely identified by µθ.
Finally, note well that the preceding results are related to condition (2) in the definition of
CE. However, consideration of subjective optimization condition (1) may refine equilibrium
even further. For example, the extreme case is one in which the agent has a dominant
intervention regardless of the causal structure; i.e., the minimum payoff guaranteed under
(xi ⇒ x̂i) is greater than the maximum attainable under any other intervention (regardless
of θ). In such a case, the agent will implement the dominant intervention regardless of what
she thinks about the underlying causal structure.
As mentioned earlier, the only paper dealing explicitly with causal ambiguity and its
relationship to sustained performance advantage is Reed and DeFillippi (1990). The authors
argue, among other things, that causal ambiguity is exponentially increasing in complexity.
They say (p. 93), “Three separate skills can have up to 4 interactions, 4 skills can have 10
interactions, 5 can have 19, and so forth. Even if all potential interactions do not occur,
the ambiguity that is derived from the complexity of interaction is still likely to increase at a
greater than arithmetic rate.” Because the analysis is informal, it is hard to say how, exactly,
their model relates to this one. From the previous statement, however, what they seem to
have in mind as a measure of complexity is the number of nodes in the causal structure G.
The preceding results demonstrate the danger of this type of intuitive reasoning. Ac-
cording to the theory presented here, causal ambiguity is monotonic neither in the number
of nodes nor in the number of linkages between nodes. For example, the structure x1 → x2 is
indistinguishable from x1 ← x2, but the structure x1 → x2 ← x3 is unambiguous. The catch
in the previous logic is that sometimes additional structure (and, by implication, complexity)
actually helps one discern the true underlying causal architecture.
5 Causal ambiguity and industry performance
I now apply the theory developed above to a competitive situation involving multiple firms.
In order to keep things concrete, the extension is done in the context of a specific application
along the lines of the example in §2.
5.1 Setup
Assume there are n firms involved in Cournot-style quantity competition indexed by N ≡
{1, ..., n}. In order to isolate the effects of causal ambiguity, assume that the firms control
homogeneous technologies. Specifically, each firm is composed of k departments. The joint
activities of the departments determine the firms’ marginal costs. Extending the notation
developed in the previous section to allow for multiple agents, let xij indicate the activity
of department j in firm i, which takes values in Xj (these are the same for all firms, so no
additional subscript is necessary). The set of joint departmental outcomes is X. An actual
outcome for firm i is denoted x̂i = (x̂i1, ..., x̂ik) ∈ X and, now, x̂ = (x̂1, ..., x̂n) denotes a
profile of outcomes for the entire industry.
Firm i’s marginal cost (constant with respect to quantity) is a function of what its
departments do: c (x̂i) indicates the marginal cost of firm i when the joint activities of its
departments is x̂i. Note that the function c is identical for all firms. The generation of costs
is governed by a true causal system (G, θ) which, since technologies are homogeneous, is also
identical for all firms.
With multiple agents, let x⃗ ≡ (x⃗1, ..., x⃗n) with x⃗i = (xij ⇒ x̂ij) indicating an intervention
by firm i. Also, since there are multiple firms, let β = (β1, ..., βn) indicate a profile of beliefs
with βi = (βi1, ..., βim) the vector of weights placed by agent i on the set of m causal systems
she thinks may be true.
The industry is modelled in two stages. In the first stage, firms simultaneously choose
an intervention, after which a vector of actual costs, c(x̂) = (c(x̂1), ..., c(x̂n)), is determined
according to the distributions (µθ|x⃗1, ..., µθ|x⃗n). In the second stage, firms observe c(x̂) and
then compete in standard Cournot style. Assume inverse demand is given by λ(q) ≡ α − ∑i∈N qi,
where α is a demand parameter (the usual restrictions apply).

When x̂ is the outcome in the first stage, q(x̂) ≡ (q1(x̂), ..., qn(x̂)) denotes the Cournot
Nash equilibrium quantities chosen in the second stage (given costs c(x̂)). The actual profit
of firm i is πi(x̂) ≡ (λ(q(x̂)) − c(x̂i)) qi(x̂). Thus, given a profile of interventions x⃗ in the
first stage and assuming Cournot Nash equilibrium in the second, it is possible to calculate
the expected profit of firm i as

π̄i(x⃗) ≡ ∑x̂∈Xn πi(x̂) µθ|x⃗1(x̂1) · · · µθ|x⃗n(x̂n).
Notice that, although the intervention of one firm affects the expected profit of another,
the dominant strategy for all firms in the first stage is to pick the intervention that delivers
the lowest expected cost (i.e., regardless of what the other agents do). Intervention x⃗i under
beliefs βi implies an expected cost of

c̄βi|x⃗i ≡ ∑(G,θ)∈Cβi βi(G, θ) ∑x̂i∈X µθ|x⃗i(x̂i) c(x̂i).

Hence, firm i’s subjective best reply set to beliefs βi is defined as

BRβi ≡ {x⃗i ∈ I | ∀x⃗′i ∈ I, c̄βi|x⃗i ≤ c̄βi|x⃗′i}.
The application of causal equilibrium to this setup is now straightforward: firms choose a
CE in their own first-stage decision problem and Cournot quantities in the second stage.
When there is no causal ambiguity (i.e., agents place full weight on the true causal system),
Cournot Nash equilibrium obtains.
5.2 Causal ambiguity and performance heterogeneity
This section begins with a simple example that demonstrates how causal ambiguity may,
indeed, lead to performance heterogeneity and concludes with a more general set of propo-
sitions. Suppose there are only two firms. Both face the technology described in §2; i.e.,
there are two departments, E and M, each of which has outcomes of b or c. Assume the
true causal system is xE → xM . Now, instead of profits, assume that the joint output of the
departments implies the following marginal costs:
E M $cost
b b 1
b c 4
c b 4
c c 2
The actual expected costs under each of the available interventions are:

Intervention   c̄
∅              1.2
(xE ⇒ c)       2.0
(xE ⇒ b)       1.0
(xM ⇒ c)       3.6
(xM ⇒ b)       1.6
Thus, if both firms know the true causal structure, they both choose the optimal intervention,
(xE ⇒ b) , resulting in identical average costs of 1.0.
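These expected costs can be reproduced directly. The θ parameters are not restated here, so the sketch below assumes P(xE = b) = 0.8 with xM deterministically copying xE, values backed out so that the no-intervention cost equals 1.2:

```python
# Expected cost under each intervention for the true system xE -> xM.
# Assumed parameters (backed out from the table above): P(xE=b)=0.8 and
# xM copies xE with probability one.
cost = {("b", "b"): 1, ("b", "c"): 4, ("c", "b"): 4, ("c", "c"): 2}  # (E, M)
p_E = {"b": 0.8, "c": 0.2}

def expected_cost(itv=None):
    """itv is None, ("xE", value) or ("xM", value)."""
    total = 0.0
    for e, pe in p_E.items():
        if itv and itv[0] == "xE":
            if e != itv[1]:
                continue          # intervening fixes xE at itv[1]
            pe = 1.0
        m = e                     # xM copies xE under the true system...
        if itv and itv[0] == "xM":
            m = itv[1]            # ...unless xM itself is set directly
        total += pe * cost[(e, m)]
    return total

for label, itv in [("none", None), ("xE => c", ("xE", "c")),
                   ("xE => b", ("xE", "b")), ("xM => c", ("xM", "c")),
                   ("xM => b", ("xM", "b"))]:
    print(label, round(expected_cost(itv), 2))   # matches the table
```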
Now, suppose firm 1 places equal weight on xE → xM and xE ← xM , respectively, while
firm 2 is certain that xE → xM is the true structure. Under these beliefs, each firm’s expected
cost under each intervention is
Intervention Firm 1 Firm 2
∅ 1.2 1.2
(xE ⇒ c) 2.8 2.0
(xE ⇒ b) 1.3 1.0
(xM ⇒ c) 2.8 3.6
(xM ⇒ b) 1.3 1.6
The optimal interventions are for firm 1 to do nothing and for firm 2 to set xE to b. Note
that this is an equilibrium: under the do-nothing intervention, firm 1 observes its expected
distribution over outcomes and obtains the anticipated average cost. The same is true for
firm 2.
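Firm 1's column follows the same logic once the H2 parameters are pinned down. The sketch below assumes that, under either hypothesis, the cause takes value b with probability 0.8 and the effect copies it; these parameters are assumptions consistent with the tables above, chosen so that both hypotheses fit the same undisturbed distribution:

```python
# Belief-weighted expected costs: firm 1 splits weight evenly between
# H1 (xE -> xM) and H2 (xM -> xE); firm 2 is certain of H1. Assumed
# parameters: P(cause = b) = 0.8 and the effect copies the cause.
cost = {("b", "b"): 1, ("b", "c"): 4, ("c", "b"): 4, ("c", "c"): 2}  # (E, M)

def exp_cost(E_causes_M, itv=None):
    total = 0.0
    for root, p in (("b", 0.8), ("c", 0.2)):
        cause = effect = root                 # effect copies the cause
        if itv:
            var, val = itv
            cause_name = "xE" if E_causes_M else "xM"
            if var == cause_name:             # intervening on the cause
                if root != val:
                    continue
                cause = effect = val
                p = 1.0
            else:                             # intervening on the effect
                effect = val                  # cuts the causal link
        e, m = (cause, effect) if E_causes_M else (effect, cause)
        total += p * cost[(e, m)]
    return total

for label, itv in [("none", None), ("xE => c", ("xE", "c")),
                   ("xE => b", ("xE", "b")), ("xM => c", ("xM", "c")),
                   ("xM => b", ("xM", "b"))]:
    firm1 = 0.5 * exp_cost(True, itv) + 0.5 * exp_cost(False, itv)
    firm2 = exp_cost(True, itv)
    print(label, round(firm1, 2), round(firm2, 2))   # matches the table
```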
Noting that the expected Nash equilibrium quantity for firm i is given by

q̄∗i = (α − 2c̄i + c̄j)/3,

firm 1 has expected output of q̄∗1 = 0.87 and firm 2 has q̄∗2 = 1.07; the corresponding market
shares are s1 = 44.8% and s2 = 55.2%. Expected profits are π̄∗1 = 1.04 and π̄∗2 = 1.07,
respectively (a 2% advantage for firm 2). Thus, in this circumstance, causal ambiguity
results in performance heterogeneity within the industry — even though the firms’ actual
technologies are identical.
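These figures can be checked against the quantity formula. The demand intercept α is not stated in this section, so α = 4 below is an assumption chosen to reproduce q̄∗1 = 0.87:

```python
# Expected Cournot quantities and market shares for the two-firm example.
# The demand intercept alpha = 4 is an assumption, not a value from the paper.
alpha = 4.0
c1, c2 = 1.2, 1.0                  # expected costs under no intervention / (xE => b)
q1 = (alpha - 2 * c1 + c2) / 3     # expected quantity (alpha - 2ci + cj)/3
q2 = (alpha - 2 * c2 + c1) / 3
s1, s2 = q1 / (q1 + q2), q2 / (q1 + q2)
print(round(q1, 2), round(q2, 2))  # 0.87 1.07
print(round(s1, 3), round(s2, 3))  # 0.448 0.552
```

The profit figures additionally depend on the distribution of realized costs, so they are not recomputed here.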
The full characterization of equilibria in this example is as follows. Let βi be the weight
placed by firm i on xE → xM (the true hypothesis). Then, in equilibrium, firm i either
chooses: 1) the objectively optimal intervention (xE ⇒ b) with βi = 1, or 2) the no-
intervention option (∅) with belief βi ∈ [.33, .67] (i.e., any belief in this range supports
the no-intervention decision). Thus, as we already know from Proposition 2, the amount of
performance heterogeneity observed in equilibrium is limited by the true causal structure.
For example, there are no equilibrium beliefs that support the intervention (xM ⇒ b).
While causal ambiguity is necessary for performance heterogeneity, it is not sufficient.
The range of possible causal equilibria depends upon beliefs, causal structure and payoffs.
For example, even with causal ambiguity, if firms have sufficiently similar beliefs, they choose
interventions that result in identical average costs (e.g., this would have been the case if firm
2 above had β2 ∈ [.33, .67]). Alternatively, with costs of
E M $cost
b b 1
b c 1
c b 4
c c 4
the unique equilibrium is for firms to place full weight on xE → xM and to choose (xE ⇒ b).10
To generalize this result, recall that the Herfindahl index of market concentration is
defined as H ≡ ∑i∈N s2i, where si is the market share of firm i. H reflects both the number
of firms and their relative sizes. It can be shown that H = nσ2 + 1/n, where σ2 is the variance
of firms’ market shares. Hence, when n is fixed, H is increasing in the inequality of these
shares. Let σ2x⃗ denote the variance of firms’ expected market shares and Hx⃗ ≡ nσ2x⃗ + 1/n the
expected Herfindahl index under x⃗. If all firms have equal expected market
shares, then σ2x⃗ = 0 and Hx⃗ = 1/n. Recall that, in Cournot competition, market shares are
positively correlated with relative profitability.
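The identity H = nσ2 + 1/n follows from ∑i (si − 1/n)2 = ∑i s2i − 1/n, using the fact that shares sum to one. A quick numerical check, with an assumed share vector:

```python
# Verify H = n*var + 1/n for an arbitrary share vector summing to one.
shares = [0.5, 0.3, 0.2]                # assumed for illustration
n = len(shares)
H = sum(s * s for s in shares)          # Herfindahl index
mean = sum(shares) / n                  # equals 1/n since shares sum to one
var = sum((s - mean) ** 2 for s in shares) / n
assert abs(H - (n * var + 1 / n)) < 1e-12
print(H)
```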
We also require a formal measure of the dispersion of causal beliefs within the industry.
Given industry beliefs β, let m be the number of causal systems receiving positive weight from
at least one agent (i.e., m is the set cardinality of the union of the Cβi’s). Agent i’s beliefs
are then represented by a vector βi = (β1i, ..., βmi). The variance of beliefs in the industry is
given by σ2β ≡ (1/n)∑i∈N ‖βi − β̄‖2, where ‖βi − β̄‖ is the Euclidean distance between βi and
the mean belief β̄.11
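For the two-firm example above, with firm 1 splitting weight evenly between the two systems and firm 2 certain of the true one, the measure works out as follows (m = 2, belief vectors taken from that example):

```python
# sigma^2_beta = (1/n) * sum_i ||beta_i - beta_bar||^2 for the earlier
# two-firm example: beta_1 = (.5, .5), beta_2 = (1, 0) over m = 2 systems.
betas = [(0.5, 0.5), (1.0, 0.0)]
n, m = len(betas), len(betas[0])
beta_bar = [sum(b[l] for b in betas) / n for l in range(m)]   # mean belief
var_beta = sum((b[l] - beta_bar[l]) ** 2
               for b in betas for l in range(m)) / n
print(var_beta)   # 0.125
```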
Proposition 3 If (β, x⃗) is a CE with Hx⃗ > 1/n, then σ2β > 0.
10To put it differently, the only optimal intervention in this case, regardless of one’s beliefs, is (xE ⇒ b).
However, the only causal system consistent with the empirical distribution implied by (xE ⇒ b) is xE → xM
and, hence, agents choosing this intervention must, in equilibrium, know the true nature of the causal
structure governing their environment.
11The mean belief is β̄ = (β̄1, ..., β̄m), where β̄l ≡ (1/n)∑i∈N βli. The Euclidean distance is
‖βi − β̄‖ = √((β1i − β̄1)2 + · · · + (βmi − β̄m)2). Hence, σ2β = (1/n)∑i∈N ∑ml=1 (βli − β̄l)2.
Proof. First, note that, for all i ∈ N and all x⃗i, x⃗′i ∈ BRβi, c̄βi|x⃗i = c̄βi|x⃗′i. Now, suppose
Hx⃗ > 1/n and σ2β = 0. Since σ2β = 0, we have βi = βj for all i, j ∈ N. This implies BRβi = BRβj
for all i, j ∈ N. Thus, since (β, x⃗) is a CE, c̄βi|x⃗i = c̄βj|x⃗j for all i, j ∈ N. But this implies
equal expected market shares and, as a result, σ2x⃗ = 0. Hence, Hx⃗ = 1/n, a contradiction.
Proposition 4 If (β, x⃗) is a CE with σ2β > 0, then Hx⃗ ≥ 1/n.
Proof. The preceding example demonstrates that there exist CE in which σ2β > 0 and
Hx⃗ > 1/n, as well as CE in which σ2β > 0 and Hx⃗ = 1/n (e.g., firms 1 and 2 have beliefs
β1 = .33 and β2 = .67, respectively, and both choose x⃗i = ∅).
Proposition 5 If (β, x⃗) is a CE with Hx⃗ > 1/n, then there exists i ∈ N such that Gx⃗i is
causally ambiguous.

Proof. Assume the premise with |GGx⃗i| = 1 for all i ∈ N. Then, for all i ∈ N, βi(G, θ) =
1, implying σ2β = 0. Hence, Hx⃗ = 1/n, a contradiction.
These propositions, taken together, make several important points about the relation-
ship between causal ambiguity and firm performance heterogeneity. Proposition 3 says that
heterogeneous causal beliefs are necessary for performance heterogeneity. It is not enough
for firms to be uncertain about the causal mechanics of their environment — their theories
about these mechanics must be sufficiently heterogeneous that the firms implement different
interventions. However, Proposition 4 says that heterogeneous beliefs are not, in and of
themselves, sufficient to induce performance heterogeneity. In any situation, there may be a
large number of causal systems all of which imply the optimality of the same intervention.
Proposition 5 makes it clear that performance heterogeneity is not possible without causal
ambiguity.
This last point is, obviously, the important one. On the one hand, the previous analysis
presents a formal validation of the oft-repeated claim that causal ambiguity can be a key
determinant in the performance heterogeneity of an industry (even, as shown here, when
firm technologies are identical). On the other hand, as we know from Proposition 2 and its
corollary, not only are some causal systems more ambiguous than others, but the range of
possibilities can be precisely established using relatively simple graphic analysis.
6 Caveats, extensions, etc.
The preceding analysis establishes the possibility of equilibria in which causal ambiguity is
a determining factor of firm performance heterogeneity. In some ways, the underlying
assumptions serve to strengthen the implications. In particular, the finding that performance
heterogeneity may obtain even when all firms have identical technologies is striking. It
suggests that the phenomenon may be even more prevalent in the real world, where firm
technologies and the constraints on managerial intervention are varied.
On the other hand, it seems highly unlikely that, upon observing the performance of direct
competitors known to have similar technologies, high-cost firms would be able to maintain
the fiction that they are objectively optimizing. Even if costs were private information, firms
might observe each other’s market shares and, thereby, gain a better sense of the accuracy
of their beliefs. Public information on the performance of one’s competitors may work against
the stability of heterogeneous causal beliefs; i.e., as long as someone else is outperforming
me, I know I have not discovered the optimal intervention. Of course, experimentation
is expensive. Thus, it may be that diverse initial priors under causal ambiguity lead to
intervention paths that eventually stabilize with some firms out-performing others due to
these inherent experimentation costs.
This suggests that a useful extension is to make the model dynamic. Such an extension
would involve a greater degree of mathematical complexity, but the resulting insights might
well warrant it. It would be nice to know, for example, the extent to which public infor-
mation in a competitive setting causes beliefs to converge and, thereby, the extent to which
causal ambiguity-based performance advantage survives in dynamic markets. Similarly, in a
dynamic setting, causal learning could also be explored. Characterizing optimal experimen-
tation policies would be a component of such exploration. By embedding a specific learning
process in a dynamic version of the model, it may be possible to say something precise about
how and when CE are reached.
A related point is that, in this model, firm interventions are not strategic. That is, firms
always have the incentive to choose an intervention that minimizes marginal production cost
without regard to what everyone else does. However, in a situation in which firms observe
each other and then have an opportunity to react, it may behoove some firms, for example,
to knowingly choose a somewhat suboptimal intervention in order to sow confusion amongst
their competitors (i.e., make ambiguous interventions). Although the paper does not tackle
this issue, it does present a framework by which such questions can be answered.
Finally, it should be pointed out that the literature on probabilistic networks is exten-
sive and includes numerous contributions on estimating the underlying causal system from
empirical data (several good initial references are presented in Appendix A). This raises
the possibility of investigating the relationship between causal ambiguity and performance
heterogeneity via empirical methods.
A Dependency models
The following is a condensed discussion of the relevant underlying theory of probabilistic
networks adapted from Penalva and Ryall (2003). Since most strategy researchers are unfa-
miliar with this literature, the objective here is to: (i) give readers a sense of its theoretical
content, and (ii) provide sufficient technical detail to support the preceding discussion. For
those interested in pursuing these ideas further, I suggest starting with the texts by Cowell
et al. (1999), Pearl (1988, 2000) and Spirtes et al. (2000).
Definition 3 A dependency model M over a finite set of elements V is a collection of
independence statements of the form (C ⊥ D|E) in which C,D and E are disjoint subsets
of V and which is read “C is independent of D given E.” The negation of an independency
is called a dependency.
The notion of a general dependency model was originated by Pearl and Paz (1985),
who were motivated to develop a set of axiomatic conditions on general dependency models
that would include probabilistic and graphical dependencies as special cases. These axioms
are known as the graphoid axioms.12 We are interested in graphoids, which are defined as
dependency models that are closed under the graphoid axioms.
For example, given a probability space (M,M, µ) and an associated, finite set of random
variables X indexed by V = {1, ..., t} with typical element x̃r, Mµ is the list of conditional
independencies that hold under µ. For all W ⊆ V, let x̃W ≡ (x̃r)r∈W . Then, for all disjoint
C,D,E ⊂ V, (C ⊥ D|E) ∈ Mµ if and only if x̃C is µ-conditionally independent of x̃D given
x̃E. A proof that the graphoid axioms hold for conditional independence in all probability
distributions can be found in Spohn (1980).
Alternatively, if G is a graph whose vertices are V , then for all disjoint C,D,E ⊂ V,
(C ⊥ D|E) ∈ MG if and only if E is a cutset separating C from D. Of course, in this
case, the meaning of (C ⊥ D|E) depends upon how one defines “cutset.” The literature on
probabilistic networks contains several such definitions, depending upon whether the graph
is undirected, directed or some mixture of the two (i.e., a chain graph). We proceed with
Pearl’s (1986) notion of d-separation (the d stands for “directed”).
Given a DAG G and a path (ordered set of nodes) W ⊆ V, a node xr ∈ W is called
head-to-head with respect to W if xr−1 → xr and xr ← xr+1 in W. A node that starts or ends
a path is not head-to-head. A path W ⊂ V is active by E ⊂ V if: (i) every head-to-head
node is in, or has a descendant in, E; and (ii) every other node in W is outside E. Otherwise,
W is said to be blocked by E.
Definition 4 If G is a DAG and C,D and E are disjoint subsets of V, then E is said to
d-separate C from D if and only if there exists no active path by E between a node in C and
a node in D.
Examples of d-separation can be found in the Pearl references cited above. Thus, given a
DAG G we define MG such that (C ⊥ D|E) ∈ MG if and only if E d-separates C from D
in G.
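Definition 4 can be tested without enumerating paths. The sketch below (an illustration, not code from the paper) uses the standard equivalent moralization criterion: E d-separates C from D if and only if E separates C from D in the moral graph of the ancestral subgraph on C ∪ D ∪ E:

```python
# d-separation via moralization: C ⊥ D | E in a DAG iff C and D are
# disconnected after (1) restricting to ancestors of C ∪ D ∪ E,
# (2) marrying co-parents and dropping directions, and (3) deleting E.
def d_separated(dag, C, D, E):
    parents = {}
    for tail, head in dag:
        parents.setdefault(head, set()).add(tail)
    # (1) ancestral closure of C ∪ D ∪ E
    anc, frontier = set(C | D | E), list(C | D | E)
    while frontier:
        for p in parents.get(frontier.pop(), ()):
            if p not in anc:
                anc.add(p)
                frontier.append(p)
    # (2) moral graph on the ancestral set
    adj = {v: set() for v in anc}
    for tail, head in dag:
        if tail in anc and head in anc:
            adj[tail].add(head); adj[head].add(tail)
    for v in anc:
        pa = sorted(p for p in parents.get(v, ()) if p in anc)
        for i, a in enumerate(pa):
            for b in pa[i + 1:]:
                adj[a].add(b); adj[b].add(a)
    # (3) remove E, then check reachability from C to D
    seen, stack = set(C), list(C)
    while stack:
        for w in adj[stack.pop()]:
            if w not in E and w not in seen:
                seen.add(w)
                stack.append(w)
    return not (seen & D)

# The head-to-head node xj blocks the path when unobserved and
# activates it when conditioned upon.
G = {("xi", "xj"), ("xk", "xj")}
print(d_separated(G, {"xi"}, {"xk"}, set()))    # True
print(d_separated(G, {"xi"}, {"xk"}, {"xj"}))   # False
```

Conditioning on the head-to-head node xj marries its parents in the moral graph, which is exactly why the v-structure path becomes active.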
We wish to characterize the relationship between probabilistic and graphical dependency
models. This is done through the general notion of an independence map (or, I-map).
Definition 5 An I-map of a dependency model M is any model M ′ such that M ′ ⊆ M.
Given a probability space (M,M, µ) and an associated, finite set of random variables
V ≡ {x̃1, ..., x̃t} , the task of constructing a DAG G such that MG is an I-map of Mµ is
straightforward (see Geiger et al., 1990, p. 514). First, for all xr ∈ V, let Ur ≡ {1, ..., r − 1}
index the predecessors of x̃r according to V. Next, identify a minimal set of predecessors Pr ⊂
V such that ({r} ⊥ Ur\Pr|Pr)µ where the “µ” subscript indicates probabilistic independence
under µ. This results in a set of t independence statements known as a recursive basis drawn
from Mµ and denoted Eµ. Now, construct G such that xs → xr if and only if xs ∈ Pr. The
resulting graph G, a DAG, is said to be generated by Eµ and Pr = {xs ∈ V |xs → xr} is the
set of parents of xr in G.
The following theorems are from Geiger et al. (1990, Theorems 1 and 2). First, an
independence statement (C ⊥ D|E) is a semantic consequence (with respect to a class of
dependency models M — e.g., those that satisfy the graphoid axioms) of a set E of such
statements if (C ⊥ D|E) holds in every dependency model that satisfies E; i.e., (C ⊥ D|E) ∈
M for all M such that E ⊆ M ∈ M.
Theorem 2 (soundness) If M is a graphoid and E is any recursive basis drawn from M,
then the DAG generated by E is an I-map of M.
So, given (M,M, µ), the DAG G constructed in the fashion outlined above is an I-map
of Mµ. That is, every independence statement implied by G under d -separation corresponds
to a valid µ-conditional independency.
Theorem 3 (closure) Let D be a DAG generated by a recursive basis E. Then MD, the
dependency model generated by D, is exactly the closure of E under the graphoid axioms.
Two DAGs G and G′ are said to be empirically indistinguishable if every probability dis-
tribution that can be factored in accordance with the recursive basis EG ≡ {({r} ⊥ Ur\Pr|Pr) | r ∈ V}
can also be factored in accordance with EG′ ≡ {({r} ⊥ U′r\P′r|P′r) | r ∈ V}. The original
theorem on empirical indistinguishability is due to Verma and Pearl (1990, Theorem 1) and
is generalized by Andersson et al. (1997, Theorems B.1 and 2.1). The variation applied in
Proposition 1 for faithful indistinguishability is from Spirtes et al. (2000, Theorem 4.2).
References
[1] Abreu, D., D. Pearce, and E. Stacchetti, 1990. Toward a theory of discounted repeated
games with imperfect monitoring. Econometrica, 58(5), 1041-63.
[2] Adner, R., and P. Zemsky, 2003. A demand-based view of sustainable competitive ad-
vantage: The evolution of substitution threats, resource rents and competitive positions.
Unpublished working paper. Insead.
[3] Andersson, S. A., D. Madigan, and M. D. Perlman, 1997. A characterization of Markov
equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2), 505-541.
[4] Brandenburger, A., and H. W. Stuart, 1996. Value-based business strategy. Journal
of Economics and Management Strategy, 5: 5-24.
[5] Battigalli, P., and D. Guaitoli, 1988. Conjectural equilibria and rationalizability in a
macroeconomic game with incomplete information. Istituto di Economia Politica, Milan.
[6] Bollobas, B., 1998. Modern Graph Theory. Springer, New York.
[7] Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, 1999. Probabilistic
Networks and Expert Systems. Springer, New York.
[8] Fudenberg, D., and D. K. Levine, 1993. Self-confirming equilibrium. Econometrica, 61,
523-45.
[9] Geiger, D., T. S. Verma, and J. Pearl, 1990. Identifying independence in Bayesian
networks. Networks, 20, 507-34.
[10] Kalai, E., and E. Lehrer, 1993. Subjective equilibrium in repeated games. Econometrica,
61, 1231-1240.
[11] Korb, K. B., and A. E. Nicholson, 2004. Bayesian Artificial Intelligence. Chapman and
Hall/CRC, Boca Raton.
[12] Lippman, S. A., and R. P. Rumelt, 1982. Uncertain imitability: an analysis of inter-firm
differences in efficiency under competition. Bell Journal of Economics, 13(3):418-38.
[13] MacDonald, G., and M. D. Ryall, 2003. Does value lurk in the shadows? New methods
for the identification and evaluation of strategic initiatives. Unpublished working paper,
Washington University.
[14] ––, 2004. How do value creation and competition determine whether a firm appropri-
ates value? Management Science (forthcoming).
[15] Makadok, R., and J. Barney, 2001. Strategic factor market intelligence: An application
of information economics to strategy formulation and competitor intelligence. Manage-
ment Science, 47 (12): 1621-38.
[16] Moody, F., 1995. I Sing the Body Electronic: A Year with Microsoft on the Multimedia
Frontier. Penguin Books, New York.
[17] Pearl, J., 1986. Fusion, propagation and structuring in belief networks. Artificial Intel-
ligence 29, 241-88.
[18] ––, 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Infer-
ence. North Holland, Amsterdam.
[19] ––, 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press,
Cambridge.
[20] ––, and A. Paz, 1985. Graphoids: A graph-based logic for reasoning about relevance rela-
tions. Technical report 850038 (R-53-L), Cognitive Systems Laboratory, UCLA. Short
version in Advances in artificial intelligence 2, ed. Du Boulay, B., Hogg, D., Steels, L.
North Holland, Amsterdam.
[21] Penalva, J., and M. D. Ryall, 2003. Causal assessment in finite-length extensive-form
games. Working paper, University of Rochester.
[22] Reed, R., and R. J. DeFillippi, 1990. Causal ambiguity, barriers to imitation, and
sustainable competitive advantage. Academy of Management Review, 15(1): 88-102.
[23] Ryall, M. D., 2003. Subjective rationality, self-confirming equilibrium and corporate
strategy. Management Science, 49(7): 936-49.
[24] Sorenson, O., and D. Waguespack, 2003. Social networks and exchange: Self-confirming
dynamics in Hollywood. Working paper, UCLA.
[25] Spirtes, P., C. Glymour, and R. Scheines, 2000. Causation, Prediction and Search. The
MIT Press, Cambridge.
[26] Spohn, W., 1980. Stochastic independence, causal independence and shieldability. J.
Phil. Logic, 9, 73-99.
[27] Tversky, A., and D. Kahneman, 1981. The framing of decisions and the psychology of
choice. Science, 211(4481): 453-58.
[28] Verma, T. S., and J. Pearl, 1990. Equivalence and synthesis of causal models, in: Pro-
ceedings of the 6th Conference on Uncertainty in Artificial Intelligence. Cambridge, pp.
220-7. Reprinted in: Bonissone, P., Henrion, M., Kanal, L. N., Lemmer, J. F. (Eds.),
Uncertainty in Artificial Intelligence, vol. 6, 255-68.