Causal Ambiguity as a Source of Firm Performance
Heterogeneity∗
Michael D. Ryall
University of Rochester
March 24, 2004
Abstract
A formal theory is presented by which to analyze decision making under causal
ambiguity. This theory is applied to firms competing in an industry in which technolo-
gies are homogeneous but driven by an unknown causal structure. It is demonstrated
that, even when firms have homogeneous technologies, performance heterogeneity may
exist in equilibrium as a result of causal ambiguity. Perhaps more importantly, it is
shown that the true causal structure limits the degree to which causal ambiguity may
be manifest. This paper contributes to a small but growing literature on the formal
theoretical foundations of strategy.
∗LA05A
1 Introduction
In constructing theories about sustained, cross-sectional performance differences within groups
of competing firms, strategy researchers often invoke “causal ambiguity” as a source of the
kind of confusion that might lead otherwise rational managers to make persistent suboptimal
decisions for their respective firms. While most strategy researchers are familiar with the
notion of causal ambiguity and its presumed influence on firm performance, the extant lit-
erature contains no formal discussion of what it is or how its effects on relative performance
are induced.1 Thus, the present literature does not provide precise answers to questions
such as: What is causal ambiguity? Does causal ambiguity lead to sustained performance
heterogeneity as is so often supposed? If so, by what mechanism does this phenomenon
operate? Are some situations more prone to causal confusion than others?
This paper presents a formal theory of causal ambiguity in the context of corporate
strategy. Using this theory, specific answers are provided to the preceding questions and
more. The theory assumes that the firm operates in an environment governed by an objective,
underlying network of causal laws called a causal system. This causal reality is, however,
unknown to the firm’s managers. The managers, faced with this uncertainty, develop causal
theories and assess the likely validity of each. This assessment forms the basis for the firm’s
subsequent actions.
My analysis begins with the single-agent decision problem under causal ambiguity. In this
context, I demonstrate that: 1) causal ambiguity may lead to inefficient equilibrium behavior;
and, 2) the degree of such ambiguity is limited by the true causal system (i.e., some causal
systems are inherently less transparent than others). Following this, the analysis is extended
to the case of multiple firms competing under causal ambiguity. It is shown that performance
heterogeneity may arise under causal ambiguity even when firms have identical technologies.
1For example, Lippman and Rumelt (1982) provide an early suggestion that causal ambiguity may underlie stable performance differences across firms, but do not model such ambiguity per se. Indeed, the only paper of which I am aware that explicitly addresses causal ambiguity and its relationship to business strategy is Reed and DeFillippi (1990). This paper, which is discussed in more detail later, uses informal arguments to support its conclusions.
However, causal ambiguity alone is not sufficient to imply performance heterogeneity. Rather, the existence of equilibria exhibiting performance heterogeneity depends jointly upon the heterogeneity of firm beliefs, the relative transparency of the true system, and the relationship between environmental variables and firm costs.
This paper makes several contributions to the small but growing literature on the formal
theoretical foundations of strategy (Lippman and Rumelt, 1982; Brandenburger and Stuart,
1996; Makadok and Barney, 2001; Adner and Zemsky, 2003; Ryall, 2003; MacDonald and
Ryall, 2003 and 2004).2 First, a formal approach is presented for quantifying causal ambigu-
ity. The basic formalism, which draws upon the literature on probabilistic networks, is not
new. However, this paper marks the first attempt to exploit results on probabilistic networks
to answer foundational questions in strategy.3 Second, this marks the first formal demon-
stration that firm performance heterogeneity may, indeed, obtain in the presence of causal
ambiguity — even under homogeneous production technologies. Third, results are presented
that characterize the limits of causal ambiguity given the structure of the true causal system.
Finally, it is worth pointing out that the analysis relies upon a notion of self-confirming equi-
librium invented by Kalai and Lehrer (1993) and first applied to the foundations of strategy
by Ryall (2003).4 All the propositions, with the exception of Proposition 1, are new to this
paper.
2The unifying theme of this literature is its use of formal methods to develop general principles of value distribution within a competitive setting. In this sense it differs, e.g., from industrial organization economics, the primary focus of which tends to be on issues of economic efficiency.
3Probabilistic networks constitute a very active area of research within the artificial intelligence community and, as a result, one that is growing quite rapidly. Since most strategy researchers are not familiar with it, I have included a condensed, self-contained discussion of the relevant results and references in Appendix A.
4The idea behind the self-confirming equilibrium literature is to loosen the strong rationality assumptions inherent in the more traditional equilibrium concepts of economics. Foundational papers in this line include Abreu et al. (1990), Battigalli and Guatoli (1988), Fudenberg and Levine (1993) and Kalai and Lehrer (1993). Sorenson (2003) conducts an empirical investigation of self-confirming equilibria in the movie industry. The material contained here is also related to the notion of causal Nash equilibrium developed by Penalva and Ryall (2003).
The following section motivates the later formalism with an actual case taken from the
popular business press. A simple numerical model is constructed to illustrate many of the
key ideas that appear in the following sections. §3 initiates the formal part of the paper
by defining a causal system. §4 analyzes the single-agent decision problem under causal
ambiguity. The notion of an “intervention” is formalized, causal beliefs are described and
causal equilibrium is defined. The main results (Proposition 2 and its corollary) demonstrate
how the relative transparency of the true causal system limits the extent to which causal
ambiguity may obtain in equilibrium. §5 extends the results of the single-agent problem
to an industry in which firms with identical production technologies face the possibility of
causal ambiguity regarding the workings of that technology. The main findings of the paper
are presented in this section. A few concluding thoughts are tendered in §6.
2 Motivating example
The following excerpts are from Moody (1995), in which he chronicles a year spent at Mi-
crosoft shadowing a design and development team working on a multimedia project called
Sendak. Moody is an experienced journalist covering the technology beat. As he explains (p.
xviii), “With the somewhat bemused blessing of Microsoft, I lived with this team, attended
all of its meetings, shared an office with its lead developer, read all of its electronic mail, and
exhaustively discussed with team members their experiences in the context of the broader
Microsoft culture.”
The original purpose of this exercise was to write a book on how Microsoft managed
to maintain its leadership position as a producer of PC software — a sort of best-practices
manual for this industry. However, as Moody describes (p. 137), “I had come to Microsoft to describe, up close, unprecedented success in action ... The more I watched the process of creating Sendak try to unfold, the more confused ... I grew. These people were hardly the crisp, precise, unerring, and ruthless Microsoftoids of popular legend.”
Of particular relevance to this paper is that one of the significant sources of confusion
was which group, exactly, played the leadership role in the development of the software: the
product designers or the software developers. Says Moody (p. xix), “The Microsoft approach
to corporate organization is to form small teams around specific products and leave them
alone to organize and work as they wish.” However, different elements of the Sendak team
had different agendas (p. 27), “Sendak’s designers and editors would want to pack the encyclopedia with features seen nowhere else ... Sendak’s developers would want a far less ambitious set of new features and ample time in which to write code for them.” Obviously, which element held sway would largely determine the functionality of the software, the timing of its release and, ultimately, its success in the marketplace [emph. added].
Moody highlights another important feature of the Sendak story of pertinence to the
theory developed below. It is this: leaving product teams alone was a managerial choice —
one that could be and, at times, was disregarded in favor of direct intervention. For example,
in describing the response of a senior manager to the machinations of the team, Moody says
(p. 217), “His direct interventions in team disputes invariably were in support of Bjerke —
an endorsement, it seemed to me, of the decisions she was making ...” [Bjerke was a lead
designer on the team.]
Moody’s narrative illuminates all of the elements that characterize the phenomena of
interest here. First, the mechanism causing team outcomes was unclear, even to an insider
privy to all of the information and activities of the team. Second, the various team members
had different skills and preferences depending upon their functional area membership. These
attributes, along with individual personalities, interacted in complex ways to drive results.
Third, clear managerial directives were not sufficient to guarantee efficient outcomes, even
when accepted and pursued by individual team members. Finally, senior managers occasion-
ally chose to intervene directly in team activities (even though their understanding of the
potential consequences of such interventions was limited).
Let us now consider a highly stylized model of the situation described by Moody in order
to introduce the analytical issues dealt with more generally in the upcoming formalism. Sup-
pose Microsoft is composed of two departments: Marketing, denoted M, and Engineering,
denoted E. Assume that the software market is divided into two main segments: educa-
tion and business. Success in the more profitable business segment implies success in the
education segment, but not conversely.
Software development requires the coordination of the title’s content (responsibility of
E) with a “look-and-feel” (responsibility of M) suitable to one segment or the other. Thus,
E produces content that, ultimately, appeals either only to the education segment or to
businesses (and, by implication, education as well). M ’s efforts on creating an appropriate
look-and-feel have similar implications. Let (b, b) denote a successful business/education
title, with the first component indicating business content from E and the second a business-
appropriate look and feel from M . The other potential outcomes are (b, c) , (c, b) and (c, c)
with the obvious interpretations. An outcome of (b, b) results in a $1 profit, (c, c) a breakeven
payoff and either (b, c) or (c, b) a loss of $2.
Suppose the firm has long operated under a decentralized organizational regime; that is,
an organization in which objectives are communicated to the department heads who are then
left to achieve them as they see fit. The historical record of outcomes, for a large number of
titles, is
Historical Data
Dept. Outcomes   Observed Frequency
(c, c)           0.2
(c, b)           0.0
(b, c)           0.0
(b, b)           0.8
Historically, M and E have done a good job of producing profitable titles: left to their own devices, titles earn $1 eighty percent of the time and never lose money.
Suppose that in firms of this kind one department emerges as the de facto organiza-
tional leader in the development of software. The process by which one department or the
other attains this leadership role is complex and depends upon a multitude of firm-specific
attributes: organizational culture, explicit incentive systems, the personalities, preferences,
skills and past performance of the functional area managers, and so on. Moreover, while
formal reporting relationships and managerial directives may shape the emergence of the
leader, they may not be sufficient to do so. That is, the true process driving outcomes
may be the result of informal dynamics that do not conform to managerial edict. Hence, at
any point, upper management may face uncertainty with respect to which department truly
assumes this role.
Notice that, from the historical record, the hypothesis that the two departments act
independently can be ruled out immediately. Rather, the preceding data are consistent with
both of the following hypotheses: H1 : E leads in developing content, which is business-
suitable 80% of the time, and M provides a look-and-feel to match; and, H2 : M pushes
a particular look-and-feel, which is business-suitable 80% of the time, and E coordinates
the content. Thus, while it is clear from the data that one department has attained the
leadership role, it remains ambiguous which department that is.
Now, suppose management has two broad policy options: 1) continue with the decen-
tralized strategy and earn the expected profit of $0.80; or, 2) implement an intervention
strategy to improve success by becoming personally involved in functional area activities.
To make the example interesting, assume management can only intervene in the activities of one department (due, e.g., to constrained managerial resources).
Can performance be improved with an intervention? To answer, assume that manage-
ment’s prior beliefs over hypotheses are uninformative; that is, management assesses the
relative likelihood of the two hypotheses to be 50/50. Under these priors, the expected
profits associated with the available interventions are:
Intervention   Profit (H1)   Profit (H2)   Subjective Expected Profit
None           0.8           0.8           0.8
Set E to b     1.0           0.4           0.7
Set M to b     0.4           1.0           0.7
The first two columns detail the expected profits under each of the respective hypotheses.
The last is the subjective expected profit given management’s priors on each of the two
hypotheses.
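The entries of this table can be reproduced with a short calculation. The following sketch uses the example's payoffs and 80/20 leader behavior; the function and variable names are my own, not the paper's:

```python
# Payoffs from the example: (b,b) earns $1, (c,c) breaks even, mismatches lose $2.
payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}

def dist(leader, intervention=None):
    """Outcome distribution over (x_E, x_M) when `leader` ('E' or 'M') picks b
    with probability 0.8 and the other department coordinates with it.
    `intervention` is None or a pair (dept, value) pinning that department."""
    follower = "M" if leader == "E" else "E"
    out = {}
    for val, p in (("b", 0.8), ("c", 0.2)):
        vals = {leader: val}
        if intervention and intervention[0] == leader:
            vals[leader] = intervention[1]       # leader is pinned
        vals[follower] = vals[leader]            # follower coordinates with leader
        if intervention and intervention[0] == follower:
            vals[follower] = intervention[1]     # follower is pinned, breaking coordination
        key = (vals["E"], vals["M"])
        out[key] = out.get(key, 0.0) + p
    return out

def expected_profit(d):
    return sum(p * payoff[o] for o, p in d.items())

for label, iv in [("None", None), ("Set E to b", ("E", "b")), ("Set M to b", ("M", "b"))]:
    h1 = expected_profit(dist("E", iv))   # H1: E leads, M follows
    h2 = expected_profit(dist("M", iv))   # H2: M leads, E follows
    print(f"{label:12s}  H1={h1:.1f}  H2={h2:.1f}  subjective={0.5*h1 + 0.5*h2:.2f}")
```

Under 50/50 priors the do-nothing row dominates both interventions, matching the table.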
In this situation, decentralization is the subjectively optimal decision. Suppose, however,
that H1 describes the true state of affairs. Then, forcing E to produce business titles 100%
of the time increases expected profits from .8 to 1. If H2 is true, management should focus
on M . The subjectively rational decision is not objectively optimal under either possible
state of the world.
There are a few things to note about this example. First, the decision to retain the
decentralized organization ensures that no data will ever be generated that identifies which
hypothesis is actually correct. Thus, the no-intervention policy is stable. Second, this
decision is based upon accurate data and correct Bayesian updating of priors over causal
hypotheses. Management does not suffer from any of the irrational decision biases often
written about in the behavioral decision literature (e.g., Tversky and Kahneman, 1981).
The decision not to intervene, while not objectively optimal, is subjectively optimal given
managerial priors. Third, the conclusion drawn above is not stable under all priors. For example, if management places strong weight on H1 and, as a result, intervenes in the affairs of E, then it will eventually discover the truth (be it H1 or H2) and, from that point on, implement the objectively optimal strategy.
Finally, intuition suggests that, in a market with several software manufacturers, confu-
sion of the type described here could account for stable, cross-sectional differences in firm
performance. For example, under heterogeneous priors, some firms might pursue a stable no-intervention policy and others the optimal intervention.
3 Causal systems
In order to analyze situations in which the consequences of agents’ actions depend upon an
underlying, unknown system of causal laws, a formal representation of a “causal system” is
required. The conventions used throughout the paper are: variables and elements of sets are
represented with small letters, sets with capitals and sets of sets with calligraphics. Directed
graphs, denoted with bold capitals, play a significant role in what follows; standard graph-
theoretic terminology (parents, descendants, paths, etc.) is used freely without explicit
definition (see, e.g., Bollobas, 1998).
3.1 Causal primitives
Begin with a relevant, indexed set of environmental variables, V ≡ {x1, ..., xk}. Each variable xi takes values from a finite set Xi. In the previous example, V = {xE, xM} with XE = XM = {c, b}. The set of outcomes is defined as X ≡ X1 × ··· × Xk with typical element x̂ ≡ (x̂1, ..., x̂k), where x̂i indicates a specific value taken by xi. The agents’ payoffs depend upon outcomes. Let π(x̂) denote the payoff to the agent when the outcome is x̂.
In the previous example, the set of outcomes is X = {(c, c), (c, b), (b, c), (b, b)}. A specific outcome is x̂ = (b, b), which has the associated payoff π(b, b) = 1.
Assume that outcomes are determined by an underlying system of causal relationships.
How can these relationships be represented formally? In the example, the intuitive nota-
tion xE → xM can be used to indicate the causal relationship implied under H1; that is,
the engineers operate independently and the marketing group attempts to coordinate by
following their lead. A causal system, then, is a pair (G, θ) in which G is a finite, directed,
acyclic graph that represents the system’s causal structure and θ ≡ (θ1, ..., θn) is a profile of
parameters. Each node in G corresponds to an element in V (there is a bijection between
nodes in G and V ). G is said to be a causal structure on V.
Pi ⊂ V denotes the set of xi’s parents (its direct causes) in G, and Di ⊂ V its set of descendants (i.e., including but not limited to its children). Let pi ≡ (xj)xj∈Pi be the variable whose components are the parents of xi. When Pi = ∅, pi can be any constant. Given an outcome x̂, p̂i denotes the projection of x̂ onto the dimensions corresponding to the parents of xi.
The components of θ are conditional probabilities. Specifically, θi(x̂i|p̂i) indicates the probability of x̂i conditional upon p̂i. It is noted without proof that θ induces a probability distribution on X, denoted µθ, referred to as the empirical distribution on outcomes induced by θ. Moreover, if (G, θ) is a causal system, then µθ admits the following factorization for all x̂ such that µθ(x̂) > 0:

µθ(x̂) = ∏_{i=1}^{k} µθ(x̂i|p̂i). (1)
Referring back to the example in §2, under H1, xE → xM. The parameters are θE(b) = 0.8, θM(c|c) = 1 and θM(c|b) = 0. The distribution on X induced by H1 is the one shown in the historical data table of §2.
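The factorization in (1) can be checked numerically for the H1 system. A minimal sketch, assuming a dictionary encoding of G and θ that is mine rather than the paper's:

```python
from itertools import product

# H1 causal system: x_E -> x_M, with theta as conditional probability tables.
parents = {"E": (), "M": ("E",)}
theta = {
    "E": {(): {"b": 0.8, "c": 0.2}},               # theta_E(b) = 0.8
    "M": {("b",): {"b": 1.0, "c": 0.0},            # theta_M(c|b) = 0
          ("c",): {"b": 0.0, "c": 1.0}},           # theta_M(c|c) = 1
}
order = ("E", "M")  # a topological ordering of G

def mu(outcome):
    """mu_theta(x) computed by the factorization (1): product of each
    variable's conditional probability given its parents' values."""
    prob = 1.0
    for i, v in enumerate(order):
        pa = tuple(outcome[order.index(q)] for q in parents[v])
        prob *= theta[v][pa][outcome[i]]
    return prob

dist = {o: mu(o) for o in product("bc", repeat=2)}
# Matches the historical record: only (b,b) and (c,c) occur, at 0.8 and 0.2.
```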
Causal systems are fairly general mathematical objects open to a wide range of inter-
pretation and, therefore, applicable to a wide range of decision problems. Examples outside
strategy include medical diagnostics, weather prediction, space shuttle propulsion systems,
and oil price forecasting (Korb and Nicholson, 2004). In the context of decision problems
facing a firm, causal systems might represent the workings of a particular factory production
process, the relationship between incentive systems and managerial behavior, the interaction
between firm resources and key performance variables, or the structure and conduct of an
industry. The application presented later in this paper follows the example of §2: firm costs
are determined by the joint activities of departments, which are determined according to a
stable (but unknown) set of influence relationships.
3.2 The connection between G and µθ
A causal system (G, θ) implies two different types of “dependency model” on the elements of V, one probabilistic (µθ) and the other graphical (G).5 Under µθ, given disjoint subsets C, D, E ⊂ V, one can say whether “C is independent of D given E” using the standard definition of conditional probability. Under the graphical notion of “d-separation” (Pearl, 1986), one can make similar statements with respect to G.6 The interesting fact is that, given a causal system (G, θ), the conditional independencies implied by G (by d-separation) also hold in µθ (by probability theory). For example, in (1), for all xi ∈ V, xi is µθ-conditionally independent of V \ (Pi ∪ Di) given Pi. At the same time, Pi d-separates xi from V \ (Pi ∪ Di) in G.
Thus, the conditional independence relationships implied by G remain invariant to
changes in the numerical parameters comprising θ. As Pearl (2000, p. 25) says, “We expect
such difference in stability because causal relationships are ontological, describing objective
physical constraints in our world, whereas probabilistic relationships are epistemic, reflect-
5See the detailed discussion in Appendix A.
6A complete understanding of the technical details of d-separation is not required for what follows.
ing what we know or believe about the world. Therefore, causal relationships should remain
unaltered as long as no change has taken place in the environment, even when our knowledge
about the environment undergoes changes.”
As it turns out, the relationship between G and µθ is generally stronger than described
above. (G, θ) is said to be faithful when every conditional independence statement implied
by G under d-separation corresponds to a conditional independence statement implied by
µθ under the probabilistic notion of conditional independence, and vice versa. As we will
see, faithful systems have certain properties that are useful for our purposes. Fortunately,
almost all causal systems are faithful. Specifically, representing θ by a real vector, it can be
shown that the set of such vectors that fail the faithfulness condition for arbitrary G is of
Lebesgue measure zero (Meek, 1995).
4 Individual decisions under causal ambiguity
As we saw in the motivating example, knowledge of θ alone is not sufficient to predict the
consequences of an intervention; one must also know G. Thus, in addition to assessing the
random elements of his or her environment, an agent must also form theories about the deeper
but unknown causal structure that drives it. The agent’s decision problem is to choose an
intervention to maximize his expected payoff given his beliefs with respect to the true causal
system governing his environment. To proceed, we must formalize what is meant by an
“intervention,” and quantify “causal ambiguity.” Once done, it is then possible to calculate
the agent’s subjective expectations with respect to each of the available interventions.
In addition, in order to say something about the kinds of behavior expected in situations
like this, some notion of equilibrium is required. The one introduced here, causal equilibrium
(hereafter, CE), is a special case of Kalai and Lehrer’s (1993) subjective Nash equilibrium
(see also Ryall, 2003). CE requires that the agent’s intervention be optimal given his beliefs
and that the resulting empirical distribution on outcomes be consistent with those beliefs.
The idea is that a decision is stable only when the empirical distribution does not contradict
the beliefs that led to the decision in the first place. One interpretation of CE is that it is
what one might observe after an initial (unmodelled) period of experimentation and learning.
4.1 Interventions
In order to keep things simple, assume that, presented with a causal system (G, θ) and payoff function π, an intervention consists of fixing one environmental variable.7 Specifically, the agent has the power to choose one xi ∈ V and set it to xi = x̂i for any x̂i ∈ Xi. Let (xi ⇒ x̂i) indicate the intervention that sets xi to x̂i; the “do-nothing” intervention is denoted ∅. The set of all possible interventions is labelled I, with x̆ indicating a generic intervention.
Making an intervention changes the environmental parameters. That is, given θ, the intervention (xi ⇒ x̂i) has the effect of adjusting θi such that, for all p̂i, θi(x̂′i|p̂i) = 1 when x̂′i = x̂i and θi(x̂′i|p̂i) = 0 otherwise. In other words, (xi ⇒ x̂i) implies xi becomes a constant, x̂i. An arbitrary intervention x̆ generates a new empirical distribution, denoted µθ|x̆.
We wish to construct a graph that exhibits the same relationship to µθ|x̆ as G exhibits to µθ. Making an intervention renders the manipulated variable independent of all other variables. Thus, the graph, denoted Gx̆, is constructed by taking G and simply removing all edges involving xi.8 Let π̄θ|x̆ denote the expected payoff implied by x̆ under (G, θ).
In the example of §2, if the underlying system is the one described by H1, then management would like to implement the intervention (xE ⇒ b), resulting in π̄θ|(xE⇒b) = 1. If H2 describes the true system, the optimal intervention is (xM ⇒ b). Under H1, G is the graph xE → xM. In this case, the graphs G(xE⇒b) and G(xM⇒b) are identical: two disconnected nodes, xE and xM. The same is true under H2.
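Computationally, an intervention amounts to a "truncated" factorization: pin the intervened variable to a point mass and cut the edges into it, leaving the rest of θ alone. A sketch under the example's H1 parameters (the encoding and names are mine):

```python
from itertools import product
import copy

payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}
order = ("E", "M")

# H1 system: x_E -> x_M
parents_H1 = {"E": (), "M": ("E",)}
theta_H1 = {
    "E": {(): {"b": 0.8, "c": 0.2}},
    "M": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}},
}

def do(parents, theta, var, value):
    """Intervention (var => value): remove the edges into var and make it constant."""
    parents2 = dict(parents)
    parents2[var] = ()
    theta2 = copy.deepcopy(theta)
    theta2[var] = {(): {v: 1.0 if v == value else 0.0 for v in "bc"}}
    return parents2, theta2

def expected_payoff(parents, theta):
    """Sum of payoff times probability under the (possibly truncated) factorization."""
    total = 0.0
    for o in product("bc", repeat=2):
        prob = 1.0
        for i, v in enumerate(order):
            pa = tuple(o[order.index(q)] for q in parents[v])
            prob *= theta[v][pa][o[i]]
        total += prob * payoff[o]
    return total

print(expected_payoff(*do(parents_H1, theta_H1, "E", "b")))  # 1.0: pinning the true leader
print(expected_payoff(*do(parents_H1, theta_H1, "M", "b")))  # 0.4: pinning the follower
```

Note that the post-intervention payoff depends on which variable is the cause, not just on θ, which is the sense in which knowledge of θ alone cannot predict an intervention's consequences.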
7The generalization that allows an agent to set an arbitrary number of variables is straightforward.
8If interventions are noisy (that is, instead of making xi a constant, an intervention results in a new distribution θ′i), then Gx̆ is constructed by removing only the edges between xi and its parents. All of the following results hold in this case.
4.2 Causal beliefs
Assume the agent does not know the true causal system. When V is understood from the context, let C be the set of all causal systems such that (G, θ) ∈ C only if G is a causal structure on V. Note that while the number of causal structures on V is finite (because V is finite), C is uncountably infinite. To represent the agent’s assessment regarding which causal structure is the true one, let β = (β1, ..., βl) be a multinomial probability distribution on a finite subset Cβ ⊂ C, where βi indicates the agent’s belief that (G, θ)i ∈ Cβ is the true system.9 In the example, Cβ = {(xE → xM, θ1), (xM → xE, θ2)} where θ1 = (θE(b) = .8, θM(c|c) = 1, θM(c|b) = 0) and θ2 = (θM(b) = .8, θE(b|b) = 1, θE(b|c) = 0).
4.3 Subjective expected payoff
Given the set of environmental variables V and beliefs β, the expected payoff from allowing the system to operate without intervention is

π̄β|∅ ≡ ∑_{(G,θ)i ∈ Cβ} βi ∑_{x̂ ∈ X} µθ(x̂) π(x̂).

Similarly, intervention x̆ under beliefs β implies the expected payoff

π̄β|x̆ ≡ ∑_{(G,θ)i ∈ Cβ} βi ∑_{x̂ ∈ X} µθ|x̆(x̂) π(x̂).

Thus, the agent’s subjective best reply set to beliefs β is defined as

BRβ ≡ {x̆ ∈ I | ∀x̆′ ∈ I, π̄β|x̆ ≥ π̄β|x̆′}.
In the example, (xE ⇒ b) implies π̄β|(xE⇒b) = .5(1) + .5(.4) = .7. Given these beliefs, BRβ = {∅}; that is, the unique best response is to do nothing.
9Note that, while beliefs could be defined as a probability measure on some measure space (C, C) with
a suitably defined σ-algebra C, the set of causal systems upon which the agent places positive weight in
equilibrium is finite. Hence, for the purposes of this paper, there is no gain in encumbering the analysis with
more complex belief structures.
4.4 Causal equilibrium
The formal definition of CE imposes a consistency condition on the consequences of an
agent’s actions with respect to his initial beliefs. The idea is that a subjectively optimal
intervention is stable when the frequency of observed outcomes is consistent with the agent’s
prior expectations.
Definition 1 Given a causal system (G, θ), a causal equilibrium is a pair (β, x̆) such that
1. (Subjective optimization) x̆ ∈ BRβ;
2. (Empirical consistency) ∀(G′, θ′) ∈ Cβ, µθ′|x̆ = µθ|x̆.
The subjective optimization condition is self-explanatory. The empirical consistency
requirement is that, in equilibrium, outcomes occur with the expected probability. Equiv-
alently, the agent’s assessment of the probabilities with which outcomes are generated is
correct given his intervention decision. Notice that this degree of consistency does not imply
correct causal beliefs; that is, there may be x̆′ ∈ I for which π̄θ|x̆′ > π̄θ|x̆. Hence, this notion
of equilibrium is weaker than Nash (which would require an accurate assessment of the true
causal system). If there is no causal ambiguity, then CE is equivalent to Nash.
Although the actual results generated in a CE are consistent with the results expected
by the agent, the potential for trouble arises from the fact that the agent’s counterfactual
predictions — that is, the predictions of what would have happened had the agent taken some
other course of action — are not observed. Therefore, an agent in a CE may experience
persistently suboptimal performance because he never observes the superior consequences
that would have obtained had the truly optimal decision been implemented.
In the example, the decision to make no intervention combined with 50/50 beliefs on H1 and H2 is a CE. It has already been shown that ∅ ∈ BRβ, thereby meeting requirement 1. Given this (non-)intervention, management expects the distribution shown in the historical data table of §2 and, indeed, this is the distribution generated by the no-intervention decision. Thus, condition 2 is also met.
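Both conditions of Definition 1 can be verified mechanically for the example. A sketch, assuming my own encodings of H1, H2, and the intervention set:

```python
from itertools import product

payoff = {("b", "b"): 1.0, ("c", "c"): 0.0, ("b", "c"): -2.0, ("c", "b"): -2.0}

# Each hypothesis is a (parents, theta) pair; only the direction of causation differs.
H1 = ({"E": (), "M": ("E",)},
      {"E": {(): {"b": 0.8, "c": 0.2}},
       "M": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}}})
H2 = ({"M": (), "E": ("M",)},
      {"M": {(): {"b": 0.8, "c": 0.2}},
       "E": {("b",): {"b": 1.0, "c": 0.0}, ("c",): {"b": 0.0, "c": 1.0}}})

def mu(system, outcome, intervention=None):
    """mu_{theta|intervention}(outcome); outcome is a dict {var: value}."""
    parents, theta = system
    prob = 1.0
    for v in theta:
        if intervention and intervention[0] == v:
            prob *= 1.0 if outcome[v] == intervention[1] else 0.0
        else:
            pa = tuple(outcome[q] for q in parents[v])
            prob *= theta[v][pa][outcome[v]]
    return prob

outcomes = [{"E": e, "M": m} for e, m in product("bc", repeat=2)]

def subjective_payoff(intervention):
    """Expected payoff under 50/50 beliefs on H1 and H2."""
    return sum(0.5 * mu(h, o, intervention) * payoff[(o["E"], o["M"])]
               for h in (H1, H2) for o in outcomes)

interventions = [None, ("E", "b"), ("E", "c"), ("M", "b"), ("M", "c")]
# Condition 1: doing nothing is the subjective best reply under 50/50 beliefs.
assert all(subjective_payoff(None) >= subjective_payoff(i) for i in interventions)
# Condition 2: every system in C_beta generates the same empirical distribution
# under the chosen (non-)intervention, so beliefs are never contradicted.
assert all(abs(mu(H1, o) - mu(H2, o)) < 1e-12 for o in outcomes)
```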
CE presents a nicely balanced form of “bounded rationality.” Agents optimize given their
beliefs (and are rational in the sense of being subjective profit maximizers) and are allowed
to believe anything they want provided their beliefs are not contradicted by observable data
(and, so, are also rational in the sense of not clinging to beliefs that are demonstrably wrong).
However, decisions may well be suboptimal (due to erroneous counterfactual beliefs). At the
same time, bad behavior is not dependent upon ad hoc assumptions invoking psychological
bias, hubris, non-profit objectives and the like.
4.5 A measure of causal ambiguity
As we have seen, a causal structure imposes certain constraints on the empirical distributions
it generates. Hence, given an empirical distribution, certain structures can generally be ruled
out. The important message of this section is that to say agents face causal ambiguity is
not to say “anything goes.” Rather, the degree of causal ambiguity permitted in equilibrium
fundamentally depends upon the true causal structure underlying the system.
In the example of §2, the two causal systems under consideration are empirically indistinguishable; that is, no matter how long management observes the outcomes of unmanaged departmental interactions, it is always impossible to distinguish H1 from H2. This is a general consequence of the causal structure of these two hypotheses, as opposed to a special outcome effected by careful choice of θ. That is, for any set of parameters under H1, there exists a set of parameters under H2 that results in the same empirical distribution, and vice versa.
To see how this works, suppose µ is an arbitrary probability distribution on the outcomes
in the example. Then, simply by the definition of conditional probability, µ can be factored
as
∀x̂M ∈ XM, ∀x̂E ∈ XE: µ(x̂M, x̂E) = µ(x̂M|x̂E) µ(x̂E) (2)
where µ (x̂M |x̂E) and µ (x̂E) are the µ-conditional probability of x̂M given x̂E and the µ-
marginal probability of x̂E, respectively. Of course, these numbers happen to correspond
directly to θ parameters that generate µ under H1. Note, however, that µ can also be
factored as (again, simply by the definition of conditional probability)
µ (x̂M , x̂E) = µ (x̂E|x̂M)µ (x̂M) . (3)
This is significant because the numbers on the right hand side of this equation correspond
to θ parameters consistent with H2. Thus, for any set of parameters θ that deliver µθ under
H1, there is a corresponding set of parameters θ′ that deliver the same empirical distribution
on outcomes under H2. An outsider observing only the undisturbed empirical distribution is
never able to infer which hypothesis describes the true causal system.
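The two factorizations can be verified numerically. In the sketch below, the joint distribution µ is an illustrative assumption (any strictly positive µ works); the point is that the H1 parameters from equation (2) and the H2 parameters from equation (3) regenerate the same empirical distribution:

```python
# Any joint distribution mu over (xM, xE) factors as mu(xM|xE)mu(xE),
# matching H1 parameters, and as mu(xE|xM)mu(xM), matching H2 parameters.
# The numbers below are illustrative assumptions, not values from the paper.
mu = {("b", "b"): 0.5, ("b", "c"): 0.1,   # keys are (xM, xE)
      ("c", "b"): 0.3, ("c", "c"): 0.1}

# Marginals of xE and xM.
p_E = {e: sum(p for (m2, e2), p in mu.items() if e2 == e) for e in "bc"}
p_M = {m: sum(p for (m2, e2), p in mu.items() if m2 == m) for m in "bc"}

# H1 parameters: mu(xE) and mu(xM|xE); H2 parameters: mu(xM) and mu(xE|xM).
for (m, e), p in mu.items():
    h1 = (mu[(m, e)] / p_E[e]) * p_E[e]   # equation (2)
    h2 = (mu[(m, e)] / p_M[m]) * p_M[m]   # equation (3)
    assert abs(h1 - p) < 1e-12 and abs(h2 - p) < 1e-12

print(p_E, p_M)
```

Whichever hypothesis is true, an observer of the undisturbed system sees draws from the same µ.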
The hypothesis, say H3, that the two departments operate independently is empirically
distinguishable from H1 and H2 since, if H3 were true, then for all θ,

µθ (x̂M , x̂E) = µθ (x̂E)µθ (x̂M) ,

which does not generally hold in (2) or (3). Formally, two causal structures G and G′ on a set
of environmental variables V are said to be empirically indistinguishable, denoted G ≈ G′,
if, for all causal systems (G, θ) there exists a causal system (G′, θ′) such that µθ = µθ′ . Let
GG denote the equivalence class of causal structures that are empirically indistinguishable
from G. It is important to note that the composition of GG depends only upon G itself and
not, for example, upon any particular θ.
Definition 2 The degree of causal ambiguity of a causal structure G is |GG|, the set
cardinality of GG. G is said to be causally ambiguous if |GG| > 1.
Under this definition, a causal structure is ambiguous when it is never completely identi-
fied by the data it generates. When two or more causal structures imply identical conditional
independencies in every empirical distribution, no parameter values for one ever allow an
outsider to distinguish it from the other (simply by observing the behavior of the system).
Naturally, given a particular causal structure, we would like to know which additional
structures, if any, are contained in GG. For the following proposition, given a causal structure
G, let E be the set of edges without reference to direction; i.e., {xi, xj} ∈ E if and only
if (xi → xj) or (xj → xi) in G. Let S be the set of all ordered triples (xi, xj, xk) such that
(xi, xj, xk) ∈ S if and only if (xi → xj) , (xk → xj) and {xi, xk} /∈ E. That is, S is the
collection of triples that form v-structures like
xi xk
↘ ↙
xj
where it is important to note that xi and xk are not adjacent in the graph.
Proposition 1 Two causal structures G and G′ on a set of environmental variables V are
empirically indistinguishable if and only if E = E′ and S = S′.
Proof. See Appendix A.
This is a rather remarkable proposition providing, as it does, a test for causal ambiguity
based upon simple visual inspection. Returning to the example, the two hypotheses H1
and H2 have causal structures G1 : xE → xM and G2 : xM → xE, respectively. First, we
have E1 = {{xE, xM}} and E2 = {{xE, xM}}, so E1 = E2. Neither G1 nor G2 has any
v-structures, so S1 = S2 = ∅. Thus, G1 ∈ GG2, implying that both G1 and G2 are causally
ambiguous. The hypothesis that the departments do not interact, H3, is represented by a
disconnected graph, G3, with nodes xE and xM. Since E3 = ∅, any empirically indistinguishable
structure must also be edgeless, so G3 is not ambiguous.
Consider a slightly more complicated example. Suppose a firm comprises five departments:
Engineering, Design, Manufacturing, Marketing and Sales. Assume the structure
of departmental interaction is as follows: Engineering and Design independently feed ideas
to Manufacturing, which implements new products on the basis of their input. The new product
specs are then delivered by Manufacturing to Sales and Marketing, which choose sales and
marketing programs, respectively, designed to maximize profit. The causal structure is
xE xD
↘ ↙
xMan
↙ ↘
xS xMar
(4)
In this case, the causal structure is not ambiguous. To see this, note first that any viable
candidate for empirical indistinguishability must have the same edges. The only dimension of
flexibility is the direction of the arrows. However, switching the direction of any arrow either
breaks a v-structure (e.g., reversing xE → xMan) or creates a new one (e.g., reversing xMan →
xS). Therefore, there are no other causal structures in the empirical indistinguishability
equivalence class.
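Proposition 1's test is mechanical enough to automate. The following sketch (an illustration of the proposition, not code from the paper) compares skeletons and v-structures for the structures discussed above:

```python
# Proposition 1 as code: two DAGs are empirically indistinguishable iff they
# have the same undirected edges E and the same v-structures S.
def edges(dag):
    """The skeleton: edge set with direction ignored."""
    return {frozenset(e) for e in dag}

def v_structures(dag):
    """Triples (xi, xj, xk) with xi -> xj <- xk and xi, xk non-adjacent."""
    skel = edges(dag)
    parents = {}
    for tail, head in dag:
        parents.setdefault(head, set()).add(tail)
    return {(xi, xj, xk)
            for xj, pa in parents.items()
            for xi in pa for xk in pa
            if xi < xk and frozenset((xi, xk)) not in skel}

def indistinguishable(g1, g2):
    return edges(g1) == edges(g2) and v_structures(g1) == v_structures(g2)

G1 = {("xE", "xM")}   # H1: xE -> xM
G2 = {("xM", "xE")}   # H2: xM -> xE
G3 = set()            # H3: the departments do not interact
print(indistinguishable(G1, G2))   # True
print(indistinguishable(G1, G3))   # False

# The five-department structure in (4): reversing xE -> xMan breaks its
# v-structure, so the result is distinguishable from the original.
G4 = {("xE", "xMan"), ("xD", "xMan"), ("xMan", "xS"), ("xMan", "xMar")}
G4r = (G4 - {("xE", "xMan")}) | {("xMan", "xE")}
print(indistinguishable(G4, G4r))  # False
```

Reversing xMan → xS instead would create the new v-structures xE → xMan ← xS and xD → xMan ← xS, again breaking indistinguishability.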
Proposition 2 Suppose (β, x⃗) is a CE under the true causal system (G, θ). Then, G′ ∈ Cβ
only if G′x⃗ ∈ GGx⃗ (a.s.).

Proof. Given (G, θ) and x⃗, let θ|x⃗ indicate the list of parameters θ adjusted for x⃗. Then,
as discussed above, (Gx⃗, θ|x⃗) is almost surely faithful. By Spirtes et al. (2000, Theorem 4.3),
a graph G′x⃗ is faithful to µθ|x⃗ if and only if it is empirically indistinguishable from Gx⃗. The
result then follows from condition (2) of CE.
Corollary 1 Suppose (β, x⃗) is a CE under the true causal system (G, θ). Then, the maximal
set of causal systems that receive positive weight under β is finite and can be generated by
exhaustive application of Proposition 1 on Gx⃗.
These propositions demonstrate that agents may indeed “get it wrong” in equilibrium
and choose suboptimal interventions as a result of causal ambiguity alone. On the other hand,
regardless of the situation, the extent of this type of ambiguity has a limit that depends upon
the true underlying structure. Also, note well that the set of systems receiving positive
weight under x⃗ are those empirically indistinguishable from Gx⃗, not G; G′x⃗ ∈ GGx⃗ does not,
in general, imply G′ ∈ GG.
For example, suppose the true causal structure is the one shown in (4). Then, the systems
that are empirically indistinguishable given an intervention on xD are as follows (with xD
isolated in each):

(1) xE → xMan, xMan → xS, xMan → xMar;
(2) xMan → xE, xMan → xS, xMan → xMar;
(3) xS → xMan, xMan → xE, xMan → xMar;
(4) xMar → xMan, xMan → xE, xMan → xS.

There are twelve systems that become empirically indistinguishable under an intervention
on xD. They are the four detailed above plus the additional eight generated by adding a
directed link between xMan and xD in one direction or the other.
Up to now, no mention has been made of temporal knowledge (i.e., regarding the timing
of one variable vis-à-vis another). The previous results obtain when an agent has cross-
sectional data on the behavior of the system. If the agent has additional knowledge about
the timing of events, these results can be strengthened considerably. For example, Pearl
(1988) shows that if the variables in V are completely ordered by the timing of their relative
occurrences, then the underlying causal structure is uniquely identified by µθ.
Finally, note well that the preceding results are related to condition (2) in the definition of
CE. However, consideration of subjective optimization condition (1) may refine equilibrium
even further. For example, the extreme case is one in which the agent has a dominant
intervention regardless of the causal structure; i.e., the minimum payoff guaranteed under
(xi ⇒ x̂i) is greater than the maximum attainable under any other intervention (regardless
of θ). In such a case, the agent will implement the dominant intervention regardless of what
she thinks about the underlying causal structure.
As mentioned earlier, the only paper dealing explicitly with causal ambiguity and its
relationship to sustained performance advantage is Reed and DeFillippi (1990). The authors
argue, among other things, that causal ambiguity is exponentially increasing in complexity.
They say (p. 93), “Three separate skills can have up to 4 interactions, 4 skills can have 10
interactions, 5 can have 19, and so forth. Even if all potential interactions do not occur,
the ambiguity that is derived from the complexity of interaction is still likely to increase at a
greater than arithmetic rate.” Because the analysis is informal, it is hard to say how, exactly,
their model relates to this one. From the previous statement, however, what they seem to
have in mind as a measure of complexity is the number of nodes in the causal structure G.
The preceding results demonstrate the danger of this type of intuitive reasoning. Ac-
cording to the theory presented here, causal ambiguity is monotonic neither in the number
of nodes nor in the number of linkages between nodes. For example, the structure x1 → x2 is
indistinguishable from x1 ← x2, but the structure x1 → x2 ← x3 is unambiguous. The catch
in the previous logic is that sometimes additional structure (and, by implication, complexity)
actually helps one discern the true underlying causal architecture.
5 Causal ambiguity and industry performance
I now apply the theory developed above to a competitive situation involving multiple firms.
In order to keep things concrete, the extension is done in the context of a specific application
along the lines of the example in §2.
5.1 Setup
Assume there are n firms involved in Cournot-style quantity competition indexed by N ≡
{1, ..., n}. In order to isolate the effects of causal ambiguity, assume that the firms control
homogeneous technologies. Specifically, each firm is composed of k departments. The joint
activities of the departments determine the firms’ marginal costs. Extending the notation
developed in the previous section to allow for multiple agents, let xij indicate the activity
of department j in firm i, which takes values in Xj (these are the same for all firms, so no
additional subscript is necessary). The set of joint departmental outcomes is X. An actual
outcome for firm i is denoted x̂i = (x̂i1, ..., x̂ik) ∈ X and, now, x̂ = (x̂1, ..., x̂n) denotes a
profile of outcomes for the entire industry.
Firm i’s marginal cost (constant with respect to quantity) is a function of what its
departments do: c (x̂i) indicates the marginal cost of firm i when the joint activities of its
departments is x̂i. Note that the function c is identical for all firms. The generation of costs
is governed by a true causal system (G, θ) which, since technologies are homogeneous, is also
identical for all firms.
With multiple agents, let x⃗ ≡ (x⃗1, ..., x⃗n) with x⃗i = (xij ⇒ x̂ij) indicating an intervention
by firm i. Also, since there are multiple firms, let β = (β1, ..., βn) indicate a profile of beliefs
with βi = (βi1, ..., βim) the vector of weights placed by agent i on the set of m causal systems
she thinks may be true.
The industry is modelled in two stages. In the first stage, firms simultaneously choose
an intervention, after which a vector of actual costs, c(x̂) = (c(x̂1), ..., c(x̂n)), is determined
according to the distributions (µθ|x⃗1, ..., µθ|x⃗n). In the second stage, firms observe c(x̂) and
then compete in standard Cournot style. Assume inverse demand is given by λ(q) ≡ α − ∑i∈N qi,
where α is a demand parameter (the usual restrictions apply).

When x̂ is the outcome in the first stage, q(x̂) ≡ (q1(x̂), ..., qn(x̂)) denotes the Cournot
Nash equilibrium quantities chosen in the second stage (given costs c(x̂)). The actual profit
of firm i is πi(x̂) ≡ (λ(q(x̂)) − c(x̂i)) qi(x̂). Thus, given a profile of interventions x⃗ in the
first stage and assuming Cournot Nash equilibrium in the second, it is possible to calculate
the expected profit of firm i as

π̄i(x⃗) ≡ ∑x̂∈Xn πi(x̂) µθ|x⃗1(x̂1) · · · µθ|x⃗n(x̂n).
Notice that, although the intervention of one firm affects the expected profit of another,
the dominant strategy for all firms in the first stage is to pick the intervention that delivers
the lowest expected cost (i.e., regardless of what the other agents do). Intervention x⃗i under
beliefs βi implies an expected cost of

c̄βi|x⃗i ≡ ∑(G,θ)∈Cβi βi(G, θ) ∑x̂i∈X µθ|x⃗i(x̂i) c(x̂i).

Hence, firm i’s subjective best reply set to beliefs βi is defined as

BRβi ≡ {x⃗i ∈ I | ∀x⃗′i ∈ I, c̄βi|x⃗i ≤ c̄βi|x⃗′i}.
The application of causal equilibrium to this setup is now straightforward: firms choose a
CE in their own first-stage decision problem and Cournot quantities in the second stage.
When there is no causal ambiguity (i.e., agents place full weight on the true causal system),
Cournot Nash equilibrium obtains.
5.2 Causal ambiguity and performance heterogeneity
This section begins with a simple example that demonstrates how causal ambiguity may,
indeed, lead to performance heterogeneity and concludes with a more general set of propo-
sitions. Suppose there are only two firms. Both face the technology described in §2; i.e.,
there are two departments, E and M, each of which has outcomes of b or c. Assume the
true causal system is xE → xM . Now, instead of profits, assume that the joint output of the
departments implies the following marginal costs:
E M $cost
b b 1
b c 4
c b 4
c c 2
The actual expected costs under each of the available interventions are:

Intervention   c̄
∅              1.2
(xE ⇒ c)       2.0
(xE ⇒ b)       1.0
(xM ⇒ c)       3.6
(xM ⇒ b)       1.6
Thus, if both firms know the true causal structure, they both choose the optimal intervention,
(xE ⇒ b) , resulting in identical average costs of 1.0.
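These expected costs can be reproduced directly. The θ parameters are not restated here, so the sketch below assumes P(xE = b) = 0.8 with xM deterministically copying xE, values backed out so that the no-intervention cost equals 1.2:

```python
# Expected cost under each intervention for the true system xE -> xM.
# Assumed parameters (backed out from the table above): P(xE=b)=0.8 and
# xM copies xE with probability one.
cost = {("b", "b"): 1, ("b", "c"): 4, ("c", "b"): 4, ("c", "c"): 2}  # (E, M)
p_E = {"b": 0.8, "c": 0.2}

def expected_cost(itv=None):
    """itv is None, ("xE", value) or ("xM", value)."""
    total = 0.0
    for e, pe in p_E.items():
        if itv and itv[0] == "xE":
            if e != itv[1]:
                continue          # intervening fixes xE at itv[1]
            pe = 1.0
        m = e                     # xM copies xE under the true system...
        if itv and itv[0] == "xM":
            m = itv[1]            # ...unless xM itself is set directly
        total += pe * cost[(e, m)]
    return total

for label, itv in [("none", None), ("xE => c", ("xE", "c")),
                   ("xE => b", ("xE", "b")), ("xM => c", ("xM", "c")),
                   ("xM => b", ("xM", "b"))]:
    print(label, round(expected_cost(itv), 2))   # matches the table
```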
Now, suppose firm 1 places equal weight on xE → xM and xE ← xM , respectively, while
firm 2 is certain that xE → xM is the true structure. Under these beliefs, each firm’s expected
cost under each intervention is
Intervention Firm 1 Firm 2
∅ 1.2 1.2
(xE ⇒ c) 2.8 2.0
(xE ⇒ b) 1.3 1.0
(xM ⇒ c) 2.8 3.6
(xM ⇒ b) 1.3 1.6
The optimal interventions are for firm 1 to do nothing and for firm 2 to set xE to b. Note
that this is an equilibrium: under the do-nothing intervention, firm 1 observes its expected
distribution over outcomes and obtains the anticipated average cost. The same is true for
firm 2.
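Firm 1's column follows the same logic once the H2 parameters are pinned down. The sketch below assumes that, under either hypothesis, the cause takes value b with probability 0.8 and the effect copies it; these parameters are assumptions consistent with the tables above, chosen so that both hypotheses fit the same undisturbed distribution:

```python
# Belief-weighted expected costs: firm 1 splits weight evenly between
# H1 (xE -> xM) and H2 (xM -> xE); firm 2 is certain of H1. Assumed
# parameters: P(cause = b) = 0.8 and the effect copies the cause.
cost = {("b", "b"): 1, ("b", "c"): 4, ("c", "b"): 4, ("c", "c"): 2}  # (E, M)

def exp_cost(E_causes_M, itv=None):
    total = 0.0
    for root, p in (("b", 0.8), ("c", 0.2)):
        cause = effect = root                 # effect copies the cause
        if itv:
            var, val = itv
            cause_name = "xE" if E_causes_M else "xM"
            if var == cause_name:             # intervening on the cause
                if root != val:
                    continue
                cause = effect = val
                p = 1.0
            else:                             # intervening on the effect
                effect = val                  # cuts the causal link
        e, m = (cause, effect) if E_causes_M else (effect, cause)
        total += p * cost[(e, m)]
    return total

for label, itv in [("none", None), ("xE => c", ("xE", "c")),
                   ("xE => b", ("xE", "b")), ("xM => c", ("xM", "c")),
                   ("xM => b", ("xM", "b"))]:
    firm1 = 0.5 * exp_cost(True, itv) + 0.5 * exp_cost(False, itv)
    firm2 = exp_cost(True, itv)
    print(label, round(firm1, 2), round(firm2, 2))   # matches the table
```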
Noting that the expected Nash equilibrium quantity for firm i is given by

q̄∗i = (α − 2c̄i + c̄j)/3,

firm 1 has expected output of q̄∗1 = 0.87 and firm 2 has q̄∗2 = 1.07; the corresponding market
shares are s1 = 44.8% and s2 = 55.2%. Expected profits are π̄∗1 = 1.04 and π̄∗2 = 1.07,
respectively (a 2% advantage for firm 2). Thus, in this circumstance, causal ambiguity
results in performance heterogeneity within the industry — even though the firms’ actual
technologies are identical.
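These figures can be checked against the quantity formula. The demand intercept α is not stated in this section, so α = 4 below is an assumption chosen to reproduce q̄∗1 = 0.87:

```python
# Expected Cournot quantities and market shares for the two-firm example.
# The demand intercept alpha = 4 is an assumption, not a value from the paper.
alpha = 4.0
c1, c2 = 1.2, 1.0                  # expected costs under no intervention / (xE => b)
q1 = (alpha - 2 * c1 + c2) / 3     # expected quantity (alpha - 2ci + cj)/3
q2 = (alpha - 2 * c2 + c1) / 3
s1, s2 = q1 / (q1 + q2), q2 / (q1 + q2)
print(round(q1, 2), round(q2, 2))  # 0.87 1.07
print(round(s1, 3), round(s2, 3))  # 0.448 0.552
```

The profit figures additionally depend on the distribution of realized costs, so they are not recomputed here.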
The full characterization of equilibria in this example is as follows. Let βi be the weight
placed by firm i on xE → xM (the true hypothesis). Then, in equilibrium, firm i either
chooses: 1) the objectively optimal intervention (xE ⇒ b) with βi = 1, or 2) the no-
intervention option (∅) with belief βi ∈ [.33, .67] (i.e., any belief in this range supports
the no-intervention decision). Thus, as we already know from Proposition 2, the amount of
performance heterogeneity observed in equilibrium is limited by the true causal structure.
For example, there are no equilibrium beliefs that support the intervention (xM ⇒ b).
While causal ambiguity is necessary for performance heterogeneity, it is not sufficient.
The range of possible causal equilibria depends upon beliefs, causal structure and payoffs.
For example, even with causal ambiguity, if firms have sufficiently similar beliefs, they choose
interventions that result in identical average costs (e.g., this would have been the case if firm
2 above had β2 ∈ [.33, .67]). Alternatively, with costs of
E M $cost
b b 1
b c 1
c b 4
c c 4
the unique equilibrium is for firms to place full weight on xE → xM and to choose (xE ⇒ b).10
To generalize this result, recall that the Herfindahl index of market concentration is
defined as H ≡ ∑i∈N s2i, where si is the market share of firm i. H reflects both the number
of firms and their relative sizes. It can be shown that H = nσ2 + 1/n, where σ2 is the variance
of firms’ market shares. Hence, when n is fixed, H is increasing in the inequality of these
shares. Let σ2x⃗ denote the variance of firms’ expected market shares and Hx⃗ ≡ nσ2x⃗ + 1/n the
expected Herfindahl index under x⃗. If all firms have equal expected market
shares, then σ2x⃗ = 0 and Hx⃗ = 1/n. Recall that, in Cournot competition, market shares are
positively correlated with relative profitability.
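The identity H = nσ2 + 1/n follows from ∑i (si − 1/n)2 = ∑i s2i − 1/n, using the fact that shares sum to one. A quick numerical check, with an assumed share vector:

```python
# Verify H = n*var + 1/n for an arbitrary share vector summing to one.
shares = [0.5, 0.3, 0.2]                # assumed for illustration
n = len(shares)
H = sum(s * s for s in shares)          # Herfindahl index
mean = sum(shares) / n                  # equals 1/n since shares sum to one
var = sum((s - mean) ** 2 for s in shares) / n
assert abs(H - (n * var + 1 / n)) < 1e-12
print(H)
```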
We also require a formal measure of the dispersion of causal beliefs within the industry.
Given industry beliefs β, let m be the number of causal systems receiving positive weight from
at least one agent (i.e., m is the set cardinality of the union of the Cβi’s). Agent i’s beliefs
are then represented by a vector βi = (β1i, ..., βmi). The variance of beliefs in the industry is
given by σ2β ≡ (1/n)∑i∈N ‖βi − β̄‖2, where ‖βi − β̄‖ is the Euclidean distance between βi and
the mean belief β̄.11
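For the two-firm example above, with firm 1 splitting weight evenly between the two systems and firm 2 certain of the true one, the measure works out as follows (m = 2, belief vectors taken from that example):

```python
# sigma^2_beta = (1/n) * sum_i ||beta_i - beta_bar||^2 for the earlier
# two-firm example: beta_1 = (.5, .5), beta_2 = (1, 0) over m = 2 systems.
betas = [(0.5, 0.5), (1.0, 0.0)]
n, m = len(betas), len(betas[0])
beta_bar = [sum(b[l] for b in betas) / n for l in range(m)]   # mean belief
var_beta = sum((b[l] - beta_bar[l]) ** 2
               for b in betas for l in range(m)) / n
print(var_beta)   # 0.125
```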
Proposition 3 If (β, x⃗) is a CE with Hx⃗ > 1/n, then σ2β > 0.
10To put it differently, the only optimal intervention in this case, regardless of one’s beliefs, is (xE ⇒ b).
However, the only causal system consistent with the empirical distribution implied by (xE ⇒ b) is xE → xM
and, hence, agents choosing this intervention must, in equilibrium, know the true nature of the causal
structure governing their environment.
11The mean belief is β̄ = (β̄1, ..., β̄m), where β̄l ≡ (1/n)∑i∈N βli. The Euclidean distance is
‖βi − β̄‖ = √((β1i − β̄1)2 + · · · + (βmi − β̄m)2). Hence, σ2β = (1/n)∑i∈N ∑ml=1 (βli − β̄l)2.
Proof. First, note that, for all i ∈ N and all x⃗i, x⃗′i ∈ BRβi, c̄βi|x⃗i = c̄βi|x⃗′i. Now, suppose
Hx⃗ > 1/n and σ2β = 0. Since σ2β = 0, we have βi = βj for all i, j ∈ N. This implies BRβi = BRβj
for all i, j ∈ N. Thus, since (β, x⃗) is a CE, c̄βi|x⃗i = c̄βj|x⃗j for all i, j ∈ N. But this implies
equal expected market shares and, as a result, σ2x⃗ = 0. Hence, Hx⃗ = 1/n, a contradiction.
Proposition 4 If (β, x⃗) is a CE with σ2β > 0, then Hx⃗ ≥ 1/n.
Proof. The preceding example demonstrates that there exist CE in which σ2β > 0 and
Hx⃗ > 1/n, as well as CE in which σ2β > 0 and Hx⃗ = 1/n (e.g., firms 1 and 2 have beliefs
β1 = .33 and β2 = .67, respectively, and both choose x⃗i = ∅).
Proposition 5 If (β, x⃗) is a CE with Hx⃗ > 1/n, then there exists i ∈ N such that Gx⃗i is
causally ambiguous.

Proof. Assume the premise with |GGx⃗i| = 1 for all i ∈ N. Then, for all i ∈ N, βi(G, θ) =
1, implying σ2β = 0. Hence, Hx⃗ = 1/n, a contradiction.
These propositions, taken together, make several important points about the relation-
ship between causal ambiguity and firm performance heterogeneity. Proposition 3 says that
heterogeneous causal beliefs are necessary for performance heterogeneity. It is not enough
for firms to be uncertain about the causal mechanics of their environment — their theories
about these mechanics must be sufficiently heterogeneous that the firms implement different
interventions. However, Proposition 4 says that heterogeneous beliefs are not, in and of
themselves, sufficient to induce performance heterogeneity. In any situation, there may be a
large number of causal systems all of which imply the optimality of the same intervention.
Proposition 5 makes it clear that performance heterogeneity is not possible without causal
ambiguity.
This last point is, obviously, the important one. On the one hand, the previous analysis
presents a formal validation of the oft-repeated claim that causal ambiguity can be a key
determinant in the performance heterogeneity of an industry (even, as shown here, when
firm technologies are identical). On the other hand, as we know from Proposition 2 and its
corollary, not only are some causal systems more ambiguous than others, but the range of
possibilities can be precisely established using relatively simple graphic analysis.
6 Caveats, extensions, etc.
The preceding analysis establishes the possibility of equilibria in which causal ambiguity is
a determining factor of firm performance heterogeneity. In some ways, the underlying
assumptions serve to strengthen the implications. In particular, the finding that performance
heterogeneity may obtain even when all firms have identical technologies is striking. It
suggests that the phenomenon may be even more prevalent in the real world, where firm
technologies and the constraints on managerial intervention are varied.
On the other hand, it seems highly unlikely that, upon observing the performance of direct
competitors known to have similar technologies, high-cost firms would be able to maintain
the fiction that they are objectively optimizing. Even if costs were private information, firms
might observe each other’s market shares and, thereby, gain a better sense of the accuracy
of their beliefs. Public information on the performance of one’s competitors may work against
the stability of heterogeneous causal beliefs; i.e., as long as someone else is outperforming
me, I know I have not discovered the optimal intervention. Of course, experimentation
is expensive. Thus, it may be that diverse initial priors under causal ambiguity lead to
intervention paths that eventually stabilize with some firms out-performing others due to
these inherent experimentation costs.
This suggests that a useful extension is to make the model dynamic. Such an extension
would involve a greater degree of mathematical complexity, but the resulting insights might
well warrant it. It would be nice to know, for example, the extent to which public infor-
mation in a competitive setting causes beliefs to converge and, thereby, the extent to which
causal ambiguity-based performance advantage survives in dynamic markets. Similarly, in a
dynamic setting, causal learning could also be explored. Characterizing optimal experimen-
tation policies would be a component of such exploration. By embedding a specific learning
process in a dynamic version of the model, it may be possible to say something precise about
how and when CE are reached.
A related point is that, in this model, firm interventions are not strategic. That is, firms
always have the incentive to choose an intervention that minimizes marginal production cost
without regard to what everyone else does. However, in a situation in which firms observe
each other and then have an opportunity to react, it may behoove some firms, for example,
to knowingly choose a somewhat suboptimal intervention in order to sow confusion amongst
their competitors (i.e., make ambiguous interventions). Although the paper does not tackle
this issue, it does present a framework by which such questions can be answered.
Finally, it should be pointed out that the literature on probabilistic networks is exten-
sive and includes numerous contributions on estimating the underlying causal system from
empirical data (several good initial references are presented in Appendix A). This raises
the possibility of investigating the relationship between causal ambiguity and performance
heterogeneity via empirical methods.
A Dependency models
The following is a condensed discussion of the relevant underlying theory of probabilistic
networks adapted from Penalva and Ryall (2003). Since most strategy researchers are unfa-
miliar with this literature, the objective here is to: (i) give readers a sense of its theoretical
content, and (ii) provide sufficient technical detail to support the preceding discussion. For
those interested in pursuing these ideas further, I suggest starting with the texts by Cowell
et al. (1999), Pearl (1988, 2000) and Spirtes et al. (2000).
Definition 3 A dependency model M over a finite set of elements V is a collection of
independence statements of the form (C ⊥ D|E) in which C,D and E are disjoint subsets
of V and which is read “C is independent of D given E.” The negation of an independency
is called a dependency.
The notion of a general dependency model was originated by Pearl and Paz (1985),
who were motivated to develop a set of axiomatic conditions on general dependency models
that would include probabilistic and graphical dependencies as special cases. These axioms
are known as the graphoid axioms.12 We are interested in graphoids, which are defined as
dependency models that are closed under the graphoid axioms.
For example, given a probability space (M,M, µ) and an associated, finite set of random
variables X indexed by V = {1, ..., t} with typical element x̃r, Mµ is the list of conditional
independencies that hold under µ. For all W ⊆ V, let x̃W ≡ (x̃r)r∈W . Then, for all disjoint
C,D,E ⊂ V, (C ⊥ D|E) ∈ Mµ if and only if x̃C is µ-conditionally independent of x̃D given
x̃E. A proof that the graphoid axioms hold for conditional independence in all probability
distributions can be found in Spohn (1980).
Alternatively, if G is a graph whose vertices are V , then for all disjoint C,D,E ⊂ V,
(C ⊥ D|E) ∈ MG if and only if E is a cutset separating C from D. Of course, in this
case, the meaning of (C ⊥ D|E) depends upon how one defines “cutset.” The literature on
probabilistic networks contains several such definitions, depending upon whether the graph
is undirected, directed or some mixture of the two (i.e., a chain graph). We proceed with
Pearl’s (1986) notion of d-separation (the d stands for “directed”).
Given a DAG G and a path (ordered set of nodes) W ⊆ V, a node xr ∈ W is called
head-to-head with respect to W if xr−1 → xr and xr ← xr+1 in W. A node that starts or ends
a path is not head-to-head. A path W ⊂ V is active by E ⊂ V if: (i) every head-to-head
node is in, or has a descendant in, E; and (ii) every other node in W is outside E. Otherwise,
W is said to be blocked by E.
Definition 4 If G is a DAG and C,D and E are disjoint subsets of V, then E is said to
d-separate C from D if and only if there exists no active path by E between a node in C and
a node in D.
Examples of d-separation can be found in the Pearl references cited above. Thus, given a
DAG G we define MG such that (C ⊥ D|E) ∈ MG if and only if E d-separates C from D
in G.
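Definition 4 can be tested without enumerating paths. The sketch below (an illustration, not code from the paper) uses the standard equivalent moralization criterion: E d-separates C from D if and only if E separates C from D in the moral graph of the ancestral subgraph on C ∪ D ∪ E:

```python
# d-separation via moralization: C ⊥ D | E in a DAG iff C and D are
# disconnected after (1) restricting to ancestors of C ∪ D ∪ E,
# (2) marrying co-parents and dropping directions, and (3) deleting E.
def d_separated(dag, C, D, E):
    parents = {}
    for tail, head in dag:
        parents.setdefault(head, set()).add(tail)
    # (1) ancestral closure of C ∪ D ∪ E
    anc, frontier = set(C | D | E), list(C | D | E)
    while frontier:
        for p in parents.get(frontier.pop(), ()):
            if p not in anc:
                anc.add(p)
                frontier.append(p)
    # (2) moral graph on the ancestral set
    adj = {v: set() for v in anc}
    for tail, head in dag:
        if tail in anc and head in anc:
            adj[tail].add(head); adj[head].add(tail)
    for v in anc:
        pa = sorted(p for p in parents.get(v, ()) if p in anc)
        for i, a in enumerate(pa):
            for b in pa[i + 1:]:
                adj[a].add(b); adj[b].add(a)
    # (3) remove E, then check reachability from C to D
    seen, stack = set(C), list(C)
    while stack:
        for w in adj[stack.pop()]:
            if w not in E and w not in seen:
                seen.add(w)
                stack.append(w)
    return not (seen & D)

# The head-to-head node xj blocks the path when unobserved and
# activates it when conditioned upon.
G = {("xi", "xj"), ("xk", "xj")}
print(d_separated(G, {"xi"}, {"xk"}, set()))    # True
print(d_separated(G, {"xi"}, {"xk"}, {"xj"}))   # False
```

Conditioning on the head-to-head node xj marries its parents in the moral graph, which is exactly why the v-structure path becomes active.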
We wish to characterize the relationship between probabilistic and graphical dependency
models. This is done through the general notion of an independence map (or, I-map).
Definition 5 An I-map of a dependency model M is any model M ′ such that M ′ ⊆ M.
Given a probability space (M,M, µ) and an associated, finite set of random variables
V ≡ {x̃1, ..., x̃t} , the task of constructing a DAG G such that MG is an I-map of Mµ is
straightforward (see Geiger et al., 1990, p. 514). First, for all xr ∈ V, let Ur ≡ {1, ..., r − 1}
index the predecessors of x̃r according to V. Next, identify a minimal set of predecessors Pr ⊂
V such that ({r} ⊥ Ur\Pr|Pr)µ where the “µ” subscript indicates probabilistic independence
under µ. This results in a set of t independence statements known as a recursive basis drawn
from Mµ and denoted Eµ. Now, construct G such that xs → xr if and only if xs ∈ Pr. The
resulting graph G, a DAG, is said to be generated by Eµ and Pr = {xs ∈ V |xs → xr} is the
set of parents of xr in G.
The following theorems are from Geiger et al. (1990, Theorems 1 and 2). First, an
independence statement (C ⊥ D|E) is a semantic consequence (with respect to a class of
dependency models M — e.g., those that satisfy the graphoid axioms) of a set E of such
statements if (C ⊥ D|E) holds in every dependency model that satisfies E; i.e., (C ⊥ D|E) ∈
M for all M such that E ⊆ M ∈ M.
Theorem 2 (soundness) If M is a graphoid and E is any recursive basis drawn from M,
then the DAG generated by E is an I-map of M.
So, given (M,M, µ), the DAG G constructed in the fashion outlined above is an I-map
of Mµ. That is, every independence statement implied by G under d -separation corresponds
to a valid µ-conditional independency.
Theorem 3 (closure) Let D be a DAG generated by a recursive basis E. Then MD, the
dependency model generated by D, is exactly the closure of E under the graphoid axioms.
Two DAGs G and G′ are said to be empirically indistinguishable if every probability dis-
tribution that can be factored in accordance with the recursive basis EG ≡ {({r} ⊥ Ur\Pr|Pr) | r ∈ V}
can also be factored in accordance with EG′ ≡ {({r} ⊥ U′r\P′r|P′r) | r ∈ V}. The original
theorem on empirical indistinguishability is due to Verma and Pearl (1990, Theorem 1) and
is generalized by Andersson et al. (1997, Theorems B.1 and 2.1). The variation applied in
Proposition 1 for faithful indistinguishability is from Spirtes et al. (2000, Theorem 4.2).
References
[1] Abreu, D., D. Pearce, and E. Stacchetti, 1990. Toward a theory of discounted repeated
games with imperfect monitoring. Econometrica, 58(5), 1041-63.
[2] Adner, R., and P. Zemsky, 2003. A demand-based view of sustainable competitive ad-
vantage: The evolution of substitution threats, resource rents and competitive positions.
Unpublished working paper. Insead.
[3] Andersson, S. A., D. Madigan, and M. D. Perlman, 1997. A characterization of Markov
equivalence classes for acyclic digraphs. The Annals of Statistics, 25(2), 505-541.
[4] Brandenburger, A., and H. W. Stuart, 1996. Value-based business strategy. Journal
of Economics and Management Strategy, 5: 5-24.
[5] Battigalli, P., and D. Guaitoli, 1988. Conjectural equilibria and rationalizability in a
macroeconomic game with incomplete information. Istituto di Economia Politica, Milan.
[6] Bollobas, B., 1998. Modern Graph Theory. Springer, New York.
[7] Cowell, R. G., A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter, 1999. Probabilistic
Networks and Expert Systems. Springer, New York.
[8] Fudenberg, D., and D. K. Levine, 1993. Self-confirming equilibrium. Econometrica, 61,
523-45.
[9] Geiger, D., T. S. Verma, and J. Pearl, 1990. Identifying independence in Bayesian
networks. Networks, 20, 507-34.
[10] Kalai, E., and E. Lehrer, 1993. Subjective equilibrium in repeated games. Econometrica,
61, 1231-1240.
[11] Korb, K. B., and A. E. Nicholson, 2004. Bayesian Artificial Intelligence. Chapman and
Hall/CRC, Boca Raton.
[12] Lippman, S. A., and R. P. Rumelt, 1982. Uncertain imitability: an analysis of inter-firm
differences in efficiency under competition. Bell Journal of Economics, 13(3):418-38.
[13] MacDonald, G., and M. D. Ryall, 2003. Does value lurk in the shadows? New methods
for the identification and evaluation of strategic initiatives. Unpublished working paper,
Washington University.
[14] ––, 2004. How do value creation and competition determine whether a firm appropri-
ates value? Management Science (forthcoming).
[15] Makadok, R., and J. Barney, 2001. Strategic factor market intelligence: An application
of information economics to strategy formulation and competitor intelligence. Manage-
ment Science, 47 (12): 1621-38.
[16] Moody, F., 1995. I Sing the Body Electronic: A Year with Microsoft on the Multimedia
Frontier. Penguin Books, New York.
[17] Pearl, J., 1986. Fusion, propagation and structuring in belief networks. Artificial Intel-
ligence 29, 241-88.
[18] ––, 1988. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Infer-
ence. North Holland, Amsterdam.
[19] ––, 2000. Causality: Models, Reasoning, and Inference. Cambridge University Press,
Cambridge.
[20] ––, and A. Paz, 1985. Graphoids: A graph-based logic for reasoning about relevance rela-
tions. Technical report 850038 (R-53-L), Cognitive Systems Laboratory, UCLA. Short
version in Advances in artificial intelligence 2, ed. Du Boulay, B., Hogg, D., Steels, L.
North Holland, Amsterdam.
[21] Penalva, J., and M. D. Ryall, 2003. Causal assessment in finite-length extensive-form
games. Working paper, University of Rochester.
[22] Reed, R., and R. J. DeFillippi, 1990. Causal ambiguity, barriers to imitation, and
sustainable competitive advantage. Academy of Management Review, 15(1): 88-102.
[23] Ryall, M. D., 2003. Subjective rationality, self-confirming equilibrium and corporate
strategy. Management Science, 49(7): 936-49.
[24] Sorenson, O., and D. Waguespack, 2003. Social networks and exchange: Self-confirming
dynamics in Hollywood. Working paper, UCLA.
[25] Spirtes, P., C. Glymour, and R. Scheines, 2000. Causation, Prediction and Search. The
MIT Press, Cambridge.
[26] Spohn, W., 1980. Stochastic independence, causal independence and shieldability. J.
Phil. Logic, 9, 73-99.
[27] Tversky, A., and D. Kahneman, 1981. The framing of decisions and the psychology of
choice. Science, 211(4481): 453-58.
[28] Verma, T. S., and J. Pearl, 1990. Equivalence and synthesis of causal models, in: Pro-
ceedings of the 6th Conference on Uncertainty in Artificial Intelligence. Cambridge, pp.
220-7. Reprinted in: Bonissone, P., Henrion, M., Kanal, L. N., Lemmer, J. F. (Eds.),
Uncertainty in Artificial Intelligence, vol. 6, 255-68.