Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
Almost Optimal Universal Lower Bound for LearningCausal DAGs with Atomic Interventions
Vibhor Porwalβ Piyush Srivastavaβ Gaurav Sinhaβ‘
Abstract
A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG)is that using observational data, one can only learn the graph up to a βMarkov equivalence classβ (MEC). Theremaining undirected edges have to be oriented using interventions, which can be very expensive to performin applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEChas received a lot of recent attention, and is also the focus of this work. We prove two main results. The rstis a new universal lower bound on the number of atomic interventions that any algorithm (whether active orpassive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, infact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Ourlower bound is provably better than previously known lower bounds. The proof of our lower bound is basedon the new notion of clique-block shared-parents (CBSP) orderings, which are topological orderings of DAGswithout v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs andby giving examples of special graph families, we show that our bound is often signicantly better.
βAdobe Research, Bangalore. [email protected]β Tata Institute of Fundamental Research, Mumbai. [email protected]β‘Adobe Research, Bangalore. [email protected] acknowledges support from the Department of Atomic Energy, Government of India, under project no. RTI4001, from the Ramanujan
Fellowship of SERB, from the Infosys foundation, through its support for the Infosys-Chandrasekharan virtual center for Random Geometry,and from Adobe Systems Incorporated via a gift to TIFR. The contents of this paper do not necessarily reect the views of the fundingagencies listed above.
1
arX
iv:2
111.
0507
0v2
[cs
.LG
] 2
3 N
ov 2
021
Contents1 Introduction 3
1.1 Our Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31.2 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Preliminaries 4
3 Universal Lower Bound 63.1 Tightness of Universal Lower Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83.2 Comparison with Known Lower Bounds . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
4 Empirical Explorations 10
5 Conclusion 11
A Proof of Lemma 3.4 of the Main Paper 14
B Other Omitted Proofs 16B.1 Proof of Observation 3.1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16B.2 Proof of Proposition 3.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16B.3 Proof of Theorem 3.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16B.4 Proof of Theorem 3.9 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17B.5 Proof of Theorem 3.11 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18B.6 Proof of Lemma 3.12 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
C Various Example Graphs 18
D Details of Experimental Setup 19
E Proof of Remark B.1 21
F Proof of Corollary B.2 22
2
1 IntroductionCausal Bayesian Networks (CBN) provide a very convenient framework for modeling causal relationships betweena collection of random variables (Pearl, 2009). A CBN is fully specied by (a) a directed acyclic graph (DAG),whose nodes model random variables of interest, and whose edges depict immediate causal relationships betweenthe nodes, and (b) a conditional probability distribution (CPD) of each variable given its parent variables (in theDAG) such that the joint distribution of all variables factorizes as a product of these conditionals. The generalityof the framework has led to CBN becoming a popular tool for the modeling of causal relationships in a varietyof elds, with health science (Shen et al., 2020), molecular cell biology (Friedman, 2004), and computationaladvertising (Bottou et al., 2013) being a few examples.
It is well known that the underlying DAG of a CBN is not uniquely determined by the joint distribution ofits nodes. In fact, the joint distribution only determines the DAG up to its Markov Equivalence Class (MEC),which is represented as a partially directed graph with well-dened combinatorial properties (Verma and Pearl,1990; Chickering, 1995; Meek, 1995; Andersson et al., 1997). Information about which nodes are adjacent isencoded in the MEC, but the direction of several edges remains undetermined. Thus, learning algorithms basedonly on the observed joint distribution (Glymour et al., 2019) cannot direct these remaining edges. As a result,algorithms which use additional interventional distributions were developed (Squires et al. (2020) and referencestherein). In addition to the joint distribution, these algorithms also assume access to interventional distributionsgenerated as a result of randomizing some target vertices in the original CBN (a process called intervention) andthereby breaking their dependence on any of their ancestors. A natural and well motivated (Eberhardt et al., 2005)question therefore is to nd the minimum number of interventions required to fully resolve the orientations ofthe undirected edges in the MEC.
Interventions, especially on a large set of nodes, however, can be expensive to perform (Kocaoglu et al., 2017).In this respect, the setting of atomic interventions, where each intervention is on a single node, is already veryinteresting and nding the smallest number of atomic interventions that can orient the MEC is well studied(Squires et al., 2020). A long line of work, including those cited above, has considered in various settings theproblem of designing methods for nding the smallest set of atomic interventions that would fully orient all edgesof a given MEC. An important distinction between such methods is whether they are active (He and Geng, 2008),i.e., where the directions obtained via the current intervention are available before one decides which furtherinterventions to perform; or passive, where all the interventions to be performed have to be specied beforehand.Methods can also dier in whether or not randomness is used in selecting the targets of the interventions. Animportant question, therefore, is to understand how many interventions must be performed by any given methodto fully orient an MEC.
Universal Lower Bounds: While several works have reported lower bounds (on minimum number of atomicinterventions required to orient an MEC) in dierent settings, a very satisfying solution concept for such lowerbounds, called universal lower bounds, was proposed by Squires et al. (2020). A universal lower bound of πΏ atomicinterventions for orienting a given MEC means that if a set of atomic interventions is of size less than πΏ, then forevery ground-truth DAG π· in the MEC, the set π will fail to fully orient the MEC. Thus, a universal lower boundhas two universality properties. First, the value of a universal lower bound depends only upon the MEC, andapplies to every DAG in the MEC. Second, the lower bound applies to every set of interventions that would fullyorient the MEC, without regards to the method by which the intervention set was produced.
In this work, we address the problem of obtaining tight universal lower bounds. The goal is to nd a universallower bound such that for any DAG π· in the MEC, the smallest set of atomic interventions that can orient theMEC into π· has size bounded above by a constant factor of the universal lower bound. Similar to Squires et al.(2020), we work in the setting of causally sucient models, i.e. there are no hidden confounders, selection bias orfeedback. To the best of our knowledge, this is the rst work that addresses the problem of tight (up to a constantfactor) universal lower bounds. We note that the best known universal lower bounds (Squires et al., 2020) so farare not tight and provide concrete examples of graph families that illustrate this in Section 3.2.
1.1 Our ContributionsWe prove a new universal lower bound on the size of any set of atomic interventions that can orient a given MEC,improving upon previous work (Squires et al., 2020). We further prove that our lower bound is optimal withina factor of 2 in the class of universal lower bounds: we show that for any DAG π· in the MEC, there is a set of
3
atomic interventions of size at most twice our lower bound, that would fully orient the MEC if the unknownground-truth DAG were π· .
We also compare our new lower bound with the one obtained previously by Squires et al. (2020). We proveanalytically that our lower bound is at least as good as the one given by Squires et al. (2020). We further giveexamples of graph classes where our bound is signicantly better (in fact, it is apparent from our proof that thegraphs in which the two lower bounds are close must have very special properties). We then supplement thesetheoretical ndings with simulation results comparing our lower bound with the βtrueβ optimal answer and withthe lower bound in previous work.
Our lower bound is based on elementary combinatorial arguments drawing upon the theory of chordal graphs,and centers around a notion of certain special topological orderings of DAGs without v-structures, which we callclique-block shared-parents (CBSP) orderings (Denition 3.2). This is in contrast to the earlier work of Squires et al.(2020), where they had to develop sophisticated notions of directed clique trees and residuals in order to provetheir lower bound. We expect that the notion of CBSP orderings may also be of interest in the design of optimalintervention sets.
1.2 Related WorkThe theoretical underpinning for many works dealing with the use of interventions for orienting an MECcan be said to be the notion of βinterventionalβ Markov equivalence (Hauser and BΓΌhlmann, 2012), which,roughly speaking, says that given a collection I of sets of targets for interventions, two DAGs π·1 and π·2 areI-Markov equivalent if and only if for all π β I, the DAGs obtained by removing from π·1 and π·2 the incomingedges of all vertices in π are in the same MEC (Hauser and BΓΌhlmann, 2012, Theorem 10). Thus, interventionshave the capability of distinguishing between DAGs in the same Markov Equivalence class, and in particular,βinterventionalβ Markov equivalence classes can be ner than MECs (Hauser and BΓΌhlmann, 2012, see alsoTheorem 2.1 below).
As described above, the problem of learning the orientations of a CBN using interventions has been studiedin a wide variety of settings. Lower bounds and algorithms for the problem have been obtained in the setting ofinterventions of arbitrary sizes and with various cost models (Eberhardt, 2008; Shanmugam et al., 2015; Kocaogluet al., 2017), in the setting when the underlying model is allowed to contain feedback loops (and is therefore not aCBN in the usual sense) (Hyttinen et al., 2013a,b), in settings where hidden variables are present (Addanki et al.,2020, 2021), and in interventional βsample eciencyβ settings (Agrawal et al., 2019; Greenewald et al., 2019). Therelated notion of orienting the maximum possible number of edges given a xed budget on the number or cost ofinterventions has also been studied (Hauser and BΓΌhlmann, 2014; Ghassami et al., 2018; AhmadiTeshnizi et al.,2020). However, to the best of our knowledge, the work of Squires et al. (2020) was the rst to isolate the notionof a universal lower bound, and prove a lower bound in that setting.
2 PreliminariesGraphs A partially directed graph (or just graph) πΊ = (π , πΈ) consists of a set π of nodes or vertices and a set πΈof adjacencies. Each adjacency in πΈ is of the form π β π or π β π, where π, π β π are distinct vertices, with thecondition that for any π, π β π , at most one of π β π, π β π and π β π is present in πΈ. 1 If there is an adjacencyin πΈ containing both π and π, then we say that π and π are adjacent in πΊ , or that there is an edge between π and πin πΊ . If π β π β πΈ, then we say that the edge between π and π in πΊ is undirected, while if π β π β πΈ then we saythat the edge between π and π is directed in πΊ . πΊ is said to be undirected if all its adjacencies are undirected, anddirected if all its adjacencies are directed. Given a directed graphπΊ , and a vertex π£ inπΊ , we denote by paπΊ (π£) theset of nodes π’ in πΊ such that π’ β π£ is present in πΊ . A vertex π£ in πΊ is said to be a child of π’ if π’ β paπΊ (π£). Aninduced subgraph ofπΊ is a graph whose vertices are some subset π of π , and whose adjacencies πΈ [π] are all thoseadjacencies in πΈ both of whose elements are in π . This induced subgraph ofπΊ is denoted asπΊ [π]. The skeleton ofπΊ , denoted skeleton (πΊ), is an undirected graph with nodes π and adjacencies π β π whenever π, π are adjacentin πΊ .
A cycle in a graph πΊ is a sequence of vertices π£1, π£2, π£3, . . . , π£π+1 = π£1 (with π β₯ 3) such that for each 1 β€ π β€ π,either π£π β π£π+1 or π£π β π£π+1 is present in πΈ. The length of the cycle is π, and the cycle is said to be simple ifπ£1, π£2, . . . , π£π are distinct. The cycle is said to have a chord if two non-consecutive vertices in the cycle are adjacentin πΊ , i.e., if there exist 1 β€ π < π β€ π such that π β π β Β±1 (mod π) and such that π£π and π£ π are adjacent in πΊ . The
1π β π and π β π are treated as equal.
4
π π
π
(i)
π
π
π
(ii)
π
π
π
(iii)
ππ1 π2
π
(iv)
Figure 1: Strong Protection (Andersson et al., 1997; Hauser and BΓΌhlmann, 2012)
cycle is said to be directed if for some 1 β€ π β€ π, π£π β π£π+1 is present in πΊ . A graph πΊ is said to be a chain graphif it has no directed cycles. The chain components of a chain graph πΊ are the connected components left afterremoving all the directed edges from πΊ . A directed acyclic graph or DAG is a directed graph without directedcycles. Note that both DAGs and undirected graphs are chain graphs. An undirected graphπΊ is said to be chordalif any simple cycle in πΊ of length at least 4 has a chord.
A clique πΆ in a graphπΊ = (π , πΈ) is a subset of nodes ofπΊ such that any two distinct π’ and π£ in πΆ are adjacentin πΊ . The clique πΆ is maximal if for all π£ β π \πΆ , the set πΆ βͺ {π£} is not a clique.
A perfect elimination ordering (PEO), π = (π£1, . . . , π£π) of a graph πΊ is an ordering of the nodes of πΊ such thatβπ β [π], πππΊ (π£π ) β© {π£1, . . . , π£πβ1} is a clique in πΊ , where πππΊ (π£π ) is the set of nodes adjacent to π£π .2 A graph ischordal if and only if it has a perfect elimination ordering (Blair and Peyton, 1993). A topological ordering, π of aDAG π· is an ordering of the nodes of π· such that π (π) < π (π) whenever π β paπ· (π), where π (π’) denotes theindex of π’ in π . We say that π· is oriented according to an ordering π to mean that π· has a topological ordering π .
A v-structure in a graphπΊ is an induced subgraph of the form π β π β π . It follows easily from the denitionsthat by orienting the edges of a chordal graph according to a perfect elimination ordering, we get a DAG withoutv-structures, and that the skeleton of a DAG without v-structures is chordal (see Proposition 1 of Hauser andBΓΌhlmann (2014)). In fact, any topological ordering of a DAG π· without v-structures is a perfect eliminationordering of skeleton (π·).
Interventions An intervention πΌ on a partially directed graph πΊ is specied as a subset of target vertices of πΊ .An intervention set is a set of interventions. In this paper, we make the standard assumption that the βemptyβintervention, in which no vertices are intervened upon, is always included in any intervention set we consider:this corresponds to assuming that information from purely observational data is always available (see, e.g., thediscussion surrounding Denition 6 of Hauser and BΓΌhlmann (2012)). The size of an intervention set I is thenumber of interventions in I, not counting the empty intervention.I is a set of atomic interventions if |πΌ | = 1 for all non-empty πΌ β I. With a slight abuse of notation, we denote
a set of atomic interventions I = {β , {π£1} , . . . , {π£π }} as just the set πΌ = {π£1, . . . , π£π } when it is clear from thecontext that we are talking about a set of atomic interventions. Given an intervention set I and a DAG π· , wedenote, following Hauser and BΓΌhlmann (2012), by EI (π·) the partially directed graph representing the set ofall DAGs that are I-Markov equivalent to π· . EI (π·) is also known as the I-essential graph of π· . For a formaldenition of I-Markov equivalence, we refer to Denitions 7 and 9 of Hauser and BΓΌhlmann (2012); we useinstead the following equivalent characterization developed in the same paper.
Theorem 2.1 (Characterization of I-essential graphs, Denition 14 and Theorem 18 of Hauser andBΓΌhlmann (2012)). Let π· be a DAG and I an intervention set. A graph π» is an I-essential graph of π· if and onlyif π» has the same skeleton as π· , all directed edges of π» are directed in the same direction as in π· , all v-structures ofπ· are directed in π» , and
1. π» is a chain graph with chordal chain components.
2. For any three vertices π, π, π of π» , the subgraph of π» induced by π, π and π is not π β π β π .
3. If π β π in π· (so that π, π are adjacent in π» ) and there is an intervention π½ β I such that |π½ β© {π, π}| = 1,then π β π is directed in π» .
4. Every directed edge π β π in π» is strongly I-protected. An edge π β π in π» is said to be strongly I-protectedif either (a) there is an intervention π½ β I such that |π½ β© {π, π}| = 1, or (b) at least one of the four graphs inFigure 1 appears as an induced subgraph ofπ» , and π β π appears in that induced subgraph in the congurationindicated in the gure.
2Our denition of a PEO uses the same ordering convention as Hauser and BΓΌhlmann (2014).
5
π
π π
π
π
π
A DAG π· without v-structures
π β² Β·Β·= π ποΈΈοΈ·οΈ·οΈΈπΏ1 (πβ²)
π π ποΈΈοΈ·οΈ·οΈΈπΏ2 (πβ²)
ποΈΈοΈ·οΈ·οΈΈπΏ3 (πβ²)
π Β·Β·= π ποΈΈοΈ·οΈ·οΈΈπΏ1 (π)
π π ποΈΈοΈ·οΈ·οΈΈπΏ2 (π)
ποΈΈοΈ·οΈ·οΈΈπΏ3 (π)
π Β·Β·= π ποΈΈοΈ·οΈ·οΈΈπΏ1 (π)
π ποΈΈοΈ·οΈ·οΈΈπΏ2 (π)
π ποΈΈοΈ·οΈ·οΈΈπΏ3 (π)
Figure 2: Clique-Block Shared-Parents Topological Orderings: π Satises both P1 and P2; π Satises only P1
3 Universal Lower BoundIn this section, we establish our main technical result (Theorem 3.3). Our new lower bound (Theorems 3.6 and 3.7)then follows easily from this combinatorial result, without having to resort to the sophisticated machinery ofresiduals and directed clique trees developed in previous work (Squires et al., 2020).
We begin with a denition that isolates two important properties of certain topological orderings of DAGswithout v-structures. Given a DAG π· without v-structures, and a maximal cliqueπΆ of skeleton (π·), we denote bysinkπ· (πΆ) any vertex inπ· such thatπΆ = paπ· (sinkπ· (πΆ))βͺ{sinkπ· (πΆ)}. The fact that sinkπ· (πΆ) is uniquely dened,and that sinkπ· (πΆ1) β sinkπ· (πΆ2) when πΆ1 and πΆ2 are distinct maximal cliques of skeleton (π·) is guaranteed bythe following observation. (The standard proof of this is deferred to Section B.1.)
Observation 3.1. Let π· be a DAG without v-structures. Then, for every maximal clique πΆ of skeleton (π·), thereis a unique vertex π£ of π· , denoted sinkπ· (πΆ), such that πΆ = paπ· (π£) βͺ {π£}. Further, for any two distinct maximalcliques πΆ1 and πΆ2 in skeleton (π·), we have sinkπ· (πΆ1) β sinkπ· (πΆ2).
We refer to each vertex π£ ofπ· that is equal to sinkπ· (πΆ) for some maximal clique of skeleton (π·) as amaximal-clique-sink vertex of the DAG π· . Note also that sinkπ· (πΆ) is also the unique node with out-degree 0 in the inducedsubgraph π· [πΆ].Denition 3.2 (Clique-Block Shared-Parents (CBSP) ordering). Let π be a topological ordering of a DAG π·without v-structures. Let π 1, π 2, π 3, . . . , π π be the maximal-clique-sink vertices of π· indexed so that π (π π ) < π (π π ) whenπ < π . (Here π is the number of maximal cliques in skeleton (π·).) Then, π is said to be a clique-block shared-parents(CBSP) ordering of π· if it satises the following two properties:
1. P1: Clique block property Dene πΏ1 (π) to be the set of nodes π’ which occur before or at the same position asπ 1 in π i.e., π (π’) β€ π (π 1). Similarly, for 2 β€ π β€ π , dene πΏπ (π) to be the set of nodes which occur in π beforeor at the same position as π π , but strictly after π πβ1 (i.e., π (π πβ1) < π (π’) β€ π (π π )). Then, for each 1 β€ π β€ π thesubgraph induced by πΏπ (π) in skeleton (π·) is a (not necessarily maximal) clique.
2. P2: Shared parents property If vertices π and π in π· are consecutive in π (i.e., π (π) = π (π) + 1), and also liein the same πΏπ (π) for some 1 β€ π β€ π , then all parents of π are also parents of π in π· .
We illustrate the denition with an example in Figure 2. In the gure, vertices π, π , and π are the maximal-clique-sink vertices of π· , and are highlighted with an underbar. The orderings π β², π and π in the gure are validtopological orderings of π· . However, π β² does not satisfy P1 of Denition 3.2 (since πΏ2 (π β²) is not a clique), while πsatises P1 of Denition 3.2, but does not satisfy P2, because π, π in πΏ2 (π) are consecutive in π , but π is a parentonly of π and not of π . Finally, π satises both P1 and P2 and hence is a CBSP ordering.
Our main technical result is that for any DAG π· that has no v-structures, there exists a CBSP ordering π of π· ,and the new lower bound is an easy corollary of this result. Further, the proof of this result uses only standardnotions from the theory of chordal graphs.
Theorem 3.3. If π· is a DAG without v-structures, then π· has a CBSP ordering.
Towards the proof of this theorem, we note rst that the existence of a topological ordering π satisfyingjust P1 can be established using ideas from the analysis of, e.g., the βmaximum cardinality searchβ algorithm forchordal graphs (Tarjan and Yannakakis (1984), see also Corollary 2 of WienΓΆbst et al. (2021)). We state this hereas a lemma, and provide the proof in Section A.
6
Lemma 3.4. If π· is a DAG without v-structures, then π· has a topological ordering π satisfying P1 of Denition 3.2.
We now prove the theorem.
Proof of Theorem 3.3. Let O be the set of topological orderings of π· which satisfy P1 of Denition 3.2. ByLemma 3.4, O is non-empty. If there is a π β O which also satises P2 of Denition 3.2, then we are done.
We now proceed to show by contradiction that such a π must indeed exist. So, suppose for the sake ofcontradiction that for each π in O, P2 is violated. Then, for each π β O, there exists vertices π, π and an index πsuch that π, π β πΏπ (π), π (π) = π (π) + 1, and there exists a parent of π in π· that is not a parent of π. For any givenπ β O, we choose π, π as above so that π (π) is as small as possible. With such a choice of π for each π β O, wethen dene a function π : O β [π β 1] by dening π (π) = π (π). Note that by the assumption that P2 is violatedby each π in O, π is dened for each π in O. But then, since O is a nite set, there must be some π β O for whichπ (π) attains its maximum value. We obtain a contradiction by exhibiting another π β O for which π (π) is strictlylarger than π (π). We rst describe the construction of π from π , and then prove that π so constructed is in O andhas π (π) > π (π).
Construction. Given π β O, let π (π) = π β [π β 1]. Let π (π) = π, π (π) = π + 1 and suppose that π, π β πΏπ (π).By the denition of π , there is a parent of π that is not a parent of π. Dene πΆπ to be the set {π} βͺ paπ· (π). Sinceπ· has no v-structures, πΆπ is a clique in skeleton (π·). Dene ππ Β·Β·= {π§ |πΆπ β paπ· (π§)}. Let ππ be the set of nodesoccurring after π in π that are not in ππ (note that π β ππ , and, in general, π§ β ππ if and only if π (π§) > π (π) andthere is an π₯ β πΆπ that is not adjacent to π§).
We now note the following easy to verify properties of the sets ππ and ππ (the proof is provided in Section B.2).
Proposition 3.5. 1. If π¦ β ππ and π§ is a child of π¦ in π· , then π§ β ππ .
2. Suppose that π₯ β {π} βͺ ππ is not a maximal-clique-sink node of π· . Then there exists π¦ β ππ such thatπ₯ β paπ· (π¦) and such that π¦ is a maximal-clique-sink node in π· . In particular, ππ is non-empty.
3. Suppose that π₯ is a maximal-clique-sink node in the induced DAG π» Β·Β·= π· [ππ]. Then π₯ is also a maximal-clique-sink node in π· .
The ordering π is now dened as follows: the rst π nodes in π are the same as π . After this, the nodesof ππ appear according to some topological ordering πΎ of the induced DAG π» Β·Β·= π· [ππ] that satises P1 ofDenition 3.2 in π» (such a πΎ exists because of Lemma 3.4 applied to the induced DAG π» , which also cannot haveany v-structures). Finally, the nodes of ππ appear according to their ordering in π .
Proof that π β O and π (π) > π (π). Note rst that π is a topological ordering of π· . For, if not, then theremust exist π’ β ππ and π£ β ππ such that the edge π’ β π£ is present in π· , but this cannot happen by item 1 ofProposition 3.5 above.
To show that π β O (i.e., that π satises P1), the following notation will be useful. For each maximal-clique-sinkvertex π in π· , denote by _(π ) the unique πΏπΌ (π) such that π β πΏπΌ (π). Similarly, denote by ` (π ) the unique πΏπ½ (π)such that π β πΏπ½ (π). Since π β O, we already know that _(π ) is a clique for each maximal-clique-sink node π of π· .In order to show that π β O, all we need to show is that ` (π ) is also a clique for each maximal-clique-sink node π of π· .
Let π½ be the set of maximal-clique-sink vertices of π· present in ππ . By item 2 of Proposition 3.5, the lastvertex in πΎ must be an element of π½ . Note also that π π β π½ since π β ππ and the edge π β π π in π· togetherimply that π π β ππ by item 1 of Proposition 3.5. We also observe that _(π ) β ππ , for all π β π½ . For if there existsπ’ β _(π ) β© ππ then the edge π’ β π in π· implies that π β ππ , contradicting that π β ππ . From the constructionof π , we already have ` (π π ) = _(π π ) for all maximal-clique-sink vertices π π that precede π π in π . From the abovediscussion, we also get that for any maximal-clique-sink node π of π· such that π β ππ , ` (π ) β _(π ). Thus, whenπ is a maximal-clique-sink node of π· that is not in π½ β ππ , we have that ` (π ) is a clique in skeleton (π·), since` (π ) β _(π ), and _(π ) is a clique in skeleton (π·). It remains to show that ` (π ) is a clique when π β π½ .
Let π‘1, π‘2, . . . , π‘π be the maximal-clique-sink nodes of π» = π· [ππ], arranged in increasing order by πΎ . Since πΎsatises P1 in π» , each πΏπ (πΎ), 1 β€ π β€ π , is a clique in π» (and thus also in π·). Now consider a maximal-clique-sinknode π β π½ β ππ . Since the π‘π are maximal-clique-sink nodes of π· (from item 3 of Proposition 3.5), it follows that` (π ) β πΏπ (πΎ) (if π β πΏπ (πΎ) for π β₯ 2) or ` (π ) β πΏ1 (πΎ) βͺπΆπ (if π β πΏ1 (πΎ)). In the former case, ` (π ) is automatically aclique, since πΏπ (πΎ) is a clique in π· . In the latter case also ` (π ) is a clique since πΏ1 (πΎ) β ππ , so that πΏ1 (πΎ) βͺπΆπ is aclique in π· .
7
Thus, we get that π also satises P1, so that π β O. Consider the node π β² Β·Β·= π ( π + 1) next to π in π . Since ππ isnon-empty, the construction of π implies π β² β ππ , so that π β² is adjacent to all parents of π. Since π and π agreeon the ordering of all vertices up to π, we thus have π (π) β₯ π (π β²) = π + 1 > π = π (π). This gives the desiredcontradiction to π being chosen as a maximum of π . Thus, there must exist some ordering in O which satisesP2. οΏ½
Theorem 3.6. Given a DAG π· without v-structures with π nodes,βπβπ2βatomic interventions are necessary to learn
the directed edges of π· , starting with skeleton (π·). Here π is the number of distinct maximal cliques in skeleton (π·).
Proof. Consider a CBSP ordering π of nodes in π· . Let π, π be two nodes such that π, π β πΏπ (π) for some π β [π ]and such that π (π) = π (π) + 1. Consider a set πΌ of atomic interventions such that πΌ β© {π, π} = β . We show nowthat the edge π β π is not directed in the πΌ -essential graph EπΌ (π·) of π· .
Suppose, for the sake of contradiction, that π β π is directed in EπΌ (π·). Then, by item 4 of Theorem 2.1,π β π must be strongly πΌ -protected in EπΌ (π·). Since πΌ β© {π, π} = β , one of the graphs in Figure 1 must appear asan induced subgraph of EπΌ (π·).
We now show that none of these subgraphs can appear as an induced subgraph of EπΌ (π·). First, subgraphs (ii)and (iv) cannot be induced subgraphs of EπΌ (π·) since they have a v-structure at π while π· (and therefore alsoEπΌ (π·)) has no v-structures. For subgraph (iii) to appear as an induced subgraph, the vertex π must lie between πand π in any topological ordering of π· , which contradicts the fact that π and π are consecutive in the topologicalordering π . For subgraph (i) to appear, we must have a parent π of π that is not adjacent to π. However, since πis a CBSP ordering, it satises property P2 of Denition 3.2, so that, since π, π are consecutive in π and belongto the same πΏπ (π), any parent of π must also be a parent of π. We thus conclude that π β π cannot be stronglyπΌ -protected in EπΌ (π·), and hence is not directed in it.
The above argument implies that any set πΌ of atomic interventions that fully orients π· starting withskeleton (π·) (i.e., for which EπΌ (π·) = π·) must contain at least one node of each pair of consecutive nodes(in π) of πΏπ (π), for each π β [π ]. Thus, for each π β [π ], πΌ must contain at least d(|πΏπ (π) | β 1)/2e nodes of πΏπ (π).We therefore have,
|πΌ | β₯πβοΈπ=1
β|πΏπ (π) | β 1
2
ββ₯β
πβοΈπ=1
|πΏπ (π) | β 12
β=
ββππ=1 |πΏπ (π) |
2β π2
β=
βπ β π2
β. οΏ½
The following corollary for general DAGs (those that may have v-structures) follows from the previous resultabout DAGs without v-structures in a manner identical to previous work (Squires et al., 2020), using the factthat it is necessary and sucient to separately orient each chordal chain component of an MEC in order to fullyorient an MEC (Hauser and BΓΌhlmann, 2014, Lemma 1). We defer the standard proof to Section B.3.
Theorem 3.7. Let π· be an arbitrary DAG and let E(π·) be the chain graph with chordal chain componentsrepresenting the MEC of π· . Let πΆπΆ denote the set of chain components of E(π·), and π (π) the number of maximalcliques in the chain component π β πΆπΆ . Then, any set of atomic interventions which fully orients E(π·) must be ofsize at least βοΈ
π βπΆπΆ
β|π | β π (π)
2
ββ₯βπ β π
2
β,
where π is the number of nodes in π· , and π is the total number of maximal cliques in the chordal chain componentsof E(π·) (including chain components consisting of singleton vertices).
3.1 Tightness of Universal Lower BoundWe now show that our universal lower bound is tight up to a factor of 2: for any DAG π· , there is a set of atomicinterventions of size at most twice the lower bound that fully orients the MEC of π· . In fact, as the proof of thetheorem below shows, when π· has no v-structures, this intervention set can be taken to be the set of nodes of π·that are not maximal-clique-sink nodes of π· .
Theorem 3.8. Let π· be a DAG without v-structures with π nodes, and let π be the number of distinct maximalcliques of skeleton (π·). Then, there exists a set πΌ of atomic interventions of size at most π β π such that πΌ fully orientsskeleton (π·) (i.e., EπΌ (π·) = π·).
8
Proof. Fix any topological ordering π of π· . Let the maximal cliques of π· beπΆ1, . . . ,πΆπ , and let π π Β·Β·= sinkπ· (πΆπ ), forπ β [π ]. Observation 3.1 implies that each node of π = {π 1, . . . , π π } is distinct. We re-index these nodes accordingto the ordering π , i.e. π (π π ) < π (π π ) when π < π . Consider the set πΌ Β·Β·= π \ π of atomic interventions (note that|πΌ | = π β π ). We show that EπΌ (π·) = π· . Note that every edge of π· , except those which have both end-points in π ,has a single end-point in one of the interventions in πΌ , and hence is directed in EπΌ (π·) (by item 3 of Theorem 2.1).We show now that all edges with both end-points in π are also oriented in EπΌ (π·).
Suppose, if possible, that there exist π π , π π β π , with π < π such that π π and π π are adjacent in skeleton (π·), sothat the edge π π β π π is present in π· , but for which π π β π π is not directed in EπΌ (π·). We derive a contradiction tothis supposition. To start, choose an π π , π π as above with the smallest possible value of π . In particular, this choiceimplies that every edge of the form π’ β π π in π· is directed in EπΌ (π·).
Note that, by Observation 3.1, πΆπ = {π π } βͺ paπ· (π π ) and πΆ π ={π π}βͺ paπ·
(π π)are distinct maximal cliques
in skeleton (π·). Thus, there must exist an π₯ β πΆπ that is not a parent of π π in π· . Further, since π (π π ) < π (π π ),all vertices of πΆπ appear before π π in π . Thus, π₯ β πΆπ that is not a parent of π π in π· is also not adjacent to π πin skeleton (π·). Further, by the choice of π , the edge π₯ β π π is directed in EπΌ (π·). Thus, we have the inducedsubgraph π₯ β π π β π π in EπΌ (π·). However, according to item 2 of Theorem 2.1, such a graph cannot appear asan induced subgraph of an πΌ -essential graph EπΌ (π·), and we have therefore reached the desired contradiction. Itfollows that EπΌ (π·) has no undirected edges, and is therefore the same as π· . οΏ½
Using again the fact that it is necessary and sucient to separately orient each of the chordal chain componentsof an MEC in order to fully orient an MEC, the following result for general DAGs follows immediately fromTheorem 3.8, and implies that the lower bound for general DAGs is also tight up to a factor of 2 (the proof isprovided in Section B.4).
Theorem 3.9. Let π· be an arbitrary DAG on π nodes and let E(π·) and π be as in the notation of Theorem 3.7. Then,there is a set of atomic interventions of size at most π β π that fully orients E(π·).
Remark 3.10. Essentially the same argument as that used for proving Theorem 3.8 gives the following result, whichin many cases improves upon the bound in the above two theorems. The proof of this result is provided in Section B.5.
Theorem 3.11. Let π· be an DAG on π nodes without v-structures and let π be a topological ordering of π· thatsatises the clique block property P1 of Denition 3.2. Suppose that π· has π maximal-clique-sink nodes, and let πΏπ (π),1 β€ π β€ π be as in P1 of Denition 3.2. Then, there is a set of atomic interventions of size at most
βππ=1
β|πΏπ (π) |
2
βthat
fully orients E(π·).
3.2 Comparison with Known Lower BoundsTo compare our universal lower bound with the lower universal bound of Squires et al. (2020), we start with thefollowing combinatorial lemma, whose proof can be found in Section B.6.
Lemma 3.12. Let πΊ be an undirected chordal graph on π nodes in which the size of the largest clique is π . Then,π β |C| β₯ π β 1, where C is the set of maximal cliques of πΊ .
Lemma 3.12 implies thatβπβ|C |
2
ββ₯βπβ12β=
βπ2βin chordal graphs which shows that our universal lower
bound is always equal to or better than the one by Squires et al. (2020). The proof of the Lemma 3.12 makes itapparent that two bounds are close only in very special circumstances. (Split graphs and k-trees are some specialfamilies of chordal graphs for which
βπβ|C |
2
β=βπ2β). We further strengthen this intuition through theoretical
analysis of special classes of graphs and via simulations.
Examples where our Lower Bound is Signicantly Better We provide two constructions of special classesof chordal graphs in which our universal lower bound is Ξ(π) times the
βπ2βlower bound by (Squires et al., 2020)
for any π β N. Further discussion of such examples can be found in Section C.Construction 1. First, we provide a construction by (Shanmugam et al., 2015) for graphs that require about π
times more number of interventions than their lower bound, where π is size of the maximum independent set ofthe graph. This construction of a chordal graph πΊ starts with a line πΏ consisting of vertices 1, . . . , 2π such thateach node 1 < π < 2π is connected to π β 1 and π + 1. For each 1 β€ π β€ π , πΊ has a clique πΆπ of size π which hasexactly two nodes 2πβ1, 2π from the line πΏ. Maximum clique size ofπΊ isπ , number of nodes, π = ππ , and number
9
of maximal cliques, |C| = 2π β 1. Thus, forπΊ , we have, π β |C| = π (π β 2) + 1 which impliesβπβ|C |
2
β= Ξ(π)
βπ2β
for π > 2.Construction 2. πΊ has π cliques of size π , with every pair of cliques intersecting at a unique node π£ . The
number of nodes in πΊ is π (π β 1) + 1, maximum clique size is π , and number of maximal cliques is π , thus,π β |C| = π (π β 2) + 1 which implies
βπβ|C |
2
β= Ξ(π)
βπ2βfor π > 2.
4 Empirical ExplorationsIn this section we report results of two experiments on synthetic data. In Experiment 1, we compare our lowerbound with the optimal intervention size for a large number of randomly generated DAGs. Optimal interventionsize for a DAG π· is dened as the size of the smallest set of atomic interventions πΌ such that EπΌ (π·) = π· . Next,in Experiment 2, we compare our universal lower bound with the one in the work of Squires et al. (2020) forrandomly generated DAGs with small cliques. These experiments provide empirical evidence that strengthensour result about the tightness of our universal lower bound (Theorem 3.9) and the constructions presented inSection 3.2. The experiments use the open source causaldag (Squires, 2018) and networkx (Hagberg et al., 2008)packages. Further details about the experimental setup for both experiments are given in Section D.
Figure 3: Comparison of the Optimal Intervention Set Size with our Universal Lower Bound
Experiment 1 For this experiment, we generate 1000 graphs from ErdΕs-RΓ©nyi graph model πΊ (π, π): for eachof these graphs, the number of nodes π is a random integer in [5, 25] and the connection probability π is arandom value in [0.1, 0.3). These graphs are then converted to DAGs without v-structures by imposing a randomtopological ordering and adding extra edges if needed. To compute the optimal intervention size, we check if asubset of nodes, πΌ of a DAG π· is such that EI (π·) = π· , in increasing order of the size of such subsets. Next, wecompute the universal lower bound value for each of these DAGs as given in Theorem 3.7. In Figure 3, we plotthe optimal intervention size and our lower bound for each of the generated DAGs. Thickness of the points isproportional to the number of points landing at a coordinate. Notice that, all points lie between lines π¦ = π₯ andπ¦ = 2π₯ , as implied by our theoretical results. Further, we can see that, a large fraction of points are closer to theline π¦ = π₯ compared to the line π¦ = 2π₯ , suggesting that our lower bound is even tighter for many graphs.
Experiment 2 For this experiment, we generate 1000 random DAGs without v-structures for each size in{10, 20, 30, 40, 50, 60} by xing a perfect-elimination ordering of the nodes and then adding edges (which are
10
Figure 4: Comparison of our Universal Lower Bound with that of Squires et al. (2020)
oriented according to the perfect-elimination ordering) to the DAG making sure that there are no v-structures,while trying to keep the size of each clique below 5. For each DAG, we compute the ratio of the two lower bounds.In Figure 4, we plot each of these ratios in a scatter plot with the π₯-axis representing the number of nodes ofthe DAG. Thickness of the points is proportional to the number of DAGs having a particular value of the ratiodescribed above. We also plot the average of the ratios for each dierent value of the number of nodes. We seethat our lower bound can sometimes be βΌ 5 times of the lower bound of Squires et al. (2020). Moreover, theaverage ratio has an increasing trend suggesting that our lower bound is much better for this class of randomlygenerated DAGs.
5 ConclusionWe prove a strong universal lower bound on the minimum number of atomic interventions required to fullylearn the orientation of a DAG starting from its MEC. For any DAG π· , by constructing an explicit set of atomicinterventions that learns π· completely (starting with the MEC of π·) and has size at most twice of our lower boundfor the MEC of π· , we show that our universal lower bound is tight up to a factor of two. We prove that our lowerbound is better than the best previously known universal lower bound (Squires et al., 2020) and also constructexplicit graph families where it is signicantly better. We then provide empirical evidence that our lower boundmay be stronger than what we are able to prove about it: by conducting experiments on randomly generatedgraphs, we demonstrate that our lower bound is often tighter (than what we have proved), and also that it is oftensignicantly better than the previous universal lower bound (Squires et al., 2020). An interesting direction forfuture work is to design intervention sets of sizes close to our universal lower bound. We note that in contrast tothe earlier work of Squires et al. (2020), whose lower bound proofs were based on new sophisticated constructions,our proof is based on the simpler notion of a CBSP ordering, which in turn is inspired from elementary ideas inthe theory of chordal graphs. We expect that the notion of CBSP orderings may also play an important role infuture work on designing optimal intervention sets.
11
Bibliography
Addanki, R., Kasiviswanathan, S. P., McGregor, A., and Musco, C. (2020). Ecient Intervention Design for CausalDiscovery with Latents. Proceedings of the 37th International Conference on Machine Learning (ICML 2020),PMLR, 119:63β73. arXiv:2005.11736.
Addanki, R., McGregor, A., and Musco, C. (2021). Intervention Ecient Algorithms for Approximate Learning ofCausal Graphs. Proceedings of the 32nd International Conference on Algorithmic Learning Theory (ALT 2021),PMLR, 132:151β184. arXiv:2012.13976.
Agrawal, R., Squires, C., Yang, K. D., Shanmugam, K., and Uhler, C. (2019). ABCD-Strategy: Budgeted ExperimentalDesign for Targeted Causal Structure Discovery. Proceedings of the 22nd International Conference on ArticialIntelligence and Statistics (AISTATS 2019), PMLR, 89:3400β3409.
AhmadiTeshnizi, A., Salehkaleybar, S., and Kiyavash, N. (2020). LazyIter: A Fast Algorithm for Counting MarkovEquivalent DAGs and Designing Experiments. Proceedings of the 37th International Conference on MachineLearning (ICML 2020), PMLR, 119:125β133.
Andersson, S. A., Madigan, D., and Perlman, M. D. (1997). A Characterization of Markov Equivalence Classes forAcyclic Digraphs. Annals of Statistics, 25(2):505β541.
Blair, J. R. S. and Peyton, B. (1993). An Introduction to Chordal Graphs and Clique Trees. In Graph Theory andSparse Matrix Computation, volume 56 of The IMA Volumes in Mathematics and its Applications, pages 1β29.Springer.
Bottou, L., Peters, J., QuiΓ±onero-Candela, J., Charles, D. X., Chickering, D. M., Portugaly, E., Ray, D., Simard,P., and Snelson, E. (2013). Counterfactual Reasoning and Learning Systems: The Example of ComputationalAdvertising. Journal of Machine Learning Research, 14(65):3207β3260.
Chickering, D. M. (1995). A Transformational Characterization of Equivalent Bayesian Network Structures. Pro-ceedings of the 11th Conference on Uncertainty in Articial Intelligence (UAI 1995), pages 87β98. arXiv:1302.4938.
Eberhardt, F. (2008). Almost Optimal Intervention Sets for Causal Discovery. Proceedings of the 24th Conferenceon Uncertainty in Articial Intelligence (UAI 2008). arXiv:1206.3250.
Eberhardt, F., Glymour, C., and Scheines, R. (2005). On the Number of Experiments Sucient and in the WorstCase Necessary to Identify All Causal Relations Among π Variables. Proceedings of the 21st Conference onUncertainty in Articial Intelligence (UAI 2005). arXiv:1207.1389.
Friedman, N. (2004). Inferring Cellular Networks Using Probabilistic Graphical Models. Science, 303(5659):799β805.
Ghassami, A., Salehkaleybar, S., Kiyavash, N., and Bareinboim, E. (2018). Budgeted Experiment Design for CausalStructure Learning. Proceedings of the 35th International Conference on Machine Learning, (ICML 2018), PMLR,80:1719β1728.
Glymour, C., Zhang, K., and Spirtes, P. (2019). Review of Causal Discovery Methods Based on Graphical Models.Frontiers in Genetics, 10:524.
Greenewald, K., Katz, D., Shanmugam, K., Magliacane, S., Kocaoglu, M., Boix Adsera, E., and Bresler, G. (2019).Sample Ecient Active Learning of Causal Trees. Proceedings of 33rd Annual Conference on Neural InformationProcessing Systems (NeurIPS 2019).
Hagberg, A. A., Schult, D. A., and Swart, P. J. (2008). Exploring Network Structure, Dynamics, and Function usingNetworkX. Proceedings of the 7th Python in Science Conference (SciPy 2008), pages 11β15.
Hauser, A. and BΓΌhlmann, P. (2012). Characterization and Greedy Learning of Interventional Markov EquivalenceClasses of Directed Acyclic Graphs. Journal of Machine Learning Research, 13:2409β2464.
Hauser, A. and BΓΌhlmann, P. (2014). Two Optimal Strategies for Active Learning of Causal Models fromInterventional Data. International Journal of Approximate Reasoning, 55(4):926β939.
He, Y.-B. and Geng, Z. (2008). Active Learning of Causal Networks with Intervention Experiments and OptimalDesigns. Journal of Machine Learning Research, 9(84):2523β2547.
12
Hyttinen, A., Eberhardt, F., and Hoyer, P. O. (2013a). Experiment Selection for Causal Discovery. Journal ofMachine Learning Research, 14(57):3041β3071.
Hyttinen, A., Hoyer, P. O., Eberhardt, F., and Jarvisalo, M. (2013b). Discovering Cyclic Causal Models withLatent Variables: A General SAT-Based Procedure. Proceedings of the 29th Confrence on Uncertainty in ArticialIntelligence (UAI 2013). arXiv:1309.6836.
Kocaoglu, M., Dimakis, A., and Vishwanath, S. (2017). Cost-Optimal Learning of Causal Graphs. Proceedings ofthe 34th International Conference on Machine Learning (ICML 2017), PMLR, 70:1875β1884. arXiv:1703.02645.
Meek, C. (1995). Causal Inference and Causal Explanation with Background Knowledge. Proceedings of the 11thConference on Uncertainty in Articial Intelligence (UAI 1995), pages 403β410. arXiv:1302.4972.
Pearl, J. (2009). Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition.
Shanmugam, K., Kocaoglu, M., Dimakis, A. G., and Vishwanath, S. (2015). Learning Causal Graphs with SmallInterventions. Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NeurIPS2015), pages 3195β3203. arXiv:1511.00041.
Shen, X., Ma, S., Vemuri, P., Simon, G., and The Alzheimerβs Disease Neuroimaging Initiative (2020). Challengesand Opportunities with Causal Discovery Algorithms: Application to Alzheimerβs Pathophysiology. ScienticReports, 10:2975.
Squires, C. (2018). Causaldag Python Package. BSD License. Available fromhttps://github.com/uhlerlab/causaldag.
Squires, C., Magliacane, S., Greenewald, K. H., Katz, D., Kocaoglu, M., and Shanmugam, K. (2020). Active StructureLearning of Causal DAGs via Directed Clique Trees. Proceedings of 34th Annual Conference on Neural InformationProcessing Systems (NeurIPS 2020). arXiv:2011.00641.
Tarjan, R. E. and Yannakakis, M. (1984). Simple Linear-Time Algorithms to Test Chordality of Graphs, TestAcyclicity of Hypergraphs, and Selectively Reduce Acyclic Hypergraphs. SIAM Journal on Computing, 13(3):566β579.
Verma, T. S. and Pearl, J. (1990). Equivalence and Synthesis of Causal Models. Proceedings of the 6th Conference onUncertainty in Articial Intelligence (UAI 1990), pages 220β227. arXiv:1304.1108.
WienΓΆbst, M., Bannach, M., and LiΕkiewicz, M. (2021). Polynomial-Time Algorithms for Counting and SamplingMarkov Equivalent DAGs. Proceedings of the 35th AAAI Conference on Articial Intelligence (AAAI 2021), pages12198β12206. arXiv:2012.09679.
13
A Proof of Lemma 3.4 of the Main PaperHere, we provide the proof of Lemma 3.4 of the main paper. As stated there, the lemma follows from well-knownideas in the theory of chordal graphs. The following generalization of the denition of the clique block propertyP1 of Denition 3.2 will be useful in the proof.
Denition A.1 (π΄-clique block ordering). Let π· be a DAG andπ΄ a subset of vertices of π· . Let π be a topologicalordering of π· . Let the elements ofπ΄ be π1, π2, . . . , ππ , arranged so that π (ππ ) < π (π π ) whenever π < π . Dene πΏπ΄1 (π) tobe the set of nodes π’ which occur before or at the same position as π1 in π i.e., π (π’) β€ π (π1). Similarly, for 2 β€ π β€ π ,dene πΏπ΄π (π) to be the set of nodes which occur in π before or at the same position as ππ , but strictly after ππβ1 (i.e.,π (ππβ1) < π (π’) β€ π (ππ )). Then, π is said to be an π΄-clique block ordering of π· if βͺππ=1πΏπ΄π (π) is the set of all verticesof π· , and for each 1 β€ π β€ π , πΏπ΄π (π) is a (not necessarily maximal) clique in skeleton (π·).
The following observation is immediate with this denition.
Observation A.2. Let π· be a DAG without v-structures, and let π΄ be the set of maximal-clique-sink vertices of π· .Then, a topological ordering π of π· satises property P1 of Denition 3.2 if and only if π is an π΄-clique block orderingof π· .
Proof. The βifβ direction follows from the denition. For the βonly ifβ direction, we note that since every vertexof π· must be contained in some maximal cliqueπΆ of skeleton (π·), and sinceπΆ = {π } βͺ paπ· (π ) for some maximal-clique-sink vertex π , it follows that every vertex of π· must lie in some πΏπ (π) if π satises the clique block propertyP1. οΏ½
We also note the following simple property of π΄-clique block orderings.
Observation A.3. Let π· be a DAG without v-structures, and let π be a topological ordering of π· . Let π΄ and π΅ besubsets of vertices of π· such that π΄ β π΅. If π is an π΄-clique block ordering of π· , then it is also a π΅-clique blockordering of π· .
Proof. Whenπ΄ = π΅, there is nothing to prove. Thus, we can assume that there must exist a π β π΅ \π΄. We considerthe case when π΅ = π΄ βͺ {π}. The general case then follows by straightforward induction on the size of |π΅ \π΄|.
For 1 β€ π β€ |π΄|, let πΏπ΄π (π) be as in the denition of the π΄-clique block orderings. Let π be the unique indexsuch that π β πΏπ΄π (π). Now, for π < π , dene πΏπ΅π (π) Β·Β·= πΏπ΄π (π), and for π β₯ π + 1, dene πΏπ΅π+1 (π) Β·Β·= πΏπ΄π (π). Byconstruction, for 1 β€ π β€ π β 1 and π + 2 β€ π β€ |π΄| + 1, the πΏπ΅π (π) are cliques in skeleton (π·). Further dene
πΏπ΅π (π) Β·Β·={π’ β πΏπ΄π (π) | π (π’) β€ π (π)
}, and
πΏπ΅π+1 (π) Β·Β·= πΏπ΄π (π) \ πΏπ΅π (π).
Again, πΏπ΅π (π) and πΏπ΅π+1 (π) are also cliques in skeleton (π·) since they are subsets of the clique πΏπ (π). Further, byconstruction, βͺ |π΅ |
π=1πΏπ΅π (π) = βͺ
|π΄ |π=1πΏ
π΄π (π). This shows that π is also a π΅-clique block ordering. οΏ½
We can now state the main technical lemma required for the proof of Lemma 3.4 of the main paper.
Lemma A.4. Let π· be a DAG without π£-structures. Let π be the set of maximal-clique-sink vertices of π· . Then,there exists a maximal clique πΆ of skeleton (π·) with the following two properties:
1. If π’ β πΆ then paπ· (π’) β πΆ . That is, if π’ β πΆ and π£ β πΆ , then the edge π£ β π’ is not present in π· .
2. Let π β² be the set of maximal-clique-sink nodes of the induced DAG π· [π \πΆ], where π is the set of nodes of π· .Then π β² is a subset of π \ {sinkπ· (πΆ)}.
Proof. As already alluded to in the main paper, the proof of item 1 uses ideas that are very similar to the βmaximalcardinality searchβ algorithm for chordal graphs (Tarjan and Yannakakis (1984), see also Corollary 2 of WienΓΆbstet al. (2021)). Fix an arbitrary topological ordering π of π· , and let π£1 = π (1) be the top vertex in π . Note that π£1has no parents in π· , so any vertices adjacent to π£1 in π· are children of π£1. Let πΆ β² be the set of these children ofπ£1 in π· . If πΆ β² is empty, then π£1 is isolated in π· and we are done with the proof of item 1 after taking πΆ = π£1. So,
14
assume that πΆ β² is not empty, and let its elements be π1, ππ , . . . ππ , arranged so that π (ππ ) < π (π π ) whenever π < π .Now, dene the sets πΆ β²π , where 1 β€ π β€ π as follows. First, πΆ β²1 Β·Β·= {π£1, π1}. For 2 β€ π β€ π ,
πΆ β²π Β·Β·={πΆ β²πβ1 βͺ {ππ } if πΆ β²πβ1 β paπ· (ππ ),πΆ β²πβ1 otherwise.
(1)
DeneπΆ Β·Β·= πΆ β²π . Note that by construction,πΆ is a clique. Note also the following property of this construction: forany π , ππ β πΆ if and only if there exists 1 β€ π < π such that π π β πΆ and π π is not adjacent to ππ in π· .
We now claim that πΆ is also a maximal clique. For, if not, let π’ β πΆ be such that π’ is adjacent to every vertexin πΆ . Then, we must have π’ = ππ for some π (since π£1 β πΆ , and only the children of π£1 are adjacent to π£1 in π·).But then, since π’ = ππ β πΆ , there must exist some π π , π < π , such that π π β πΆ is not adjacent to π’ = ππ , which is acontradiction to the assumption of π’ being adjacent to every vertex of πΆ .
We now claim that if π£ β πΆ , then for all π’ β πΆ , the edge π£ β π’ is not present in π· . Suppose, if possible, thatthere exist π£ β πΆ and π’ β πΆ such that π£ β π’ is present in π· . By the choice of π£1 as a top vertex in a topologicalorder, we must have π’ β π£1. Thus, π’ must be a child of π£1 in π· . Suppose π’ = ππ , for some π β [π]. Then, π£ mustalso be a child of π£1, for otherwise π£ β ππ β π£1 would be a v-structure in π· . Thus, π£ = π π for some π < π . Sinceπ π = π£ β πΆ , there exists some β < π such that πβ β πΆ and πβ and π£ = π π are not adjacent. But then πβ β ππ β π π is av-structure in π· , so we again get a contradiction. This proves item 1 of the lemma for the clique πΆ .
Item 2 of the lemma trivially follows if π \πΆ is empty, therefore, we are interested in the case when π \πΆ isnon-empty. Now consider the induced DAG π» Β·Β·= π· [π \πΆ]. Since π· has no v-structures, neither does π» . Thus, byObservation 3.1 of the main paper, the maximal-clique-sink nodes of π» and the maximal cliques of skeleton (π» )are in one-to-one correspondence: for each maximal cliqueπΆ β² of skeleton (π» ), there is a unique vertex sinkπ» (πΆ β²)of π» such that πΆ β² = {sinkπ» (πΆ β²)} βͺ paπ» (sinkπ» (πΆ β²)).
Consider now a maximal-clique-sink vertex π β² of π» . There exists then a maximal clique πΆ β² of skeleton (π» )such that π β² = sinkπ» (πΆ β²). Also, since π» is an induced subgraph of π· , there must exist a maximal clique πΆ β²β² ofskeleton (π·) such that πΆ β²β² β πΆ β². In fact, we must further have πΆ β²β² β© (π \πΆ) = πΆ β², for otherwise πΆ β² would not bea maximal clique of π» . Let π‘ Β·Β·= sinkπ· (πΆ β²β²). We will show that π‘ = π β². Note rst that we cannot have π‘ β πΆ , forthen, by item 1,πΆ β²β² = {π‘} βͺ paπ· (π‘) would be contained inπΆ , and would not therefore containπΆ β² β π \πΆ . Thus, π‘must be a node in π \πΆ . But then πΆ β²β² β© (π \πΆ) = πΆ β² implies that π‘ must in fact be in πΆ β², and must therefore beequal to π β² = sinkπ» (πΆ β²). We thus see that any maximal-clique-sink vertex π β² of π» is also a maximal-clique-sinkvertex of π· . Item 2 of the lemma then follows by noting that sinkπ· (πΆ) is the only maximal-clique-sink vertex ofπ· not contained in π \πΆ . οΏ½
We are now ready to prove Lemma 3.4 of the main paper.
Proof of Lemma 3.4 of the main paper. We prove this claim by induction on the number of nodes in π· . The claimof the lemma is trivially true when π· has only one node. Now, x π > 1, and assume the induction hypothesisthat every DAG without v-structures which has at most π β 1 nodes admits a topological ordering that satisesthe clique block property P1 of Denition 3.2. We will complete the induction by showing that if π· is a DAGwithout v-structures which has π nodes, then π· also admits a topological ordering that satises the clique blockproperty P1 of Denition 3.2.
Let the maximal clique πΆ of π· be as guaranteed by Lemma A.4 above. If all the nodes of π· are contained in πΆ ,then the total ordering on the vertices of the clique πΆ in π· trivially satises the clique block property. Therefore,we assume henceforth that π \ πΆ is non-empty. Thus, the induced DAG π» Β·Β·= π· [π \ πΆ] is a DAG on at mostπ β 1 nodes. Let π β² be the set of maximal-clique-sink nodes of π» , and let π be the set of maximal-clique-sinknodes of π· . By the induction hypothesis, π» has a topological ordering π which satises the clique block property.Equivalently, by Observation A.2, π is an π β²-clique block ordering of π» .
Consider now the ordering π of π· obtained by listing rst the vertices of the clique πΆ in the total orderimposed on them by the DAG π· , followed by the vertices of π \ πΆ in the order specied by π . By item 1 ofLemma A.4, there is no directed edge in π· from a vertex in π \πΆ to a vertex in πΆ , so we get that π is in fact atopological ordering of π· .
Dene π = π β² βͺ sinkπ· (πΆ). We now observe that π is a π -clique block ordering of π· , with πΏπ1 (π) = πΆ andπΏππ+1 (π) = πΏπ
β²π (π), for 1 β€ π β€ |π β² |. By item 2 of Lemma A.4, we have π β π . Thus, by Observation A.3, π is also
an π-clique block ordering of π· , and therefore (by Observation A.2) satises the clique block property P1 ofDenition 3.2. οΏ½
15
B Other Omitted Proofs
B.1 Proof of Observation 3.1Proof of Observation 3.1 of main paper. Let πΆ be a maximal clique of skeleton (π·). Since the induced subgraphπ· [πΆ] is a DAG, there is at least one node π in π· [πΆ] with out-degree 0. Thus, for all π£ β πΆ, π£ β π we have, π£ β π ,which implies that πΆ \ {π } β paπ· (π ). Now, paπ· (π ) βͺ {π } must be a clique as π· does not contain v-structures.Thus, we must indeed have paπ· (π ) = πΆ \ {π } since πΆ is maximal. We thus see that there is a unique π β πΆ suchthat πΆ = paπ· (π ) βͺ {π }.
Now, suppose, if possible that there exist distinct maximal cliques πΆ1 and πΆ2 of skeleton (π·) such thatsinkπ· (πΆ1) = sinkπ· (πΆ2) = π . SinceπΆ1 andπΆ2 are distinct maximal cliques, there must exist π β πΆ1 \πΆ2, π β πΆ2 \πΆ1such that π is not adjacent to π. But then, since we have π β paπ· (π ) and π β paπ· (π ), we would have a v-structureπ β π β π, which is a contradiction to the hypothesis that π· has no v-structures. οΏ½
B.2 Proof of Proposition 3.5Proof of Proposition 3.5 from the main paper. We use the same notation as in the proof of Theorem 3.3.
1. Since π¦ β ππ , there exists π’ β πΆπ such that π’ is not adjacent to π¦ in π· . But then, if π§ β ππ , we get thev-structure π’ β π§ β π¦, which is a contradiction to π· not having any v-structures. This proves item 1 ofthe proposition.
2. Consider the clique πΆ Β·Β·= {π₯} βͺ paπ· (π₯). There exists a maximal clique πΆ β² in skeleton (π·) such thatπΆ β² ) πΆ , since π₯ is not a maximal-clique-sink node. Set π¦ Β·Β·= sinkπ· (πΆ β²). Since π₯ is not a maximal-clique-sink node in π· , we thus have π₯ β πΆ β paπ· (π¦), so that π¦ is a child of π₯ . We also have π¦ β ππ sinceπΆπ β {π₯} βͺ paπ· (π₯) = πΆ β paπ· (π¦), where the rst inclusion comes from the assumption that π₯ β {π} βͺ ππ .The fact that ππ is non-empty follows by applying the item with π₯ = π, and noticing that, by construction,π is not a maximal-clique-sink vertex in π· . This follows since π β πΏπ (π) has a child (namely, π) in πΏπ (π),while by the denition of the πΏπ , only the last vertex in πΏπ (π) is a maximal-clique-sink node of π· . Thisproves item 2 of the proposition.
3. Since π₯ is a maximal-clique-sink node inπ» = π· [ππ],πΆ Β·Β·= {π₯}βͺpaπ» (π₯) is a maximal clique in skeleton (π» ),and thereby a clique in skeleton (π·). However, note thatπΆ β² Β·Β·= πΆ βͺπΆπ is also a clique in skeleton (π·), sinceboth πΆ and πΆπ are cliques in skeleton (π·), and as πΆ β ππ , every vertex in πΆ is adjacent to every vertex inπΆπ . Thus, there exists a maximal clique πΆ β²β² in skeleton (π·) such that πΆ β² β πΆ β²β².Consider π¦ Β·Β·= sinkπ· (πΆ β²β²). Suppose, if possible, that π₯ β π¦. Then we must have π₯ β paπ· (π¦) (sinceπ₯ β πΆ β²β²), and also that πΆπ β paπ· (π¦) (as πΆπ β πΆ β²β²). Thus, we must have π¦ β ππ . But then, we get that{π¦} βͺπΆ β πΆ β²β² β© ππ , which contradicts the assumption that πΆ is a maximal clique in π» . Thus, we must haveπ₯ = π¦, so that π₯ is a maximal-clique-sink node in π· . This proves item 3 of the proposition. οΏ½
B.3 Proof of Theorem 3.7For use in this subsection and the next, we recall the characterization of I-essential graphs (Theorem 2.1) fromthe main paper. Recall that in the main paper, this characterization was only used in the setting of DAGs withoutv-structures, in the proof of Theorem 3.6. Here, we will need to use it in the setting of general graphs. (Figure 1in the statement of the theorem can be found in the main paper). Recall also that we always assume that everyintervention set contains the empty intervention, but the empty intervention is not counted in the size of anintervention set.
Theorem 2.1 (Characterization of I-essential graphs, Denition 14 and Theorem 18 of Hauser andBΓΌhlmann (2012)). Let π· be a DAG and I an intervention set. A graph π» is an I-essential graph of π· if and onlyif π» has the same skeleton as π· , all directed edges of π» are directed in the same direction as in π· , all v-structures ofπ· are directed in π» , and
1. π» is a chain graph with chordal chain components.
2. For any three vertices π, π, π of π» , the subgraph of π» induced by π, π and π is not π β π β π .
16
3. If π β π in π· (so that π, π are adjacent in π» ) and there is an intervention π½ β I such that |π½ β© {π, π}| = 1,then π β π is directed in π» .
4. Every directed edge π β π in π» is strongly I-protected. An edge π β π in π» is said to be strongly I-protectedif either (a) there is an intervention π½ β I such that |π½ β© {π, π}| = 1, or (b) at least one of the four graphs inFigure 1 appears as an induced subgraph ofπ» , and π β π appears in that induced subgraph in the congurationindicated in the gure.
Remark B.1. Strictly speaking, Theorem 18 of Hauser and BΓΌhlmann (2012) only identies the class of all I-essentialgraphs. However, it is well known, and follows easily from their results that π» satises all the four items in thestatement of Theorem 2.1 along with the additional conditions in the theorem (i.e., π» has the same skeleton as π· , alldirected edges of π» are directed in the same direction as in π· , and all v-structures of π· are directed in π» ), if and onlyif π» = EI (π·).
For completeness, we provide a proof of the above folklore remark in Section E. Here, we proceed to thefollowing easy and folklore corollary of the characterization Theorem 2.1 that has a proof similar to the proof ofLemma 1 of Hauser and BΓΌhlmann (2014). For completeness, we provide this proof in Section F.
Corollary B.2. Let π· be an arbitrary DAG, and let I be any intervention set containing the empty set. Let πΆπΆbe the set of chordal chain components of the essential graph E(π·) = E{β } (π·) of π· . For each π β πΆπΆ , deneIπ Β·Β·= {πΌ β© π | πΌ β I} to be the projection of the intervention set I to π . Consider vertices π and π that are adjacent inπ· . Then, the edge between π and π is directed in the I-essential graph EI (π·) if and only if one of the followingconditions are true:
1. π and π are elements of distinct chain components of the observational essential graph E(π·) (so that the edgebetween π and π is already directed in E(π·)).
2. π and π are in the same chain components π β πΆπΆ of E(π·) and the edge between π and π is directed in theIπ -essential graph EIπ (π· [π]) of the induced DAG π· [π].
In particular, EI (π·) = π· if and only if EIπ (π· [π]) = π· [π] for every π β πΆπΆ .
We now use the corollary to prove Theorem 3.7.
Proof of Theorem 3.7 from the main paper. Corollary B.2 says that an intervention set I learns π· starting withE(π·) if and only if EIπ (π· [π]) = π· [π] for every π β πΆπΆ . If I is a set of atomic interventions, then for each πΌ β I,|πΌ β© π | = 0 for all but one of the π β πΆπΆ . This means that if an intervention set I of atomic interventions is suchthat EI (π·) = π· , then |I | =
βπ βπΆπΆ |IS |, and EIπ (π· [π]) = π· [π] for every π β πΆπΆ , where IS is a set of atomic
interventions dened as in Corollary B.2. Since π· [π] is a DAG without v-structures, by Theorem 3.6 we have,|IS | β₯
β|π |βπ (π)
2
βwhich implies,
|I | β₯βοΈπ βπΆπΆ
β|π | β π (π)
2
ββ₯β βοΈπ βπΆπΆ
|π | β π (π)2
β=
βπ β π2
β.
This completes the proof. οΏ½
B.4 Proof of Theorem 3.9Here we restate Theorem 3.9 and provide its proof.
Theorem B.3 (Restatement of Theorem 3.9 from the main paper). Let π· be an arbitrary DAG and let E(π·) be thechain graph with chordal chain components representing the MEC of π· . LetπΆπΆ denote the set of chain components ofE(π·), and π (π) the number of maximal cliques in the chain component π β πΆπΆ . Then, there exists a set πΌ of atomicinterventions of size at most
βπ βπΆπΆ ( |π | β π (π)) = π β π , such that πΌ fully orients E(π·) (i.e., EπΌ (π·) = π·), where π is
the number of nodes in π· , and π is the total number of maximal cliques in the chordal chain components of E(π·)(including chain components consisting of singleton vertices).
Proof. Theorem 3.8 implies that for each π β πΆπΆ there is a set Iπ of atomic interventions such that |Iπ | β€ |π | βπ (π)and EIπ (π· [π]) = π· [π]. Now, let I = βͺπ βπΆπΆIπ . EI (π·) = π· by Corollary B.2, and |I | = β
π βπΆπΆ |Iπ |, which means|I | β€ β
π βπΆπΆ (|π | β π (π)) = π β π . This shows that there is a set of atomic interventions of size at most π β π whichfully orients E(π·). οΏ½
17
B.5 Proof of Theorem 3.11Proof of Theorem 3.11. The proof is similar to that of Theorem 3.8. Let πΏπ (π), for 1 β€ π β€ π (where π is the numberof maximal-clique-sink nodes of π·), be as in P1 of Denition 3.2. For each 1 β€ π β€ π , order the nodes of πΏπ (π) asπΏπ (π) π , 1 β€ π β€ |πΏπ (π) |, according to π . Dene
πΌπ Β·Β·={πΏπ (π)2πβ1 | 1 β€ π β€ d|πΏπ (π) | /2e
}. (2)
Let πΌ be the set of atomic interventions given by πΌ Β·Β·=βπ
π=1 πΌπ . By construction, |πΌ | = βππ=1 d|πΏπ (π)/2|e. We will now
complete the proof by showing that EπΌ (π·) = π· . To do this, we will show that any edge π β π in π· is directed inEπΌ (π·).
Consider rst the case when π and π are in the same πΏπ (π). If they are also consecutive in π , then, byconstruction, πΌπ performs an atomic intervention on one of π and π, so that the edge π β π would be directedin EπΌ (π·) (by item 3 of Theorem 2.1). Now, suppose that π, π are in the same πΏπ (π), but are not consecutive in π .Let π£1, π£2, . . . π£β be the vertices (all of which must be in πΏπ (π)) between π and π in π , ordered according to π . Bythe previous argument, π β π£1, π£π β π£π+1 (for 1 β€ π β€ β β 1) and π£β β π are all directed edges in EπΌ (π·) (herewe are also using the fact that πΏπ (π) is a clique in skeleton (π·)). But then, if π β π were not directed in EπΌ (π·),the undirected edge π β π along with the above directed path from π to π would form a directed cycle in EπΌ (π·),contradicting item 1 of Theorem 2.1 (which says that EπΌ (π·) must be a chain graph). Thus, we conclude that if πand π are in the same πΏπ (π), then π β π is directed in EπΌ (π·).
Now, suppose, if possible, that there is some edge π β π in π· which is not directed in EπΌ (π·). Among allsuch edges, choose the one for which the pair (π, π) is lexicographically smallest according to π . By the previousparagraph, π and π cannot belong to the same πΏπ (π), so that there must exist π > π such that π β πΏπ (π) andπ β πΏ π (π). Note that we cannot have π = πΏ π (π)1, for then an atomic intervention would be performed on π byπΌ , which would direct the edge π β π (by item 3 of Theorem 2.1). Thus, there must exist a vertex π β πΏ π (π),such that π β π is directed in EπΌ (π·) (here again we use the fact that πΏπ (π) is a clique). Now, π must be adjacentto π in skeleton (π·), for otherwise, EπΌ (π·) would contain the induced graph π β π β π, contradicting item 2 ofTheorem 2.1. Further, the edge π β π must be directed in EπΌ (π·), by the choice of (π, π) as the lexicographicallysmallest undirected edge in EπΌ (π·) (for otherwise, (π, π) would be a lexicographically smaller undirected edgecompared to (π, π), since π comes before π in π). But we then have the directed cycle π β π β π β π in EπΌ (π·),thereby contradicting item 1 of Theorem 2.1 (which says that EπΌ (π·) must be a chain graph). Thus, we concludethat EπΌ (π·) has no undirected edges, and must therefore be the same as π· . This completes the proof. οΏ½
B.6 Proof of Lemma 3.12Proof of Lemma 3.12 from the main paper. Let πΆ be a (necessarily maximal) clique of πΊ of size π . Since πΆ is amaximal clique of the chordal graphπΊ , there exists a perfect elimination ordering π ofπΊ that starts withπΆ (this isa consequence of the structure of the lexicographic breadth-rst-search algorithm used to nd perfect eliminationorderings of chordal graphs: see, e.g., the paragraph before Proposition 1 of Hauser and BΓΌhlmann (2014) andAlgorithm 6 of Hauser and BΓΌhlmann (2012) for a proof).
Now, let π· be the DAG obtained by orienting the edges ofπΊ according to π (i.e., the edge π’ β π£ β πΊ is directedas π’ β π£ in π· if and only if π (π’) < π (π£)). Suppose that sinkπ· (πΆ) = π . Note that πΆ cannot contain the nodesinkπ· (πΆ β²) for any other maximal cliqueπΆ β² since, as π starts withπΆ , this would implyπΆ β² β πΆ and would contradictthe maximality of πΆ β². Thus, there are |C| β 1 maximal-clique-sink nodes of π· other than π by Observation 3.1,and, by the above observation, they occur in π after πΆ . Thus, π β₯ |πΆ | + |C| β 1, which gives π β |C| β₯ π β 1 as|πΆ | = π . οΏ½
C Various Example GraphsIn Section 3.2 of the main paper, we proved that our universal lower bound is always at least as good as theprevious best universal lower bound given by Squires et al. (2020), and also gave examples of graph familieswhere our bound is signicantly better. We also pointed out that our lower bound and the lower bound of Squireset al. (2020) are close only in certain special circumstances. We now make give more details of these special cases.
We work with the same notation as that used in Lemma 3.12: πΊ is an undirected chordal graph, π is thenumber of nodes inπΊ , π is the size of its largest clique, and C is the set of its maximal cliques. From Lemma 3.12,
18
it follows that for our lower bound ofβπβ|C |
2
βand the lower bound of
βπ2β=βπβ12βof Squires et al. (2020) to be
equal, one of the following conditions must be true: either (i) π β |C| = π β 1, or (ii) π is even and π β |C| = π .Now consider the perfect elimination ordering π of πΊ used in the proof of Lemma 3.12, and let π· be the DAG
with skeleton πΊ constructed by orienting the edges of πΊ in accordance with π . Note that by the construction ofπ , the π vertices of a largest clique πΆ of πΊ are the rst π vertices in π . Note also that π β |C| is the number ofvertices in πΊ that are not maximal-clique-sink nodes of π· (by Observation 3.1).
Thus, it follows that condition (i) above for the two lower bounds to be equal can hold only when π· is suchthat all nodes ofπΊ outside the largest clique πΆ ofπΊ are maximal-clique-sink nodes of π· . In other words, π is aclique block ordering, in the sense of P1 of Denition 3.2 of a CBSP orderings, in which the rst clique blockπΏ1 (π) consists of the largest clique πΆ , while all other clique blocks πΏπ (π), π β₯ 2 are of size exactly 1. Similarly,condition (ii) above for the two lower bounds to be equal can hold only when π· is such that all but one of thenodes of πΊ outside the largest clique πΆ of πΊ are maximal-clique-sink nodes of πΊ .
We now give examples of two special families of chordal graphs where the above conditions for the equalityof the two lower bounds hold: Split graphs and π-trees.
Split graphs. πΊ is a split graph if its vertices can be partitioned into a clique πΆ and an independent set π . Forsuch a πΊ , one of the following possibilities must be true.
1. βπ₯ β π such that πΆ βͺ {π₯} is complete. In this case, πΆ βͺ {π₯} is a maximum clique and π is a maximumindependent set.
2. βπ₯ β πΆ such that π βͺ {π₯} is independent. In this case, π βͺ {π₯} is a maximum independent set and πΆ is amaximum clique.
3. πΆ is a maximal clique and π is a maximal independent set. In this case, πΆ must also be a maximum cliqueand π a maximum independent set.
For each of these cases, we have π β |C(πΊ) | = π β 1, which impliesβπβ|C(πΊ) |
2
β=βπ2β.
π-trees. A π-tree is formed by starting with πΎπ+1 (complete graph with π + 1 vertices) and repeatedly addingvertices in such a way that each added vertex π£ has exactly π neighbors, and such that these neighbors alongwith π£ form a clique. Thus, each added vertex creates exactly one clique of size π + 1. In particular, in a π-tree, allmaximal cliques are of size π + 1. So, in a π-treeπΊ with π = π + 1 + π nodes, we have, |C(πΊ) | = 1 + π and π = π + 1,which implies π β |C(πΊ) | = π β 1. Thus,
βπβ|C(πΊ) |
2
β=βπβ12β=βπ2β.
In contrast to the above two families, block graphs are an example family of chordal graphs where our lowerbound can be signicantly better. Construction 1 and 2 presented in Section 3.2 of the main paper are examplesof block graphs, and as discussed there, our lower bound can be Ξ(π) times the previous best universal lowerbound for block graphs, where π can be as large as Ξ(π) (where π is the number of nodes in the graph).
D Details of Experimental SetupExperiment 1 For this experiment, we generate 1000 graphs from ErdΕs-RΓ©nyi graph model πΊ (π, π): for eachof these graphs, the number of nodes π is a random integer in [5, 25] and the connection probability π is a randomvalue in [0.1, 0.3). Each of these graphs is then converted to a DAG without v-structures, using the followingprocedure. First, the edges ofπΊ are oriented according to a topological ordering π which is a random permutationof the nodes ofπΊ : this convertsπΊ into a DAG π· (possibly with v-structures). Now, the nodes of π· are processedin a reverse order according to π (i.e., nodes coming later in π are processed rst) and whenever we nd twonon-adjacent parents, π and π of the current node π’ being processed, we add an edge π β π in π· if π (π) < π (π),and π β π in π· if π (π) < π (π). Since nodes are processed in an order that is a reversal of π , this procedureensures that the resulting DAG π· has no v-structures.
In Figure 5, we provide plots from four further runs of Experiment 1. These plots use exactly the same set-upand procedure as the plot given in Figure 3 in the main paper, and dier only in the initial seed provided to theunderlying pseudo-random number generator. These seeds are used for generation of random graphs as well asfor generating π and π . To avoid any post selection bias, the seeds for these plots were formed using the decimalexpansion of π after skipping rst 1015 digits in the decimal expansion, and then taking the next 10 digits as therst seed, 10 consecutive digits after that as the second seed, and so on. Our interpretation and inferences fromthese further runs remain the same as that reported in the main paper for the run underlying Figure 3.
19
(i) Run 1 (ii) Run 2
(iii) Run 3 (iv) Run 4
Figure 5: Experiment 1 Runs with Varying Seeds
Experiment 2 For this experiment, we generate 1000 random DAGs without v-structures for each size in{10, 20, 30, 40, 50, 60}. We now describe the procedure for generating a DAGπ· (without v-structures) with π nodes,other thanπ, this procedure takes twomore inputs,πππ_πππππ’π_π ππ§π andπππ₯_πππππ’π_π ππ§π . Ifπππ_πππππ’π_π ππ§π = πandπππ₯_πππππ’π_π ππ§π = π , we try to keep the size of all cliques of π· in [π,π ]. First, we initialize a DAG π· withnodes 0, . . . , π β 1, and no edges. We take π = (0, . . . , π β 1) to be a perfect elimination ordering of π· . We thenprocess the nodes of π· in reverse order of π . When node π’ is being processed, we rst compute the number ofparents that π’ already has in π· . Now we compute lower and upper bounds β1 β₯ 0 and β2 β₯ 0 on the number ofparents that could be added to the set of parents of π’ while still keeping the total number of parents below π β 1,and at least π β 1. (Note that β1, β2 β€
οΏ½οΏ½{0, . . . , π’ β 1} \ paπ· (π’)οΏ½οΏ½, since the latter is the number of currently availablevertices that could be added to the parent set of π’). We now choose an integer β uniformly at random from [β1, β2]:this will be the number of new parents to be added to the set of parents of π’. Note that it may happen that β = 0,for example when π’ already has π β 1 or more parents, so that β2 = 0. Next, we sample a set π of β nodes (withoutreplacement) from ({0, . . . , π’ β 1} \ paπ· (π’)), and add the edges π§ β π’ to π· for each π§ β π . Further, for any twonon-adjacent parents of π’, we add an edge π β π to π· if π (π) < π (π), and π β π if π (π) < π (π). This makessure that there are no v-structures in π· . Note that, as described in the main paper, the procedure used here onlytries to keep the maximum clique size bounded above by π , but it can overshoot and produce a graph with aclique of size larger than π as well. In our experiments, we takeπππ_πππππ’π_π ππ§π = 2,πππ₯_πππππ’π_π ππ§π = 4.
In Figure 6, we provide plots from four further runs of Experiment 2. These plots use exactly the same set-upand procedure as the plot given in Figure 4 in the main paper, and dier only in the initial seed provided to theunderlying pseudo-random number generator. Again, to avoid post-selection bias, we use seeds given by theprocedure given for Experiment 1 above. Our interpretation and inferences from these further runs remain thesame as that reported in the main paper for the run underlying Figure 4.
20
(i) Run 1 (ii) Run 2
(iii) Run 3 (iv) Run 4
Figure 6: Experiment 2 Runs with Varying Seeds
E Proof of Remark B.1In this section, we supply the proof of Remark B.1 for completeness.
Proof of Remark B.1. By Theorem 10(iv) of Hauser and BΓΌhlmann (2012), it follows that EI (π·) must have thesame skeleton and the same v-structures as π· , and must also have all its directed edges directed in the samedirection as in π· . This proves that EI (π·) satises all the conditions of Theorem 2.1. To complete the proof, wenow show that it is the only graph showing all the conditions of the theorem.
For if not, then let let πΊ and π» be two dierent graphs satisfying all the conditions of Theorem 2.1. Thus, πΊand π» have the same skeleton as π· , all their directed edges are in the same direction as in π· , and further, allv-structures of π· are directed in both πΊ and π» . If πΊ β π» , the set πΈ β² of edges that are directed in πΊ but not inπ» is therefore non-empty (possibly after interchanging the labelsπΊ and π» ). Fix a topological ordering π of π· ,and let π β π β πΈ β² be such that π (π) β€ π (π β²) for all πβ² β π β² β πΈ β². Since π β π in πΊ , π β π must be stronglyI-protected in πΊ . Now, there cannot exist π½ β I such that |π½ β© {π, π} | = 1, since in that case π β π would bedirected in π» as well (by item 3 of Theorem 2.1). Thus, at least one of the four graphs in Figure 1 must appear asan induced subgraph of πΊ , with π β π appearing in that induced subgraph in the coguration indicated in thegure. If subgraph (i) appears as an induced subgraph of πΊ , then we must have π β π in π» since π (π) < π (π),but this means that π β π β π would be induced subgraph of π» , contradicting item 2 of Theorem 2.1. Similarly,π β π cannot be in the conguration indicated in subgraph (ii), since any v-structure in πΊ is directed in π» , sothat π β π would be directed in π» as well. If subgraph (iii) appears as an induced subgraph of πΊ , π β π mustbe directed in π» , as π (π) < π (π), but this would mean that π» contains a directed cycle π, π, π, π (since π β π is
21
undirected in π» ), and this contradicts that π» is a chain graph.3 If subgraph (iv) appears as an induced subgraphof πΊ , then π1 β π β π2 appears in π» as well since any v-structure of πΊ must also be directed in π» . Further, atleast one of the following four congurations must appear in π» : (a) π β π1 (b) π β π2 (c) π β π1 (d) π β π2 (for ifnot, then π1 β π β π2 would be a v-structure in π» that is not directed in πΊ , contradicting that all v-structures ofπ· are directed in both πΊ and π» ). However, if any of the four conguration appears in π» , we get a directed cyclein π» (since π β π is undirected in π» ), which contradicts the fact that π» is a chain graph.
We conclude therefore that πΈ β² must in fact be empty and hence πΊ = π» . Thus, given a DAG π· , the uniquegraph satisfying all conditions of Theorem 2.1 is the I-essential graph EI (π·) of π· . οΏ½
F Proof of Corollary B.2In this section, we supply the proof of the folklore Corollary B.2, for completeness.
Proof of Corollary B.2. Let π» be the graph with the same skeleton as π· in which exactly the edges satisfying oneof the two conditions of the corollary are directed. We prove that π» satises all the conditions of Theorem 2.1,and must therefore be the same as EI (π·) (see also Remark B.1). This will complete the proof of the corollary.
Recall that by construction, any edge of π» is directed if and only if
1. the endpoints of the edge are in dierent chain components of E(π·), so that it is already directed in E(π·),or
2. the endpoints of the edge lie in the same chain component π of E(π·), and the edge is directed in EIπ (π·π ).
In particular, item 1 implies that any edge that is directed in E(π·) is also directed in π» (since all directed edges ofa chain graph have their endpoints in dierent chain components).
We now verify that π» satises all the conditions of Theorem 2.1. By construction, π» has the same skeleton asπ· , and all its directed edges are directed in the same direction as π· . Further, all the v-structures of π· are directedin π» , since these are directed in E(π·).
Any directed cycle πΆ in π» would be a directed cycle either in E(π·) (in case πΆ includes vertices from at leasttwo dierent chain components of E(π·)), or in EIπ (π·π ) for some chain component π of E(π·) (in case πΆ iscontained within a single chain component π of E(π·)). Since both E(π·) and EIπ (π·) are chain graphs (fromTheorem 2.1), they do not have any directed cycles. It therefore follows that π» cannot have a directed cycle either,and hence is a chain graph. Further, the chain components of π» are induced subgraphs of the chain componentsof E(π·). Since the chain components of E(π·) are chordal (again from Theorem 2.1), it follows that the chaincomponents of π» are also chordal. Thus, π» satises item 1 of Theorem 2.1.
Suppose now that, if possible, π» has an induced subgraph of the form π β π β π . Thus, the edge π β π mustbe undirected in E(π·) as well, so that π and π are in the same chain component π of E(π·). If π is also in π , thenπ β π β π would be an induced sub-graph of the interventional essential graph EIπ (π·π ), which would contradictitem 2 of Theorem 2.1. Similarly, if π is not in π , then π β π would be directed in E(π·), so that π β π β π wouldbe an induced sub-graph of the essential graph E(π·) = E{β } (π·), again contradicting item 2 of Theorem 2.1.We conclude that an induced subgraph of the form π β π β π cannot occur in π» . Thus, π» satises item 2 ofTheorem 2.1.
To verify item 3, consider any two adjacent vertices π and π in π» such that |πΌ β© {π, π}| = 1 for some πΌ β I.If π and π are in dierent chain components of E(π·), then the edge between them is directed in E(π·) andhence also in π» . On the other hand, if π and π are in the same chain component π of E(π·), then we have| (πΌ β© π) β© {π, π}| = |πΌ β© {π, π}| = 1 for πΌ β© π β Iπ , so that the edge between π and π is directed in EIπ (π·π ) (byitem 3 of Theorem 2.1) and hence also in π» . It thus follows that π» satises item 3 of Theorem 2.1.
Finally, we show that any directed edge in π» is I-strongly protected. Consider rst a directed edge π β π inπ» where π and π belong to the same chain component π of E(π·). Then, since π β π is directed also in EIπ (π·π ),it must be Iπ -strongly protected in EIπ (π·π ). It follows directly from the denition of interventional strongprotection and the construction ofπ» then that π β π is I-strongly protected inπ» (since any of the congurationsof Figure 1, if present as an induced subgraph of EIπ (π·π ), is also present as an induced subgraph in π» ).
Consider now a directed edge π β π in π» when π and π are in dierent chain components of E(π·). Thenπ β π must be directed, and hence also {β }-strongly protected, in E(π·). Now, if π β π appears as part of an
3Recall that a directed cycle in a general graph is a cycle in which all directed edges point in the same direction, and in which at least oneedge is directed. The formal denition is given in Section 2.
22
induced subgraph of E(π·) of the forms (i), (ii) or (iii) of Figure 1, then the same congurarion also appears asan induced subgraph of π» (since all directed edges of E(π·) are directed in π» ), so that π β π is also I-stronglyprotected in π» . Suppose then that π β π appears as part of an induced subgraph of the form (iv) of Figure 1.Then, the vertices π, π1 and π2 appearing in the conguration must be in the same chain component π of E(π·)(since they are in a connected component of undirected edges). It follows that the congurations π1 β π β π2 andπ2 β π β π1 cannot appear in π» . For, if they did, then they would also appear in the Iπ essential graph EIπ (π·π ),thereby contradicting item 2 of Theorem 2.1 (when applied to the interventional essential graph EIπ (π·π )). Theconguration π1 β π β π2 also cannot occur in π» , since otherwise, the v-structure π1 β π β π2 of π· could nothave remained undirected in E(π·). It follows that at least one of the following three congurations must appearin π» : π β π1, π β π2 or π1 β π β π2. In the last case, π β π is I-strongly protected in π» as conguration (iv)of Figure 1 appears as an induced subgraph of π» (exactly as it does in E(π·))). In the rst two cases, π β π isI-strongly protected in π» as congurations (iii) of Figure 1 appears as an induced subgraph of π» (with the roleof the vertex π in that conguration played by either π1 or π2, as the edges π1 β π and π2 β π are both directed inπ» , since they are directed in E(π·)). Thus, we see that every directed edge in π» is I-strongly protected in π» , andhence π» satises item 4 of Theorem 2.1 as well.
It follows from Theorem 2.1 therefore that π» = EI (π·). As discussed at the beginning of the proof, thiscompletes the proof of the corollary. οΏ½
23