17
1 Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi, Perry Groot, Marianne Heins, Hans Knoop, Tom Heskes, The OPTIMISTIC consortium * Abstract—Causal modeling has long been an attractive topic for many researchers and in recent decades there has seen a surge in theoretical development and discovery algorithms. Generally discovery algorithms can be divided into two approaches: constraint-based and score-based. The constraint-based approach is able to detect common causes of the observed variables but the use of independence tests makes it less reliable. The score-based approach produces a result that is easier to interpret as it also measures the reliability of the inferred causal relationships, but it is unable to detect common confounders of the observed variables. A drawback of both score-based and constrained-based approaches is the inherent instability in structure estimation. With finite samples small changes in the data can lead to completely different optimal structures. The present work introduces a new hypothesis-free score-based causal discovery algorithm, called stable specification search, that is robust for finite samples based on recent advances in stability selection using subsampling and selection algorithms. Structure search is performed over Structural Equation Models. Our approach uses exploratory search but allows incorporation of prior background knowledge. We validated our approach on one simulated data set, which we compare to the known ground truth, and two real-world data sets for Chronic Fatigue Syndrome and Attention Deficit Hyperactivity Disorder, which we compare to earlier medical studies. The results on the simulated data set show significant improvement over alternative approaches and the results on the real-word data sets show consistency with the hypothesis driven models constructed by medical experts. Index Terms—Causal modeling, Structural equation model, Stability selection, Multi-objective evolutionary algorithm, NSGA-II. 1 I NTRODUCTION C AUSAL modeling has been an attractive topic for many researchers for decades. Especially since the 1990s there has been an enormous increase in theoretical develop- ment, partly because of advances in graphi- cal modeling [2]. This has led to a variety of causal discovery algorithms in the literature. In general, causal discovery algorithms can R. Rahmadi is with the Department of Informatics, Universitas Islam Indonesia and Institute for Computing and Information Sciences, Radboud University Nijmegen, the Netherlands. E-mail: [email protected] P. Groot and T. Heskes are with Institute for Computing and Information Sciences, Radboud University Nijmegen, the Nether- lands. M. Heins and H. Knoop are with Expert Centre for Chronic Fatigue, Radboud University Medical Centre, the Netherlands. * The members of OPTIMISTIC consortium are described in [1]. be divided into two approaches: constraint- based and score-based. Constraint-based ap- proaches work with conditional independence tests. First, they construct a skeleton graph starting with the complete graph and excluding edges between variables that are conditionally independent. Second, edges are oriented to ar- rive at a causal graph. Examples of constraint- based approaches are the IC algorithm [3], PC-FCI [4], and TC [5]. Constraint-based ap- proaches do not have to rely on the causal sufficiency assumption, and then can detect common causes of the observed variables [4]. A disadvantage of this approach is the use of independence tests on a large number of conditioning variables, making it less reliable [6]. Score-based approaches assign scores to particular graph structures based on the data fit and the complexity of the graph. Different scor- arXiv:1506.05600v3 [stat.ML] 14 Jul 2016

Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

  • Upload
    others

  • View
    13

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

1

Causality on Cross-Sectional Data:Stable Specification Search in Constrained

Structural Equation ModelingRidho Rahmadi, Perry Groot, Marianne Heins,

Hans Knoop, Tom Heskes, The OPTIMISTIC consortium∗

Abstract—Causal modeling has long been an attractive topic for many researchers and in recent decades there hasseen a surge in theoretical development and discovery algorithms. Generally discovery algorithms can be divided intotwo approaches: constraint-based and score-based. The constraint-based approach is able to detect common causesof the observed variables but the use of independence tests makes it less reliable. The score-based approach producesa result that is easier to interpret as it also measures the reliability of the inferred causal relationships, but it is unableto detect common confounders of the observed variables. A drawback of both score-based and constrained-basedapproaches is the inherent instability in structure estimation. With finite samples small changes in the data can leadto completely different optimal structures. The present work introduces a new hypothesis-free score-based causaldiscovery algorithm, called stable specification search, that is robust for finite samples based on recent advances instability selection using subsampling and selection algorithms. Structure search is performed over Structural EquationModels. Our approach uses exploratory search but allows incorporation of prior background knowledge. We validatedour approach on one simulated data set, which we compare to the known ground truth, and two real-world data sets forChronic Fatigue Syndrome and Attention Deficit Hyperactivity Disorder, which we compare to earlier medical studies.The results on the simulated data set show significant improvement over alternative approaches and the results on thereal-word data sets show consistency with the hypothesis driven models constructed by medical experts.

Index Terms—Causal modeling, Structural equation model, Stability selection, Multi-objective evolutionary algorithm,NSGA-II.

F

1 INTRODUCTION

CAUSAL modeling has been an attractivetopic for many researchers for decades.

Especially since the 1990s there has beenan enormous increase in theoretical develop-ment, partly because of advances in graphi-cal modeling [2]. This has led to a variety ofcausal discovery algorithms in the literature.In general, causal discovery algorithms can

• R. Rahmadi is with the Department of Informatics, UniversitasIslam Indonesia and Institute for Computing and InformationSciences, Radboud University Nijmegen, the Netherlands. E-mail:[email protected]

• P. Groot and T. Heskes are with Institute for Computing andInformation Sciences, Radboud University Nijmegen, the Nether-lands.

• M. Heins and H. Knoop are with Expert Centre for ChronicFatigue, Radboud University Medical Centre, the Netherlands.

• ∗The members of OPTIMISTIC consortium are described in [1].

be divided into two approaches: constraint-based and score-based. Constraint-based ap-proaches work with conditional independencetests. First, they construct a skeleton graphstarting with the complete graph and excludingedges between variables that are conditionallyindependent. Second, edges are oriented to ar-rive at a causal graph. Examples of constraint-based approaches are the IC algorithm [3],PC-FCI [4], and TC [5]. Constraint-based ap-proaches do not have to rely on the causalsufficiency assumption, and then can detectcommon causes of the observed variables [4].A disadvantage of this approach is the useof independence tests on a large number ofconditioning variables, making it less reliable[6]. Score-based approaches assign scores toparticular graph structures based on the data fitand the complexity of the graph. Different scor-

arX

iv:1

506.

0560

0v3

[st

at.M

L]

14

Jul 2

016

Page 2: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

2

ing metrics that are often used are the Bayesianscore [7] and the BIC score [8]. An example of ascore-based method is greedy equivalence search(GES) [9]. The goal of the score-based approachis to find the graph structure with the highestscore. An advantage of this approach is that itmeasures the reliability of the inferred causalrelationships, which makes the result easy tointerpret [10]. Score-based approaches typicallydo make the causal sufficiency assumption, andthen cannot detect common confounders ofthe observed variables. Moreover, the involvedoptimization problem is usually NP-hard, sothat different search heuristics are often used.The approach advocated in this paper is anexample of a score-based approach.

Furthermore, in causal modeling based onobservational data, the causal models are unde-termined unless a preference for parsimoniousmodels over more complex models is made[6]. In score-based approaches, such simplic-ity assumptions are typically implemented byadding a penalty for model complexity [9].Constraint-based approaches often make theimplicit assumption of so-called causal faithful-ness [6], which states that there are no condi-tional independencies that hold in the densityover a set of variables V , except those thatare entailed by the causal structure. However,in practice faithfulness can be violated andbetter constrained-based approaches have beendeveloped to handle this, such as CPC [11] andACPC [12].

A drawback of both score-based andconstrained-based approaches, however,is the inherent instability in structureestimation. With finite samples small changesin the data can lead to completely differentoptimal structures. Outcomes of borderlineindependence tests can be incorrect and canlead to multiple errors when propagated bythe discovery algorithm [6].

The present work introduces a new score-based causal discovery algorithm, called stablespecification search, that is robust for finite sam-ples based on advances in stability selectionusing subsampling and selection algorithms.Structure search is performed over StructuralEquation Models (SEM), which is the mostwidely used language for causal discovery in

various scientific disciplines. The method usesexploratory search, but allows incorporation ofprior background knowledge. In order to showthat our method can handle various kinds ofdata (continuous, discrete, and a combinationof both) we evaluated our method on simu-lated and real-world data. The simulated datais used to compare our method with someadvanced constrained-based approaches (PC-stable [13], CPC) and a score-based approach(GES). Specifically, we compare the robustnessof each method in computing causal structure.The real-world data sets, about Chronic FatigueSyndrome and Attention Deficit Hyperactiv-ity Disorder, are used to compare our resultswith some previous studies. The results showthat our exploratory, hypothesis-free approachgives significant improvement over alternativeapproaches, and is able to obtain structure es-timates that are consistent with the hypothesisdriven models constructed by medical expertsbased on medical data and years of experience.

The rest of this paper is structured as follows.Section 2 describes all the background materialobtained from the existing literature. Section 3describes our robust score-based approach forcausal discovery. Section 4 presents experi-mental result on one simulated and two real-world data sets. Section 5 gives conclusionsand suggestions for future work.

2 BACKGROUND

2.1 Directed Acyclic Graph

We first describe some graphical notation andterminology used in the remaining sections. Agraph is a pair (V,E) with V a set of nodesand E a set of edges. A directed graph has alledges in E directed (arc); a single arrowheadon every edge, e.g., A→ B. Directed cyclesrepresent feedback or reciprocal relationships,e.g., A → B → A. A graph with no directedcycles is called acyclic. A graph which is bothdirected and acyclic is called a Directed AcyclicGraph (DAG) [2]. Figure 1 depicts a DAG offour variables. The skeleton of a DAG is theundirected graph that results from removingthe directionality of every edge. A v-structure ina DAG G is an ordered triple (x, y, z) such that

Page 3: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

3

G contains the directed edges x → y, z → y,and x and z are not adjacent in G [14].

2.2 Causal Modeling in SEMIn this study, we focus on causal models withno reciprocal or feedback relationships, andno latent variables. Generally, there are twocommon ways of representing a model in SEM:by stating all relations in the set as equations,which is called a causal model, or by drawingthem as a causal diagram (graph). The generalform of the equations is

xi = fi(pai, εi), i = 1, . . . , n. (1)

where pai denotes the parents which representthe set of variables considered to be directcauses of Xi and εi represents errors on accountof omitted factors that are assumed to be mu-tually independent [2].

2.3 Specification Search in SEMTypically, a SEM is used as follows: 1) set ahypothesis as the prior model, 2) fit the modelto the data, 3) evaluate the model, and 4) mod-ify the model to improve the parsimony andscore [15]. The last step is called specificationsearch [16], [17]. This typical model refinementapproach is hypothesis-driven. It works byadding or deleting some arcs between variablesfrom the initial model. Typically only a fewmodels are evaluated, making it difficult toderive causal relationships.

An alternative approach is exploratorysearch in which no prior hypothesis is speci-fied. Typical approaches in the literature for ad-dressing the exponential search space includetabu search [18], genetic algorithms [19], [20],ant colony optimization [21], and others [22],[23], [24].

A B

C

D

Fig. 1: A DAG of four variables.

2.4 Multi-objective OptimizationFollowing the principle of Occam’s razor, weshould prefer models that are simple and fit thedata well. These two objectives, however, areoften conflicting as a well-fit model is likely tobe a complex model. In this paper, we proposeto make use of multi-objective optimization toexplicitly optimize both objectives.

In multi-objective optimization, optimal so-lutions are defined in terms of domination. Amodel x1 is said to dominate model x2, if thefollowing conditions are satisfied [25]:

x1 � x2 iff

{∀i ∈ {1, . . . ,M} fi(x1) ≤ fi(x2)

∃j ∈ {1, . . . ,M} fj(x1) < fj(x2)

(2)The first condition states that the model x1 isno worse than x2 in all objectives fi. The secondcondition states that the model x1 is strictlybetter than x2 in at least one objective. By usingthis concept, given the population of modelsP , we can partition P into n sets called frontsF1, . . . , Fn, such that Fk dominates Fl where1 ≤ k < l ≤ n and the models within thesame front do not dominate each other. Theso-called Pareto Front or non-dominated set F1

includes models that are not dominated by anymember of P . Essentially, using multi-objectiveoptimization we efficiently find the best fittingmodels over a whole range of model com-plexities using a single coherent optimizationapproach. Figure 2 provides a sketch.

𝑓1

𝑓2

𝐹1𝐹2 𝐹3

𝐹4

Fig. 2: Example of a population P partitionedinto fronts F1, . . . , Fn when minimizing objec-tives f1 and f2. F1 is the Pareto front notdominated by any member of P .

Page 4: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

4

2.4.1 NSGA-IINon-dominated Sorting Genetic Algorithm IIor NSGA-II [26] is a well-known multi-objective evolutionary algorithm (MOEA), stillwidely applied in various fields, such as imageretrieval [27], reactive power planning [28],building design [29], and robot grippers [30].A characteristic feature is fast non-dominatedsorting which sorts models based on the con-cept of domination. With M the number ofobjectives and N the size of population, thetime complexity has order O (MN2), which isbetter than a naıve approach with O (MN3).Another characteristic feature is crowding dis-tance sorting which is implemented to preservethe diversity among the solutions in the Paretofront. This feature sorts models based on thedistance metric which explains the proximityof a model to other models.

The iterative procedure of NSGA-II shownin Figure 3 is a sequence of steps started bygenerating a population of solutions P of sizeN . P is then manipulated by genetic opera-tors such as selection, crossover, and mutation,forming a new population Q of size N . Pand Q are then combined into population Rwith size 2N . After that R is sorted usingfast non-dominated sorting, yielding a set offronts F . In the next iteration each front in Fis sorted using the crowding distance sortingand the first N members are used to generatea new population P . At t = 0, P is formed bycreating N random solutions sorted with fastnon-dominated sorting.

2.5 Stability SelectionStructure estimation is a notoriously difficultproblem, both because of computational as-pects (finding the optimal structure can be NPhard) and because of instability (small changesin the data can lead to completely different op-timal structures). In this section we describe themethod of [31] for robust estimation of modelstructure based on subsampling in combina-tion with selection algorithms. The method hasbeen shown to yield finite sample family wiseerror control and improved structure estimates.

Let β be a sparse p-dimensional vector whichgenerally represents, for example, the coeffi-

𝐹1

𝑄𝑡

𝑅𝑡

𝐹2

𝐹3

𝑃𝑡

𝑃𝑡+1

rejected

Fast non-dominated sorting

Crowding distancesorting

Fig. 3: Adopted from [26]. P is the currentpopulation with size N and is manipulated tomake a new population Q. Both are combined,forming R, which will be sorted using fast non-dominated yielding a set of fronts F . Everymember of front Fn ∈ F will be assigned a so-called crowding distance in order to sort Fk.The first N members of F will be selected tobe the next population P .

cient vector in linear regression or the edgesin a graph. In structure estimation the goalis to infer the set S = {k : βk 6= 0} ofnon-zero components from noisy observations.Many methods tackle this problem by mini-mizing some loss function augmented with aregularization term to avoid overfitting. Usu-ally the regularization term is parameterized byλ ∈ Λ ⊆ R+ and each λ leads to an estimatedstructure Sλ ⊆ {1, . . . , p}. The objective is todetermine λ such that Sλ is identical to S withhigh probability. To this end, [31] introducesthe concepts of selection probabilities and stabilitypaths.

Definition 1 (Selection probabilities). Let I be asubset of {1, . . . , n} of size bn/2c randomly drawnwithout replacement, K ⊆ {1, . . . , p}, and Sλ(I) bethe selected set Sλ for subsample I . The probabilityof K being in set Sλ(I) is

ΠλK = P

(K ⊆ Sλ(I)

)where the probability being is with respect to therandom subsampling and possibly the constructionof Sλ(I).

Page 5: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

5

Definition 2 (Stability path). For each variablek = 1, . . . , p the stability path is given by theselection probabilities {Πλ

k : λ ∈ Λ}.

Furthermore, in stability selection we do notselect a single element from the set of models{Sλ : λ ∈ Λ} as traditional methods do, butperturb the data many times and select struc-tures that occur in a large fraction of selectedsets. To this end, [31] introduces the concept ofstable variables.

Definition 3 (Stable variables). The set of stablevariables is defined as

Sstable = {k : maxλ∈Λ

Πλk ≥ πthr}

where πthr is a cutoff with 0 < πthr < 1.

Variables with a high selection probabilityare kept whereas those with low selection prob-abilities are disregarded. The threshold πthr isa tuning parameter but its influence is smalland sensible values (e.g., πthr ∈ (0.6, 0.9)) tendto give similar results.

2.6 Model EquivalenceThere is one further subtlety that makes ourapproach for finding stable models (or sub-models) slightly more complicated than that in[31]. If we find a particular model, we have toaccount for the fact that there may be differentmodels that are observationally indistinguish-able. Causal models represented by DAGs havetheir corresponding model equivalent classes,called Completed Partially Directed Acyclic Graph(CPDAG). This means that every probabilitydistribution derived from a model in a partic-ular CPDAG, can also be derived by modelsbelonging to the same CPDAG. In SEMs, thesemodels are called covariance equivalent [2].

The characterization of equivalent structuresis given by the following theorem [32].

Theorem 1. (Verma and Pearl, 1990) Two DAGsare equivalent if and only if they have the sameskeletons and the same v-structures.

Furthermore, a directed edge x → y is com-pelled in G if for every DAG G ′ equivalent to G,x→ y exists in G. For any edge e in G, if e is notcompelled in G, then e is reversible. A CPDAG

can be represented by a directed edge (arc) forevery compelled edge and an undirected edgefor every reversible edge [14].

Converting a model into a CPDAG allowsone to observe the relations that hold amongthe variables. Arcs in a CPDAG indicate acause-effect relation among variables since thesame arc occurs in all members of the CPDAG.Undirected edges A− B in a CPDAG indicatethat some members of the CPDAG contain anarc A→ B whereas other members contain anarc B → A.

3 PROPOSED METHOD

3.1 The General Idea

Our proposed method can be divided into twophases. The first phase is search and the secondphase is visualization. In the search phase SEMand NSGA-II are synergically combined forexploratory search of the model space. As por-trayed in Figure 4, the inner loop is an iterativeprocess, searching over the model space andreturns a Pareto front of models. The outer loopis an iterative process that samples a differentsubset of the data in each iteration and at theend returns a number of Pareto fronts comingfrom those subsets. Each model returned by theouter loop is transformed into a CPDAG which

VisualizationSearch

SEM NSGA-IIStability

SelectionGraph

Inner loop

Outer loop

Fig. 4: The proposed method consists of twophases: search and visualization. The searchphase is an iterative process using an outer loopand inner loop that combines SEM, NSGA-II,and stability selection, which outputs all rel-evant edges and causal paths between twovariables. The visualization phase displays therelevant relationships as a causal model.

Page 6: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

6

are then used to compute the edge stability graphand the causal path stability graph.

Definition 4. (Stability graphs) Let A and Bbe two variables and G a multiset (or bag) ofCPDAGS. Let Gc be the submultiset of G con-taining all CPDAGS with complexity c. The edgestability for A and B at complexity c is the numberof models in Gc for which there exists an edgebetween A and B (i.e., A→ B, B → A, or A−B)divided by the total number of models in Gc. Thecausal path stability for A to B at complexity c isthe number of models in Gc for which there is adirected path from A to B (of any length) dividedby the total number of models in Gc. The terms edgestability graph and causal path stability graph areused to denote the corresponding measures for allvariable pairs and all complexity levels.

On top of the stability graphs we performstability selection. In [31], stability selection isdefined in terms of a regularization parameterλ. In our approach we do not have a reg-ularization parameter and instead use modelcomplexity (defined in Section 4.1) which isone of the objectives in our multi-objective op-timization approach. We therefore define twothresholds. The first threshold is the boundaryof selection probability πsel and correspondsto πthr in [31]. For example, setting πsel = 0.6means that all causal relationships with edgestability or causal path stability (Figure 5)above this threshold are considered stable. Thesecond threshold is the boundary of complexityπbic, which is used to control overfitting andcorresponds to minimal λ in [31]. We set πbic

to the level of model complexity at which theminimum average Bayesian Information Criterion(BIC) score is found. For example, πbic = 7means that all causal relationships with anedge stability or a causal path stability lowerthan this threshold (Figure 5) are consideredparsimonious. Causal relationships that intersectwith the top-left region are considered bothstable and parsimonious and called relevant.

In the visualization phase we combine thestability graphs into a graph with nodes andedges. This is done by adding the relevantedges and orienting them using backgroundknowledge (if any, see Section 3.2) and therelevant causal paths. In addition we annotate

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(a)

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(b)

Fig. 5: Example stability graphs from an arti-ficial data set of 400 instances with six contin-uous variables, without prior knowledge. (a)Edge stability graph. (b) Causal path stabilitygraph. Each line in (a) represents an edge be-tween a pair of variables and each line in (b)represents a causal path with any length froma variable to another variable. The threshold ofselection probability, πsel, is set to 0.6 and thethreshold for model complexity, πbic, is chosento minimize the average BIC score. See themain text for more details.

each edge with the highest selection probabilityit has across different model complexities in thetop-left region of the edge stability. This visu-alization eases interpretation but the stabilitygraphs are considered to be the main outcomeof our approach.

3.2 Constrained SEMIn practise, one often has prior knowledgeabout the domain, for example, that A does

Page 7: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

7

not cause B directly, denoted by A 6→ B.The method proposed here can include suchprior knowledge, extending previous work[33], since this translates to a DAG with nodirected edge from A to B.1

Model specifications should comply withany prior knowledge when performing spec-ification search and when measuring the edgeand causal path stability. When DAGs areconverted into CPDAGs in the outer loop,a constraint A 6→ B may be violated sincearcs B → A in the DAG may be convertedinto undirected (reversible) edges A − B inthe CPDAG. In order to preserve constraintswe therefore extended the efficient DAG-TO-CPDAG algorithm of [14] which runs in timeO (|E|) given a DAG G = (V,E).

Figure 6 provides pseudocode for the con-strained DAG to CPDAG algorithm. Line 2produces a total ordering E ′ over the edgesin DAG G. Lines 3-6 impose an arc upon theedges that match the constraints. Finally, Line 7uses [14] to label the remaining edges E \E ′ inG with “compelled” or “reversible” and returnsthe constrained CPDAG G′.

A DAG without edges will always be trans-formed into a CPDAG without edges. A fullyconnected DAG without constraints will betransformed into a CPDAG with only undi-rected edges. However, if background knowl-edge is added, a fully connected DAG will betransformed into a CPDAG in which the edgescorresponding to the background knowledgeare directed. From these observations it followsthat in the edge stability graph all paths startwith a selection probability of 0 and end upin a selection probability of 1. In the causalpath stability graph when no prior knowledgehas been added all paths start with a selectionprobability of 0 and end up in a selection prob-ability of 0. However, when prior knowledgeis added some of the paths may end up in aselection probability of 1 because of the addedconstraints.

1. This still allows for directed edges from B to A or indirectrelations from A to B.

3.3 Stable Specification Search AlgorithmFigure 7 provides pseudocode for our approach(cf. Figure 4). Lines 3-18 represent the outerloop, Lines 6-16 represent the inner loops,Lines 19-21 compute stability graphs.

An inner loop (Lines 6-16) starts by forminga population P of size N , initially at ran-dom, or else from a previous population usingcrowding distance sorting (Lines 7-12). Modelsare represented with a binary vector y withyi ∈ {0, 1} denoting the existence of some arcA → B. Line 13 forms a new population Qby manipulating P using binary tournamentselection, one-point crossover, and one-bit flipmutation, which are compatible with a binaryrepresentation. The selection scheme selects Ntimes two models from P and places the bestmodel (i.e., lowest front or else smallest crowd-ing distance) in a mating pool Mpool. One-point crossover takes two models from Mpool

and swaps the data after the crossover point(the middle). One-bit flip mutation flips eachbit according to a predetermined rate. Line 14combines P and Q and sorts them using fastnon-dominated sorting. Line 15 updates thePareto front in F1.

An outer loop (Lines 3-18) randomly sam-ples a subset T from D with size b|D|/2c(Line 4), runs the inner loop I times to obtaina Pareto front (Lines 6-16), and stores it inH (Line 17). After J iterations, H contains JPareto fronts.

Lines 19-21 convert the J Pareto fronts inH from DAGs into CPDAGs using the algo-rithm in Figure 7 and then computes the edgeand causal path stability graphs. The stabilitygraphs are considered to be the main outcomeof our approach, but can also be visualized asa graph with nodes and edges.

4 EXPERIMENTAL STUDY

We implemented the stable specification searchas an R package named stablespec. Thepackage is publicly available at the Compre-hensive R Archive Network (CRAN)2, so it canbe installed directly, e.g., from R console by

2. https://cran.r-project.org/web/packages/stablespec/index.html

Page 8: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

8

1: function consDag2Cpdag(DAG G, constraint C)2: E ′ ← orderEdges(G)3: for every constraint c ∈ C do4: get e ∈ E ′ that matches c5: label e with “compelled” in the direction consistent with c6: end for7: return G′ ← labelEdges(G,E ′) . label remaining edges using [14]8: end function

Fig. 6: The constrained DAG-TO-CPDAG algorithm returns a CPDAG which is consistent withthe added prior knowledge and extends [14]. The algorithm first labels the edges that matchthe constraints with ”compelled” and then labels the remaining edges with ”reversible” or”compelled” using [14].

1: procedure stableSpecificationSearch(data set D, constraint C)2: H ← () . initialize3: for j ← 0, . . . , J − 1 do . J is number of outer loop iterations4: T ← subset of D with size b|D|/2c without replacement5: F1 ← () . initialize Pareto fronts to empty list6: for i← 0, . . . , I − 1 do . I is number of inner loop iterations7: if i = 0 then8: P ← N random DAGs consistent with C9: P ← fastNonDominatedSort(P)

10: else11: P ← crowdingDistanceSort(F) . draw the first N models12: end if13: Q← make population from P14: F ← fastNonDominatedSort(P_Q)15: F1 ← pareto front of F and F1

16: end for17: H ← H_F1 . concatenation18: end for19: G← consDag2Cpdag(H, C)20: edges ← edge stability of G21: paths ← path stability of G22: end procedure

Fig. 7: Stable specification search consists of an outer and an inner loop. The outer loop samples asubset of the data, and for every subset, the inner loop searches for the Pareto front by applyingNSGA-II. The Pareto fronts are converted into constrained CPDAGs which are then used tocompute the edge and causal path stability graph.

typing install.package("stablespec")or from RStudio by using feature to installpackage. We also included a package documen-tation as a brief tutorial of using the functions.All experiments were run on an Intel Xeon E7-4870 v2 Processor 2.3 GHz, 15 Core, 96 of 32GBLRDIMM.

4.1 Parameter Settings

For all experiments, we employed the sameset of NSGA-II parameters and stabilitythresholds. We had 100 iterations in the outerloop, and in each iteration we drew a sub-sample with size b|D|/2c. We did not do acomprehensive parameter tuning for NSGA-II,

Page 9: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

9

instead, we followed guidelines provided in[34]. The parameters were set as follows: thenumber of generations (inner loop) was 20, thesize of the population P was 100, the crossoverrate was 0.85, the mutation rate was 0.075 andwith binary tournament selection.

We score models using the chi-square χ2 andthe model complexity. The χ2 is considered theoriginal fit index in SEM and measures whetherthe model-implied covariance matrix is closeenough to the sample covariance matrix [35].

The model complexity represents how manypredicted parameters the model contains. As-suming that variances of parameters are al-ways predicted, the maximum model complex-ity with n variables is given by n(n− 1)/2.

When using multi-objective optimization weminimize both the χ2 and model complexityobjectives. These two objectives are, however,conflicting with each other. For example, min-imizing the model complexity typically meanscompromising the data fit.

4.2 Application to Simulated Data4.2.1 Data GenerationIn this experiment we generated data using theWaste Incinerator network in Figure 8, which isa model of waste emissions from an incineratorplant [36]. This model contains both discreteand continuous random variables, with B thewaste burning regimen, W the compositionaldifferences in incoming waste, C the concen-tration of CO2, F the filter state, E the filterefficiency, L the light penetrability, D the emis-sion of dust, Min the metals in waste, and Mout

the metals emission. Following [37], we treatall discrete variables as continuous. We addedprior knowledge that none of the variablesdirectly cause the filter state. We generated 10data sets containing 400 samples from this net-work using the BNT toolbox with the defaultparameter setting as described in [38].

4.2.2 Performance MeasureWe compared the stable specification searchwith GES (score-based method), PC-stable, andCPC (both constrained-based methods). Ourmethod intrinsically subdivides the data in anumber of subsets, here 50 of size 200 samples,

B F W

C E Min

L D Mout

Fig. 8: The Waste Incinerator network. Rectan-gular nodes represent discrete variables, ovalnodes represent continuous variables, and arcsrepresent direct causal relations.

and then runs the multi-objective optimizationto obtain 50 Pareto fronts (see Section 2.4).For a fair comparison, for each algorithm weconsider subsampling (e.g., [39]), giving eachmethod 50 subsets. For every subset, each al-gorithm returns a CPDAG from which we canderive the edges and causal paths.

Since the true model of the Waste Inciner-ator data is known, we can measure the per-formance of both methods by means of theReceiver Operating Characteristic (ROC) curve[40]. The True Positive Rate (TPR) and the FalsePositive Rate (FPR) are computed with respectto the CPDAG of the true model. For example,in the case of causal path stability, a true posi-tive means that a causal path with any lengthobtained through our approach or the PC al-gorithm is actually present in the CPDAG ofthe true model. By increasing the threshold πsel,we increase the TPR at the expense of the FPR.In addition, we conducted three significancetests to compare the ROC curves. The first test[41] compares the Area Under the Curve (AUC)of the ROC curves based on the theory of U-statistics. The second test [42], a modificationof [43], compares the AUC of ROC curvesthat are generated from bootstrap replicates.The third test [44] compares the actual ROCcurves by evaluating the absolute difference.The null hypothesis is that the AUC of the ROCcurves of our method and the PC algorithm areequivalent.

We repeated the above procedure 10 timeson different Waste Incinerator data sets andcomputed the ROC curves using two differentschemes: averaging and individual. In the av-eraging scheme, the ROC curves are computed

Page 10: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

10

based on the average edge and causal pathstability from different data sets. We conductedstatistical significance tests on these averageROC curves. Conversely, in the individualscheme the ROC curves are computed directlyfrom the edge and causal path stability on eachdata set. We conducted individual statisticalsignificance tests on the ROC curves for eachdata set and then used Fisher’s method, as de-scribed in [45], [46], to combine these tests intoa single test statistic. Both schemes are intendedto show empirically and comprehensively howrobust the results of each algorithm are acrosschanges in the data.

4.2.3 Discussion of Waste Incinerator ResultFigure 9 shows the ROC curves for (a) theedge stability and (b) the causal path stabilityfrom the averaging scheme. The correspondingAUCs for edge stability are 0.96 (stable specifi-cation search), 0.89 (PC-stable), 0.88 (CPC), and0.69 (GES). The AUCs for causal path stabilityare 0.98 (stable specification search), 0.85 (PC-stable), 0.88 (CPC), and 0.61 (GES).

Table 1 lists the results of the significancetests for both the averaging and individualschemes. The ROC and AUC for the edgestability are comparable with PC-stable andCPC (p-value > 0.1), but always significant(p-value < 0.01) compared with GES. TheROC and AUC for the causal path stabilitycompared with CPC are marginally significant(p-value < 0.1) using the averaging scheme,but significant using the individual scheme (p-value < 0.01); compared with PC-stable sig-nificant (p-value < 0.05) using the averagingscheme, but highly significant using the in-dividual scheme (p-value < 10−5); comparedwith GES highly significant using both schemes(p-value < 10−5). To conclude, we show thatthe stable specification search obtains at leastcomparable performance as, but often signifi-cant improvement over alternative approaches,especially in obtaining the causal relations.

4.3 Application to Real-world DataThis section describes the results of applyingour proposed method on two real-world datasets. Both of them are about particular diseases,

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC-stableCPCGESStable Specification Search

(a)

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

FPR

TP

R

PC-stableCPCGESStable Specification Search

(b)

Fig. 9: ROC curves for (a) the edge stability and(b) the causal path stability, for different valuesof πsel in the range of [0, 1]. In (a), the AUCsare 0.96 (stable specification search), 0.89 (PC-stable), 0.88 (CPC), and 0.69 (GES). In (b), TheAUCs are 0.98 (stable specification search), 0.85(PC-stable), 0.88 (CPC), and 0.61 (GES).

for which the underlying causal relationshipsare often not clear. Revealing such causal rela-tionships can lead to the development of (new)dedicated treatments and medications. Here,we consider data on Attention Deficit Hyper-activity Disorder (ADHD) and Chronic FatigueSyndrome (CFS).

4.3.1 Performance Measure

Since the true model is unknown we measurethe performance of our method using the edgestability and causal path stability graphs. Weset the thresholds to πsel = 0.6 and πbic tothe minimum average of BIC scores. The rel-

Page 11: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

11

TABLE 1: Table of p-values from comparisons between stable specification search and alternativeapproaches. For each significance test, we compared the ROC of the edge (Edge) and causal path(Causal) stability on both averaging (Ave.) and individual (Ind.) schemes.

GES PC-stable CPCSignificance test Ave. Ind. Ave. Ind. Ave. Ind.

DeLong [41] Edge 0.003 < 10−5 0.317 0.175 0.284 0.131Causal < 10−5 < 10−5 0.027 < 10−5 0.073 < 10−5

Bootstrap [42] Edge 0.003 < 10−5 0.296 0.135 0.261 0.098Causal < 10−5 < 10−5 0.022 < 10−5 0.064 < 10−5

Venkatraman [44] Edge 0.004 < 10−5 0.591 0.684 0.539 0.592Causal < 10−5 < 10−5 0.023 < 10−5 0.096 0.005

evant causal relations are those which occur inthe top-left region (see Figure 5 as example).We compare the stability graphs to studiesreported in the literature.

4.3.2 Application to CFS

In this experiment we consider a data set aboutChronic Fatigue Syndrome (CFS) of 183 subjects[47]. Originally the data comes from a longi-tudinal study with five time slices, but in thispaper, we focus only on one time slice repre-senting the subjects after the first treatment.

The data set contains six discrete variables;fatigue severity assessed with the subscalefatigue severity of the Checklist IndividualStrength (CIS), the sense of control over fa-tigue assessed with the self-efficacy scale (SES),focusing on symptoms measured with the Ill-ness Management Questionnaire, the objective ac-tivity of the patient measured using an actome-ter (oActivity), the subject’s perceived activitymeasured with the subscale activity of the CIS(pActivity), and physical functioning measuredwith subscale physical functioning of the medi-cal outcomes survey (SF36). We refer to the origi-nal paper [47], for a detailed description of thequestionnaires used and the actometer. Miss-ing values were imputed using an imputationmethod Expectation Maximization implementedin SPSS [48]. As all of the variables have largescales, e.g., in the range between 0 to 155, wetreat them as continuous variables. We addedprior knowledge that the variable fatigue doesnot cause any of the other variables directly.

The total computation time for one subsetwas around 5.5 minutes. Figure 10 shows that

eight relevant edges were found. These edgesare between pActivity and fatigue, focusing andfatigue, functioning and fatigue, control andfatigue, pActivity and focusing, pActivity andoActivity, focusing and control, and functioningand control.

Figure 10 shows that four relevant causalpaths were found. These causal pathsare: pActivity to fatigue, control to fatigue,functioning to fatigue, and focusing to fatigue.

The stability graphs can be combined into amodel as follows. First, the nodes are connectedaccording to the eight relevant edges obtained.Second, the edges are oriented according to thebackground knowledge added. The fact thatthe variable fatigue does not directly cause anyother variable results in four directed edges,which, in this case, correspond exactly to therelevant causal paths obtained. The inferredmodel is shown in Figure 11.

A (direct) causal path X → Y in Figure 11indicates that a change in variable X causesa change in variable Y . All variables exceptfor objective activity were found to be directcauses for fatigue severity, which are corrob-orated by literature studies. In [49], changesin physical activity, sense of control, and focuson symptoms measured, were shown to resultin changes in fatigue. In [50], changes in per-ceived activity, sense of control, and physicalfunctioning were shown to result in changes infatigue. In [47], an increase in sense of control,perceived activity, and self-reported physicalfunctioning, as well as a decrease in focusingon symptoms resulted in a decrease of fatigue,whereas changes in objective activity did not

Page 12: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

12

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(a)

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(b)

Fig. 10: The stability graphs for CFS togetherwith πsel and πbic, yielding four regions. Thetop-left region is the area containing the rel-evant causal relations. (a) The edge stabilitygraph showing eight relevant edges. (b) Thecausal path stability graph showing four rel-evant causal paths. See Tables 2 and 3 in Ap-pendix A for more detail.

result in any change in fatigue.

4.3.3 Application to ADHDIn this experiment we consider a data set aboutAttention Deficit Hyperactivity Disorder (ADHD)of 245 subjects with 23 variables [51]. Following[52], we excluded instances with missing val-ues and variables that either have insufficientinstances or are considered irrelevant. The re-maining data set consists of 221 instances andsix variables with gender the gender of subjects,AD the attention deficit measure, HI the assess-ment of hyperactivity/impulsivity symptoms,aggression the measure of aggressive behavior,

pActivity

control

functioning

focusing

fatigue

0.98

0.97

0.97

0.71

oActivity

0.64

0.86

0.94

0.84

Fig. 11: The inferred model of CFS by combin-ing the edge stability and causal path stabilitygraphs. Each edge has a reliability score whichis the highest selection probability in the top-left region of the edge stability graph.

medication the medication status of subjects,and handedness represents whether a subjectuses the right and/or left hand. Following [37],we treat all discrete variables as continuousvariables. We added prior knowledge that thevariable gender does not cause any of the othervariables directly.

The total computation time for one subsetwas around 4.9 minutes. Figure 12 shows thatthere are four relevant edges, namely betweengender and AD, AD and medication, AD andHI, and HI and aggression. Moreover, Figure 12shows that there are seven relevant causalpaths; gender to AD, gender to HI, gender tomedication, gender to aggression, AD to HI, ADto medication, and AD to aggression.

The stability graphs can be combined into amodel as follows. First, the nodes are connectedaccording to the four relevant edges obtained.Second, the edges are oriented according tothe background knowledge added. The factthat the variable gender does not directly causeany other variable results in one directed edgegender → AD. Third, the edges are orientedaccording to the relevant causal paths obtained.This results in two directed edges, AD → HIand AD → medication. Since there is no rele-vant edge between AD and aggression and norelevant causal path from HI to aggression wecannot orient any other edges and therefore

Page 13: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

13

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(a)

Model complexity

Sel

ectio

n pr

obab

ility

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

00.

20.

40.

60.

81

pbic

psel

(b)

Fig. 12: The stability graphs for ADHD togetherwith πsel and πbic, yielding four regions. Thetop-left region is the area containing the rel-evant causal relations. (a) The edge stabilitygraph showing four relevant edges. (b) Thecausal path stability graph showing seven rel-evant causal paths. See Tables 4 and 5 in Ap-pendix A for more detail.

cannot represent two of the relevant causalpaths in the model. We loose some informationwhen converting the stability graphs into amodel. The inferred model is shown in Fig-ure 13.

The causal relations obtained for ADHD arecorroborated by studies reported in the liter-ature. In [52], gender is shown to be a directcause for attention deficit, attention deficit isshown to be a direct cause for both hyperac-tivity, medication, and aggression, and hyper-activity and aggression are related but neithervariable is a direct cause for the other.

gender

AD medication

aggression

HI

handedness

0.74

1 0.73

0.73

Fig. 13: The inferred model of ADHD by com-bining the edge stability and causal path sta-bility graphs. Each edge has a reliability scorewhich is the highest selection probability in thetop-left region of the edge stability graph

.

5 CONCLUSION AND FUTURE WORK

In the last decades the field of causal modelinghas seen a surge in theoretical developmentand the construction of various causal discov-ery algorithms. In general, causal discoveryalgorithms can be divided into two approaches:constraint-based and score-based. A disadvan-tage, however, of current causal discovery al-gorithms is the inherent instability of structureestimation. With finite samples small changesin the data can lead to completely differentoptimal structures.

The present work introduces a newhypothesis-free score-based causal discoveryalgorithm, stable specification search, that isrobust for finite samples based on subsamplingand selection algorithms. Our approachuses exploratory search to search overStructural Equation Models and allows for theincorporation of prior background knowledge,without the need to specify the completemodel structure in advance.

The comparison conducted on the simulateddata shows that our method, the stable specifi-cation search, shows significant improvementover alternative approaches in obtaining thecausal relations. The results on both real-worlddata sets, CFS and ADHD, are consistent withprevious studies [47], [49], [50], [52]. In generalwe may conclude that our causal discoveryalgorithm is able to robustly estimate the un-derlying causal structure.

Several issues have not yet been exploredin our current approach that warrant further

Page 14: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

14

research, such as latent variables and longitu-dinal data. Taking into account the existence oflatent variables can further improve our struc-ture estimate by properly identifying depen-dencies between variables as an unmeasuredcommon cause acting on both variables. Inlongitudinal data several subjects are measuredat different time slices which provides a richerstructure that can be incorporated in the causaldiscovery algorithm. A first attempt in thisdirection can be found in [53].

Our approach can be viewed as a novel ap-plication of multi-objective optimization. Themain idea of stability selection [31], is to in-crease the robustness of structure estimationby considering a whole range of model com-plexities. In the original work, this is done byvarying a continuous regularization parame-ter. For causal discovery we have to explicitlyconsider different discrete model complexities.Furthermore, finding the optimal structure foreach model complexity is a hard optimizationproblem. By rephrasing stability selection as amulti-objective optimization problem, we canjointly run over various model complexitiesand find the corresponding optimal structuresfor each model complexity. In this paper, wehave used NSGA-II for multi-objective opti-mization, because of its popularity and avail-ability, but realize that more recent multi-objective optimization approaches [54], [55],[56], [57] may be even more efficient. This is be-yond the scope of this work and left for futureresearch. In the same spirit, one can easily com-bine freely available software packages, e.g., forscoring Structural Equation Models, bootstrapsampling, and multi-objective optimization, tobuild one’s own robust structural estimationapproach.

ACKNOWLEDGMENTSThe research leading to these results has re-ceived funding from the DGHE of Indone-sia and the European Community’s SeventhFramework Programme (FP7/2007-2013) undergrant agreement n◦ 305697.

REFERENCES[1] B. van Engelen and the OPTIMISTIC consortium, “Cog-

nitive behaviour therapy plus aerobic exercise training to

increase activity in patients with myotonic dystrophy type1 (DM1) compared to usual care (OPTIMISTIC): studyprotocol for randomised controlled trial,” Trials, vol. 16,no. 224.

[2] J. Pearl, Causality: models, reasoning and inference. Cam-bridge University Press, 2000, vol. 29.

[3] J. Pearl and T. Verma, A theory of inferred causation. Mor-gan Kaufmann San Mateo, CA, 1991.

[4] P. Spirtes, C. N. Glymour, and R. Scheines, Causation,prediction, and search. MIT press, 2000, vol. 81.

[5] J.-P. Pellet and A. Elisseeff, “Using Markov blankets forcausal structure learning,” The Journal of Machine LearningResearch, vol. 9, pp. 1295–1342, 2008.

[6] P. Spirtes, “Introduction to causal inference,” The Journalof Machine Learning Research, vol. 11, pp. 1643–1662, 2010.

[7] A. P. Dawid, “Present position and potential develop-ments: Some personal views: Statistical theory: The pre-quential approach,” Journal of the Royal Statistical Society.Series A (General), pp. 278–292, 1984.

[8] G. Schwarz, “Estimating the dimension of a model,” Theannals of statistics, vol. 6, no. 2, pp. 461–464, 1978.

[9] D. M. Chickering, “Optimal structure identification withgreedy search,” Journal of machine learning research, vol. 3,no. Nov, pp. 507–554, 2002.

[10] D. Heckerman, C. Meek, and G. Cooper, “A Bayesianapproach to causal discovery,” Computation, causation, anddiscovery, vol. 19, pp. 141–166, 1999.

[11] J. Ramsey, J. Zhang, and P. L. Spirtes, “Adjacency-faithfulness and conservative causal inference,” arXivpreprint arXiv:1206.6843, 2012.

[12] J. Lemeire, S. Meganck, F. Cartella, and T. Liu, “Conser-vative independence-based causal structure learning inabsence of adjacency faithfulness,” International Journal ofApproximate Reasoning, vol. 53, no. 9, pp. 1305–1325, 2012.

[13] D. Colombo and M. H. Maathuis, “Order-independentconstraint-based causal structure learning.” Journal of Ma-chine Learning Research, vol. 15, no. 1, pp. 3741–3782, 2014.

[14] D. M. Chickering, “Learning equivalence classes ofBayesian-network structures,” The Journal of MachineLearning Research, vol. 2, pp. 445–498, 2002.

[15] R. MacCallum, “Specification searches in covariancestructure modeling,” Psychological Bulletin, vol. 100, no. 1,pp. 107–120, 1986.

[16] E. E. Leamer, Specification searches: Ad hoc inference withnonexperimental data. Wiley New York, 1978.

[17] J. S. Long, Covariance structure models: An introduction toLISREL. Sage, 1983, no. 34.

[18] G. A. Marcoulides, Z. Drezner, and R. E. Schumacker,“Model specification searches in structural equation mod-eling using tabu search,” Structural Equation Modeling: AMultidisciplinary Journal, vol. 5, no. 4, pp. 365–376, 1998.

[19] G. A. Marcoulides and Z. Drezner, “Specification searchesin structural equation modeling with a genetic algo-rithm,” New developments and techniques in structural equa-tion modeling, pp. 247–268, 2001.

[20] H. Murohashi and H. Toyoda, “Model specification searchusing a genetic algorithm with factor reordering for a sim-ple structure factor analysis model,” Japanese PsychologicalResearch, vol. 49, no. 3, pp. 179–191, 2007.

[21] G. A. Marcoulides and Z. Drezner, “Model specifica-tion searches using ant colony optimization algorithms,”Structural Equation Modeling: A Multidisciplinary Journal,vol. 10, no. 1, pp. 154–164, 2003.

[22] J. Herting and H. Costner, “Respecification in multiple

Page 15: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

15

indicator models,” Causal Models in the Social Sciences:, pp.321–393, 1985.

[23] P. Spirtes, R. Scheines, and C. Glymour, “Simulation stud-ies of the reliability of computer-aided model specificationusing the TETRAD II, EQS, and LISREL programs,” Soci-ological Methods & Research, vol. 19, no. 1, pp. 3–66, 1990.

[24] W. E. Saris, A. Satorra, and D. Sorbom, “The detection andcorrection of specification errors in structural equationmodels,” Sociological methodology, vol. 17, pp. 105–129,1987.

[25] K. Deb, Multi-objective optimization using evolutionary algo-rithms. John Wiley & Sons Chichester, 2001, vol. 2012.

[26] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “Afast and elitist multiobjective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6,no. 2, pp. 182–197, 2002.

[27] M. Arevalillo-Herraez, F. J. Ferri, and S. Moreno-Picot, “Ahybrid multi-objective optimization algorithm for contentbased image retrieval,” Applied Soft Computing, vol. 13,no. 11, pp. 4358–4369, 2013.

[28] F. Hajabdollahi, Z. Hajabdollahi, and H. Hajabdollahi,“Soft computing based multi-objective optimization ofsteam cycle power plant using NSGA-II and ANN,”Applied Soft Computing, vol. 12, no. 11, pp. 3648–3655,2012.

[29] A. E. Brownlee and J. A. Wright, “Constrained, mixed-integer and multi-objective optimisation of building de-signs by NSGA-II with fitness approximation,” AppliedSoft Computing, vol. 33, pp. 114–126, 2015.

[30] R. Saravanan, S. Ramabalan, N. G. R. Ebenezer, andC. Dharmaraja, “Evolutionary multi criteria design opti-mization of robot grippers,” Applied Soft Computing, vol. 9,no. 1, pp. 159–172, 2009.

[31] N. Meinshausen and P. Buhlmann, “Stability selection,”Journal of the Royal Statistical Society: Series B (StatisticalMethodology), vol. 72, no. 4, pp. 417–473, 2010.

[32] T. Verma and J. Pearl, “Equivalence and synthesis ofcausal models,” in Proceedings of the Sixth Annual Con-ference on Uncertainty in Artificial Intelligence. ElsevierScience Inc., 1990, pp. 255–270.

[33] R. Rahmadi, P. Groot, and T. Heskes, “Stable specifi-cation searches in structural equation modeling usingmulti-objective evolutionary algorithm,” in Proceedings ofSNATI, 2014.

[34] J. J. Grefenstette, “Optimization of control parameters forgenetic algorithms,” Systems, Man and Cybernetics, IEEETransactions on, vol. 16, no. 1, pp. 122–128, 1986.

[35] R. Kline, Principles and Practice of Structural Equation Mod-eling, ser. Methodology in the social sciences. GuilfordPress, 2011.

[36] S. L. Lauritzen, “Propagation of probabilities, means, andvariances in mixed graphical association models,” Journalof the American Statistical Association, vol. 87, no. 420, pp.1098–1108, 1992.

[37] M. Rhemtulla, P. E. Brosseau-Liard, and V. Savalei, “Whencan categorical variables be treated as continuous? Acomparison of robust continuous and categorical SEMestimation methods under suboptimal conditions.” Psy-chological methods, vol. 17, no. 3, pp. 354–373, 2012.

[38] K. Murphy, “The Bayes Net Toolbox for Matlab,” Com-puting science and statistics, vol. 33, no. 2, pp. 1024–1034,2001.

[39] J. Ramsey, “Bootstrapping the PC and CPC algorithms toimprove search accuracy,” 2010.

[40] T. Fawcett, “ROC graphs: Notes and practical considera-

tions for researchers,” Machine learning, vol. 31, pp. 1–38,2004.

[41] E. R. DeLong, D. M. DeLong, and D. L. Clarke-Pearson,“Comparing the areas under two or more correlatedreceiver operating characteristic curves: a nonparametricapproach,” Biometrics, pp. 837–845, 1988.

[42] X. Robin, N. Turck, A. Hainard, N. Tiberti, F. Lisacek, J.-C.Sanchez, and M. Muller, “proc: an open-source packagefor R and S+ to analyze and compare ROC curves,” BMCbioinformatics, vol. 12, no. 1, p. 77, 2011.

[43] J. A. Hanley and B. J. McNeil, “A method of comparingthe areas under receiver operating characteristic curvesderived from the same cases.” Radiology, vol. 148, no. 3,pp. 839–843, 1983.

[44] E. Venkatraman and C. B. Begg, “A distribution-freeprocedure for comparing receiver operating characteristiccurves from a paired experiment,” Biometrika, vol. 83,no. 4, pp. 835–848, 1996.

[45] R. A. Fisher, Statistical methods for research workers. Gen-esis Publishing Pvt Ltd, 1925.

[46] R. A. F. Frederick Mosteller, “Questions and answers,”The American Statistician, vol. 2, no. 5, pp. 30–31, 1948.[Online]. Available: http://www.jstor.org/stable/2681650

[47] M. J. Heins, H. Knoop, W. J. Burk, and G. Bleijenberg,“The process of cognitive behaviour therapy for chronicfatigue syndrome: Which changes in perpetuating cogni-tions and behaviour are related to a reduction in fatigue?”Journal of psychosomatic research, vol. 75, no. 3, pp. 235–241,2013.

[48] IBM SPSS Statistics for Windows, Version 19.0., IBM Corp.,Armonk, NY, 2010.

[49] J. Vercoulen, C. Swanink, J. Galama, J. Fennis, P. Jongen,O. Hommes, J. Van der Meer, and G. Bleijenberg, “Thepersistence of fatigue in chronic fatigue syndrome andmultiple sclerosis: development of a model,” Journal ofpsychosomatic research, vol. 45, no. 6, pp. 507–517, 1998.

[50] J. F. Wiborg, H. Knoop, L. E. Frank, and G. Bleijenberg,“Towards an evidence-based treatment model for cogni-tive behavioral interventions focusing on chronic fatiguesyndrome,” Journal of psychosomatic research, vol. 72, no. 5,pp. 399–404, 2012.

[51] Q. Cao, Y. Zang, L. Sun, M. Sui, X. Long, Q. Zou, andY. Wang, “Abnormal neural activity in children with at-tention deficit hyperactivity disorder: a resting-state func-tional magnetic resonance imaging study,” Neuroreport,vol. 17, no. 10, pp. 1033–1036, 2006.

[52] E. Sokolova, P. Groot, T. Claassen, and T. Heskes, “Causaldiscovery from databases with discrete and continuousvariables,” in Probabilistic Graphical Models. Springer,2014, pp. 442–457.

[53] R. Rahmadi, P. Groot, M. Heins, H. Knoop, and T. Hes-kes, “Causality on longitudinal data: Stable specificationsearch in constrained structural equation modeling,” Pro-ceedings of AALTD 2015, p. 101, 2015.

[54] Y. Qi, Z. Hou, M. Yin, H. Sun, and J. Huang, “An immunemulti-objective optimization algorithm with differentialevolution inspired recombination,” Applied Soft Comput-ing, vol. 29, pp. 395–410, 2015.

[55] S. Kukkonen and J. Lampinen, “Gde3: The third evolutionstep of generalized differential evolution,” in EvolutionaryComputation, 2005. The 2005 IEEE Congress on, vol. 1.IEEE, 2005, pp. 443–450.

[56] Q. Zhang and H. Li, “MOEA/D: A multiobjective evolu-tionary algorithm based on decomposition,” Evolutionary

Page 16: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

16

Computation, IEEE Transactions on, vol. 11, no. 6, pp. 712–731, 2007.

[57] H. Taboada, J. F. Espiritu, and D. W. Coit, “Moms-ga: Amulti-objective multi-state genetic algorithm for systemreliability optimization design problems,” Reliability, IEEETransactions on, vol. 57, no. 1, pp. 182–191, 2008.

APPENDIX A

Page 17: Causality on Cross-Sectional Data: Stable Specification ... · Causality on Cross-Sectional Data: Stable Specification Search in Constrained Structural Equation Modeling Ridho Rahmadi,

17

TABLE 2: Edge stability of CFS

Lines Edgesfatigue and pActivity

● ● ● fatigue and controlcontrol and focusing

fatigue and functioningcontrol and functioningfocusing and pActivityfatigue and focusing

● ● ● pActivity and oActivityfunctioning and pActivitycontrol and pActivitycontrol and oActivityfocusing and oActivityfatigue and oActivity

● ● ● fucntioning and oActivityfunctioning and focusing

TABLE 3: Causal path stability of CFS

Lines Causal PathspActivity to fatigue

● ● ● control to fatiguefunctioning to fatiguefocusing to fatigueoActivity to fatiguefocusing to pActivity

functioning to pActivity● ● ● oActivity to pActivity

control to pActivityfunctioning to oActivityfocusing to oActivityfocusing to controlcontrol to oActivity

● ● ● functioning to controlpActivity to oActivitycontrol to functioning

oActivity to functioningoActivity to controloActivity to focusing

● ● ● control to focusingpActivity to functioningfocusing to functioningfunctioning to focusingpActivity to controlpActivity to focusing

● ● ● fatigue to pActivityfatigue to oActivityfatigue to focusing

fatigue to functioningfatigue to control

TABLE 4: Edge stability of ADHD

Lines EdgesAD and HI

AD and medicationHI and aggressiongender and AD

● ● ● aggression and ADAD and medication

handedness and aggressionmedication and aggression

● ● ● gender and HIHI and handedness

gender and medicationgender and handednessAD and handednessgender and aggression

● ● ● medication and handedness

TABLE 5: Causal path stability of ADHD

Lines Causal Pathsgender to ADgender to HI

gender to medicationAD to medication

AD to HIAD to aggression

gender to aggressionHI to aggressionHI to medication

aggression to medication● ● ● aggression to HI

HI to ADmedication to aggressionhandedness to aggression

● ● ● gender to handednessaggression to handedness

AD to handednessHI to handednesshandedness to HI

● ● ● handedness to medicationmedication to HI

aggression to gendermedication to handedness

aggression to HI● ● ● medication to AD

AD to gender● ● ● HI to gender

medication to genderaggression to genderhandedness to gender