


Journal of Computational Information Systems 10: 9 (2014) 3955–3963
Available at http://www.Jofcis.com

Mining Top-k High Persistent Causal Rules Through

Non-graphic Methodology ?

Zhou JIN 1,2,∗, Rujing WANG 2, He HUANG 2, Yimin HU 2

1 Department of Automation, University of Science and Technology of China, Hefei 230031, China
2 Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei 230031, China

Abstract

Causality discovery plays an important role in the knowledge discovery community. Current work on finding persistent causal relationships is rooted in the theory of Bayesian networks, which applies a graphical structure to constrain causal relationships; learning such a structure, however, has been proved to be an NP-complete problem. Several constraint-based algorithms focus on local dependence relationships and discover relationships whose significance is higher than a user-specified minimum significance threshold mini_sign. However, it is challenging for users to set an appropriate mini_sign by trial and error. In this paper, we propose a non-graphical approach to mine the top-k causal rules ranked by their persistency. An efficient top-k causal rule mining algorithm named TKCR is designed to mine such causal rules without a predefined significance threshold. Experiments on real and synthetic datasets show that our algorithm can effectively discover causal rules in large databases and has good scalability.

Keywords: Data Mining; Causality; Top-k Pattern; Causal Rule

1 Introduction

Association rule mining is a fundamental research topic in data mining: it finds all rules in a database that satisfy minimum support and minimum confidence constraints. However, the association rule mining framework usually produces a mass of rules, which may contain useless and redundant results, so the inherent mechanism of the data cannot be revealed by association rule mining alone. A causal relationship is more powerful than an associative relationship. Mining causal rules from observational data refers to finding the causal relationships between variables, with the purpose of revealing the complicated interactions among the factors of a system or process. Causal relationships are generally much more useful and reliable for users to apply in various applications.

?Project supported by the National Natural Science Foundation of China (No. 31171456) and the National Key Technology Research and Development Program of the Ministry of Science and Technology (No. 2013BAD15B03).

∗Corresponding author. Email address: [email protected] (Zhou JIN).

1553–9105 / Copyright © 2014 Binary Information Press
DOI: 10.12733/jcis10424
May 1, 2014


Current work on causality discovery is mostly based on graphical structures. Pearl [1] proposed a framework that connects conditional independence with causal graphical structures. Some improved methods apply the idea of causal Bayesian networks, directly or indirectly, to generate a directed acyclic graph (DAG) representing the conditional independence between variables. Discovering complete or local causal graphical structures requires a high computational cost, hence several constraint-based algorithms have been presented to discover causal structures and have produced good results [2, 3, 4]. These methods use observational data to determine the conditional independence of variables based on a user-specified significance threshold. However, it is crucial to set an appropriate mini_sign for causal relationship mining in various complex systems. If mini_sign is too low, the false-positive rate will be large, producing many redundant rules and making the mining algorithms inefficient; on the other hand, if mini_sign is too high, the false-negative rate will be large, and few or no causal rules will be found.

In this paper, we propose a non-graphical approach to mine the top-k high persistent causal rules in large databases. An efficient algorithm is proposed to mine such causal rules without a predefined significance threshold. Compared to normal causal structures, top-k causal rules contain richer information and are easier to understand. The extraction of such rule lists can therefore help enrich existing knowledge bases about general concepts, or act as a pre-processing step that produces facts for a fact-answering engine. Experiments on real and synthetic datasets show that our method can effectively discover causal rules in large databases and has good scalability.

2 Preliminaries and Definition

In this section, we introduce the preliminaries related to causal rule discovery and provide a formal definition of top-k high persistent causal rule mining.

2.1 Causal structure discovery

Causal structure discovery aims to find a short list of relationships in data that are most likely causal. Since there is no commonly accepted definition of causal relationship, it is a challenge to discover causal structures from observational data. Constraint-based approaches do not need to generate complete graphical structures and have been developed to efficiently discover causal relationships without graphical structures; they include LCD [2], CCC & CCU [3] and GLL [4]. However, these methods normally output some fixed structures as DAGs, directly or indirectly adopting the idea of Bayesian learning for causal structure discovery. Causal structures in the form of association rules can capture the action form of causality and can be discovered in a general framework. This distinguishes causal rule discovery from normal rule discovery.

2.2 Top-k frequent itemset mining

The common framework of frequent itemset mining requires a specified minimum support to ensure the generation of the complete set of patterns. However, it is challenging to specify an appropriate minimum support threshold. To focus on the significant frequent patterns, an alternative is to mine the top-k frequent patterns. Various top-k pattern mining algorithms have been proposed [5, 6, 7, 8, 9, 10]. The task of top-k frequent pattern mining is to find the k most frequent patterns, where k is the desired number of frequent patterns.

Top-k frequent pattern mining provides a user-friendly path to finding the k potentially highest-quality patterns. To improve usability and user-friendliness, top-k pattern mining permits users to find the k most interesting frequent patterns without providing a support threshold. The problem of mining top-k frequent patterns was first discussed in [10]. A straightforward solution is to directly choose a minimum support and mine the complete frequent pattern set N. If N contains more than k patterns, we sort the patterns and select the k most frequent ones; otherwise, we reduce the minimum support and repeat the process until the top-k frequent patterns are found. Obviously, this straightforward solution is computationally costly because of its repeated and redundant execution. A better alternative is to dynamically raise the minimum support threshold during the mining process. This strategy avoids mining redundant patterns repeatedly and has been adopted in many works, such as [7, 10].
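The threshold-raising strategy described above can be sketched with a bounded min-heap whose root acts as the dynamically rising minimum support. This is a minimal illustration rather than the algorithm of [7] or [10], and it assumes the pattern supports are already available in a dictionary; real miners interleave counting and pruning.

```python
import heapq

def top_k_frequent(support_counts, k):
    """Keep a size-k min-heap of the best patterns seen so far; once the
    heap is full, its root is the current minimum-support threshold."""
    heap = []       # (support, pattern) pairs, smallest support at the root
    threshold = 0   # rises monotonically as better patterns arrive
    for pattern, supp in support_counts.items():
        if len(heap) >= k and supp <= threshold:
            continue                      # pruned without further work
        heapq.heappush(heap, (supp, pattern))
        if len(heap) > k:
            heapq.heappop(heap)           # evict the weakest pattern
        if len(heap) == k:
            threshold = heap[0][0]        # raise the threshold
    return sorted(heap, reverse=True)
```

With support_counts = {'a': 5, 'b': 3, 'c': 9, 'd': 1} and k = 2, the two most frequent patterns are returned and 'd' is rejected by the raised threshold without ever entering the heap.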

Extensive studies on top-k pattern mining exist, but it is difficult to adapt this idea to high persistent causal rule mining. The reason is that causal rule mining is performed in two phases, and the exact persistency of the causal rules is only known during the second phase. Therefore, mining the top-k potential causal rules during the first phase would not necessarily yield the top-k causal rules in the end. Although there are many studies on causal structure discovery and on top-k pattern mining, few of them focus on their integration for mining top-k high persistent causal rules. This paper addresses the task of finding the top-k high persistent causal rules.

2.3 Problem definition

Let I = {i1, i2, ..., im} be a set of distinct binary items, where the value 1 indicates yes and the value 0 indicates no. A pattern P is a set of items in which each item takes the value 1. A transaction T is an assignment of values to I. The transaction database D = {T1, T2, ..., Tn} is a set of transactions Ti. An association denotes the relationship between two patterns, such as P → Z, where P and Z are two patterns.

Definition 1 The support count of a pattern P is the number of transactions containing P in D, denoted Supp(P); the support count of an association P → Z is the number of transactions containing the patterns that form the association, denoted Supp(P, Z).
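Definition 1 can be read directly as code. The sketch below models patterns and transactions as Python sets of item names, an assumption of this illustration (the paper's transactions are binary value assignments, which is equivalent for support counting):

```python
def supp(pattern, db):
    """Supp(P): number of transactions in db containing every item of P."""
    p = set(pattern)
    return sum(1 for t in db if p <= set(t))

def supp_assoc(p, z, db):
    """Supp(P, Z): number of transactions containing both patterns
    of the association P -> Z."""
    return supp(set(p) | set(z), db)
```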

Definition 2 (Positive association) Given an attribute set X and an outcome attribute Y, X is positively associated with Y if χ²_{X→Y} ≥ χ²_α, where χ²_{X→Y} is the Chi-square value of the association X → Y and χ²_α is the Chi-square value corresponding to the significance threshold α.
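For a 2x2 contingency table, the Chi-square value of Definition 2 has a closed form. The sketch below uses the standard Pearson Chi-square; the directionality check (X and Y co-occurring more often than independence predicts) and the 3.841 critical value for α = 0.05 with one degree of freedom are conventional additions, not spelled out in the paper:

```python
def chi_square(n11, n10, n01, n00):
    """Pearson Chi-square of X vs Y from the 2x2 table:
    n11 = #transactions with X and Y, n10 = X without Y,
    n01 = Y without X, n00 = neither."""
    n = n11 + n10 + n01 + n00
    den = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return n * (n11 * n00 - n10 * n01) ** 2 / den if den else 0.0

CHI2_ALPHA_05 = 3.841  # critical value for alpha = 0.05, 1 degree of freedom

def positively_associated(n11, n10, n01, n00, chi2_alpha=CHI2_ALPHA_05):
    n = n11 + n10 + n01 + n00
    expected_n11 = (n11 + n10) * (n11 + n01) / n   # count expected under independence
    return n11 > expected_n11 and chi_square(n11, n10, n01, n00) >= chi2_alpha
```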

Definition 3 (Nonzero partial association) Let α ∈ [0, 1] be a significance threshold for a partial association. There is a nonzero partial association between I and J given K if the following inequality holds:

PA(I, J, K) ≥ χ²_α
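The paper defers the exact form of PA(I, J, K) to [11]. One plausible instantiation, shown here purely as an illustration, is a Mantel-Haenszel-style statistic that accumulates observed-minus-expected counts over the strata of the conditioning set K:

```python
def partial_association(tables):
    """Mantel-Haenszel-style PA statistic over a list of 2x2 tables,
    one (n11, n10, n01, n00) tuple per stratum of K."""
    num, var = 0.0, 0.0
    for n11, n10, n01, n00 in tables:
        n = n11 + n10 + n01 + n00
        if n <= 1:
            continue                         # degenerate stratum, no information
        r1, r0 = n11 + n10, n01 + n00        # row margins
        c1, c0 = n11 + n01, n10 + n00        # column margins
        num += n11 - r1 * c1 / n             # observed minus expected
        var += r1 * r0 * c1 * c0 / (n * n * (n - 1))
    return num ** 2 / var if var else 0.0
```

The resulting statistic is compared against χ²_α exactly as in Definition 3.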

Definition 4 (Causal rule) Given a significance threshold α ∈ [0, 1], an association X → Y is a causal rule if: 1) the support of X → Y in D is significant; 2) X and Y are positively associated in D; and 3) there exists a nonzero partial association between X and Y in D.


The definitions related to high persistent causal rules follow the work in [11]. It is a general principle that a causal relationship is a persistent association, and this principle has been widely used in scientific discovery from observation. Following this persistent characterization of causal rules, the quality of a causal rule can be approximately weighed by the degree of its persistency. According to the definitions above, the persistency of a causal rule can be represented by the degrees of its association and partial association.

Definition 5 Let R : X → Y be a causal rule in D, let its degree of association be denoted by Associate(R), and let its degree of partial association be denoted by PAssociate(R). The persistency of R in D is defined as the lower bound of its association and partial association, denoted Persist(R), i.e. Persist(R) = min{Associate(R), PAssociate(R)}.

Definition 6 (Top-k high persistent causal rule) Given an attribute set (predictive variables) X and an outcome attribute (target variable) Y, let X′ be a pattern in X. The association X′ → Y is called a top-k high persistent causal rule in a database D if there are fewer than k associations whose persistency is larger than that of X′ → Y.

Definition 7 (Optimal persistency threshold) Let R be the set of top-k high persistent causal rules in database D. The significance threshold α∗ is called the optimal persistency threshold if there exists no α > α∗ such that there are k causal rules whose persistency is larger than α.

Given a binary database D and the desired number of causal rules k, the problem of finding the set of top-k high persistent causal rules in D is to discover the k causal rules with the highest persistency in D. An equivalent problem statement is to discover all the causal rules whose persistency is no less than α∗ in D.
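Definitions 5 to 7 can be summarized in a few lines of code: rank candidate rules by the minimum of their association and partial-association degrees, keep the k best, and read α∗ off the weakest kept rule. The rule names and scores below are purely illustrative:

```python
def top_k_by_persistency(rules, k):
    """rules maps rule -> (associate, passociate); returns the top-k rule
    names by Persist(R) = min(Associate(R), PAssociate(R)) and alpha*."""
    ranked = sorted(rules.items(), key=lambda kv: min(kv[1]), reverse=True)
    top = ranked[:k]
    alpha_star = min(top[-1][1]) if top else 0.0   # persistency of weakest kept rule
    return [name for name, _ in top], alpha_star
```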

3 Mining Top-k Causal Rules

In this section, we propose a novel model, top-k high persistent causal rules, to strive for an efficient discovery of causal rules from observational data. Several techniques and methods are introduced to develop the details of the approach.

3.1 Top-k high persistent mining

Based on the definition of high persistent rules, we propose a top-k causal rule mining algorithm named TKCR that mines the top-k high persistent rules without a fixed, pre-specified threshold. The framework of TKCR can be divided into three parts. First, it learns a compressed structure from the observational data to store the information of the patterns. Then, taking full advantage of the compressed structure, it searches for the potential top-k high persistent patterns. Finally, the top-k high persistent rules are identified from the positive patterns.

3.1.1 Compressed data structure: TP-tree

In order to reduce the cost of frequent pattern mining, we extend the frequent pattern tree [12] to store information about the frequent patterns. In the original frequent pattern tree (FP-tree for short), each node N is accompanied by an integer value indicating the support count of the node. To avoid scanning the original records repeatedly and to facilitate the measurement of positive association, we also count the co-occurrence of each node with the target variable. As a result, we can test for positive association based on the TP-tree only.

Definition 8 (TP-tree) A top persistent tree (TP-tree for short) is a compressed data structure, formally defined as follows.

1. It consists of a set of nodes and an attribute link table.

2. Each node in the TP-tree represents an attribute and is labeled with a 4-tuple <N, S, C, L>, where N denotes the name of the attribute, S denotes the support count of the attribute, C denotes the support count of the association between the attribute and the target, and L links the node to the next node representing the same attribute.

3. The attribute link table consists of two columns: the name of each attribute and the starting point of its link chain in the TP-tree.
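A minimal sketch of Definition 8 in Python follows. The children dictionary and the insert routine are implementation details assumed here, mirroring how FP-trees are usually built in [12]; only the 4-tuple <N, S, C, L> and the attribute link table come from the definition:

```python
class TPNode:
    """A TP-tree node: the 4-tuple <N, S, C, L> of Definition 8."""
    def __init__(self, name):
        self.name = name        # N: attribute name
        self.support = 0        # S: support count at this prefix node
        self.co_target = 0      # C: co-occurrence count with the target
        self.link = None        # L: next node for the same attribute
        self.children = {}      # tree edges (implementation detail)

class TPTree:
    def __init__(self):
        self.root = TPNode(None)
        self.links = {}         # attribute link table: name -> first node

    def insert(self, transaction, has_target):
        """Insert one transaction, items pre-sorted by support as in FP-trees."""
        node = self.root
        for item in transaction:
            if item not in node.children:
                child = TPNode(item)
                child.link = self.links.get(item)   # thread onto the link chain
                self.links[item] = child
                node.children[item] = child
            node = node.children[item]
            node.support += 1
            if has_target:
                node.co_target += 1   # enables the positive-association test
```

Because each node carries both S and C, the 2x2 table needed for the Chi-square test of Definition 2 can be derived from the tree alone, without rescanning the database.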

3.1.2 Identifying top-k persistent rule from TP-tree

The compactness of the TP-tree is attributed to the feature that multiple patterns share a common prefix in the tree. The patterns are sorted according to the attributes in the database and mapped into different paths in the TP-tree. Therefore, by following the links of an attribute pi in the attribute link table, it is easy to collect all the patterns related to pi contained in the TP-tree. This ensures that the subsequent mining can be performed rather efficiently.

The optimal persistency threshold α∗ starts from zero and is dynamically raised as the algorithm proceeds. According to the definitions, an advanced measurement of association can help judge the persistency of causal rules. Hence the efficiency of the top-k algorithm depends largely on how fast the persistency threshold can be raised to prune the search space. Associations can be distinguished into potential and redundant causal rules based on whether they can raise α∗.

Definition 9 (Potential causal rule and redundant causal rule) A rule R is called a potential rule if its persistency is larger than the optimal persistency threshold α∗; otherwise, the rule is called a redundant rule.

From the perspective of efficiency, the potential causal rules are retained to update the current result, and the redundant rules are removed. The subsequent mining applies a level-wise strategy to prune the search space, performed in three phases. First, the attributes are sorted in descending order of persistency and maintained in a list W, and the optimal persistency threshold α∗ is initialized. Then we adopt a specific combining mode to achieve an efficient combinatorial search. Intuitively, a potential causal rule with higher persistency is more likely to generate a high persistent combined causal rule. For this reason, combinations with multiple attributes are generated in top-down order, so that the potential causal rules with higher persistency are combined preferentially. Meanwhile, the selection of attributes to combine with follows a bottom-up principle: an attribute starts from the top of W and is only combined with attributes of higher persistency. Benefiting from these properties, the strategy helps raise the threshold quickly.


Since the combining starts from the most persistent attributes, the potential high persistent causal rules are expected to be found much more easily, and the optimal persistency threshold can be raised quickly. More to the point, the combining process at the beginning is much simpler thanks to the bottom-up selection. In addition, a combination that fails to generate a new potential causal rule is removed, together with its supersets in the search space. These constraints ensure that the algorithm can find the optimal threshold efficiently.

3.2 Algorithm

To find the complete set of top-k high persistent causal rules, the TKCR algorithm first scans the data and counts the supports of all attributes (1-associations), retaining the frequent ones. We select the first k elements from the 1-associations to initialize a list L and compute their partial associations. The optimal persistency threshold α∗ is set to the persistency of the least persistent association in L. Subsequently, we generate 2-associations from the 1-associations in L in sequence. As soon as an association does not meet the optimal persistency threshold, we prune the whole space of its supersets; otherwise, the association is inserted into L, replacing the least persistent association in L. The list L, as well as α∗, keeps being updated dynamically until no more associations are found. The complete procedure of the algorithm is detailed in pseudo-code as follows.

Finding top-k high persistent causal rule

Input: variable set U, data set T, target variable Z, optimal persistency threshold α∗, desired number of causal rules K
Output: the set of top-k high persistent causal rules R

 1: R ← ∅, L ← ∅
 2: L = Frequent(U, T, Z)
 3: Tree = Global_tree(L, T)
 4: (R, α∗) = Initialize(L, K)
 5: for (i = 1; X = L(i) and X ≠ ∅; i++) do
 6:   V = {y ∈ L | Supp(y) > Supp(X)}
 7:   XTree = Local_tree(X, Tree)
 8:   Patterns = Pattern_generate(α∗, V, XTree)
 9:   for each C ∈ Patterns do
10:     if Associate(C) > α∗ then
11:       PAssociate(C) = PAssociation(C, Z)
12:       Persist(C) = min{Associate(C), PAssociate(C)}
13:       if Persist(C) > α∗ then
14:         Update(C, R)
15:       end if
16:     end if
17:   end for
18: end for
19: return R

Fig. 1: Top-k high persistent causal rule mining algorithm

Some parts of the above algorithm are summarized into functions to make the algorithm more readable. The functions Global_tree( ) and Local_tree( ) construct the TP-tree and the conditional TP-tree, respectively. In Frequent( ), we count the frequency of the input variables, and the variables are stored in descending order of support. In Update( ), a new potential causal rule is used to update the list and the optimal persistency threshold α∗.

Function: Frequent(L, D, z)
 1: for each x ∈ L do
 2:   count Supp(x) and Supp(x, z)
 3:   if Supp(x) > Significant then
 4:     insert x into R
 5: sort R according to support
 6: return R

Function: Initialize(L, k)
 7: create a list R of size k, α∗ = 0
 8: for each x ∈ L do
 9:   if x.persis > α∗ or length(R) < k then
10:     insert x into R
11:   if length(R) ≥ k then
12:     delete the least persistent rule in R
13:     α∗ = persistency of the least persistent rule in R
14: return R

Function: Pattern_generate(α, L, x)
15: Rlist = ∅, Dlist = ∅, P = x
16: while P ≠ ∅ do
17:   P = P × L
18:   for each Y ∈ P do
19:     if there is no y ∈ Dlist such that y ⊂ Y then
20:       if Persist(Y) > α then
21:         insert Y into Rlist
22:       else insert Y into Dlist and delete Y from P
23: return Rlist

Function: Update(C, List)
24: insert C into List
25: delete the least persistent rule from List
26: α = min{Persist(x) | x ∈ List}
27: return α

Fig. 2: The functions of TKCR algorithm

The functions Initialize( ) and Pattern_generate( ) play an important role in the implementation. The top-k rule list L and the optimal persistency threshold α∗ are both initialized in Initialize( ). In the for loops, potential causal rules are found and inserted into L, which contains at most k causal rules whose persistency is larger than α∗. A potential causal rule is inserted directly if L is only partially filled, i.e., if L holds fewer than k elements; otherwise, the least persistent causal rule is removed before the potential causal rule is added. As a result, the list L and the optimal persistency threshold α∗ change dynamically. The function Pattern_generate( ) is designed to search the combinatorial space. Two lists, Rlist and Dlist, are created to store potential causal rules and redundant causal rules separately. For each element Y in the candidate set P, we measure the persistency of Y and insert it into Rlist if Y is a potential causal rule; otherwise, we remove Y from P and insert it into Dlist. The candidate set P is then joined with feasible attributes, and the procedure repeats until P is empty. Finally, the function outputs the list of potential causal rules, which is used to update the top-k high persistent causal rules in L.

4 Experiment Result

In this section, the proposed TKCR algorithm is evaluated on both synthetic and real-world datasets in comparison with a naive solution, which is run with the threshold sequence <mini_sign, mini_sign − step, mini_sign − 2·step, ...> until at least k causal rules are found. Table 1 shows the names of the datasets and their numbers of records (#records) and attributes (#attributes). The dataset Census is collected from the UCI Machine Learning repository [13] and the others from the FIMI repository [14]; T10T4D100K is a synthetic dataset generated by the IBM Synthetic Data Generator.


Table 1: Data sets

dataset       Connect   Census   T10T4D100K
#records      67997     300K     100K
#attributes   129       278      870

To investigate the performance of the algorithm, we empirically evaluate the computational cost of TKCR in comparison with the naive algorithm on the above three datasets. Each dataset is used to survey the performance at 10 different values of k: 5, 10, 15, 20, 25, 30, 35, 40, 45, 50; the results are shown in Fig. 3. Naive (step=1) and Naive (step=2) refer to implementations of the naive algorithm with mini_sign = 10 and step = 1 (respectively step = 2). Fig. 3(a) and Fig. 3(b) illustrate the performance of the TKCR algorithm on the two real-world datasets, and Fig. 3(c) on the synthetic dataset. As shown in the figures, the extraction time of the algorithms increases with k. The TKCR algorithm outperforms both implementations of the naive algorithm for all values of k, showing its superiority on all three datasets.

Fig. 3: Performance comparison of the TKCR and Naive algorithms on (a) Connect, (b) Census and (c) T10T4D100K

Fig. 4: Scale-up evaluation of the TKCR algorithm on (a) Connect, (b) Census and (c) T10T4D100K

The scalability of the TKCR algorithm with respect to record size and the desired number k is evaluated in further experiments. The growth of extraction time with record size for different values of k is measured, and the curves are shown in Fig. 4. Fig. 4(a) and Fig. 4(b) show the results on the two real-world datasets and Fig. 4(c) on the synthetic dataset. Nine sample points ranging from 20K to 60K records are taken to create the variation curves, with k assigned to 10, 30 and 50 in turn. The figures clearly show that the extraction time of the TKCR algorithm is approximately linear in all cases. It is worth noting that this linear relationship between extraction time and data size makes it possible for the TKCR algorithm to discover top-k causal rules efficiently in larger datasets.

5 Conclusion

In this paper, we propose a non-graphical approach to mine top-k causal rules in large databases. An efficient algorithm named TKCR is designed to mine such causal rules without a predefined significance threshold. Experimental results on both real and synthetic datasets show that our algorithm outperforms both implementations of the naive algorithm and exhibits good scalability, which makes it possible to discover top-k causal rules efficiently in larger datasets.

References

[1] J. Pearl and T. S. Verma. A theory of inferred causation. In Proceedings of the Second International Conference on Principles of Knowledge Representation and Reasoning, 1991, 441-452.

[2] G. F. Cooper. A Simple Constraint-Based Algorithm for Efficiently Mining Observational Databases for Causal Relationship. Data Mining and Knowledge Discovery, 1997, 1 (2): 203-224.

[3] C. Silverstein, S. Brin. Scalable Techniques for Mining Causal Structures. Data Mining and Knowledge Discovery, 2000, 4 (2-3): 163-192.

[4] C. F. Aliferis, A. Statnikov. Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification Part I: Algorithm and Empirical Evaluation. Journal of Machine Learning Research, 2010, 11: 171-234.

[5] Y. L. Cheung, A. W. Fu. Mining frequent itemsets without support threshold: with and without item constraints. IEEE Transactions on Knowledge and Data Engineering, 2004, 16 (6): 1052-1069.

[6] J. Li, S. Gong. Top-k-FCI: mining top-k frequent closed itemsets in data streams. Journal of Computational Information Systems, 2011, 7 (13): 4819-4826.

[7] Y. Hirate, E. Iwahashi. TF2P-Growth: An Efficient Algorithm for Mining Frequent Patterns without any Thresholds. In Proceedings of the 4th International Conference on Data Mining, 2004.

[8] W. Ying, J. Q. Yu. A Top-k query algorithm on uncertain streaming data. Journal of Computational Information Systems, 2013, 9 (13): 5273-5279.

[9] T. M. Quang, S. Oyanagi and K. Yamazaki. ExMiner: An Efficient Algorithm for Mining Top-K Frequent Patterns. In Proceedings of ADMA 2006, 2006, 436-447.

[10] L. Shen, H. Shen, P. Pritchard and R. Topor. Finding the N Largest Itemsets. In Proceedings of ICDM'98, 1998, 211-222.

[11] Z. Jin, J. Li, L. Liu et al. Discovery of Causal Rules Using Partial Association. In Proceedings of the 12th International Conference on Data Mining, 2012, 309-318.

[12] J. Han, J. Pei and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of the 2000 ACM SIGMOD, New York, USA, 2000, 1-12.

[13] A. Frank, A. Asuncion, UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/, 2010.

[14] B. Goethals, Frequent Itemset Mining Dataset Repository, http://fimi.ua.ac.be/data/, 2013.