12
Research Article An Improved Apriori Algorithm Based on an Evolution-Communication Tissue-Like P System with Promoters and Inhibitors Xiyu Liu, 1 Yuzhen Zhao, 1 and Minghe Sun 2 1 College of Management Science and Engineering, Shandong Normal University, Jinan, Shandong, China 2 College of Business, e University of Texas at San Antonio, San Antonio, TX, USA Correspondence should be addressed to Yuzhen Zhao; [email protected] Received 4 November 2016; Revised 6 January 2017; Accepted 30 January 2017; Published 19 February 2017 Academic Editor: Stefan Balint Copyright © 2017 Xiyu Liu et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Apriori algorithm, as a typical frequent itemsets mining method, can help researchers and practitioners discover implicit associations from large amounts of data. In this work, a fast Apriori algorithm, called ECTPPI-Apriori, for processing large datasets, is proposed, which is based on an evolution-communication tissue-like P system with promoters and inhibitors. e structure of the ECTPPI-Apriori algorithm is tissue-like and the evolution rules of the algorithm are object rewriting rules. e time complexity of ECTPPI-Apriori is substantially improved from that of the conventional Apriori algorithms. e results give some hints to improve conventional algorithms by using membrane computing models. 1. Introduction Frequent itemsets mining, as a subfield of data mining, aims at discovering itemsets with high frequency from huge amounts of data. Interesting implicit associations between items then can be extracted from these data, which can help researchers and practitioners make informed decisions. One famous example is “beer and diapers” [1]. e supermarket management discovered a significant correlation between the purchases of beer and diapers which had nothing to do with each other ostensibly through frequent itemsets mining. Consequently, they put diapers next to beer. rough this layout adjustment, sales of both beer and diapers increased. e Apriori algorithm is a typical frequent itemsets min- ing algorithm, which is suitable for the discovery of frequent itemsets in transactional databases [2]. To process large datasets, many parallel improvements have been made to improve the computational efficiency of the Apriori algo- rithm [3–6]. How to implement the Apriori algorithm in parallel to improve its computational efficiency is still an on- going research topic. Given the extremity of the technology and theory of the silicon-based computing, new non-silicon- based computing devices P systems are used in this study. P systems are new bioinspired computing models of membrane computing, which focus on abstracting comput- ing ideas from the study of biological cells, particularly of cellular membranes [7, 8]. is study uses an evolution- communication tissue-like P system with promoters and inhibitors (ECPI tissue-like P systems) for computation. P systems are powerful distributed and parallel bioinspired computing devices, being able to do what Turing machines can do [9–11], and have been applied to many fields. e applications of P systems are based on two types of membrane algorithms, the coupled membrane algorithm and the direct membrane algorithm. e coupled membrane algorithm combines the traditional algorithm with some structural characters of P systems, such as dividing the whole system into several relatively independent computing units, where the computing units can communicate with each other, the computing units can be dynamically rebuilt, and rules can be executed in parallel [12–16]. e direct membrane algorithm designs the algorithm based on the structure, the objects, and the rules of P systems directly [17–21]. e final goal of membrane computing is to build biocomputers and the direct membrane algorithm can be transplanted to the biocomputers directly, which is more meaningful from this Hindawi Discrete Dynamics in Nature and Society Volume 2017, Article ID 6978146, 11 pages https://doi.org/10.1155/2017/6978146

An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Research ArticleAn Improved Apriori Algorithm Based on anEvolution-Communication Tissue-Like P System withPromoters and Inhibitors

Xiyu Liu1 Yuzhen Zhao1 and Minghe Sun2

1College of Management Science and Engineering Shandong Normal University Jinan Shandong China2College of Business The University of Texas at San Antonio San Antonio TX USA

Correspondence should be addressed to Yuzhen Zhao 723567558qqcom

Received 4 November 2016 Revised 6 January 2017 Accepted 30 January 2017 Published 19 February 2017

Academic Editor Stefan Balint

Copyright copy 2017 Xiyu Liu et alThis is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Apriori algorithm as a typical frequent itemsets mining method can help researchers and practitioners discover implicitassociations from large amounts of data In this work a fast Apriori algorithm called ECTPPI-Apriori for processing large datasetsis proposed which is based on an evolution-communication tissue-like P systemwith promoters and inhibitorsThe structure of theECTPPI-Apriori algorithm is tissue-like and the evolution rules of the algorithm are object rewriting rules The time complexity ofECTPPI-Apriori is substantially improved from that of the conventional Apriori algorithmsThe results give some hints to improveconventional algorithms by using membrane computing models

1 Introduction

Frequent itemsets mining as a subfield of data miningaims at discovering itemsets with high frequency from hugeamounts of data Interesting implicit associations betweenitems then can be extracted from these data which can helpresearchers and practitioners make informed decisions Onefamous example is ldquobeer and diapersrdquo [1] The supermarketmanagement discovered a significant correlation betweenthe purchases of beer and diapers which had nothing to dowith each other ostensibly through frequent itemsets miningConsequently they put diapers next to beer Through thislayout adjustment sales of both beer and diapers increased

The Apriori algorithm is a typical frequent itemsets min-ing algorithm which is suitable for the discovery of frequentitemsets in transactional databases [2] To process largedatasets many parallel improvements have been made toimprove the computational efficiency of the Apriori algo-rithm [3ndash6] How to implement the Apriori algorithm inparallel to improve its computational efficiency is still an on-going research topic Given the extremity of the technologyand theory of the silicon-based computing new non-silicon-based computing devices P systems are used in this study

P systems are new bioinspired computing models ofmembrane computing which focus on abstracting comput-ing ideas from the study of biological cells particularly ofcellular membranes [7 8] This study uses an evolution-communication tissue-like P system with promoters andinhibitors (ECPI tissue-like P systems) for computation Psystems are powerful distributed and parallel bioinspiredcomputing devices being able to do what Turing machinescan do [9ndash11] and have been applied to many fields Theapplications of P systems are based on two types ofmembranealgorithms the coupled membrane algorithm and the directmembrane algorithm The coupled membrane algorithmcombines the traditional algorithm with some structuralcharacters of P systems such as dividing the whole systeminto several relatively independent computing units wherethe computing units can communicate with each otherthe computing units can be dynamically rebuilt and rulescan be executed in parallel [12ndash16] The direct membranealgorithm designs the algorithm based on the structure theobjects and the rules of P systems directly [17ndash21] The finalgoal of membrane computing is to build biocomputers andthe direct membrane algorithm can be transplanted to thebiocomputers directly which is more meaningful from this

HindawiDiscrete Dynamics in Nature and SocietyVolume 2017 Article ID 6978146 11 pageshttpsdoiorg10115520176978146

2 Discrete Dynamics in Nature and Society

perspective However the direct membrane algorithm needsto transform the whole traditional algorithm into P systemwhich is complex and difficult Up to date a few simplestudies on the direct membrane algorithm focus on thearithmetic operations the logic operations the generation ofgraphic language and clustering [17ndash21]

In this study a novel improved Apriori algorithm basedon an ECPI tissue-like P system (ECTPPI-Apriori) is pro-posed using the parallel mechanism in P systems The infor-mation communication between different computing unitsin ECTPPI-Apriori is implemented through the exchange ofmaterials between membranes Specifically all itemsets aresearched in parallel regulated by a set of promoters andinhibitors For a database with 119905 fields 119905 + 2 cells are usedin the algorithm where 1 cell is used to enter the data in thedatabase into the system 119905 cells are used to detect the frequentitemsets and one specific cell called output cell is used tostore the results The time complexity of ECTPPI-Apriori iscompared with those of other parallel Apriori algorithms toshow that the proposed algorithm is time saving

The contributions of this study are twofold From theviewpoint of data mining new bioinspired techniques areintroduced into frequent itemsets mining to improve theefficiency of the algorithms P systems are natural dis-tributed parallel computing devices which can improve thetime efficiency in computation Besides the hardware andsoftware implementations P systems can be implementedby biological methods The computing resources neededare only several cells which can decrease the computingresource requirements From the viewpoint of P systems theapplication areas of the new bioinspired devices P systems areextended to frequent itemset mining The applications basedon the direct membrane algorithms are limited This studyprovides a new application of P systems in frequent itemsetsmining which expands the application areas of the directmembrane algorithms

The paper is organized as follows Section 2 introducessome preliminaries about the Apriori algorithm and aboutthe ECPI tissue-like P systems The ECTPPI-Apriori algo-rithm using the parallel mechanism of the ECPI tissue-like Psystem is developed in Section 3 In Section 4 one illustrativeexample is used to show how the proposed algorithm worksComputational experiments using two datasets to show theperformance of the proposed algorithm in frequent itemsetsmining are reported in Section 5 Conclusions are given inSection 6

2 Preliminaries

In this section some basic concepts and notions in Apriorialgorithm [2] and ECPI tissue-like P system [7] are intro-duced

21 The Apriori Algorithm TheApriori algorithm is a typicalfrequent itemsetsmining algorithmproposed byAgrawal andSrikant [2] which aims at discovering relationships betweenitems in transactional databases

Definitions(i) Item a field in a transactional database is called

an item If one record contains a certain item ldquo1rdquootherwise ldquo0rdquo is placed in the corresponding field ofthe record in the transactional database

(ii) Itemset a set of items is called an itemset For nota-tional convenience an itemset with 119899 items 1198681 1198682 and 119868119899 is represented by 1198681 1198682 119868119899

(iii) h-itemset a set containing ℎ items is called a ℎ-itemset

(iv) Transaction a record in a transactional databaseis called a transaction and each transaction is anonempty itemset

(v) Support count the number of transactions containinga certain itemset is called the support count of theitemset Support count is also called the frequency orcount of the itemset

(vi) Frequent itemset if the support count of an itemset isequal to or larger than the given minimum supportcount threshold 119896 this itemset is called a frequentitemset

The general procedure of Apriori from Han et al [1] is asfollows

Input The database contains 119863 transactions and the supportcount threshold 119896

Step 1 Scan the database to compute the support count ofeach item and obtain the frequent 1-itemsets 1198711 Let ℎ = 2

Step 2 Obtain the candidate frequent ℎ-itemsets 119862ℎ byjoining two frequent (ℎ minus 1)-itemsets with only one itemdifferent

Step 3 Prune those itemsets which have infrequent subset oflength (ℎ minus 1) from the candidate frequent ℎ-itemsets

Step 4 Scan the database to compute the support countof each candidate frequent ℎ-itemset Delete those itemsetswhich do not meet the support count threshold 119896 and obtainthe frequent ℎ-itemsets 119871ℎ

Step 5 Let ℎ = ℎ + 1 Repeat Steps 2 to 4 until no itemsetmeets the support count threshold 119896

Output The collection of all frequent itemsets is representedby 119871

22 Evolution-CommunicationTissue-Like P Systemswith Pro-moters and Inhibitors Membrane computing is a new branchof natural computing which abstracts computing ideas fromthe construct and the functions of cells or tissues In thenature each organelle membrane or cell membrane worksas a relatively independent computing unit The amountand the types of materials in each organelle or cell changethrough chemical reactions Materials can flow between dif-ferent organelle or cell membranes to transport information

Discrete Dynamics in Nature and Society 3

Reactions in different organelles or cells take place in parallelwhile reactions in the same organelle or cell take place alsoin parallel These biological processes are abstracted as thecomputing processes of membrane computing The internalparallel feature makes membrane computing a powerfulcomputing method which has been proven to be equivalentto Turing machines [7ndash11]

The ECPI tissue-like P system composed of a networkof cells linked by synapses (channels) is a typical membranecomputing model The whole P system is divided intoseparate regions through these cells each forming one regionEach cell has two main components multisets of objects(materials) and rules also called evolution rules (chemicalreactions) Objects as information carriers are representedby characters

Rules regulate the ways objects evolve to new objectsand the ways objects in different cells communicate throughsynapses Rules are executed in nondeterministic flat maxi-mally parallel in each cell That is at any step if more thanone rule can be executed but the objects in the cell can onlysupport some of them then a maximal number of rules willbe executed and each rule can be executed for only once [22]

The computation halts if no rule can be executed in thewhole system The computational results are represented bythe types and numbers of specified objects in a specified cellBecause objects in a P system evolve in flatmaximally parallelregulated by promoters and inhibitors the systems computevery efficiently [10 22] Paun [7] provided more details aboutP systems

A formal description of the ECPI tissue-like P system isas follows

An ECPI tissue-like P system of degree119898 is of the form

Π = (119874 1205901 1205902 120590119898 syn 120588 119894out) (1)

where (1) 119874 represents the alphabets including all objects ofthe system (2) syn sube 1 2 119898times1 2 119898 represents allsynapses between the cells (3) 120588 defines the partial orderingrelationship of the rules that is rules with higher orders areexecuted with higher priority (4) 119894out represents the subscriptof the output cell where the computation results are placed(5) 1205901 120590119898 represent the119898 cells Each cell is of the form

120590ℎ = (119908ℎ0 119877ℎ) for 1 le ℎ le 119898 (2)

In (2)119908ℎ0 represents the initial objects in cell ℎ A119908ℎ0 =120582 means that there is no object in cell ℎ If 119886 representsan object 119886119899 represents the multiplicity of 119899 copies of suchobjects 119877ℎ in (2) represents a set of rules in cell ℎ with theform of 119908119911 rarr 119909119910go where 119908 is the multiset of objectsconsumed by the rule 119911 in the subscript is the promoter orthe inhibitor of the form 119911 = 1199111015840 or 119911 = not1199111015840 and 119909 and 119910 arethe multisets of objects generated by the rule A rule can beexecuted only when all objects in the promoter appear andcannot be executed when any objects in the inhibitor appearMultiset of objects 119909 stay in the current cell and multisetof objects 119910 go to the cells which have synapses connectedfrom the current cell The 119902th subset of rules in cell ℎ havingsimilar functions is represented by 119903ℎ119902 and the rules in thesame subset are connected by cup

0

1 2

middot middot middot

t

t + 1

Input

Figure 1 Cell structure for the ECTPPI-Apriori algorithm

3 The ECTPPI-Apriori Algorithm

In this section the structure of the P system used in theECTPPI-Apriori algorithm is presented first the computa-tional processes in different cells are then discussed in detaila pseudocode summarizing the operations is presented andan analysis of the algorithm complexity is provided

31 Algorithm and Rules Assume a transactional databasecontains119863 records and 119905 fields An object 119886119894119895 is generated onlyif the 119894th transaction contains the 119895th item 119868119895 (ie there is a1 in the corresponding field in the transactional database) Inthis way the database is transformed into objects a form thatthe P system can recognizeThe support count threshold is setto 119896 A cell structure with 119905 + 2 cells labeled by 0 1 119905 + 1as shown in Figure 1 is used as the framework for ECTPPI-Apriori The evolution rules are not shown in this figure dueto their length Transactional databases are usually sparseTherefore the number of objects represented by 119886119894119895 to beprocessed in this algorithm is much smaller than119863119905

When computation begins objects 119886119894119895 encoded from thetransactional database and object 120579119896 representing the supportcount threshold 119896 are entered into cell 0 Objects 119886119894119895 and120579119896 are passed to cells 1 2 119905 in parallel using a parallelevolution mechanism in tissue-like P systems The auxiliaryobjects 120573119896119895 are generated in cell 1 Next the frequent 1-itemsets are produced and objects representing frequent 1-itemsets are generated in cell 1 by executing the evolutionrules in parallel The objects representing the frequent 1-itemsets are passed to cells 2 and 119905 + 1 Cell 119905 + 1 isused to store the computational results The frequent 2-itemsets and the objects representing the frequent 2-itemsetsare produced in cell 2 by executing the evolution rules inparallel The objects representing the frequent 2-itemsets arepassed to cells 3 and 119905 + 1 This process continues until allfrequent itemsets have been produced As compared withthat of the conventional Apriori algorithm the computationaltime needed by ECTPPI-Apriori to generate the candidatefrequent ℎ-itemsets 119862ℎ and to compute the support countof each candidate frequent ℎ-itemset can be substantiallyreduced

4 Discrete Dynamics in Nature and Society

The ECPI tissue-like P system for ECTPPI-Apriori is asfollows

ΠApriori = (119874 1205900 1205901 120590119905+1 syn 120588 119894out) (3)

where (1) 119874 = 119886119894119895 120573119895 12057311989511198952 1205731198951 sdotsdotsdot119895119905 120575119895 12057511989511198952 1205751198951 sdotsdotsdot119895119905 12057211989512057211989511198952 1205721198951 sdotsdotsdot119895119905 for 1 le 119894 le 119863 1 le 119895 1198951 119895119905 le 119905 (2)syn = 0 1 0 2 0 119905 1 119905 + 1 2 119905 + 1 119905 119905 + 11 2 2 3 119905 minus 1 119905 (3) 120588 = 119903119894 gt 119903119895 | 119894 lt 119895 (4) 1205900 =(11990800 1198770) 1205901 = (11990810 1198771) 1205902 = (11990820 1198772) 120590119905 = (1199081199050 119877119905)(5) 119894out = 119905 + 1

In 1205900 = (11990800 1198770) 11990800 = 120582 and119877011990301 = 119886119894119895 rarr 119886119894119895go cup 120579119896 rarr 120579119896go

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205901 = (11990810 1198771) 11990810 = 120582 and119877111990311 = 120579119896 rarr 1205731198961 sdot sdot sdot 120573

119896119905

11990312 = 120575119901119895 120573119896minus119901119895 rarr 120582 cup (120575119896119895 )not120573119895 rarr 120572119895go

for 1 le 119895 le 119905 and 1 le 119901 le 11989611990313 = 119886119894119895120573119895 rarr 120575119895

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205902 = (11990820 1198772) 11990820 = 120582 and119877211990321 = ()12057211989511205721198952120579

119896not12057311989611989511198952rarr 12057311989611989511198952

for 1 le 1198951 lt 1198952 le 11990511990322 = 120572119895 rarr 120582 (1 le 119895 le 119905)

11990323 = 12057511990111989511198952120573119896minus11990111989511198952

rarr 120582 cup (12057511989611989511198952)not12057311989511198952rarr 12057211989511198952 go

for 1 le 1198951 lt 1198952 le 119905 and 1 le 119901 le 11989611990324 = (12057311989511198952)11988611989411989511198861198941198952

rarr 12057511989511198952

for 1 le 119894 le 119863 and 1 le 1198951 lt 1198952 le 119905

In 120590ℎ = (119908ℎ0 119877ℎ) 119908ℎ0 = 120582 and119877ℎ

119903ℎ1 = ()1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 1205721198951sdotsdotsdot119895ℎminus2119895ℎ2 120579119896not1205731198961198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr1205731198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le 119895ℎ1 119895ℎ2 le 119905119903ℎ2 = 1205721198951 sdotsdotsdot119895ℎminus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus1 le 119905

119903ℎ3 = 1205751199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2120573119896minus1199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)not1205731198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2 go

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 1 le 119895ℎ1 119895ℎ2 le 119905 and1 le 119901 le 119896

119903ℎ4 = (1205731198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)1198861198941198951 sdotsdotsdot119886119894119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205751198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le119895ℎ1 119895ℎ2 le 119905

In 120590119905 = (1199081199050 119877119905) 1199081199050 = 120582 and1198771199051199031199051 = ()1205721198951sdotsdotsdot119895119905minus21198951199051 1205721198951sdotsdotsdot119895119905minus21198951199052 120579

119896not1205731198961198951sdotsdotsdot119895119905minus21198951199051 1198951199052rarr 1205731198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le 1199051199031199052 = 1205721198951 sdotsdotsdot119895119905minus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus1 le 119905

1199031199053 = 1205751199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052120573119896minus1199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052)not1205731198951sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 1205721198951 sdotsdotsdot119895119905minus21198951199051 1198951199052 go

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 1 le 1198951199051 1198951199052 le 119905 and1 le 119901 le 1198961199031199054 = (1205731198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

)1198861198941198951 sdotsdotsdot119886119894119895119905minus21198951199051 1198951199052rarr 1205751198951 sdotsdotsdot119895tminus21198951199051 1198951199052

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le119905

In 120590119905+1 = (119908119905+10 119877119905+1) 119908119905+10 = 120582 and 119877119905+1 =

The auxiliary objects 120573119896119895 and 120575119895 for 1 le 119895 le 119905 are usedto detect the frequent 1-itemsets The auxiliary objects 120573119896119895store the items in the candidate frequent 1-itemsets using theirsubscripts For example object 1205731198961 means the itemset 1198681 is acandidate frequent 1-itemsetThe auxiliary objects 120575119895 are usedto identify the frequent 1-itemsets The 119896 copies of 120573119895 initiallyin cell 1 indicate that the 119895th item needs to appear in at least119896 records to make the itemset 119868119895 a frequent 1-itemset Oneobject 120573119895 is removed from and one object 120575119895 is generated incell 1 when one more of the 119895th item is detected in a recordTherefore if there is no object 120573119895 left and 119896 objects 120575119895 havebeen generated in cell 1 at least 119896 records have been found tocontain the 119895th item The functions of 12057311989611989511198952 in cells 2 120573119896119895111989521198953in cell 3 and 1205731198961198951 sdotsdotsdot119895119905 in cell 119905 are similar to that of 120573119896119895 in cell1 The functions of 12057511989511198952 in cell 2 120575119895111989521198953 in cell 3 and 1205751198951 sdotsdotsdot119895119905in cell 119905 are similar to that of 120575119895 in cell 1 The objects 120572119895 for1 le 119895 le 119905 are used to store the items in the frequent 1-itemsetsusing their subscript For example 1205721 means the itemset 1198681is a frequent 1-itemset The functions of 12057211989511198952 in cell 2 120572119895111989521198953in cell 3 and 1205721198951sdotsdotsdot119895119905 in cell 119905 are similar to that of 120572119895 in cell1

The evolution rules are object rewriting rules similar tochemical reactions They take objects transform them intoother objects and may transport them to other cells

32 Computing Process

Input Cell 0 is the input cell The objects 119886119894119895 encoded fromthe transactional database and objects 120579119896 representing the

Discrete Dynamics in Nature and Society 5

support count threshold 119896 are entered into cell 0 to activatethe computation process Rule 11990301 is executed to put copies of119886119894119895 and 120579119896 to cells 1 2 119905

Frequent 1-ItemsetsGenerationFrequent 1-itemsets are gener-ated in cell 1 Rule 11990311 is executed to generate 120573

119896119895 for 1 le 119895 le 119905

Rule 11990313 is executed to detect all frequent 1-itemsets using theinternal flat maximally parallel mechanism in the P systemRule 11990312 cannot be executed because no object 120575119895 is in cell 1at this time The detection process of the candidate frequent1-itemset 1198681 is taken as an example The detection processesof other candidate frequent 1-itemsets are performed in thesame way Rule 11990313 is actually composed of multiple subrulesworking on objects with different subscripts If object 1198861198941 is incell 1 which means the 119894th record contains the first item thesubrule 11988611989411205731 rarr 1205751meets the execution condition and canbe executed If object 1198861198941 is not in cell 1 which means the 119894threcord does not contain the first item the subrule 11988611989411205731 rarr1205751 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 119896 records for theitemset 1198681 to be a frequent 1-itemset Each execution of asubrule consumes one 1205731 Therefore at most 119896 subrules ofthe form 11988611989411205731 rarr 1205751 can be executed in nondeterministicflat maximally parallel The checking process continues untilall objects 119886119894119895 have been checked or all of the 119896 copies of 1205731have been consumed If all of the 119896 copies of 1205731 have beenconsumed the first item appeared in at least 119896 records andthe itemset 1198681 is a frequent 1-itemset If some copies of 1205731are still in this cell after all objects 119886119894119895 have been checked theitemset 1198681 is not a frequent 1-itemset

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example If 119901copies of 1205731 have been consumed by rule 11990313 and 119896minus119901 copiesof 1205731 are still in this cell subrule 1205751199011120573

119896minus1199011 rarr 120582 is executed to

delete the objects 1205751199011 and 120573119896minus1199011 If all of the 119896 copies of 1205731 have

been consumed subrule (1205751198961)not1205731 rarr 1205721go is executed to putan object 1205721 to cells 2 and 119905 + 1 to indicate that the itemset1198681 is a frequent 1-itemset and to activate the computation incell 2 If no 1-itemset is a frequent 1-itemset the computationhalts

Frequent 2-Itemsets Generation The frequent 2-itemsets aregenerated in cell 2 Rule 11990321 is executed to obtain all candidatefrequent 2-itemsets using the internal flat maximally parallelmechanism in the P system The pair of empty parenthesesin this subrule indicates that no objects are consumedwhen this rule is executed The detection process of thecandidate frequent 2-itemset 1198681 1198682 is taken as an exampleThe detection processes of the other candidate frequent 2-itemsets are performed in the same way Rule 11990321 is actuallycomposed of multiple subrules working on objects withdifferent subscripts If objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets subrule()12057211205722120579119896not12057311989612 rarr 12057311989612 is executed to generate 12057311989612 The presenceof 12057311989612 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset

Rule 11990322 is executed to delete the redundant objects120572119895 thatwere used by rule 11990321 but are not needed anymore

Rule 11990324 is executed to detect all frequent 2-itemsetsusing the internal flat maximally parallel mechanism in theP system Rule 11990323 cannot be executed because no object12057511989511198952 is in cell 2 at this time The detection process of thefrequent 2-itemset 1198681 1198682 is taken as an example Rule 11990324 isactually composed of multiple subrules working on objectswith different subscripts If objects 1198861198941 and 1198861198942 are in cell2 which means the 119894th record contains the first and thesecond items subrule (12057312)11988611989411198861198942 rarr 12057512meets the executioncondition and can be executed If objects 1198861198941 and 1198861198942 are notboth in cell 2 which means the 119894th record does not containboth the first and the second items subrule (12057312)11988611989411198861198942 rarr12057512 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 12057312 are in cell 2 indicating thatboth the first and the second items need to appear togetherin at least 119896 records for the itemset 1198681 1198682 to be a frequent 2-itemset Each execution of these subrules consumes one 12057312Therefore at most 119896 subrules of the form (12057312)11988611989411198861198942 rarr 12057512can be executed in nondeterministic flat maximally parallelThe checking process continues until all objects 119886119894119895 have beenchecked or all of the 119896 copies of 12057312 have been consumed Ifall of the 119896 copies of 12057312 have been consumed the first andthe second items appeared together in at least 119896 records andthe itemset 1198681 1198682 is a frequent 2-itemset If some copies of12057312 are still in this cell after rule 11990324 is executed the itemset1198681 1198682 is not a frequent 2-itemset

Rule 11990323 is executed to process the results obtained by rule11990324 The 2-itemset 1198681 1198682 is again taken as an example If 119901copies of 12057312 have been consumed by rule 11990324 and 119896minus119901 copiesof objects 12057312 are still in this cell subrule 12057511990112120573

119896minus11990112 rarr 120582

is executed to delete the objects 12057511990112 and 120573119896minus11990112 If all of the119896 copies of 12057312 have been consumed subrule (12057511989612)not12057312 rarr12057212go is executed to put an object 12057212 to cells 3 and 119905 + 1 toindicate that the itemset 1198681 1198682 is a frequent 2-itemset and toactivate the computation in cell 3 If no 2-itemset is a frequent2-itemset the computation halts

Each cell 119895 for 3 le 119895 le 119905 has 4 rules which are similar tothose in cell 2 Each cell 119895 performs similar functions as cell 2does but for frequent 119895-itemsets

After the computation halts all the results that is objectsrepresenting the identified frequent itemsets are stored in cell119905 + 1

33 AlgorithmFlow TheconventionalApriori algorithmexe-cutes sequentially ECTPPI-Apriori uses the parallel mecha-nism of the ECPI tissue-like P system to execute in parallel Apseudocode of ECTPPI-Apriori is shown as in Algorithm 1

34 Time Complexity The time complexity of ECTPPI-Apri-ori in theworst case is analyzed Initially 1 computational stepis needed to put copies of 119886119894119895 and 120579119896 to cells 1 2 119905

Generating the frequent 1-itemsets needs 3 computationalsteps Generating the candidate frequent 1-itemsets 1198621 needs1 computational step Finding the support counts of thecandidate frequent 1-itemsets needs 1 computational step All

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

2 Discrete Dynamics in Nature and Society

perspective However the direct membrane algorithm needsto transform the whole traditional algorithm into P systemwhich is complex and difficult Up to date a few simplestudies on the direct membrane algorithm focus on thearithmetic operations the logic operations the generation ofgraphic language and clustering [17ndash21]

In this study a novel improved Apriori algorithm basedon an ECPI tissue-like P system (ECTPPI-Apriori) is pro-posed using the parallel mechanism in P systems The infor-mation communication between different computing unitsin ECTPPI-Apriori is implemented through the exchange ofmaterials between membranes Specifically all itemsets aresearched in parallel regulated by a set of promoters andinhibitors For a database with 119905 fields 119905 + 2 cells are usedin the algorithm where 1 cell is used to enter the data in thedatabase into the system 119905 cells are used to detect the frequentitemsets and one specific cell called output cell is used tostore the results The time complexity of ECTPPI-Apriori iscompared with those of other parallel Apriori algorithms toshow that the proposed algorithm is time saving

The contributions of this study are twofold From theviewpoint of data mining new bioinspired techniques areintroduced into frequent itemsets mining to improve theefficiency of the algorithms P systems are natural dis-tributed parallel computing devices which can improve thetime efficiency in computation Besides the hardware andsoftware implementations P systems can be implementedby biological methods The computing resources neededare only several cells which can decrease the computingresource requirements From the viewpoint of P systems theapplication areas of the new bioinspired devices P systems areextended to frequent itemset mining The applications basedon the direct membrane algorithms are limited This studyprovides a new application of P systems in frequent itemsetsmining which expands the application areas of the directmembrane algorithms

The paper is organized as follows Section 2 introducessome preliminaries about the Apriori algorithm and aboutthe ECPI tissue-like P systems The ECTPPI-Apriori algo-rithm using the parallel mechanism of the ECPI tissue-like Psystem is developed in Section 3 In Section 4 one illustrativeexample is used to show how the proposed algorithm worksComputational experiments using two datasets to show theperformance of the proposed algorithm in frequent itemsetsmining are reported in Section 5 Conclusions are given inSection 6

2 Preliminaries

In this section some basic concepts and notions in Apriorialgorithm [2] and ECPI tissue-like P system [7] are intro-duced

21 The Apriori Algorithm TheApriori algorithm is a typicalfrequent itemsetsmining algorithmproposed byAgrawal andSrikant [2] which aims at discovering relationships betweenitems in transactional databases

Definitions(i) Item a field in a transactional database is called

an item If one record contains a certain item ldquo1rdquootherwise ldquo0rdquo is placed in the corresponding field ofthe record in the transactional database

(ii) Itemset a set of items is called an itemset For nota-tional convenience an itemset with 119899 items 1198681 1198682 and 119868119899 is represented by 1198681 1198682 119868119899

(iii) h-itemset a set containing ℎ items is called a ℎ-itemset

(iv) Transaction a record in a transactional databaseis called a transaction and each transaction is anonempty itemset

(v) Support count the number of transactions containinga certain itemset is called the support count of theitemset Support count is also called the frequency orcount of the itemset

(vi) Frequent itemset if the support count of an itemset isequal to or larger than the given minimum supportcount threshold 119896 this itemset is called a frequentitemset

The general procedure of Apriori from Han et al [1] is asfollows

Input The database contains 119863 transactions and the supportcount threshold 119896

Step 1 Scan the database to compute the support count ofeach item and obtain the frequent 1-itemsets 1198711 Let ℎ = 2

Step 2 Obtain the candidate frequent ℎ-itemsets 119862ℎ byjoining two frequent (ℎ minus 1)-itemsets with only one itemdifferent

Step 3 Prune those itemsets which have infrequent subset oflength (ℎ minus 1) from the candidate frequent ℎ-itemsets

Step 4 Scan the database to compute the support countof each candidate frequent ℎ-itemset Delete those itemsetswhich do not meet the support count threshold 119896 and obtainthe frequent ℎ-itemsets 119871ℎ

Step 5 Let ℎ = ℎ + 1 Repeat Steps 2 to 4 until no itemsetmeets the support count threshold 119896

Output The collection of all frequent itemsets is representedby 119871

22 Evolution-CommunicationTissue-Like P Systemswith Pro-moters and Inhibitors Membrane computing is a new branchof natural computing which abstracts computing ideas fromthe construct and the functions of cells or tissues In thenature each organelle membrane or cell membrane worksas a relatively independent computing unit The amountand the types of materials in each organelle or cell changethrough chemical reactions Materials can flow between dif-ferent organelle or cell membranes to transport information

Discrete Dynamics in Nature and Society 3

Reactions in different organelles or cells take place in parallelwhile reactions in the same organelle or cell take place alsoin parallel These biological processes are abstracted as thecomputing processes of membrane computing The internalparallel feature makes membrane computing a powerfulcomputing method which has been proven to be equivalentto Turing machines [7ndash11]

The ECPI tissue-like P system composed of a networkof cells linked by synapses (channels) is a typical membranecomputing model The whole P system is divided intoseparate regions through these cells each forming one regionEach cell has two main components multisets of objects(materials) and rules also called evolution rules (chemicalreactions) Objects as information carriers are representedby characters

Rules regulate the ways objects evolve to new objectsand the ways objects in different cells communicate throughsynapses Rules are executed in nondeterministic flat maxi-mally parallel in each cell That is at any step if more thanone rule can be executed but the objects in the cell can onlysupport some of them then a maximal number of rules willbe executed and each rule can be executed for only once [22]

The computation halts if no rule can be executed in thewhole system The computational results are represented bythe types and numbers of specified objects in a specified cellBecause objects in a P system evolve in flatmaximally parallelregulated by promoters and inhibitors the systems computevery efficiently [10 22] Paun [7] provided more details aboutP systems

A formal description of the ECPI tissue-like P system isas follows

An ECPI tissue-like P system of degree119898 is of the form

Π = (119874 1205901 1205902 120590119898 syn 120588 119894out) (1)

where (1) 119874 represents the alphabets including all objects ofthe system (2) syn sube 1 2 119898times1 2 119898 represents allsynapses between the cells (3) 120588 defines the partial orderingrelationship of the rules that is rules with higher orders areexecuted with higher priority (4) 119894out represents the subscriptof the output cell where the computation results are placed(5) 1205901 120590119898 represent the119898 cells Each cell is of the form

120590ℎ = (119908ℎ0 119877ℎ) for 1 le ℎ le 119898 (2)

In (2)119908ℎ0 represents the initial objects in cell ℎ A119908ℎ0 =120582 means that there is no object in cell ℎ If 119886 representsan object 119886119899 represents the multiplicity of 119899 copies of suchobjects 119877ℎ in (2) represents a set of rules in cell ℎ with theform of 119908119911 rarr 119909119910go where 119908 is the multiset of objectsconsumed by the rule 119911 in the subscript is the promoter orthe inhibitor of the form 119911 = 1199111015840 or 119911 = not1199111015840 and 119909 and 119910 arethe multisets of objects generated by the rule A rule can beexecuted only when all objects in the promoter appear andcannot be executed when any objects in the inhibitor appearMultiset of objects 119909 stay in the current cell and multisetof objects 119910 go to the cells which have synapses connectedfrom the current cell The 119902th subset of rules in cell ℎ havingsimilar functions is represented by 119903ℎ119902 and the rules in thesame subset are connected by cup

0

1 2

middot middot middot

t

t + 1

Input

Figure 1 Cell structure for the ECTPPI-Apriori algorithm

3 The ECTPPI-Apriori Algorithm

In this section the structure of the P system used in theECTPPI-Apriori algorithm is presented first the computa-tional processes in different cells are then discussed in detaila pseudocode summarizing the operations is presented andan analysis of the algorithm complexity is provided

31 Algorithm and Rules Assume a transactional databasecontains119863 records and 119905 fields An object 119886119894119895 is generated onlyif the 119894th transaction contains the 119895th item 119868119895 (ie there is a1 in the corresponding field in the transactional database) Inthis way the database is transformed into objects a form thatthe P system can recognizeThe support count threshold is setto 119896 A cell structure with 119905 + 2 cells labeled by 0 1 119905 + 1as shown in Figure 1 is used as the framework for ECTPPI-Apriori The evolution rules are not shown in this figure dueto their length Transactional databases are usually sparseTherefore the number of objects represented by 119886119894119895 to beprocessed in this algorithm is much smaller than119863119905

When computation begins objects 119886119894119895 encoded from thetransactional database and object 120579119896 representing the supportcount threshold 119896 are entered into cell 0 Objects 119886119894119895 and120579119896 are passed to cells 1 2 119905 in parallel using a parallelevolution mechanism in tissue-like P systems The auxiliaryobjects 120573119896119895 are generated in cell 1 Next the frequent 1-itemsets are produced and objects representing frequent 1-itemsets are generated in cell 1 by executing the evolutionrules in parallel The objects representing the frequent 1-itemsets are passed to cells 2 and 119905 + 1 Cell 119905 + 1 isused to store the computational results The frequent 2-itemsets and the objects representing the frequent 2-itemsetsare produced in cell 2 by executing the evolution rules inparallel The objects representing the frequent 2-itemsets arepassed to cells 3 and 119905 + 1 This process continues until allfrequent itemsets have been produced As compared withthat of the conventional Apriori algorithm the computationaltime needed by ECTPPI-Apriori to generate the candidatefrequent ℎ-itemsets 119862ℎ and to compute the support countof each candidate frequent ℎ-itemset can be substantiallyreduced

4 Discrete Dynamics in Nature and Society

The ECPI tissue-like P system for ECTPPI-Apriori is asfollows

ΠApriori = (119874 1205900 1205901 120590119905+1 syn 120588 119894out) (3)

where (1) 119874 = 119886119894119895 120573119895 12057311989511198952 1205731198951 sdotsdotsdot119895119905 120575119895 12057511989511198952 1205751198951 sdotsdotsdot119895119905 12057211989512057211989511198952 1205721198951 sdotsdotsdot119895119905 for 1 le 119894 le 119863 1 le 119895 1198951 119895119905 le 119905 (2)syn = 0 1 0 2 0 119905 1 119905 + 1 2 119905 + 1 119905 119905 + 11 2 2 3 119905 minus 1 119905 (3) 120588 = 119903119894 gt 119903119895 | 119894 lt 119895 (4) 1205900 =(11990800 1198770) 1205901 = (11990810 1198771) 1205902 = (11990820 1198772) 120590119905 = (1199081199050 119877119905)(5) 119894out = 119905 + 1

In 1205900 = (11990800 1198770) 11990800 = 120582 and119877011990301 = 119886119894119895 rarr 119886119894119895go cup 120579119896 rarr 120579119896go

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205901 = (11990810 1198771) 11990810 = 120582 and119877111990311 = 120579119896 rarr 1205731198961 sdot sdot sdot 120573

119896119905

11990312 = 120575119901119895 120573119896minus119901119895 rarr 120582 cup (120575119896119895 )not120573119895 rarr 120572119895go

for 1 le 119895 le 119905 and 1 le 119901 le 11989611990313 = 119886119894119895120573119895 rarr 120575119895

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205902 = (11990820 1198772) 11990820 = 120582 and119877211990321 = ()12057211989511205721198952120579

119896not12057311989611989511198952rarr 12057311989611989511198952

for 1 le 1198951 lt 1198952 le 11990511990322 = 120572119895 rarr 120582 (1 le 119895 le 119905)

11990323 = 12057511990111989511198952120573119896minus11990111989511198952

rarr 120582 cup (12057511989611989511198952)not12057311989511198952rarr 12057211989511198952 go

for 1 le 1198951 lt 1198952 le 119905 and 1 le 119901 le 11989611990324 = (12057311989511198952)11988611989411989511198861198941198952

rarr 12057511989511198952

for 1 le 119894 le 119863 and 1 le 1198951 lt 1198952 le 119905

In 120590ℎ = (119908ℎ0 119877ℎ) 119908ℎ0 = 120582 and119877ℎ

119903ℎ1 = ()1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 1205721198951sdotsdotsdot119895ℎminus2119895ℎ2 120579119896not1205731198961198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr1205731198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le 119895ℎ1 119895ℎ2 le 119905119903ℎ2 = 1205721198951 sdotsdotsdot119895ℎminus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus1 le 119905

119903ℎ3 = 1205751199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2120573119896minus1199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)not1205731198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2 go

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 1 le 119895ℎ1 119895ℎ2 le 119905 and1 le 119901 le 119896

119903ℎ4 = (1205731198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)1198861198941198951 sdotsdotsdot119886119894119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205751198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le119895ℎ1 119895ℎ2 le 119905

In 120590119905 = (1199081199050 119877119905) 1199081199050 = 120582 and1198771199051199031199051 = ()1205721198951sdotsdotsdot119895119905minus21198951199051 1205721198951sdotsdotsdot119895119905minus21198951199052 120579

119896not1205731198961198951sdotsdotsdot119895119905minus21198951199051 1198951199052rarr 1205731198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le 1199051199031199052 = 1205721198951 sdotsdotsdot119895119905minus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus1 le 119905

1199031199053 = 1205751199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052120573119896minus1199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052)not1205731198951sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 1205721198951 sdotsdotsdot119895119905minus21198951199051 1198951199052 go

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 1 le 1198951199051 1198951199052 le 119905 and1 le 119901 le 1198961199031199054 = (1205731198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

)1198861198941198951 sdotsdotsdot119886119894119895119905minus21198951199051 1198951199052rarr 1205751198951 sdotsdotsdot119895tminus21198951199051 1198951199052

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le119905

In 120590119905+1 = (119908119905+10 119877119905+1) 119908119905+10 = 120582 and 119877119905+1 =

The auxiliary objects 120573119896119895 and 120575119895 for 1 le 119895 le 119905 are usedto detect the frequent 1-itemsets The auxiliary objects 120573119896119895store the items in the candidate frequent 1-itemsets using theirsubscripts For example object 1205731198961 means the itemset 1198681 is acandidate frequent 1-itemsetThe auxiliary objects 120575119895 are usedto identify the frequent 1-itemsets The 119896 copies of 120573119895 initiallyin cell 1 indicate that the 119895th item needs to appear in at least119896 records to make the itemset 119868119895 a frequent 1-itemset Oneobject 120573119895 is removed from and one object 120575119895 is generated incell 1 when one more of the 119895th item is detected in a recordTherefore if there is no object 120573119895 left and 119896 objects 120575119895 havebeen generated in cell 1 at least 119896 records have been found tocontain the 119895th item The functions of 12057311989611989511198952 in cells 2 120573119896119895111989521198953in cell 3 and 1205731198961198951 sdotsdotsdot119895119905 in cell 119905 are similar to that of 120573119896119895 in cell1 The functions of 12057511989511198952 in cell 2 120575119895111989521198953 in cell 3 and 1205751198951 sdotsdotsdot119895119905in cell 119905 are similar to that of 120575119895 in cell 1 The objects 120572119895 for1 le 119895 le 119905 are used to store the items in the frequent 1-itemsetsusing their subscript For example 1205721 means the itemset 1198681is a frequent 1-itemset The functions of 12057211989511198952 in cell 2 120572119895111989521198953in cell 3 and 1205721198951sdotsdotsdot119895119905 in cell 119905 are similar to that of 120572119895 in cell1

The evolution rules are object rewriting rules similar tochemical reactions They take objects transform them intoother objects and may transport them to other cells

32 Computing Process

Input Cell 0 is the input cell The objects 119886119894119895 encoded fromthe transactional database and objects 120579119896 representing the

Discrete Dynamics in Nature and Society 5

support count threshold 119896 are entered into cell 0 to activatethe computation process Rule 11990301 is executed to put copies of119886119894119895 and 120579119896 to cells 1 2 119905

Frequent 1-ItemsetsGenerationFrequent 1-itemsets are gener-ated in cell 1 Rule 11990311 is executed to generate 120573

119896119895 for 1 le 119895 le 119905

Rule 11990313 is executed to detect all frequent 1-itemsets using theinternal flat maximally parallel mechanism in the P systemRule 11990312 cannot be executed because no object 120575119895 is in cell 1at this time The detection process of the candidate frequent1-itemset 1198681 is taken as an example The detection processesof other candidate frequent 1-itemsets are performed in thesame way Rule 11990313 is actually composed of multiple subrulesworking on objects with different subscripts If object 1198861198941 is incell 1 which means the 119894th record contains the first item thesubrule 11988611989411205731 rarr 1205751meets the execution condition and canbe executed If object 1198861198941 is not in cell 1 which means the 119894threcord does not contain the first item the subrule 11988611989411205731 rarr1205751 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 119896 records for theitemset 1198681 to be a frequent 1-itemset Each execution of asubrule consumes one 1205731 Therefore at most 119896 subrules ofthe form 11988611989411205731 rarr 1205751 can be executed in nondeterministicflat maximally parallel The checking process continues untilall objects 119886119894119895 have been checked or all of the 119896 copies of 1205731have been consumed If all of the 119896 copies of 1205731 have beenconsumed the first item appeared in at least 119896 records andthe itemset 1198681 is a frequent 1-itemset If some copies of 1205731are still in this cell after all objects 119886119894119895 have been checked theitemset 1198681 is not a frequent 1-itemset

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example If 119901copies of 1205731 have been consumed by rule 11990313 and 119896minus119901 copiesof 1205731 are still in this cell subrule 1205751199011120573

119896minus1199011 rarr 120582 is executed to

delete the objects 1205751199011 and 120573119896minus1199011 If all of the 119896 copies of 1205731 have

been consumed subrule (1205751198961)not1205731 rarr 1205721go is executed to putan object 1205721 to cells 2 and 119905 + 1 to indicate that the itemset1198681 is a frequent 1-itemset and to activate the computation incell 2 If no 1-itemset is a frequent 1-itemset the computationhalts

Frequent 2-Itemsets Generation The frequent 2-itemsets aregenerated in cell 2 Rule 11990321 is executed to obtain all candidatefrequent 2-itemsets using the internal flat maximally parallelmechanism in the P system The pair of empty parenthesesin this subrule indicates that no objects are consumedwhen this rule is executed The detection process of thecandidate frequent 2-itemset 1198681 1198682 is taken as an exampleThe detection processes of the other candidate frequent 2-itemsets are performed in the same way Rule 11990321 is actuallycomposed of multiple subrules working on objects withdifferent subscripts If objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets subrule()12057211205722120579119896not12057311989612 rarr 12057311989612 is executed to generate 12057311989612 The presenceof 12057311989612 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset

Rule 11990322 is executed to delete the redundant objects120572119895 thatwere used by rule 11990321 but are not needed anymore

Rule 11990324 is executed to detect all frequent 2-itemsetsusing the internal flat maximally parallel mechanism in theP system Rule 11990323 cannot be executed because no object12057511989511198952 is in cell 2 at this time The detection process of thefrequent 2-itemset 1198681 1198682 is taken as an example Rule 11990324 isactually composed of multiple subrules working on objectswith different subscripts If objects 1198861198941 and 1198861198942 are in cell2 which means the 119894th record contains the first and thesecond items subrule (12057312)11988611989411198861198942 rarr 12057512meets the executioncondition and can be executed If objects 1198861198941 and 1198861198942 are notboth in cell 2 which means the 119894th record does not containboth the first and the second items subrule (12057312)11988611989411198861198942 rarr12057512 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 12057312 are in cell 2 indicating thatboth the first and the second items need to appear togetherin at least 119896 records for the itemset 1198681 1198682 to be a frequent 2-itemset Each execution of these subrules consumes one 12057312Therefore at most 119896 subrules of the form (12057312)11988611989411198861198942 rarr 12057512can be executed in nondeterministic flat maximally parallelThe checking process continues until all objects 119886119894119895 have beenchecked or all of the 119896 copies of 12057312 have been consumed Ifall of the 119896 copies of 12057312 have been consumed the first andthe second items appeared together in at least 119896 records andthe itemset 1198681 1198682 is a frequent 2-itemset If some copies of12057312 are still in this cell after rule 11990324 is executed the itemset1198681 1198682 is not a frequent 2-itemset

Rule 11990323 is executed to process the results obtained by rule11990324 The 2-itemset 1198681 1198682 is again taken as an example If 119901copies of 12057312 have been consumed by rule 11990324 and 119896minus119901 copiesof objects 12057312 are still in this cell subrule 12057511990112120573

119896minus11990112 rarr 120582

is executed to delete the objects 12057511990112 and 120573119896minus11990112 If all of the119896 copies of 12057312 have been consumed subrule (12057511989612)not12057312 rarr12057212go is executed to put an object 12057212 to cells 3 and 119905 + 1 toindicate that the itemset 1198681 1198682 is a frequent 2-itemset and toactivate the computation in cell 3 If no 2-itemset is a frequent2-itemset the computation halts

Each cell 119895 for 3 le 119895 le 119905 has 4 rules which are similar tothose in cell 2 Each cell 119895 performs similar functions as cell 2does but for frequent 119895-itemsets

After the computation halts all the results that is objectsrepresenting the identified frequent itemsets are stored in cell119905 + 1

33 AlgorithmFlow TheconventionalApriori algorithmexe-cutes sequentially ECTPPI-Apriori uses the parallel mecha-nism of the ECPI tissue-like P system to execute in parallel Apseudocode of ECTPPI-Apriori is shown as in Algorithm 1

34 Time Complexity The time complexity of ECTPPI-Apri-ori in theworst case is analyzed Initially 1 computational stepis needed to put copies of 119886119894119895 and 120579119896 to cells 1 2 119905

Generating the frequent 1-itemsets needs 3 computationalsteps Generating the candidate frequent 1-itemsets 1198621 needs1 computational step Finding the support counts of thecandidate frequent 1-itemsets needs 1 computational step All

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Discrete Dynamics in Nature and Society 3

Reactions in different organelles or cells take place in parallelwhile reactions in the same organelle or cell take place alsoin parallel These biological processes are abstracted as thecomputing processes of membrane computing The internalparallel feature makes membrane computing a powerfulcomputing method which has been proven to be equivalentto Turing machines [7ndash11]

The ECPI tissue-like P system composed of a networkof cells linked by synapses (channels) is a typical membranecomputing model The whole P system is divided intoseparate regions through these cells each forming one regionEach cell has two main components multisets of objects(materials) and rules also called evolution rules (chemicalreactions) Objects as information carriers are representedby characters

Rules regulate the ways objects evolve to new objectsand the ways objects in different cells communicate throughsynapses Rules are executed in nondeterministic flat maxi-mally parallel in each cell That is at any step if more thanone rule can be executed but the objects in the cell can onlysupport some of them then a maximal number of rules willbe executed and each rule can be executed for only once [22]

The computation halts if no rule can be executed in thewhole system The computational results are represented bythe types and numbers of specified objects in a specified cellBecause objects in a P system evolve in flatmaximally parallelregulated by promoters and inhibitors the systems computevery efficiently [10 22] Paun [7] provided more details aboutP systems

A formal description of the ECPI tissue-like P system isas follows

An ECPI tissue-like P system of degree119898 is of the form

Π = (119874 1205901 1205902 120590119898 syn 120588 119894out) (1)

where (1) 119874 represents the alphabets including all objects ofthe system (2) syn sube 1 2 119898times1 2 119898 represents allsynapses between the cells (3) 120588 defines the partial orderingrelationship of the rules that is rules with higher orders areexecuted with higher priority (4) 119894out represents the subscriptof the output cell where the computation results are placed(5) 1205901 120590119898 represent the119898 cells Each cell is of the form

120590ℎ = (119908ℎ0 119877ℎ) for 1 le ℎ le 119898 (2)

In (2)119908ℎ0 represents the initial objects in cell ℎ A119908ℎ0 =120582 means that there is no object in cell ℎ If 119886 representsan object 119886119899 represents the multiplicity of 119899 copies of suchobjects 119877ℎ in (2) represents a set of rules in cell ℎ with theform of 119908119911 rarr 119909119910go where 119908 is the multiset of objectsconsumed by the rule 119911 in the subscript is the promoter orthe inhibitor of the form 119911 = 1199111015840 or 119911 = not1199111015840 and 119909 and 119910 arethe multisets of objects generated by the rule A rule can beexecuted only when all objects in the promoter appear andcannot be executed when any objects in the inhibitor appearMultiset of objects 119909 stay in the current cell and multisetof objects 119910 go to the cells which have synapses connectedfrom the current cell The 119902th subset of rules in cell ℎ havingsimilar functions is represented by 119903ℎ119902 and the rules in thesame subset are connected by cup

0

1 2

middot middot middot

t

t + 1

Input

Figure 1 Cell structure for the ECTPPI-Apriori algorithm

3 The ECTPPI-Apriori Algorithm

In this section the structure of the P system used in theECTPPI-Apriori algorithm is presented first the computa-tional processes in different cells are then discussed in detaila pseudocode summarizing the operations is presented andan analysis of the algorithm complexity is provided

31 Algorithm and Rules Assume a transactional databasecontains119863 records and 119905 fields An object 119886119894119895 is generated onlyif the 119894th transaction contains the 119895th item 119868119895 (ie there is a1 in the corresponding field in the transactional database) Inthis way the database is transformed into objects a form thatthe P system can recognizeThe support count threshold is setto 119896 A cell structure with 119905 + 2 cells labeled by 0 1 119905 + 1as shown in Figure 1 is used as the framework for ECTPPI-Apriori The evolution rules are not shown in this figure dueto their length Transactional databases are usually sparseTherefore the number of objects represented by 119886119894119895 to beprocessed in this algorithm is much smaller than119863119905

When computation begins objects 119886119894119895 encoded from thetransactional database and object 120579119896 representing the supportcount threshold 119896 are entered into cell 0 Objects 119886119894119895 and120579119896 are passed to cells 1 2 119905 in parallel using a parallelevolution mechanism in tissue-like P systems The auxiliaryobjects 120573119896119895 are generated in cell 1 Next the frequent 1-itemsets are produced and objects representing frequent 1-itemsets are generated in cell 1 by executing the evolutionrules in parallel The objects representing the frequent 1-itemsets are passed to cells 2 and 119905 + 1 Cell 119905 + 1 isused to store the computational results The frequent 2-itemsets and the objects representing the frequent 2-itemsetsare produced in cell 2 by executing the evolution rules inparallel The objects representing the frequent 2-itemsets arepassed to cells 3 and 119905 + 1 This process continues until allfrequent itemsets have been produced As compared withthat of the conventional Apriori algorithm the computationaltime needed by ECTPPI-Apriori to generate the candidatefrequent ℎ-itemsets 119862ℎ and to compute the support countof each candidate frequent ℎ-itemset can be substantiallyreduced

4 Discrete Dynamics in Nature and Society

The ECPI tissue-like P system for ECTPPI-Apriori is asfollows

ΠApriori = (119874 1205900 1205901 120590119905+1 syn 120588 119894out) (3)

where (1) 119874 = 119886119894119895 120573119895 12057311989511198952 1205731198951 sdotsdotsdot119895119905 120575119895 12057511989511198952 1205751198951 sdotsdotsdot119895119905 12057211989512057211989511198952 1205721198951 sdotsdotsdot119895119905 for 1 le 119894 le 119863 1 le 119895 1198951 119895119905 le 119905 (2)syn = 0 1 0 2 0 119905 1 119905 + 1 2 119905 + 1 119905 119905 + 11 2 2 3 119905 minus 1 119905 (3) 120588 = 119903119894 gt 119903119895 | 119894 lt 119895 (4) 1205900 =(11990800 1198770) 1205901 = (11990810 1198771) 1205902 = (11990820 1198772) 120590119905 = (1199081199050 119877119905)(5) 119894out = 119905 + 1

In 1205900 = (11990800 1198770) 11990800 = 120582 and119877011990301 = 119886119894119895 rarr 119886119894119895go cup 120579119896 rarr 120579119896go

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205901 = (11990810 1198771) 11990810 = 120582 and119877111990311 = 120579119896 rarr 1205731198961 sdot sdot sdot 120573

119896119905

11990312 = 120575119901119895 120573119896minus119901119895 rarr 120582 cup (120575119896119895 )not120573119895 rarr 120572119895go

for 1 le 119895 le 119905 and 1 le 119901 le 11989611990313 = 119886119894119895120573119895 rarr 120575119895

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205902 = (11990820 1198772) 11990820 = 120582 and119877211990321 = ()12057211989511205721198952120579

119896not12057311989611989511198952rarr 12057311989611989511198952

for 1 le 1198951 lt 1198952 le 11990511990322 = 120572119895 rarr 120582 (1 le 119895 le 119905)

11990323 = 12057511990111989511198952120573119896minus11990111989511198952

rarr 120582 cup (12057511989611989511198952)not12057311989511198952rarr 12057211989511198952 go

for 1 le 1198951 lt 1198952 le 119905 and 1 le 119901 le 11989611990324 = (12057311989511198952)11988611989411989511198861198941198952

rarr 12057511989511198952

for 1 le 119894 le 119863 and 1 le 1198951 lt 1198952 le 119905

In 120590ℎ = (119908ℎ0 119877ℎ) 119908ℎ0 = 120582 and119877ℎ

119903ℎ1 = ()1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 1205721198951sdotsdotsdot119895ℎminus2119895ℎ2 120579119896not1205731198961198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr1205731198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le 119895ℎ1 119895ℎ2 le 119905119903ℎ2 = 1205721198951 sdotsdotsdot119895ℎminus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus1 le 119905

119903ℎ3 = 1205751199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2120573119896minus1199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)not1205731198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2 go

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 1 le 119895ℎ1 119895ℎ2 le 119905 and1 le 119901 le 119896

119903ℎ4 = (1205731198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)1198861198941198951 sdotsdotsdot119886119894119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205751198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le119895ℎ1 119895ℎ2 le 119905

In 120590119905 = (1199081199050 119877119905) 1199081199050 = 120582 and1198771199051199031199051 = ()1205721198951sdotsdotsdot119895119905minus21198951199051 1205721198951sdotsdotsdot119895119905minus21198951199052 120579

119896not1205731198961198951sdotsdotsdot119895119905minus21198951199051 1198951199052rarr 1205731198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le 1199051199031199052 = 1205721198951 sdotsdotsdot119895119905minus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus1 le 119905

1199031199053 = 1205751199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052120573119896minus1199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052)not1205731198951sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 1205721198951 sdotsdotsdot119895119905minus21198951199051 1198951199052 go

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 1 le 1198951199051 1198951199052 le 119905 and1 le 119901 le 1198961199031199054 = (1205731198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

)1198861198941198951 sdotsdotsdot119886119894119895119905minus21198951199051 1198951199052rarr 1205751198951 sdotsdotsdot119895tminus21198951199051 1198951199052

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le119905

In 120590119905+1 = (119908119905+10 119877119905+1) 119908119905+10 = 120582 and 119877119905+1 =

The auxiliary objects 120573119896119895 and 120575119895 for 1 le 119895 le 119905 are usedto detect the frequent 1-itemsets The auxiliary objects 120573119896119895store the items in the candidate frequent 1-itemsets using theirsubscripts For example object 1205731198961 means the itemset 1198681 is acandidate frequent 1-itemsetThe auxiliary objects 120575119895 are usedto identify the frequent 1-itemsets The 119896 copies of 120573119895 initiallyin cell 1 indicate that the 119895th item needs to appear in at least119896 records to make the itemset 119868119895 a frequent 1-itemset Oneobject 120573119895 is removed from and one object 120575119895 is generated incell 1 when one more of the 119895th item is detected in a recordTherefore if there is no object 120573119895 left and 119896 objects 120575119895 havebeen generated in cell 1 at least 119896 records have been found tocontain the 119895th item The functions of 12057311989611989511198952 in cells 2 120573119896119895111989521198953in cell 3 and 1205731198961198951 sdotsdotsdot119895119905 in cell 119905 are similar to that of 120573119896119895 in cell1 The functions of 12057511989511198952 in cell 2 120575119895111989521198953 in cell 3 and 1205751198951 sdotsdotsdot119895119905in cell 119905 are similar to that of 120575119895 in cell 1 The objects 120572119895 for1 le 119895 le 119905 are used to store the items in the frequent 1-itemsetsusing their subscript For example 1205721 means the itemset 1198681is a frequent 1-itemset The functions of 12057211989511198952 in cell 2 120572119895111989521198953in cell 3 and 1205721198951sdotsdotsdot119895119905 in cell 119905 are similar to that of 120572119895 in cell1

The evolution rules are object rewriting rules similar tochemical reactions They take objects transform them intoother objects and may transport them to other cells

32 Computing Process

Input Cell 0 is the input cell The objects 119886119894119895 encoded fromthe transactional database and objects 120579119896 representing the

Discrete Dynamics in Nature and Society 5

support count threshold 119896 are entered into cell 0 to activatethe computation process Rule 11990301 is executed to put copies of119886119894119895 and 120579119896 to cells 1 2 119905

Frequent 1-ItemsetsGenerationFrequent 1-itemsets are gener-ated in cell 1 Rule 11990311 is executed to generate 120573

119896119895 for 1 le 119895 le 119905

Rule 11990313 is executed to detect all frequent 1-itemsets using theinternal flat maximally parallel mechanism in the P systemRule 11990312 cannot be executed because no object 120575119895 is in cell 1at this time The detection process of the candidate frequent1-itemset 1198681 is taken as an example The detection processesof other candidate frequent 1-itemsets are performed in thesame way Rule 11990313 is actually composed of multiple subrulesworking on objects with different subscripts If object 1198861198941 is incell 1 which means the 119894th record contains the first item thesubrule 11988611989411205731 rarr 1205751meets the execution condition and canbe executed If object 1198861198941 is not in cell 1 which means the 119894threcord does not contain the first item the subrule 11988611989411205731 rarr1205751 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 119896 records for theitemset 1198681 to be a frequent 1-itemset Each execution of asubrule consumes one 1205731 Therefore at most 119896 subrules ofthe form 11988611989411205731 rarr 1205751 can be executed in nondeterministicflat maximally parallel The checking process continues untilall objects 119886119894119895 have been checked or all of the 119896 copies of 1205731have been consumed If all of the 119896 copies of 1205731 have beenconsumed the first item appeared in at least 119896 records andthe itemset 1198681 is a frequent 1-itemset If some copies of 1205731are still in this cell after all objects 119886119894119895 have been checked theitemset 1198681 is not a frequent 1-itemset

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example If 119901copies of 1205731 have been consumed by rule 11990313 and 119896minus119901 copiesof 1205731 are still in this cell subrule 1205751199011120573

119896minus1199011 rarr 120582 is executed to

delete the objects 1205751199011 and 120573119896minus1199011 If all of the 119896 copies of 1205731 have

been consumed subrule (1205751198961)not1205731 rarr 1205721go is executed to putan object 1205721 to cells 2 and 119905 + 1 to indicate that the itemset1198681 is a frequent 1-itemset and to activate the computation incell 2 If no 1-itemset is a frequent 1-itemset the computationhalts

Frequent 2-Itemsets Generation The frequent 2-itemsets aregenerated in cell 2 Rule 11990321 is executed to obtain all candidatefrequent 2-itemsets using the internal flat maximally parallelmechanism in the P system The pair of empty parenthesesin this subrule indicates that no objects are consumedwhen this rule is executed The detection process of thecandidate frequent 2-itemset 1198681 1198682 is taken as an exampleThe detection processes of the other candidate frequent 2-itemsets are performed in the same way Rule 11990321 is actuallycomposed of multiple subrules working on objects withdifferent subscripts If objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets subrule()12057211205722120579119896not12057311989612 rarr 12057311989612 is executed to generate 12057311989612 The presenceof 12057311989612 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset

Rule 11990322 is executed to delete the redundant objects120572119895 thatwere used by rule 11990321 but are not needed anymore

Rule 11990324 is executed to detect all frequent 2-itemsetsusing the internal flat maximally parallel mechanism in theP system Rule 11990323 cannot be executed because no object12057511989511198952 is in cell 2 at this time The detection process of thefrequent 2-itemset 1198681 1198682 is taken as an example Rule 11990324 isactually composed of multiple subrules working on objectswith different subscripts If objects 1198861198941 and 1198861198942 are in cell2 which means the 119894th record contains the first and thesecond items subrule (12057312)11988611989411198861198942 rarr 12057512meets the executioncondition and can be executed If objects 1198861198941 and 1198861198942 are notboth in cell 2 which means the 119894th record does not containboth the first and the second items subrule (12057312)11988611989411198861198942 rarr12057512 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 12057312 are in cell 2 indicating thatboth the first and the second items need to appear togetherin at least 119896 records for the itemset 1198681 1198682 to be a frequent 2-itemset Each execution of these subrules consumes one 12057312Therefore at most 119896 subrules of the form (12057312)11988611989411198861198942 rarr 12057512can be executed in nondeterministic flat maximally parallelThe checking process continues until all objects 119886119894119895 have beenchecked or all of the 119896 copies of 12057312 have been consumed Ifall of the 119896 copies of 12057312 have been consumed the first andthe second items appeared together in at least 119896 records andthe itemset 1198681 1198682 is a frequent 2-itemset If some copies of12057312 are still in this cell after rule 11990324 is executed the itemset1198681 1198682 is not a frequent 2-itemset

Rule 11990323 is executed to process the results obtained by rule11990324 The 2-itemset 1198681 1198682 is again taken as an example If 119901copies of 12057312 have been consumed by rule 11990324 and 119896minus119901 copiesof objects 12057312 are still in this cell subrule 12057511990112120573

119896minus11990112 rarr 120582

is executed to delete the objects 12057511990112 and 120573119896minus11990112 If all of the119896 copies of 12057312 have been consumed subrule (12057511989612)not12057312 rarr12057212go is executed to put an object 12057212 to cells 3 and 119905 + 1 toindicate that the itemset 1198681 1198682 is a frequent 2-itemset and toactivate the computation in cell 3 If no 2-itemset is a frequent2-itemset the computation halts

Each cell 119895 for 3 le 119895 le 119905 has 4 rules which are similar tothose in cell 2 Each cell 119895 performs similar functions as cell 2does but for frequent 119895-itemsets

After the computation halts all the results that is objectsrepresenting the identified frequent itemsets are stored in cell119905 + 1

33 AlgorithmFlow TheconventionalApriori algorithmexe-cutes sequentially ECTPPI-Apriori uses the parallel mecha-nism of the ECPI tissue-like P system to execute in parallel Apseudocode of ECTPPI-Apriori is shown as in Algorithm 1

34 Time Complexity The time complexity of ECTPPI-Apri-ori in theworst case is analyzed Initially 1 computational stepis needed to put copies of 119886119894119895 and 120579119896 to cells 1 2 119905

Generating the frequent 1-itemsets needs 3 computationalsteps Generating the candidate frequent 1-itemsets 1198621 needs1 computational step Finding the support counts of thecandidate frequent 1-itemsets needs 1 computational step All

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

4 Discrete Dynamics in Nature and Society

The ECPI tissue-like P system for ECTPPI-Apriori is asfollows

ΠApriori = (119874 1205900 1205901 120590119905+1 syn 120588 119894out) (3)

where (1) 119874 = 119886119894119895 120573119895 12057311989511198952 1205731198951 sdotsdotsdot119895119905 120575119895 12057511989511198952 1205751198951 sdotsdotsdot119895119905 12057211989512057211989511198952 1205721198951 sdotsdotsdot119895119905 for 1 le 119894 le 119863 1 le 119895 1198951 119895119905 le 119905 (2)syn = 0 1 0 2 0 119905 1 119905 + 1 2 119905 + 1 119905 119905 + 11 2 2 3 119905 minus 1 119905 (3) 120588 = 119903119894 gt 119903119895 | 119894 lt 119895 (4) 1205900 =(11990800 1198770) 1205901 = (11990810 1198771) 1205902 = (11990820 1198772) 120590119905 = (1199081199050 119877119905)(5) 119894out = 119905 + 1

In 1205900 = (11990800 1198770) 11990800 = 120582 and119877011990301 = 119886119894119895 rarr 119886119894119895go cup 120579119896 rarr 120579119896go

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205901 = (11990810 1198771) 11990810 = 120582 and119877111990311 = 120579119896 rarr 1205731198961 sdot sdot sdot 120573

119896119905

11990312 = 120575119901119895 120573119896minus119901119895 rarr 120582 cup (120575119896119895 )not120573119895 rarr 120572119895go

for 1 le 119895 le 119905 and 1 le 119901 le 11989611990313 = 119886119894119895120573119895 rarr 120575119895

for 1 le 119894 le 119863 and 1 le 119895 le 119905

In 1205902 = (11990820 1198772) 11990820 = 120582 and119877211990321 = ()12057211989511205721198952120579

119896not12057311989611989511198952rarr 12057311989611989511198952

for 1 le 1198951 lt 1198952 le 11990511990322 = 120572119895 rarr 120582 (1 le 119895 le 119905)

11990323 = 12057511990111989511198952120573119896minus11990111989511198952

rarr 120582 cup (12057511989611989511198952)not12057311989511198952rarr 12057211989511198952 go

for 1 le 1198951 lt 1198952 le 119905 and 1 le 119901 le 11989611990324 = (12057311989511198952)11988611989411989511198861198941198952

rarr 12057511989511198952

for 1 le 119894 le 119863 and 1 le 1198951 lt 1198952 le 119905

In 120590ℎ = (119908ℎ0 119877ℎ) 119908ℎ0 = 120582 and119877ℎ

119903ℎ1 = ()1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 1205721198951sdotsdotsdot119895ℎminus2119895ℎ2 120579119896not1205731198961198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr1205731198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le 119895ℎ1 119895ℎ2 le 119905119903ℎ2 = 1205721198951 sdotsdotsdot119895ℎminus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus1 le 119905

119903ℎ3 = 1205751199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2120573119896minus1199011198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)not1205731198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205721198951sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2 go

for 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 1 le 119895ℎ1 119895ℎ2 le 119905 and1 le 119901 le 119896

119903ℎ4 = (1205731198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2)1198861198941198951 sdotsdotsdot119886119894119895ℎminus2119895ℎ1 119895ℎ2

rarr 1205751198951 sdotsdotsdot119895ℎminus2119895ℎ1 119895ℎ2

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895ℎminus2 le 119905 and 1 le119895ℎ1 119895ℎ2 le 119905

In 120590119905 = (1199081199050 119877119905) 1199081199050 = 120582 and1198771199051199031199051 = ()1205721198951sdotsdotsdot119895119905minus21198951199051 1205721198951sdotsdotsdot119895119905minus21198951199052 120579

119896not1205731198961198951sdotsdotsdot119895119905minus21198951199051 1198951199052rarr 1205731198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le 1199051199031199052 = 1205721198951 sdotsdotsdot119895119905minus1 rarr 120582

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus1 le 119905

1199031199053 = 1205751199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052120573119896minus1199011198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 120582 cup

(1205751198961198951 sdotsdotsdot119895119905minus21198951199051 1198951199052)not1205731198951sdotsdotsdot119895119905minus21198951199051 1198951199052

rarr 1205721198951 sdotsdotsdot119895119905minus21198951199051 1198951199052 go

for 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 1 le 1198951199051 1198951199052 le 119905 and1 le 119901 le 1198961199031199054 = (1205731198951 sdotsdotsdot119895119905minus21198951199051 1198951199052

)1198861198941198951 sdotsdotsdot119886119894119895119905minus21198951199051 1198951199052rarr 1205751198951 sdotsdotsdot119895tminus21198951199051 1198951199052

for 1 le 119894 le 119863 1 le 1198951 lt sdot sdot sdot lt 119895119905minus2 le 119905 and 1 le 1198951199051 1198951199052 le119905

In 120590119905+1 = (119908119905+10 119877119905+1) 119908119905+10 = 120582 and 119877119905+1 =

The auxiliary objects 120573119896119895 and 120575119895 for 1 le 119895 le 119905 are usedto detect the frequent 1-itemsets The auxiliary objects 120573119896119895store the items in the candidate frequent 1-itemsets using theirsubscripts For example object 1205731198961 means the itemset 1198681 is acandidate frequent 1-itemsetThe auxiliary objects 120575119895 are usedto identify the frequent 1-itemsets The 119896 copies of 120573119895 initiallyin cell 1 indicate that the 119895th item needs to appear in at least119896 records to make the itemset 119868119895 a frequent 1-itemset Oneobject 120573119895 is removed from and one object 120575119895 is generated incell 1 when one more of the 119895th item is detected in a recordTherefore if there is no object 120573119895 left and 119896 objects 120575119895 havebeen generated in cell 1 at least 119896 records have been found tocontain the 119895th item The functions of 12057311989611989511198952 in cells 2 120573119896119895111989521198953in cell 3 and 1205731198961198951 sdotsdotsdot119895119905 in cell 119905 are similar to that of 120573119896119895 in cell1 The functions of 12057511989511198952 in cell 2 120575119895111989521198953 in cell 3 and 1205751198951 sdotsdotsdot119895119905in cell 119905 are similar to that of 120575119895 in cell 1 The objects 120572119895 for1 le 119895 le 119905 are used to store the items in the frequent 1-itemsetsusing their subscript For example 1205721 means the itemset 1198681is a frequent 1-itemset The functions of 12057211989511198952 in cell 2 120572119895111989521198953in cell 3 and 1205721198951sdotsdotsdot119895119905 in cell 119905 are similar to that of 120572119895 in cell1

The evolution rules are object rewriting rules similar tochemical reactions They take objects transform them intoother objects and may transport them to other cells

32 Computing Process

Input Cell 0 is the input cell The objects 119886119894119895 encoded fromthe transactional database and objects 120579119896 representing the

Discrete Dynamics in Nature and Society 5

support count threshold 119896 are entered into cell 0 to activatethe computation process Rule 11990301 is executed to put copies of119886119894119895 and 120579119896 to cells 1 2 119905

Frequent 1-ItemsetsGenerationFrequent 1-itemsets are gener-ated in cell 1 Rule 11990311 is executed to generate 120573

119896119895 for 1 le 119895 le 119905

Rule 11990313 is executed to detect all frequent 1-itemsets using theinternal flat maximally parallel mechanism in the P systemRule 11990312 cannot be executed because no object 120575119895 is in cell 1at this time The detection process of the candidate frequent1-itemset 1198681 is taken as an example The detection processesof other candidate frequent 1-itemsets are performed in thesame way Rule 11990313 is actually composed of multiple subrulesworking on objects with different subscripts If object 1198861198941 is incell 1 which means the 119894th record contains the first item thesubrule 11988611989411205731 rarr 1205751meets the execution condition and canbe executed If object 1198861198941 is not in cell 1 which means the 119894threcord does not contain the first item the subrule 11988611989411205731 rarr1205751 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 119896 records for theitemset 1198681 to be a frequent 1-itemset Each execution of asubrule consumes one 1205731 Therefore at most 119896 subrules ofthe form 11988611989411205731 rarr 1205751 can be executed in nondeterministicflat maximally parallel The checking process continues untilall objects 119886119894119895 have been checked or all of the 119896 copies of 1205731have been consumed If all of the 119896 copies of 1205731 have beenconsumed the first item appeared in at least 119896 records andthe itemset 1198681 is a frequent 1-itemset If some copies of 1205731are still in this cell after all objects 119886119894119895 have been checked theitemset 1198681 is not a frequent 1-itemset

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example If 119901copies of 1205731 have been consumed by rule 11990313 and 119896minus119901 copiesof 1205731 are still in this cell subrule 1205751199011120573

119896minus1199011 rarr 120582 is executed to

delete the objects 1205751199011 and 120573119896minus1199011 If all of the 119896 copies of 1205731 have

been consumed subrule (1205751198961)not1205731 rarr 1205721go is executed to putan object 1205721 to cells 2 and 119905 + 1 to indicate that the itemset1198681 is a frequent 1-itemset and to activate the computation incell 2 If no 1-itemset is a frequent 1-itemset the computationhalts

Frequent 2-Itemsets Generation The frequent 2-itemsets aregenerated in cell 2 Rule 11990321 is executed to obtain all candidatefrequent 2-itemsets using the internal flat maximally parallelmechanism in the P system The pair of empty parenthesesin this subrule indicates that no objects are consumedwhen this rule is executed The detection process of thecandidate frequent 2-itemset 1198681 1198682 is taken as an exampleThe detection processes of the other candidate frequent 2-itemsets are performed in the same way Rule 11990321 is actuallycomposed of multiple subrules working on objects withdifferent subscripts If objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets subrule()12057211205722120579119896not12057311989612 rarr 12057311989612 is executed to generate 12057311989612 The presenceof 12057311989612 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset

Rule 11990322 is executed to delete the redundant objects120572119895 thatwere used by rule 11990321 but are not needed anymore

Rule 11990324 is executed to detect all frequent 2-itemsetsusing the internal flat maximally parallel mechanism in theP system Rule 11990323 cannot be executed because no object12057511989511198952 is in cell 2 at this time The detection process of thefrequent 2-itemset 1198681 1198682 is taken as an example Rule 11990324 isactually composed of multiple subrules working on objectswith different subscripts If objects 1198861198941 and 1198861198942 are in cell2 which means the 119894th record contains the first and thesecond items subrule (12057312)11988611989411198861198942 rarr 12057512meets the executioncondition and can be executed If objects 1198861198941 and 1198861198942 are notboth in cell 2 which means the 119894th record does not containboth the first and the second items subrule (12057312)11988611989411198861198942 rarr12057512 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 12057312 are in cell 2 indicating thatboth the first and the second items need to appear togetherin at least 119896 records for the itemset 1198681 1198682 to be a frequent 2-itemset Each execution of these subrules consumes one 12057312Therefore at most 119896 subrules of the form (12057312)11988611989411198861198942 rarr 12057512can be executed in nondeterministic flat maximally parallelThe checking process continues until all objects 119886119894119895 have beenchecked or all of the 119896 copies of 12057312 have been consumed Ifall of the 119896 copies of 12057312 have been consumed the first andthe second items appeared together in at least 119896 records andthe itemset 1198681 1198682 is a frequent 2-itemset If some copies of12057312 are still in this cell after rule 11990324 is executed the itemset1198681 1198682 is not a frequent 2-itemset

Rule 11990323 is executed to process the results obtained by rule11990324 The 2-itemset 1198681 1198682 is again taken as an example If 119901copies of 12057312 have been consumed by rule 11990324 and 119896minus119901 copiesof objects 12057312 are still in this cell subrule 12057511990112120573

119896minus11990112 rarr 120582

is executed to delete the objects 12057511990112 and 120573119896minus11990112 If all of the119896 copies of 12057312 have been consumed subrule (12057511989612)not12057312 rarr12057212go is executed to put an object 12057212 to cells 3 and 119905 + 1 toindicate that the itemset 1198681 1198682 is a frequent 2-itemset and toactivate the computation in cell 3 If no 2-itemset is a frequent2-itemset the computation halts

Each cell 119895 for 3 le 119895 le 119905 has 4 rules which are similar tothose in cell 2 Each cell 119895 performs similar functions as cell 2does but for frequent 119895-itemsets

After the computation halts all the results that is objectsrepresenting the identified frequent itemsets are stored in cell119905 + 1

33 AlgorithmFlow TheconventionalApriori algorithmexe-cutes sequentially ECTPPI-Apriori uses the parallel mecha-nism of the ECPI tissue-like P system to execute in parallel Apseudocode of ECTPPI-Apriori is shown as in Algorithm 1

34 Time Complexity The time complexity of ECTPPI-Apri-ori in theworst case is analyzed Initially 1 computational stepis needed to put copies of 119886119894119895 and 120579119896 to cells 1 2 119905

Generating the frequent 1-itemsets needs 3 computationalsteps Generating the candidate frequent 1-itemsets 1198621 needs1 computational step Finding the support counts of thecandidate frequent 1-itemsets needs 1 computational step All

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Discrete Dynamics in Nature and Society 5

support count threshold 119896 are entered into cell 0 to activatethe computation process Rule 11990301 is executed to put copies of119886119894119895 and 120579119896 to cells 1 2 119905

Frequent 1-ItemsetsGenerationFrequent 1-itemsets are gener-ated in cell 1 Rule 11990311 is executed to generate 120573

119896119895 for 1 le 119895 le 119905

Rule 11990313 is executed to detect all frequent 1-itemsets using theinternal flat maximally parallel mechanism in the P systemRule 11990312 cannot be executed because no object 120575119895 is in cell 1at this time The detection process of the candidate frequent1-itemset 1198681 is taken as an example The detection processesof other candidate frequent 1-itemsets are performed in thesame way Rule 11990313 is actually composed of multiple subrulesworking on objects with different subscripts If object 1198861198941 is incell 1 which means the 119894th record contains the first item thesubrule 11988611989411205731 rarr 1205751meets the execution condition and canbe executed If object 1198861198941 is not in cell 1 which means the 119894threcord does not contain the first item the subrule 11988611989411205731 rarr1205751 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 119896 records for theitemset 1198681 to be a frequent 1-itemset Each execution of asubrule consumes one 1205731 Therefore at most 119896 subrules ofthe form 11988611989411205731 rarr 1205751 can be executed in nondeterministicflat maximally parallel The checking process continues untilall objects 119886119894119895 have been checked or all of the 119896 copies of 1205731have been consumed If all of the 119896 copies of 1205731 have beenconsumed the first item appeared in at least 119896 records andthe itemset 1198681 is a frequent 1-itemset If some copies of 1205731are still in this cell after all objects 119886119894119895 have been checked theitemset 1198681 is not a frequent 1-itemset

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example If 119901copies of 1205731 have been consumed by rule 11990313 and 119896minus119901 copiesof 1205731 are still in this cell subrule 1205751199011120573

119896minus1199011 rarr 120582 is executed to

delete the objects 1205751199011 and 120573119896minus1199011 If all of the 119896 copies of 1205731 have

been consumed subrule (1205751198961)not1205731 rarr 1205721go is executed to putan object 1205721 to cells 2 and 119905 + 1 to indicate that the itemset1198681 is a frequent 1-itemset and to activate the computation incell 2 If no 1-itemset is a frequent 1-itemset the computationhalts

Frequent 2-Itemsets Generation The frequent 2-itemsets aregenerated in cell 2 Rule 11990321 is executed to obtain all candidatefrequent 2-itemsets using the internal flat maximally parallelmechanism in the P system The pair of empty parenthesesin this subrule indicates that no objects are consumedwhen this rule is executed The detection process of thecandidate frequent 2-itemset 1198681 1198682 is taken as an exampleThe detection processes of the other candidate frequent 2-itemsets are performed in the same way Rule 11990321 is actuallycomposed of multiple subrules working on objects withdifferent subscripts If objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets subrule()12057211205722120579119896not12057311989612 rarr 12057311989612 is executed to generate 12057311989612 The presenceof 12057311989612 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset

Rule 11990322 is executed to delete the redundant objects120572119895 thatwere used by rule 11990321 but are not needed anymore

Rule 11990324 is executed to detect all frequent 2-itemsetsusing the internal flat maximally parallel mechanism in theP system Rule 11990323 cannot be executed because no object12057511989511198952 is in cell 2 at this time The detection process of thefrequent 2-itemset 1198681 1198682 is taken as an example Rule 11990324 isactually composed of multiple subrules working on objectswith different subscripts If objects 1198861198941 and 1198861198942 are in cell2 which means the 119894th record contains the first and thesecond items subrule (12057312)11988611989411198861198942 rarr 12057512meets the executioncondition and can be executed If objects 1198861198941 and 1198861198942 are notboth in cell 2 which means the 119894th record does not containboth the first and the second items subrule (12057312)11988611989411198861198942 rarr12057512 does not meet the execution condition and cannot beexecuted Initially 119896 copies of 12057312 are in cell 2 indicating thatboth the first and the second items need to appear togetherin at least 119896 records for the itemset 1198681 1198682 to be a frequent 2-itemset Each execution of these subrules consumes one 12057312Therefore at most 119896 subrules of the form (12057312)11988611989411198861198942 rarr 12057512can be executed in nondeterministic flat maximally parallelThe checking process continues until all objects 119886119894119895 have beenchecked or all of the 119896 copies of 12057312 have been consumed Ifall of the 119896 copies of 12057312 have been consumed the first andthe second items appeared together in at least 119896 records andthe itemset 1198681 1198682 is a frequent 2-itemset If some copies of12057312 are still in this cell after rule 11990324 is executed the itemset1198681 1198682 is not a frequent 2-itemset

Rule 11990323 is executed to process the results obtained by rule11990324 The 2-itemset 1198681 1198682 is again taken as an example If 119901copies of 12057312 have been consumed by rule 11990324 and 119896minus119901 copiesof objects 12057312 are still in this cell subrule 12057511990112120573

119896minus11990112 rarr 120582

is executed to delete the objects 12057511990112 and 120573119896minus11990112 If all of the119896 copies of 12057312 have been consumed subrule (12057511989612)not12057312 rarr12057212go is executed to put an object 12057212 to cells 3 and 119905 + 1 toindicate that the itemset 1198681 1198682 is a frequent 2-itemset and toactivate the computation in cell 3 If no 2-itemset is a frequent2-itemset the computation halts

Each cell 119895 for 3 le 119895 le 119905 has 4 rules which are similar tothose in cell 2 Each cell 119895 performs similar functions as cell 2does but for frequent 119895-itemsets

After the computation halts all the results that is objectsrepresenting the identified frequent itemsets are stored in cell119905 + 1

33 AlgorithmFlow TheconventionalApriori algorithmexe-cutes sequentially ECTPPI-Apriori uses the parallel mecha-nism of the ECPI tissue-like P system to execute in parallel Apseudocode of ECTPPI-Apriori is shown as in Algorithm 1

34 Time Complexity The time complexity of ECTPPI-Apri-ori in theworst case is analyzed Initially 1 computational stepis needed to put copies of 119886119894119895 and 120579119896 to cells 1 2 119905

Generating the frequent 1-itemsets needs 3 computationalsteps Generating the candidate frequent 1-itemsets 1198621 needs1 computational step Finding the support counts of thecandidate frequent 1-itemsets needs 1 computational step All

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

6 Discrete Dynamics in Nature and Society

InputThe objects 119886119894119895 encoded from the transactional database and objects 120579119896 representing the support countthreshold 119896Rule 11990301Copy all 119886119894119895 and 120579119896 to cells 1 to 119905 + 1

MethodRule 11990311Generate 120573119896119895 for 1 le 119895 le 119905 to form the candidate frequent 1-itemsets 1198621Rule 11990313Scan each object 119886119894119895 in cell 1 to count the frequency of each item If 119886119894119895 is in cell 1 consume one 120573119895 andgenerate one 120575119895 Continue until all 119896 copies of 120573119895 have been consumed or all objects 119886119894119895 have been scannedRule 11990312If all 119896 copies of 120573119895 have been consumed generate an object 120572119895 to add 119868119895 to 1198711 as a frequent 1-itemsetand pass 120572119895 to cells 2 and 119905 + 1 Delete all remaining copies of 120573119895 and delete all copies of 120575119895

For (2 le ℎ le 119905 and 119871ℎminus1 = ) do the following in cell ℎRule 119903ℎ1Scan the objects 1205721198951 sdotsdotsdot119895ℎminus1 representing the frequent (ℎ minus 1)-itemsets 119871ℎminus1 to generate the objects 120573

1198961198951 sdotsdotsdot119895ℎ

representing the candidate frequent ℎ-itemsets 119862ℎRule 119903ℎ2Delete all objects 1205721198951 sdotsdotsdot119895ℎminus1 after they have been used by rule 119903ℎ1Rule 119903ℎ4Scan the objects 119886119894119895 representing the database to count the frequency of each candidate frequent ℎ-itemset119862ℎ If the objects 1198861198941198951 119886119894119895ℎminus1 and 119886119894119895ℎ are all in cell ℎ consume one 1205731198951 sdotsdotsdot119895ℎ and generate one 1205751198951 sdotsdotsdot119895ℎ Continue until all copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed or all objects 119886119894119895 have been scannedRule 119903ℎ3If all 119896 copies of 1205731198951 sdotsdotsdot119895ℎ have been consumed generate 1205721198951 sdotsdotsdot119895ℎ to add 1198681198951 119868119895ℎ to 119871ℎ as a frequentℎ-itemset and put 1205721198951 sdotsdotsdot119895ℎ in cells ℎ + 1 and 119905 + 1 Delete all remaining copies of 1205731198951 sdotsdotsdot119895ℎ and delete all copiesof 1205751198951 sdotsdotsdot119895ℎ Let ℎ = ℎ + 1

OutputThe collection of all frequent itemsets 119871 encoded by objects 120572119895 12057211989511198952 1205721198951 sdotsdotsdot119895119899

Algorithm 1 ECTP-Apriori algorithm

candidate frequent 1-itemsets in the database are checked inthe flat maximally parallel Passing the results of the frequent1-itemsets to cells 2 and 119905 + 1 needs 1 computational step

Generating the frequent ℎ-itemsets (1 lt ℎ le 119905) needs4 computational steps Generating the candidate frequent ℎ-itemsets119862ℎ needs 1 computational step Cleaning thememoryused by the objects needs 1 computational step Finding thesupport counts of the candidate frequent ℎ-itemsets needs 1computational step All candidate frequent ℎ-itemsets in thedatabase are checked in flat maximally parallel Passing theresults of the frequent ℎ-itemsets to cells ℎ+1 and 119905 + 1 needs1 computational step

Therefore the time complexity of ECTPPI-Apriori is 1 +3 + 4(119905 minus 1) = 4119905 which gives 119874(119905) Note that 119874 is usedtraditionally to indicate the time complexity of an algorithmand the 119874 used here has a different meaning from that usedearlier when ECTPPI-Apriori is described

Some comparison results between ECTPPI-Apriori andthe original as well as some other improved parallel Apriorialgorithms are shown in Table 1 where |119862119896| is the number

Table 1 Time complexities of some Apriori algorithms

Algorithm Time complexityApriori [2] 119874 (119863119905 1003816100381610038161003816119862119896

1003816100381610038161003816 + 119905 1003816100381610038161003816119871119896minus110038161003816100381610038161003816100381610038161003816119871119896minus1

1003816100381610038161003816)

Apriori based on the booleanmatrix and Hadoop [3] 119874(119863119905)

Multi-GPU-based Apriori [4] 119874(119905|119871119896minus1||119871119896minus1|)

PApriori [5] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

Parallel Apriori algorithm onHadoop cluster [6] 119874(119863119905|119862119896| + 119905|119871119896minus1||119871119896minus1|)

ECTPPI-Apriori 119874(119905)

of candidate frequent 119896-itemsets and |119871119896minus1| is the number offrequent 119896 minus 1-itemsets

4 An Illustrative Example

An illustrative example is presented in this section todemonstrate how ECTPPI-Apriori works Table 2 shows the

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Discrete Dynamics in Nature and Society 7

Table 2 The transactional database of the illustrative example

TID ItemsT100 1198681 1198682 1198685T200 1198682 1198684T300 1198682 1198683T400 1198681 1198682 1198684T500 1198681 1198683T600 1198682 1198683T700 1198681 1198683T800 1198681 1198682 1198683 1198685T900 1198681 1198682 1198683

transactional database of one branch office of All Electronics[1] There are 9 transactions and 5 fields in this database thatis 119863 = 9 and 119905 = 5 Suppose the support count threshold is119896 = 2 The computational processes are as follows

Input The database is transformed into objects 11988611 11988612 1198861511988622 11988624 11988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 1198868111988682 11988683 11988685 11988691 11988692 and 11988693 in a form that the P systemcan recognize These objects and objects 1205792 representing thesupport count threshold 119896 = 2 are entered into cell 0 toactivate the computation process Rule 11990301 = 119886119894119895 rarr 119886119894119895go cup

120579119896 rarr 120579119896go is executed to put copies of 11988611 11988612 11988615 11988622 1198862411988632 11988633 11988641 11988642 11988644 11988651 11988653 11988662 11988663 11988671 11988673 11988681 11988682 11988683 1198868511988691 11988692 11988693 and 1205792 to cells 1 sdot sdot sdot 5

Frequent 1-Itemsets Generation Within cell 1 the auxiliaryobjects 1205732119895 for 1 le 119895 le 5 are created by rule 11990311 to indicatethat each item 119895 needs to appear in at least 119896 = 2 records forit to be a frequent 1-itemset Rule 11990313 is executed to detect allfrequent 1-itemsets in flat maximally parallel The detectionprocess of the candidate frequent 1-itemset 1198681 is taken asan example Objects 11988611 11988641 11988651 11988671 11988681 and 11988691 are in cell1 which means the first the fourth the fifth the sevenththe eighth and the ninth records contain 1198681 The subrules119886111205731 rarr 1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751119886811205731 rarr 1205751 and 119886911205731 rarr 1205751meet the execution conditionand can be executed Objects 11988621 11988631 and 11988661 are not in cell 1which means the second the third and the sixth records donot contain 1198681 The subrules 119886211205731 rarr 1205751 119886311205731 rarr 1205751 and119886611205731 rarr 1205751donotmeet the execution condition and cannotbe executed Initially 2 copies of 1205731 are in cell 1 indicating thatthe first item needs to appear in at least 2 records to make theitemset 1198681 a frequent 1-itemset Each execution of a subruleconsumes one 1205731 Therefore 2 of subrules among 119886111205731 rarr1205751 119886411205731 rarr 1205751 119886511205731 rarr 1205751 119886711205731 rarr 1205751 119886811205731 rarr 1205751and 119886911205731 rarr 1205751 can be executed in nondeterministic flatmaximally parallelThrough the execution of 2 such subrulesboth of the 2 copies of 1205731 are consumed and 2 copies of1205751 are generated The detection processes of other candidatefrequent 1-itemsets are performed in the same way After thedetection processes 12057321 120573

22 12057323 12057324 and 120573

25 are consumed and

12057521 12057522 12057523 12057524 and 12057525 are generated

Rule 11990312 is then executed to process the results obtained byrule 11990313 The 1-itemset 1198681 is again taken as an example All of

the 2 copies of 1205731 have been consumed subrule (12057521)not1205731 rarr1205721go is executed to put an object 1205721 to cells 2 and 6 toindicate that the itemset 1198681 is a frequent 1-itemset and toactivate the computation in cell 2 Subrules (12057522)not1205733 rarr 1205722go(12057523)not1205733 rarr 1205723go (120575

24)not1205734 rarr 1205724go and (12057525)not1205735 rarr 1205725go

are also executed to put objects 1205722 1205723 1205724 and 1205725 to cells 2and 6 to indicate that the itemsets 1198682 1198683 1198684 and 1198685 arefrequent 1-itemsets and to activate the computation in cell 2

Frequent 2-Itemsets Generation Within cell 2 rule 11990321 isexecuted to obtain all candidate frequent 2-itemsets Thedetection process of the candidate frequent 2-itemset 1198681 1198682is taken as an example Objects 1205721 and 1205722 are in cell 2 whichmeans itemsets 1198681 and 1198682 are frequent 1-itemsets Subrule()120572112057221205792not120573212 rarr 120573212 is executed to generate 120573212 The presenceof 120573212 means the 2-itemset 1198681 1198682 is a candidate frequent 2-itemset and both 1198681 and 1198682 need to appear together in at least2 records for the itemset 1198681 1198682 to be a frequent 2-itemsetThedetection processes of the other candidate frequent 2-itemsetsare performed in the same way After the detection processesobjects 120573212 120573

213 120573214 120573215 120573223 120573224 120573225 120573234 120573235 and 120573245 are

generatedRule 11990322 is executed to delete the objects 1205721 1205722 1205723 1205724 and

1205725 that are not needed anymoreRule 11990324 is executed to detect all frequent 2-itemsets

Objects 11988611 and 11988612 11988641 and 11988642 11988681 and 11988682 and 11988691 and 11988692are in cell 2 which means the first the fourth the eighthand the ninth records contain both 1198681 and 1198682 The subrules(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 meet the execution condition andcan be executed Objects 11988621 and 11988622 11988631 and 11988632 11988651 and11988652 11988661 and 11988662 or 11988671 and 11988672 are not both in cell 2 whichmeans the second the third the fifth the sixth and theseventh records do not contain both 1198681 and 1198682 The subrules(12057312)1198862111988622 rarr 12057512 (12057312)1198863111988632 rarr 12057512 (12057312)1198865111988652 rarr 12057512(12057312)1198866111988662 rarr 12057512 and (12057312)1198867111988672 rarr 12057512 do not meetthe execution condition and cannot be executed Initially2 copies of 12057312 are in cell 2 indicating that both 1198681 and 1198682need to appear together in at least 2 records for the itemset1198681 1198682 to be a frequent 2-itemset Each execution of thesesubrules consumes one 12057312 Therefore 2 subrules among(12057312)1198861111988612 rarr 12057512 (12057312)1198864111988642 rarr 12057512 (12057312)1198868111988682 rarr 12057512and (12057312)1198869111988692 rarr 12057512 can be executed After the execution2 copies of 12057312 are consumed and 2 copies of 12057512 are gener-ated The detection processes of other candidate frequent 2-itemsets are performed in the same way After the detectionprocesses objects 120573212 120573

213 12057314 120573

215 120573223 120573224 120573225 and 12057335 are

consumed and objects 120575212 120575213 12057514 120575

215 120575223 120575224 120575225 and 12057535

are generatedRule 11990323 is executed to process the results obtained by rule

11990324 The 2-itemset 1198681 1198682 is again taken as an example All ofthe 2 copies of 12057312 have been consumed by rule 11990324 subrule(120575212)not12057312 rarr 12057212go is executed to put an object 12057212 to cells3 and 6 to indicate that the itemset 1198681 1198682 is a frequent 2-itemset and to activate the computation in cell 3 Subrules(120575213)not12057313 rarr 12057213go (120575

215)not12057315 rarr 12057215go (120575

223)not12057323 rarr

12057223go (120575224)not12057324 rarr 12057224go and (120575225)not12057325 rarr 12057225go are also

executed which put objects 12057213 12057215 12057223 12057224 and 12057225 to cells

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

8 Discrete Dynamics in Nature and Society

Table 3 Generation of frequent 1-itemsets

119903119894119895 Cell 0 Cell 1 Cell 6

01198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

1119886111198861211988615119886221198862411988632119886331198864111988642

(11990301) 1198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120579

2

21198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693120573

2112057322120573231205732412057325(11990311)

311988632119886421198865111988662119886631198867111988673119886811198868211988683119886911198869211988693120575

2112057522120575231205752412057525(11990313)

4119886321198864211988651119886621198866311988671119886731198868111988682 1205721120572212057231205724120572511988683119886911198869211988693(11990312)

Table 4 Generation of frequent 2-itemsets

119903119894119895 Cell 2 Cell 6

4 120572112057221205723120572412057251205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693

5 120572112057221205723120572412057251205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990321)

6 1205732121205732131205732141205732151205732231205732241205732251205732341205732351205732451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990322)

7 1205752121205752131205751412057314120575

2151205752231205752241205752251205732341205753512057335120573

2451205792

120572112057221205723120572412057251198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990324)

8 1205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990323) 1205721312057215120572231205722412057225

3 and 6 to indicate that the itemsets 1198681 1198683 1198681 1198685 1198682 11986831198682 1198684 and 1198682 1198685 are frequent 2-itemsets and to activate thecomputation in cell 3

The 4 rules in each cell ℎ for 3 le ℎ le 5 are executedin ways similar to those in cell 2 The rules in these cellsdetect the frequent ℎ-itemsets for 3 le ℎ le 5 After thecomputation halts the objects 1205721 1205722 1205723 1205724 1205725 12057212 12057213 1205721512057223 12057224 12057225 120572123 and 120572125 are stored in cell 6 which means1198681 1198682 1198683 1198684 1198685 1198681 1198682 1198681 1198683 1198681 1198683 1198682 1198683 1198682 11986841198682 1198685 1198681 1198682 1198683 and 1198681 1198682 1198685 are all frequent itemsets inthis database

The change of objects in the computation processes islisted in Tables 3ndash6

5 Computational Experiments

Two databases from the UCI Machine Learning Repository[23] are used to conduct computational experiments Com-putational results on these two databases are reported in thissection

51 Results on the Congressional Voting Records DatabaseThe Congressional Voting Records database [23] is usedto test the performance of ECTPPI-Apriori This database

contains 435 records and 17 attributes (fields) The firstattribute is the party that the voter voted for and the 2ndto the 17th attributes are sixteen characteristics of each voteridentified by the Congressional Quarterly Almanac The firstattribute has two values Democrats or Republican and eachof the 2nd to the 17th attributes has 3 values yea nay andunknown dispositionThe frequent itemsets of these attributevalues need to be identified that is the problem is to find theattribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record in the database has 2 times 1 + 3 times (17 minus 1) = 50attributes ECTPPI-Apriori then can be used to discover thefrequent itemsets In this experiment one itemset is calleda frequent itemset if it appeared in more than 40 of allrecords that is the support count threshold is 119896 = 174(435 times 40) The frequent itemsets obtained by ECTPPI-Apriori are listed in Table 7

52 Results on the Mushroom Database The Mushroomdatabase [23] is also used to test ECTPPI-Apriori Thisdatabase contains 8124 records The 8124 records are num-bered orderly from 1 to 8124 Each record represents one

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Discrete Dynamics in Nature and Society 9

Table 5 Generation of frequent 3-itemsets

119903119894119895 Cell 3 Cell 6

8 1205721212057213120572151205722312057224120572251205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

9 1205721212057213120572151205722312057224120572251205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990331) 1205721312057215120572231205722412057225

10 1205732123120573212512057321351205732234120573223512057322451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990332) 1205721312057215120572231205722412057225

11 12057521231205752125120575135120573135120573

2234120575235120573235120573

22451205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990334) 1205721312057215120572231205722412057225

121205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990333) 1205721312057215120572231205722412057225

120572123120572125

Table 6 Generation of frequent 4-itemsets

119903119894119895 Cell 4 Cell 6

12120572123120572125120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 1205721312057215120572231205722412057225

120572123120572125

13120572123120572125120573

212351205792 1205721120572212057231205724120572512057212

1198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990341) 1205721312057215120572231205722412057225120572123120572125

1412057321235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990342) 1205721312057215120572231205722412057225

120572123120572125

1512057512351205731235120579

2 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990344) 1205721312057215120572231205722412057225

120572123120572125

161205792 12057211205722120572312057241205725120572121198861111988612119886151198862211988624119886321198863311988641119886421198864411988651119886531198866211988663119886711198867311988681119886821198868311988685119886911198869211988693 (11990343) 1205721312057215120572231205722412057225

120572123120572125

Table 7 The frequent itemsets identified by ECTPPI-Apriori for the Congressional Voting Records database

Size of frequentitemset The corresponding frequent itemsets

12 3 5 6 8 9 12 14 15 17 18 21 2324 26 27 29 30 32 35 38 39 41 42 4547 48

2

2 9 2 14 2 17 2 21 2 24 2 27 2 38 2 415 18 9 14 9 17 9 21 9 24 9 27 9 38 14 1714 21 14 24 14 27 14 38 14 41 15 18 15 2915 42 17 21 17 24 17 27 17 38 18 29 18 3918 42 18 47 21 24 21 27 21 38 24 27 24 3824 41 29 42 39 42 42 47

3

2 9 14 2 9 17 2 9 21 2 9 24 2 9 38 2 14 172 14 21 2 14 24 2 14 27 2 14 38 2 14 412 17 21 2 17 24 2 21 24 2 24 27 2 24 38 9 14 179 14 21 9 14 24 9 14 38 9 17 24 9 21 24 9 24 3814 17 21 14 17 24 14 21 24 14 24 27 14 24 3815 18 29 15 18 42 17 21 24 17 24 27 17 24 38

4

2 9 14 17 2 9 14 21 2 9 14 24 2 9 14 38 2 9 17 242 9 21 24 2 14 17 21 2 14 17 24 2 14 21 242 14 24 38 2 17 21 24 9 14 17 24 9 14 21 2414 17 21 24 2 9 14 17 24 2 9 14 21 24

5 2 14 17 21 246

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

10 Discrete Dynamics in Nature and Society

Table 8The frequent itemsets identified by ECTPPI-Apriori for theMushroom database

Size of frequentitemset The corresponding frequent itemsets

1 1 2 3 4 5

21 2 1 3 1 4 1 5 2 3 2 4 2 5 3 43 5 4 5

3 1 2 3 1 2 5 1 3 5 2 3 4 2 3 5

4 1 2 3 5

5

mushroom and has 23 attributes (fields) The first attribute isthe poisonousness of the mushroom and the 2nd to the 23rdattributes are 22 characteristics of themushrooms Each of theattributes has 2 to 12 values The frequent itemsets of theseattribute values need to be found that is the problem is tofind the attribute values which always appear together

Initially the database is preprocessed Each attributevalue is taken as a new attribute In this way each newattribute has only two values yes or no After preprocessingeach record has 118 attributes ECTPPI-Apriori then can beused to discover the frequent itemsets In this experimentone itemset is a frequent itemset if it appears in more than40 percent of all records that is the support count thresholdis 119896 = 3250 (8124 times 40) The frequent itemsets obtained byECTPPI-Apriori are listed in Table 8

6 Conclusions

An improved Apriori algorithm called ECTPPI-Apriori isproposed for frequent itemsets mining The algorithm uses aparallelmechanism in the ECPI tissue-like P systemThe timecomplexity of ECTPPI-Apriori is improved to119874(119905) comparedto other parallel Apriori algorithms Experimental resultsusing the Congressional Voting Records database and theMushroom database show that ECTPPI-Apriori performswell in frequent itemsets mining The results give some hintsto improve conventional algorithms by using the parallelmechanism of membrane computing models

For further research it is of interests to use some otherinteresting neural-likemembrane computingmodels such asthe spiking neural P systems (SN P systems) [8] to improvethe Apriori algorithm SN P systems are inspired by themechanism of the neurons that communicate by transmittingspikes The cells in SN P systems are neurons that haveonly one type of objects called spikes Zhang et al [2425] Song et al [26] and Zeng et al [27] provided goodexamples Also some other data mining algorithms can beimproved by using parallel evolution mechanisms and graphmembrane structures such as spectral clustering supportvector machines and genetic algorithms [1]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

Project is supported by National Natural Science Foundationof China (nos 61472231 61502283 61640201 61602282 andZR2016AQ21)

References

[1] J Han M Kambr and J Pei Data Mining Concepts andTechniques Elsevier Amsterdam Netherlands 2012

[2] R Agrawal and R Srikant ldquoFast algorithms for mining associa-tion rules in large databasesrdquo in Proceedings of the InternationalConference on Very Large Data Bases vol 1 pp 487ndash499September 1994

[3] H Yu J Wen H Wang and J Li ldquoAn improved Apriorialgorithm based on the boolean matrix and Hadooprdquo ProcediaEngineering vol 15 no 1 pp 1827ndash1831 2011

[4] J Li F Sun X Hu and W Wei ldquoA multi-GPU implementationof apriori algorithm for mining association rules in medicaldatardquo ICIC Express Letters vol 9 no 5 pp 1303ndash1310 2015

[5] N Li L Zeng Q He and Z Shi ldquoParallel implementationof apriori algorithm based on MapReducerdquo in Proceedings ofthe 13th ACIS International Conference on Software Engineer-ing Artificial Intelligence Networking and ParallelDistributedComputing (SNPD rsquo12) pp 236ndash241 Kyoto Japan August 2012

[6] A Ezhilvathani and K Raja ldquoImplementation of parallelApriori algorithm on Hadoop clusterrdquo International Journal ofComputer Science and Mobile Computing vol 2 no 4 pp 513ndash516 2013

[7] G Paun ldquoComputing with membranesrdquo Journal of Computerand System Sciences vol 61 no 1 pp 108ndash143 2000

[8] Gh Paun G Rozenberg andA SalomaaTheOxfordHandbookof Membrane Computing Oxford University Press Oxford UK2010

[9] L Pan G Paun and B Song ldquoFlat maximal parallelism in Psystemswith promotersrdquoTheoretical Computer Science vol 623pp 83ndash91 2016

[10] B Song L Pan andM J Perez-Jimenez ldquoTissue P systemswithprotein on cellsrdquo Fundamenta Informaticae vol 144 no 1 pp77ndash107 2016

[11] X Zhang L Pan and A Paun ldquoOn the universality of axon Psystemsrdquo IEEE Transactions on Neural Networks and LearningSystems vol 26 no 11 pp 2816ndash2829 2015

[12] J Wang P Shi and H Peng ldquoMembrane computing model forIIR filter designrdquo Information Sciences vol 329 pp 164ndash1762016

[13] G Singh and K Deep ldquoA new membrane algorithm using therules of Particle Swarm Optimization incorporated within theframework of cell-like P-systems to solve Sudokurdquo Applied SoftComputing Journal vol 45 pp 27ndash39 2016

[14] G Zhang H Rong J Cheng and Y Qin ldquoA Population-membrane-system-inspired evolutionary algorithm for distri-bution network reconfigurationrdquo Chinese Journal of Electronicsvol 23 no 3 pp 437ndash441 2014

[15] H Peng J Wang M J Perez-Jimenez and A Riscos-NunezldquoAn unsupervised learning algorithm for membrane comput-ingrdquo Information Sciences vol 304 pp 80ndash91 2015

[16] X Zeng L Xu X Liu and L Pan ldquoOn languages generatedby spiking neural P systems with weightsrdquo Information Sciencesvol 278 pp 423ndash433 2014

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Discrete Dynamics in Nature and Society 11

[17] X Liu Z Li J Liu L Liu and X Zeng ldquoImplementation ofarithmetic operations with time-free spiking neural P systemsrdquoIEEE Transactions on Nanobioscience vol 14 no 6 pp 617ndash6242015

[18] T Song P Zheng M L Dennis Wong and X Wang ldquoDesignof logic gates using spiking neural P systems with homogeneousneurons and astrocytes-like controlrdquo Information Sciences vol372 pp 380ndash391 2016

[19] L Pan and G Paun ldquoOn parallel array P systems automatardquoin Universality Computation vol 12 pp 171ndash181 SpringerInternational New York NY USA 2015

[20] T Song H Zheng and J He ldquoSolving vertex cover problem bytissue P systems with cell divisionrdquo Applied Mathematics andInformation Sciences vol 8 no 1 pp 333ndash337 2014

[21] Y Zhao X Liu and W Wang ldquoROCK clustering algorithmbased on the P system with active membranesrdquoWSEAS Trans-actions on Computers vol 13 pp 289ndash299 2014

[22] C Martın-Vide G Paun J Pazos and A Rodrıguez-PatonldquoTissue P systemsrdquo Theoretical Computer Science vol 296 no2 pp 295ndash326 2003

[23] M Lichman UCI Machine Learning Repository Universityof California School of Information and Computer ScienceIrvine Calif USA 2013 httparchiveicsucieduml

[24] X Zhang B Wang and L Pan ldquoSpiking neural P systems witha generalized use of rulesrdquo Neural Computation vol 26 no 12pp 2925ndash2943 2014

[25] X Zhang X Zeng B Luo and L Pan ldquoOn some classes ofsequential spiking neural P systemsrdquo Neural Computation vol26 no 5 pp 974ndash997 2014

[26] T Song L Pan and Gh Paun ldquoAsynchronous spiking neural Psystems with local synchronizationrdquo Information Sciences vol219 pp 197ndash207 2012

[27] X Zeng X Zhang T Song and L Pan ldquoSpiking neural Psystems with thresholdsrdquoNeural Computation vol 26 no 7 pp1340ndash1361 2014

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 12: An Improved Apriori Algorithm Based on an Evolution ...downloads.hindawi.com/journals/ddns/2017/6978146.pdf · ResearchArticle An Improved Apriori Algorithm Based on an Evolution-Communication

Submit your manuscripts athttpswwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of