A Parallel Genetic Algorithms Framework based on Hadoop MapReduce

Filomena Ferrucci, Pasquale Salza
University of Salerno, Italy
{fferrucci,psalza}@unisa.it

M-Tahar Kechadi
University College Dublin, Ireland
[email protected]

Federica Sarro
University College London, United Kingdom
[email protected]

ABSTRACT
This paper describes a framework for developing parallel Genetic Algorithms (GAs) on the Hadoop platform, following the MapReduce paradigm. The framework allows developers to focus on the aspects of the GA that are specific to the problem to be addressed. Using the framework, a GA application has been devised to address the Feature Subset Selection problem. A preliminary performance analysis showed promising results.

Categories and Subject Descriptors
C.2.4 [Computer-communication Networks]: Distributed Systems—Distributed applications; D.1.3 [Programming Techniques]: Concurrent Programming—Distributed programming

General Terms
Algorithms, Experimentation, Performance

Keywords
Hadoop, MapReduce, Parallel Genetic Algorithms

1. INTRODUCTION
Genetic Algorithms (GAs) are powerful metaheuristic techniques used to address many software engineering problems [7] that involve finding a suitable balance between competing and potentially conflicting goals, such as looking for the set of requirements that balances software development cost and customer satisfaction, or searching for the best allocation of resources to a software development project.
GAs simulate several aspects of "Darwin's Evolution Theory" to converge towards optimal or near-optimal solutions [5]. Starting from an initial population of individuals, each represented as a string (chromosome) over a finite alphabet ("genes"), every iteration of the algorithm generates a new population by executing genetic operations, such as crossover and mutation, on selected individuals. Along the chain of generations, the population tends to improve by keeping the strongest individuals (characterised by the best fitness values) and rejecting the weakest ones. The generations continue until a specified stopping condition is verified. The problem solution is given by the individual with the best fitness value in the last population.
GAs are usually executed on single machines as sequential programs, so scalability issues prevent them from being effectively applied to real-world problems. However, these algorithms are naturally parallelisable, as it is possible to use more than one population and to execute the operators in parallel. Parallel systems are becoming commonplace, mainly due to the increasing popularity of Cloud systems and the availability of distributed platforms such as Apache Hadoop [12], which offers easy scalability, reliability and fault tolerance. Hadoop is characterised by the use of the MapReduce programming model and the distributed file system HDFS, which run on large clusters of commodity machines.
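As a concrete reference for the GA terminology introduced above, the following is a minimal sketch of a sequential GA loop over bit-string chromosomes. The fitness function, operator choices and parameter values are illustrative assumptions, not those used later in the paper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Minimal sequential GA: random initial population, tournament selection,
// single-point crossover, bit-flip mutation, fixed number of generations.
public class SimpleGeneticAlgorithm {

    static final Random RANDOM = new Random();
    static final int CHROMOSOME_LENGTH = 16, POPULATION_SIZE = 20, MAX_GENERATIONS = 50;

    // Toy fitness: number of 1-bits (stands in for a problem-specific function).
    static int fitness(boolean[] chromosome) {
        int value = 0;
        for (boolean gene : chromosome) if (gene) value++;
        return value;
    }

    // Binary tournament: return the fitter of two random individuals.
    static boolean[] tournament(List<boolean[]> population) {
        boolean[] a = population.get(RANDOM.nextInt(population.size()));
        boolean[] b = population.get(RANDOM.nextInt(population.size()));
        return fitness(a) >= fitness(b) ? a : b;
    }

    public static void main(String[] args) {
        // Initial population of random chromosomes.
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < POPULATION_SIZE; i++) {
            boolean[] chromosome = new boolean[CHROMOSOME_LENGTH];
            for (int g = 0; g < CHROMOSOME_LENGTH; g++) chromosome[g] = RANDOM.nextBoolean();
            population.add(chromosome);
        }

        for (int generation = 0; generation < MAX_GENERATIONS; generation++) {
            List<boolean[]> next = new ArrayList<>();
            while (next.size() < POPULATION_SIZE) {
                // Select two parents, recombine them at a random cut point, then mutate.
                boolean[] p1 = tournament(population), p2 = tournament(population);
                boolean[] child = new boolean[CHROMOSOME_LENGTH];
                int cut = RANDOM.nextInt(CHROMOSOME_LENGTH);
                for (int g = 0; g < CHROMOSOME_LENGTH; g++) child[g] = g < cut ? p1[g] : p2[g];
                if (RANDOM.nextDouble() < 0.1) {
                    int position = RANDOM.nextInt(CHROMOSOME_LENGTH);
                    child[position] = !child[position];
                }
                next.add(child);
            }
            population = next;
        }

        // The solution is the individual with the best fitness in the last population.
        boolean[] best = population.get(0);
        for (boolean[] individual : population) if (fitness(individual) > fitness(best)) best = individual;
        System.out.println("Best fitness found: " + fitness(best));
    }
}
```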
In this paper, we present a framework for developing GAs that can be executed on the Hadoop platform, following the MapReduce paradigm. The main purpose of the framework is to completely hide the inner aspects of Hadoop and to allow users to focus on the definition of their problems in terms of GAs. An example of use of the framework for the "Feature Subset Selection" problem is also provided, together with preliminary performance results.

2. RELATED WORK
As described by Di Martino et al. [3], there exist three possible grains of parallelism for implementing GAs with the MapReduce paradigm: the fitness evaluation level (Global Parallelisation Model); the population level (Coarse-grained Parallelisation Model, or Island Model); the individual level (Fine-grained Parallelisation Model, or Grid Model) [3].
The Global Parallelisation Model was implemented by Di Martino et al. [3] with three different Google App Engine MapReduce platform configurations to address automatic test data generation.
For the same purpose, Di Geronimo et al. [2] developed a Coarse-grained Parallelisation Model on the Hadoop MapReduce platform.

"MRPGA" is a parallel GAs framework based on .Net and "Aneka", a platform which simplifies the creation and communication of nodes in a Grid environment [8]. It exploits the MapReduce paradigm, but with a second reduce phase. A master coordinator manages the parallel GA execution among nodes, following the islands parallelisation model, in which each node computes the GA operators for a portion of the population only. During the first step, every mapper node reads a portion of the individuals and computes their fitness values. Then, in the first reduce phase, each node applies the selection operator to its input individuals, giving a local optimum relative to each island. A final single reducer computes the global optimum and applies the remaining GA operators. Comparing the results of MRPGA with a sequential version of the same algorithm, the authors observed an interesting degree of scalability, but also a high overhead.
The approach by Verma et al. [11] was developed on Hadoop. The numbers of Mapper and Reducer nodes are unrelated. The main difference with MRPGA lies in the fact that a sort of migration is performed among the individuals, since the output of the Mapper nodes is randomly sent to different Reducer nodes. Results showed the potential of the parallel GA approach for the solution of large computational problems.

3. THE PROPOSED FRAMEWORK
The proposed framework aims to reduce the effort of implementing parallel GAs by exploiting the features of Hadoop in terms of scalability, reliability and fault tolerance, thus freeing developers from worrying about the infrastructure for distributed computation. One of the problems with the execution of GAs in a distributed system is the overhead, which can be reduced by limiting network traffic. It is therefore important to minimise the number of phases that require moving data from one machine to another. For this reason, the framework implements the islands model (mostly following the design provided by Di Martino et al. [3]), forcing each machine to always work on the same portion of the population and using just one Reducer phase.
In the following, the design of the involved components is described, starting from the Driver, which is the fulcrum of the framework.

3.1 The Driver
The Driver manages the interaction with the user and launches jobs on the cluster. It can be executed on a machine separate from the distributed cluster, and it also computes some functions in order to manage the received results, generation by generation. The elements involved in the Driver process are listed below; a sketch of how the Driver might orchestrate them follows the list.
Initialiser: the user can define how to generate the first individuals for each island; the default implementation generates them randomly. The Initialiser is a MapReduce job which produces the individuals in the form of a sequence of files, stored in a serialised form directly in HDFS;
Generator: it executes one generation on the cluster and produces the new population, storing each individual again in HDFS together with its fitness value and a flag indicating whether the individual satisfies the stopping condition defined by the user;
Terminator: at the end of each generation, the Terminator component checks the stopping conditions: the standard one (the maximum number of generations has been reached) or a user-defined one, detected by checking the presence of the flag in HDFS. Once the run terminates, the population is directly submitted to the SolutionsFilter job;
Migrator: this job moves individuals from one island to another, according to criteria defined by the user, such as the migration period, the number and destinations of migrants, and the method for choosing migrants. This phase is optional and is enabled by setting a migration period parameter;
SolutionsFilter: when the job terminates, all the individuals of the last generation are filtered, separating those that satisfy the stopping condition from those which do not. Then, the results of the whole process are stored in HDFS.
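The sketch below illustrates how a Driver of this kind might orchestrate the components above. The interfaces and method names (ClusterJob, Terminator, submit, shouldStop) are illustrative assumptions rather than the framework's actual API; Hadoop job configuration and error handling are omitted.

```java
// Illustrative sketch of the Driver control loop: initialise the islands, run
// one Generator job per generation, check termination, migrate periodically,
// then filter the final solutions. All names here are assumptions.
public class DriverSketch {

    /** A job submitted to the cluster (Initialiser, Generator, Migrator, SolutionsFilter). */
    public interface ClusterJob {
        void submit(int generation) throws Exception;
    }

    /** Checks the stopping conditions between generations (the Terminator's local role). */
    public interface Terminator {
        boolean shouldStop(int generation) throws Exception;
    }

    private final ClusterJob initialiser, generator, migrator, solutionsFilter;
    private final Terminator terminator;
    private final int maxGenerations, migrationPeriod;

    public DriverSketch(ClusterJob initialiser, ClusterJob generator, ClusterJob migrator,
                        ClusterJob solutionsFilter, Terminator terminator,
                        int maxGenerations, int migrationPeriod) {
        this.initialiser = initialiser;
        this.generator = generator;
        this.migrator = migrator;
        this.solutionsFilter = solutionsFilter;
        this.terminator = terminator;
        this.maxGenerations = maxGenerations;
        this.migrationPeriod = migrationPeriod;
    }

    public void run() throws Exception {
        initialiser.submit(0);                        // write the initial islands to HDFS
        for (int generation = 1; generation <= maxGenerations; generation++) {
            generator.submit(generation);             // fitness, selection, crossover, mutation, elitism
            if (terminator.shouldStop(generation)) {  // flag found in HDFS or generation limit reached
                break;
            }
            if (migrationPeriod > 0 && generation % migrationPeriod == 0) {
                migrator.submit(generation);          // optional shuffle of individuals among islands
            }
        }
        solutionsFilter.submit(maxGenerations);       // separate the individuals satisfying the condition
    }
}
```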
3.2 The Generator and the other components
Each Generator job makes the population evolve. In order to develop the complex structure described below, it was necessary to use multiple MapTasks and ReduceTasks. This is possible using a version of the ChainMapper and ChainReducer classes of Hadoop, slightly modified in order to handle Avro objects rather than the raw serialisation Hadoop uses by default. Using Avro [1], it is easier to store objects and to save space on disk, while also allowing external treatment of the objects and a quick exchange of data among the parties involved in the MapReduce communication.
A chain allows the tasks to be managed in the form described by the pattern:
(MAP)+ (REDUCE) (MAP)*
which means one or more MapTasks, followed by one ReduceTask and possibly other MapTasks.
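The sketch below shows how such a chain could be wired with the stock Hadoop ChainMapper and ChainReducer classes. The framework itself uses a modified, Avro-aware version of these classes, so the Text/NullWritable types and the identity placeholder stages here are assumptions for illustration only.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.chain.ChainMapper;
import org.apache.hadoop.mapreduce.lib.chain.ChainReducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Sketch of a Generator-style chained job following (MAP)+ (REDUCE) (MAP)*.
public class GenerationJobSketch {

    /** Reads one serialised individual per line and emits it as the key. */
    public static class SplitterMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable offset, Text individual, Context context)
                throws IOException, InterruptedException {
            context.write(individual, NullWritable.get());
        }
    }

    // Identity pass-through placeholders; the real stages transform the individuals.
    public static class FitnessMapper extends Mapper<Text, NullWritable, Text, NullWritable> { }
    public static class TerminationCheckMapper extends Mapper<Text, NullWritable, Text, NullWritable> { }
    public static class SelectionMapper extends Mapper<Text, NullWritable, Text, NullWritable> { }
    public static class CrossoverReducer extends Reducer<Text, NullWritable, Text, NullWritable> { }
    public static class MutationMapper extends Mapper<Text, NullWritable, Text, NullWritable> { }
    public static class ElitismMapper extends Mapper<Text, NullWritable, Text, NullWritable> { }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "generation");
        job.setJarByClass(GenerationJobSketch.class);

        // (MAP)+ : splitter, fitness, termination check and selection run back
        // to back on the same node, with no intermediate shuffle between them.
        ChainMapper.addMapper(job, SplitterMapper.class, LongWritable.class, Text.class,
                Text.class, NullWritable.class, new Configuration(false));
        ChainMapper.addMapper(job, FitnessMapper.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));
        ChainMapper.addMapper(job, TerminationCheckMapper.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));
        ChainMapper.addMapper(job, SelectionMapper.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));

        // (REDUCE) : the crossover stage receives the selected individuals grouped by key.
        ChainReducer.setReducer(job, CrossoverReducer.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));

        // (MAP)* : mutation and elitism are chained after the reducer.
        ChainReducer.addMapper(job, MutationMapper.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));
        ChainReducer.addMapper(job, ElitismMapper.class, Text.class, NullWritable.class,
                Text.class, NullWritable.class, new Configuration(false));

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```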
The generation work is distributed on the nodes of the Cloud as follows:
Splitter: the Splitter takes the population as input, deserialises it and splits the individuals into J groups (islands), according to the order of the individuals. Each split contains a list of records, one per individual:
split : individuals → list(individual, null)
During the deserialisation, the Splitter adds some fields to the objects which will be useful for the next steps of the computation, such as the fitness value;
Fitness: according to the J islands, the J Mappers compute the fitness values for each individual of their corresponding island:
map : (individual, null) → (individualF, null)
The user defines how to evaluate the fitness, and the values are stored inside the corresponding field of the objects;
TerminationCheck: without leaving the machine of the previous map, the second map acts in a chain. It checks whether the current individuals satisfy the stopping condition. This is useful, for instance, when a lower limit target for the fitness value is known. If at least one individual gives a positive answer to the test, the event is notified to the other phases and islands by a flag stored in HDFS. This avoids the execution of the next phases and generations;
Selection: if the stopping condition has not been satisfied yet, this is the moment to choose the individuals that will be the parents during the crossover of the next iteration. Users can define this phase in their own algorithms. The selected couples are stored in the key:
map : (individualF, null) → (coupleInformation, individualF)
If an individual has been chosen more than once, it is replicated. Then all the individuals, including those not chosen if elitism is active, leave the current machine and go to the corresponding Reducer for the next step;
Crossover: in this phase, individuals are grouped by the couples established during the selection. Then each Reducer applies the criteria defined by the user and performs the crossover:
reduce : (coupleInformation, list(individualF)) → (individual, true)
This produces the offspring (marked with the value true in the value field), which is read during the next step together with the previous population;
Mutation: during this phase, the chained Mappers perform the mutation of the genes as defined by the user. Only the offspring can be mutated:
map : (individual, true) → (individualM, null)
Elitism: this is the last phase. If the user chooses to use the optional elitism, the definitive population is chosen among the individuals of the offspring and the previous population:
map : (individual, null) → (individual, null)
At this point, the islands are ready to be written into HDFS.
The architecture of the Generator component provides two levels of abstraction: the first one, called the core level, allows the whole job to work; the second one, called the user level, allows users to develop their own Genetic Algorithm and to execute it on the Cloud.
The core is the base of the framework, with which the end user does not need to interact. Indeed, the fact that a MapReduce job is executed is totally invisible to the user. It consists of everything that is needed to run an application on Hadoop. The final user can develop his own GA simply by implementing the relative classes, without having to deal with map or reduce details. If the user does not extend these classes, a simple default behaviour is applied, which does nothing else than forwarding the input as output. The framework also makes some default classes available, which the user can employ in most cases (an illustrative sketch of such user-level classes is given at the end of this section).
The Terminator component plays two roles at different times: after the execution of every generation job, by calling the methods of the Terminator class on the same machine where the Driver is running; and during the Generator job, through the use of the TerminationCheckMapper. It checks whether the stopping conditions have occurred, i.e. the maximum number of generations has been reached or at least one individual has been marked as satisfying the stopping condition during the most recent generation phase. The count is maintained by a local counter variable that is updated at the end of each generation. The check for the presence of marked individuals is done by looking for possible flags in HDFS. If the run terminates, the execution of the SolutionsFilter component eventually follows.
The Initialiser computes an initial population according to the definition of the user.
Another optional component is the Migrator, which shuffles individuals among the islands according to the definition of the user. It is at the same time a local component and a job executed on the cluster. It is started by the Driver according to a period counter.
The SolutionsFilter component is invoked only when at least one individual has caused the termination by satisfying the stopping condition. It simply filters the individuals of the last population, dividing those that satisfy the condition from those which do not. More details about the proposed framework can be found in [4].
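To give a flavour of the user level, the sketch below shows problem-specific operators plugged into framework-style extension points. The base classes (UserFitness, UserCrossover, UserMutation) and the bit-string Individual type are assumptions for illustration; the framework's actual class names and signatures differ, but in both cases the map and reduce details remain hidden from the user.

```java
import java.util.Random;

// Sketch of the user level: the abstract classes stand in for the framework's
// extension points; the user only fills in problem-specific genetic operators.
public class UserLevelSketch {

    /** A bit-string individual with a cached fitness value. */
    public static class Individual {
        public final boolean[] genes;
        public double fitness;
        public Individual(boolean[] genes) { this.genes = genes.clone(); }
    }

    /** Assumed extension points; unimplemented hooks would default to pass-through. */
    public abstract static class UserFitness { public abstract double evaluate(Individual individual); }
    public abstract static class UserCrossover { public abstract Individual recombine(Individual a, Individual b); }
    public abstract static class UserMutation { public abstract void mutate(Individual individual); }

    /** Example user code: count the selected genes (a stand-in objective). */
    public static class CountOnesFitness extends UserFitness {
        @Override public double evaluate(Individual individual) {
            int ones = 0;
            for (boolean gene : individual.genes) if (gene) ones++;
            return ones;
        }
    }

    /** Example user code: uniform crossover, each gene taken from a random parent. */
    public static class UniformCrossover extends UserCrossover {
        private final Random random = new Random();
        @Override public Individual recombine(Individual a, Individual b) {
            boolean[] child = new boolean[a.genes.length];
            for (int i = 0; i < child.length; i++) child[i] = random.nextBoolean() ? a.genes[i] : b.genes[i];
            return new Individual(child);
        }
    }

    /** Example user code: flip one random gene of an offspring individual. */
    public static class SingleBitMutation extends UserMutation {
        private final Random random = new Random();
        @Override public void mutate(Individual individual) {
            int position = random.nextInt(individual.genes.length);
            individual.genes[position] = !individual.genes[position];
        }
    }
}
```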
4. PRELIMINARY ANALYSIS
Many software engineering problems involve prediction and classification tasks (e.g. effort estimation [9] and fault prediction [10]). Classification consists of learning from example data, called the training dataset, with the purpose of properly classifying any new input data. The training dataset includes a list of records, also called instances, consisting of a list of attributes (features). Several classifiers exist; among them, C4.5 builds a Decision Tree, which is able to give a likely class of membership for every record in a new dataset. The effectiveness of the classifier can be measured by the accuracy of the classification of the new data:
accuracy = correct classifications / total classifications
Feature Subset Selection (FSS) helps to improve prediction/classification accuracy by reducing noise in the training dataset [6], selecting the optimal subset of features. Nevertheless, the problem is NP-hard. GAs can be used to identify a near-optimal subset of features in a training dataset. The framework was exploited in order to develop a GA that addresses this problem. For the development of the FSS application, only 535 lines of code were written, against the 3908 of the framework infrastructure. In terms of the framework, the Driver is the main part of the algorithm and is executed on one machine. It needs some additional information before execution: the parameters of the GA environment, such as the number of individuals in the initial population, the maximum number of generations, etc.; the training dataset; the test dataset.
Every individual (subset) is encoded as an array of m bits, where each bit indicates whether the corresponding enumerated attribute is present in the subset (value 1) or not (value 0). During each generation, every subset is evaluated by computing the accuracy value, and all the GA operations are applied until the target accuracy is achieved or the maximum number of generations is reached, according to the following steps:
Fitness: for each subset of attributes, the training dataset is reduced with respect to the attributes in the subset. The fitness value is computed by applying the following steps (a sketch of this evaluation is given after the list):
1. Build the Decision Tree through the C4.5 algorithm and compute the accuracy on the current dataset;
2. The operations are repeated according to the cross-validation folds parameter;
3. The mean of the accuracies is returned.
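A minimal sketch of such a fitness evaluation is shown below, using the Weka library (which the sequential version described later relies on). The class name FeatureSubsetFitness, the fixed random seed, and the mapping of chromosome bits to attribute indices are illustrative assumptions; the class attribute is assumed to be the last one and is always kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

// Sketch: evaluate a bit-string individual (one bit per feature) as the mean
// cross-validated accuracy of a C4.5 tree (Weka's J48) on the reduced dataset.
public class FeatureSubsetFitness {

    private final Instances training; // training dataset, class attribute last
    private final int folds;          // cross-validation folds parameter

    public FeatureSubsetFitness(Instances training, int folds) {
        this.training = new Instances(training);
        this.training.setClassIndex(this.training.numAttributes() - 1);
        this.folds = folds;
    }

    /** Returns the accuracy (in [0, 1]) of the subset encoded by the chromosome. */
    public double evaluate(boolean[] chromosome) throws Exception {
        // Keep only the selected attributes, plus the class attribute.
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < chromosome.length; i++) {
            if (chromosome[i]) kept.add(i);
        }
        kept.add(training.classIndex());

        Remove keepSelected = new Remove();
        keepSelected.setAttributeIndicesArray(kept.stream().mapToInt(Integer::intValue).toArray());
        keepSelected.setInvertSelection(true); // invert: remove everything NOT listed
        keepSelected.setInputFormat(training);
        Instances reduced = Filter.useFilter(training, keepSelected);
        reduced.setClassIndex(reduced.numAttributes() - 1);

        // k-fold cross-validation with J48, Weka's C4.5 implementation.
        Evaluation evaluation = new Evaluation(reduced);
        evaluation.crossValidateModel(new J48(), reduced, folds, new Random(1));
        return evaluation.pctCorrect() / 100.0;
    }
}
```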
4.1 Subject
The Chicago Crime dataset (from the UCI Machine Learning Repository) was used. The dataset has 13 features and 10000 instances. It was divided into two parts: the first 60% as training dataset and the last 40% as test dataset.
The test bench was composed of three versions:
Sequential: a single machine executes a GA version similar to the framework one, using the Weka Java library from the University of Waikato;
Pseudo-distributed: here the framework version of the algorithm is used. It is again executed on a single machine, setting the number of islands to one. It requires Hadoop in order to be executed;
Distributed: the framework is executed on a cluster of computers on the Hadoop platform. This is, of course, the version of interest.
All the versions were run on the same remote Amazon EC2 cluster and all the machines involved had the same configuration, in order to ensure fairness. This configuration is named by Amazon as m1.large, a machine with 2 CPUs, 7.5 GB of RAM and 840 GB of storage.
The sequential version was executed on a single machine, as was the pseudo-distributed one, the latter over a Hadoop installation. The distributed version instead needed a full Hadoop cluster composed of 1 master and 4 computing slaves.
The chosen configuration for the test consisted of 10 generations, with a migration period of 5 and 10 migrants. Only the maximum number of generations was specified as a stopping condition. The final accuracy was computed on the test dataset using the classifier built with the best individual identified.

4.2 Results
As for the accuracy, the results were 89.55% for the sequential, 89.40% for the pseudo-distributed and 89.45% for the distributed version. This confirms that the application achieves its objective, giving a subset that has both a reduced number of attributes and a suitable accuracy value.
As for the running time, the distributed version won against the other versions, with 1384 s against 2993 s for the pseudo-distributed and 2179 s for the sequential version. It is clear that the distributed version must face some considerable intrinsic Hadoop bottlenecks, such as map initialisation, data reading, copying and storage, and it has an additional Migration phase not present in the sequential version. The proposed framework does not tune every single aspect of Hadoop, and this suggests that further optimisation might increase the performance of the parallel version compared with the sequential one. It is expected that the distributed version performs better when the amount of required computation is large [4]. Since the pseudo-distributed version is slower than the distributed one, splitting the work among multiple nodes appears worthwhile. More details about the analysis carried out can be found in [4].

5. CONCLUSIONS AND FUTURE WORK
This work proposed a framework with the aim of simplifying the development of Genetic Algorithms on the Cloud. The obtained results showed an interesting effort ratio between the number of lines of code for the application and those for the framework. The preliminary tests on performance confirmed the natural tendency of GAs towards parallelisation, but the threat of overhead is still around the corner. Indeed, the use of parallelisation seems to be worthwhile only in the presence of computationally intensive and non-trivial problems [4], and scalability issues need to be addressed more deeply. Because of the use of specific components (e.g. chains of MapTasks and ReduceTasks), it does not seem unreasonable to suggest that an ad-hoc optimisation of Hadoop might improve the performance, and there is also a possibility that a change of parallelisation model [3] could give better results.

6. REFERENCES
[1] D. Cutting. Data interoperability with Apache Avro. Cloudera Blog, 2011.
[2] L. Di Geronimo, F. Ferrucci, A. Murolo, and F. Sarro. A parallel genetic algorithm based on Hadoop MapReduce for the automatic generation of JUnit test suites. In Software Testing, Verification and Validation (ICST), 2012 IEEE 5th International Conference on, pages 785-793. IEEE, 2012.
[3] S. Di Martino, F. Ferrucci, V. Maggio, and F. Sarro. Towards Migrating Genetic Algorithms for Test Data Generation to the Cloud, chapter 6, pages 113-135. IGI Global, 2012.
[4] F. Ferrucci, M.-T. Kechadi, P. Salza, and F. Sarro. A framework for genetic algorithms based on Hadoop. arXiv preprint arXiv:1312.0086, 2013.
[5] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Longman Publishing Co., Inc., 1989.
[6] M. A. Hall. Correlation-based Feature Selection for Machine Learning. PhD thesis, The University of Waikato, 1999.
[7] M. Harman and B. F. Jones. Search-based software engineering. Information and Software Technology, 43(14):833-839, 2001.
[8] C. Jin, C. Vecchiola, and R. Buyya. MRPGA: An extension of MapReduce for parallelizing genetic algorithms. In E-Science (e-Science), 2008 IEEE 4th International Conference on, pages 214-221. IEEE, 2008.
[9] F. Sarro, S. Di Martino, F. Ferrucci, and C. Gravino. A further analysis on the use of genetic algorithm to configure support vector machines for inter-release fault prediction. In Applied Computing (SAC), 2012 ACM 27th Symposium on, pages 1215-1220. ACM, 2012.
[10] F. Sarro, F. Ferrucci, and C. Gravino. Single and multi objective genetic programming for software development effort estimation. In Applied Computing (SAC), 2012 ACM 27th Symposium on, pages 1221-1226. ACM, 2012.
[11] A. Verma, X. Llorà, D. E. Goldberg, and R. H. Campbell. Scaling genetic algorithms using MapReduce. In Intelligent Systems Design and Applications (ISDA), 2009 9th International Conference on, pages 13-18. IEEE, 2009.
[12] T. White. Hadoop: The Definitive Guide. O'Reilly, third edition, 2012.