31
Xavier Llorà National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Urbana, Illinois, 61801 [email protected] http://www.ncsa.illinois.edu/~xllora Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

Embed Size (px)

DESCRIPTION

Data-intensive computing has positioned itself as a valuable programming paradigm to efficiently approach problems requiring processing very large volumes of data. This paper presents a pilot study about how to apply the data-intensive computing paradigm to evolutionary computation algorithms. Two representative cases (selectorecombinative genetic algorithms and estimation of distribution algorithms) are presented, analyzed, and discussed. This study shows that equivalent data-intensive computing evolutionary computation algorithms can be easily developed, providing robust and scalable algorithms for the multicore-computing era. Experimental results show how such algorithms scale with the number of available cores without further modification.

Citation preview

Page 1: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Xavier Llorà

National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-ChampaignUrbana, Illinois, 61801

[email protected]://www.ncsa.illinois.edu/~xllora

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre

Page 2: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Outline

• Data-intensive computing and HPC?• Is this related at all to evolutionary computation?• Data-intensive computing with Meandre• GAs and competent GAs• Data-intensive computing for GAs

Page 3: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

2 Minute HPC History

• The eighties and early nineties picture • Commodity hardware rare, slow, and costly• Supercomputers were extremely expensive• Most of them hand crafted and only few units• Two competing families

• CISC (e.g. Cray C90 with up to 16 processors)• RISC (e.g. Connection Machine CM-5 with up 4,096 processors)

• Late nineties commodity hardware hit main stream• Start becoming popular, cheaper, and faster• Economy of scale• Massive parallel computers build from commodity components become a

viable option

Page 4: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Two Visions

• C90 like supercomputers were like a comfy pair of trainers • Oriented to scientific computing• Complex vector oriented supercomputers• Shared memory (lots of them)• Multiprocessor enabled via some intercommunication networks• Single system image

• CM5 like computers did not get massive traction, but a bit• General purpose (as long as you can chop the work in simple units)• Lots of simple processors available• Distributed memory pushed new programming models (message passing)• Complex interconnection networks

• NCSA have shared memory, distributed memory, and gpgpu based

Page 5: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Miniaturization Building Bridges

• Multicores and gpgpus are reviving the C90 flavor• The CM-5 flavor now survives as distributed clusters of not so

simple units

Page 6: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Control Models of Parallelization in EC

Run 2 Run 6

Run 3 Run 7

Run 1 Run 5

Run 10

Run 11

Run 9 Master

Slave

Individual Evaluation

Slave Slave

Migration

Page 7: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

But Data is also Part of the Equation

• Google and Yahoo! revived an old route• Usually refers to:

• Infrastructure• Programming techniques/paradigms

• Google made it main stream after their MapReduce model• Yahoo! provides and open source implementation

• Hadoop (MapReduce)• HDFS (Hadoop distributed filesystem)

• Store petabytes reliably on commodity hardware (fault tolerant)• Programming model

• Map: Equivalent to the map operation on functional programming• Reduce: The reduction phase after maps are computed

Page 8: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

n!

i=0

x2 ! reduce(map(x, sqr), sum)

A Simple Example

x x x x

x2 x2 x2 x2

sum

map map map map

reduce reducereduce reduce

Page 9: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Is This Related to EC?

• How can we easily benefit of the current core race painlessly?• NCSA’s Blue Waters estimated may top on 100K• Yes on several facets

• Large optimization problems need to deal with large population sizes (Sastry, Goldberg & Llorà, 2007)

• Large-scale data mining using genetic-based machine learning (Llorà et al. 2007)

• Competent GAs model building extremely costly and data rich (Pelikan et al. 2001)

• The goal?• Rethink parallelization as data flow processes• Show that traditional models can be map to data-intensive computing

models• Foster you curiosity

Page 10: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Data-Intensive Computing with Meandre

Page 11: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

The Meandre Infrastructure Challenges

• NCSA infrastructure effort on data-intensive computing• Transparency

• From a single laptop to a HPC cluster• Not bound to a particular computation fabric• Allow heterogeneous development

• Intuitive programming paradigm • Modular Components assembled into Flows• Foster Collaboration and Sharing

• Open Source• Service Orientated Architecture (SOA)

Page 12: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Basic Infrastructure Philosophy

• Dataflow execution paradigm• Semantic-web driven• Web oriented• Facilitate distributed computing• Support publishing services• Promote reuse, sharing, and collaboration• More information at http://seasr.org/meandre

Page 13: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Data Flow Execution in Meandre

• A simple example c ← a+b• A traditional control-driven language

a = 1b = 2c = a+b

• Execution following the sequence of instructions • One step at a time

• a+b+c+d requires 3 steps • Could be easily parallelized

Page 14: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Data Flow Execution in Meandre

• Data flow execution is driven by data• The previous example may have 2 possible data flow versions

value(a)value(c)+

value(b)

Stateless data flow

value(a) value(c)value(b)

State-based data flow

?+

Page 15: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

The Basic Building Blocks: Components

Component

RDF descriptor of thecomponents behavior

The componentimplementation

Page 16: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Go with the Flow: Creating Complex Tasks

• Directed multigraph of components creates a flow

PushText

PushText

ConcatenateText

To UpperCase Text

PrintText

Page 17: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Automatic Parallelization:Speed and Robustness

• Meandre ZigZag language allow automatic parallelization

PushText

PushText

ConcatenateText

To UpperCase Text

PrintText

To UpperCase Text

To UpperCase Text

Page 18: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

GAs and Competent GAs

Page 19: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Selectorecombinative GAs

1. Initialize the population with random individuals

2. Evaluate the fitness value of the individuals

3. Select good solutions by using s-wise tournament selection

without replacement (Goldberg, Korb & Deb, 1989)

4. Create new individuals by recombining the selected population

using uniform crossover (Sywerda, 1989)

5. Evaluate the fitness valueof all offspring

6. Repeat steps 3-5 until convergence criteria are met

Page 20: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Extended Compact Genetic Algorithm

• Harik et al. 2006

• Initialize the population (usually random initialization)

• Evaluate the fitness of individuals

• Select promising solutions (e.g., tournament selection)

• Build the probabilistic model• Optimize structure & parameters to best fit selected individuals

• Automatic identification of sub-structures

• Sample the model to create new candidate solutions• Effective exchange of building blocks

• Repeat steps 2–7 till some convergence criteria are met

Page 21: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

eCGA Model Building Process

• Use model-building procedure of extended compact GA• Partition genes into (mutually) independent groups• Start with the lowest complexity model• Search for a least-complex, most-accurate model

Model Structure Metric[X0] [X1] [X2] [X3] [X4] [X5] [X6] [X7] [X8] [X9] [X10] [X11] 1.0000

[X0] [X1] [X2] [X3] [X4X5] [X6] [X7] [X8] [X9] [X10] [X11] 0.9933

[X0] [X1] [X2] [X3] [X4X5X7] [X6] [X8] [X9] [X10] [X11] 0.9819

[X0] [X1] [X2] [X3] [X4X5X6X7] [X8] [X9] [X10] [X11] 0.9644

… [X0] [X1] [X2] [X3] [X4X5X6X7] [X8X9X10X11] 0.9273

… [X0X1X2X3] [X4X5X6X7] [X8X9X10X11] 0.8895

Page 22: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Data-Intensive Flows for Competent GAs

Page 23: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Selectorecombinative GA

Page 24: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

ucbps

noit

twrops

eps

soed

sbp

0

5000

10000

15000

20000

sGAs Execution Profile and Parallelization

noitucbps/reducer

ucbps/paralell/3ucbps/paralell/2ucbps/paralell/1ucbps/paralell/0ucbps/mapper

twropseps/reducer

eps/paralell/3eps/paralell/2eps/paralell/1eps/paralell/0eps/mapper

soedsbp

0

5000

10000

15000

20000

• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.

Page 25: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

eCGA Model Model building

Page 26: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

print_model

greedy_ecga_mb

update_partitions

init_ecga

0

10000

20000

30000

40000

50000

eCGA Execution Profile and Parallelization

print_model

greedy_ecga_mb

update_partitions/reduce

update_partitions/paralell/3

update_partitions/paralell/2

update_partitions/paralell/1

update_partitions/paralell/0

update_partitions/mapper

update_partitions/mapper

update_partitions/mapper

init_ecga

0

10000

20000

30000

40000

50000

• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.

Page 27: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

●1

23

45

Number of cores

Spee

dup

vs. O

rigin

al e

CGA

Mod

el B

uild

ing

1 2 3 4

eCGA Model Building Speedup

• Intel 2.8Ghz QuadCore, 4Gb RAM. Average of 20 runs.• Speedup against original eCGA model building

Page 28: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Scalability on NUMA Systems

• Run on NCSA’s SGI Altix Cobalt• 1,120 processors and up to 5 TB of RAM• SGI NUMAlink• NUMA architecture• Test for speedup behavior• Average of 20 independent runs• Automatic parallelization of the partition evaluation• Results still show the linear trend (despite the NUMA)

• 16 processors, speedup = 14.01• 32 processors, speedup = 27.96

Page 29: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Wrapping Up

Page 30: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Summary

• Evolutionary computation is data rich• Data-intensive computing can provide to EC:

• Tap into parallelism quite painless• Provide a simple programming and modeling• Boost reusability• Tackle otherwise intractable problems

• Shown that equivalent data-intensive computing versions of traditional algorithms exist

• Linear parallelism can be tap transparently

Page 31: Data-Intensive Computing for  Competent Genetic Algorithms:  A Pilot Study using Meandre

Xavier Llorà

National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-ChampaignUrbana, Illinois, 61801

[email protected]://www.ncsa.illinois.edu/~xllora

Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre