Upload
others
View
10
Download
0
Embed Size (px)
Citation preview
Multi-GPU Island-Based Genetic Algorithm for Solving the Knapsack Problem
Jiri Jaros
1 Overview
The Genetic Algorithms (GAs) have become widely applied
optimization tools since their development by Holland in 1975 [1].
One of the famous NP-hard problems successfully solved by
GAs is the knapsack. However, millions of candidate solutions
have to be created and evaluated for large problem instances
rising the execution time up to hours and days [2]. The latest
GPUs are about 15 times faster than six-core Intel CPUs which
opens new possibilities for massive acceleration of GAs [3].
2 Multi-GPU Cluster Systems
The availability of multiple PCI-Express buses, even on very low
cost commodity computers, means that it is possible to construct
cluster nodes with multiple GPUs. Inter-node communications
are done via MPI over a high speed network while intra-node
communications exploit CPU shared memory.
3 Multi-GPU Island-Based Genetic Algorithm
The population of the GA is distributed over multiple GPUs.
Every GPU, controlled by a single MPI process, entirely evolves
a single island. Migration of individuals occurs after a predefined
number of generations exchanging the best local solution and an
optional number of randomly selected individuals over a ring
topology.
4 Local GPU Island Implementation Details [4]
• 32 knapsack items packed into a single integer value
• Individuals processed by CUDA WARPS in multiple rounds
• Most CUDA block barriers removed
• Negligible thread divergence ( < 0.5%)
• Blocks of knapsack data shared within the block
• Uniform crossover, bit-flip mutation, binary tournament
5 Experimental Results
• Highly optimized CPU implementation running on 4 6-core
Intel Xeon processors with 40Gb infiniband interconnection
• CUDA implementation running on 14 NVIDIA GTX 580
Knapsack problem with 10,000 items
6 Conclusions
The proposed multi-GPU island-based GA allows the solution of
large-scale instances of the knapsack problem.
The significant benefits:
• Speedups up to 35, 194, 781 (14 GPUs vs. 24, 6, 1 cores)
• Overall performance of 5.67 TFLOPS (14 GPUs)
• Overall efficiency of 26%
The codes will be released as an open-source software
(http://www.fit.vutbr.cz/~jarosjir).
[1] J. H. Holland, “Adaptation in Natural and Artificial Systems”, Ann Arbor, no. 53. University of Michigan Press, 1975, p. 211
[2] Z. Michalewicz and J. Arabas, “Genetic algorithms for the 0/1 knapsack problem”, in Lecture Notes in Computer Science, 1994, vol. 869/1994, 134-143
[3] V. W. Lee et al., “Debunking the 100X GPU vs. CPU myth,” in Proceedings of the 37th annual international symposium on Computer architecture - ISCA ’10, 2010, p. 451
[4] J. Jaros and P. Pospichal, “A Fair Comparison of Modern CPUs and GPUs Running the Genetic Algorithm under the Knapsack Benchmark”, in Applications of Evolutionary Computation, Heidelberg, DE, Springer, 2012, p. 426-435
College of Engineering and Computer Science, Australian National University
This research has been partially supported by the
research grant "Natural Computing on
Unconventional Platforms", GP103/10/1517,
Czech Science Foundation (2010-13).
Evaluate the local GPU island
Select emigrants in the local GPU island
Transfer the emigrants from GPU to CPU and then over
network
Receive immigrants from network by CPU and upload them to
GPU
Incorporate the immigrants into the local GPU island
MPI
process
2820
2825
2830
2835
2840
128 256 512 1024 2048
Fitn
ess
val
ue
x1
00
0
Individuals per local island
Solution quality
1 GPU 6 GPUs7 GPUs 12 GPUs14 GPUs
0
20
40
60
80
100
120
140
128 256 512 1024 2048
Exe
cuti
on
tim
e [
s]
Individuals per local island
Total execution time
1 GPU 6 GPUs 7 GPUs 12 GPUs 14 GPUs
0
10
20
30
40
50
60
128 256 512 1024 2048
Spe
ed
up
vs.
sin
gle
-th
read
CP
U
Individuals per local island
Speedup on a single island reached by multicore CPU and GPU
1xGPU
2x6 CPU threads
6 CPU threads
048
1216202428323640
128 256 512 1024 2048
Spe
ed
up
vs.
4 6
-co
re X
eo
ns
Individuals per local island
GPU Speedup vs. 4 6-core Xeons
1 GPU
6 GPUs
7 GPUs
12 GPUs
14 GPUs
Inte
rcon
ne
ction
Netw
ork