
1

A Parallel Architecture for the Generalized Traveling Salesman Problem

Max Scharrenbroich
AMSC 664 Final Presentation, 05/12/2009
Advisor: Dr. Bruce L. Golden
R. H. Smith School of Business

2

Presentation Overview

• Background and Review
– GTSP Review
– Review of Project Objectives
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
• Status Summary

3

Review of the GTSP

• The Generalized Traveling Salesman Problem (GTSP):
– A variation of the well-known traveling salesman problem.
– A set of nodes is partitioned into a number of clusters.
– Objective: Find a minimum-cost tour visiting exactly one node in each cluster.
– Example on the following slides…
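A tiny brute-force solver makes the objective concrete. This is an illustrative Python sketch (the function name and data layout are assumptions, not part of the project): it enumerates every cluster ordering and every choice of one node per cluster, so it is only usable on toy instances.

```python
from itertools import permutations, product

def gtsp_brute_force(clusters, cost):
    """Exhaustively solve a tiny GTSP instance.

    clusters: list of lists of node ids (the partition).
    cost: dict mapping ordered pairs (u, v) -> edge cost.
    Returns (best_cost, best_tour) over all cluster orderings and
    all choices of exactly one node per cluster.
    """
    best = (float("inf"), None)
    for order in permutations(range(len(clusters))):
        for choice in product(*(clusters[i] for i in order)):
            c = sum(cost[choice[k], choice[(k + 1) % len(choice)]]
                    for k in range(len(choice)))
            if c < best[0]:
                best = (c, choice)
    return best
```

Exact algorithms like branch-and-cut, discussed on the next slides, avoid this full enumeration but still face exponential worst-case growth.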

4

GTSP Example

• Given a set of locations (nodes).

5

GTSP Example (continued)

• Partition the nodes into clusters.

[Figure: the nodes grouped into six clusters, labeled 1–6.]

6

GTSP Example (continued)

• Find the minimum-cost tour visiting exactly one node in each cluster.

[Figure: a minimum-cost tour visiting one node in each of the six clusters.]

Algorithms >

7

Algorithms for the GTSP

• Like the TSP, the GTSP is NP-hard.
• There exist exact algorithms that rely on smart enumeration techniques:
– Branch-and-cut (B&C) algorithm (M. Fischetti, 1997).
– Provided exact solutions to reasonably sized GTSP problems (48 ≤ n ≤ 442 and 10 ≤ m ≤ 89).
– For problems with more than 90 clusters, the run time of the B&C algorithm began approaching one day.

Heuristic Algorithms >

8

Algorithms for the GTSP (continued)

• Heuristic approaches to the GTSP:
– Generalized Nearest Neighbor Heuristic (C.E. Noon, 1988)
– Random-key Genetic Algorithm (L. Snyder and M. Daskin, 2006)
– mrOX Genetic Algorithm (J. Silberholz and B.L. Golden, 2007)*

Project Objectives >

9

Review of Project Objectives

• Develop a generic software architecture and framework for parallelizing serial heuristics for combinatorial optimization (in a minimally invasive way).

• Extend the framework to host the serial mrOX GA and the GTSP problem class.

• Investigate the performance of the parallel implementation of the mrOX GA on large instances of the GTSP (> 90 clusters).

Overview >

10

Presentation Overview

• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
• Status Summary

11

Review of mrOX GA

[Diagram: Start → Pop 1 … Pop 7 (isolated populations) → Merge → Post-Merge → End.]

Initialization Phase (Light-Weight):
• Sequentially generates and evolves 7 isolated populations of 50 individuals each.
• Only use the rOX for crossover.
• Apply one cycle of 2-opt followed by 1-swap improvement heuristic only to new best solution.
• 5% chance of mutation.
• Terminate each population after no new best solution is found for 10 consecutive generations.

Merge: Select the best 50 individuals from the union of the isolated populations to continue on.

Post-Merge Phase:
• Use the full mrOX for crossover.
• Apply full cycles of 2-opt followed by 1-swap improvements until no improvements are found to:
1. Child solutions that are better than both parents.
2. Random 5% of population (preserve diversity).
• 5% chance of mutation.
• Terminate after no new best solution is found for 150 consecutive generations.
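The improvement cycle used in both phases, 2-opt followed by 1-swap, can be sketched in Python. This is an illustrative first-improvement version, not the authors' implementation; `improve`, `tour_cost`, and the data layout are assumptions.

```python
def tour_cost(tour, cost):
    """Cost of a cyclic tour given a (u, v) -> cost mapping."""
    return sum(cost[tour[i], tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def improve(tour, clusters, cost):
    """One cycle of 2-opt followed by 1-swap.

    2-opt reverses a tour segment; 1-swap replaces the node chosen
    for one cluster with another node from the same cluster.
    Keeps any candidate that strictly lowers the tour cost.
    """
    best, best_c = list(tour), tour_cost(tour, cost)
    # 2-opt: try reversing every contiguous segment.
    for i in range(len(best) - 1):
        for j in range(i + 1, len(best)):
            cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
            c = tour_cost(cand, cost)
            if c < best_c:
                best, best_c = cand, c
    # 1-swap: try every alternative node within each cluster.
    for pos in range(len(best)):
        cluster = next(cl for cl in clusters if best[pos] in cl)
        for alt in cluster:
            cand = best[:pos] + [alt] + best[pos + 1:]
            c = tour_cost(cand, cost)
            if c < best_c:
                best, best_c = cand, c
    return best
```

In the serial mrOX GA this cycle is applied sparingly during initialization (new best solutions only) and exhaustively, until no improvement is found, in the post-merge phase.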

Overview >

12

Presentation Overview

• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
– Concurrent Exploration
– Cellular GA Inspired Parallel Cooperation
• Parallel Architecture
• Final Results
• Status Summary

13

Type 3 Parallelism in the mrOX GA
• Genetic algorithms are amenable to parallelism via concurrent exploration.
• Cooperation between processes (migration) can be implemented to ensure diversity while maintaining intensification.

[Diagram: processes P1 … Pn run concurrently from start to end; in the cooperative variant, the processes exchange solutions once per migration period.]

Parallel Cooperation w/ Mesh >

14

Parallel Cooperation with Mesh Topology

• Inspired by cellular genetic algorithms (cGAs), where individuals in a population only interact with nearest neighbors.

• Processes cooperate over a toroidal mesh topology.
• Ensures diversity while maintaining intensification.

Each process has four neighbors.

Processes periodically exchange the best solutions with neighbors.

High-quality solutions diffuse through the population.
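On a rows × cols toroidal mesh the neighbor relation simply wraps at the edges. A minimal sketch (the function name is hypothetical; in the actual architecture MPI's Cartesian topology supplies the neighbor ranks):

```python
def torus_neighbors(r, c, rows, cols):
    """Four nearest neighbors of process (r, c) on a rows x cols
    toroidal mesh. Coordinates wrap modulo the grid dimensions, so
    every process, including those on the boundary, has exactly
    four neighbors: up, down, left, right."""
    return [((r - 1) % rows, c), ((r + 1) % rows, c),
            (r, (c - 1) % cols), (r, (c + 1) % cols)]
```

Because every process has the same degree, solutions diffuse at a uniform rate regardless of where on the grid they arise.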

Overview >

15

Presentation Overview

• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
– Overview of Parallel Architecture
– Parallel mrOX GA
• Final Results
• Status Summary

16

Overview of Parallel Architecture
• Processes are arranged as nodes in a grid topology.
• Each node runs iterations of a serial stochastic search algorithm (e.g. mrOX GA).
• A node updates its state after each iteration.
• The parallel architecture uses the MPI Cartesian coordinate communication topology to send state updates from a node to its nearest neighbors.
• All communications are handled off of the main processing thread to prevent blocking.
• All state updates are handled periodically (e.g. 100 updates/sec).
• Updates are only sent to neighbors if the node's state has changed (limits unnecessary I/O).
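The off-thread, change-driven update scheme can be sketched with plain Python threads. This is illustrative only: the real architecture sends MPI messages over the Cartesian topology rather than in-process queues, and the class name is hypothetical.

```python
import queue, threading, time

class StatePublisher:
    """Background publisher: the search thread calls update(); a
    separate thread periodically forwards the latest state to the
    neighbor queues, but only when the state has changed since the
    last send (the 'dirty' flag), so an idle node generates no I/O."""

    def __init__(self, neighbor_queues, period=0.01):
        self._lock = threading.Lock()
        self._state, self._dirty = None, False
        self._neighbors, self._period = neighbor_queues, period
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def update(self, state):
        """Called by the search thread after each iteration; never blocks
        on communication."""
        with self._lock:
            self._state, self._dirty = state, True

    def _run(self):
        while not self._stop.is_set():
            time.sleep(self._period)
            with self._lock:
                if not self._dirty:
                    continue          # unchanged since last send: skip
                state, self._dirty = self._state, False
            for q in self._neighbors:
                q.put(state)

    def close(self):
        self._stop.set()
        self._thread.join()
```

Decoupling the send period from the iteration rate means communication cost is bounded per second, not per generation.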

Parallel mrOX GA >

17

Parallel mrOX GA

Overview >

[Diagram: Start → Pop 1 … Pop 7 → Merge → Post-Merge → End, as in the serial mrOX GA.]

• The initialization phase is the same as in the serial case; there is no cooperation among processes in this phase.
• After each iteration of the mrOX GA, a node updates its state with the best solution found thus far.
• Migrate: After K generations with no improvement, a node incorporates its neighbors' solutions if they are better than the worst in the population.
• An MxN grid of processes is started with mpirun; random seeds are broadcast to each process.
• The parallel mrOX GA terminates when all processes have had no improvements after 150 generations.
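The migration rule above can be sketched as follows (illustrative; `migrate` and its arguments are assumptions, not the project's API): each neighbor's best solution displaces the worst member of the local population only when it is strictly better.

```python
def migrate(population, neighbor_best, fitness):
    """Incorporate neighbors' best solutions into the local population.

    population: list of local solutions.
    neighbor_best: best solution received from each mesh neighbor.
    fitness: callable, lower is better (tour cost).
    Each incoming solution replaces the current worst local member,
    but only if it strictly improves on that member.
    """
    pop = list(population)
    for sol in neighbor_best:
        worst = max(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(sol) < fitness(pop[worst]):
            pop[worst] = sol
    return pop
```

Replacing only the worst member keeps the population size fixed while letting high-quality solutions diffuse across the mesh.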

18

Presentation Overview

• Background and Review
• Review of mrOX GA
• Parallelism in the mrOX GA
• Parallel Architecture
• Final Results
– Performance of parallel mrOX GA
• Status Summary

19

Migration Parameter in Parallel mrOX GA
• Vary the migration parameter for three problem instances.

Objective Value Performance >

[Chart: Solution Quality (% from Optimal) vs. Migration Parameter; 5x5 grid, population size = 50; instances 132d657, 157rat783, 207si1032.]

[Chart: Solution Time (sec) vs. Migration Parameter; 5x5 grid, population size = 50; instances 132d657, 157rat783, 207si1032.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

20

• Objective value performance of cooperative vs. non-cooperative parallel mrOX GA on five problem instances.

Speedup Performance >

Performance of Parallel mrOX GA

[Chart: Solution Quality (% from Optimal), cooperative (4x8 grid, Migrate = 10) vs. non-cooperative; 32 processors, population size = 50; instances 132d657, 157rat783, 207si1032, 217vm1084, 421d2103.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

21

Performance of Parallel mrOX GA
• Speedup performance of cooperative vs. non-cooperative parallel mrOX GA on five problem instances.

Vary Grid Size >

[Chart: Speedup, cooperative (4x8 grid, Migrate = 10) vs. non-cooperative; 32 processors, population size = 50; instances 132d657, 157rat783, 207si1032, 217vm1084, 421d2103.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

22

Performance of Parallel mrOX GA
• Performance of parallel mrOX GA using different grid configurations.

Speedup vs. Solution Quality >

[Chart: Grid Shape (1x32, 2x16, 4x8, No Co-op) vs. Solution Quality (% from Optimal); population size = 50, Migrate = 10; instances 207si1032, 217vm1084.]

[Chart: Grid Shape (1x32, 2x16, 4x8, No Co-op) vs. Speedup; population size = 50, Migrate = 10; instances 207si1032, 217vm1084.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

23

Performance of Parallel mrOX GA

Vary # Processors >

• Tradeoff between speedup and solution quality of parallel mrOX GA by varying grid shape.

[Chart: Speedup vs. Solution Quality (% from Optimal); 32 processors, population size = 50, Migrate = 10; instances 207si1032, 217vm1084.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

24

Performance of Parallel mrOX GA

Vary # Processors >

• Percentage of runs where optimum was found.

[Chart: % of runs where the optimum was found, per grid shape (no co-op, 1x32, 2x16, 4x8); instances 132d657, 157rat783, 207si1032, 217vm1084, 421d2103.]

Percentages taken from 20 runs.

25

Performance of Parallel mrOX GA

Conclusions >

• Number of processors vs. solution quality of parallel mrOX GA (all grids are square except for 32 processor case).

• Single processor case is equivalent to the serial version.

[Chart: # Processors vs. Solution Quality (% from Optimal); instances 132d657, 157rat783, 207si1032; all grids are square except the 32-processor case.]

Results averaged over 20 runs. Quad Core AMD Opteron (8360 SE), 2511 MHz, 512K cache.

26

Conclusions

• Cooperation among processes improves both the solution quality and convergence time of the parallel implementation.

• Changing the grid shape affects the diffusion of solutions through the process population:
– Thinner grids lead to slower convergence and improved solution quality.
– Square grids give the fastest convergence, but at the cost of solution quality.

Status Summary >

27

Status Summary
• October 16-30: Start design of the parallel architecture.
• November: Finish design and start coding and testing of the parallel architecture.
• December and January: Continue coding parallel architecture and extend the framework for the mrOX GA algorithm and the GTSP problem class.
• February 1-15: Begin test and validation on multi-core PC.
• February 16-March 1: Move testing to Deepthought cluster*.
• March: Perform final testing on full data sets and collect results.
• April-May: Generate parallel architecture API documentation**, write final report.

* Was not pursued because of success on 32-core genome5.
** Future work.

References >

28

References
• T.G. Crainic and M. Toulouse. Parallel Strategies for Meta-Heuristics. Fleet Management and Logistics, 205-251.
• L. Davis. Applying Adaptive Algorithms to Epistatic Domains. Proceedings of the International Joint Conference on Artificial Intelligence, 162-164, 1985.
• M. Fischetti, J.J. Salazar-Gonzalez, P. Toth. A branch-and-cut algorithm for the symmetric generalized traveling salesman problem. Operations Research 45 (3): 378-394, 1997.
• G. Laporte, A. Asef-Vaziri, C. Sriskandarajah. Some Applications of the Generalized Traveling Salesman Problem. Journal of the Operational Research Society 47: 1461-1467, 1996.
• C.E. Noon. The generalized traveling salesman problem. Ph.D. Dissertation, University of Michigan, 1988.
• C.E. Noon. A Lagrangian based approach for the asymmetric generalized traveling salesman problem. Operations Research 39 (4): 623-632, 1990.
• J.P. Saksena. Mathematical model of scheduling clients through welfare agencies. CORS Journal 8: 185-200, 1970.
• J. Silberholz and B.L. Golden. The Generalized Traveling Salesman Problem: A New Genetic Algorithm Approach. Operations Research/Computer Science Interfaces Series 37: 165-181, 2007.
• L. Snyder and M. Daskin. A random-key genetic algorithm for the generalized traveling salesman problem. European Journal of Operational Research 174 (1): 38-53, 2006.

29

Acknowledgements

Chris Groer, University of Maryland
William Mennell, University of Maryland
John Silberholz, University of Maryland
Aleksey Zimin, University of Maryland