
Parallelization of the Grid Search Method using MATLAB

The voyage planning problem is the problem of finding the best path from point A to point B. This journey can be divided into multiple stages with each stage containing a number of possible spatial positions (nodes) for the vehicle in question.

Figure 1: A downscaled schematic of the voyage algorithm. A dot represents a possible spatial position within a stage, and the arrows represent the calculations made between each node in the current stage and the next.

The best path between the first node and the last node is determined by the properties of the paths between the nodes in neighboring stages, and is computed by an algorithm written in MATLAB provided by ABB.

Our task is to improve the parallel performance of this algorithm. To measure the effect of any change to the algorithm, four problem sets with different routes and distances were provided. Using these, ABB had found that the speed-up levels off at approximately 10 workers. A profiler run on the serial version of the algorithm found that approximately 85% of the execution time was spent computing the travel times from all the nodes in the current stage to all the nodes in the next.

Unnesting

The algorithm came with an initial parallelization attempt by ABB, which parallelizes over the nodes, sending one node and its associated measurements to each worker. This is problematic, as the performance of this approach depends on the characteristics of the problem being solved.

We introduce two load balancing methods, each using the newly developed model in combination with data on how many dead ends each node contains. The first method sorts the tasks in descending order of expected running time and gives each processor a set number of tasks from this sorted list. When a processor is done with its tasks, it gets new ones. The idea is to avoid having one of the most resource-heavy tasks left at the end.
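A minimal sketch of this first scheme, written in Python for a self-contained illustration (the project code is MATLAB); the costs, worker count, and chunk size are hypothetical, and the worker pool is simulated rather than real:

```python
from collections import deque

def sorted_chunk_schedule(costs, n_workers, chunk=2):
    """Hand out tasks in descending expected cost, `chunk` at a time.

    Simulates workers that fetch a new chunk of tasks as soon as they
    finish the previous one. Returns the task indices each worker ran
    and the makespan (finish time of the slowest worker).
    """
    # Sort task indices by expected running time, largest first.
    order = deque(sorted(range(len(costs)), key=lambda i: costs[i], reverse=True))
    assigned = [[] for _ in range(n_workers)]
    clock = [0.0] * n_workers  # simulated finish time per worker
    while order:
        w = min(range(n_workers), key=lambda k: clock[k])  # next free worker
        for _ in range(min(chunk, len(order))):
            task = order.popleft()
            assigned[w].append(task)
            clock[w] += costs[task]
    return assigned, max(clock)
```

Because the expensive tasks are handed out first, the last chunks fetched are the cheap ones, so no processor is left grinding through a heavyweight task while the others sit idle.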

The other method tries to solve the Multiprocessor Scheduling Problem by applying an improved version of the greedy algorithm, the Longest Running Time algorithm, to the sorted list. In theory, all processors should then have approximately the same running time.
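The core greedy rule is to walk the tasks in descending cost and always give the next task to the currently least-loaded processor. A Python sketch of that baseline rule (without the project's refinements, which are not detailed here; costs are hypothetical):

```python
import heapq

def longest_first_schedule(costs, n_workers):
    """Greedy longest-first schedule for the multiprocessor scheduling problem.

    Each task, taken in descending order of expected cost, is assigned
    to whichever worker currently has the smallest total load.
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (load, worker id)
    heapq.heapify(heap)
    assigned = [[] for _ in range(n_workers)]
    for task in sorted(range(len(costs)), key=lambda i: costs[i], reverse=True):
        load, w = heapq.heappop(heap)   # least-loaded worker
        assigned[w].append(task)
        heapq.heappush(heap, (load + costs[task], w))
    return assigned, max(load for load, _ in heap)  # schedule and makespan
```

With accurate cost estimates this keeps the per-worker loads close to equal, which is exactly the "all processors finish at about the same time" property the method aims for.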

Figure 3: This is the speedup comparison of problem set 4.

Results

Above are the results from the experiments concerning the largest of the problem sets. The speed-up is compared to a serial version of the algorithm supplied by ABB.

Apart from the original code (in blue), the speed-ups differ only slightly. The best appears to be the unnested version that uses parfor and persistent variables. This is most likely because parfor can broadcast variables, which reduces communication time.

As none of the given example problems contained more than 17 nodes in any stage, there is a hard limit on the speed-up when approaching 17 workers. Another approach is instead to parallelize over the node-to-node voyage calculations, since they are independent of each other. This removes the node-count limitation but increases the overhead penalty, as there are now many more parallel tasks to perform.
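The unnesting step itself is just flattening the two nested loops over node pairs into one list of independent tasks; a small Python sketch (the project code is MATLAB, and the stage sizes here are hypothetical):

```python
from itertools import product

def flatten_tasks(n_current, n_next):
    """Flatten the nested node-to-node loops into one list of
    independent (source, target) tasks for the worker pool.

    With at most 17 nodes per stage, node-level parallelism caps at
    17 tasks, while pair-level parallelism yields up to 17 * 17 = 289.
    """
    return list(product(range(n_current), range(n_next)))

pairs = flatten_tasks(3, 4)  # 12 independent tasks instead of 3 outer ones
```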

Persistent Variables

The same function is called each time a node is evaluated, and some of its input arguments are the same each time. To avoid sending these multiple times, they were declared persistent. This means that once the values are loaded into a processor's memory, they stay there until removed.
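MATLAB's persistent keyword keeps a local variable alive between calls to the same function. A rough Python analogue of the idea, caching the invariant inputs on the worker so they are only materialised once (function and field names are hypothetical stand-ins, not ABB's API):

```python
_CACHE = {}  # survives between calls, like a MATLAB persistent variable

def evaluate_node(node, heavy_constants=None):
    """Stand-in for the per-node evaluation: the invariant inputs are
    supplied once, cached, and reused on every subsequent call."""
    if "constants" not in _CACHE:
        if heavy_constants is None:
            raise ValueError("first call must supply the constant inputs")
        _CACHE["constants"] = heavy_constants
    constants = _CACHE["constants"]
    # Trivial placeholder computation using the cached constants.
    return node * constants["scale"]
```

The pay-off is the same in both languages: the large constant arguments cross the worker boundary once instead of once per node evaluation.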

Load Balancing

The automatic load balancing in MATLAB for tools such as parfor is decent, but allows no input from the user. That is why we wrote our own load balancing method.

The execution time of ABB's path evaluation function was measured for all stages in all four of our problem sets. This function is atomic and should not be parallelized further. The result for problem set 4 can be seen below. For each stage there are oscillations, and we wish to develop a model that captures their behavior. The model is one-dimensional and only meant to describe the behavior along the node axis.

A linear combination of basis functions was used as a model. The basis functions were normal distributions, and the parameters of the basis were found using an evolutionary algorithm with problem set 2 as input data.
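Evaluating such a model is straightforward once the parameters are fixed; a Python sketch with hypothetical parameters (the actual weights, means, and widths came from the evolutionary fit on problem set 2 and are not reproduced here):

```python
import math

def gaussian(x, mu, sigma):
    """Unnormalised normal-distribution basis function."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)

def predict_cost(node_index, params):
    """Model of expected execution time along the node axis:
    a linear combination of Gaussian basis functions.

    `params` is a list of (weight, mean, std) triples, one per basis
    function; these were fitted with an evolutionary algorithm.
    """
    return sum(w * gaussian(node_index, mu, s) for w, mu, s in params)
```

The load balancing methods above then use `predict_cost` (together with the dead-end counts) as the expected running time of each task.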

Figure 2: The execution time of the atomic function for all stages and node-to-node combinations, problem set 4.

Anton Sundin

Viktor Wase

Supervisor: Kateryna Mishchenko

The parallel performance of a MATLAB implementation of a Grid Search Algorithm is investigated and improved. The major performance obstacles are found to be the need for large amounts of data transfer as well as some structural issues in the algorithm. A load balancing model specific to the characteristics of the problem is produced.
