
The 0/1 Knapsack Problem
Team Algor Rhythm

Alex Bolsoy, Jonathan Suggs, and Casey Wenner

Abstract
The intent of this project is to examine multiple non-trivial algorithms for the 0/1 knapsack problem. Our project tests the effectiveness of simulated annealing, dynamic programming, and genetic algorithms. Results are compared based on solution value and on the time and memory consumption of each algorithm.

Key Words: Knapsack, Optimization, Genetic, Simulated annealing, Dynamic programming

1. Introduction
The 0/1 knapsack problem can be described as follows: given a hypothetical knapsack with a weight capacity and a set of items, each with a weight and a value, find the most valuable combination of items that fits. The goal is to place items in the knapsack that maximize the sum of the item values subject to the weight constraint.

2. Formal Problem Statement
Given a set of $n$ items numbered $1$ to $n$, each with a weight $w_i$ and a value $v_i$, along with a maximum weight capacity $W$, choose $x_i \in \{0, 1\}$, where $x_i$ indicates whether item $i$ is placed in the knapsack:

maximize $V = \sum_{i=1}^{n} v_i x_i$ subject to $\sum_{i=1}^{n} w_i x_i \leq W$,

where $V$ is the maximum attainable total value. [1]
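For illustration, a minimal exhaustive-search baseline in Python makes the objective and constraint concrete (the item data below is a hypothetical example, not drawn from our test sets):

```python
from itertools import product

def brute_force_knapsack(values, weights, capacity):
    """Try every 0/1 assignment; only feasible for small n (O(2^n))."""
    best_value, best_choice = 0, None
    for choice in product([0, 1], repeat=len(values)):
        weight = sum(w * x for w, x in zip(weights, choice))
        value = sum(v * x for v, x in zip(values, choice))
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice

# Hypothetical data: 4 items, capacity 10; items 2 and 4 fit best.
print(brute_force_knapsack([10, 40, 30, 50], [5, 4, 6, 3], 10))  # (90, (0, 1, 0, 1))
```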

3. Context
The knapsack problem, alternatively the rucksack problem, is an NP-hard combinatorial optimization problem. Its name was coined by the mathematician Tobias Dantzig, and the problem dates back to around 1897. [1] The well-known problem has been studied extensively.

In a 2004 paper titled “Where are the hard knapsack problems?”, David Pisinger evaluates different instances of the knapsack problem and discusses several more advanced algorithms. Pisinger is a Danish computer science researcher at the University of Copenhagen. [2]

The knapsack problem has many practical applications in resource allocation. Because the problem is NP-hard, there is no known algorithm that runs in polynomial time and returns the correct answer for every instance of the problem.


Testing
We tested the algorithms on both an uncorrelated instance and a correlated instance of the problem. The uncorrelated data had no correlation between the value and weight of each item. The correlated data had a positive correlation between the value and weight of each item (in general, higher-value items had higher weights). Correlated data was retrieved from David Pisinger's online collection of optimization codes and problem instances. [5] Uncorrelated data was retrieved from Kreher and Stinson's book Combinatorial Algorithms. [9]

Each of these instances contained 100 items with weights and values. The variables we adjusted were list length and knapsack capacity. List length was varied in increments of 10 items, and capacity was varied over 10 increments of roughly 100 units each.

Each algorithm ran each of these 20 tests 1,000 times per data set. Because two of our three algorithms are stochastic, the repetition accounted for run-to-run deviation. It was also intended to stress-test which factor, item count or capacity, was relevant to each algorithm.
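A measurement harness along the following lines would drive these tests. This is a sketch with hypothetical names (the report's actual test code is not reproduced here); each solver is assumed to return a (value, items) pair:

```python
import time

def run_trials(solver, values, weights, capacity, trials=1000):
    """Average wall-clock time and best value over repeated runs of one solver."""
    total_time, best_value = 0.0, 0
    for _ in range(trials):
        start = time.perf_counter()
        value, _items = solver(values, weights, capacity)
        total_time += time.perf_counter() - start
        best_value = max(best_value, value)
    return total_time / trials, best_value

# Hypothetical sweep: prefixes of the 100-item list, capacities in ~100-unit steps.
# for n in range(10, 101, 10):      run_trials(solver, values[:n], weights[:n], capacity)
# for cap in range(100, 1001, 100): run_trials(solver, values, weights, cap)
```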

4. Dynamic Programming
A dynamic programming implementation was the first algorithm applied to the 0/1 knapsack problem. This algorithm builds a matrix of best attainable values to find the optimal solution, then backtracks over the matrix to determine which items make up that solution. [3]

The memory requirement of this algorithm is a two-dimensional list of integers of size n*W, where n is the number of items and W is the knapsack capacity. The dynamic programming algorithm finds the optimal solution to the problem every time, and it is relatively easy to understand and implement in code.

The Big-O of this algorithm is O(n*W), where n is the number of items and W is the knapsack capacity. This measurement comes from the nested loops required to build the matrix. This bound is pseudo-polynomial: the running time depends on the numeric value of the input (the capacity W) rather than only on the number of inputs. [4]
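A Python sketch of the table-build-and-backtrack approach described above (the standard textbook algorithm; variable names are ours, not the report's):

```python
def knapsack_dp(values, weights, capacity):
    """Build an (n+1) x (W+1) table of best values, then backtrack for the items."""
    n = len(values)
    table = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            table[i][w] = table[i - 1][w]        # best value without item i
            if weights[i - 1] <= w:              # item i fits: consider taking it
                table[i][w] = max(table[i][w],
                                  table[i - 1][w - weights[i - 1]] + values[i - 1])
    # Backtrack: item i was taken exactly where the best value changed.
    items, w = [], capacity
    for i in range(n, 0, -1):
        if table[i][w] != table[i - 1][w]:
            items.append(i - 1)
            w -= weights[i - 1]
    return table[n][capacity], items[::-1]
```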


[Figures: dynamic programming runtime, uncorrelated and correlated datasets]

The experimental results support this Big-O analysis for both the uncorrelated and correlated datasets. In the graphs of both changing knapsack capacity and changing number of items, pseudo-polynomial growth is evident. The correlation of the items had little to no effect on the performance of the dynamic programming algorithm.

A possible hole in this work was the lack of testing on large datasets. The algorithm should have been tested on larger knapsacks with more items to determine its limits; eventually, the size of the knapsack or the number of items would drastically decrease its effectiveness. Such testing would also help determine which datasets the simulated annealing algorithm is better suited for.

5. Genetic
For the second approach, a genetic algorithm was applied to the data sets. A genetic algorithm uses principles of genealogy and survival of the fittest to produce an ideal solution. For this particular version, a population of potential solutions is generated. To speed things along, the initial population is seeded with solutions whose value is greater than zero.

Then the primary loop of the genetic algorithm begins. The population is sorted by fitness score, which in this case is total value. Population members whose weight exceeds the knapsack capacity are given a fitness score of zero. Next comes crossover: a percentage of the higher-scoring solutions is placed in a parent pool, with a chance for a few of the remaining population to join for diversity. Random pairs of parents are then crossbred, combining the first half of one solution with the latter half of another. Each 'child' is then given a chance to mutate, which randomly alters some of the values in the solution; the goal is to prevent stagnation. Finally, all children are added to the parent pool, and this pool becomes the new population/gene pool. [3][10]
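The loop below is a condensed Python sketch of this process. It is our reading of the description above: the population size, elite fraction, mutation rate, and generation count are hypothetical parameters, and both the positive-value seeding of the initial population and the extra diversity draw from the remaining population are omitted for brevity:

```python
import random

def fitness(solution, values, weights, capacity):
    """Total value of the selected items, or zero when over capacity."""
    if sum(w for w, x in zip(weights, solution) if x) > capacity:
        return 0
    return sum(v for v, x in zip(values, solution) if x)

def genetic_knapsack(values, weights, capacity, pop_size=100,
                     generations=200, elite_frac=0.5, mutation_rate=0.05):
    n = len(values)
    population = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda s: fitness(s, values, weights, capacity),
                        reverse=True)
        parents = population[:int(pop_size * elite_frac)]   # higher-end solutions
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            child = a[:n // 2] + b[n // 2:]     # first half of one, latter half of other
            for i in range(n):
                if random.random() < mutation_rate:
                    child[i] = 1 - child[i]     # mutation prevents stagnation
            children.append(child)
        population = parents + children         # children join the parent pool
    best = max(population, key=lambda s: fitness(s, values, weights, capacity))
    return fitness(best, values, weights, capacity), best
```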

The Big-O of this genetic algorithm reflects several independent methods within it. The fitness-checking method, the crossover method, and the mutation method are the primary contributors to the algorithm's run time. A genetic algorithm's Big-O is usually represented as "O(P * G * O(Fitness) * ((Pc * O(crossover)) + (Pm * O(mutation))))", where P is the population size, G the number of generations, Pc the crossover probability, and Pm the mutation probability. [11]

For my algorithm this simplifies to O(O(F) * (O(C) + O(M))), where O(F) is the fitness-determining method, O(C) the crossover method, and O(M) the mutation method. O(F) is n, where n is the number of items in the list. O(C) is dominated by its sorting step, which is n log(n). Since the maximum number of iterations is a constant, the number of generations does not change with larger datasets.

The largest of these is the crossover method, which relies on an n log(n) sort whose cost grows with the number of items and which repeats once per requested generation. As the fastest-growing component of this particular genetic algorithm, it means the overall growth approaches n log(n).

This is evidenced in the findings. Despite the algorithm's overall large run time, its growth rate is consistently n log(n) across both the correlated and uncorrelated data sets.

[Figure: uncorrelated data, time versus list size]

[Figure: correlated data, time versus list size]

[Figure: uncorrelated data, value versus knapsack capacity]

The values obtained by the genetic algorithm varied a fair amount between the uncorrelated and correlated data. The genetic algorithm consistently yields optimal results on the correlated data set.

[Figure: correlated data, value versus list size]

[Figure: correlated data, value versus knapsack capacity]

The results on the uncorrelated data set were much less consistent, meaning the genetic algorithm frequently failed to reach the optimal value.

[Figure: uncorrelated data, value versus list size]

[Figure: uncorrelated data, value versus capacity]

This shows that closely correlated data is handled very quickly by my algorithm, potentially requiring fewer iterations to reach the maximum value. We therefore reran the uncorrelated data with a number of generations that scaled with the data, giving the larger instances more time.

[Figure: uncorrelated data with more generations, time versus list size]

[Figure: uncorrelated data with more generations, time versus capacity]

[Figure: uncorrelated data with more generations, value versus capacity]

Overall, the complexity and the rate of value increase stayed mostly the same; however, the values did go up significantly. This indicates that, given more iterations, the uncorrelated data can be optimized by this algorithm.

Downsides of the genetic algorithm include the sheer amount of resources it requires. While the version here is ultimately n log(n), the overall time it takes is deceptively large. This is partly because the algorithm has no way of determining that it has reached the maximum value without already knowing that maximum. In turn, this means there is currently no hard termination condition beyond the arbitrary iteration limit placed upon it.

However, this lines up well with the findings of Hristakeva. [3] The algorithm is very resource intensive and, all things considered, not particularly effective on this problem compared to non-naive algorithms. Overall, the results validate the complexity equation, but the algorithm itself is not ideal for optimizing the knapsack problem.

6. Simulated Annealing

The third and final algorithm applied to the data sets is simulated annealing. It uses hill climbing but sometimes accepts a worse total knapsack value, allowing exploration of other combinations and avoiding settling on local maxima. It is used to approximate a global maximum in a fixed amount of time. [7]

A main feature of simulated annealing is the temperature $T$ and its cooling schedule. An initial temperature $T_0$ and a final temperature $T_{\min}$ are chosen. As iterations occur, the temperature at iteration $k$, $T_k$, approaches $T_{\min}$ according to the cooling schedule, which is logarithmic in this case. [6]

During each iteration, an item is randomly selected from all items and put in the knapsack. While the new knapsack weight is greater than the capacity, items are randomly removed until the knapsack weight is at most the capacity. The new total knapsack value is then compared to the previous total value.

If the new total value is better, the combination is accepted. Otherwise, the new combination is accepted if chance is less than the probability $p$ and rejected otherwise, where

$p = e^{(\text{currentTotalValue} - \text{previousTotalValue}) / T_k}$

and chance is a random float between 0 and 1, inclusive. [6]

This ensures that the worse values accepted over time differ less and less from the current solution value; this process is known as cooling. At the end of each iteration, $T_k$ is calculated as $T_k = T_0 / (1 + \ln(1 + k))$. Once $T_k$ reaches $T_{\min}$, the combination with the highest total knapsack value reached is returned. [8]
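A compact Python sketch of this procedure, reconstructed from the description above ($T_0 = 1100$ and $T_{\min} = 136$ match the uncorrelated-data settings reported below; the remaining details are our assumptions):

```python
import math
import random

def simulated_annealing_knapsack(values, weights, capacity, t0=1100.0, t_min=136.0):
    n = len(values)
    current, best, best_value, k, t = [0] * n, [0] * n, 0, 0, t0
    while t > t_min:
        candidate = list(current)
        candidate[random.randrange(n)] = 1            # randomly add an item
        while sum(w for w, x in zip(weights, candidate) if x) > capacity:
            filled = [i for i, x in enumerate(candidate) if x]
            candidate[random.choice(filled)] = 0      # randomly remove until it fits
        new_value = sum(v for v, x in zip(values, candidate) if x)
        cur_value = sum(v for v, x in zip(values, current) if x)
        # Accept improvements outright; accept worse moves with probability
        # p = e^((new - current) / T_k), which shrinks as the temperature cools.
        if new_value > cur_value or random.random() < math.exp((new_value - cur_value) / t):
            current = candidate
        if new_value > best_value:
            best, best_value = list(candidate), new_value
        k += 1
        t = t0 / (1 + math.log(1 + k))                # logarithmic cooling schedule
    return best_value, best
```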

The memory requirements of this algorithm are two arrays of length n, one holding the previous item list and the other the current item list, plus three lists: one holding the current knapsack items, another holding the previous knapsack items, and a third holding function return values.

The solution quality of simulated annealing depends on user preference, as the algorithm is used to find a close approximation of the global maximum.

For the uncorrelated data, the simulated annealing algorithm reached roughly 95% of the best total knapsack value at maximum capacity and maximum number of items, with $T_0$ set to 1100 and $T_{\min}$ set to 136.

For the correlated data, it reached roughly 99.7% of the best total knapsack value at maximum capacity and maximum number of items, with $T_0$ set to 1100 and $T_{\min}$ set to 200.

The Big-O for this algorithm is O(n log k), where n is the number of items and k is the number of iterations. The O(n) factor comes from the item-list arrays, which are copied on each iteration. A logarithmic function is applied to the iteration counter checked by the while loop, hence the O(log k) factor. The algorithm runs in polynomial time.

[Figures: simulated annealing runtime, uncorrelated and correlated datasets]

Based on the runtime graphs, the Big-O appears more similar to O(log k) than to O(n log k). This likely results from a miscalculation: the copying of the item lists may in fact be O(1) rather than O(n). Runtime also appears to be affected by capacity, which was thought not to affect it. Perhaps with a larger capacity the algorithm is more likely to hold more items, increasing the chance of spending more time in the loop that removes items from an overfilled knapsack.

There were also differences in performance based on the correlation of the data. The algorithm was faster and had better results on the correlated data, except in the case of a changing number of items; this is likely due to cooling too quickly.

Comparisons

[Figure: runtime comparison of the three algorithms; the results are so close that only one line shows up]

Interpretation/Conclusions
Our testing and analysis gave us some valuable insight into the effectiveness of each algorithm.

The genetic algorithm is not ideal for the knapsack problem due to the amount of time that it takes to reach a solution. This is by far the slowest algorithm.

The simulated annealing algorithm is the fastest algorithm* but does not always find the optimal solution. It was also the most difficult to implement.


The dynamic programming solution requires significantly more memory than the other algorithms and is slower than the simulated annealing algorithm*. Its main benefit is solution quality, as it always finds the optimal solution. It is also relatively easy to understand.

Though the genetic algorithm and simulated annealing are somewhat similar, the genetic algorithm may generate around 10,000 or 100,000 new candidates for every 100 candidates generated by simulated annealing. This drastically increases the runtime of the genetic algorithm.

For most datasets, the dynamic programming algorithm would be the best solution, while simulated annealing would be preferable for large datasets. Simulated annealing begins to outperform dynamic programming with respect to time when the capacity of the knapsack becomes greater than the size of the population. [3] However, it is debatable how large a dataset must be to justify the difficulty of implementing simulated annealing. As discussed in the Dynamic Programming section, more testing could be performed to determine when simulated annealing becomes the better option.

*Except for small datasets, where dynamic programming is faster than simulated annealing.

Future Work
Future work for this problem would involve modifying the current algorithms, implementing more complex algorithms, and applying these algorithms to similar problems.

First, the dynamic programming algorithm could be improved by reducing the size of the matrix and removing unnecessary elements. This could be done by computing only the parts of the matrix that contribute to the final solution.

Also, a combination of algorithms could be implemented to create a more complex algorithm. Some of the algorithms in Pisinger's paper combine features from several different algorithms. [2]

Lastly, the application of these algorithms to similar problems would be interesting. These algorithms could easily be applied to different variations of the knapsack problem as well as different resource allocation problems.

Works Cited
[1] en.wikipedia.org/wiki/Knapsack_problem
[2] www.dcs.gla.ac.uk/~pat/cpM/jchoco/knapsack/papers/hardInstances.pdf
[3] www.micsymposium.org/mics_2005/papers/paper102.pdf
[4] www.geeksforgeeks.org/pseudo-polynomial-in-algorithms/
[5] hjemmesider.diku.dk/~pisinger/codes.html
[6] https://www.ida.liu.se/~zebpe83/heuristic/lectures/SA_lecture.pdf
[7] https://en.wikipedia.org/wiki/Simulated_annealing
[8] http://what-when-how.com/artificial-intelligence/a-comparison-of-cooling-schedules-for-simulated-annealing-artificial-intelligence/
[9] https://people.sc.fsu.edu/~jburkardt/datasets/knapsack_01/knapsack_01.html
[10] https://www.dataminingapps.com/2017/03/solving-the-knapsack-problem-with-a-simple-genetic-algorithm/
[11] http://algohub.me/algo/genetic-algorithm.html