  • CSC373 - Approximation Algorithms

    Vincent Maccio

    Vincent MaccioCSC373 - Approximation Algorithms 1 / 22

  • Algorithm Classification

    The following is all informal.

    • Problems can be broken down and classified into different groups based on how hard they are to solve

    • Most problems we’ve seen in this course and most you’ve seen previously are easy problems

    • Let P be the set of problems which can be solved in polynomial time; that is, if for some constant k a problem has an algorithm which solves it in O(n^k), then that problem is in P

    • Sorting a list, finding a shortest path in a graph, testing whether a number is prime, etc. are all in P

    • If for a problem one can be given a solution and determine if that solution is correct or not in polynomial time, then that problem is said to be in NP

  • Algorithm Classification - cont

    • For example, determining whether a Knapsack instance has a solution with value at least V is hard, but determining whether a given set of items is such a solution is easy

    • Just sum up the values to see if the total is at least V, and sum up the weights to see if the total is at most the capacity

    • A problem is said to be NP-complete if it is in NP and is among the hardest problems in NP (if you can solve any one NP-complete problem in polynomial time, you can solve all NP problems in polynomial time)

    • A problem is said to be NP-hard if it’s at least as hard as the NP-complete problems; an NP-hard problem need not be in NP itself
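
    The Knapsack check described above (sum the values, sum the weights) can be sketched in a few lines. This is only an illustration: the function name `verify_knapsack` and its parameters are hypothetical, and it assumes each item may be used at most once.

```python
from typing import Iterable, Sequence


def verify_knapsack(
    chosen: Iterable[int],
    values: Sequence[int],
    weights: Sequence[int],
    capacity: int,
    target_value: int,
) -> bool:
    """Check a proposed Knapsack solution in time linear in the number of items.

    `chosen` holds the indices of the selected items (each used at most once).
    """
    picked = set(chosen)
    total_value = sum(values[i] for i in picked)
    total_weight = sum(weights[i] for i in picked)
    # A valid certificate reaches the target value without exceeding the capacity.
    return total_value >= target_value and total_weight <= capacity
```

    Verifying a certificate takes one pass over the chosen items, whereas finding one may require searching over exponentially many subsets.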

  • Algorithm Classification - Observations

    See if these make sense to you from the previous definitions

    • P ⊆ NP

    • There exist problems which are NP-hard, but not in NP

    • (NP-hard − NP-complete) ∩ P = ∅

    • P ∩ NP-complete = ?

  • Coping with Complexity

    If tasked to solve an NP-hard problem, what can one do? Solving it exactly is typically too costly.

    1 Sacrifice generality
      • Limit the solution to a specific case
      • Assume the input won’t get too large

    2 Sacrifice optimality
      • Write something which runs quickly but returns a solution which may only be “reasonable” but may not be optimal

    3 Sacrifice reliability
      • Write something which returns the optimal sometimes, but not always

    Or some combination of the above.

  • Vertex Cover

    • Given a graph G = (V, E), a set of vertices C is said to be a vertex cover if C ⊆ V and ∀(u, v) ∈ E : u ∈ C or v ∈ C

    • A popular problem is to find an optimal vertex cover C∗, such that |C∗| is as small as possible (while still being a vertex cover)

    • This problem is NP-hard

    • But we can come up with an approximation

  • Approximation

    C = ∅
    E′ = E
    while E′ ≠ ∅:
        choose an arbitrary edge (u, v) from E′
        C = C ∪ {u, v}    # two vertices, not an edge
        remove all edges connected to u or v from E′
    return C
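
    A minimal Python sketch of the pseudocode above, assuming the graph is given as a list of edges with no self-loops. The name `approx_vertex_cover` is hypothetical, and the "arbitrary" choice of edge is simply the order in which the edges are listed. Instead of deleting edges from E′, the sketch skips any edge that already has an endpoint in C, which has the same effect.

```python
from typing import Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def approx_vertex_cover(edges: Iterable[Edge]) -> Set[Hashable]:
    """Return a vertex cover built by repeatedly taking both endpoints of an uncovered edge."""
    cover: Set[Hashable] = set()
    for u, v in edges:
        # An edge touching the cover would already have been removed from E'.
        if u in cover or v in cover:
            continue
        cover.update((u, v))  # add both endpoints, not the edge itself
    return cover


edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
cover = approx_vertex_cover(edges)
print(sorted(cover))  # ['a', 'b', 'c', 'd']
```

    On this path graph the optimal cover is {b, d}, so the returned cover is exactly twice the optimal size.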

  • Approximation

    • This is a “2-approximation”, that is, for the vertex cover C returned by this algorithm it holds that |C| ≤ 2|C∗|

    • Proof:
      • Let A be the set of edges “chosen” in the previous algorithm
      • No two edges in A share an endpoint, and every vertex cover (including the optimal one) must contain at least one of u or v for every edge (u, v) ∈ A, so it needs a distinct vertex for each edge of A
      • Therefore |A| ≤ |C∗|, and each edge that’s added to A adds exactly two vertices to C
      • Therefore |C| = 2|A| ⇒ |C| ≤ 2|C∗|
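
    The chain |A| ≤ |C∗| ≤ |C| = 2|A| from the proof can be checked by brute force on a small graph. The sketch below is only a sanity check under stated assumptions: the helper names are hypothetical, the example graph (a 5-cycle) is made up for illustration, and the exhaustive search takes exponential time, so it is only usable on tiny inputs.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Edge = Tuple[str, str]


def chosen_edges(edges: Sequence[Edge]) -> List[Edge]:
    """Edges the greedy procedure would pick (the set A), scanning `edges` in order."""
    used = set()
    picked = []
    for u, v in edges:
        if u not in used and v not in used:
            picked.append((u, v))
            used.update((u, v))
    return picked


def min_vertex_cover_size(vertices: Sequence[str], edges: Sequence[Edge]) -> int:
    """Size of a smallest vertex cover, found by trying subsets in order of size."""
    for size in range(len(vertices) + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            if all(u in chosen or v in chosen for u, v in edges):
                return size
    return len(vertices)  # not reached: the full vertex set always covers


vertices = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "a")]
A = chosen_edges(edges)
optimum = min_vertex_cover_size(vertices, edges)
assert len(A) <= optimum <= 2 * len(A)
```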

  • The Travelling Salesman Problem

    • Given a weighted graph G = (V, E) and a start node a ∈ V, find a path in G which visits each vertex other than a exactly once, and which starts at a and ends at a

    • Such a path is called a tour of G

    • The length of a tour is the sum of all the edge weights on said tour

    • In the travelling salesman problem, the goal is to find the shortest tour possible

    • This is an NP-hard problem

    • We will make two assumptions which will allow us to derive a reasonable approximation
      1 The graph is dense (in fact complete): from each node there is an edge to every other node (if the graph is sparse, a tour may not exist in that graph)
      2 The triangle inequality holds, i.e. w(a, c) ≤ w(a, b) + w(b, c) (often for the travelling salesman problem the distance between nodes is thought of as the Euclidean distance, so this assumption makes sense)
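
    Both assumptions are easy to test on a concrete instance. The following sketch assumes weights are stored in a dictionary keyed by ordered pairs of nodes; the name `satisfies_assumptions` and this representation are hypothetical choices for illustration, and the three points are made up.

```python
from itertools import permutations
from math import dist
from typing import Dict, Sequence, Tuple

Weights = Dict[Tuple[str, str], float]


def satisfies_assumptions(nodes: Sequence[str], w: Weights) -> bool:
    """True if every ordered pair of distinct nodes has a weight and the triangle inequality holds."""
    if not all((a, b) in w for a, b in permutations(nodes, 2)):
        return False  # some edge is missing, so the graph is not complete
    return all(
        w[a, c] <= w[a, b] + w[b, c] + 1e-12  # small tolerance for float rounding
        for a, b, c in permutations(nodes, 3)
    )


# Euclidean distances between points in the plane satisfy both assumptions.
points = {"a": (0, 0), "b": (3, 0), "c": (3, 4)}
euclid = {(p, q): dist(points[p], points[q]) for p, q in permutations(points, 2)}
print(satisfies_assumptions(list(points), euclid))  # True
```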

  • TSP Approximation

    • Consider an optimal tour of G denoted by T∗, and let the length (total weight) of this tour be c(T∗)

    • Now consider removing any one edge from T∗. This results in a spanning tree of G, and because we removed an edge from T∗ to create it (assuming edge weights are non-negative), the total weight of that spanning tree is less than or equal to c(T∗). Therefore, for any minimum spanning tree, denoted by MST, c(MST) ≤ c(T∗)

    • The idea is to create, from an MST, a tour T such that c(T) ≤ 2c(T∗); in other words, T is a 2-approximation

    • The general approach will be to derive an MST, from that MST create a walk of the graph, and from that walk create a tour of G

    • The first 2 steps can be seen graphically on the next slide

  • The Travelling Salesman Problem

    [Figure not recovered: an example graph, an MST of it, and a walk of that MST]

  • TSP Approximation

    • A walk is a sequence of vertices of the graph such that each vertex appears at least once in the walk (it also starts and ends at a)

    • A walk of an MST can be defined recursively where one simply calls “walk” on each child node of a given vertex

    • Informally one can note that while a vertex may be visited several times (1 + the number of children it has), each edge of the MST is travelled across exactly twice (see the previous slide to convince yourself)

    • Therefore, letting W denote a walk of an MST and letting the total weight or length of that walk be denoted by c(W), it is known that c(W) = 2c(MST)
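
    The recursive walk and the identity c(W) = 2c(MST) can be sketched as follows. Assumptions: the graph is complete with symmetric weights stored in a dictionary keyed by ordered pairs; the MST is built with Prim's algorithm (the slides do not say which MST algorithm is used); all function names are hypothetical; and the four-point example with Manhattan distances is made up so that the arithmetic stays in integers.

```python
from typing import Dict, List, Tuple

Weights = Dict[Tuple[str, str], float]


def prim_mst(nodes: List[str], w: Weights, root: str) -> Dict[str, List[str]]:
    """Return an MST of a complete graph as a map from each node to its children."""
    children: Dict[str, List[str]] = {v: [] for v in nodes}
    # best[v] = (cheapest known weight connecting v to the tree, its tree endpoint)
    best = {v: (w[root, v], root) for v in nodes if v != root}
    while best:
        v = min(best, key=lambda x: best[x][0])
        _, parent = best.pop(v)
        children[parent].append(v)
        for x in best:
            if w[v, x] < best[x][0]:
                best[x] = (w[v, x], v)
    return children


def walk(children: Dict[str, List[str]], v: str) -> List[str]:
    """Visit v, then walk each child's subtree, returning to v after each one."""
    sequence = [v]
    for child in children[v]:
        sequence += walk(children, child)
        sequence.append(v)
    return sequence


def length(sequence: List[str], w: Weights) -> float:
    """Total weight of the consecutive hops in a sequence of vertices."""
    return sum(w[a, b] for a, b in zip(sequence, sequence[1:]))


pts = {"a": (0, 0), "b": (0, 2), "c": (3, 2), "d": (5, 0)}
w = {
    (p, q): abs(pts[p][0] - pts[q][0]) + abs(pts[p][1] - pts[q][1])
    for p in pts for q in pts if p != q
}
tree = prim_mst(list(pts), w, "a")
W = walk(tree, "a")
mst_cost = sum(w[p, c] for p in tree for c in tree[p])
assert length(W, w) == 2 * mst_cost  # every MST edge is crossed exactly twice
```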

  • TSP Approximation

    • In the previous graph seen 2 slides ago, the walk of the MST would be

    W = abcbhbadefegeda

    • From this we want to create a tour, which means it cannot visit the same node more than once

    • Note that the fourth node we visit on W is b, which is a node we’ve already visited

    • W can be altered to be a new walk, say W′, such that the second visit to b is removed

    W′ = abchbadefegeda

    • But from the triangle inequality we know w(c, h) ≤ w(c, b) + w(b, h)

    • Therefore, c(W′) ≤ c(W)

  • TSP Approximation

    • This simplification of W can be applied iteratively until no repeat visits to nodes remain (except for the final return to the start node a). Let this altered version of the walk be denoted by T

    • Continuing with our example

    T = abchdefga

    • Note the walk T is also a tour

    • We also know from our previous observations

    c(T) ≤ c(W) = 2c(MST) ≤ 2c(T∗)

    which is what we’re trying to show
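
    Removing the repeat visits amounts to keeping only the first occurrence of each vertex and then returning to the start. A minimal sketch, with the hypothetical name `shortcut`, applied to the walk W from the example:

```python
from typing import List


def shortcut(walk: List[str]) -> List[str]:
    """Drop repeat visits from a closed walk, keeping the final return to the start."""
    start = walk[0]
    seen = set()
    tour = []
    for v in walk:
        if v not in seen:  # keep only the first visit to each vertex
            seen.add(v)
            tour.append(v)
    tour.append(start)  # close the tour at the start node
    return tour


W = list("abcbhbadefegeda")
T = shortcut(W)
print("".join(T))  # abchdefga
```

    Each skipped vertex replaces two hops with one direct hop, which by the triangle inequality never increases the total length.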

  • Load Balancing

    • Given m identical machines and n jobs, where job j has size s_j, you’re tasked with scheduling the jobs (determining which jobs are processed by which machines). Let T_i be the total work load of machine M_i. Let the “makespan” be the highest work load of all the machines and be denoted by T, i.e. T = max_{1≤i≤m} T_i

    • Let T∗ denote the optimal (minimum) makespan. Ideally you’d like to schedule the jobs optimally, but this is an NP-hard problem

    • Consider the greedy algorithm which looks at the next job and schedules it on the machine which has the current lowest load

    • This algorithm turns out to be a 2-approximation
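
    The greedy rule can be sketched with a min-heap keyed on machine load. The function name `greedy_schedule`, the return format, and the sample job sizes are assumptions made for illustration; the sketch also assumes m ≥ 1 and that jobs are considered in the order given.

```python
import heapq
from typing import List, Sequence, Tuple


def greedy_schedule(sizes: Sequence[float], m: int) -> Tuple[float, List[List[int]]]:
    """Assign each job, in the given order, to the currently least-loaded machine.

    Returns the makespan and, for each machine, the indices of its jobs.
    """
    assignment: List[List[int]] = [[] for _ in range(m)]
    heap = [(0, i) for i in range(m)]  # (current load, machine index)
    heapq.heapify(heap)
    for j, size in enumerate(sizes):
        load, i = heapq.heappop(heap)  # machine with the lowest load so far
        assignment[i].append(j)
        heapq.heappush(heap, (load + size, i))
    makespan = max(load for load, _ in heap)
    return makespan, assignment


makespan, assignment = greedy_schedule([2, 3, 4, 6, 2, 2], m=3)
print(makespan, assignment)  # 8 [[0, 3], [1, 4], [2, 5]]
```

    Here the greedy makespan is 8, while splitting the jobs as {6}, {4, 2}, {3, 2, 2} gives a makespan of 7, so greedy is not optimal but stays well within the factor of 2.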

  • Load Balancing - Proof

    • We can note two bounds on the optimal makespan
      1 T∗ ≥ (1/m) ∑_{j=1}^{n} s_j
      2 T∗ ≥ max_j s_j

    • Let T_k be the workload of the kth machine after the greedy algorithm executes

    • Let T_i be the makespan (the greatest workload among the machines), therefore, M_i is the heaviest loaded machine

    • Consider the last job to be scheduled to M_i (let it be job j with size s_j) and consider how the system looked the moment before that job is scheduled

    • At that moment in time M_i had a workload of T_i − s_j, but because the job was sent to M_i we know all other machines had at least a workload of T_i − s_j
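
    The two lower bounds on T∗ noted at the top of this slide are cheap to compute, which makes them a handy yardstick for any schedule. A small sketch with the hypothetical name `makespan_lower_bound`, assuming at least one job and m ≥ 1; the job sizes are made up:

```python
from typing import Sequence


def makespan_lower_bound(sizes: Sequence[float], m: int) -> float:
    """Larger of the two lower bounds on the optimal makespan."""
    average_load = sum(sizes) / m  # bound 1: some machine carries at least the average
    largest_job = max(sizes)       # bound 2: some machine must run the largest job
    return max(average_load, largest_job)


sizes = [2, 3, 4, 6, 2, 2]
bound = makespan_lower_bound(sizes, m=3)
# Any schedule with makespan at most 2 * bound is within a factor 2 of optimal.
print(bound)
```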

  • Load Balancing - Proof - cont

    • Therefore, ∀k : T_k ≥ (T_i − s_j)

    ⇒ m(T_i − s_j) ≤ ∑_{k=1}^{m} T_k

    ⇒ (T_i − s_j) ≤ (1/m) ∑_{k=1}^{m} T_k

    ⇒ (T_i − s_j) ≤ (1/m) ∑_{j=1}^{n} s_j

    Therefore, from o