Implicit Hitting Set Problems Richard M. Karp Harvard University August 29, 2011

Page 1: Implicit Hitting Set Problems

Implicit Hitting Set Problems

Richard M. Karp
Harvard University
August 29, 2011

Page 2: Implicit Hitting Set Problems

Worst-case Analysis of NP-Hard Problems

• Exact solution methods: exponential running time in worst case.

• Polynomial-time approximation algorithms for optimization problems. Approximation ratios are usually unrealistically high.

• Parameterized complexity: polynomial-time algorithms for instances with a fixed parameter value, but the dependence of the running time on the parameter is usually unfavorable.

Page 3: Implicit Hitting Set Problems

Probabilistic Analysis and Heuristics

• In probabilistic analysis, problem instances are drawn from simple probability distributions. Often one can prove excellent performance on the average. However, the probability distributions may not correspond to real-life instances.

• Heuristics are often “unreasonably effective,” for reasons not well understood.

• We seek systematic methods for tuning heuristics and validating them by empirical testing on training sets of representative instances.

Page 4: Implicit Hitting Set Problems

Unreasonably Effective Heuristics

• Large traveling-salesman problems can be solved by quick tour construction methods, local improvement methods or cutting plane methods.

• Local improvement methods find near-optimal solutions to graph bisection problems.

• Huge satisfiability problems are routinely solved rapidly by branch-and-bound methods.

• The greedy set cover algorithm typically gives solutions within a few percent of optimal.

Page 5: Implicit Hitting Set Problems

Implicit Optimization Problems

• Set of constraints defined implicitly by a generation algorithm rather than by an explicit list.

-- Linear and convex programming: equivalence of separation and optimization

-- Integer programming: cutting-plane methods

-- Linear programming: column generation

Page 6: Implicit Hitting Set Problems

Hitting Set Problem

• Ground set V

• For every v in V, a positive weight c(v).

• C*: collection of subsets of V (circuits)

• Goal: Find a set of minimum weight that hits every set in C*

• Equivalent to set cover problem

Page 7: Implicit Hitting Set Problems

Complexity of the Hitting Set Problem

• NP-hard and hard to approximate within ratio o(log |C*|).

• Greedy algorithm achieves approximation ratio O(log |C*|):

Repeat until every set is hit: choose the element v in V that minimizes the ratio of c(v) to the number of remaining sets hit by v; add v to the hitting set; delete the sets hit by v.
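The greedy rule can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the talk; the function name `greedy_hitting_set` and the tie-breaking rule (by `repr`, only to make runs deterministic) are assumptions made for the example.

```python
from typing import Dict, Hashable, Iterable, Set


def greedy_hitting_set(circuits: Iterable[Iterable[Hashable]],
                       weight: Dict[Hashable, float]) -> Set[Hashable]:
    """Greedy heuristic: repeatedly pick the element with the smallest
    ratio of weight to number of still-unhit sets containing it."""
    remaining = [frozenset(c) for c in circuits]
    if any(not c for c in remaining):
        raise ValueError("an empty set cannot be hit")
    chosen: Set[Hashable] = set()
    while remaining:
        # Count how many unhit sets each element would hit.
        hits: Dict[Hashable, int] = {}
        for c in remaining:
            for v in c:
                hits[v] = hits.get(v, 0) + 1
        # Minimize c(v) / (number of sets hit); ties broken by repr (assumption).
        best = min(hits, key=lambda v: (weight[v] / hits[v], repr(v)))
        chosen.add(best)
        # Delete the sets hit by the chosen element.
        remaining = [c for c in remaining if best not in c]
    return chosen
```

With unit weights on the sets {1,2}, {2,3}, {3,4}, the sketch picks element 2 first (it hits two sets) and then element 3.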

Page 8: Implicit Hitting Set Problems

Hitting Set Problem in Practice

• Greedy algorithm gives good approximate solutions.

• CPLEX integer programming algorithm often gives optimal solutions rapidly.

Page 9: Implicit Hitting Set Problems

Implicit Hitting Set Problem

• The collection of circuits C* has a compact implicit description.

• There is a polynomial-time separation oracle which, given a subset H of the ground set, either determines that H is a hitting set or produces a circuit that H does not hit.

Example: in the feedback vertex set problem, the separation oracle produces the vertex set of a shortest cycle in the subgraph induced by V \ H.
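A possible separation oracle for the directed feedback vertex set case is sketched below. It assumes the digraph is given as an adjacency dictionary `adj` (a hypothetical representation chosen for this example) and runs one breadth-first search per vertex, which is polynomial but not tuned for speed.

```python
from collections import deque
from typing import Dict, Hashable, List, Optional, Set


def shortest_cycle_oracle(adj: Dict[Hashable, List[Hashable]],
                          H: Set[Hashable]) -> Optional[Set[Hashable]]:
    """Return the vertex set of a shortest directed cycle avoiding H,
    or None if H hits every cycle (H is a feedback vertex set)."""
    best: Optional[Set[Hashable]] = None
    for s in adj:
        if s in H:
            continue
        # BFS from s inside the subgraph induced by V \ H.
        parent = {s: None}
        queue = deque([s])
        closing = None  # last vertex on a shortest cycle through s
        while queue and closing is None:
            u = queue.popleft()
            for w in adj.get(u, ()):
                if w in H:
                    continue
                if w == s:
                    closing = u  # edge u -> s closes the cycle
                    break
                if w not in parent:
                    parent[w] = u
                    queue.append(w)
        if closing is not None:
            # Walk the BFS tree back to s to collect the cycle's vertices.
            cycle = set()
            x = closing
            while x is not None:
                cycle.add(x)
                x = parent[x]
            if best is None or len(cycle) < len(best):
                best = cycle
    return best
```

Returning a shortest cycle rather than an arbitrary one tends to give the hitting set solver a tighter constraint, which is consistent with the example on the slide.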

Page 10: Implicit Hitting Set Problems

Examples

• Feedback vertex set in a graph or digraph: vertex sets of cycles

• Feedback edge set in a digraph: edge sets of cycles

• Max cut: edge sets of odd cycles

• Steiner tree: edge sets of cycles that partition the required vertices

• Maximum 2-sat: minimal contradictory sets of 2-element clauses

• Intersection of k matroids: circuits of each matroid

• Maximal feasible subset of a set of linear inequalities: minimal infeasible subsets

Page 11: Implicit Hitting Set Problems

Naïve Algorithm for Solving Implicit Hitting Set Problem

Repeat until a feasible hitting set H is found:

(1) Given C, a subset of C*, find a minimum-weight hitting set H for C.

(2) Using the separation oracle, find a minimum-cardinality circuit c not hit by H.

(3) Add c to C.

Return H
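The loop above can be sketched as follows. This is an illustration under stated assumptions: the exact solver is a brute-force enumeration standing in for an integer programming solver, and `C_STAR` and `toy_oracle` are a made-up explicit instance used only to exercise the loop (in a real implicit problem the oracle would not have a list of all circuits).

```python
from itertools import combinations
from typing import Callable, Dict, FrozenSet, Hashable, List, Optional, Set

Circuit = FrozenSet[Hashable]


def min_weight_hitting_set(circuits: List[Circuit],
                           weight: Dict[Hashable, float]) -> Set[Hashable]:
    """Exact solver by enumeration (exponential time; stand-in for an IP solver)."""
    elements = sorted({v for c in circuits for v in c}, key=repr)
    best: Set[Hashable] = set()
    best_w = float("inf")
    for r in range(len(elements) + 1):
        for combo in combinations(elements, r):
            chosen = set(combo)
            if all(chosen & c for c in circuits):
                w = sum(weight[v] for v in chosen)
                if w < best_w:
                    best, best_w = chosen, w
    return best


def naive_implicit_hitting_set(
        oracle: Callable[[Set[Hashable]], Optional[Set[Hashable]]],
        weight: Dict[Hashable, float]) -> Set[Hashable]:
    C: List[Circuit] = []
    while True:
        H = min_weight_hitting_set(C, weight)  # step (1)
        c = oracle(H)                          # step (2)
        if c is None:                          # H hits every circuit in C*
            return H
        C.append(frozenset(c))                 # step (3)


# Toy instance (hypothetical): C* is listed explicitly only for the demo.
C_STAR = [frozenset(s) for s in ({1, 2}, {2, 3}, {3, 4}, {1, 4, 5})]
WEIGHT = {v: 1.0 for v in range(1, 6)}


def toy_oracle(H: Set[Hashable]) -> Optional[Set[Hashable]]:
    """Return a minimum-cardinality circuit not hit by H, or None."""
    unhit = [c for c in C_STAR if not (c & H)]
    return set(min(unhit, key=len)) if unhit else None
```

Because H is optimal for a subset C of C*, its weight is a lower bound on the optimum for C*; as soon as H is also feasible for C*, it is optimal. Each round adds a circuit that H does not hit, so no circuit is generated twice.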

Page 12: Implicit Hitting Set Problems

Circuit-Finding Subroutine

Input: C, a set of circuits, and H, a hitting set for C

Repeat until H hits every circuit in C*: find a circuit c not hit by H and choose an element x in c; add c to C and add x to H.
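A minimal sketch of this subroutine follows. The slide does not say how x is chosen, so the `pick` parameter (defaulting to `min`, which assumes comparable elements) is an assumption; the oracle is any callable that returns an unhit circuit or `None`.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, List, Optional, Set, Tuple


def circuit_finding(
        C: Iterable[Iterable[Hashable]],
        H: Iterable[Hashable],
        oracle: Callable[[Set[Hashable]], Optional[Set[Hashable]]],
        pick: Callable[[Set[Hashable]], Hashable] = min,
) -> Tuple[List[FrozenSet[Hashable]], Set[Hashable]]:
    """Grow C and H until H hits every circuit the oracle can produce."""
    circuits = [frozenset(c) for c in C]
    hitting = set(H)
    while True:
        c = oracle(hitting)
        if c is None:
            return circuits, hitting  # H is now feasible for C*
        x = pick(c)                   # choice rule is a tunable heuristic
        circuits.append(frozenset(c))
        hitting.add(x)
```

The returned H is feasible but generally not optimal; the value of the subroutine is the batch of new circuits it adds to C cheaply, without calling an exact solver.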

Page 13: Implicit Hitting Set Problems

Refined Algorithm

• Input: set of circuits C and hitting set H for C

(1) Execute the circuit-finding subroutine.

(2) Repeat until k iterations yield no circuits: construct a greedy hitting set H for C and execute the circuit-finding subroutine.

(3) Using CPLEX, construct an optimal hitting set H for C. If H is infeasible, go to (1).

Return H.

Page 14: Implicit Hitting Set Problems

Metrics

• Number of circuits generated, number of calls to solver, running time of generator.

Page 15: Implicit Hitting Set Problems

Application: Multi-Genome Alignment

• Highly similar sequences in two genomes constitute an anchor pair. The individual sequences are called anchors.

• A genome is a linearly ordered sequence of anchors.

• An alignment is a matrix with a row for each genome, and an assignment of each anchor to a column, respecting the linear orders.

• An anchor pair is synchronized if its two anchors lie in the same column.

• Goal: maximize the sum of the weights of the synchronized anchor pairs.

Page 16: Implicit Hitting Set Problems
Page 17: Implicit Hitting Set Problems

Complexity Bounds

• The 2-genome problem is equivalent to the maximum-weight increasing subsequence problem and is solvable in time O(n log n), where n is the cardinality of the ground set.

• The k-genome problem can be solved in time O(n^k) by dynamic programming.
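One standard way to reach the O(n log n) bound for the maximum-weight increasing subsequence problem is a Fenwick tree over prefix maxima, sketched below. The encoding is an assumption for the example: anchor pairs are listed in order of their position in the first genome, `values` holds their positions in the second genome, and weights are positive.

```python
from typing import Sequence


def max_weight_increasing_subsequence(values: Sequence[int],
                                      weights: Sequence[float]) -> float:
    """Maximum total weight of a strictly increasing subsequence, O(n log n)."""
    # Coordinate-compress the values to ranks 1..m.
    rank = {v: i + 1 for i, v in enumerate(sorted(set(values)))}
    m = len(rank)
    tree = [0.0] * (m + 1)  # Fenwick tree storing prefix maxima

    def query(i: int) -> float:
        """Best chain weight among elements with rank <= i."""
        best = 0.0
        while i > 0:
            best = max(best, tree[i])
            i -= i & -i
        return best

    def update(i: int, val: float) -> None:
        """Record a chain of weight val ending at rank i."""
        while i <= m:
            tree[i] = max(tree[i], val)
            i += i & -i

    answer = 0.0
    for v, w in zip(values, weights):
        r = rank[v]
        best_here = query(r - 1) + w  # extend the best chain with a smaller value
        update(r, best_here)
        answer = max(answer, best_here)
    return answer
```

For example, with values 3, 1, 2, 5, 4 and weights 10, 1, 1, 1, 1, the heaviest increasing subsequence is 3, 5 (or 3, 4) with total weight 11, which beats the longer but lighter 1, 2, 5.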

Page 18: Implicit Hitting Set Problems

Alignment as a Hitting Set Problem

• Ground set: anchor pairs

• Goal: delete a minimum-weight set of anchor pairs such that the remaining anchor pairs can be simultaneously synchronized.

• Directed edge (u,v): u precedes v.

• Undirected edge (u,v): u and v are an anchor pair

• Mixed cycle: contains directed and undirected edges, but at least one directed edge.

• An edge must be deleted from the set of undirected edges of each mixed cycle (Kececioglu).
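One way to test whether any mixed cycle remains is to contract each connected component of undirected edges into a single node and look for a directed cycle among the contracted nodes. The sketch below takes that route; it is an illustration of the definition, not necessarily the method used in this work, and the function name and edge-list representation are assumptions. It reports only whether a mixed cycle exists and does not return one.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def has_mixed_cycle(vertices: Iterable[Hashable],
                    directed: Iterable[Edge],
                    undirected: Iterable[Edge]) -> bool:
    """True if some cycle uses at least one directed edge (followed forward)
    together with undirected edges (followed in either direction)."""
    parent: Dict[Hashable, Hashable] = {}

    def find(v: Hashable) -> Hashable:
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for v in vertices:
        find(v)
    for u, v in undirected:
        parent[find(u)] = find(v)  # contract the anchor pair

    succ: Dict[Hashable, Set[Hashable]] = {}
    for u, v in directed:
        a, b = find(u), find(v)
        if a == b:
            return True  # a directed edge closed by undirected edges alone
        succ.setdefault(a, set()).add(b)

    # Kahn's algorithm: a directed cycle exists iff some node is never removed.
    nodes = {find(v) for v in list(parent)}
    indeg = {n: 0 for n in nodes}
    for a in succ:
        for b in succ[a]:
            indeg[b] += 1
    stack = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while stack:
        a = stack.pop()
        removed += 1
        for b in succ.get(a, ()):
            indeg[b] -= 1
            if indeg[b] == 0:
                stack.append(b)
    return removed < len(nodes)
```

With two genomes a1 < a2 and b1 < b2, the crossing anchor pairs (a1,b2) and (a2,b1) create a mixed cycle, whereas the parallel pairs (a1,b1) and (a2,b2) do not.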

Page 19: Implicit Hitting Set Problems
Page 20: Implicit Hitting Set Problems

Solving the Alignment Problem

• Run the generic implicit hitting set algorithm, with the anchor pairs as elements and the undirected edge sets of mixed cycles as circuits.

• Separation oracles: given a putative hitting set H, search for a mixed cycle in the graph induced by the edges not in H. Two methods: (1) a variant of depth-first search; (2) attempt to align the remaining edges until blocked by the occurrence of a mixed cycle.

Page 21: Implicit Hitting Set Problems

Performance on 4085 Problems of Aligning Five Worm Genomes

Time (sec.)     # solved    # edges
0 to 0.01       1311        (1; 52; 399)
0.01 to 0.1     764         (20; 203; 549)
0.1 to 1        1086        (26; 450; 1837)
1 to 10         632         (44; 1104; 4645)
10 to 60        151         (65; 1351; 12313)
60 to 600       75          (103; 1136; 14690)
600 to 3600     36          (166; 1236; 13916)

Page 22: Implicit Hitting Set Problems

Tuning the Algorithm

• Within the general algorithmic strategy there are many possible choices of the separation oracle, greedy algorithm, versions of CPLEX, parameter choices etc. By tuning these choices on a training set of real-world examples we improved the performance by a factor of several hundred.

Page 23: Implicit Hitting Set Problems

Acknowledgment

• This is joint work with Erick Moreno Centeno