
Boosting Verification by Automatic Tuning of Decision Procedures


Page 1: Boosting Verification by Automatic Tuning of Decision Procedures


Boosting Verification by Automatic Tuning of Decision Procedures

Domagoj Babić

joint work with Frank Hutter, Holger H. Hoos, and Alan J. Hu, University of British Columbia

Page 2: Boosting Verification by Automatic Tuning of Decision Procedures


Decision procedures

• Core technology for formal reasoning

• Trend towards completely automated verification
  – Scalability is problematic
  – Better (more scalable) decision procedures needed
  – Possible direction: application-specific tuning

Diagram: formula → Decision procedure → SAT (solution) / UNSAT
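The interface is exactly what the diagram shows: a formula goes in, and SAT (with a satisfying solution) or UNSAT comes out. A minimal toy sketch of that interface, using brute-force enumeration over propositional CNF (purely illustrative; nothing like Spear's actual algorithm):

    from itertools import product

    def decide(clauses, variables):
        """Toy decision procedure: returns ("SAT", model) or ("UNSAT", None).
        Brute-force enumeration -- only usable for tiny formulas."""
        for values in product([False, True], repeat=len(variables)):
            model = dict(zip(variables, values))
            # Each clause is a list of (variable, polarity) literals.
            if all(any(model[v] == pol for v, pol in clause) for clause in clauses):
                return "SAT", model
        return "UNSAT", None

    # (x1 or not x2) and (x2 or x3)
    print(decide([[("x1", True), ("x2", False)], [("x2", True), ("x3", True)]],
                 ["x1", "x2", "x3"]))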

Page 3: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 4: Boosting Verification by Automatic Tuning of Decision Procedures


Performance of Decision Procedures

• Heuristics

• Learning (avoiding repeating redundant work)

• Algorithms

Page 5: Boosting Verification by Automatic Tuning of Decision Procedures


Heuristics and search parameters

• The brain of every decision procedure
  – Determine performance
• Numerous heuristics:
  – Learning, clause database cleanup, variable/phase decision, ...
• Numerous parameters:
  – Restart period, variable decay, priority increment, ...
• Significantly influence the performance
• Parameters/heuristics perform differently on different benchmarks

Page 6: Boosting Verification by Automatic Tuning of Decision Procedures


Spear bit-vector decision procedure parameter space

• Spear 1.9:
  – 4 heuristics × 22 optimization functions
  – 2 heuristics × 3 optimization functions
  – 12 double
  – 4 unsigned
  – 4 bool
  – 26 parameters in total
• Large number of combinations:
  – After limiting the range of double & unsigned parameters
  – After discretization of double parameters: 3.78×10^18
  – After exploiting dependencies: 8.34×10^17 combinations
  – Finding a good combination – hard!
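To see where numbers of that magnitude come from: the number of configurations is just the product of the (discretized) domain sizes of all parameters. A small sketch with hypothetical domain sizes (not Spear's real ones), chosen only to land in the same ballpark:

    from math import prod

    # Hypothetical discretized domain sizes, illustrative only:
    # 4 heuristic parameters with 22 choices, 2 with 3 choices,
    # 12 doubles and 4 unsigneds discretized to 5 values, 4 booleans.
    domain_sizes = [22] * 4 + [3] * 2 + [5] * 12 + [5] * 4 + [2] * 4

    total = prod(domain_sizes)
    print(f"{len(domain_sizes)} parameters, {total:.2e} combinations")
    # -> 26 parameters, 5.15e+18 combinations (same order of magnitude as the slide)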

Page 7: Boosting Verification by Automatic Tuning of Decision Procedures


Goal

• Find a good combination of parameters (and heuristics):
  – Optimize for different problem sets (minimizing the average runtime)
• Avoid time-consuming manual optimization
• Learn from the found parameter sets
  – Apply that knowledge to the design of decision procedures

Page 8: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 9: Boosting Verification by Automatic Tuning of Decision Procedures


Manual optimization

• Standard way of finding parameter sets
• Developers pick a small set of easy benchmarks
  (hard benchmarks = slow development cycle)
  – Hard to achieve robustness
  – Easy to over-fit (to small and specific benchmarks)
• Spear manual tuning:
  – Approximately one week of tedious work

Page 10: Boosting Verification by Automatic Tuning of Decision Procedures


When to give up manual optimization?

• Depends mainly on the sensitivity of the decision procedure to parameter modifications
• Decision procedures for NP-hard problems are extremely sensitive to parameter modifications
  – Performance changes of 1-2 orders of magnitude are usual
  – Sometimes up to 4 orders of magnitude

Page 11: Boosting Verification by Automatic Tuning of Decision Procedures


Sensitivity Example

• Example: same instance, same parameters, same machine, same solver
  – Spear compiled with 80-bit floating-point precision: 0.34 [s]
  – Spear compiled with 64-bit floating-point precision: times out after 6000 [s]
  – First ~55000 decisions equal, one mismatch, next ~100 equal, then complete divergence
• Manual optimization for NP-hard problems is ineffective.
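The sensitivity already shows up at the level of floating-point rounding: a tiny difference in an accumulated score can flip a single decision, after which the search diverges completely, as in the example above. A small analogy (not Spear's code), contrasting two precisions with NumPy:

    import numpy as np

    # Accumulate the same "activity bump" at two precisions.
    bump = 0.1
    acc32 = np.float32(0.0)
    acc64 = np.float64(0.0)
    for _ in range(1000):
        acc32 += np.float32(bump)
        acc64 += np.float64(bump)

    print(acc32, acc64)                   # the sums differ slightly
    print(float(acc32) == float(acc64))   # False: rounding differs between precisions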

Page 12: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 13: Boosting Verification by Automatic Tuning of Decision Procedures


Automatic tuning

• Loop until happy (with the found parameters):
  – Perturb the existing set of parameters
  – Perform hill-climbing (see the sketch below):
    • Modify one parameter at a time
    • Keep the modification if it is an improvement
    • Stop when a local optimum is found
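A minimal sketch of this perturb-then-hill-climb loop (an iterated local search), assuming a black-box evaluate(config) that returns, say, the average runtime on the training set. This is hypothetical illustration code, not the tool actually used (described on the next slide):

    import random

    def hill_climb(config, domains, evaluate):
        """Modify one parameter at a time; keep a change only if it improves;
        stop at a local optimum."""
        best, best_cost = dict(config), evaluate(config)
        improved = True
        while improved:
            improved = False
            for param, values in domains.items():
                for v in values:
                    if v == best[param]:
                        continue
                    cand = {**best, param: v}
                    cost = evaluate(cand)
                    if cost < best_cost:
                        best, best_cost, improved = cand, cost, True
        return best, best_cost

    def tune(domains, evaluate, rounds=10, kicks=3):
        """Loop until 'happy': perturb the incumbent, hill-climb, keep the better one."""
        best, best_cost = hill_climb({p: random.choice(v) for p, v in domains.items()},
                                     domains, evaluate)
        for _ in range(rounds):
            perturbed = dict(best)
            for p in random.sample(list(domains), min(kicks, len(domains))):
                perturbed[p] = random.choice(domains[p])   # perturb a few parameters
            cand, cost = hill_climb(perturbed, domains, evaluate)
            if cost < best_cost:
                best, best_cost = cand, cost
        return best, best_cost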

Page 14: Boosting Verification by Automatic Tuning of Decision Procedures


Implementation: FocusedILS [Hutter, Hoos, Stützle, ’07]

• Used for Spear tuning
• Adaptively chooses training instances (see the sketch below)
  – Quickly discards poor parameter settings
  – Evaluates better ones more thoroughly
• Any scalar metric can be optimized
  – Runtime, precision, number of false positives, ...
• Can optimize the median, average, ...
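A simplified sketch of the adaptive-evaluation idea (not the actual FocusedILS algorithm): a challenger configuration is run on one training instance at a time and is rejected as soon as it falls behind the incumbent, so poor settings are discarded after a few cheap runs while promising ones get evaluated on the whole set. run_solver is an assumed black box returning a runtime:

    def challenger_wins(challenger, incumbent_times, instances, run_solver):
        """Compare a challenger configuration against the incumbent's recorded
        runtimes, adding training instances one at a time and giving up early."""
        challenger_times = []
        for i, inst in enumerate(instances):
            challenger_times.append(run_solver(challenger, inst))
            if sum(challenger_times) > sum(incumbent_times[: i + 1]):
                return False          # discard poor parameter settings quickly
        # Survived the whole training set: evaluated thoroughly and never worse.
        return True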

Page 15: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 16: Boosting Verification by Automatic Tuning of Decision Procedures


Experimental Setup - Benchmarks

• 2 experiments:
  – General-purpose tuning (Spear v0.9)
    • Industrial instances from previous SAT competitions
  – Application-specific tuning (Spear v1.8)
    • Bounded model checking (BMC) instances
    • Calysto software checking instances
• Machines
  – Cluster of 55 dual 3.2 GHz Intel Xeon PCs w/ 2 GB RAM
• Benchmark sets divided
  – Training & test, disjoint
  – Test timeout: 10 hrs

Page 17: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 1: General-purpose optimization

• Training
  – Timeout: 10 sec
  – Risky, but no experimental evidence of over-fitting
  – 3 days of computation on the cluster
• Very heterogeneous training set
  – Industrial instances from previous competitions
• 21% geometric mean speedup on the industrial test set over the manual settings (see the sketch below)
• ~3X on bounded model checking
• ~78X on Calysto software checking
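For reference, the geometric mean speedup is the exponential of the mean log per-instance speedup (equivalently, the n-th root of the product of speedups). A tiny sketch with made-up runtimes, for illustration only:

    from math import exp, log

    manual    = [12.0, 3.5, 800.0, 45.0]   # made-up per-instance runtimes (seconds)
    autotuned = [ 9.0, 3.1, 610.0, 40.0]

    speedups = [m / a for m, a in zip(manual, autotuned)]
    geo_mean = exp(sum(log(s) for s in speedups) / len(speedups))
    print(f"geometric mean speedup: {geo_mean:.2f}x")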

Page 18: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 1: Bounded model checking instances

Page 19: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 1: Calysto instances

Page 20: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 2: Application-specific optimization

• Training
  – Timeout: 300 sec
  – Bounded model checking optimization: 2 days on the cluster
  – Calysto instances: 3 days on the cluster
• Homogeneous training set
• Speedups over SAT competition settings:
  – ~2X on BMC
  – ~20X on SWV
• Speedups over manual settings:
  – ~4.5X on BMC
  – ~500X on SWV

Page 21: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 2: Bounded model checking instances

~4.5X

Page 22: Boosting Verification by Automatic Tuning of Decision Procedures


Tuning 2: Calysto instances

~500X

Page 23: Boosting Verification by Automatic Tuning of Decision Procedures


Overall Results

Solver                           BMC                                SWV
                                 #solved    Avg. runtime (solved)   #solved    Avg. runtime (solved)
Minisat                          289/377    360.9                   302/302    161.3
Spear manual                     287/377    340.8                   298/302    787.1
Spear SAT comp.                  287/377    223.4                   302/302     35.9
Spear auto-tuned, app-specific   291/377    113.7                   302/302      1.5


Page 28: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 29: Boosting Verification by Automatic Tuning of Decision Procedures


Software verification parameters

– Greedy activity-based heuristic
  • Probably helps focus on the most frequently used sub-expressions
– Aggressive restarts
  • Probably the standard heuristics and initial ordering do not work well for SWV problems
– Phase selection: always false
  • Probably related to the checked property (NULL ptr dereference)
– No randomness
  • Spear & Calysto highly optimized

Page 30: Boosting Verification by Automatic Tuning of Decision Procedures


Bounded model checking parameters

– Less aggressive activity heuristic
– Infrequent restarts
  • Probably the initial ordering (as encoded) works well
– Phase selection: fewer watched clauses
  • Minimizes the amount of work
– A small amount of randomness helps
  • 5% random variable and phase decisions
– Simulated annealing works well (see the sketch below)
  • Decrease randomness by 30% after each restart
  • Focuses the solver on hard chunks of the design
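A sketch of that annealing-style schedule as described on the slide: start with 5% random variable/phase decisions and multiply the rate by 0.7 (a 30% decrease) at every restart. Names and structure are hypothetical, not Spear's actual options:

    import random

    rand_freq = 0.05      # 5% random variable and phase decisions initially
    DECAY = 0.7           # "decrease randomness by 30% after each restart"

    def pick_decision(vars_by_activity):
        """Mostly take the highest-activity variable; occasionally pick at random."""
        if random.random() < rand_freq:
            return random.choice(vars_by_activity)
        return vars_by_activity[0]

    def on_restart():
        """Anneal the randomness at every restart."""
        global rand_freq
        rand_freq *= DECAY

    for _ in range(5):
        on_restart()
    print(f"{rand_freq:.4f}")   # 0.05 * 0.7**5, about 0.0084 after five restarts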

Page 31: Boosting Verification by Automatic Tuning of Decision Procedures


Outline

• Problem definition
• Manual tuning
• Automatic tuning
• Experimental results
• Found parameter sets
• Future work

Page 32: Boosting Verification by Automatic Tuning of Decision Procedures


Future Work

• Per-instance tuning (machine-learning-based techniques)
• Analysis of the relative importance of parameters
  – Simplify the solver
• Tons of data, little analysis done... Correlations between parameters and solver statistics could reveal important dependencies.

Page 33: Boosting Verification by Automatic Tuning of Decision Procedures


Take-away messages

• Automatic tuning is effective
  – Especially application-specific tuning
• Avoids time-consuming manual tuning
• Sensitivity to parameter modifications
  – Few benchmarks = inconclusive results?