The Space Efficiency of OSHL. Swaha Miller, David A. Plaisted. UNC Chapel Hill

Page 1: The Space Efficiency of OSHL

The Space Efficiency of OSHL

Swaha Miller

David A. Plaisted

UNC Chapel Hill

Page 2: The Space Efficiency of OSHL

How do humans prove theorems?

Semantics

Case analysis

Sequential search through space of possible structures

Focus on the theorem

Page 3: The Space Efficiency of OSHL

“Systematic methods can now routinely solve verification problems with thousands or tens of thousands of variables, while local search methods can solve hard random 3SAT problems with millions of variables.” (from a conference announcement)

Page 4: The Space Efficiency of OSHL

DPLL Example

{p,r}, {¬p,q,r}, {p,¬r}

Branch p=T: {T,r}, {¬T,q,r}, {T,¬r}  SIMPLIFY  {q,r}

Branch p=F: {F,r}, {¬F,q,r}, {F,¬r}  SIMPLIFY  {r}, {¬r}  SIMPLIFY  {}
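The split-and-simplify steps of the DPLL example can be sketched in a few lines of Python. This is a minimal illustration, not a production SAT procedure: it has no unit propagation or pure-literal rule, the integer encoding of atoms (p = 1, q = 2, r = 3, negative for negated) is an assumption made here, and the signs on the example clauses are an assumed reading of the slide.

```python
def simplify(clauses, lit):
    """Assign lit = True: drop satisfied clauses, delete the falsified literal."""
    out = []
    for c in clauses:
        if lit in c:
            continue                  # clause contains a true literal
        out.append(c - {-lit})        # may leave the empty clause
    return out


def dpll(clauses, assignment=()):
    """Return a satisfying tuple of literals, or None if unsatisfiable."""
    if not clauses:
        return assignment             # every clause has been satisfied
    if any(len(c) == 0 for c in clauses):
        return None                   # empty clause: this branch fails
    atom = min(abs(l) for c in clauses for l in c)   # split on smallest atom
    for lit in (atom, -atom):         # try atom = T first, then atom = F
        result = dpll(simplify(clauses, lit), assignment + (lit,))
        if result is not None:
            return result
    return None


# Assumed encoding: p = 1, q = 2, r = 3; a negative integer is a negated atom.
P, Q, R = 1, 2, 3
example = [frozenset({P, R}), frozenset({-P, Q, R}), frozenset({P, -R})]
model = dpll(example)                       # succeeds in the p = T branch
p_false = dpll(simplify(example, -P))       # the p = F branch fails
```

With these clauses the p = T branch leaves only {q,r}, which is satisfiable, while the p = F branch reduces to {r}, {¬r} and then to the empty clause.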

Page 5: The Space Efficiency of OSHL

Hyper Linking

Problem   Input Clauses   OTTER (sec)   Hyper Linking
Ph5       45              38606.76      1.8
Ph9       297             >24 hrs       2266.6
Latinsq   16              >24 hrs       56.4
Salt      44              1523.82       28.0
Zebra     128             >24 hrs       866.2

Page 6: The Space Efficiency of OSHL

Eliminating Duplication with the Hyper-Linking Strategy, Shie-Jue Lee and David A. Plaisted, Journal of Automated Reasoning 9 (1992) 25-42.

Page 7: The Space Efficiency of OSHL

Later propositional strategies

Billon’s disconnection calculus, derived from hyper-linking

Disconnection calculus theorem prover (DCTP), derived from Billon’s work

FDPLL

Page 8: The Space Efficiency of OSHL

Performance of DCTP on TPTP, 2003

DCTP 1.3 first in EPS and EPR (largely propositional)

DCTP 10.2p third in FNE (first-order, no equality), solving same number as best provers

DCTP 10.2p fourth in FOF and FEQ (all first-order formulae, and formulae with equality)

DCTP 1.3 is a single strategy prover.

Page 9: The Space Efficiency of OSHL

Strategy Selection in E

Page 10: The Space Efficiency of OSHL

Strategy Selection

Schulz, Stephan, E - A Brainiac Theorem Prover, Journal of AI Communications 15(2/3):111-126, 2002.

Page 11: The Space Efficiency of OSHL

Strategy Selection

The Vampire kernel provides a fairly large number of features for strategy selection. The most important ones are:

Choice of the main saturation procedure: (i) OTTER loop, with or without the Limited Resource Strategy, (ii) DISCOUNT loop.

A variety of optional simplifications.

Parameterised reduction orderings.

A number of built-in literal selection functions and different modes of comparing literals.

Age-weight ratio that specifies how strongly lighter clauses are preferred for inference selection.

Set-of-support strategy.

Page 12: The Space Efficiency of OSHL

Strategy Selection

The automatic mode of Vampire 7.0 is derived from extensive experimental data obtained on problems from TPTP v2.6.0. Input problems are classified taking into account simple syntactic properties, such as being Horn or non-Horn, presence of equality, etc. Additionally, we take into account the presence of some important kinds of axioms, such as set theory axioms, associativity and commutativity. Every class of problems is assigned a fixed schedule consisting of a number of kernel strategies called one by one with different time limits.

Page 13: The Space Efficiency of OSHL

DCTP Strategy Selection

DCTP 1.31 has been implemented as a monolithic system in the Bigloo dialect of the Scheme language.

DCTP 1.31 is a single strategy prover. Individual strategies are started by DCTP 10.21p using the schedule based resource allocation scheme known from the E-SETHEO system. Of course, different schedules have been precomputed for the syntactic problem classes. The problem classes are more or less identical with the sub-classes of the competition organisers.

In CASC-J2 DCTP 10.21p performed substantially better.

Page 14: The Space Efficiency of OSHL

Goal of OSHL

First-order logic

Clause form

Propositional efficiency

Semantics
  Requires ground decidability

Page 15: The Space Efficiency of OSHL

Structure of OSHL

Goal sensitivity if semantics chosen properly
  Choose initial semantics to satisfy axioms

Use of natural semantics
  For group theory problems, can specify a group

Sequential search through possible interpretations
  Thus similar to Davis and Putnam’s method
  Propositional efficiency

Constructs a semantic tree

Page 16: The Space Efficiency of OSHL

Ordered Semantic Hyperlinking (OSHL)

Reduce first-order logic problem to propositional problem

Imports propositional efficiency into first-order logic

The algorithm
  Imposes an ordering on clauses
  Progresses by generating ground instances Di of input clauses and refining interpretations

[Diagram: interpretations I0, I1, I2, I3, …; ground instances D0, D1, D2, …, T; outcome "unsatisfiable"]

Page 17: The Space Efficiency of OSHL

Semantics

Trivial semantics:
  Positive: Choose I0 to falsify all atoms, first D is all positive. Forward chaining.
  Negative: Choose I0 to satisfy all atoms, first D is all negative. Backward chaining.

Natural semantics: I0 chosen by user

Page 18: The Space Efficiency of OSHL

Semantics Ordering

<_t a well-founded ordering on atoms, extended to literals

Extend <_t to interpretations as follows:

I and J agree on L if they interpret L the same

Suppose I0 is given

I <_t J if I and J are not identical, A is the minimal atom on which they disagree, and I agrees with I0 on A
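The ordering on interpretations can be written down directly. The sketch below assumes a finite list of atoms given in increasing order and represents an interpretation as a dictionary from atom to truth value; both choices, and the name `precedes`, are illustrative and not taken from the OSHL implementation.

```python
def precedes(i, j, i0, atoms):
    """I <_t J: at the minimal atom where I and J disagree, I agrees with I0."""
    for a in atoms:                  # atoms listed in increasing <_t order
        if i[a] != j[a]:
            return i[a] == i0[a]
    return False                     # identical interpretations are unordered


atoms = ["p1", "p2", "p3"]
i0 = {a: True for a in atoms}        # initial interpretation: every atom true
i1 = {**i0, "p3": False}             # I0 changed only on p3
i2 = {**i0, "p2": False}             # I0 changed only on p2

ordered = precedes(i0, i1, i0, atoms) and precedes(i1, i2, i0, atoms)
```

Here I0 comes first, and an interpretation that departs from I0 on a larger atom precedes one that departs on a smaller atom, so I0 <_t I0[-p3] <_t I0[-p2].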

Page 19: The Space Efficiency of OSHL

Rules of OSHL

Start with empty sequence

(C1, C2, …, Cn), D minimal ground instance of an input clause that contradicts I, I minimal model of sequence
----------
(C1, C2, …, Cn, D)

(C1, C2, …, Cn, D), Cn “out of order”
----------
(C1, C2, …, Cn-1, D)

(C1, C2, …, Cn, D), max resolution possible
----------
(C1, C2, …, Cn-1, res(Cn, D, L))

Proof if empty clause derived

Page 20: The Space Efficiency of OSHL

Propositional Example (for every atom p, I0 satisfies p)

()

({-p1, -p2, -p3}) I0[-p3]

({-p1, -p2, -p3}, {-p4, -p5, -p6}) I0 [-p3,-p6]

({…}, {…}, {-p7}) I0 [-p3,-p6,-p7]

({…}, {…}, {-p7}, {p3, p7})

({…}, {-p4, -p5, -p6}, {p3})

({-p1, -p2, -p3},{p3})

({-p1, -p2 }) I0 [-p2]
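The trace above can be reproduced with a small propositional sketch of the three rules. Several things here are assumptions for illustration: the input clause set is read off the trace, atoms are ordered p1 < p2 < …, I0 makes every atom true, the selected literal of a clause is its maximal literal, and the contradicting clause is simply the first one found in list order rather than a minimal one.

```python
def selected(clause):
    """Maximal literal of a clause under the atom order p1 < p2 < ..."""
    return max(clause, key=abs)


def oshl_propositional(input_clauses):
    """Simplified propositional OSHL with I0 making every atom true.

    Literals are non-zero ints: k is atom pk, -k its negation.
    Returns (status, final sequence, trace of intermediate sequences).
    """
    seq, trace = [], [()]

    def holds(lit):
        # Current model: I0 with the atom of each selected literal made false.
        flipped = {abs(selected(c)) for c in seq}
        return (abs(lit) in flipped) if lit < 0 else (lit not in flipped)

    while True:
        # Assumption: take the first contradicting clause in list order.
        d = next((c for c in input_clauses
                  if not any(holds(l) for l in c)), None)
        if d is None:
            return "satisfiable", seq, trace     # model satisfies all clauses
        while True:
            trace.append(tuple(seq) + (d,))
            if not d:
                return "unsatisfiable", seq, trace   # empty clause derived
            if not seq or abs(selected(d)) > abs(selected(seq[-1])):
                seq.append(d)                    # D extends the sequence
                break
            cn = seq.pop()
            if abs(selected(cn)) == abs(selected(d)):
                # Maximal literals are complementary: resolve on them.
                d = (cn - {selected(cn)}) | (d - {selected(d)})
            # Otherwise Cn was "out of order" and stays removed.


# Assumed input clauses, read off the trace in the example.
clauses = [frozenset(c) for c in ({-1, -2, -3}, {-4, -5, -6}, {-7}, {3, 7})]
status, final_seq, trace = oshl_propositional(clauses)
```

The first eight entries of `trace` correspond to the eight lines of the example, ending with the sequence ({-p1, -p2}). Continuing past the slide, this particular clause set turns out to be satisfiable, so no empty clause is derived.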

Page 21: The Space Efficiency of OSHL

U Rules

Choose clause instances to match existing literals. Look for a contradiction.

Basic clauses and U clauses
  Basic clauses are used in three rules given
  Sequence can also have U clauses on the end
  U clauses have a selected literal
  In basic clauses the max. lit. is selected
  In U clauses other literals can be selected.
  Significant performance enhancement.

Page 22: The Space Efficiency of OSHL

UR Resolution Example

Given the sequence ({s(a), p(b)}, {t(a), q(b)})

and the clause {p(X), q(X), r(X)}

create the sequence ({s(a), p(b)}, {t(a), q(b)}, {p(b), q(b), r(b)})

X → b
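The instantiation step shared by the UR resolution example above and the filtering and case analysis examples on the following slides can be sketched as one-sided matching of clause literals against selected ground literals. The sketch works on atoms only and leaves literal signs out; the data layout, the function names, and the naming of the three cases by the number of unmatched literals are assumptions inferred from these three examples, not a description of the OSHL-U code.

```python
def match(pattern, ground, subst):
    """Extend subst so that pattern instantiates to ground, else None."""
    pred, args = pattern
    gpred, gargs = ground
    if pred != gpred or len(args) != len(gargs):
        return None
    subst = dict(subst)
    for a, g in zip(args, gargs):
        if a[0].isupper():                    # upper-case names are variables
            if subst.setdefault(a, g) != g:
                return None                   # variable already bound elsewhere
        elif a != g:
            return None                       # constants must coincide
    return subst


def instantiate(clause, subst):
    return [(pred, tuple(subst.get(a, a) for a in args)) for pred, args in clause]


def classify(clause, targets):
    """Instantiate clause against the selected literals and name the case."""
    for lit in clause:
        for t in targets:
            s = match(lit, t, {})
            if s is None:
                continue
            inst = instantiate(clause, s)
            rest = [l for l in inst if l not in targets]   # unmatched literals
            kind = {0: "filtering", 1: "UR resolution"}.get(len(rest), "case analysis")
            return inst, kind
    return None


# Selected literals p(b) and q(b) from the sequence in the examples.
targets = [("p", ("b",)), ("q", ("b",))]
ur = classify([("p", ("X",)), ("q", ("X",)), ("r", ("X",))], targets)
filt = classify([("p", ("X",)), ("q", ("X",))], targets)
case = classify([("q", ("X",)), ("r", ("X",)), ("s", ("X",))], targets)
```

In all three calls the substitution found is X → b; the examples differ only in how many literals of the instance are left unmatched.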

Page 23: The Space Efficiency of OSHL

Filtering Example

Given the sequence ({s(a), p(b)}, {t(a), q(b)})

and the clause {p(X), q(X)}

create the sequence ({s(a), p(b)}, {t(a), q(b)}, {p(b), q(b)})

X → b

Page 24: The Space Efficiency of OSHL

Case Analysis Example

Given the sequence ({s(a), p(b)}, {t(a), q(b)})

and the clause {q(X), r(X), s(X)}

create the sequence ({s(a), p(b)}, {t(a), q(b)}, {q(b), r(b), s(b)})

X → b

Page 25: The Space Efficiency of OSHL

Example Proof Using U Rules

All positive semantics

Clauses:

A1. ¬(X ⊆ Y), ¬(Y ⊆ X), X = Y
A2. ¬(Z ∈ X), ¬(X ⊆ Y), Z ∈ Y
A3. g(X,Y) ∈ X, X ⊆ Y
A4. ¬(g(X,Y) ∈ Y), X ⊆ Y
A5. ¬(Z ∈ X), Z ∈ X ∪ Y
A6. ¬(Z ∈ Y), Z ∈ X ∪ Y
A7. ¬(Z ∈ X ∪ Y), Z ∈ X, Z ∈ Y
T. ¬(A ∪ B = B ∪ A)

Page 26: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)} (T)
2. {¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Case Analysis, A1)
3. {¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A} (UR resolution, A4)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g(…) ∈ B)} (UR resolution, A5)
5. {g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g(…) ∈ A)} (UR resolution, A6)
6. {g(…) ∈ B, g(…) ∈ A, ¬(g(…) ∈ A ∪ B)} (UR resolution, A7)
7. {A ∪ B ⊆ B ∪ A, g(…) ∈ A ∪ B} (Filtering, A3)

Page 27: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)}
2. {¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Case Analysis)
3. {¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A} (UR resolution)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g(…) ∈ B)} (UR resolution)
5. {g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g(…) ∈ A)} (UR resolution)
8. {g(…) ∈ B, g(…) ∈ A, A ∪ B ⊆ B ∪ A} (Resolution of 6. and 7.)

Page 28: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)}
2. {¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Case Analysis)
3. {¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A} (UR resolution)
4. {g(A ∪ B, B ∪ A) ∈ B ∪ A, ¬(g(…) ∈ B)} (UR resolution)
9. {g(A ∪ B, B ∪ A) ∈ B ∪ A, g(…) ∈ B, A ∪ B ⊆ B ∪ A} (Resolution of 8. and 5.)

Page 29: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)}
2. {¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Case Analysis)
3. {¬(g(A ∪ B, B ∪ A) ∈ B ∪ A), A ∪ B ⊆ B ∪ A} (UR resolution)
10. {g(A ∪ B, B ∪ A) ∈ B ∪ A} (Resolution of 9. and 4.)

Page 30: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)}
2. {¬(A ∪ B ⊆ B ∪ A), ¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Case Analysis)
11. {A ∪ B ⊆ B ∪ A} (Resolution of 10. and 3.)

Page 31: The Space Efficiency of OSHL

Example Proof Using U Rules

1. {¬(A ∪ B = B ∪ A)}
12. {¬(B ∪ A ⊆ A ∪ B), A ∪ B = B ∪ A} (Resolution of 11. and 2.)

Now the other half of the proof will be done. Note that there is only one ascending sequence of clauses constructed by OSHL and we are only indicating part of it.

Page 32: The Space Efficiency of OSHL

Implementation Results

Slower implementation speed of OSHL

Uniform strategy versus strategy selection

The choice of Otter

Influence of U rules on an earlier version:
  None: 233 proofs in 30 seconds on TPTP problems
  Using them: 900 proofs in 30 seconds

All results for trivial semantics

Page 33: The Space Efficiency of OSHL

Implementation Results

OSHL has no special data structures.

Implemented in OCaml

No special equality methods

Semantics was implemented but frequently only trivial semantics was used.

Thus significant performance improvements are possible.

Page 34: The Space Efficiency of OSHL

Various Provers

PTTP solved 999 of 2200 tested problems.

Otter proved 1595.

leanCoP proved 745.

Source: Jens Otten and Wolfgang Bibel. leanCoP: Lean Connection-Based Theorem Proving. Journal of Symbolic Computation, Volume 36, pages 139-161. Elsevier Science, 2003.

Vampire 6.0: 3286 refutations of 7267 problems, more solved

Page 35: The Space Efficiency of OSHL

Total Number of Proofs

              | # Otter Proofs                | # OSHL-U Proofs
              |             Non-Horn          |             Non-Horn
     # Probs  | All   Horn  All   R=0   R>0   | All   Horn  All   R=0   R>0
All  4417     | 1697  764   933   636   297   | 1027  311   716   451   265
FLD  143      | 28    0     28    17    11    | 68    0     68    21    47
SET  604      | 168   2     166   126   40    | 211   2     209   114   97

R denotes the TPTP difficulty rating

30 second time limit on each problem with each prover

Page 36: The Space Efficiency of OSHL

Implementation Results

Shows that a prover working entirely at the ground level can come into the range of performance of a respectable resolution theorem prover.

DCTP and FDPLL probably perform better than OSHL.

DCTP and FDPLL do not work entirely at the ground level and do not use natural semantics.

Page 37: The Space Efficiency of OSHL

Search space

          All   Horn  Non-Horn  R=0   R>0   Non-Horn, R>0
Otter     708   90    618       357   351   348
OSHL-U    104   39    65        78    26    26

Number of clauses generated (in 1,000s) computed on 827 problems that were proved by both provers

               All    Horn   Non-Horn  R=0    R>0    Non-Horn, R>0
OSHL-U/Otter   0.147  0.433  0.105     0.218  0.075  0.075

Ratio of number of clauses generated
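The ratios can be recomputed from the clause counts on this slide as a quick consistency check. The dictionary layout below is only a convenience for the calculation; the numbers are the rounded thousands shown in the table, so a recomputed ratio can differ from the printed one in the last digit.

```python
# Clauses generated, in thousands, as printed in the search space table.
otter = {"All": 708, "Horn": 90, "Non-Horn": 618,
         "R=0": 357, "R>0": 351, "Non-Horn, R>0": 348}
oshl_u = {"All": 104, "Horn": 39, "Non-Horn": 65,
          "R=0": 78, "R>0": 26, "Non-Horn, R>0": 26}

# OSHL-U / Otter for every column, rounded to three decimals.
ratios = {col: round(oshl_u[col] / otter[col], 3) for col in otter}

for col, value in ratios.items():
    print(f"{col:15s} {value:.3f}")
```

On these figures OSHL-U generates between roughly 7% and 43% as many clauses as Otter, with the largest savings on the harder (R>0) problems.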

Page 38: The Space Efficiency of OSHL

Storage space

          All   Horn  Non-Horn  R=0   R>0   Non-Horn, R>0
Otter     423   81    342       230   193   192
OSHL-U    91    37    55        67    25    25

Max. number of clauses stored (in 1,000s) computed on 827 problems that were proved by both provers

               All    Horn   Non-Horn  R=0    R>0    Non-Horn, R>0
OSHL-U/Otter   0.215  0.457  0.161     0.291  0.130  0.130

Ratio of number of clauses stored

Page 39: The Space Efficiency of OSHL

Implementation Results

In a given number of inferences OSHL finds more proofs than Otter for non-Horn problems