
Logic Programming and ILP


Page 1: Logic Programming and ILP

Logic Programming LispNYC June 9th, 2015

Pierre de Lacaze

Shareablee [email protected]

Page 2: Logic Programming and ILP

The Logic Programming Model

• Logic Programming is an abstract model of computation.

• Lambda Calculus is another abstract model of computation.

• Prolog is a particular implementation of the logic programming model, in much the same way that Clojure and Haskell are particular implementations of the lambda calculus.

• OPS5 is another implementation of the logic programming model.

• The use of mathematical logic to represent and execute computer programs is also a feature of the lambda calculus.

• Prolog is classified as a logic programming language (Wikipedia).

Page 3: Logic Programming and ILP

Prolog Introduction

• Prolog is a declarative language

• Prolog is a logic programming language

• Invented in 1972 by Colmerauer & Roussel – Edinburgh Prolog

– Marseilles Prolog

• Initially used for Natural Language Processing

• Programs consist of facts & rules

• A fact is a clause in FOPC (first-order predicate calculus)

• A rule is an inference: B ← A1, …, An

• Use queries to run programs and perform retrievals

Page 4: Logic Programming and ILP

Logic Programming Paradigm

• A program is a logical description of your problem from which a solution is logically derivable

• The execution of a program is very much like the mathematical proof of a theorem

• Where’s my program?

– N! is (N-1)! times N

– 0! is 1
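Those two statements are the whole program: in a logic language the description and the implementation coincide. The same recursion, sketched in Python for comparison:

```python
def factorial(n):
    # 0! is 1 (the base fact)
    if n == 0:
        return 1
    # N! is (N-1)! times N (the rule)
    return factorial(n - 1) * n

print(factorial(5))  # → 120
```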

Page 5: Logic Programming and ILP

Horn Clauses

• A Horn clause is a disjunction of literals with at most one positive literal

• In mathematical logic and logic programming, a Horn clause is a logical formula of a particular rule-like form which gives it useful properties for use in logic programming, formal specification, and model theory.

• Horn clauses are named for the logician Alfred Horn (1951)

• (u ← p ∧ q ∧ ... ∧ t) is equivalent to (u ∨ ¬p ∨ ¬q ∨ ... ∨ ¬t)

• In the non-propositional case, all variables in a clause are implicitly universally quantified with scope the entire clause. Thus, for example:

1. ¬human(X) ∨ mortal(X), which stands for:

2. ∀X( ¬human(X) ∨ mortal(X) ), which is logically equivalent to:

3. ∀X ( human(X) → mortal(X) )
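The equivalence between the rule form and the clause form can be checked mechanically by enumerating truth assignments; a small Python sketch for the two-premise case u ← p ∧ q:

```python
from itertools import product

# u ← p ∧ q, i.e. (p ∧ q) → u, versus the clause u ∨ ¬p ∨ ¬q
for u, p, q in product([False, True], repeat=3):
    rule_form = (not (p and q)) or u
    clause_form = u or (not p) or (not q)
    assert rule_form == clause_form
print("equivalent on all 8 assignments")
```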

Page 6: Logic Programming and ILP

Warren Abstract Machine

• In 1983, David H. D. Warren designed an abstract machine for the execution of Prolog consisting of a memory architecture and an instruction set.

• This design became known as the Warren Abstract Machine (WAM) and has become the de facto standard target for Prolog compilers.

• Prolog code is reasonably easy to translate to WAM instructions which can be more efficiently interpreted.

• Also, subsequent code improvements and compilation to native code are often easier to perform on the more low-level representation.

• In order to write efficient Prolog programs, a basic understanding of how the WAM works can be advantageous.

• Some of the most important WAM concepts are first argument indexing and its relation to choice points, tail call optimization, and memory reclamation on failure.

• http://en.wikipedia.org/wiki/Warren_Abstract_Machine

Page 7: Logic Programming and ILP

Prolog Facts

• Facts: <predicate>(<arg1>,…,<argN>)

• Example: likes(mary, john)

• Constants (atoms) must start with a lowercase letter

• Variables must start with an uppercase letter or an underscore.

• Example: eats(mikey, X)

• Example: believes(peter, likes(mary, john))

Page 8: Logic Programming and ILP

Basic Inferences & Variables

likes(john, cheese).
likes(mary, cheese).
likes(bob, meat).

similar(X, Y) :- likes(X, Z), likes(Y, Z).

Note: You can use ['<filename/pathname>']. to compile and load files.

GNU Prolog 1.4.1
By Daniel Diaz
Copyright (C) 1999-2012 Daniel Diaz
| ?- ['C:\\Projects\\Languages\\Prolog\\similar.pl'].
compiling C:/Projects/Languages/code/Prolog/similar.pl for byte code...
C:/Projects/Languages/code/Prolog/similar.pl compiled, 4 lines read - 935 bytes written, 16 ms

yes

Page 9: Logic Programming and ILP

Filling in the Blanks

| ?- similar(john, mary).

yes

| ?- similar(john, bob).

no

| ?- similar(mary, X).

X = john ?

yes

| ?- similar(X, Y).
X = john
Y = john ? ;
X = john
Y = mary ? ;
X = mary
Y = john ? ;
X = mary
Y = mary ?

Note: you can type ; to get the next answer, or a to get all answers

Page 10: Logic Programming and ILP

Unification

• The Unification Algorithm is a famous algorithm from the field of AI, often used in theorem proving, game playing, planning, etc…

• It can loosely be thought of as an algorithm that tries to make two non-ground terms the same.

• P(X, 2) = P(1, Y) ⇒ X=1 & Y=2 ⇒ P(1, 2)

• P(X, X) = P(Y, 5) ⇒ X=5 & Y=5 ⇒ P(5, 5)

• P(X, Y) = P(2, Z) ⇒ X=2 & Y=Z ⇒ P(2, Z)

• See Artificial Intelligence (Russell & Norvig)

Page 11: Logic Programming and ILP

Prolog Rules

• Rules: <head> :- <body>

• Head: Single clause typically with variables

• Body: Conjunction of goals with variables

• Examples:

ancestor(X,Y) :- parent(X,Y)

ancestor(X,Y) :- parent(X,Z), parent(Z,Y)

ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y)

Page 12: Logic Programming and ILP

A Recursive Example (1)

parent(p1, p2).
parent(p2, p3).
parent(p3, p4).

ancestor(X, Y) :- parent(X, Y).
ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).

| ?- ['c:\\projects\\languages\\code\\prolog\\ancestor.pl'].
compiling c:/projects/languages/code/prolog/ancestor.pl for byte code...
c:/projects/languages/code/prolog/ancestor.pl compiled, 5 lines read - 818 bytes written, 13 ms

yes

Page 13: Logic Programming and ILP

A Recursive Example (2)

| ?- trace.
The debugger will first creep -- showing everything (trace)
yes
{trace}
| ?- ancestor(p1, p4).
1 1 Call: ancestor(p1,p4) ?
2 2 Call: parent(p1,p4) ?
2 2 Fail: parent(p1,p4) ?
2 2 Call: parent(p1,_80) ?
2 2 Exit: parent(p1,p2) ?
3 2 Call: ancestor(p2,p4) ?
4 3 Call: parent(p2,p4) ?
4 3 Fail: parent(p2,p4) ?
4 3 Call: parent(p2,_129) ?
4 3 Exit: parent(p2,p3) ?
5 3 Call: ancestor(p3,p4) ?
6 4 Call: parent(p3,p4) ?
6 4 Exit: parent(p3,p4) ?
5 3 Exit: ancestor(p3,p4) ?
3 2 Exit: ancestor(p2,p4) ?
1 1 Exit: ancestor(p1,p4) ?
true ? (63 ms)
yes

Page 14: Logic Programming and ILP

Using Rules in Both Directions

% Find all ancestors of p4

| ?- ancestor(X, p4).

X = p3 ? ;

X = p1 ? ;

X = p2 ? ;

no

% Find all descendants of p1

| ?- ancestor(p1, X).

X = p2 ? ;

X = p3 ? ;

X = p4 ? ;

no

Page 15: Logic Programming and ILP

Accessing Elements of a List

| ?- [1, 2, 3] = [X | Y].

X = 1

Y = [2,3]

| ?- [1, 2, 3] = [_, X | Y].

X = 2

Y = [3]

Page 16: Logic Programming and ILP

Lists and Math (1)

count(0, []).
count(Count, [_|Tail]) :-
    count(TailCount, Tail),
    Count is TailCount + 1.

sum(0, []).
sum(Total, [Head|Tail]) :-
    sum(Sum, Tail),
    Total is Head + Sum.

average(Average, List) :-
    sum(Sum, List),
    count(Count, List),
    Average is Sum / Count.

Page 17: Logic Programming and ILP

Solving Sudoku (1)

valid([]).

valid([Head|Tail]) :-

valid(Tail), fd_all_different(Head).

sudoku(S11, S12, S13, S14,

S21, S22, S23, S24,

S31, S32, S33, S34,

S41, S42, S43, S44,

Board) :-

Board = [S11, S12, S13, S14,

S21, S22, S23, S24,

S31, S32, S33, S34,

S41, S42, S43, S44],

fd_domain(Board, 1, 4),

Row1 = [S11, S12, S13, S14],

Row2 = [S21, S22, S23, S24],

Row3 = [S31, S32, S33, S34],

Row4 = [S41, S42, S43, S44],

Col1 = [S11, S21, S31, S41],

Col2 = [S12, S22, S32, S42],

Col3 = [S13, S23, S33, S43],

Col4 = [S14, S24, S34, S44],

Square1 = [S11, S12, S21, S22],

Square2 = [S13, S14, S23, S24],

Square3 = [S31, S32, S41, S42],

Square4 = [S33, S34, S43, S44],

valid([Row1, Row2, Row3, Row4]),

valid([Col1, Col2, Col3, Col4 ]),

valid([Square1, Square2, Square3, Square4]).

Page 18: Logic Programming and ILP

Solving Sudoku (2)

| ?- sudoku(_, _, 2, 3,

_, _, _, _,

_, _ ,_, _,

3, 4, _, _,

Solution).

Solution = [4, 1, 2, 3,

2, 3, 4, 1,

1, 2, 3, 4,

3, 4, 1, 2]
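The returned board can be verified independently: every row, column, and 2×2 square must be a permutation of 1–4. A Python check of the solution above:

```python
solution = [4, 1, 2, 3,
            2, 3, 4, 1,
            1, 2, 3, 4,
            3, 4, 1, 2]

rows    = [solution[i*4:(i+1)*4] for i in range(4)]
cols    = [solution[i::4] for i in range(4)]
squares = [[solution[r*4 + c] for r in (br, br+1) for c in (bc, bc+1)]
           for br in (0, 2) for bc in (0, 2)]

# every group is a permutation of 1..4
assert all(sorted(g) == [1, 2, 3, 4] for g in rows + cols + squares)
print("valid 4x4 Sudoku")
```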

Page 19: Logic Programming and ILP

Solving Sudoku (3)

• Finite Domain variables: A new type of data is introduced: FD variables which can only take values in their domains. The initial domain of an FD variable is 0..fd_max_integer where fd_max_integer represents the greatest value that any FD variable can take.

• fd_domain(Board, 1, 4): used to constrain each Sudoku cell to a value in the range 1–4.

• fd_all_different(X): used to specify that all elements in the list X must take distinct values.

Page 20: Logic Programming and ILP

Structure Inspection

| ?- functor(father(tom, harry), P, A).

A = 2

P = father

yes

| ?- arg(1,father(tom, harry), A1).

A1 = tom

yes

| ?- arg(2,father(tom, harry), A2).

A2 = harry

yes

| ?- functor(X, father, 2).

X = father(_, _)

yes

| ?- father(tom, harry) =.. [X, Y, Z].
X = father
Y = tom
Z = harry
yes
| ?- X =.. [father, tom, harry].
X = father(tom, harry)
yes
| ?- X =.. [father, tom, harry], assertz(X).
X = father(tom, harry)
yes
| ?- father(tom, harry).
yes

functor, arg and =..

Page 21: Logic Programming and ILP

Meta-Logical Predicates

• Outside the scope of first-order logic
• Query and affect the state of the proof
• Treat variables as objects
• Convert data structures to goals

• Type predicates: var(<term>), nonvar(<term>)
• Variables as objects: freeze & melt
• Dynamically affecting the knowledge base: assert(<goal>), retract(<goal>)
• The meta-variable facility: call(<goal>)
• Memoization: lemma(<goal>)

Page 22: Logic Programming and ILP

OPS5 • OPS5 is a rule-based or production system computer language, notable as the

first such language to be used in a successful expert system, the R1/XCON system used to configure VAX computers.

• The OPS family was developed in the late 1970s by Charles Forgy while at Carnegie Mellon University.

• Allen Newell's research group in artificial intelligence had been working on production systems for some time.

• Forgy's implementation, based on his Rete algorithm, was especially efficient, sufficiently so that it was possible to scale up to larger problems involving hundreds or thousands of rules.

• OPS5 uses a forward chaining inference engine.

• Programs execute by scanning "working memory elements" (which are vaguely object-like, with classes and attributes) looking for matches with the rules in "production memory".

• Rules have actions that may modify or remove the matched element, create new ones, perform side effects such as output, and so forth. Execution continues until no more matches can be found.

Page 23: Logic Programming and ILP

miniKanren

• miniKanren is an embedded Domain Specific Language for logic programming.

• miniKanren is a simplified version of KANREN.

• First introduced in The Reasoned Schemer by Daniel P. Friedman, William E. Byrd and Oleg Kiselyov (MIT Press, 2005).

• KANREN is a declarative logic programming system with first-class relations, embedded in a pure functional subset of Scheme.

• KANREN has a set-theoretical semantics, true unions, fair scheduling, first-class relations, lexically-scoped logical variables, depth-first and iterative deepening strategies. The system achieves high performance and expressivity without cuts.

• The core miniKanren language is very simple, with only three logical operators and one interface operator.

• miniKanren has been implemented in a growing number of host languages, including Scheme, Racket, Clojure, Haskell, Python, JavaScript, Scala, Ruby, OCaml, and PHP, among many other languages.

• miniKanren is designed to be easily modified and extended; extensions include Constraint Logic Programming, probabilistic logic programming, nominal logic programming, and tabling.

• http://minikanren.org/

Page 24: Logic Programming and ILP

Core.Logic

• Core.logic is a Clojure-based implementation of miniKanren written by David Nolen.

• https://github.com/clojure/core.logic

• Core.logic supports the following logic programming paradigms:

– CLP: Constraint Logic Programming

– CLP(FD): Constraint Logic Programming over finite domains

– Tabling: Certain kinds of logic programs that would not terminate in Prolog will terminate in core.logic if you create a tabled goal.

– Nominal Logic Programming: Nominal logic programming makes it easier to write programs that must reason about binding and scope.

Page 25: Logic Programming and ILP

Unification

• Original algorithm: Robinson (1965)

• Efficient algorithm: Martelli & Montanari (1982)

• Intuition: make two terms the same

• Input: two terms

• Output: a set of bindings (aka substitutions)

• Example: P(X, 2) = P(1, Y) yields {X = 1, Y = 2}

Page 26: Logic Programming and ILP

Unification Algorithm (from Wikipedia)

• A variable which is uninstantiated can be unified with an atom, a term, or another uninstantiated variable, thus effectively becoming its alias.

• In many modern Prolog dialects and in first-order logic, a variable cannot be unified with a term that contains it; this is the so-called occurs check.

• Two atoms can only be unified if they are identical.

• Similarly, a term can be unified with another term if the top function symbols and arities of the terms are identical and if the parameters can be unified simultaneously. Note that this is a recursive behavior.
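These rules translate almost line for line into code. A minimal Python sketch, representing compound terms as tuples whose first element is the functor, variables as capitalized strings, and atoms as lowercase strings (the helper names are illustrative):

```python
def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, subst):
    # follow variable bindings to their current value
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    # occurs check: does variable v appear inside term t?
    t = walk(t, subst)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, subst) for a in t[1:])
    return False

def unify(a, b, subst=None):
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if a == b:                           # identical atoms or same variable
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):   # unify the parameters simultaneously
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                          # clash: different atoms or functors

print(unify(('p', 'X', 2), ('p', 1, 'Y')))  # → {'X': 1, 'Y': 2}
```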

Page 27: Logic Programming and ILP

Unification Applications: Type Inferencing (from Wikipedia)

• Unification is used during type inference, for instance in the functional programming language Haskell.

• Used for both type inferencing and type error detection.
• The Haskell expression 1:['a','b','c'] is not correctly typed:

– the list construction function ":" is of type a -> [a] -> [a]
– for the first argument "1", the polymorphic type variable "a" has to denote the type Int
– "['a','b','c']" is of type [Char]
– "a" cannot be both Char and Int at the same time.

• Unification for type inferencing:

– Any type variable unifies with any type expression, and is instantiated to that expression. A specific theory might restrict this rule with an occurs check.
– Two type constants unify only if they are the same type.
– Two type constructions unify only if they are applications of the same type constructor and all of their component types recursively unify.
– Due to its declarative nature, the order in a sequence of unifications is (usually) unimportant.

• Algorithm W: Hindley-Milner Type Inferencing. Unification + Constraint Satisfaction

Page 28: Logic Programming and ILP

Example Type Inferencing in Haskell

GHCi, version 7.4.2: http://www.haskell.org/ghc/ :? for help
Loading package ghc-prim ... linking ... done.
Loading package integer-gmp ... linking ... done.
Loading package base ... linking ... done.
ghci> 1:[2,3,4]
[1,2,3,4]
ghci> 1:['a', 'b', 'c']
<interactive>:3:1:
No instance for (Num Char) arising from the literal `1'
Possible fix: add an instance declaration for (Num Char)
In the first argument of `(:)', namely `1'
In the expression: 1 : ['a', 'b', 'c']
In an equation for `it': it = 1 : ['a', 'b', 'c']

The suggested fix would be:

instance Num Char where fromInteger x = chr (fromIntegral x)

That would cast the '1' into the character with the ASCII code 1. This is of course a terrible idea :-)

Note: Can't declare a union type to do exactly what you want, as in Haskell we have tagged unions, not untagged ones. Tagged union: a sum type, which corresponds to intuitionistic logical disjunction under the Curry–Howard correspondence.

Page 29: Logic Programming and ILP

Origins of Theorem Proving

• Roots of formalized logic go back to Aristotle.

• Frege's Begriffsschrift (1879) introduced both a complete propositional calculus and what is essentially modern predicate logic.

• His Foundations of Arithmetic, published in 1884, expressed (parts of) mathematics in formal logic.

• This approach was continued by Russell and Whitehead in their influential Principia Mathematica, first published 1910–1913, and with a revised second edition in 1927.

• Russell and Whitehead thought they could derive all mathematical truth using axioms and inference rules of formal logic, in principle opening up the process to automation.

• In 1920, Thoralf Skolem simplified a previous result by Leopold Löwenheim, leading to the Löwenheim–Skolem theorem

• In 1930 this line of work led to the notion of a Herbrand universe and a Herbrand interpretation, which allowed (un)satisfiability of first-order formulas (and hence the validity of a theorem) to be reduced to (potentially infinitely many) propositional satisfiability problems.

Page 30: Logic Programming and ILP

Resolution Theorem Proving (from Wikipedia)

• In mathematical logic and automated theorem proving, resolution is a rule of inference leading to a refutation theorem-proving technique for sentences in propositional logic and first-order logic.

• In other words, iteratively applying the resolution rule in a suitable way allows for telling whether a propositional formula is satisfiable and for proving that a first-order formula is unsatisfiable.

• Attempting to prove a satisfiable first-order formula as unsatisfiable may result in a nonterminating computation; this problem doesn't occur in propositional logic.

• The resolution technique uses proof by contradiction and is based on the fact that any sentence in propositional logic can be transformed into an equivalent sentence in conjunctive normal form.

• In Boolean logic, a formula is in conjunctive normal form (CNF) or clausal normal form if it is a conjunction of clauses, where a clause is a disjunction of literals; otherwise put, it is an AND of ORs. As a normal form, it is useful in automated theorem proving. It is similar to the product of sums form used in circuit theory

Page 31: Logic Programming and ILP

Unification based Theorem Proving (from Wikipedia)

• One approach: Proofs by resolution refutation.

• Resolution Rule: Elimination of complementary literals

– e.g. (a V ¬b) and (c V b) produce (a V c)

• Modus ponens can be seen as a special case of resolution of a one-literal clause and a two-literal clause.

• The resolution rule can be traced back to Davis and Putnam (1960); however, their algorithm required trying all ground instances of the given formula.

• This source of combinatorial explosion was eliminated in 1965 by John Alan Robinson's syntactical unification algorithm, which allowed one to instantiate the formula during the proof "on demand", just as far as needed to preserve refutation completeness.
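A single resolution step is easy to state over clauses represented as sets of literal strings, writing negation as a leading "~" (function names are illustrative):

```python
def negate(lit):
    return lit[1:] if lit.startswith('~') else '~' + lit

def resolve(c1, c2):
    """Return every resolvent obtainable from clauses c1 and c2."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:  # complementary pair found
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# (a ∨ ¬b) and (b ∨ c) resolve on b, producing the single resolvent (a ∨ c)
print(resolve({'a', '~b'}, {'b', 'c'}))
```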

Page 32: Logic Programming and ILP

Skolem Normal Form (from Wikipedia)

• In mathematical logic, reduction to Skolem normal form (SNF) is a method for removing existential quantifiers from formal logic statements, often performed as the first step in an automated theorem prover.

• Skolemization works by applying a second-order equivalence in conjunction with the definition of first-order satisfiability. The equivalence provides a way for "moving" an existential quantifier before a universal one.

• Intuitively, the sentence "for every x there exists a y such that R(x, y)" is converted into the equivalent form "there exists a function f mapping every x into a y such that, for every x, R(x, f(x)) holds."

• Thoralf Skolem (1887–1963) was a Norwegian mathematician.

Page 33: Logic Programming and ILP

First Order Logic to Normal Form (from Artificial Intelligence, Russell & Norvig, 1995)

• Eliminate implication

a ⇒ b becomes ¬a V b

• Move ¬ Inside

¬(a V b) becomes (¬a ∧ ¬b)

• Standardize variables

(∃x p(x)) V (∀x (g(x)) becomes (∃x1 p(x1)) V (∀x2 (g(x2))

• Move quantifiers left

p V ∀x q becomes ∀x (p V q)

• Skolemize

∀x person(x) => ∃y heart(y) ∧ has(x, y) becomes

∀x person(x) => heart(F(x)) ∧ has(x, F(x))

• Distribute ∧ over V

(a ∧ b) V c becomes (a V c) ∧ (b V c)

• Flatten nested conjunctions and disjunctions

(a V b) V c becomes (a V b V c)

• Convert disjunctions to implications

(¬a V ¬b V c V d) becomes (a ∧ b) => (c V d)

• See Russell & Norvig, Chapter 9, for a resolution refutation proof that first converts to normal form.

Page 34: Logic Programming and ILP

Inductive Logic Programming

• Inductive logic programming (ILP) is a subfield of machine learning

• Uses a logic programming representation uniformly for:

– hypotheses

– examples

– background knowledge

• Input: an encoding of the known background knowledge and a set of examples represented as a logical database of facts

• Output: a hypothesized logic program which entails all the positive examples and none of the negative examples.

Page 35: Logic Programming and ILP

Decision Tree Learning

• Quinlan, J. R., (1986). Induction of Decision Trees.

• A tree can be "learned" by splitting the source set into subsets based on an attribute value test, then repeating the process recursively on each subset.

• This process of top-down induction of decision trees (TDIDT) is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data.

• Partitioning is based on attribute gain, as measured by the entropy of an attribute.

Page 36: Logic Programming and ILP

The ID3 Algorithm (from Wikipedia)

• Calculate the entropy of every attribute using the training set

• Split the set into subsets using the attribute for which entropy is minimum (maximum information gain)

• Make a decision tree node containing that attribute

• Recurse on the subsets using the remaining attributes.

• The C4.5 algorithm (Quinlan) extends ID3:

– Handling both continuous and discrete attributes

– Handling training data with missing attribute values

– Handling attributes with differing costs

– Pruning trees after creation

Page 37: Logic Programming and ILP

Play Tennis Training Data (Tom Mitchell, Machine Learning, Chapter 3)

Outlook Temperature Humidity Wind Play Tennis

Sunny Hot High Weak No

Sunny Hot High Strong No

Overcast Hot High Weak Yes

Rain Mild High Weak Yes

Rain Cool Normal Weak Yes

Rain Cool Normal Strong No

Overcast Cool Normal Strong Yes

Sunny Mild High Weak No

Sunny Cool Normal Weak Yes

Rain Mild Normal Weak Yes

Sunny Mild Normal Strong Yes

Overcast Mild High Strong Yes

Overcast Hot Normal Weak Yes

Rain Mild High Strong No
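The split ID3 chooses first can be reproduced directly from this table. A Python sketch computing the information gain of Outlook; the result matches the 0.24674982 figure in the run shown on the next slide:

```python
from collections import Counter
from math import log2

# (Outlook, PlayTennis) pairs, one per row of the table above
data = [("Sunny", "No"), ("Sunny", "No"), ("Overcast", "Yes"), ("Rain", "Yes"),
        ("Rain", "Yes"), ("Rain", "No"), ("Overcast", "Yes"), ("Sunny", "No"),
        ("Sunny", "Yes"), ("Rain", "Yes"), ("Sunny", "Yes"), ("Overcast", "Yes"),
        ("Overcast", "Yes"), ("Rain", "No")]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(data):
    labels = [y for _, y in data]
    total = entropy(labels)           # entropy of the full set (≈ 0.940)
    for v in {x for x, _ in data}:
        subset = [y for x, y in data if x == v]
        total -= len(subset) / len(data) * entropy(subset)
    return total

print(round(gain(data), 4))  # → 0.2467
```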

Page 38: Logic Programming and ILP

Play Tennis Example

ML(15): (defvar dt (id3 'play-tennis a e))

Attribute-gains ((OUTLOOK 0.24674982) (HUMIDITY 0.1518355)

(WIND 0.048126996) (TEMPERATURE 0.029222548))

Attribute: OUTLOOK = RAIN,

Attributes (WIND HUMIDITY TEMPERATURE)

Attribute-gains ((WIND 0.9709506) (HUMIDITY 0.01997304)

(TEMPERATURE 0.01997304))

Attribute: WIND = WEAK,

Attributes (HUMIDITY TEMPERATURE)

Attribute: WIND = STRONG,

Attributes (HUMIDITY TEMPERATURE)

Attribute: OUTLOOK = OVERCAST,

Attributes (WIND HUMIDITY TEMPERATURE)

Attribute: OUTLOOK = SUNNY,

Attributes (WIND HUMIDITY TEMPERATURE)

Attribute-gains ((HUMIDITY 0.9709506) (TEMPERATURE 0.5709506)

(WIND 0.01997304))

Attribute: HUMIDITY = HIGH,

Attributes (WIND TEMPERATURE)

Attribute: HUMIDITY = NORMAL,

Attributes (WIND TEMPERATURE)

DT

ML(16): (util::print-tree dt)
OUTLOOK
  rain     WIND
             weak   +
             strong -
  overcast +
  sunny    HUMIDITY
             high   -
             normal +

Page 39: Logic Programming and ILP

Decision Trees as Prolog Programs

play_tennis :- outlook(rain), wind(weak).

play_tennis :- outlook(overcast).

play_tennis :- outlook(sunny), humidity(normal).

OUTLOOK
  rain     WIND
             weak +
  overcast +
  sunny    HUMIDITY
             normal +

Take each positive branch of the decision tree and add it as a Prolog rule for the target attribute.

Page 40: Logic Programming and ILP

Sequential Covering Algorithms

Sequential-covering (target-attribute, attributes, examples, threshold)

learned-rules ← {}

rule ← learn-one-rule (target-attribute, attributes, examples)

while performance (rule, examples) > threshold

    learned-rules ← learned-rules + rule

    examples ← examples – (examples correctly classified by rule)

    rule ← learn-one-rule (target-attribute, attributes, examples)

learned-rules ← sort learned-rules based on performance

return (learned-rules)

Performance (h, target-attribute, examples)

    entropy (subset of examples that match h wrt target-attribute)
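A toy rendering of this loop, sketched in Python with single-condition rules and covered-positive accuracy standing in for the entropy-based performance measure (dataset and names are illustrative):

```python
def learn_one_rule(attrs, examples, target):
    """Greedy: pick the (attribute = value) test whose covered examples
    are most often positive -- a stand-in for the entropy measure."""
    best, best_score = None, -1.0
    for a in attrs:
        for v in {ex[a] for ex in examples}:
            covered = [ex for ex in examples if ex[a] == v]
            score = sum(ex[target] for ex in covered) / len(covered)
            if score > best_score:
                best, best_score = (a, v), score
    return best, best_score

def sequential_covering(attrs, examples, target, threshold=0.9):
    learned, remaining = [], list(examples)
    while remaining:
        rule, score = learn_one_rule(attrs, remaining, target)
        if score <= threshold:
            break
        learned.append(rule)
        a, v = rule
        # drop the examples the new rule classifies correctly
        remaining = [ex for ex in remaining if not (ex[a] == v and ex[target])]
    return learned

examples = [
    {"outlook": "overcast", "humidity": "high",   "play": 1},
    {"outlook": "overcast", "humidity": "normal", "play": 1},
    {"outlook": "sunny",    "humidity": "normal", "play": 1},
    {"outlook": "sunny",    "humidity": "high",   "play": 0},
    {"outlook": "rain",     "humidity": "high",   "play": 0},
]
print(sequential_covering(["outlook", "humidity"], examples, "play"))
```

On this toy data the loop learns outlook = overcast first, then humidity = normal, at which point no remaining rule clears the threshold.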

Page 41: Logic Programming and ILP

Learn-one-Rule¹

Learn-one-Rule (target-attribute, attributes, examples)

Initialize best-hypothesis to {} and candidate-hypotheses ← {best-hypothesis}

While candidate-hypotheses is not empty:

1. Generate the next most specific candidate-hypotheses:
   all-constraints ← all constraints of the form (a = v) in examples
   new-candidate-hypotheses ← for each h in candidate-hypotheses and each c in all-constraints,
   create a specialization of h by adding c to it;
   remove from new-candidate-hypotheses the duplicates, inconsistent hypotheses, and non-maximally specific ones

2. Update best-hypothesis: for all h in new-candidate-hypotheses,
   if performance (h, examples, target-attribute) > performance (best-hypothesis, examples, target-attribute)
   then best-hypothesis ← h

3. Update candidate-hypotheses: candidate-hypotheses ← best k hypotheses according to the performance metric

Return a rule of the form "if <best-hypothesis> then <prediction>" where prediction is the most frequently occurring value of the target attribute amongst the covered examples.

¹Based on the CN2 algorithm by Clark & Niblett (1989)

Page 42: Logic Programming and ILP

Learning Horn Clauses: FOIL

FOIL (target-predicate, predicates, examples)

pos ← those examples for which target-predicate is true

neg ← those examples for which target-predicate is false

learned-rules ← {}

while pos is not empty do

    learn a new rule:

    new-rule ← rule that predicts target-predicate with no preconditions

    new-rule-neg ← neg

    while new-rule-neg do

        add a new literal to specialize new-rule:

        candidate-literals ← new literal candidates based on predicates

        best-literal ← argmax (FOIL-Gain (literal, new-rule)) over candidate-literals

        add best-literal to the preconditions of new-rule

        new-rule-neg ← subset of new-rule-neg which satisfy new-rule's preconditions

    learned-rules ← learned-rules + new-rule

    pos ← pos – (members of pos covered by new-rule)

return learned-rules

Page 43: Logic Programming and ILP

FOIL Information Gain Criteria

Gain(R0, R1) := t * ( log2(p1/(p1+n1)) - log2(p0/(p0+n0)) )

• R0 denotes a rule before adding a new literal.

• R1 is an extension of R0.

• p0 denotes the number of positive examples covered by R0.

• p1 denotes the number of positive examples covered by R1.

• n0 and n1 are the numbers of negative examples covered by the corresponding rules.

• t is the number of positive examples covered by both R0 and R1.

• http://www-ai.cs.uni-dortmund.de/kdnet/auto?self=$81d91e8ddbd8094353
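The formula translates directly into code; a Python sketch with hypothetical example counts (a specialization keeping 3 of 4 positives while shedding 3 of 4 negatives):

```python
from math import log2

def foil_gain(p0, n0, p1, n1, t):
    """FOIL gain for specializing rule R0 (covering p0/n0) into R1 (p1/n1);
    t is the number of positive examples covered by both rules."""
    return t * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

print(round(foil_gain(p0=4, n0=4, p1=3, n1=1, t=3), 3))  # → 1.755
```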

Page 44: Logic Programming and ILP

Prolog Related References

• http://www.cuceinetwork.net/archivos/prolog/The_Art_of_Prolog.pdf

• http://www.gprolog.org/

• http://www.swi-prolog.org/

• http://en.wikipedia.org/wiki/OPS5

• https://mitpress.mit.edu/index.php?q=books/reasoned-schemer

• http://minikanren.org/

• https://github.com/clojure/core.logic

• https://github.com/clojure/core.logic/wiki/A-Core.logic-Primer

• https://github.com/swannodette/logic-tutorial

Page 45: Logic Programming and ILP

Unification Related References

• http://en.wikipedia.org/wiki/Unification_(computer_science)

• http://en.wikipedia.org/wiki/Automated_theorem_proving

• https://wiki.haskell.org/Type_inference