Machine Learning Chapter 2. Concept Learning and The General-to-specific Ordering Gun Ho Lee Soongsil University, Seoul

– Learning from examples
– General-to-specific ordering over hypotheses
– Version spaces and the candidate elimination algorithm
– Picking new examples
– The need for inductive bias

Note: a simple approach that assumes no noise; it illustrates the key concepts.

A Concept

Examples of concepts
– “birds”, “cars”, “situations in which I should study more in order to pass the exam”

Concept
– Some subset of objects or events defined over a larger set, or
– A boolean-valued function defined over this larger set.
– The concept “birds” is the subset of animals that constitute birds.

A Concept

Let there be a set of objects, X.

X = { 백구 , 야옹이 , 도그 , 강세이 }

A concept C is…

– A subset of X

C = dogs = { 백구 , 도그 , 강세이 }

– A function that returns 1 only for elements in the concept

C( 백구 ) = 1, C( 야옹이 ) = 0


Instance Representation

Represent an object (or instance) as an n-tuple of attributes

Example: Days (6-tuples)


Instance  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1         Sunny  Warm     Normal    Strong  Warm   Same      Yes
2         Sunny  Warm     High      Strong  Warm   Same      Yes
3         Rainy  Cold     High      Strong  Warm   Change    No
4         Sunny  Warm     High      Strong  Cool   Change    Yes

Concept Learning

Learning
– Inducing general functions from specific training examples

Concept learning
– Acquiring the definition of a general category given a sample of positive and negative training examples of the category
– Inferring a boolean-valued function from training examples of its input and output.

A Concept Learning Task

Target concept EnjoySport
– “days on which 박지성 enjoys water sports”

Hypothesis
– A vector of 6 constraints, specifying the values of the six attributes Sky, AirTemp, Humidity, Wind, Water, Forecast.
– Example: <?, Cold, High, ?, ?, ?>
– This hypothesis says that 박지성 enjoys water sports only on cold days with high humidity.

Representing Hypotheses

Many possible representations. Here, h is a conjunction of constraints on attributes.

Each constraint can be
– a specific value (e.g., “Water = Warm”)
– don’t care (e.g., “Water = ?”)
– no value allowed (e.g., “Water = Ø”)

For example:

Sky     AirTemp  Humid  Wind    Water  Forecast
<Sunny  ?        ?      Strong  ?      Same>

Task: Learn a hypothesis from a dataset

Example Concept Function

“Days on which my friend 박지성 enjoys his favorite water sport”

Sky Temp Humid Wind Water Forecast C(x)

sunny warm normal strong warm same 1

sunny warm high strong warm same 1

rainy cold high strong warm change 0

sunny warm high strong cool change 1



The Learning Task

Given:
– Hypothesis space H: conjunctions of constraints on attributes. E.g., a conjunction of literals: <Sunny, ?, ?, Strong, ?, Same>
– Target concept c: E.g., EnjoySport: X → {0,1}
– Instances X: the set of items over which the concept is defined. E.g., days described by the attributes Sky, Temp, Humidity, Wind, Water, Forecast
– Training examples (positive/negative): <x, c(x)>
– Training set D: positive and negative examples of the target function: <x1, c(x1)>, …, <xn, c(xn)>

Determine:
– A hypothesis h in H such that h(x) = c(x) for all x in X

Assumption 1

We will explore the space of all conjunctions. We assume the target concept falls within this space.

[Figure: the target concept c lies inside the hypothesis space H]

Assumption 2

A hypothesis close to the target concept c, obtained after seeing many training examples, will result in high accuracy on the set of unobserved examples.

[Figure: a hypothesis h that is good on the training set D is also good on the complement set D’ (inductive learning hypothesis)]

Inductive Learning Hypothesis

The learning task is to determine an h identical to c over the entire set of instances X.

But the only information about c is its value over D (the training set).

Inductive learning algorithms can at best guarantee that the induced h fits c over D.

Inductive learning hypothesis
– Any hypothesis that approximates the target function well over a sufficiently large set of training examples will also approximate the target function well over unseen examples.

Concept Learning as Search

Search
– Find a hypothesis that best fits the training examples
– Efficient search in hypothesis space (finite/infinite)

Search space in EnjoySport
• Sky has 3 values (Sunny, Cloudy, and Rainy)
• Temp has 2 values (Warm and Cold)
• Humidity has 2 values (Normal and High)
• Wind has 2 values (Strong and Weak)
• Water has 2 values (Warm and Cool)
• Forecast has 2 values (Same and Change)

– 3x2x2x2x2x2 = 96 distinct instances
– 5x4x4x4x4x4 = 5120 syntactically distinct hypotheses within H (counting Ø and ? in addition to the attribute values)
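The counts above can be checked with a short script. This is only a sketch: the name DOMAINS and the variable names are illustrative, not part of the slides. The last figure (973 semantically distinct hypotheses) is not on the slide; it follows because every hypothesis containing Ø covers no instance, so all of them describe the same empty concept.

```python
from itertools import product

# Attribute domains as listed on the slide (DOMAINS is an illustrative name).
DOMAINS = {
    "Sky": ["Sunny", "Cloudy", "Rainy"],
    "AirTemp": ["Warm", "Cold"],
    "Humidity": ["Normal", "High"],
    "Wind": ["Strong", "Weak"],
    "Water": ["Warm", "Cool"],
    "Forecast": ["Same", "Change"],
}

# Every instance is one value per attribute.
n_instances = len(list(product(*DOMAINS.values())))

# Syntactically, each constraint is a value, "?" or "Ø".
n_syntactic = 1
for values in DOMAINS.values():
    n_syntactic *= len(values) + 2

# Semantically, all hypotheses containing "Ø" are the same empty concept,
# so count the Ø-free hypotheses (value or "?") and add one.
n_semantic = 1
for values in DOMAINS.values():
    n_semantic *= len(values) + 1
n_semantic += 1

print(n_instances, n_syntactic, n_semantic)
```

Even for this toy problem the hypothesis space is much larger than the instance space, which is why the ordering introduced next is used to organize the search.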

Concept Generality

A concept P is more general than or equal to another concept Q iff the set of instances represented by P includes the set of instances represented by Q.


[Figure: example generality hierarchy with the concepts Wolf, Pig, and Dog and the instances White_fang and Lassie]




General to Specific Order

Consider two hypotheses:

– h1=< Sunny,?,?,Strong,?,?>

– h2=< Sunny,?,?, ?, ?,?>

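Definition: hj is more general than or equal to hk (written hj ≥g hk) iff every instance that satisfies hk also satisfies hj:

hj ≥g hk ⟺ (∀x ∈ X) [ (hk(x) = 1) → (hj(x) = 1) ]

This imposes a partial order on the hypothesis space.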
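For conjunctive hypotheses the definition can be tested attribute by attribute instead of enumerating all of X. The sketch below assumes hypotheses are Python tuples with "?" for don't care and "Ø" for no value allowed; the function names covers and more_general_or_equal are illustrative.

```python
def covers(h, x):
    """True when hypothesis h classifies instance x as positive (h(x) = 1)."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    """Attribute-wise test for hj >=g hk on conjunctive hypotheses."""
    if "Ø" in hk:
        # hk covers no instance, so every hypothesis is at least as general.
        return True
    # Each constraint of hj must be "?" or identical to the one in hk.
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))


h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
h3 = ("Sunny", "?", "?", "?", "Cool", "?")

print(more_general_or_equal(h2, h1))  # h2 drops the Wind constraint of h1
print(more_general_or_equal(h1, h3), more_general_or_equal(h3, h1))
```

h1 and h3 are not comparable in either direction, which is why the ordering is only partial.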


Instance, Hypotheses, and More-General-Than

The Most General Hypothesis : < ?, ?, ?, ?, ?, ? >

The Most Specific Hypothesis : < Ø, Ø, Ø, Ø, Ø, Ø >

x1=< Sunny,Warm,High,Strong,Cool,Same>

x2=< Sunny,Warm,High,Light,Warm,Same>

h1=< Sunny, ?, ?, Strong, ? ,?>

h2=< Sunny, ?, ?, ?, ?, ?>

h3=< Sunny, ?, ?, ?, Cool,?>







h2 ≥g h1

h2 ≥g h3



Find-S Algorithm

1. Initialize h to the most specific hypothesis in H

2. For each positive training instance x
– For each attribute constraint ai in h
   If the constraint ai in h is satisfied by x
   Then do nothing
   Else replace ai in h by the next more general constraint that is satisfied by x

3. Output hypothesis h
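The three steps above can be sketched in a few lines. This is a minimal illustration under the assumption that instances and hypotheses are tuples and that labels are booleans; the function name find_s and the variable TRAINING are illustrative.

```python
def find_s(examples, n_attributes):
    """Return the maximally specific conjunctive hypothesis covering all positives."""
    h = ["Ø"] * n_attributes            # step 1: most specific hypothesis
    for x, positive in examples:        # step 2: positives only
        if not positive:
            continue                    # negative examples are ignored
        for i, value in enumerate(x):
            if h[i] == "Ø":
                h[i] = value            # first positive: copy the value
            elif h[i] != value:
                h[i] = "?"              # conflicting value: generalize to "?"
    return tuple(h)                     # step 3


# The four EnjoySport training examples from the slides.
TRAINING = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

result = find_s(TRAINING, 6)
print(result)
```

For conjunctive constraints the "next more general constraint" after a specific value is always "?", which is why the inner update has only two cases.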

Finding a Maximally Specific Hypothesis

Hypothesis Space Search by Find-S

Instances and the hypotheses produced by Find-S:

h0 = <Ø, Ø, Ø, Ø, Ø, Ø>

x1 = <Sunny, Warm, Normal, Strong, Warm, Same>, +
h1 = <Sunny, Warm, Normal, Strong, Warm, Same>

x2 = <Sunny, Warm, High, Strong, Warm, Same>, +
h2 = <Sunny, Warm, ?, Strong, Warm, Same>

x3 = <Rainy, Cold, High, Strong, Warm, Change>, -
h3 = h2 (the negative example is ignored)

x4 = <Sunny, Warm, High, Strong, Cool, Change>, +
h4 = <Sunny, Warm, ?, Strong, ?, ?>

Properties of Find-S

Ignores every negative example (no revision to h required in response to negative examples).

Guaranteed to output the most specific hypothesis consistent with the positive training examples (for conjunctive hypothesis space).

The final h is also consistent with the negative examples, provided the target concept c is in H and D contains no errors.

Weaknesses of Find-S

Has the learner converged to the correct target concept? There is no way to know whether the solution is unique.

Why prefer the most specific hypothesis? How about the most general hypothesis?

Are the training examples consistent? Training sets containing errors or noise can severely mislead Find-S.

What if there are several maximally specific consistent hypotheses? Find-S does not backtrack to explore a different branch of the partial ordering.

[Figures: partial order of hypotheses (I–III); the space of hypotheses (I–III)]

Version Spaces

A hypothesis h is consistent with a set of training examples D of target concept c if and only if h(x) = c(x) for each training example <x, c(x)> in D.

Consistent(h, D) ≡ (∀<x, c(x)> ∈ D) h(x) = c(x)

The version space, VS_{H,D}, with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with all training examples in D.

VS_{H,D} ≡ {h ∈ H | Consistent(h, D)}

The List-Then-Eliminate Algorithm:

1. VersionSpace ← a list containing every hypothesis in H

2. For each training example <x, c(x)>, remove from VersionSpace any hypothesis h for which h(x) ≠ c(x)

3. Output the list of hypotheses in VersionSpace
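The three steps above are small enough to run directly on EnjoySport. The sketch below assumes the tuple encoding with "?" and "Ø"; the names DOMAINS, covers, list_then_eliminate and TRAINING are illustrative. Hypotheses containing Ø all denote the empty concept, so only the all-Ø hypothesis is enumerated for them.

```python
from itertools import product

DOMAINS = [
    ["Sunny", "Cloudy", "Rainy"],  # Sky
    ["Warm", "Cold"],              # AirTemp
    ["Normal", "High"],            # Humidity
    ["Strong", "Weak"],            # Wind
    ["Warm", "Cool"],              # Water
    ["Same", "Change"],            # Forecast
]


def covers(h, x):
    """True when hypothesis h classifies instance x as positive."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def list_then_eliminate(domains, examples):
    # Step 1: enumerate H (Ø-free hypotheses plus the single empty concept).
    version_space = list(product(*[values + ["?"] for values in domains]))
    version_space.append(("Ø",) * len(domains))
    # Step 2: drop every hypothesis that misclassifies some training example.
    for x, label in examples:
        version_space = [h for h in version_space if covers(h, x) == label]
    # Step 3: what remains is the version space.
    return version_space


TRAINING = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

vs = list_then_eliminate(DOMAINS, TRAINING)
for h in vs:
    print(h)
```

The enumeration in step 1 is the part that does not scale: it is feasible here only because H has fewer than a thousand distinct concepts.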

Drawbacks of List-Then-Eliminate

The algorithm requires exhaustively enumerating all hypotheses in H
– An unrealistic approach! (full search)

If insufficient training data is available, the algorithm will output a huge set of hypotheses consistent with the observed data


Example Version Space


G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

S: { <Sunny, Warm, ?, Strong, ?, ?> }

x1 = <Sunny Warm Normal Strong Warm Same>, +
x2 = <Sunny Warm High Strong Warm Same>, +
x3 = <Rainy Cold High Strong Warm Change>, -
x4 = <Sunny Warm High Strong Cool Change>, +

Representing Version Spaces

The general boundary, G, of version space VS_{H,D} is the set of its maximally general members.

The specific boundary, S, of version space VS_{H,D} is the set of its maximally specific members.

Every member of the version space lies between these boundaries:

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥ h ≥ s)}

where x ≥ y means x is more general than or equal to y

Relevant bounds

Basic Idea of Candidate Elimination Algorithm

1. Initialize G to the set of maximally general hypotheses in H

2. Initialize S to the set of maximally specific hypotheses in H

3. For each training example x, do
– If x is positive: generalize S if necessary
– If x is negative: specialize G if necessary

Candidate Elimination Algorithm (1/2)

G ← maximally general hypotheses in H
S ← maximally specific hypotheses in H

For each training example d, do

If d is a positive example
– Remove from G any hypothesis inconsistent with d.
– For each hypothesis s in S that is inconsistent with d
• Remove s from S
• Add to S all minimal generalizations h of s such that
1. h is consistent with d, and
2. some member of G is more general than h
• Remove from S any hypothesis that is more general than another hypothesis in S (this maintains the specific boundary).

Candidate Elimination Algorithm (2/2)

If d is a negative example
– Remove from S any hypothesis inconsistent with d.
– For each hypothesis g in G that is inconsistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that
1. h is consistent with d, and
2. some member of S is more specific than h
• Remove from G any hypothesis that is less general than another hypothesis in G (this maintains the general boundary).
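Both halves of the algorithm can be put together in a compact sketch for the conjunctive EnjoySport representation. It assumes tuples with "?" and "Ø", boolean labels, and noise-free data; all function names are illustrative, and the helper min_generalization relies on the fact that a conjunctive hypothesis has exactly one minimal generalization covering a given instance.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    if "Ø" in hk:
        return True
    return all(cj == "?" or cj == ck for cj, ck in zip(hj, hk))


def min_generalization(s, x):
    """The unique minimal generalization of s that covers x."""
    return tuple(v if c == "Ø" else (c if c == v else "?") for c, v in zip(s, x))


def min_specializations(g, x, domains):
    """All minimal specializations of g that exclude x."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for value in domains[i] if value != x[i]]


def candidate_elimination(examples, domains):
    S = [("Ø",) * len(domains)]
    G = [("?",) * len(domains)]
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]
            new_S = []
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                if any(more_general_or_equal(g, h) for g in G) and h not in new_S:
                    new_S.append(h)
            # Keep only the maximally specific members.
            S = [s for s in new_S
                 if not any(t != s and more_general_or_equal(s, t) for t in new_S)]
        else:
            S = [s for s in S if not covers(s, x)]
            new_G = []
            for g in G:
                candidates = min_specializations(g, x, domains) if covers(g, x) else [g]
                for h in candidates:
                    if any(more_general_or_equal(h, s) for s in S) and h not in new_G:
                        new_G.append(h)
            # Keep only the maximally general members.
            G = [g for g in new_G
                 if not any(t != g and more_general_or_equal(t, g) for t in new_G)]
    return S, G


DOMAINS = [["Sunny", "Cloudy", "Rainy"], ["Warm", "Cold"], ["Normal", "High"],
           ["Strong", "Weak"], ["Warm", "Cool"], ["Same", "Change"]]
TRAINING = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]

S, G = candidate_elimination(TRAINING, DOMAINS)
print("S:", S)
print("G:", G)
```

Unlike List-Then-Eliminate, this never enumerates H; it only stores and updates the two boundary sets.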

Candidate-Elimination Algorithm

– When does this halt?
– If S and G are both singleton sets, then:
• if they are identical, output that hypothesis and halt.
• if they are different, the training examples were inconsistent. Output this and halt.
– Else continue accepting new training examples.

Example Candidate Elimination

• Instance space: integer points in the x,y plane with 0 ≤ x, y ≤ 10
• Hypothesis space: rectangles, that is, hypotheses of the form a ≤ x ≤ b, c ≤ y ≤ d, assuming a ≤ b




Example Candidate Elimination

• examples = {} (none yet)
• G = {(a, b, c, d)} = {(0, 10, 0, 10)}
• S = {Ø}






Example Candidate Elimination

• examples = {((3, 4), +)}
• G = {(0, 10, 0, 10)}
• S = {(3, 3, 4, 4)}






Example Trace

First initialize the S and G sets:

S0 : { <Ø, Ø, Ø, Ø, Ø, Ø> }

G0 : { < ?,?,?,?,?,?> }

Given 1st Example

The first example is positive:

< <Sunny, Warm, Normal, Strong, Warm, Same>, Yes >

S0 : { <Ø, Ø, Ø, Ø, Ø, Ø> }

Minimal generalization h of S0 : <Sunny, Warm, Normal, Strong, Warm, Same>

S1 : { <Sunny, Warm, Normal, Strong, Warm, Same> }

G0, G1 : { <?, ?, ?, ?, ?, ?> }

(g ≥ h ≥ s) ?

Given 2nd Example

The second example is positive:

< <Sunny,Warm,High, Strong,Warm, Same>, Yes>

S1 : { <Sunny, Warm, Normal, Strong, Warm, Same> }

S2 : { <Sunny, Warm, ?, Strong, Warm, Same> }

G1, G2 : { < ?,?,?,?,?,?> }

h: <Sunny, Warm, ?, Strong, Warm, Same>

(g ≥ h ≥ s) ?

Given 3rd Example

The third example is negative:

< <Rainy, Cold, High, Strong, Warm, Change>, No>

S2, S3 : { <Sunny, Warm, ?, Strong, Warm, Same> }

G2 : { <?, ?, ?, ?, ?, ?> }

G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }

(g ≥ h ≥ s) ?

Given 3rd Example

Candidate hypotheses h for G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>, <?, ?, Normal, ?, ?, ?>, <?, ?, ?, Strong, ?, ?>, <?, ?, ?, ?, Warm, ?> }

Are <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, and <?, ?, ?, ?, ?, Same> possible members of G3?

Are <?, ?, Normal, ?, ?, ?>, <?, ?, ?, Strong, ?, ?>, and <?, ?, ?, ?, Warm, ?> impossible?

G2 : { <?, ?, ?, ?, ?, ?> }

Given 3rd Example

If d is a negative example
– Remove from S any hypothesis inconsistent with d.

The only member of S does not cover d, so there is no hypothesis to remove from S.

Given 3rd Example

If d is a negative example
– For each hypothesis g in G that is inconsistent with d
• Remove g from G
• Add to G all minimal specializations h of g such that
1. h is consistent with d, and
2. some member of S is more specific than h

Why is <?, ?, Normal, ?, ?, ?> not included in G?

Given 3rd Example

Training example d : < <Rainy, Cold, High, Strong, Warm, Change>, No >

S2, S3 : { <Sunny, Warm, ?, Strong, Warm, Same> }

Hypothesis h : <?, ?, Normal, ?, ?, ?>. For d, h(x) = No and c(x) = No, so h is consistent with d.

Is <?, ?, Normal, ?, ?, ?> ≥ <Sunny, Warm, ?, Strong, Warm, Same> ? No: the member of S accepts days with High humidity, which h rejects, so h fails the condition below.

VS_{H,D} = {h ∈ H | (∃s ∈ S)(∃g ∈ G) (g ≥ h ≥ s)}

where x ≥ y means x is more general than or equal to y


Given 4th Example

The fourth example is positive:

< <Sunny, Warm, High, Strong, Cool, Change>, Yes >

S3 : { <Sunny, Warm, ?, Strong, Warm, Same> }

S4 : { <Sunny, Warm, ?, Strong, ?, ?> }

G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }

G4 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

(g ≥ h ≥ s) ?

Example Trace

S0 : { <Ø, Ø, Ø, Ø, Ø, Ø> }
G0 : { <?, ?, ?, ?, ?, ?> }

d1: <Sunny, Warm, Normal, Strong, Warm, Same>, Yes
S1 : { <Sunny, Warm, Normal, Strong, Warm, Same> }
G1 = G0

d2: <Sunny, Warm, High, Strong, Warm, Same>, Yes
S2 : { <Sunny, Warm, ?, Strong, Warm, Same> }
G2 = G1

d3: <Rainy, Cold, High, Strong, Warm, Change>, No
S3 = S2
G3 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same> }

d4: <Sunny, Warm, High, Strong, Cool, Change>, Yes
S4 : { <Sunny, Warm, ?, Strong, ?, ?> }
G4 : { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

Hypotheses between S4 and G4: <Sunny, ?, ?, Strong, ?, ?>, <Sunny, Warm, ?, ?, ?, ?>, <?, Warm, ?, Strong, ?, ?>


The hypothesis space after four cases


Remarks on Candidate Elimination

What training example should the learner request next?

Will the CE algorithm converge to the correct hypothesis ?

How can partially learned concepts be used?

What Next Training Example?

Who Provides Examples?

Two methods
– Fully supervised learning: an external teacher provides all training examples (input + correct output)
– Learning by query: the learner generates instances (queries) by conducting experiments, then obtains the correct classification for each instance from an external oracle (nature or a teacher).

Negative training examples specialize G; positive ones generalize S.

When Does CE Converge?

Will the Candidate-Elimination algorithm converge to the correct hypothesis?

Prerequisites
– 1. No errors in the training examples
– 2. H contains a hypothesis that correctly describes the target concept c(x).

If the S and G boundary sets converge to an empty set, there is no hypothesis in H consistent with the observed examples.

Can a partially learned classifier be used?

How to Use Partially Learned Concepts?

Suppose the learner is asked to classify the four new instances

shown in the following table.

Instance  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport  Votes (+/-)
A         Sunny  Warm     Normal    Strong  Cool   Change    +           6/0
B         Rainy  Cold     Normal    Light   Warm   Same      -           0/6
C         Sunny  Warm     Normal    Light   Warm   Same      ?           3/3
D         Sunny  Cold     Normal    Strong  Warm   Same      ?           2/4


G: { <Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?> }

<Sunny, ?, ?, Strong, ?, ?>   <Sunny, Warm, ?, ?, ?, ?>   <?, Warm, ?, Strong, ?, ?>

S: { <Sunny, Warm, ?, Strong, ?, ?> }
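The vote counts in the table can be reproduced by letting each of the six version-space members classify the new instance. A minimal sketch, assuming the tuple encoding with "?"; the names VERSION_SPACE, vote and classify are illustrative.

```python
# The six hypotheses of the version space after the four training examples.
VERSION_SPACE = [
    ("Sunny", "Warm", "?", "Strong", "?", "?"),   # S
    ("Sunny", "?", "?", "Strong", "?", "?"),
    ("Sunny", "Warm", "?", "?", "?", "?"),
    ("?", "Warm", "?", "Strong", "?", "?"),
    ("Sunny", "?", "?", "?", "?", "?"),           # G
    ("?", "Warm", "?", "?", "?", "?"),            # G
]


def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def vote(version_space, x):
    """Return (positive votes, negative votes) for instance x."""
    positive = sum(covers(h, x) for h in version_space)
    return positive, len(version_space) - positive


def classify(version_space, x):
    """'+' or '-' only when all members agree, otherwise '?'."""
    positive, negative = vote(version_space, x)
    if negative == 0:
        return "+"
    if positive == 0:
        return "-"
    return "?"


NEW_INSTANCES = {
    "A": ("Sunny", "Warm", "Normal", "Strong", "Cool", "Change"),
    "B": ("Rainy", "Cold", "Normal", "Light", "Warm", "Same"),
    "C": ("Sunny", "Warm", "Normal", "Light", "Warm", "Same"),
    "D": ("Sunny", "Cold", "Normal", "Strong", "Warm", "Same"),
}

for name, x in NEW_INSTANCES.items():
    print(name, classify(VERSION_SPACE, x), vote(VERSION_SPACE, x))
```

In practice the unanimous cases need only the boundaries: an instance covered by every member of S is positive for the whole version space, and one covered by no member of G is negative.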


A Biased Hypothesis Space


x1 = <Sunny Warm Normal Strong Cool Change>, +
x2 = <Cloudy Warm Normal Strong Cool Change>, +
x3 = <Rainy Warm Normal Strong Cool Change>, -

S2 : { <?, Warm, Normal, Strong, Cool, Change> } is overly general: it incorrectly covers x3.

S3 : {} The third example x3 contradicts the already overly general specific boundary S2.

We have biased the learner to consider only conjunctive hypotheses!

An UnBiased Learner

Idea: Choose an H that expresses every teachable concept (i.e., H is the power set of X).

Consider H' = disjunctions, conjunctions, and negations over the previous H. E.g., <Sunny Warm Normal ? ? ?> ∨ <? ? ? ? ? Change>

What are S and G in this case?

For every target concept to lie within H, H must be able to represent every teachable concept!

Unbiased Learner

Assume positive examples (x1, x2, x3) and negative examples (x4, x5)

S : { (x1 ∨ x2 ∨ x3) }
G : { ¬(x4 ∨ x5) }

How would we classify some new instance x6?

For any instance not in the training examples, half of the version space says + and the other half says –

=> To learn the target concept, one would have to present every single instance in X as a training example (rote learning)
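The even split can be seen on a tiny instance space. The sketch below assumes a hypothetical X with seven instances, of which x6 and x7 are unseen, and takes H to be the power set of X, with each hypothesis represented as the set of instances it labels positive. The names power_set and version_space are illustrative.

```python
from itertools import combinations


def power_set(items):
    """Every subset of items, i.e. every concept an unbiased H can express."""
    items = list(items)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def version_space(X, examples):
    """All concepts in the power set that agree with the labeled examples."""
    return [h for h in power_set(X)
            if all((x in h) == label for x, label in examples.items())]


X = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]
examples = {"x1": True, "x2": True, "x3": True, "x4": False, "x5": False}

vs = version_space(X, examples)
votes_x6 = sum("x6" in h for h in vs)
votes_x7 = sum("x7" in h for h in vs)
print(len(vs), votes_x6, votes_x7)
```

Every consistent concept that includes an unseen instance has a twin that excludes it, so the vote is always tied and the learner cannot generalize beyond the training examples.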

What Justifies this Inductive Leap?

New examples

d1: <Sunny Warm Normal Strong Cool Change>, Yes

d2: <Sunny Warm Normal Light Warm Same>, Yes

S : { <Sunny Warm Normal ? ? ?> }

Overly general hypothesis


Inductive bias

• Our hypothesis space is unable to represent a simple disjunctive target concept: (Sky = Sunny) ∨ (Sky = Cloudy)

x1 = <Sunny Warm Normal Strong Cool Change>, +
S1 : { <Sunny, Warm, Normal, Strong, Cool, Change> }

x2 = <Cloudy Warm Normal Strong Cool Change>, +
S2 : { <?, Warm, Normal, Strong, Cool, Change> }

x3 = <Rainy Warm Normal Strong Cool Change>, -
S3 : {}

The third example x3 contradicts the already overly general specific boundary S2.

Overly general hypothesis


Why believe we can classify the unseen ?

S : {<Sunny Warm Normal ? ? ?> +}

Unseen example: <Sunny Warm Normal Strong Warm Same>, ….

Why Inductive learning hypothesis ?

Inductive learning hypothesis: “If the hypothesis works for enough data then it will work on new examples.”


Inductive bias

The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered (Mitchell, 1980).

– Occam’s Razor
– Target concept c ∈ H (hypothesis space) for the candidate-elimination algorithm

Inductive Bias

Consider
– a concept learning algorithm L
– instances X, target concept c
– training examples Dc = {<x, c(x)>}
– let L(xi, Dc) denote the classification assigned to the instance xi by L after training on data Dc.

Definition: The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples Dc

(∀xi ∈ X) [ (B ∧ Dc ∧ xi) ⊢ L(xi, Dc) ]

where A ⊢ B means A logically entails B

Inductive bias II

Inductive Systems and Equivalent Deductive Systems

Inductive system
– Inputs: training examples, new instance
– Candidate Elimination Algorithm using hypothesis space H
– Output: classification of new instance (or “don’t know”)

Equivalent deductive system
– Inputs: training examples, new instance, assertion { c ∈ H } (inductive bias made explicit)
– Theorem prover
– Output: classification of new instance (or “don’t know”)

Three Learners with Different Biases

Rote Learner
– Weakest bias: anything seen before, i.e., no bias
– Store examples
– Classify x if and only if it matches a previously observed example

Version Space Candidate Elimination Algorithm
– Stronger bias: concepts belonging to conjunctive H
– Store extremal generalizations and specializations
– Classify x if and only if it “falls within” S and G boundaries (all members agree)

Find-S
– Even stronger bias: most specific hypothesis
– Prior assumption: any instance not observed to be positive is negative
– Classify x based on S set

Summary Points

1. Concept learning as search through H

2. General-to-specific ordering over H

3. Version space candidate elimination algorithm

4. S and G boundaries characterize learner’s uncertainty

5. Learner can generate useful queries

6. Inductive leaps possible only if learner is biased

7. Inductive learners can be modelled by equivalent deductive systems