
Training Examples

Day  Outlook   Temp.  Humidity  Wind    Play Golf
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

Entropy and Information Gain

• Information answers questions.
• The more clueless I am about the answer initially, the more information is contained in the final answer.
• Scale:
  – 1 bit = completely clueless – the answer to a Boolean question with prior <0.5, 0.5>
  – 0 bits = complete knowledge – the answer to a Boolean question with prior <1.0, 0.0>
  – ? = the answer to a Boolean question with prior <0.75, 0.25>
  – This leads to the concept of entropy.

Entropy

• S is a sample of training examples

• p+ is the proportion of positive examples

• p- is the proportion of negative examples

• Entropy measures the impurity of S

Entropy(S) = -p+ log2 p+ - p- log2 p-
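The formula can be checked directly; a minimal Python sketch (the function name `entropy` is mine), which also answers the "?" for the <0.75, 0.25> prior:

```python
from math import log2

def entropy(p_pos, p_neg):
    """Binary entropy of a sample with the given class proportions.
    The 0 * log2(0) term is taken as 0 by convention."""
    return sum(-p * log2(p) for p in (p_pos, p_neg) if p > 0)

print(round(entropy(0.5, 0.5), 3))     # completely clueless prior -> 1.0
print(round(entropy(1.0, 0.0), 3))     # complete knowledge -> 0.0
print(round(entropy(0.75, 0.25), 3))   # the "?" prior <0.75, 0.25> -> 0.811
print(round(entropy(9/14, 5/14), 3))   # S = [9+, 5-] from the table -> 0.94
```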

Information Gain

• Gain(S, A): expected reduction in entropy due to sorting S on attribute A

Gain(S, A) = Entropy(S) - Σv∈Values(A) (|Sv| / |S|) · Entropy(Sv)



Selecting the First Attribute

Humidity:  S = [9+, 5-], E = 0.940
  High:   [3+, 4-], E = 0.985
  Normal: [6+, 1-], E = 0.592
  Gain(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151

Wind:  S = [9+, 5-], E = 0.940
  Weak:   [6+, 2-], E = 0.811
  Strong: [3+, 3-], E = 1.0
  Gain(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048

Humidity provides greater information gain than Wind, w.r.t. the target classification.

Selecting the First Attribute

Outlook:  S = [9+, 5-], E = 0.940
  Sunny:    [2+, 3-], E = 0.971
  Overcast: [4+, 0-], E = 0.0
  Rain:     [3+, 2-], E = 0.971
  Gain(S, Outlook) = 0.940 - (5/14)·0.971 - (4/14)·0.0 - (5/14)·0.971 = 0.247

Selecting the First Attribute

The information gain values for the 4 attributes are:
• Gain(S, Outlook)     = 0.247
• Gain(S, Humidity)    = 0.151
• Gain(S, Wind)        = 0.048
• Gain(S, Temperature) = 0.029

where S denotes the collection of training examples
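These four values can be reproduced mechanically from the training-example table; a Python sketch (the dataset is transcribed from the table, helper names are mine). At three decimals Gain(S, Humidity) comes out as 0.152; the slide's 0.151 is a truncation.

```python
from math import log2
from collections import Counter

FIELDS = ["Outlook", "Temp", "Humidity", "Wind", "Play"]
ROWS = [dict(zip(FIELDS, r.split())) for r in [
    "Sunny Hot High Weak No",          "Sunny Hot High Strong No",
    "Overcast Hot High Weak Yes",      "Rain Mild High Weak Yes",
    "Rain Cool Normal Weak Yes",       "Rain Cool Normal Strong No",
    "Overcast Cool Normal Strong Yes", "Sunny Mild High Weak No",
    "Sunny Cool Normal Weak Yes",      "Rain Mild Normal Weak Yes",
    "Sunny Mild Normal Strong Yes",    "Overcast Mild High Strong Yes",
    "Overcast Hot Normal Weak Yes",    "Rain Mild High Strong No",
]]

def entropy(rows):
    """Entropy of the Play labels in a collection of examples."""
    n = len(rows)
    counts = Counter(r["Play"] for r in rows)
    return sum(-c/n * log2(c/n) for c in counts.values())

def gain(rows, attr):
    """Gain(S, A) = Entropy(S) - sum over v of |Sv|/|S| * Entropy(Sv)."""
    rem = sum(len(sub)/len(rows) * entropy(sub)
              for v in {r[attr] for r in rows}
              for sub in [[r for r in rows if r[attr] == v]])
    return entropy(rows) - rem

for attr in ["Outlook", "Humidity", "Wind", "Temp"]:
    print(attr, round(gain(ROWS, attr), 3))
```

Outlook wins, so it becomes the root of the tree.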

Selecting the Next Attribute

Outlook splits S = [D1, D2, …, D14], [9+, 5-]:
  Sunny    → Ssunny = [D1, D2, D8, D9, D11], [2+, 3-] → ?
  Overcast → [D3, D7, D12, D13], [4+, 0-] → Yes
  Rain     → [D4, D5, D6, D10, D14], [3+, 2-] → ?

Gain(Ssunny, Humidity) = 0.970 - (3/5)·0.0 - (2/5)·0.0 = 0.970
Gain(Ssunny, Temp.)    = 0.970 - (2/5)·0.0 - (2/5)·1.0 - (1/5)·0.0 = 0.570
Gain(Ssunny, Wind)     = 0.970 - (2/5)·1.0 - (3/5)·0.918 = 0.019

ID3 Algorithm

The resulting tree:

Outlook
  Sunny    → Humidity
               High:   No   [D1, D2, D8]
               Normal: Yes  [D9, D11]
  Overcast → Yes  [D3, D7, D12, D13]
  Rain     → Wind
               Strong: No   [D6, D14]
               Weak:   Yes  [D4, D5, D10]
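The whole procedure is a short recursive function: pick the highest-gain attribute, split, and recurse until a branch is pure. A self-contained sketch (all names mine; ties broken arbitrarily):

```python
from math import log2
from collections import Counter

FIELDS = ["Outlook", "Temp", "Humidity", "Wind"]
ROWS = [r.split() for r in [          # last field is the Play label
    "Sunny Hot High Weak No",          "Sunny Hot High Strong No",
    "Overcast Hot High Weak Yes",      "Rain Mild High Weak Yes",
    "Rain Cool Normal Weak Yes",       "Rain Cool Normal Strong No",
    "Overcast Cool Normal Strong Yes", "Sunny Mild High Weak No",
    "Sunny Cool Normal Weak Yes",      "Rain Mild Normal Weak Yes",
    "Sunny Mild Normal Strong Yes",    "Overcast Mild High Strong Yes",
    "Overcast Hot Normal Weak Yes",    "Rain Mild High Strong No",
]]

def entropy(rows):
    n = len(rows)
    return sum(-c/n * log2(c/n) for c in Counter(r[-1] for r in rows).values())

def gain(rows, i):
    rem = sum(len(sub)/len(rows) * entropy(sub)
              for v in {r[i] for r in rows}
              for sub in [[r for r in rows if r[i] == v]])
    return entropy(rows) - rem

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:            # pure node -> leaf
        return labels[0]
    if not attrs:                        # attributes exhausted -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda i: gain(rows, i))
    rest = [i for i in attrs if i != best]
    return (FIELDS[best], {v: id3([r for r in rows if r[best] == v], rest)
                           for v in {r[best] for r in rows}})

tree = id3(ROWS, list(range(len(FIELDS))))
print(tree[0])               # root attribute: Outlook
print(tree[1]["Overcast"])   # pure branch: Yes
```

Running it reproduces the tree above: Outlook at the root, Humidity under Sunny, Wind under Rain.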

Which attribute should we start with?

ID# Texture Temp Size Classification

1 Smooth Cold Large Yes

2 Smooth Cold Small No

3 Smooth Cool Large Yes

4 Smooth Cool Small Yes

5 Smooth Hot Small Yes

6 Wavy Cold Medium No

7 Wavy Hot Large Yes

8 Rough Cold Large No

9 Rough Cool Large Yes

10 Rough Hot Small No

11 Rough Warm Medium Yes

Which node is the best?

• Texture (smooth, wavy, rough):
  5/11 · (-4/5·log 4/5 - 1/5·log 1/5) +
  2/11 · (-1/2·log 1/2 - 1/2·log 1/2) +
  4/11 · (-2/4·log 2/4 - 2/4·log 2/4)
  = 5/11·(0.722) + 2/11·1 + 4/11·1
  = 0.874

Which node is the best?

• Temperature (cold, cool, hot, warm):
  4/11 · (-1/4·log 1/4 - 3/4·log 3/4) +
  3/11 · (-3/3·log 3/3 - 0/3·log 0/3) +
  3/11 · (-2/3·log 2/3 - 1/3·log 1/3) +
  1/11 · (-1/1·log 1/1 - 0/1·log 0/1)
  = 4/11·(0.811) + 0 + 3/11·(0.918) + 0
  = 0.545

Which node is the best?

• Size (large, medium, small):
  5/11 · (-4/5·log 4/5 - 1/5·log 1/5) +
  2/11 · (-1/2·log 1/2 - 1/2·log 1/2) +
  4/11 · (-2/4·log 2/4 - 2/4·log 2/4)
  = 5/11·(0.722) + 2/11·1 + 4/11·1
  = 0.874
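The three computations above can be verified by computing the weighted entropy remaining after each split; the attribute with the lowest value (equivalently, the highest gain) is the best root. A sketch over the 11-example table (names mine):

```python
from math import log2
from collections import Counter

ROWS = [r.split() for r in [   # Texture, Temp, Size, Classification
    "Smooth Cold Large Yes", "Smooth Cold Small No",  "Smooth Cool Large Yes",
    "Smooth Cool Small Yes", "Smooth Hot Small Yes",  "Wavy Cold Medium No",
    "Wavy Hot Large Yes",    "Rough Cold Large No",   "Rough Cool Large Yes",
    "Rough Hot Small No",    "Rough Warm Medium Yes",
]]

def entropy(rows):
    n = len(rows)
    return sum(-c/n * log2(c/n) for c in Counter(r[-1] for r in rows).values())

def remainder(rows, i):
    """Weighted entropy left after splitting on attribute index i."""
    return sum(len(sub)/len(rows) * entropy(sub)
               for v in {r[i] for r in rows}
               for sub in [[r for r in rows if r[i] == v]])

for name, i in [("Texture", 0), ("Temperature", 1), ("Size", 2)]:
    print(name, round(remainder(ROWS, i), 3))
```

Temperature leaves the least entropy (0.545 vs. 0.874 for the other two), so it should be the starting attribute.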

Learning over time

• How do you evolve knowledge over time when you learn little by little?
  – Abstract version – the “Frinkle”

The Question

• The Question
  – How can we build this kind of representation over time?
• The Answer
  – Rely on the concepts of false positives and false negatives

The idea

• False Positive
  – An example which is predicted to be positive but whose known outcome is negative.
  – The problem is that our hypothesis is too general.
  – The solution is to add another condition to our hypothesis.
• False Negative
  – An example which is predicted to be negative but whose known outcome is positive.
  – The problem is that our hypothesis is too restrictive.
  – The solution is to remove a condition from our hypothesis [or to add a disjunction].
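One way to make this concrete: represent the hypothesis as a conjunction of conditions, each allowing a set of values per attribute. A false positive triggers specialization (tighten a condition); a false negative triggers generalization (allow another value, i.e. a disjunction). This is a minimal sketch of the idea, not the specific procedure from the lecture; all names and the example attributes are hypothetical:

```python
def predict(hyp, ex):
    """Positive iff every attribute's value is currently allowed."""
    return all(ex[a] in allowed for a, allowed in hyp.items())

def update(hyp, ex, label):
    """Repair the hypothesis after one labelled example."""
    if predict(hyp, ex) and label == "No":
        # False positive: too general -> specialize by forbidding this
        # example's value on the first attribute where that is possible.
        for a, allowed in hyp.items():
            if len(allowed) > 1:
                allowed.discard(ex[a])
                break
    elif not predict(hyp, ex) and label == "Yes":
        # False negative: too restrictive -> generalize by also allowing
        # the values this example has (adding a disjunction).
        for a, allowed in hyp.items():
            if ex[a] not in allowed:
                allowed.add(ex[a])

# Start from the first positive example, then learn case by case:
hyp = {"Texture": {"Smooth"}, "Size": {"Large"}}
update(hyp, {"Texture": "Rough", "Size": "Large"}, "Yes")   # false negative
print(sorted(hyp["Texture"]))   # ['Rough', 'Smooth']
```

Each mistake moves the hypothesis one step: false positives shrink it, false negatives grow it.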

Creating a model one “case” at a time

ID# Texture Temp Size Classification

1 Smooth Cold Large Yes

2 Smooth Cold Small No

3 Smooth Cool Large Yes

4 Smooth Cool Small Yes

5 Smooth Hot Small Yes

6 Wavy Cold Medium No

7 Wavy Hot Large Yes

8 Rough Cold Large No

9 Rough Cool Large Yes

10 Rough Hot Small No

11 Rough Warm Medium Yes
