
Training Examples

Day  Outlook   Temp.  Humidity  Wind    Play Golf
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No


Entropy and Information Gain

• Information answers questions.
• The more clueless I am about the answer initially, the more information is contained in the final answer.
• Scale:
  – 1 bit = completely clueless: the answer to a Boolean question with prior <0.5, 0.5>
  – 0 bits = complete knowledge: the answer to a Boolean question with prior <1.0, 0.0>
  – ? = the answer to a Boolean question with prior <0.75, 0.25>
  – This is captured by the concept of Entropy.


Entropy

• S is a sample of training examples

• p+ is the proportion of positive examples

• p- is the proportion of negative examples

• Entropy measures the impurity of S

Entropy(S) = -p+ log2 p+ - p- log2 p-
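To make the scale concrete, here is a minimal Python sketch (an illustration, not the lecture's code) of this two-class entropy, evaluated at the three priors mentioned above; it shows that the "?" answer carries about 0.811 bits.

from math import log2

def entropy2(p_pos):
    """Entropy (in bits) of a Boolean question with prior <p_pos, 1 - p_pos>."""
    # By convention 0 * log2(0) = 0, so zero-probability terms are skipped.
    return sum(-p * log2(p) for p in (p_pos, 1.0 - p_pos) if p > 0)

print(entropy2(0.5))   # 1.0    bit:  completely clueless
print(entropy2(1.0))   # 0.0    bits: complete knowledge
print(entropy2(0.75))  # ~0.811 bits: the <0.75, 0.25> question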


Information Gain

• Gain(S,A): the expected reduction in entropy due to sorting S on attribute A

Gain(S,A) = Entropy(S) - Σ_{v ∈ values(A)} (|S_v|/|S|) Entropy(S_v)
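In the same sketch spirit (the encoding is an assumption: each example is a Python dict holding its attribute values and class label), the formula translates almost line by line:

from math import log2
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels (any number of classes)."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

def gain(examples, attr, target='Play Golf'):
    """Gain(S, A): expected entropy reduction from sorting S on attribute A."""
    before = entropy([e[target] for e in examples])
    after = 0.0
    for v in {e[attr] for e in examples}:   # v ∈ values(A)
        subset = [e[target] for e in examples if e[attr] == v]
        after += len(subset) / len(examples) * entropy(subset)   # (|Sv|/|S|) Entropy(Sv)
    return before - after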



Selecting the First Attribute

Splitting on Humidity: S = [9+,5-], E = 0.940
  High:   [3+,4-], E = 0.985
  Normal: [6+,1-], E = 0.592
  Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Splitting on Wind: S = [9+,5-], E = 0.940
  Weak:   [6+,2-], E = 0.811
  Strong: [3+,3-], E = 1.0
  Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048

Humidity provides greater information gain than Wind, w.r.t. the target classification.


Splitting on Outlook: S = [9+,5-], E = 0.940
  Sunny:    [2+,3-], E = 0.971
  Overcast: [4+,0-], E = 0.0
  Rain:     [3+,2-], E = 0.971
  Gain(S,Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247


The information gain values for the 4 attributes are:
• Gain(S,Outlook) = 0.247
• Gain(S,Humidity) = 0.151
• Gain(S,Wind) = 0.048
• Gain(S,Temperature) = 0.029
where S denotes the collection of training examples. Outlook has the highest gain, so it is selected as the root attribute.
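The gain function sketched earlier reproduces all four values from the table (the tuple/dict encoding is this sketch's assumption, not the lecture's code):

# The 14 training examples, one tuple per row of the table.
rows = [
    ('Sunny','Hot','High','Weak','No'),           # D1
    ('Sunny','Hot','High','Strong','No'),         # D2
    ('Overcast','Hot','High','Weak','Yes'),       # D3
    ('Rain','Mild','High','Weak','Yes'),          # D4
    ('Rain','Cool','Normal','Weak','Yes'),        # D5
    ('Rain','Cool','Normal','Strong','No'),       # D6
    ('Overcast','Cool','Normal','Strong','Yes'),  # D7
    ('Sunny','Mild','High','Weak','No'),          # D8
    ('Sunny','Cool','Normal','Weak','Yes'),       # D9
    ('Rain','Mild','Normal','Weak','Yes'),        # D10
    ('Sunny','Mild','Normal','Strong','Yes'),     # D11
    ('Overcast','Mild','High','Strong','Yes'),    # D12
    ('Overcast','Hot','Normal','Weak','Yes'),     # D13
    ('Rain','Mild','High','Strong','No'),         # D14
]
attrs = ('Outlook', 'Temp.', 'Humidity', 'Wind')
examples = [dict(zip(attrs + ('Play Golf',), r)) for r in rows]

for a in attrs:
    print(a, round(gain(examples, a), 3))
# Outlook 0.247, Temp. 0.029, Humidity 0.152, Wind 0.048
# (the slides round the Humidity terms first and report 0.151)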


Selecting the Next Attribute

Outlook is at the root: [D1,D2,…,D14], [9+,5-]
  Sunny:    Ssunny = [D1,D2,D8,D9,D11], [2+,3-]  → ?
  Overcast: [D3,D7,D12,D13], [4+,0-]             → Yes
  Rain:     [D4,D5,D6,D10,D14], [3+,2-]          → ?

Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(Ssunny, Temp.)    = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(Ssunny, Wind)     = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019

Humidity has the highest gain on the Sunny branch, so it is tested next there.
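Filtering the examples to the Sunny branch first, the same sketch reproduces these branch-level numbers (up to rounding; the slides use E = 0.970 and report 0.970, 0.570, 0.019):

s_sunny = [e for e in examples if e['Outlook'] == 'Sunny']
for a in ('Humidity', 'Temp.', 'Wind'):
    print(a, round(gain(s_sunny, a), 3))
# Humidity 0.971, Temp. 0.571, Wind 0.02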


ID3 Algorithm

The finished tree:

Outlook
├─ Sunny → Humidity
│    ├─ High   → No   [D1,D2,D8]
│    └─ Normal → Yes  [D9,D11]
├─ Overcast → Yes     [D3,D7,D12,D13]
└─ Rain → Wind
     ├─ Strong → No   [D6,D14]
     └─ Weak   → Yes  [D4,D5,D10]
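A compact recursive ID3 sketch in the same spirit (an assumed implementation, not the lecture's code): reusing the gain helper above, it returns a class label for a leaf or an (attribute, branches) pair for an internal node.

from collections import Counter

def id3(examples, attrs, target='Play Golf'):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:        # pure node: make a leaf
        return labels[0]
    if not attrs:                    # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a, target))
    branches = {v: id3([e for e in examples if e[best] == v],
                       [a for a in attrs if a != best], target)
                for v in {e[best] for e in examples}}
    return (best, branches)

tree = id3(examples, list(attrs))
# ('Outlook', {'Sunny': ('Humidity', ...), 'Overcast': 'Yes', 'Rain': ('Wind', ...)})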


Which attribute should we start with?

ID#  Texture  Temp  Size    Classification
1    Smooth   Cold  Large   Yes
2    Smooth   Cold  Small   No
3    Smooth   Cool  Large   Yes
4    Smooth   Cool  Small   Yes
5    Smooth   Hot   Small   Yes
6    Wavy     Cold  Medium  No
7    Wavy     Hot   Large   Yes
8    Rough    Cold  Large   No
9    Rough    Cool  Large   Yes
10   Rough    Hot   Small   No
11   Rough    Warm  Medium  Yes


Which node is the best?

• Texture (smooth, wavy, rough):
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5)) +
  2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2)) +
  4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
  = 5/11*(.722) + 2/11*1 + 4/11*1
  = .874


• Temperature (cold, cool, hot, warm):
  4/11 * (-1/4*log2(1/4) - 3/4*log2(3/4)) +
  3/11 * (-3/3*log2(3/3) - 0/3*log2(0/3)) +
  3/11 * (-2/3*log2(2/3) - 1/3*log2(1/3)) +
  1/11 * (-1/1*log2(1/1) - 0/1*log2(0/1))
  = 4/11*(.811) + 0 + 3/11*(.918) + 0
  = .545
  (using the convention 0*log2(0) = 0 for the pure branches)


• Size (large, medium, small):
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5)) +
  2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2)) +
  4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
  = 5/11*(.722) + 2/11*1 + 4/11*1
  = .874

Temperature gives the lowest expected entropy (.545 versus .874 for both Texture and Size), and therefore the highest information gain, so Temperature is the attribute to start with.
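The entropy/gain helpers sketched earlier confirm this once the gains themselves (rather than just the expected entropies) are computed; the tuple encoding is again this sketch's assumption:

cases = [
    (1,'Smooth','Cold','Large','Yes'),  (2,'Smooth','Cold','Small','No'),
    (3,'Smooth','Cool','Large','Yes'),  (4,'Smooth','Cool','Small','Yes'),
    (5,'Smooth','Hot','Small','Yes'),   (6,'Wavy','Cold','Medium','No'),
    (7,'Wavy','Hot','Large','Yes'),     (8,'Rough','Cold','Large','No'),
    (9,'Rough','Cool','Large','Yes'),   (10,'Rough','Hot','Small','No'),
    (11,'Rough','Warm','Medium','Yes'),
]
cols = ('ID#', 'Texture', 'Temp', 'Size', 'Classification')
exs = [dict(zip(cols, c)) for c in cases]

for a in ('Texture', 'Temp', 'Size'):
    print(a, round(gain(exs, a, target='Classification'), 3))
# Texture 0.072, Temp 0.4, Size 0.072  (base entropy ≈ 0.946): Temp wins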


Learning over time

• How do you evolve knowledge over time when you learn little by little?
  – Abstract version: the “Frinkle”


The Question

• The Question
  – How can we build this kind of representation over time?
• The Answer
  – Rely on the concepts of false positives and false negatives.


The idea

• False Positive
  – An example which is predicted to be positive but whose known outcome is negative.
  – The problem: our hypothesis is too general.
  – The solution: add another condition to our hypothesis.
• False Negative
  – An example which is predicted to be negative but whose known outcome is positive.
  – The problem: our hypothesis is too restrictive.
  – The solution: remove a condition from our hypothesis (or add a disjunction).


Creating a model one “case” at a time

ID#  Texture  Temp  Size    Classification
1    Smooth   Cold  Large   Yes
2    Smooth   Cold  Small   No
3    Smooth   Cool  Large   Yes
4    Smooth   Cool  Small   Yes
5    Smooth   Hot   Small   Yes
6    Wavy     Cold  Medium  No
7    Wavy     Hot   Large   Yes
8    Rough    Cold  Large   No
9    Rough    Cool  Large   Yes
10   Rough    Hot   Small   No
11   Rough    Warm  Medium  Yes
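As a sketch of the repair loop from "The idea" above (an illustration only: the conjunctive-dict representation and the repair heuristics are assumptions, not the slides' algorithm), a hypothesis can be grown over this table one case at a time, reusing the exs list built earlier:

ATTRS = ('Texture', 'Temp', 'Size')

def predict(h, case):
    # Predict Yes iff the case satisfies every condition in the hypothesis.
    return all(case[a] == v for a, v in h.items())

def refine(h, case, label, seen_positives):
    if predict(h, case) and label == 'No':
        # False positive: too general. Add a condition this case violates
        # but every positive case seen so far satisfies.
        for a in ATTRS:
            vals = {p[a] for p in seen_positives}
            if len(vals) == 1 and case[a] not in vals:
                h[a] = next(iter(vals))
                return h
        # No single condition works: this is where a disjunction is needed.
    elif not predict(h, case) and label == 'Yes':
        # False negative: too restrictive. Drop every violated condition.
        for a in [a for a, v in h.items() if case[a] != v]:
            del h[a]
    return h

h = {a: exs[0][a] for a in ATTRS}    # seed from the first positive case
positives = [exs[0]]
for case in exs[1:]:
    h = refine(h, case, case['Classification'], positives)
    if case['Classification'] == 'Yes':
        positives.append(case)

On this table the loop eventually meets false positives (cases 8 and 10) that no single added condition can fix, which is exactly the situation where the slides suggest adding a disjunction.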