Training Examples

Day  Outlook   Temp.  Humidity  Wind    Play Golf
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Weak    Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No
Entropy and Information Gain
• Information answers questions.
• The more clueless I am about the answer initially, the more information is contained in the final answer.
• Scale:
  – 1 bit = completely clueless – the answer to a Boolean question with prior <0.5, 0.5>
  – 0 bits = complete knowledge – the answer to a Boolean question with prior <1.0, 0.0>
  – ? = the answer to a Boolean question with prior <0.75, 0.25>
  – This leads to the concept of Entropy
Entropy
• S is a sample of training examples
• p+ is the proportion of positive examples
• p- is the proportion of negative examples
• Entropy measures the impurity of S
Entropy(S) = -p+ log2 p+ - p- log2 p-
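The entropy formula above can be checked directly (a small sketch; the convention 0·log2 0 = 0 is assumed, as is standard):

```python
import math

def entropy(p_pos, p_neg):
    """Entropy of a two-class sample in bits, taking 0*log2(0) as 0."""
    e = 0.0
    for p in (p_pos, p_neg):
        if p > 0:
            e -= p * math.log2(p)
    return e

print(entropy(0.5, 0.5))    # <0.5, 0.5>: completely clueless -> 1.0
print(entropy(1.0, 0.0))    # <1.0, 0.0>: complete knowledge -> 0.0
print(entropy(0.75, 0.25))  # the "?" case above -> about 0.811
print(entropy(9/14, 5/14))  # the play-golf sample S=[9+,5-] -> about 0.940
```

This also answers the "?" on the scale: the <0.75, 0.25> question carries about 0.811 bits.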
Information Gain
• Gain(S,A): expected reduction in entropy due to sorting S on attribute A

Gain(S,A) = Entropy(S) - Σv∈Values(A) (|Sv|/|S|) Entropy(Sv)
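The gain formula translates almost line for line into code. A minimal sketch, assuming examples are dicts and the target key name ("PlayGolf") is my own choice:

```python
import math
from collections import Counter

def entropy_of(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute, target="PlayGolf"):
    """Gain(S,A) = Entropy(S) - sum over v in Values(A) of |Sv|/|S| * Entropy(Sv)."""
    labels = [ex[target] for ex in examples]
    total = entropy_of(labels)
    n = len(examples)
    remainder = 0.0
    for v in {ex[attribute] for ex in examples}:
        sv = [ex[target] for ex in examples if ex[attribute] == v]
        remainder += len(sv) / n * entropy_of(sv)
    return total - remainder

# A perfectly separating attribute recovers the full entropy of S:
ex = [{"A": "x", "PlayGolf": "Yes"}, {"A": "x", "PlayGolf": "Yes"},
      {"A": "y", "PlayGolf": "No"},  {"A": "y", "PlayGolf": "No"}]
print(gain(ex, "A"))  # 1.0
```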
Selecting the First Attribute

Humidity: S=[9+,5-], E=0.940
  High:   [3+, 4-], E=0.985
  Normal: [6+, 1-], E=0.592
Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
Wind: S=[9+,5-], E=0.940
  Weak:   [6+, 2-], E=0.811
  Strong: [3+, 3-], E=1.0
Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Humidity provides greater information gain than Wind, w.r.t. the target classification.
Selecting the First Attribute

Outlook: S=[9+,5-], E=0.940
  Sunny:    [2+, 3-], E=0.971
  Overcast: [4+, 0-], E=0.0
  Rain:     [3+, 2-], E=0.971
Gain(S,Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247
Selecting the First Attribute
The information gain values for the 4 attributes are:
• Gain(S,Outlook) = 0.247
• Gain(S,Humidity) = 0.151
• Gain(S,Wind) = 0.048
• Gain(S,Temperature) = 0.029
where S denotes the collection of training examples
Selecting the Next Attribute

Outlook: [D1,D2,…,D14], [9+,5-]
  Sunny:    Ssunny = [D1,D2,D8,D9,D11], [2+,3-] → ?
  Overcast: [D3,D7,D12,D13], [4+,0-] → Yes
  Rain:     [D4,D5,D6,D10,D14], [3+,2-] → ?
Gain(Ssunny, Humidity) = 0.970 - (3/5)*0.0 - (2/5)*0.0 = 0.970
Gain(Ssunny, Temp.)    = 0.970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = 0.570
Gain(Ssunny, Wind)     = 0.970 - (2/5)*1.0 - (3/5)*0.918 = 0.019
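The same computation, restricted to the five Sunny days, can be checked with a short script (a sketch; tiny third-decimal differences come from the slide's rounded intermediates):

```python
import math
from collections import Counter

# Ssunny: (Temp, Humidity, Wind, PlayGolf) for D1, D2, D8, D9, D11
SUNNY = [
    ("Hot", "High", "Weak", "No"),
    ("Hot", "High", "Strong", "No"),
    ("Mild", "High", "Weak", "No"),
    ("Cool", "Normal", "Weak", "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]

def H(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, idx):
    """Gain over rows for the attribute in column idx (label in the last column)."""
    n = len(rows)
    total = H([r[-1] for r in rows])
    rem = 0.0
    for v in {r[idx] for r in rows}:
        sv = [r[-1] for r in rows if r[idx] == v]
        rem += len(sv) / n * H(sv)
    return total - rem

for name, idx in [("Temp.", 0), ("Humidity", 1), ("Wind", 2)]:
    print(f"Gain(Ssunny, {name}) = {gain(SUNNY, idx):.3f}")
```

Humidity splits Ssunny perfectly (both branches pure), so its gain equals the full entropy 0.970 and it is chosen next.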
ID3 Algorithm

Outlook
  Sunny    → Humidity
               High   → No   [D1, D2]
               Normal → Yes  [D8, D9, D11]
  Overcast → Yes  [D3, D7, D12, D13]
  Rain     → Wind
               Strong → No   [D6, D14]
               Weak   → Yes  [D4, D5, D10]
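The learned tree is small enough to write out as a classifier directly (a sketch; the function name and argument order are my own):

```python
def classify(outlook, humidity, wind):
    """Apply the decision tree that ID3 learned from the 14 play-golf examples."""
    if outlook == "Overcast":
        return "Yes"                               # [D3, D7, D12, D13]
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    # outlook == "Rain": decided by Wind
    return "No" if wind == "Strong" else "Yes"

# D1 = (Sunny, High, Weak) and D4 = (Rain, High, Weak):
print(classify("Sunny", "High", "Weak"))  # No
print(classify("Rain", "High", "Weak"))   # Yes
```

Note that Temperature never appears in the tree: after Outlook, Humidity, and Wind, it adds no further information on this data.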
Which attribute should we start with?
ID# Texture Temp Size Classification
1 Smooth Cold Large Yes
2 Smooth Cold Small No
3 Smooth Cool Large Yes
4 Smooth Cool Small Yes
5 Smooth Hot Small Yes
6 Wavy Cold Medium No
7 Wavy Hot Large Yes
8 Rough Cold Large No
9 Rough Cool Large Yes
10 Rough Hot Small No
11 Rough Warm Medium Yes
Which node is the best?
• Texture (smooth, wavy, rough)
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5)) +
  2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2)) +
  4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
  = 5/11*(.722) + 2/11*1 + 4/11*1
  = .874
Which node is the best?
• Temperature (cold, cool, hot, warm)
  4/11 * (-1/4*log2(1/4) - 3/4*log2(3/4)) +
  3/11 * (-3/3*log2(3/3) - 0/3*log2(0/3)) +
  3/11 * (-2/3*log2(2/3) - 1/3*log2(1/3)) +
  1/11 * (-1/1*log2(1/1) - 0/1*log2(0/1))
  = 4/11*(.811) + 0 + 3/11*(.918) + 0
  = .545
Which node is the best?
• Size (large, medium, small)
  5/11 * (-4/5*log2(4/5) - 1/5*log2(1/5)) +
  2/11 * (-1/2*log2(1/2) - 1/2*log2(1/2)) +
  4/11 * (-2/4*log2(2/4) - 2/4*log2(2/4))
  = 5/11*(.722) + 2/11*1 + 4/11*1
  = .874
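The three weighted entropies can be verified against the 11-example table (a sketch; the attribute with the lowest weighted entropy, Temperature, gives the highest gain and is the best first split):

```python
import math
from collections import Counter

# The 11 examples: (Texture, Temp, Size, Classification) for IDs 1..11
ROWS = [
    ("Smooth","Cold","Large","Yes"), ("Smooth","Cold","Small","No"),
    ("Smooth","Cool","Large","Yes"), ("Smooth","Cool","Small","Yes"),
    ("Smooth","Hot","Small","Yes"),  ("Wavy","Cold","Medium","No"),
    ("Wavy","Hot","Large","Yes"),    ("Rough","Cold","Large","No"),
    ("Rough","Cool","Large","Yes"),  ("Rough","Hot","Small","No"),
    ("Rough","Warm","Medium","Yes"),
]

def H(labels):
    """Entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def weighted_entropy(rows, idx):
    """Expected entropy after splitting on the attribute in column idx."""
    n = len(rows)
    total = 0.0
    for v in {r[idx] for r in rows}:
        sv = [r[-1] for r in rows if r[idx] == v]
        total += len(sv) / n * H(sv)
    return total

for name, idx in [("Texture", 0), ("Temperature", 1), ("Size", 2)]:
    print(f"{name}: {weighted_entropy(ROWS, idx):.3f}")
```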
Learning over time
• How do you evolve knowledge over time when you learn little by little?
  – Abstract version – the “Frinkle”
The Question
• The Question
  – How can we build this kind of representation over time?
• The Answer
  – Rely on the concepts of false positives and false negatives
The idea
• False Positive
  – An example which is predicted to be positive but whose known outcome is negative.
  – The problem is that our hypothesis is too general.
  – The solution is to add another condition to our hypothesis.
• False Negative
  – An example which is predicted to be negative but whose known outcome is positive.
  – The problem is that our hypothesis is too restrictive.
  – The solution is to remove a condition from our hypothesis [or to add a disjunction].
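One way to make this concrete is a toy one-case-at-a-time learner. This is my own illustration, not a method from the slides: a hypothesis is a pair of maps, `required` attribute values (specialized away on false positives by adding a `forbidden` value, generalized on false negatives by dropping conditions):

```python
def predict(required, forbidden, example):
    """Positive iff every required value matches and no forbidden value appears."""
    return (all(example.get(a) == v for a, v in required.items())
            and all(example.get(a) != v for a, v in forbidden.items()))

def update(required, forbidden, example, outcome):
    """Refine the hypothesis from one case, following the false-pos/false-neg rules."""
    predicted = predict(required, forbidden, example)
    if predicted and outcome == "No":
        # False positive: too general -> add a condition ruling this case out.
        for a, v in example.items():
            if a not in required and a not in forbidden:
                forbidden = {**forbidden, a: v}
                break
    elif not predicted and outcome == "Yes":
        # False negative: too restrictive -> drop the conditions this case violates.
        required = {a: v for a, v in required.items() if example.get(a) == v}
        forbidden = {a: v for a, v in forbidden.items() if example.get(a) != v}
    return required, forbidden

# Seed with case 1 (Smooth, Cold, Large -> Yes) as the most specific hypothesis;
# case 3 (Smooth, Cool, Large -> Yes) is then a false negative, so Temp=Cold is dropped.
req, forb = {"Texture": "Smooth", "Temp": "Cold", "Size": "Large"}, {}
req, forb = update(req, forb, {"Texture": "Smooth", "Temp": "Cool", "Size": "Large"}, "Yes")
print(req)  # {'Texture': 'Smooth', 'Size': 'Large'}
```

A real learner would choose which condition to add or drop more carefully; this sketch just takes the first candidate.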
Creating a model one “case” at a time
ID# Texture Temp Size Classification
1 Smooth Cold Large Yes
2 Smooth Cold Small No
3 Smooth Cool Large Yes
4 Smooth Cool Small Yes
5 Smooth Hot Small Yes
6 Wavy Cold Medium No
7 Wavy Hot Large Yes
8 Rough Cold Large No
9 Rough Cool Large Yes
10 Rough Hot Small No
11 Rough Warm Medium Yes