Ch10 Machine Learning: Symbol-Based
Dr. Bernard Chen, Ph.D.
University of Central Arkansas
Spring 2011
Machine Learning Outline
The book presents four chapters on machine learning, reflecting four approaches to the problem:
- Symbol-Based
- Connectionist
- Genetic/Evolutionary
- Stochastic
Ch.10 Outline
- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning
The Framework for Symbol-Based Learning
The Framework Example
Data
The representation:
- Size(small) ^ color(red) ^ shape(round)
- Size(large) ^ color(red) ^ shape(round)
The Framework Example
A set of operations:
Based on Size(small) ^ color(red) ^ shape(round), replacing a single constant with a variable produces the generalizations:
- Size(X) ^ color(red) ^ shape(round)
- Size(small) ^ color(X) ^ shape(round)
- Size(small) ^ color(red) ^ shape(X)
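As a concrete illustration, here is a minimal Python sketch of this generalization operation. The dict encoding of a concept and the variable marker "X" are assumptions made for illustration, not notation from the text.

```python
# Minimal sketch: generalize a concept by replacing one constant with a variable.
VAR = "X"

def generalizations(concept):
    """Yield each concept obtained by turning one constant into a variable."""
    for prop, value in concept.items():
        if value != VAR:
            g = dict(concept)
            g[prop] = VAR
            yield g

ball = {"size": "small", "color": "red", "shape": "round"}
for g in generalizations(ball):
    print(g)
# {'size': 'X', 'color': 'red', 'shape': 'round'}
# {'size': 'small', 'color': 'X', 'shape': 'round'}
# {'size': 'small', 'color': 'red', 'shape': 'X'}
```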
The Framework Example
The concept space
The learner must search this space to find the desired concept. The complexity of this concept space is a primary measure of the difficulty of a learning problem.
The Framework Example
The Framework Example
Heuristic search:
Based on Size(small) ^ color(red) ^ shape(round), the learner will make that example a candidate "ball" concept; this concept correctly classifies the only positive instance.
If the algorithm is given a second positive instance, Size(large) ^ color(red) ^ shape(round), the learner may generalize the candidate "ball" concept to Size(Y) ^ color(red) ^ shape(round).
Learning process
The training data is a series of positive and negative examples of the concept: examples of blocks-world structures that fit the category, along with near misses. The latter are instances that almost belong to the category but fail on one property or relation.
Examples and near misses for the concept arch
Learning process
This approach was proposed by Patrick Winston (1975). The program performs a hill-climbing search on the concept space, guided by the training data. Because the program does not backtrack, its performance is highly sensitive to the order of the training examples; a bad order can lead the program to dead ends in the search space.
Ch.10 Outline
- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning
ID3 Decision Tree
ID3, like candidate elimination, induces concepts from examples. It is particularly interesting for:
- its representation of learned knowledge
- its approach to the management of complexity
- its heuristic for selecting candidate concepts
- its potential for handling noisy data
ID3 Decision Tree
[Table of 14 credit-risk training examples, not reproduced in this text]
The previous table can be represented as the following decision tree:
ID3 Decision Tree
In a decision tree, each internal node represents a test on some property. Each possible value of that property corresponds to a branch of the tree. Leaf nodes represent classifications, such as low or moderate risk.
ID3 Decision Tree
A simplified decision tree for credit risk management
ID3 Decision Tree
ID3 constructs decision trees in a top-down fashion. ID3 selects a property to test at the current node of the tree and uses this test to partition the set of examples. The algorithm recursively constructs a sub-tree for each partition. This continues until all members of the partition are in the same class.
ID3 Decision Tree
For example, ID3 selects income as the root property for the first step
ID3 Decision Tree
ID3 Decision Tree
How to select the 1st node (and the following nodes)? ID3 measures the information gained by making each property the root of the current subtree, and picks the property that provides the greatest information gain.
ID3 Decision Tree
If we assume that all the examples in the table occur with equal probability, then:
- P(risk is high) = 6/14
- P(risk is moderate) = 3/14
- P(risk is low) = 5/14
ID3 Decision Tree
I(6,3,5) is computed from the general definition

$$I(M) = -\sum_{i=1}^{n} p(m_i)\,\log_2 p(m_i)$$

giving

$$Info(D) = I(6,3,5) = -\frac{6}{14}\log_2\frac{6}{14} - \frac{3}{14}\log_2\frac{3}{14} - \frac{5}{14}\log_2\frac{5}{14} = 1.531$$
ID3 Decision Tree
$$Info_{income}(D) = \frac{4}{14}\,I(4,0,0) + \frac{4}{14}\,I(2,2,0) + \frac{6}{14}\,I(0,1,5) = 0.564$$

where (using the convention $0 \cdot \log_2 0 = 0$):

$$I(4,0,0) = -\frac{4}{4}\log_2\frac{4}{4} - \frac{0}{4}\log_2\frac{0}{4} - \frac{0}{4}\log_2\frac{0}{4} = 0$$

$$I(2,2,0) = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} - \frac{0}{4}\log_2\frac{0}{4} = 1.0$$

$$I(0,1,5) = -\frac{0}{6}\log_2\frac{0}{6} - \frac{1}{6}\log_2\frac{1}{6} - \frac{5}{6}\log_2\frac{5}{6} = 0.650$$
ID3 Decision Tree
The information gain from income is:
Gain(income) = I(6,3,5) - Info_income(D) = 1.531 - 0.564 = 0.967
Similarly:
- Gain(credit history) = 0.266
- Gain(debt) = 0.063
- Gain(collateral) = 0.206
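These numbers are easy to verify in a few lines of Python; the helper below is a sketch, with the class counts for the three income partitions taken from the calculation above.

```python
import math

def entropy(counts):
    """I(c1, ..., cn): expected information for a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

info_d = entropy([6, 3, 5])                    # Info(D) = 1.531
info_income = (4 / 14) * entropy([4, 0, 0]) \
            + (4 / 14) * entropy([2, 2, 0]) \
            + (6 / 14) * entropy([0, 1, 5])    # Info_income(D) = 0.564
print(round(info_d - info_income, 3))          # Gain(income) = 0.967
```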
ID3 Decision Tree
Since income provides the greatest information gain, ID3 will select it as the root of the tree
Attribute Selection Measure: Information Gain (ID3/C4.5)
Select the attribute with the highest information gain. Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_i,D| / |D|. The expected information (entropy) needed to classify a tuple in D is:

$$Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)$$
Attribute Selection Measure: Information Gain (ID3/C4.5)
The information needed (after using A to split D into v partitions) to classify D is:

$$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times I(D_j)$$

The information gained by branching on attribute A is:

$$Gain(A) = Info(D) - Info_A(D)$$
ID3 Decision Tree Pseudo Code
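The pseudocode figure from the original slide is not reproduced in this text. The following Python sketch of the same top-down, gain-driven induction is a reconstruction under stated assumptions (examples are dicts; attribute and target names are illustrative), not the slide's exact pseudocode.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain(examples, attr, target):
    """Information gained by partitioning the examples on attr."""
    base = entropy([ex[target] for ex in examples])
    parts = defaultdict(list)
    for ex in examples:
        parts[ex[attr]].append(ex[target])
    rest = sum(len(p) / len(examples) * entropy(p) for p in parts.values())
    return base - rest

def id3(examples, attrs, target):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                 # all one class: leaf node
        return labels[0]
    if not attrs:                             # no attributes left: majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(examples, a, target))
    return {best: {v: id3([ex for ex in examples if ex[best] == v],
                          [a for a in attrs if a != best], target)
                   for v in {ex[best] for ex in examples}}}
```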
Another Decision Tree Example
NAME   RANK             YEARS   TENURED
Mike   Assistant Prof   3       no
Mary   Assistant Prof   7       yes
Bill   Professor        2       yes
Jim    Associate Prof   7       yes
Dave   Assistant Prof   6       no
Anne   Associate Prof   3       no
Decision Tree Example
Info(Tenured) = I(3,3) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0

A refresher on computing a base-2 log with a base-10 calculator: log2(12) = log(12)/log(2) = 1.07918/0.30103 = 3.584958.
How to calculate log2: http://www.ehow.com/how_5144933_calculate-log.html
Convenient tool: http://web2.0calc.com/
Decision Tree Example
Info_RANK(Tenured) = 3/6 I(1,2) + 2/6 I(1,1) + 1/6 I(1,0) = 3/6 (0.918) + 2/6 (1) + 1/6 (0) = 0.79
- 3/6 I(1,2) means "Assistant Prof" covers 3 of the 6 samples, with 1 yes and 2 no.
- 2/6 I(1,1) means "Associate Prof" covers 2 of the 6 samples, with 1 yes and 1 no.
- 1/6 I(1,0) means "Professor" covers 1 of the 6 samples, with 1 yes and 0 no.
Decision Tree Example
Info_YEARS(Tenured) = 1/6 I(1,0) + 2/6 I(0,2) + 1/6 I(0,1) + 2/6 I(2,0) = 0
- 1/6 I(1,0) means "years = 2" covers 1 of the 6 samples, with 1 yes and 0 no.
- 2/6 I(0,2) means "years = 3" covers 2 of the 6 samples, with 0 yes and 2 no.
- 1/6 I(0,1) means "years = 6" covers 1 of the 6 samples, with 0 yes and 1 no.
- 2/6 I(2,0) means "years = 7" covers 2 of the 6 samples, with 2 yes and 0 no.
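A quick sketch to check these numbers (the I() helper is the same entropy-of-counts function used earlier):

```python
import math

def I(*counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)

info = I(3, 3)                                                   # 1.0
info_rank = 3/6 * I(1, 2) + 2/6 * I(1, 1) + 1/6 * I(1, 0)        # ~0.79
info_years = 1/6 * I(1, 0) + 2/6 * I(0, 2) \
           + 1/6 * I(0, 1) + 2/6 * I(2, 0)                       # 0.0
print(info - info_rank, info - info_years)  # Gain(RANK) ~ 0.21, Gain(YEARS) = 1.0
```

Since Gain(YEARS) > Gain(RANK), ID3 would select YEARS as the root of this tree.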
Ch.10 Outline
- A framework for Symbol-Based Learning
- ID3 Decision Tree
- Unsupervised Learning
Unsupervised Learning
The learning algorithms discussed so far implement forms of supervised learning. They assume the existence of a teacher, some fitness measure, or other external method of classifying training instances. Unsupervised learning eliminates the teacher and requires that the learners form and evaluate concepts on their own.
Unsupervised Learning
Science is perhaps the best example of unsupervised learning in humans. Scientists do not have the benefit of a teacher; instead, they propose hypotheses to explain observations.
Unsupervised Learning
The clustering problem starts with (1) a collection of unclassified objects and (2) a means for measuring the similarity of objects. The goal is to organize the objects into classes that meet some standard of quality, such as maximizing the similarity of objects in the same class.
Unsupervised Learning
Numeric taxonomy is one of the oldest approaches to the clustering problem. A reasonable similarity metric treats each object as a point in n-dimensional space; the similarity of two objects is then measured by the Euclidean distance between them in this space (the smaller the distance, the more similar the objects).
Unsupervised Learning
Using this similarity metric, a common clustering algorithm builds clusters in a bottom-up fashion, also known as agglomerative clustering (a sketch follows the list):
1. Examine all pairs of objects, select the pair with the highest degree of similarity, and make that pair a cluster.
2. Define the features of the cluster as some function (such as the average) of the features of the component members, then replace the component objects with this cluster definition.
3. Repeat this process on the collection of objects until all objects have been reduced to a single cluster.
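Here is a minimal sketch of this bottom-up procedure, assuming objects are points (tuples of numbers) and that a merged cluster is summarized by the average of the two merged definitions, as the text suggests:

```python
import math

def agglomerate(points):
    """Repeatedly merge the two closest clusters until one remains.

    Each cluster is summarized by a single feature vector; a merge replaces
    the two component definitions with their average, as described above."""
    clusters = [tuple(p) for p in points]
    merges = []
    while len(clusters) > 1:
        # examine all pairs and pick the most similar (closest) pair
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: math.dist(clusters[ij[0]], clusters[ij[1]]))
        merged = tuple((x + y) / 2 for x, y in zip(clusters[i], clusters[j]))
        merges.append((clusters[i], clusters[j], merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges  # the merge history encodes the binary cluster tree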
Unsupervised Learning
The result of this algorithm is a binary tree whose leaf nodes are instances and whose internal nodes are clusters of increasing size.
We may also extend this algorithm to objects represented as sets of symbolic features.
Unsupervised Learning
Object1 = {small, red, rubber, ball}
Object2 = {small, blue, rubber, ball}
Object3 = {large, black, wooden, ball}
Treating similarity as the proportion of features two objects share, this metric would compute the similarity values:
Similarity(object1, object2) = 3/4
Similarity(object1, object3) = 1/4
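One plausible metric that reproduces these values is the fraction of shared features; this is a sketch of one such metric, not a definition taken from the text:

```python
def similarity(a, b):
    """Fraction of features the two objects have in common."""
    return len(a & b) / max(len(a), len(b))

object1 = {"small", "red", "rubber", "ball"}
object2 = {"small", "blue", "rubber", "ball"}
object3 = {"large", "black", "wooden", "ball"}
print(similarity(object1, object2))  # 0.75  (3/4)
print(similarity(object1, object3))  # 0.25  (1/4)
```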
Partitioning Algorithms: Basic Concept
Given k, find a partition of k clusters that optimizes the chosen partitioning criterion.
- Global optimum: exhaustively enumerate all partitions.
- Heuristic methods: the k-means and k-medoids algorithms.
  - k-means (MacQueen '67): each cluster is represented by the center of the cluster.
  - k-medoids or PAM (Partition Around Medoids) (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster.
The K-Means Clustering Method
Given k, the k-means algorithm is implemented in four steps (a sketch follows the list):
1. Partition the objects into k nonempty subsets.
2. Compute seed points as the centroids of the clusters of the current partition (the centroid is the center, i.e., the mean point, of the cluster).
3. Assign each object to the cluster with the nearest seed point.
4. Go back to step 2; stop when no new assignments are made.
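A minimal Python sketch of these four steps; points are tuples, and the initial seeds may be supplied or sampled at random (this is an illustrative sketch, not the slides' code):

```python
import math
import random

def kmeans(points, k, centers=None):
    """Basic k-means; step numbers refer to the list above."""
    if centers is None:
        centers = random.sample(points, k)        # steps 1-2: initial seeds
    while True:
        clusters = [[] for _ in centers]          # step 3: nearest-seed assignment
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        new_centers = [                           # step 2 again: recompute centroids
            tuple(sum(xs) / len(xs) for xs in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:                # step 4: stop when nothing moves
            return centers, clusters
        centers = new_centers
```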
K-means Clustering
The K-Means Clustering Method
[Figure: five scatter plots (axes 0-10) illustrating k-means with K = 2: arbitrarily choose K objects as initial cluster centers; assign each object to the most similar center; update the cluster means; reassign objects; update the means again.]
Example
Run k-means clustering with 3 clusters (initial centroids: 3, 16, 25) on the data points 2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30 for at least 2 iterations.
Example
Iteration 1:
- centroid 3: {2, 3, 4, 7, 9} → new centroid 5
- centroid 16: {10, 11, 12, 16, 18, 19} → new centroid 14.33
- centroid 25: {23, 24, 25, 30} → new centroid 25.5
Example
Iteration 2:
- centroid 5: {2, 3, 4, 7, 9} → new centroid 5
- centroid 14.33: {10, 11, 12, 16, 18, 19} → new centroid 14.33
- centroid 25.5: {23, 24, 25, 30} → new centroid 25.5
The assignments no longer change, so the centroids are stable and the algorithm has converged.
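Running the kmeans() sketch from earlier on this data reproduces the worked result (1-D points are wrapped as 1-tuples):

```python
data = [(x,) for x in [2, 3, 4, 7, 9, 10, 11, 12, 16, 18, 19, 23, 24, 25, 30]]
centers, clusters = kmeans(data, 3, centers=[(3,), (16,), (25,)])
print(centers)  # approximately [(5.0,), (14.33,), (25.5,)]
```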
In-Class Practice
Run K-means clustering with 3 clusters (initial centroids: 3, 12, 19) for at least 2 iterations